This course will introduce you to the foundations of modern cryptography, with an eye toward practical applications.

Loading...

From the course by University of Maryland, College Park

Cryptography

387 ratings

This course will introduce you to the foundations of modern cryptography, with an eye toward practical applications.

From the lesson

Week 2

Computational Secrecy and Principles of Modern Cryptography

- Jonathan KatzProfessor, University of Maryland, and Director, Maryland Cybersecurity Center

Maryland Cybersecurity Center

In this lecture we're going to explore the asymptotic relaxation, of perfect secrecy.

We'll begin just by reviewing the notion of perfect indistinguishability.

Let pi be an encryption scheme, and A an adversary, i.e an eavesdropper.

Remember that we define the following randomized experiment,

that depends on both A and pi.

First, the attacker outputs two messages m0 and m1, in the message space.

Then, a key is generated using the key generation algorithm.

A uniform bit B is chosen, and

then we encrypt the message m sub b, using the key k, that we just generated.

This gives us a ciphertext c.

We give that ciphertext to the attacker.

Right, modeling the attacker's ability to eavesdrop and

view the ciphertext being sent between the parties.

And the attacker then outputs a bit,

b prime, which represents its guess, as to which of the two messages was encrypted.

We say that A succeeds, if its guess is correct, i.e.

If b prime is equal to b.

And the experiment evaluates to 1 if and only if, A succeeds.

We then say that the encryption scheme pi, is perfectly indistinguishable, if for

all attackers A, it holds that the probability that the experiment evaluates

to 1 that is the probability which with A succeeds, is exactly one-half.

Now, remember that we wanted to relax the notion of perfect indistinguishability,

in two different dimensions.

First all, we're going to allow security to fail, with some small probability.

Secondly, we're going to restrict our attention to efficient attackers only.

And as we mentioned last time there are two approaches for

doing this, the concrete approach, and the asymptotic approach.

Remember that in the concrete approach,

we define the notion of t epsilon indistinguishability.

Here we allow security to fail, with probability at most epsilon.

And we restrict our attention to attackers, running in time, at most t.

We can then set these parameters, t and epsilon any way we like, and

get the corresponding notion of t epsilon indistinguishability.

For the asymptotic relaxation,

we're going to do something a little bit more complicated.

What we're going to do,

is we're going to introduce the notion, of a security parameter n.

The security parameter, is a positive integer, and we can view this as allowing

the parties to choose the level of security, that they want for the scheme.

And for now, you can view n, as denoting the key length.

That is the length of the key that the parties share.

The security parameter, or the key length, is going to

be fixed by the honest parties, at the time, the system is initialized.

That is at the time they choose and share their key.

We're going to assume further, that the attacker, knows n.

The attacker knows the security parameter.

The attacker knows, the length of the key that the parties are sharing.

Now the important point, or the use of the security parameter,

is that we're now going to view the running times of all parties.

That is, the honest parties as well as the attacker,

as well as the success probability of the adversary, as functions of n.

So we're going to look at how the running time of the parties changes, as n varies.

And also look at the how the success probability of the attacker varies,

as n is increased.

So for computation indistinguishability,

we're now going to relax the notion of perfect indistinguishability as follows.

We're now going to allow security to fail, with probability negligible in n.

And we're going to restrict our attention to attackers,

running in time polynomial in n.

And I'll define both of these terms on the next slide.

Let's look at the case of polynomial first,

since this concept is probably already familiar to you.

We're interested in looking,

at the attacker's running time, as a function of the security parameter n.

So this means that for different values of the security parameter n,

we have a different amount of time that, for which the adversary runs.

That's going to be defined by some function, that maps,

the security parameter, which is a positive integer,

onto the running time of the attacker which is another positive integer.

Given some function f like that, we'll say that f is polynomial.

If there exists a finite number of constants ci, such that f of n is at most,

the summation of ci into the i, for all n.

Technically we should actually refer to this as being polynomialy bounded, right,

f not need be a polynomial itself.

All we care about is that it's upper bounded by a polynomial.

But hopefully, you'll be okay with this slight abus, abusive notation.

For the attacker's success probability, we're interested in looking at the success

probability of the attacker as a function again, of the security parameter n.

This means that for each value of n, which is again a positive integer, we get some

values some probability, which has a positive real number or possibly 0.

In fact it's in the range of 0 to 1,

but we can allow all real numbers just for generality.

So define a function f of this form, to be negligible, if for

every polynomial p, there are some bound to capital N

such that f of n is smaller than 1 over p of n, for all n larger than that bound.

What this means is that the function f is negligible,

if it's asymptotically smaller, than any inverse polynomial function.

Now, a typical example of a negligible function,

is something of the form polynomial in n, times an inverse exponential in n.

So something like poly n times 2 to the minus cn, for some constant c.

That's a very common example in cryptography, but

that's not the only possibility, for a negligible function.

Why do we make these choices for defining the terms of a small probability of

success, and the notion of an efficient attacker.

Well to some extent the choices are arbitrary, and

you can come up with an equally valid theory, by making other choices for

how to define, small and efficient.

On the other hand, there are some good reasons for making these choices.

The notion of equating efficient attackers,

with probabilistic polynomial-time algorithms, which I'll abbreviate by PPT,

is something that's borrowed from complexity theory.

Where traditionally, we define the notion of problems that can be

solved efficiently, as problems that can be solved in polynomial time.

More than that, the choices of polynomial and

negligible, have some very convenient and nice to use closure properties.

One of those, is that if you have two polynomial functions and

multiply them, the result is also a polynomial function.

And what this means is that if you have polynomially many calls,

to some polynomial-time subroutine the net result is a polynomial-time algorithm.

So if you have an algorithm and

the description of that algorithm you verified is polynomial-time, and

then that algorithm makes some calls to a subroutine that also is implemented in

polynomial-time, the net result is an algorithm running in polynomial-time.

Another such closure property, is that any polynomial function,

multiplied by any negligible function, is again negligible.

And what this means is that if you have an algorithm,

that makes polynomially many calls to some subroutine.

Where that subroutine might fail with negligible probability every time it's

called, then, the probability that the algorithm overall ever fails,

is still negligible.

So let's redefine secure encryption, using this notion of the security parameter.

It's larger the, largely the same as before.

I just want to highlight some technical points.

So a private-key encryption scheme,

as before, is defined by three algorithms, the key generation algorithm,

the encryption algorithm and the decryption algorithm.

But we're now going to additionally require that those algorithms,

run in probabilistic polynomial time.

Recall we said a few slides ago, that we want all parties to run efficiently.

And that includes the honest parties who are going to be running this algorithm.

The key generation algorithm takes as input, the security parameter in unary.

So, it takes as input formally the string, 1 to the n, i.e.

The string of n consecutive ones.

This is really just a formalism, that allows the key generation algorithm to

run in time polynomial in n, because traditionally we allow algorithms to

run in time polynomial, in the length of their input.

So we just provide it with an input of length n, to allow it to run for

time polynomial in n.

The key generation algorithm then outputs the key k as before, and

we'll assume that the key length, is at least equal to n.

The encryption algorithm, takes as input a key and a message m,

and outputs a ciphertext c as before.

One thing I just want to highlight here, is that I've made the choice to by

default, assume that encryption can take messages of arbitrary length.

That is, we're now going to assume that the message space,

consists of all binary strings of any length, rather than requiring the message

space to be explicitly defined, as part of the encryption scheme.

Sometimes, actually, we'll restrict that and only consider encryption schemes that

encrypt messages of some particular length.

The decryption algorithm, as before,

takes as input a key k and a ciphertext c, and outputs a message m, or an error.

And I've omitted the standard correctence condition that an encryption scheme

should satisfy.

We can now look at the asymptotic version, of computational indistinguishability.

Fix some encryption scheme pi and some algorithm a,

and define a randomized experiment, very similar to what we had before.

We have a, an experiment that depends on a and

pi, and in addition it's parameterized, by the security parameter n.

And we'll see how for different values of n, the experiment depends on n.

The experiment has some dependence on this value n.

So the experiment runs as follows.

We run our algorithm A, on input 1 to the n.

This is just our mechanism for

informing the attacker A, about what value of a security parameter we're using.

This also allows the attacker, to run in time, that's a function of n.

A function of its input length.

The attacker outputs m0 and m1 as before.

Although now we're going to add the condition, that m0 and m1,

have to be equal length.

This didn't come up before, because in general we were assuming that

we had a message space containing only messages of some fixed length,

whereas now that's not the case.

Here we're going to restrict the attacker to only output messages of equal length.

And we'll have more to say about this later.

The second part of the experiment is exactly as before.

We choose a key k, by running the key generation algorithm, on input 1 to the n.

We choose a uniform bit b, and then we encrypt message m sub b

using the key k that we just generated, to obtain a ciphertext c.

We give the ciphertext to A,

who outputs a bit b prime, and we'll say that a succeeds if b is equal to b prime.

And we'll define the experiment to output 1 or to evaluate to 1 in this case.

I want to highlight here that we're viewing A as a stateful algorithm

that is given 1 to then and outputs the messages and

then keeps on running, and is later given c and outputs b prime.

So, you can think of this formally as an algorithm that's maintaining state,

throughout this entire randomized experiment.

We'll then say that a scheme pi, is indistinguishable, if for

all probabilistic polynomial-time attackers A,

there's a negligible function epsilon, such that the probability with which

A succeeds in the previous experiment, is at most one-half plus epsilon of n.

So I want to highlight a couple of things here.

The first thing I want to highlight,

is that the expression on the left hand side here,

the probability with which the attacker succeeds, is indeed a function of n.

If we affix some scheme pi and some attacker A, then the, we can, then for

any particular value of n, we can evaluate or calculate, the probability with which

A succeeds when attacking that encryption scheme pi, in the preceding experiment.

And so if we imagine doing that for

all possible values of n, then we get something which is a function of n.

It's a function that maps from any particular value of

the security parameter, to some probability in the range of 0 to 1.

So the thing on the left hand side of this expression is indeed a function, and

we can ask about the asymptotic behaviour of that function.

We'll say that the scheme is secure, if for

any polynomial time probabilistic polynomial time attacker A,

the probability with which that attacker succeeds, viewed as a function of n,

is bounded by one-half plus a negligible function.

Plus a, plus some negligible quantity, i.e.

Plus some quantity, that decays to 0, faster than any inverse polynomial.

So again, we're making the choices here of restricting attention to efficient

attackers, i.e probabilistic polynomial-time attackers.

And we're allowing the possibility, that attackers can succeed,

with probability slight greater than one-half.

However, that probability must decay to 0 faster than any inverse polynomial.

Let's look at a very simple example just to illustrate,

the style of the definition.

So consider some scheme pi, just a made-up scheme but imagine that the best

attack on the scheme, is a brute-force search over the key space.

And that the key generation algorithm, just generates a uniform n bit key.

So because we're assuming that the best attack is brute-force search,

this means that if we have some attacker that runs in some time, t of n.

Right, remember,

the attacker is now allowed to make its running time depending on n.

And in this case, we're just saying that,

on any, value of the security parameter n, the attacker runs in some time t of n.

Well then, the success probability of the attacker,

is essentially going to be one-half plus the probability with

which the attacker happened to find the key in its brute-force search.

Since the key was chosen uniformly, and since the attacker,

running in time t of n, could have possibly tried at most t of n keys.

This means, that the probability with which the attacker succeeds,

is at most one-half plus t of n, over 2 to the n.

Right, again, t of n over 2 of the n,

is a probability with which the attacker found the key, and

if the attacker didn't find the key, it can do no better than guess randomly.

And so it will succeed in that case with probability one-half.

I claim that this means, that this particular scheme pi is secure.

Right, why is that?

Well we need to argue, that for

any attacker running a probabilistic polynomial time, the success

probability of that attacker is at most one-half plus a negligible function.

Well, for

any attacker running in time in polynomial time t of n must then be polynomial.

And then the function t of n divided by 2 to the n is negligible.

All right, we said earlier that any polynomial times

a negligible function remains negligible.

Here we have the polynomial t of n, times the negligible function 2 to the minus n.

And so the result is negligible.

So again, this means that for any pol, pol, probabilistic polynomial time

attacker you come up with, that runs in some time t of n.

But I don't know what it is, but I do know that it's bounded by a polynomial.

The success probability of that attacker is going to be,

at most, one-half plus negligible.

Different attackers,

will have correspondingly different negligible functions.

But for all of such attackers, they, they will have some corresponding negligible

function that bounds, their advantage over one-half.

Now one thing I wanted to point out, is this restriction in the definition where

we require the attacker to output two messages of equal length.

Now in practice, as I said when I defined encryption in the context of the,

the asymptotic notion of security, I said that we

generally want encryption schemes that can encrypt arbitrary length messages.

However, it's also the case that in general,

encryption does not hide the plaintext length.

So when I encrypt a short message, I get a ciphertext that is

typically shorter than what I get if I encrypt a longer message.

So the, an attacker whose eavesdropping on the communication, and observes not only

the ciphertext but its length, can learn something of the length of the plaintext.

In general, we are willing to allow the attacker to learn that information.

And furthermore you can show that under certain conditions, it's

really impossible to hide all information about the length of the plaintext.

So because we're willing to give that information up,

we need to take that into account in our definition.

So if we allowed the attacker to output,

two arbitrary messages of arbitrary lengths, then the attacker would be

able to succeed, with probability, much better than one-half.

He could succeed really with probability 1.

So the definition takes into account the fact that we don't try to, try to

hide the plaintext length, by forcing the attacker by restricting the attacker,

to looking at, to outputting equal length messages in 0 and then 1.

Now nevertheless, even though we've de, we've made up our definition in this way,

made our definition to correspond to what's done in practice that is, and

leak the length of the plaintext.

It's important to beware,

that leaking the plaintext length can often lead to problems in practice.

Just because we set up a definition in a particular way,

doesn't mean that the problem goes away.

It just means that our definition is limited,

to only those cases that we include in our definition.

There are some very obvious examples of where leaking the plaintext length,

can lead to problems.

A simple example that I often like to give,

is one where somebody's encrypting, yes and no answers to some question.

If an attacker can learn the exact length of the plaintext, and

can in particular learn whether what's encrypted is a two character message, or

a three character message,

then whatever encryption scheme you're using if that information is leaked,

the attacker can figure out exactly the answers you're sending.

There are also some less obvious examples.

Think about the case where your encrypting your a database search.

So you can just think of a simple case where I encrypt my query to the database

and the database evaluates the query and then sends back an encrypted result.

Well by looking at the size of the result that's returned,

the attacker can get a good sense of what I was searching for.

Think about just an encrypted Google search for example.

If I search for a if I make a search query for some popular term, I'll get

back many more, many more hits than I would if I search an unpopular term.

And so that's going to reveal information to the attacker about what it is that I'm

searching for.

Another good example is if I compress my data before I encrypt it.

In that case, if, even if I start with two messages of equal length,

they may compress to things that have different lengths,

depending on the inherent structure in those two messages.

And so if I just blindly compress my data, and

then encrypt the result, and the attacker can view the length of the ciphertext and

from that, get some information about the length of my compressed data.

Then the attacker can glean from that some information about my

original uncompressed data.

So these are just three examples of the kind of things that can come up

in practice.

We've defined in this lecture a notion of computational secrecy, and in fact,

from now on, we're going to assume the computational setting by default.

So we have left the world of perfectly secret cryptography and

perfectly secure cryptography in general, and from now on,

we're going to only consider the computational setting.

Furthermore, almost always we're going to be dealing with

an asymptotic notion of security.

And I'll only mention a concrete notion of security in passing, at some points.

In the next few lectures what we'll do,

is we'll progress toward constructing a scheme,

that achieves our notion of computational security that we've just defined.

And does better than the one-time pad, in terms of the length of the key,

that's required.

Coursera provides universal access to the world’s best education,
partnering with top universities and organizations to offer courses online.