Well, we're almost done with our discussion of symmetric encryption. There are just a couple of odds and ends that I'd like to discuss before we move on to the next topic. So the first thing I'd like to mention is how we derive many keys from one key. And it, actually, this comes up all the time in practice, so I'd like to make sure you know how to do this correctly. So what's the setting that we're looking at? Well, imagine we have a certain source key that's generated by one of, a number of methods. Imagine the source key is generated by a hardware random number generator or perhaps is generated by a key exchange protocol which we're going to discuss later. But anyhow, there are a number of ways in which a source key might be generated between Alice and Bob, such that the attacker doesn't know what the source key is. But now, as we said, in many cases, we actually need many keys to secure a session, not just one single source key. For example, if you remember, in TLS there were unidirectional keys and we needed keys in each direction. And in fact, in each direction, we needed multiple keys. We needed a MAC key, we needed an encryption key. We need an IV, and so on. Similarly nonce based encryption, you remember, there were multiple keys that were being used, and so on. And so, the question is, how do we use the one source key that we just derived, either from a hardware process or by key exchange, and generate a bunch of keys from it that we could then use to secure our session. The way that's done, is using a mechanism called a key derivation function, KDF. And I want to talk a little bit about how KDF's are constructed. So first of all, suppose we have a secure PRF, that happens to have key space K. And now, suppose that it so happens that our source key SK is uniform in the key K. In this case, the source key is, in fact, a uniform random key for the secure PRF F. And we can use it directly to generate keys, all the keys that we need to secure the session. So in this case, the KDF is really simple. The key derivation function would just work as follows. It would take as input the source key. It would take an input, a parameter context, which I'm gonna describe in just a minute. And then it's gonna take a length input as input as well. And then what it will do is it will basically evaluate the PRF on zero. Then it will evaluate the PRF on one. Then it will evaluate the PRF on two, up until L. And I will talk about what this context is in just a second. And then, basically, you would use as many bits of the output as you would need to generate all the keys for the session. So, if you need unidirectional keys you would generate, you know, one key in each direction where each key might include an encryption key and a MAC key. And so, you would basically generate as many bits as you need and then finally cut off the output at the time when you've generated enough keys to secure your session. Okay so this is a fairly straight forward mechanism it's basically using the secure PRF as a pseudo random generator. And the only question is what is its context string. Well, I'll tell you that the context string is basically a unique string that identifies the application. So in fact you might have multiple applications on the same system that's trying to establish multiple secure keys. Maybe you have SSH running as one process, you have a web server running as another process, IPsec as a third process and all three need to have secret keys generated. And this context variable basically tries to separate the three of them. So, let me ask you, more precisely, what do you think the purpose of this context variable is? So I guess I've given it away and this context variable is supposed to basically separate applications, so that even if, for example, the three services that we just talked about, SSH, web server, and IPsec, if they all happen to obtain the same source key from the hardware random number generator then the context since it's different for the three apps will make sure that they still get three independent strings that they can then use to secure the sessions. I just want you to remember that, even though this is actually fairly straightforward, and we discussed this before, the context string is actually important, and it does need to be specific to the application, so that each application gets its own session keys, even if multiple applications happen to sample the same SK. The next question is, what do we do if the source key actually isn't uniform? Well, now we got a problem. If the source key is not a uniform key for the pseudo random function then we can no longer assume that the output of the pseudo random function is indistinguishable from random. In fact, if we just use the KDF that we just described then the output might not look random to the adversary and then he might be able to anticipate some of the session keys that we'll be using and thereby break the session. So then we have a problem. Now why would this source key not be uniform? Well there are many reasons why this happened. For example if you use a key exchange protocol, it so happens typically that key exchange protocols will generate a high entropy key. But the high entropy key is gonna be distributed in some subspace of the key space. So it's not going to be a uniform string. It will be uniform in some subset of a larger set, And we'll see examples of that as soon as we talk about key exchange protocols. And so KDFs have to kind of accommodate for the fact that key exchange protocols actually don't generate uniform bit strings. The other problem is, that, in fact, the hardware random number generator you're using might actually produce biased outputs. We don't wanna rely on the non bias of the hardware random number generator. And so all we want to assume is that it generates a high entropy string, but one that might be biased. In which case, we have to somehow clean this bias. And so this introduces this, this paradigm for building KDFs. This is called the extract-then-expand paradigm, where the first step of the KDF is to extract a pseudo random key from the actual source key. So in a picture you can think about it like this. In some sense these are the different possible values of the source key. This is the horizontal line and the vertical axis is basically the probability of each one of these values, and you can see that this is a kind of a bumpy function which would say that the source key is not uniformly distributed in the key space. What we do in this case is we use what's called an extractor. So an extractor is something that takes a bumpy distribution and makes it into a uniform distribution over the key space. In our case we're actually just gonna be using what are called computational extractors, namely extractors that don't necessarily produce uniform distribution at the end but they generated distribution that's indistinguishable from uniform. Now extractors typically take as input something called a salt, and a salt just like in a salad, it kind of adds flavor to things, what it does is basically kind of jumbles things around, so that no matter what the input distribution is, the output distribution is still going to be indistinguishable from random. So a salt basically, what is it? It's a non-secret string, so it's publicly known. It doesn't matter if the adversary knows what the salt is, and it's fixed forever. The only point is that when you chose it, you chose one at random. And then the hope is that the funny distribution that you're trying to extract from kinda doesn't inherently depends on the salt that you chose and hence as a result using your salt, you will actually get a distribution that looks indistinguishable from random. So essentially the salt, you know, you can just bang it the keyboard a couple of times when you generate it but it just needs to be something that's random initially but then it's fixed forever, and it's fine if the adversary knows what it is and nevertheless the extractor is able to extract the entropy and output a uniformly random string K. In some sense the salt is only there to defend against adversarially bad distributions that might mess up our extractor. Okay, so now that we have extracted a pseudo random key. Now, we might as well just use it in a KDF that we just saw using a secure pseudo random function to expand the key into as many bits as we need to actually secure the session. Okay, so there are these two steps. The first one is we extract a pseudo-random key, and then once we have a pseudo-random key we already know how to extend it into as many keys as we need using a pseudo-random function. So the standardized way of doing this is called HKDF. This is a KDF, a key derivation function that's built from HMAC. And here HMAC is used both as the PRF for expanding and an extractor for extracting the initial pseudo-random key. So let me explain how this works. So in the extract step, we're gonna use our salt which you remember is a public value just happened to be generated at random at the beginning of time. And we use this salt as the HMAC key. And then the source key we're gonna use as the HMAC data. So we're kind of using a public value as a key. And nevertheless, one can argue that HMAC has extraction properties, such that, when we apply HMAC, the resulting key is going to look indistinguishable from random, assuming that the source key actually has enough entropy to it. And now that we have the pseudo random key we're simply going to use HMAC as a PRF to generate a session key of you know as many bits as we need for the session keys. Okay. So that basically concludes our discussion of HKDF. And I just want you to remember that, once you obtain a source key, either from hardware or from a key exchange protocol, the way you convert it into session keys is not by using that sample directly. You would never use a source key directly as a session key in a protocol. What you would do is you will run the source key through a KDF. And the KDF would give you all the keys and output that you need, for, the randomness, for the random keys to be used in your protocol. And a typical KDF to use is HKDF, which is actually getting quite a bit of traction out there. Okay. The last topic I wanna talk about in this segment is, how do you extract keys from passwords. These are called password based KDFs or PBKDFs. The problem here is that passwords have relatively low entropy. In fact, we're gonna talk about passwords later on in the course when we talk about user authentication. And so, I'm not gonna say too much here. I'll just say passwords generally have very little entropy estimated on the order of twenty bits of entropy, say. And as a result, there is simply not enough entropy to generate session keys out of a password. And yet we still need to do it very frequently. We still need to derive encryption keys and MACs and so on out of passwords, so the question is how to do it. The first thing is, you know, for this kind of purpose, don't use HKDF. That's not what it's designed for. What will happen is that the derived keys will actually be vulnerable to something called a dictionary attack, which we're gonna talk about much later in the course when we talk about user authentication. So, the way PBKDFs defend against this low entropy problem that results in a dictionary attack is by two means. First of all, as before they use a salt, a public, random value that's fixed forever. But in addition, they also use what's called a slow hash function. And let me describe kind of the standard approach to deriving keys from passwords. This is called PKCS#5, and in particular, the version I'll describe is what's called PBKDF1. And I should say that this mechanism is implemented in most crypto libraries so you shouldn't have to implement this yourself. All you would do, you know, you would call a function, you know, derived key from password. You would give the password in as input, and you would get a key as output. But you should be aware of course that this key is not going to have high entropy so in fact it will be guessable. What these PBKDFs try to do is make the guessing problem as hard as possible. Okay. So the way they work, first of all, is, as we said, they basically hash, the concatenation of the password and the salt. And then the hash itself is designed to be a very slow hash function. And the way we build a slow hash function is by taking one particular hash function, say, SHA-256, and we iterate it many, many times, C times. So you can imagine 1000 times, perhaps even a million times. And what do I mean by iterating it? So, well, we take the password and the salt. And we put them inside of one input to the hash function. And then we apply the hash function, oops, let me write it like this. And then we apply the hash function and we get an output, and then we apply the hash function again and we get another output. And we do this again and again and again maybe a thousand times or a million times depending on how fast your processors are and then finally we get the final output that we actually output as, as the key output of this key derivation function. Now what is the point here? Iterating a function 10,000 times or even a million times is going to take very little time on a modern CPU, and as a result, it doesn't really affect the user's experience. The user types in his password, it gets hashed a million times and then it gets output. And maybe that could even take, you know a tenth of a second and the user wouldn't even notice it. However the attacker, all he can do is he can try all the passwords in the dictionary, because we know people tend to pick passwords in dictionaries, and so he could just try them one by one, remember the SALT is public, so he knows what the SALT is. And so he can just try this hash one by one. However because the hash function is slow, each attempt is gonna take him a tenth of second. So if he needs to run through a dictionary, you know, with, with a 200 billion passwords in it, because the hash function is slow, this is gonna take quite awhile. And by doing that, we slow down the dictionary attack, and we make it harder for the attacker to get our session keys. Not impossible, just harder. That's all this is trying to do. Okay, so this is basically what I wanted to say about password based KDFs. As I said, this is not something you would build yourself. All crypto libraries have an implementation of a PKCS#5 mechanism. And you would just call the appropriate function to convert a password into a key, and then use the resulting key. Okay, in the next segment, we're gonna see how to use symmetric encryption in a way that allows us to search on the cipher texts.