So in this video we'll finally discuss Huffman's algorithm, a greedy algorithm that constructs the prefix-free binary code minimizing the average encoding length. Let me just briefly remind you of the formal statement of the computational problem that I left you with last time. The input is just a frequency p_i for each symbol i, coming from some alphabet Sigma. The responsibility of the algorithm is to get optimal compression, that is, to compute an optimal code. The code has to be binary; we can use only zeros and ones. It has to be prefix-free, meaning that for the encodings of any two characters, neither one can be a prefix of the other; that's to facilitate unambiguous decoding. And finally, the average number of bits needed to encode a character, where the average is with respect to the input frequencies, should be as small as possible. Now remember, these kinds of codes correspond to binary trees. The prefix-free condition just says that the symbols of Sigma should be in one-to-one correspondence with the leaves of the tree. And finally, remember that the encoding lengths of the various symbols just correspond to the depths of the corresponding leaves. So we can formally express the average encoding length of a legal tree capital T, which I denote L(T): we just take the average over the symbols of the alphabet, weighted by the provided frequencies, of the number of bits used to encode each symbol, equivalently, the depth of the corresponding leaf in the tree T. That is, L(T) = sum over i in Sigma of p_i times (depth of the leaf labeled i in T). We want the tree that makes this number as small as possible. So this task is a little different from any we've seen so far in this course, right? All we're given as input is an array of numbers, and yet we have to produce an actual full-blown tree. So how can we take a bunch of numbers and produce, in a sensible, principled way, some kind of tree whose leaves correspond to symbols of the alphabet? Let's spend a slide thinking, at a high level, about where this tree would come from, how you would build it up given this unstructured input. There's certainly no unique answer to this question, and one idea which is very natural, but which turns out to be suboptimal, is to build the tree using a top-down approach, which you can also think of as an instantiation of the divide-and-conquer algorithm design paradigm. The divide-and-conquer paradigm, you'll recall, involves breaking the given problem into (usually multiple) smaller subproblems, which are solved recursively, and the solutions are then combined into one for the original problem. Because trees, the desired output here, have a recursive substructure, it's natural to think about applying this paradigm to this problem. Specifically, you'd love to just lean on one recursive call to construct the left subtree for you, and another recursive call to construct the right subtree, and then fuse the results together under a common root vertex. Now, it's not clear how to do this partitioning of the symbols into two groups, but one idea, to get the most bang for your buck, the most information out of the first bit of your encoding, is to split the symbols into two groups, each with as close to 50% of the overall frequency as possible. This top-down approach is sometimes called Shannon-Fano coding. Fano was Huffman's graduate adviser; Shannon is the Claude Shannon, inventor of information theory.
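To make the objective function concrete, here is a minimal Python sketch, not from the lecture itself, with names of my own choosing, that computes L(T) for a prefix-free code. The tree is represented implicitly by its codewords, since the depth of a leaf is just the length of its codeword:

```python
def average_encoding_length(code, freq):
    """Average bits per symbol, weighted by frequency.

    code: dict mapping symbol -> codeword string, e.g. {'A': '0', 'B': '10'}
    freq: dict mapping symbol -> frequency (normalized below, so raw
          counts are fine as input)
    """
    total = sum(freq.values())
    return sum(freq[s] * len(code[s]) for s in code) / total

# The lopsided running example from the previous video: A=0, B=10, C=110, D=111
code = {'A': '0', 'B': '10', 'C': '110', 'D': '111'}
freq = {'A': 60, 'B': 25, 'C': 10, 'D': 5}
print(average_encoding_length(code, freq))  # 1.55 bits per symbol
```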
But Huffman, in a term paper believe it or not, realized that top-down is not the way to go. The way to go is to build the tree from the bottom up. Not only are we going to get optimal codes, but we're going to get a blazingly fast greedy algorithm that constructs them. So what do I mean by bottom-up? I mean we're going to start with just a bunch of nodes, each one labeled with one symbol of the alphabet. So, in effect, we're starting with the leaves of our tree. And then we're going to do successive mergers: at each step we take two of the subtrees formed so far and link them together as the two children of a new internal node. I think you'll see what I mean in a picture. So imagine we want to build a tree whose leaves are supposed to be just A, B, C, and D. One merger would be: let's just take the leaves C and D and link them as siblings under a common ancestor. In the second step, let's merge the leaf B with the subtree we got from the previous merger, the subtree comprising the nodes C, D, and their common ancestor. Now, of course, we have no choice but to merge the only two subtrees we have left, and that gives us a single tree, which is in fact exactly the same lopsided tree we were using in the previous video as a running example. So let me explain what I hope is clear and what is maybe unclear at this juncture. I hope what's intuitively clear is that the bottom-up approach is a systematic way to build trees that have a prescribed set of leaves. And what do we want? We want trees whose leaves are labeled with the symbols of the alphabet Sigma. So if we have an alphabet with n symbols, we're going to start with just the n leaves. What does a merger do? Two things. First, it introduces a new internal node, an unlabeled node. Second, it takes two of our old subtrees and fuses them into one: we make one the left child of this new internal node and the other the right child. So each merger drops the number of subtrees we're working with by one. So if we start with the n leaves and we do n minus 1 successive mergers, what happens? Well, on the one hand, we introduce n minus 1 new unlabeled internal nodes, and on the other hand, we wind up with a single tree, and the leaves of this tree are in one-to-one correspondence with the alphabet symbols, as desired. Now, what I don't expect you to have an intuition for is what we should be merging with what, and why. Forget about how we get an optimal tree at the end of the day; even if we just wanted to design a greedy algorithm, to make a myopic decision that looks good right now, how would we do that? What's the greedy criterion that's going to guide us to merge a particular pair of trees together? We can reframe this quandary as the same kind of question we asked for minimum-cost spanning trees, and really for greedy algorithms more generally. Whenever you're making irrevocable decisions, the fear that strikes your heart is that some decision will come back to haunt you later on, that you'll only realize at the end of the algorithm that you made some horrible mistake early on. So just as for MSTs we asked when we can be sure that irrevocably including an edge in the tree we're building is safe, not a mistake, here we want to ask: we have to do successive mergers, so how do we know that a merger is safe?
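Here is one way the bottom-up process could be rendered in code, a sketch under my own representation (the Node class and merge function are illustrative, not from the lecture). It builds exactly the lopsided A, B, C, D example by three successive mergers:

```python
class Node:
    """A subtree in the forest; symbol is set only on leaves."""
    def __init__(self, symbol=None, left=None, right=None):
        self.symbol = symbol
        self.left = left
        self.right = right

def merge(t1, t2):
    """One merger: fuse two subtrees under a new unlabeled internal node.
    This drops the number of subtrees in the forest by one."""
    return Node(left=t1, right=t2)

# Start with n = 4 leaves; n - 1 = 3 mergers yield a single tree.
leaves = [Node(s) for s in "ABCD"]
cd = merge(leaves[2], leaves[3])   # merge leaves C and D
bcd = merge(leaves[1], cd)         # merge leaf B with the C-D subtree
root = merge(leaves[0], bcd)       # final merger: the lopsided example tree
```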
That is, that it doesn't prevent us from eventually computing an optimal solution? Well, here's a way of looking at things that will at least give us an intuitive conjecture for this question; we'll save the proof for the next video. What are the ramifications when we merge two subtrees, each containing a collection of symbols? Well, when we merge two subtrees, we introduce a new internal node that unites the two subtrees under it. And if you think about it, at the end of the day, in the final tree, this is yet another internal node that's going to be on the root-to-leaf path for all of the leaves in these two subtrees. That is, if you're a symbol and you're watching your subtree get merged with somebody else, you're bummed out. You're like, man, that's another bit in my encoding, yet one more node I have to pass through to get back to the root. I think this will become even clearer if we look at an example. Naturally, we'll use our four-symbol alphabet A, B, C, D. Initially, before we've merged anything, each of these is just its own leaf: A, B, C, D. There are no internal nodes above them, so in this sense everybody's encoding length at the beginning is zero bits. Now imagine we merge C and D: we introduce a new internal node, which is the common ancestor of these two leaves. As a result, C and D are bummed out. They say, well, that's one bit we're going to have to incur in our encoding length; there's one new internal node we're always going to have to pass through en route back to the root of the eventual tree. Now suppose next we merge B with the subtree containing both C and D. Well, everybody but A is bummed out about this merger, because we introduce another internal node, and it contributes one bit to the encoding of each of B, C, and D: it contributes an extra one to the encodings of C and D, and it contributes a zero to the encoding of B. So all of their encoding lengths get incremented, go up by one, compared to how things were before. Now in the final iteration, we have to merge everything together; we have no choice, there are only two subtrees left. Here, everybody's encoding length gets bumped up by one. A finally picks up a bit, a zero, to encode it, and B, C, and D each pick up an additional bit, a one, which is prepended to their encodings thus far. In general, what you'll notice is that the final encoding length of a symbol is precisely the number of mergers that its subtree has to endure. Every time your subtree gets merged with somebody else, you pick up another bit in your encoding, because there's one more internal node that you're going to have to traverse en route to the root of the final tree. In this example, the symbols C and D got merged with somebody else in each of the three iterations, so they're the two symbols that wind up with an encoding length of three. The symbol B didn't get merged with anybody in the first iteration, only in the last two; that's why it has an encoding length of two. So this is really helpful: it lets us relate the operations our algorithm actually performs, namely mergers, back to the objective function we care about, namely the average encoding length. Mergers increase the encoding lengths of the participating symbols by one. This allows us to segue into the design of a greedy heuristic for how to do mergers. Let's just think about the very first iteration. We have our n original symbols, and we have to pick two to merge.
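This bookkeeping view, encoding length equals number of mergers endured, can be checked mechanically. A tiny illustrative sketch (my own, not lecture code) that replays the three mergers of the example and counts the bit each participating symbol picks up:

```python
from collections import defaultdict

bits = defaultdict(int)  # symbol -> encoding length so far (all start at 0)

def record_merger(group1, group2):
    """Every symbol in either merged subtree picks up one more bit."""
    for s in group1 + group2:
        bits[s] += 1

record_merger(['C'], ['D'])            # C, D now at 1 bit each
record_merger(['B'], ['C', 'D'])       # B at 1; C, D at 2
record_merger(['A'], ['B', 'C', 'D'])  # A at 1; B at 2; C, D at 3
print(dict(bits))  # {'C': 3, 'D': 3, 'B': 2, 'A': 1}
```

These counts match the codeword lengths of the lopsided tree exactly: A uses one bit, B two, C and D three.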
And remember, the consequence of a merger is an increase of one bit in the encoding lengths of whichever two symbols we pick. Now, what we want to do is minimize the average encoding length with respect to the frequencies we were given. So for which pair of symbols are we the least unhappy to suffer an increment to their encoding length? The symbols that are the least frequent; incrementing them increases the average encoding length by the least. So that's the greedy merging heuristic: something's gotta increase, so pick the least frequent symbols to be the ones that get incremented. That seems like a really good idea for how to proceed in the first iteration. The next question is, how are we going to recurse? I'll let you think about that in the following quiz. So let's agree that the first iteration of our greedy heuristic is going to merge together the two symbols that possess the lowest frequencies; call those two symbols little a and little b. The question is then how we make further progress. What do we do next? Well, one thing that would be really nice is if we could somehow recurse on a smaller subproblem. But which smaller subproblem? Well, what does it mean that we've merged together the symbols a and b? If you think about it, by virtue of merging a and b, we're forcing the algorithm to output a tree in which a and b are siblings, in which they have exactly the same parent. And what does it mean for the encoding we compute that a and b are going to be siblings with the same parent? It means that their encodings are going to be identical, save for the lowest-order bit. So a will get encoded with a bunch of bits followed by a zero, and b will be encoded by exactly the same prefix of bits followed by a one. Since they're going to have almost the same encoding, for our recursion let's just treat them as the same symbol. Let's introduce a new meta-symbol, call it ab, which represents the conjunction of a and b; it's meant to represent all of the occurrences of either one of them. But remember, the input to the computational problem we're studying is not just an alphabet but also frequencies for each of the symbols of that alphabet. So my question for you is: when we introduce this new meta-symbol ab, what should be the frequency that we define for it? All right, so hopefully your intuition suggested answer C: for this recursion to make sense, for it to conform to our semantics of what this merging means, we should define the frequency of the new meta-symbol to be the sum of the frequencies of the two symbols it's replacing. That's because the meta-symbol ab is meant to represent all occurrences of both a and b, so its frequency should be the sum of theirs. So I'm now ready to formally describe Huffman's greedy algorithm. Let me first describe it on an example, and then I think the general code will be self-evident. Let's just use our usual example: we have letters A, B, C, D, with frequencies 60, 25, 10, and 5. We're going to use the bottom-up approach, so we begin with each symbol as its own node, A, B, C, D, annotated in red with the frequencies 60, 25, 10, 5. The greedy heuristic says that initially we should merge together the two nodes that have the smallest frequencies, so that would be C and D, with their frequencies of 10 and 5.
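One natural way to realize this greedy rule efficiently, a sketch of mine rather than lecture code, is to keep the symbols in a min-heap keyed on frequency (Python's standard-library heapq), so that extracting the two least frequent symbols and pushing the meta-symbol are all logarithmic-time operations:

```python
import heapq

freq = {'A': 60, 'B': 25, 'C': 10, 'D': 5}
heap = [(f, s) for s, f in freq.items()]
heapq.heapify(heap)

# First iteration: extract the two least frequent symbols...
f1, s1 = heapq.heappop(heap)  # (5, 'D'): the least frequent
f2, s2 = heapq.heappop(heap)  # (10, 'C'): the second least frequent

# ...and replace them with a meta-symbol whose frequency is the sum,
# matching the semantics of the quiz answer above.
heapq.heappush(heap, (f1 + f2, s2 + s1))  # (15, 'CD')
```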
Now, based on the idea of the last slide, when we merge C and D we replace them with a meta-symbol CD whose frequency is the sum of the frequencies of C and D, namely 15. Then we just run another iteration of the greedy algorithm, meaning we merge together the two nodes, the two symbols, that have the smallest frequencies. That's now B, with 25, and CD, with 15. So now we're down to just two symbols: the original symbol A, which still has frequency 60, and the, in some sense, meta-meta-symbol BCD, whose cumulative frequency is 40. When we're down to two nodes, that's the point in Huffman's algorithm where we hit the base case and the recursion begins to unwind. If you just have two symbols to encode, there's pretty much only one sensible way to do it: one gets a zero, and the other gets a one. So at this point, the final recursive call just returns a tree with two leaves, corresponding to the symbols A and BCD. And now, as the recursion unwinds, we in effect undo the mergers. For each merger we perform a corresponding split: we take the leaf labeled with the meta-node and replace it with an internal node whose two children correspond to the symbols that were merged to make that meta-node. So, for example, when we want to undo the merger of B and CD, we take the node labeled BCD and split it in two, the left child being B, the right child being CD. The original, outermost call gets this tree with three leaves from its recursive call, and it has to undo its own merger; it merged C and D. So it takes the leaf labeled with the symbol CD and splits it in two: it replaces it with a new unlabeled internal node, left child C, right child D. So how does the algorithm work in general? Just as you'd expect, given the discussion of this concrete example. I'm going to take as the base case when the alphabet has just two symbols; in that case, the only thing to do is encode one with a zero and the other with a one. Otherwise, we take the two symbols of the alphabet that have the smallest frequencies. You can break ties arbitrarily; it's not going to matter, it's going to be optimal either way. We then replace these two low-frequency symbols a and b with the meta-symbol ab, intended to represent both of them. As we just discussed, with those semantics we should define the frequency of the meta-symbol ab as the sum of the frequencies of the symbols it comprises. We now have a well-defined, smaller subproblem, with one fewer symbol than the one we were given, so we can recursively compute a solution for it. What the recursive call returns is a tree whose leaves are in one-to-one correspondence with Sigma prime. That is, it does not have a leaf labeled a, and it does not have a leaf labeled b; it does have a leaf labeled ab. So we want to extend this tree T prime to one whose leaves correspond to all of Sigma, and the obvious way to do that is to split the leaf labeled ab, replacing it with a new unlabeled internal node with two children labeled a and b. The resulting tree capital T, with leaves in correspondence with the original alphabet Sigma, is then the final output of Huffman's algorithm. As always with a greedy algorithm, we may have intuition, and it may seem like a good idea, but we can't be sure without a rigorous argument. That is the subject of the next video.
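To tie the pieces together, here is a compact Python sketch of the recursive algorithm exactly as described above; the tuple representation and names are my own assumptions, not the lecture's. A leaf is a bare symbol, an internal node is a (left, right) pair, and the meta-symbol is itself that pair, so the "split" as the recursion unwinds happens for free:

```python
def huffman(freq):
    """freq: dict mapping symbol -> frequency. Returns the code tree."""
    if len(freq) == 2:
        a, b = freq  # base case: one symbol encoded 0, the other 1
        return (a, b)
    # Take the two symbols with the smallest frequencies (ties arbitrary).
    a, b = sorted(freq, key=freq.get)[:2]  # a = least frequent
    rest = {s: f for s, f in freq.items() if s not in (a, b)}
    meta = (b, a)  # the meta-symbol doubles as the merged subtree
    rest[meta] = freq[a] + freq[b]  # frequency of ab = f(a) + f(b)
    return huffman(rest)  # the returned tree already contains the split

def codewords(tree, prefix=""):
    """Read off codewords: left edge = 0, right edge = 1."""
    if not isinstance(tree, tuple):
        return {tree: prefix}
    left, right = tree
    return {**codewords(left, prefix + "0"), **codewords(right, prefix + "1")}

print(codewords(huffman({'A': 60, 'B': 25, 'C': 10, 'D': 5})))
# {'A': '0', 'B': '10', 'C': '110', 'D': '111'}
```

On the running example this reproduces the lopsided tree: A gets one bit, B two, C and D three. Note that this direct recursion re-sorts the symbols in every call; a speed-conscious implementation would instead maintain the frequencies in a min-heap, as in the earlier heapq snippet.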