The primary topics in this part of the specialization are: greedy algorithms (scheduling, minimum spanning trees, clustering, Huffman codes) and dynamic programming (knapsack, sequence alignment, optimal search trees).

Loading...

From the course by Stanford University

Greedy Algorithms, Minimum Spanning Trees, and Dynamic Programming

250 ratings

Stanford University

250 ratings

Course 3 of 4 in the Specialization Algorithms

The primary topics in this part of the specialization are: greedy algorithms (scheduling, minimum spanning trees, clustering, Huffman codes) and dynamic programming (knapsack, sequence alignment, optimal search trees).

From the lesson

Week 2

Kruskal's MST algorithm and applications to clustering; advanced union-find (optional).

- Tim RoughgardenProfessor

Computer Science

In this optional video, I'm going to give you a glimpse of the research frontier on the problem of computing Minimum Cost, Spanning Trees. We've now seen two excellent algorithms for this problem. Prim's algorithm and Kruskal's algorithm. And we had suitable data structures both running near linear time. Big O of M Log N, where M is the number of edges, N is the number of vertices. Now going ahead we should be pretty happy with those algorithms. We're only a log factor slower than what it takes to read the input. But again, the good algorithm designer should never be content, never be complacent. Should always ask can we do better? Maybe we do even better than M Log N. Well, believe it or not, you can do better than these implementations. At least in principle, at least, in theory. You have to work pretty hard and I'm definitely not going to discuss the algorithms or the analysis that prove this fact. But let me just mention a couple references that give MST algorithms with asymptotic run times even better than M Log N. So quite shockingly, if you're happy with randomized algorithms, as we were with say, the quit sort sorting algorithm. Then the minimum cost spanning tree problem can be solved in linear time. That's obviously an optimal algorithm. You have to look at the whole graph to compute the minimum cost spanning tree. But you can solve the problem in time a constant factor, larger than what it takes to read the input. This algorithm is much, much newer than Prim and Kruskal's algorithm, as you might expect. It does date back to the 20th century, but definitely the latter stages of last century. So the names of a couple of the inventors of this algorithm will already be familiar to those of you that paid close attention in part one. so the Carver here is the same as the author of the randomized and cut algorithm you studied in part one. Tarjan's name is coming up a couple times in these courses, but for example when we discuss strongly connected component algorithms. So, what if you're not happy with the bound just on the bound just on the expected running time, with the expectation over the coin flips of the algorithm? What if you wanted a deterministic algorithm? So, if we think of this randomized algorithm as being analogous to quick sort in the sorting problem, what would be the analog of merge sort? Well, it turns out we do not know whether or not there is a linear time deterministic algorithm for minimum spanning trees, that's an open question. You might think, here at this, late date, 2012, 2013 we'd know everything there is to know about minimum cost spanning trees. We do not know, if there exists a linear time deterministic algorithm. We do know that there's a deterministic algorithm, with a running time, absurdly close to linear. Specifically, there's an algorithm with running time M, the number of edges times alpha of N. So what, you ask, is alpha of N? What is this function? Well, it's something called the inverse Ackermann function. So, defining the Ackermann function and its inverse actually takes a little bit of work. So I'm not going to do it right now. For those of you that are curious, you should check out the optional material on the union find data structure. Which is yet another context where sort of unbelievably, this inverse Ackermann function arises. But for the present context, what you should is Is that this is a crazy slow growing function. It's not constant. For any number C, I can exhibit a large enough value of N, so that this function values is lease C. So that alpha of N is a lease C. So, it's not bounded by a constant, but it's mind boggling how slow growing this function is. Let me actually just give you an incredibly slow growing function which actually goes much faster than the inverse Ackermann, just to prove the point. Specifically, the inverse Ackermann function grows quite a bit slower than log star N. Log star N I can define for you easily. We recall our definition of log base two, way back when we first demystified logarithms at the beginning of part one. If you type N into your calculator, and you keep dividing by two, you count the number of times you divide by two until you drop below one, that's log based two of N. For log star, instead of dividing by two, you're going to hit the log button on your calculator, ad you count how many times you hit log, until, the result drops below one. That number, the number of times you hit log, is defined to be Log star of N.

The log star function as the inverse of the function which is a tower of 2s. The function which takes as input you know, an integer say T and raises two to itself T times. So, to appreciate just how slow growing the log star function is, let alone the inverse Ackerman function. What you should do is type in the biggest number you can into your nearest calculator or computer. Just keep typing in nines as long as you can. Then go ahead and evaluate log star. Keep applying log function till it drops below one. Probably, the result's going to be something in the neighborhood of five. So log star of the biggest number in your calculator or computer is going to be five. That's how, that's how slow growing this function is. So, the upshot is that the algorithms community has the time complexity of the MST problem almost nailed, but not quite, in the deterministic case. The right answer is somewhere between M and M times inverse Ackermann function, and we don't know where. But, you know what? It gets weirder.

So check out the following result of Pettie and Ramachandran. They in a sense solve, the deterministic minimum cost spanning tree problem by exhibiting, an algorithm, who's time complexity is optimal. So they gave an algorithm and they gave a proof, just in the spirit of what we did for sorting. They gave a proof that no algorithm could have running time asymptotically better than theirs. But what they didn't do, is explicitly evaluate what the time complexity of their algorithm is. So they managed to prove optimality, without actually evaluating exactly what's its running time. So it's certainly somewhere between linear and linear times inverse Ackermannn. We know that's the true time complexity of the problem. We know an algorithm that achieves an optimal complexity. But to this day we do not know what that optimal time complexity actually is, as a function of the graph size. So those are some of the most advanced things that we do know, about the minimum cost spanning tree problem. Let me just mention a couple of things that we still, to this day, do not know. So let me start with the randomized algorithms. Now, maybe you're reaction is, there's no open questions in the randomized algorithms world, because we know you need linear time to solve the problem. And I've told you that there is a randomized algorithm with expected running time that is linear. So what more could you want? Well, what I want is I want an algorithm that's not just linear time but also simple enough that I can teach it to other people. So ideally, it would be another graduate course like this. But I was actually very happy to have a randomized algorithm, linear time, simple enough to cover in a graduate course. The current linear time algorithms do not have that property. They're more complicated than I can even cover in a graduate course. To accomplish this task, it turns out to be enough to solve a seemingly simpler task, namely, to get a simple randomized linear time algorithm for the MST verification problem. So let me tell you what that means. So in a MST problem you're supposed to optimize amongst all exponentially minimum spanning trees. You got to find me the one with the smallest sum of edge cost. In MST verification, the job seems simpler. I'm actually going to hand you, a candidate MST or I'm going to hand you a spanning tree. It may or may not be the best one and you just need to check, is it the optimal one or not. Furthermore, in the event that it's not optimal you should tell me edges that are not in the spanning tree. Edges that are too expensive that I should throw out.

The reason it's enough to design a linear time algorithm for this seemingly simpler problem, is that the content of the Karger-Klein-Tarjan paper, is a reduction. A randomized reduction from optimizing over spanning trees, to the seemingly simpler problem of MST verification. Moreover, all of the novel content in the Karger-Klein-Tarjan algorithm. It's linear time, it's randomized, and it is really simple. I teach this stuff in that paper in my graduate classes. But it needs this MST verification subroutine as a black box. And the only known implementations that are linear time for MST verification are quite complicated. So, find a simple way to do MST verification that runs in linear time. And you're good to go with your simple optimal MST algorithm. For deterministic algorithms, the holy grail is obvious. We'd love to have a deterministic algorithm for MST that runs in linear time. Or at the very least, we just need to figure out, you know, what is the best possible time complexity of any deterministic MST algorithm. So one of the takeaways of this discussion is that, you know? For all the amazing advances by computer scientists on the design and analysis of algorithms over the past 50 years or so. There are still totally fundamental things that we do not understand. So there's still great ideas to come in the future. So if you've been intrigued by some of the things that I've said in this video. And you want to learn more about these advanced minimum cost spanning tree algorithms. For further reading, I'd recommend a survey by Jason Eisner, called State of the Art Minimum Spanning Tree Algorithms. It's about fifteen years old now. But it's still an amazing resource for learning about this advanced material.

Coursera provides universal access to the world’s best education, partnering with top universities and organizations to offer courses online.