The primary topics in this part of the specialization are: greedy algorithms (scheduling, minimum spanning trees, clustering, Huffman codes) and dynamic programming (knapsack, sequence alignment, optimal search trees).

Loading...

Del curso dictado por Universidad de Stanford

Greedy Algorithms, Minimum Spanning Trees, and Dynamic Programming

432 ratings

The primary topics in this part of the specialization are: greedy algorithms (scheduling, minimum spanning trees, clustering, Huffman codes) and dynamic programming (knapsack, sequence alignment, optimal search trees).

De la lección

Week 2

Kruskal's MST algorithm and applications to clustering; advanced union-find (optional).

- Tim RoughgardenProfessor

Computer Science

In this optional video, I'm going to give you a glimpse of the research frontier on

the problem of computing Minimum Cost, Spanning Trees.

We've now seen two excellent algorithms for this problem.

Prim's algorithm and Kruskal's algorithm. And we had suitable data structures both

running near linear time. Big O of M Log N, where M is the number

of edges, N is the number of vertices.

Now going ahead we should be pretty happy with those algorithms.

We're only a log factor slower than what it takes to read the input.

But again, the good algorithm designer should never be content,

never be complacent. Should always ask can we do better?

Maybe we do even better than M Log N. Well, believe it or not, you can do

better than these implementations. At least in principle,

at least, in theory. You have to work pretty hard and I'm

definitely not going to discuss the algorithms or the analysis that prove

this fact. But let me just mention a couple

references that give MST algorithms with asymptotic run times even better than M

Log N. So quite shockingly, if you're happy with

randomized algorithms, as we were with say, the quit sort sorting algorithm.

Then the minimum cost spanning tree problem can be solved in linear time.

That's obviously an optimal algorithm. You have to look at the whole graph to

compute the minimum cost spanning tree. But you can solve the problem in time a

constant factor, larger than what it takes to read the input.

This algorithm is much, much newer than Prim and Kruskal's algorithm, as you

might expect. It does date back to the 20th century,

but definitely the latter stages of last century.

So the names of a couple of the inventors of this algorithm will already be

familiar to those of you that paid close attention in part one.

so the Carver here is the same as the author of the randomized and cut

algorithm you studied in part one. Tarjan's name is coming up a couple times

in these courses, but for example when we discuss strongly connected component

algorithms. So, what if you're not happy with the

bound just on the bound just on the expected running time, with the

expectation over the coin flips of the algorithm?

What if you wanted a deterministic algorithm?

So, if we think of this randomized algorithm as being analogous to quick

sort in the sorting problem, what would be the analog of merge sort?

Well, it turns out we do not know whether or not there is a linear time

deterministic algorithm for minimum spanning trees,

that's an open question. You might think, here at this, late date,

2012, 2013 we'd know everything there is to know about minimum cost spanning

trees. We do not know, if there exists a linear

time deterministic algorithm. We do know that there's a deterministic

algorithm, with a running time, absurdly close to linear.

Specifically, there's an algorithm with running time M,

the number of edges times alpha of N. So what, you ask, is alpha of N?

What is this function? Well, it's something called the inverse

Ackermann function. So, defining the Ackermann function and

its inverse actually takes a little bit of work.

So I'm not going to do it right now. For those of you that are curious, you

should check out the optional material on the union find data structure.

Which is yet another context where sort of unbelievably, this inverse Ackermann

function arises. But for the present context, what you

should is Is that this is a crazy slow growing function.

It's not constant. For any number C, I can exhibit a large

enough value of N, so that this function values is lease C.

So that alpha of N is a lease C. So, it's not bounded by a constant,

but it's mind boggling how slow growing this function is.

Let me actually just give you an incredibly slow growing function which

actually goes much faster than the inverse Ackermann, just to prove the

point. Specifically, the inverse Ackermann

function grows quite a bit slower than log star N.

Log star N I can define for you easily. We recall our definition of log base two,

way back when we first demystified logarithms at the beginning of part one.

If you type N into your calculator, and you keep dividing by two, you count the

number of times you divide by two until you drop below one,

that's log based two of N. For log star, instead of dividing by two,

you're going to hit the log button on your calculator,

ad you count how many times you hit log, until, the result drops below one.

That number, the number of times you hit log, is defined to be Log star of N.

The log star function as the inverse of the function which is a tower of 2s.

The function which takes as input you know, an integer say T and raises two to

itself T times. So, to appreciate just how slow growing

the log star function is, let alone the inverse Ackerman function.

What you should do is type in the biggest number you can into your nearest

calculator or computer. Just keep typing in nines as long as you

can. Then go ahead and evaluate log star.

Keep applying log function till it drops below one.

Probably, the result's going to be something in the neighborhood of five.

So log star of the biggest number in your calculator or computer is going to be

five. That's how, that's how slow growing this

function is. So, the upshot is that the algorithms

community has the time complexity of the MST problem almost nailed, but not quite,

in the deterministic case. The right answer is somewhere between M

and M times inverse Ackermann function, and we don't know where.

But, you know what? It gets weirder.

So check out the following result of Pettie and Ramachandran.

They in a sense solve, the deterministic minimum cost spanning tree problem by

exhibiting, an algorithm, who's time complexity is optimal. So they gave an

algorithm and they gave a proof, just in the spirit of what we did for sorting.

They gave a proof that no algorithm could have running time asymptotically better

than theirs. But what they didn't do, is explicitly evaluate what the time

complexity of their algorithm is. So they managed to prove optimality,

without actually evaluating exactly what's its running time.

So it's certainly somewhere between linear and linear times inverse

Ackermannn. We know that's the true time complexity

of the problem. We know an algorithm that achieves an

optimal complexity. But to this day we do not know what that

optimal time complexity actually is, as a function of the graph size.

So those are some of the most advanced things that we do know, about the minimum

cost spanning tree problem. Let me just mention a couple of things

that we still, to this day, do not know. So let me start with the randomized

algorithms. Now, maybe you're reaction is, there's no

open questions in the randomized algorithms world, because we know you

need linear time to solve the problem. And I've told you that there is a

randomized algorithm with expected running time that is linear.

So what more could you want? Well, what I want is I want an algorithm

that's not just linear time but also simple enough that I can teach it to

other people. So ideally, it would be another graduate

course like this. But I was actually very happy to have a randomized algorithm,

linear time, simple enough to cover in a graduate course.

The current linear time algorithms do not have that property.

They're more complicated than I can even cover in a graduate course.

To accomplish this task, it turns out to be enough to solve a seemingly simpler

task, namely, to get a simple randomized linear time algorithm for the MST

verification problem. So let me tell you what that means.

So in a MST problem you're supposed to optimize amongst all exponentially

minimum spanning trees. You got to find me the one with the

smallest sum of edge cost. In MST verification, the job seems

simpler. I'm actually going to hand you, a

candidate MST or I'm going to hand you a spanning tree.

It may or may not be the best one and you just need to check, is it the optimal one

or not. Furthermore, in the event that it's not

optimal you should tell me edges that are not in the spanning tree.

Edges that are too expensive that I should throw out.

The reason it's enough to design a linear time algorithm for this seemingly simpler

problem, is that the content of the Karger-Klein-Tarjan paper, is a

reduction. A randomized reduction from optimizing

over spanning trees, to the seemingly simpler problem of MST verification.

Moreover, all of the novel content in the Karger-Klein-Tarjan algorithm.

It's linear time, it's randomized, and it is really simple.

I teach this stuff in that paper in my graduate classes.

But it needs this MST verification subroutine as a black box.

And the only known implementations that are linear time for MST verification are

quite complicated. So, find a simple way to do MST

verification that runs in linear time. And you're good to go with your simple

optimal MST algorithm. For deterministic algorithms, the holy

grail is obvious. We'd love to have a deterministic

algorithm for MST that runs in linear time.

Or at the very least, we just need to figure out, you know, what is the best

possible time complexity of any deterministic MST algorithm.

So one of the takeaways of this discussion is that, you know?

For all the amazing advances by computer scientists on the design and analysis of

algorithms over the past 50 years or so. There are still totally fundamental

things that we do not understand. So there's still great ideas to come in

the future. So if you've been intrigued by some of

the things that I've said in this video. And you want to learn more about these

advanced minimum cost spanning tree algorithms.

For further reading, I'd recommend a survey by Jason Eisner, called State of

the Art Minimum Spanning Tree Algorithms. It's about fifteen years old now.

But it's still an amazing resource for learning about this advanced material.

Coursera brinda acceso universal a la mejor educación del mundo, al asociarse con las mejores universidades y organizaciones, para ofrecer cursos en línea.