We now have a very nice algorithm for the set cover problem, the sample-and-iterate algorithm. Let us now turn to its analysis. As a reminder, the algorithm repeatedly picks a set at random with probability proportional to x sub S, puts it in the cover, and stops once the sets chosen form a set cover. How do we analyze this? This is perhaps the trickiest analysis in this chapter, and maybe in the entire course, so we have to go slowly and pay attention.

First, some notation. How do we express the cost of the output? The cost of the output is the sum, over every step, of the cost of the set chosen at iteration t. What is the range of t? Little t goes all the way up to capital T, the total number of iterations, which is random: it is the stopping time of our process. So the cost of the output is the sum, for t from 1 to capital T, of C sub t. This is what we want to analyze.

How do we analyze it? It's a random quantity, so we analyze its expected value: the expected value of the sum, for t from 1 to capital T, of C sub t. What did I tell you in my last discussion? When we have the expectation of a sum, we exchange summation and expectation. So we'd like to write this as the sum of the expectations: the sum, for t from 1 to capital T, of the expected value of C sub t, or perhaps capital T times the expected value of C sub t. Except there's a problem here. Capital T is itself random, so we cannot simply exchange expectation and summation; we cannot apply the recipe I gave last time. So what do we do now? We turn to probabilists and ask them for a trick, and here it is: suppose we have a stopping time, and random variables X sub t that are bounded from above in expectation.
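As a concrete reference for the analysis that follows, here is a minimal Python sketch of the sample-and-iterate algorithm. The data layout (dicts mapping set names to their elements, costs, and LP values) and the function name are my own choices, not from the lecture; the fractional LP solution x is assumed to be given.

```python
import random

def sample_and_iterate(sets, costs, x, universe, seed=0):
    """Sample-and-iterate: repeatedly draw a set S with probability
    proportional to its LP value x[S], add it to the cover, and stop
    once the chosen sets cover the whole universe."""
    rng = random.Random(seed)
    names = list(sets)
    weights = [x[s] for s in names]   # sampling weights, proportional to x_S
    cover, covered = [], set()
    while covered != universe:        # stop once the chosen sets form a set cover
        s = rng.choices(names, weights=weights, k=1)[0]
        cover.append(s)
        covered |= sets[s]
    return cover, sum(costs[s] for s in cover)
```

In the notation of the analysis, the number of loop iterations is the stopping time capital T, and the returned cost is the sum of C sub t for t from 1 to T.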
If the expected value of each X sub t is at most mu, and the stopping time has finite expectation, then the expected value of the sum of the X sub t is at most mu times the expectation of capital T. This is Wald's equation, and it is one way in which we can, in a sense, exchange expectation and summation: as soon as we have an upper bound on the expected value of X sub t, we can use it and continue our analysis.

So the next step is to analyze the expected value of X sub t and bound it from above. What is X sub t in our case? X sub t is the cost of the set chosen at time t. So we want to upper bound the expected value of C sub t, the expected cost of the set chosen at time t. What is it? It's the expectation of the sum, over every set S, of the indicator function of whether that set is chosen at time t, times the cost of S. And then what do you do? You exchange expectation and summation, which is fine here since the sum has a fixed number of terms. And then what do you do? You have the expectation of an indicator function, and what is that? It's a probability. We have done this before, so I'll skip the details: we get the sum, over every set S, of the probability that S is chosen at iteration t, times the cost of the set. What is the probability that S is chosen? We said: x sub S divided by the sum of x sub S over all sets. Replacing the probability by its value, we get this expression: in the numerator, the sum over every set S of x sub S times c sub S; in the denominator, the sum over every set S of x sub S. This is independent of little t. Let's call it mu.

Now we can apply Wald's equation, and what do we get? The expected cost of the output is the expected number of iterations times mu, where mu is this quantity. So to analyze the cost of the output, we now need to look at the expected value of T, the expected number of iterations. Once we have that, we will know the expected cost of the output. So what is the expected number of iterations? For that, we'll need more notation. What determines the number of iterations?
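To make the quantity mu tangible, here is a small sketch (the function name is mine, not from the lecture) computing the per-step expected cost directly from the LP solution:

```python
def per_step_expected_cost(costs, x):
    """mu = (sum over S of x_S * c_S) / (sum over S of x_S):
    the expected cost of the set drawn at any single iteration.
    Note that it does not depend on the iteration index t."""
    total = sum(x.values())
    return sum(x[s] * costs[s] for s in x) / total
```

For instance, with two sets of costs 2 and 4 and equal LP values, each set is drawn with probability one half and mu is 3. By Wald's equation, the expected output cost is then E[T] times this value.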
It depends on when the sets we have chosen form a set cover. How do we know that? We have to look at the elements that are not yet covered. Let us define n sub t as the number of elements that are not yet covered after t iterations. What do we know about n sub t? Initially, no element is covered, so n sub 0 is n. At the end, every element is covered, so n sub capital T is zero. Right before the end, we are not quite done yet: at least one element is not yet covered, so n sub (T-1) is at least 1.

How will this be useful to analyze capital T? The answer is Wald's equation for dependent decrements. What is this? Consider a random decreasing sequence, which we will apply to our n sub t's, and a stopping time T whose expectation is finite. Assume that a certain property is satisfied by the n sub t's: a bound, in expectation, on how they change at each step. Then we get an upper bound on the expected value of T, the total number of iterations.

So in order to bound E of T, we want this expression as our upper bound, and for that, we need to analyze the change in n sub t over one step: the number of elements not yet covered at time t versus at time t plus 1. Consider this conditional expectation. What does it represent? You condition on n sub t: you assume you know how many elements are not yet covered. Then n sub t minus n sub (t+1) is exactly the number of elements that get covered in the next iteration, iteration t plus 1. We want its expectation, so let's fix an element e among the n sub t elements that are not yet covered, and let's look at the probability that e is covered in the next iteration. The probability that e is covered is the probability that the set S chosen by the algorithm at iteration t plus 1 contains element e. What is that? It's the sum, over every set containing e, of the probability that S is chosen: x sub S, normalized. And what do we know about this numerator? Because of the LP relaxation constraint for element e, the sum of x sub S over the sets containing e is at least 1.
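The key inequality here can be checked numerically. The sketch below (names are mine) computes, for each element e, the probability that the next sampled set covers e; LP feasibility guarantees each of these is at least 1 over the sum of all the x sub S's:

```python
def coverage_probabilities(sets, x, universe):
    """For each element e, Pr[next sampled set contains e]
    = (sum of x[S] over sets S containing e) / (sum of all x[S]).
    The LP constraint makes the numerator >= 1 for every e,
    so each probability is at least 1 / sum(x)."""
    total = sum(x.values())
    return {e: sum(x[s] for s in sets if e in sets[s]) / total
            for e in universe}
```

Summing this per-element lower bound over the n sub t uncovered elements yields exactly the expected-decrement bound used in the next step.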
And therefore, we know that this probability is at least 1 over the sum of all the x sub S's. Summing over the n sub t uncovered elements gives us our desired lower bound on the expected value of n sub t minus n sub (t+1), given n sub t: it is at least n sub t over the sum of the x sub S's. This is our function f of n sub t, with f of z equal to z divided by the sum of the x sub S's.

And so now, we can apply Wald's equation for dependent decrements. What do we get? The expected value of capital T is at most 1 plus the expected value of an integral. An integral of what, over what range? Let's see. n sub (T-1) is at least 1, and n sub 0 is n, so the integral runs from 1 to n. The integrand is 1 over f of z, which is the sum of all the x sub S's, divided by z. So here we go: we can take the sum out of the integral, and the integral of 1 over z from 1 to n is exactly a logarithm. So we get this upper bound on the expected value of T: 1 plus the sum of the x sub S's times the log of n.

Let's put everything together. The expected cost of the output is at most the expected value of capital T, which is at most 1 plus the sum of the x's times the log of n, times the expected cost of a step, which is this quantity here: the sum of c sub S x sub S over the sum of the x sub S's. What is this? The numerator is the objective value of the LP, which is at most OPT, and the sum of the x sub S's is at least 1, so the whole product is at most 1 + log n, times OPT.

So what have we proved? We've proved that the sample-and-iterate algorithm gives as an output a collection of sets that is a set cover, and the average cost of the output is at most 1 + log n times OPT. Very good. So we're almost done with set cover. What remains to be done? I will discuss further improvements, in particular how to remove the word "average" from the theorem. Then I will discuss possible or impossible improvements, and finally give you a little bit of background about the people who have worked on this.
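The final bookkeeping can be written out explicitly. This sketch (all names mine) assembles the two factors, the bound on E[T] and the per-step cost mu, and compares their product with the clean bound of 1 + ln n times the LP objective:

```python
import math

def expected_cost_bound(costs, x, n):
    """Assemble the bound from the analysis:
    E[cost] = E[T] * mu <= (1 + sum(x) * ln n) * (sum c_S x_S / sum x_S).
    Since sum(x) >= 1 for any feasible LP solution, this is at most
    (1 + ln n) times the LP objective, itself at most (1 + ln n) * OPT."""
    total = sum(x.values())
    lp_value = sum(x[s] * costs[s] for s in x)
    mu = lp_value / total
    et_bound = 1 + total * math.log(n)
    return et_bound * mu, (1 + math.log(n)) * lp_value
```

The first returned value never exceeds the second whenever the sum of the x sub S's is at least 1, which is exactly the step used at the end of the proof.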