I'm now ready to describe the Lloyd algorithm for k-means clustering, and I will use this simple data set to illustrate how it works. This data set is almost too simple, but it will help us work through the steps of the algorithm. The Lloyd algorithm starts by selecting k data points as cluster centers; they are selected randomly, and here is our selection. After the centers are selected, what is the most reasonable assignment of data points to centers? Of course, the best way is to assign each data point to its closest center. This step is called the "centers to clusters" step of the Lloyd algorithm. Now, after we have constructed the clusters, we should question the wisdom of our initial choice of centers. Take a look at the blue cluster: its center is clearly in the wrong position, because, as we learned from the Center of Gravity Theorem, the optimal center of each cluster lies at the cluster's center of gravity. The "clusters to centers" step of the Lloyd algorithm therefore moves every center to the center of gravity of its cluster. And after we've done that, what should we do? Exactly what we did before: we once again assign each data point to its closest center, and this is how the new clusters will look. This is, once again, the "centers to clusters" step of the Lloyd algorithm. After it is done, we again move the centers to the centers of gravity, and we keep alternating until nothing moves anymore; at that stage, when the centers and cluster assignments stop changing, we declare that the Lloyd algorithm has terminated. So, to summarize: the Lloyd algorithm selects k arbitrary data points as cluster centers and then iteratively alternates between the "centers to clusters" and "clusters to centers" steps. Of course, a few questions remain. Will the Lloyd algorithm ever converge? If it converges, how long will it take?
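The two alternating steps described above can be sketched in code. The following is a minimal illustrative implementation (not the course's reference code; the function and helper names are my own), assuming the data points are given as tuples of numbers:

```python
import random

def squared_dist(p, q):
    # Squared Euclidean distance between two points.
    return sum((a - b) ** 2 for a, b in zip(p, q))

def center_of_gravity(cluster):
    # Coordinate-wise mean of the points in a cluster.
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))

def lloyd_kmeans(points, k, max_iters=100, seed=0):
    """Lloyd's algorithm: pick k data points as initial centers, then
    alternate "centers to clusters" and "clusters to centers" until
    the centers stop moving."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k randomly chosen data points
    for _ in range(max_iters):
        # "Centers to clusters": assign each point to its closest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_dist(p, centers[i]))
            clusters[nearest].append(p)
        # "Clusters to centers": move each center to its cluster's
        # center of gravity (keep the old center if a cluster is empty).
        new_centers = [center_of_gravity(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:  # nothing moved: the algorithm has terminated
            break
        centers = new_centers
    return centers, clusters
```

On a toy data set with two well-separated groups, such as `[(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]` with k = 2, the centers settle at the centers of gravity of the two groups.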
And, if it converges, will it converge to the global optimum of the k-means clustering problem or only to a local optimum? Let's first prove that the Lloyd algorithm converges. Indeed, if a data point is assigned to a new center during the "centers to clusters" step, then the squared error distortion decreases, because the new center must be closer to the point than the previous center was. On the other hand, if a center moves during the "clusters to centers" step, then the squared error distortion also decreases, since by the Center of Gravity Theorem the center of gravity of a cluster is the unique point minimizing the distortion for that cluster. Since the distortion strictly decreases whenever anything changes, and there are only finitely many ways to partition the data points into clusters, the Lloyd algorithm must converge. Unfortunately, it may converge to a local minimum rather than the global minimum of the k-means clustering problem, and that is why it makes sense to run the Lloyd algorithm many times starting from different centers. It also may take quite a long time to converge. So now, armed with the Lloyd algorithm, we can cluster the yeast genes that show some variation in behavior across the seven time points we collected when we studied the diauxic shift. There are roughly 200 to 300 genes that change expression during the diauxic shift; indeed, the lion's share of yeast genes are not implicated in it. But 200 is already a large number, and we want to cluster these genes. Here is the result of clustering this set into six clusters: averaging the expression of the genes in each cluster produces this plot, and you can clearly see the different cluster behaviors. Genes with increasing expression (sometimes followed by a small decrease) appear at the top of the slide, and genes with decreasing expression at the bottom. We will learn later how these different behaviors translate into interesting biology, but for now our next step is to introduce a new idea for clustering, called soft clustering.
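The standard remedy for local minima mentioned above, running the algorithm from many random initializations and keeping the best result, can be sketched as follows. This is a self-contained illustration (the Lloyd steps are repeated compactly so the snippet stands alone; all names are my own), assuming the squared error distortion is the mean squared distance from each point to its assigned center:

```python
import random

def squared_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def lloyd_once(points, k, rng):
    # One run of Lloyd's algorithm from a random choice of k data points.
    centers = rng.sample(points, k)
    while True:
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: squared_dist(p, centers[i]))].append(p)
        new = [tuple(sum(x) / len(c) for x in zip(*c)) if c else centers[i]
               for i, c in enumerate(clusters)]
        if new == centers:
            # Squared error distortion of the final clustering.
            d = sum(squared_dist(p, centers[i])
                    for i, c in enumerate(clusters) for p in c) / len(points)
            return d, centers
        centers = new

def best_of_restarts(points, k, runs=20, seed=0):
    # Run Lloyd's algorithm `runs` times and keep the clustering with the
    # smallest distortion, reducing the risk of a poor local minimum.
    rng = random.Random(seed)
    return min(lloyd_once(points, k, rng) for _ in range(runs))
```

Each restart may land in a different local minimum; taking the minimum distortion over many runs does not guarantee the global optimum, but it makes a bad outcome much less likely.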