Hi, I'm Jaemyung Kim from Sungkyunkwan University. In this video, I want to look at machine learning algorithms for unsupervised learning, especially clustering. Today's topic is k-means and k-medoids. I will give an introduction first and then explain k-means and k-medoids in turn.

For the clustering problem, let's first define what a cluster is. A cluster can be described as a group of similar objects. A good clustering has high intra-cluster similarity and low inter-cluster similarity. Clustering has no predefined group information, whereas classification needs predefined groups as classes. Clustering is also learning by observation, and class discovery is its aim. If we had the data shown on this slide, you can see there are no labels on the data. Clustering tries to reveal the hidden structure by grouping similar objects.

I've explained the clustering problem roughly, but we need a more detailed problem definition. The clustering problem is: given a set V of n objects, partition V into k clusters. First, we have the vectors v_1 to v_n. As I said, a good clustering should minimize the sum-of-distances error criterion: if we can define the error E_i of each cluster i, we can sum over all clusters, and that gives the total error E of the clustering. The error criterion measures the dissimilarity of the objects in each cluster. For example, if we know the distance of each object from its cluster center, we can sum those distances over all objects in the cluster, and that becomes the cluster's error. Based on this error criterion, we can partition the data into two clusters one way or another way, and compare the results.

From now on, let me introduce the k-means algorithm. It uses the gravity center (the mean) of the objects as the cluster center. The algorithm takes as input k, the number of clusters, and the set V of n objects.
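The sum-of-distances error criterion described above can be sketched in a few lines of Python. This is my own illustration, not code from the lecture; the function and variable names (`total_error`, `clusters`, `centers`) are assumptions made for clarity.

```python
import math

def total_error(clusters, centers):
    """Total clustering error E: for each cluster i with center c_i,
    E_i = sum of dist(p, c_i) over objects p in the cluster, and
    E = sum of E_i over all clusters."""
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    return sum(dist(p, c)
               for objs, c in zip(clusters, centers)
               for p in objs)

# Two clusters of 2-D points with their centers:
# each object is at distance 1 from its center.
clusters = [[(0.0, 0.0), (2.0, 0.0)], [(10.0, 0.0), (12.0, 0.0)]]
centers = [(1.0, 0.0), (11.0, 0.0)]
print(total_error(clusters, centers))  # 4.0
```

Comparing two candidate partitions then just means computing this total error for each and preferring the smaller one.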
With this algorithm, we get as output a set of k clusters minimizing the sum-of-distances error criterion. The k-means method has very simple steps. First, we choose k objects as the initial cluster centers. Then we repeat the following: for each object p in V, find the nearest center and assign p to it; after that, compute the mean of each cluster as its new center.

Here is a graphical example of how k-means works. We first pick three cluster centers; in this case, k is three. Then we assign every object to its nearest cluster center. That gives three clusters, and this is one step. Then we recalculate each cluster center from the objects in its cluster. The cluster centers move, and we reassign each object to its nearest center. We then update the cluster centers again from all the objects in each cluster, and repeat until nothing changes.

K-means clustering is very simple, and it is relatively efficient to compute. Three quantities determine the cost: n, the number of objects; k, the number of clusters; and t, the number of iterations. Normally n is much greater than k and t, so the O(nkt) cost is relatively efficient. But the k-means method has weaknesses. It is applicable only when the mean is defined; Euclidean distance is fine, but a mean is not proper for all spaces. Second, it needs k to be specified in advance, and it is unable to handle noisy data and outliers well. It is also not suitable for discovering clusters with non-convex shapes.

Let's move on to the k-medoids algorithm. The purpose of k-medoids is to find representative objects, called medoids, in each cluster. For this algorithm, only the distances between pairs of objects need to be defined. PAM, which means Partitioning Around Medoids, is an implementation of the k-medoids method. It does not scale well to large datasets. The method is very simple.
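The k-means steps above can be sketched in Python as follows. This is a minimal illustration, not the lecture's code; as a simplification I initialize the centers with the first k objects, whereas in practice the initial k objects are usually chosen at random.

```python
import math

def kmeans(V, k, t=100):
    """Basic k-means on a list V of 2-D points, with k clusters
    and at most t iterations."""
    centers = V[:k]  # simplification: first k objects as initial centers
    clusters = []
    for _ in range(t):
        # Assignment step: each object goes to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in V:
            i = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[i].append(p)
        # Update step: each center becomes the mean of its cluster.
        new_centers = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:  # no center moved: converged
            break
        centers = new_centers
    return centers, clusters

points = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
centers, clusters = kmeans(points, k=2)
print(sorted(centers))  # one center near (0.33, 0.33), one near (9.33, 9.33)
```

Note that the returned centers are means, so in general they are not objects from V; that is exactly the property k-medoids changes.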
First, we select k objects as the initial medoids, and assign each non-selected object to the nearest medoid. Then we calculate the cost as the sum of distances. For each pair of a non-selected object h and a selected object c, we select h and deselect c if swapping c with h improves the cost. Let me also explain this algorithm with a graphical example. We randomly select k objects, here three medoids, and form clusters around them. Then we pick a non-selected object h on a cluster, and if swapping that cluster's medoid with h improves the cost, we accept h as the new medoid. With h as the medoid, we reassign all objects to the nearest medoids.

Let me sum up this video with this slide. I talked about the introduction to clustering, what it is, its aims, and the criterion for a good clustering. The k-means algorithm is one of the representative clustering ideas, and k-medoids is one of its variations; it uses a medoid, which is an actual object, as the cluster center. Thank you.
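As a supplement to the lecture, the PAM swap procedure described above can be sketched in Python. This is my own illustration under stated assumptions: the initial medoids are taken as the first k objects for determinism (the lecture selects them randomly), and each iteration applies the single best improving swap.

```python
import math

def pam(V, k, t=100):
    """Partitioning Around Medoids: medoids are actual objects from V."""
    def cost(medoids):
        # Sum of distances from each object to its nearest medoid.
        return sum(min(math.dist(p, m) for m in medoids) for p in V)

    medoids = V[:k]  # simplification: first k objects as initial medoids
    for _ in range(t):
        best = cost(medoids)
        swap = None
        # Try swapping each selected medoid c with each non-selected object h.
        for c in medoids:
            for h in V:
                if h in medoids:
                    continue
                trial = [h if m == c else m for m in medoids]
                if cost(trial) < best:
                    best, swap = cost(trial), trial
        if swap is None:  # no swap improves the cost: converged
            break
        medoids = swap
    return medoids

points = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
print(sorted(pam(points, k=2)))  # [(0, 0), (9, 9)]
```

Because every candidate center is an actual object, only pairwise distances are needed, and the O(k(n-k)) swaps per iteration (each costing a full re-evaluation here) show why PAM does not scale well to large datasets.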