Hi, in this lesson, you will learn how to build suffix tree of a string given its suffix array in linear time. At first we'll explore some connections between suffix array and suffix tree, and then we'll learn to compute some additional information to the suffix array. And then finally we will use suffix array and the traditional information called LCP array to build a suffix tree. First recall the problem. It's very simple. You're given a string S and you need to compute its suffix tree. And you already know how to do that actually. But the algorithm you know works in square time, and so it will work only for short strings, maybe up to 1,000 or 10,000 characters. And if you want to build suffix tree for strings of length of millions or billions, you will need a much faster algorithm. And after you learn this lesson, you will know how to build suffix tree in time, length of string times logarithm of these lengths, because you can build suffix array in this time and then construct suffix tree from the suffix array in linear time. So the general plan is to construct suffix array in time as log S, then compute some additional information called LCP array in the linear time. And then given both suffix array and this additional information, construct the suffix tree in linear time. First, let's explore how suffix array and suffix tree are connected. Here we have a string S, ababaa$. And again, we insert $ in the nth which is smaller than any of the characters of the string both to build suffix array and then to build suffix tree from it. And on the left, we have in the column. All the suffixes of the string S sorted in lexicographic order. So that is basically the suffix array. And on the right, we have the fully built suffix tree of the string, which is already compressed so that you see that on the edges we have not single letters but whole sub strings of string S. And by the way, interesting question is how do we store suffix tree? We shouldn't, of course, store the sub strings that are written on the edges directly because that could lead to quadratic memory usage and we want linear memory usage. So instead of storing the sub strings themselves, we just store the index of the start, and index of the end index of the corresponding sub string. So for each edge, we store two indexes, the start of that edge in the string and end of that edge in the string. And to store the nodes, we just store, for example, an array of pointers to the children nodes. And that array is indexed by the first character of the edge outgoing from this node into the child. And we can store the information about the edge itself in the node for which this edge is going from its parent. This is one of the ways to store everything but you may organize everything in another way. The important thing is that you shouldn't store edges as substrings. So what corresponds in the suffix tree to the suffix array elements? Let's take the first element of the suffix array. Actually, it is corresponding to suffix in the string S and that is corresponding to a leaf in the suffix tree and also to the path from a root vertex to the corresponding leaf vertex in the tree. So the first element of the suffix array corresponds to this route highlighted in blue, and then if we go to the next element of the suffix array, we get another route from route vertex to the leaf number 1. And then if we go to the next element, we get route from the route vertex to the leaf number 2. And note that the indexes of the leaves, and the indexes of the suffixes are just in the sorted order. So, those are not positions in the string S. Those are numbers of the suffixes in the increasing order, from 0 to number of the suffixes minus 1. So, each of the elements of the suffix array corresponds to some path from root to leaf in the suffix tree, that is what we know. That is unfortunately not yet sufficient to build the tree from the suffix array because there are many ways to create some paths from root to different nodes. Which corresponds to suffixes of the suffix array. So we will need some additional properties. And this additional property we will need is call longest common prefix, or often it is just said as lcp. So lcp of two strings S and T is the longest such string u, that it is both a prefix of S and of T. And we denote by big LCP(S, T), the function which returns the length of the lcp of strings S and T. For example, LCP("ababc" and "abc") is 2 because their longest common prefix is ab, and it's length is 2. And LCP("a","b") = 0 because their longest common prefix is empty. Now let's look again at the suffix array and suffix tree and also take into account LCP between the neighboring elements of the suffix array. So when we look at the first element of the suffix array, we just have an edge corresponding to it in the suffix tree. And when we have the next element, we have a path from root to another vertex. But if we compute the longest common prefix of this element of the suffix array with the previous element of the suffix array, we'll see that this longest common prefix is empty. And that corresponds to empty intersection, between the previous path and the new path. The only common node is the root node, and they don't have any edges in the intersection. However, if we proceed to the next suffix, it has a common prefix of length 1 with the previous suffix. And it corresponds to the common path in the tree highlighted in yellow, starting in the root node and going through edge a to another node which is still a common node for the current suffix and the previous one. So this is how LCP corresponds to the tree. If you go to the next suffix, it again has the same longest common prefix with the previous suffix. So we have the same common path from root to the next node by edge a and then the part of the path is different from the current suffix and from the previous one. If we go to the next suffix, their longest common prefix with the previous one is even longer, and so the common part of the path is now consisting of three nodes and two edges. A root node, next node by edge a and next node by edge ba. And the rest of the path is unique to the current suffix. If we go to the next suffix, it again doesn't have any common prefix with the previous one so the only common in the path is the root node. And the next suffix has longest common prefix of ba, and that's why we see this path from root to another node via edge ba. And this is the common part of the path for the current suffix and the previous one. So we see that basically all the nodes but the leaves are corresponding to the longest common prefix of the neighboring suffixes in the suffix array. And this is how we can actually build the suffix tree by first computing the longest common prefixes of the neighboring elements in the suffix array. And then building those internal nodes. And then, in the way of that, we will also build the leaves as the ending points of the paths corresponding to the suffixes from the suffix array. So this is the plan of what we'll do. But first, we'll need to compute those longest common prefixes for the elements of the suffix array.