0:04

Okay. First, we're gonna look at the search algorithm for, digraphs and this is

Â the finding paths, what are all the vertices that we can get to from a given

Â vertex along a directed path. and again, this is, little more complex for a digraph

Â it would seem, than for a graph. so in this case, those are the, that's the set

Â of vertices that you can get to, from the given vertex x, s. Notice that, this set

Â is characterized by every edge crossing the boundary, it goes in. if there were an

Â edge that went out, that would give, another member of the set.. well actually,

Â looks more complicated to a human, but to the computer, it looks exactly, precisely

Â the same. in fact. the method that we looked at for undirected graphs is

Â actually a digraph processing algorithm. it treats, every connection between two

Â vertices as two directed edges, one in each direction. So, DFS, that we looked at

Â last time is actually a digraph algorithm. And we used precisely the same code. So to

Â visit a vertex V, we mark the vertex as visited, and recursively visit all

Â unmarked vertices W, that you can get to from V. let's look at the demo, just to, .

Â Reinforce that. So, here's a sample digraph with the edges over at the right.

Â let's look at that first search on that digraph. so, we're going to look at the

Â vertices that we can get to from vertex zero in this digraph. Again, we have two

Â vertex index to raise. one called marked, which says whether we can get there from

Â V, and the other called edge two, which gives us the vertex that took us there.

Â with that we can recover the paths from vertex zero to each vertex that can be

Â reached from vertex zero. So we start off by visiting vertex zero and now check the

Â edges that are adjacent to it with directed edges going out. so there's five

Â and then there's gonna be one. but five is unmarked so we have to re-cursedly visit

Â five. So we mark five, and we say we got there from zero. so the path from, to five

Â is zero to five. and so now we're gonna recursively visit all the unmarked

Â vertices pointed to from five . In this case it's just four. My four is

Â unmarked so we're gonna recursively visit four and say we got there from five. and

Â now recursively we have to check all the unmarked vertices pointing from four.

Â there's three. and two, first we do three and that's unmarked. so we gotta visit

Â three. and say that we got there from four. and now to visit three. We've looked

Â at all the vertices pointing from three. we can check five. We've already been to

Â five that's marked so we don't have to do anything. and then we check two. two is

Â unmarked so we continue with the depth first search and visit two. so now to

Â visit we mark two, and say we've got there from three. and now we check the vertices

Â that we can get two from two. In this case it's zero, which we've already been to.

Â And three, which we've already been to. so. now we're done with vertex two. and we

Â can return and continue the search from three well actually that was the last one

Â from three, so we're done with three as well. so now we're at four. We still have

Â checked the edge from four to two. so now we do that. And of course we've been to

Â two, so we don't have any further processing. and we're done with four. the,

Â , four was the only edge we get to from five. So we're going to be done with five

Â as well. and then, what about zero, well we have to check one. 1's not visited, so

Â we visit one, mark it. and we turn and then we're done with zero, and that gives,

Â the set of all vertices that are reachable from zero, and not only that the edge to

Â array. Gives the information that we need to reconstruct the path from any of those,

Â from zero to any of those vertices using precisely the same method that we used

Â before. We get the four from five, we get the five from zero, so zero, five, four is

Â the path to four. And we can do that for any vertex in that cell. Okay. So what

Â about the code? the code is exactly the same as for , undirected graphs. That's

Â the code for undirected graphs. that we looked at last time to get the code for

Â digraphs, we just changed the name, its the s ame code, otherwise. the recursive,

Â the constructor, builds the array of marked vertices, and also builds edge too,

Â just to, avoid clutter, left that one off this slide, and then it makes the call to

Â DFS. then the recursive DFS does the work. It marks the vertex and for every adjacent

Â vertex. If its not marked, it does the DFS. and then the client can ask whether

Â any ver-, any given vertex is reachable from s after the constructor has done its

Â work. that's depth-first search in directed graphs, actually, we already did

Â it. now here's just a couple of applications where this kind of code is

Â used. one is, a so called program control flow analysis. actually every program can

Â be viewed as a digraph. where, the vertices are basic blocks of instructions

Â that are just executed one after the other with no conditionals. and then edges

Â represent, a, jump. If there's an if statement, vertex left, two edges going

Â out of it, or, or a loop, which involves a conditional. So, , analyzing a program,

Â people write systems in analyse program, to look at their structure by studying

Â their diagrams, for example. one thing that happens often is there's unreachable

Â code. another, another thing you might want to do is determine whether you can

Â get to this exit or not, by doing this digraph processing. so that's actually a

Â widely used technique in, in developing software, software development. to try to

Â improve code by doing this kind of digraph-processing. And ofcourse these

Â digraphs can be huge. another classic use of depth for search in digraphs is garbage

Â collection that is used in systems like Java where data structures or digraphs. we

Â build objects and then we create references to other objects. and so the

Â data that any programs use is really set as a digraph. So there's the idea of

Â roots, so your program has e-. Some. live objects, that it can access through,

Â whatever state, it's in, but, a language like Java, there's, automatic garbage

Â collection, which means the programmer, when it's done with an object, maybe it

Â overwrites one of these pointers or something. there's gonna be some, blocks,

Â that, are not directly accessible by the program. and so, what's interesting is,

Â the set of reachable objects that, can be indirectly access. By the program starting

Â and following a chain of pointers. so those are the ones that can't be collected

Â or reclaimed by the system for reusing the memory. But all the other ones, the gray

Â ones that can't be reached by the program there's no reason to . Keep them live, you

Â may as well collect them and return them for use, re-use. So there's a so-called

Â marked and sweep out rhythm that actually dates back to 1960, where. We run DFS to

Â mark all ritual objects. and then go through and sweep through all possible

Â objects. And if it's object is unmarked it's garbage so add it to the list of free

Â memory. and that's a classic method that's still widely used. it uses an extra bit

Â per object'cause you have to have to have that for the mark. But still, it's

Â effective and useful digraph solution. So DFS with reachability that we've just

Â showed and path finding is similar. And there's a couple of other simple digraph

Â problems that we'll consider. These are so far examples. But it's also interesting

Â that DFS is the basis for solving digraph problems that are not so simple or

Â immediate to solve. And this was pointed out 40 years ago by Bob Tarjan in a

Â seminal paper that showed that. First search, can allow us to solve problems

Â that seem pretty complicated actually, in linear time, and we're gonna look at an

Â example of that later on. so that's depth for search. what about breadth for search.

Â Well in the same way that we saw for depth for search every undirected graph is

Â actually a digraph that has edges in both direction. So bfs is really a directed

Â graph algorithm and we can use exactly the same code to find shortest paths from a

Â source to any given vertex. So we use a que. We put the source on a que and mark

Â it as visited and. And as long as the queue is non-empty, we remove the least

Â recently added vertex and add to the que ue and mark as visited all the unmarked

Â vertices that you can get to from that vertex. And the same proof shows that BFS

Â computes shortest paths, the ones with the fewest number of edges from S to each

Â other vertex in the digraph in time proportional to in linear time. So you

Â want the GPS in your car, you BFS when you're driving around lower Manhattan. So,

Â let's look at the demo again just to see, the distinction between, breadth first

Â search, in digraphs and see how it works. So this is a, a small graphing, a smaller

Â digraphic and with six vertices and eight edges. we take a Q and we take a source

Â vertex and put on the Q to get started. then, Q is not empty, so remove zero and

Â we check all, all vertices that are adjacent that we get to. So. we're gonna,

Â in zero was zero distance, from zero, so first we will check two. and that one is

Â not marked, so we mark it and put it on the queue. and then we'll check one. and

Â that one's not marked, so we mark it and put it on the queue. then we're done with

Â zero. now queue's not empty so we pull the least recently added off, that's two. and

Â now we're going to check the vertices, you can get from two. I noticed both one and

Â two are distance one from zero. And now, since we're going from two, everything

Â that we encounter will be distance two from the source. So we find four, it's

Â distance two from the source, and we get there from vertex two. Unmarked, so we

Â fill in those data structures and put it on the queue. and then we're done with

Â two, so we go back to the queue, and 1's on the queue. So we pull one off and it's

Â distance one from zero. Remember the first showed that everything in the queue is one

Â of two distance, either k or k plus one. In this case, we've got one at distance

Â one, four at distance two. So now we're going to pull one off the queue. M- and

Â look at the edges you can get to, places you can get to from one. Now we check two

Â but that's already marked so we ignore it. and then we're done with one. Now four is

Â left on the Q so we pull it off and check adjacent vertices. In this case three,

Â it's unmarked so we put it on the Q. Then we're done with four. Then from three we

Â check five and that's unmarked and it's one more distance from the source so we

Â put it on the Q. And then finally. Oh we check two which we already visited so we

Â don't have to, to do anything. And then finally we pull five off the Q. check. Or

Â you get two from five and it's zero, which is marked, so we're done. and so that's

Â breadth-first search whig, which gives us this directed tree from the source. Which

Â gives the shortest path to all the vertices that you can get to from that

Â source. you can use a version of this to solve a more general problem known as the

Â multiple-source shortest paths problem. In this problem you're given a digraph and a

Â set of source vertices, and you want to find the shortest path from any vertix in

Â the set to each other vertix. So for example, in this case if the set is one,

Â seven, and ten, what's the shortest path to four? From one of those vertices. Well,

Â it turns out in this case to be seven, six to four. Shortest path to five is seven,

Â six, zero, five. Shortest path to twelve is ten to twelve. That's a more general

Â problem but it's actually easy to solve. How do we implement this? We just use a

Â different constructor. We just use BFS but initialize by, put all the source vertices

Â on the queue to get started. So that is every vertex is, so you put those on the

Â queue and they're zero from the desired source. And then any vertex you can get to

Â from any one of those is going to be. One and so forth, so the results still gives

Â away. The edge to array will still give a way to get from any vertex, the shortest

Â way to get from any vertex to each of the sour-, source vertices. here's an

Â application of depth-first search. Let's say you want to crawl the whole web, well,

Â all the web that you can access from some starting web page, say like Princeton's

Â starting webpage. Again, the digraph model, each vertex is a webpage, each edge

Â is a link on that webpage to some other webpage. and so all we want to do is get

Â to all the other vertices on the web. And, so solution is, well, we don't actually

Â build the digraph we just use an implicit digraph, because for every web page we can

Â find the links to other web pages on it and we'll just build those as we encounter

Â them. So we're gonna start with a source, which is the root web page. We're gonna

Â have a queue of the sites that we still need to explore. what we're going to do is

Â also have a set of discovered websites, this corresponds to our marked array but

Â since we don't know how many vertices there are on the web all we're going to do

Â is keep track of the ones that we've been to. so this is, don't use the vertex

Â indexed array we don't even bother with because we'll just use the vertex names

Â and do the look up as indicated we could do. So all we do is In the, is but for a

Â search the same method is if the queue's not empty. Take a website off of the cue.

Â And just add to the queue all the websites to which it links. but all of those

Â websites, you check whether they're in the set of the ones that you've seen already.

Â now, you might run out of space, for this set before you get to the whole web. but

Â anyway, conceptually, this is a way that you could go. one thing to think about is

Â why not use DFS for this. well One reason is you, you're gonna go far away in your

Â search for the web. Maybe, maybe that's what you want, but the real problem, in

Â point of fact is there's some web pages that would trap such a search by creating

Â new web pages and make links to'em the first time that you visit'em. So, they,

Â trap searches like that by, cuz DFS would always go to a new web page like that and

Â it'd keep creating new ones and you wouldn't get very far. With breadt-first

Â search you're taking a wide search of the pages that are close. and that's often

Â maybe what you want for such a search. and, just to how simple the idea is, this

Â is complete code for a, it's kind of a bare bones web crawler but it'll get to a

Â lot of websites. so let's look at, do this example because again it really indicates

Â the power of using appropriate abstractions to implement the algorithms

Â that we're interested in. so this one we're gonna use a cue. Queue of strings so

Â that's the websites that we have to still yet to go to. and then a set of strings is

Â gonna be the ones that we've already been to that's equivalent to the marked array.

Â we'll start with Princeton's website. add it to the queue of places we have to go

Â and also mark it. Discovered that ad just means mark it. so now, while the queue's

Â not empty. What we're going to do is read the raw HTML from the next website in the

Â queue. So this is code using our input library that gets that job done. And then,

Â this is a little fooling around with regular expressions. We'll talk about

Â algorithms like this later on, that essentially tries to find all URLs within

Â the text of that website. And for all of those URL's. And we'll. Look at workers

Â behind this code later on in this course. For all those URL's it's gonna check. If

Â it's marked that's discovered doc contains and if it's not marked it'll say it will

Â mark it and put it on the queue. a very s- simple program with a very powerful effect

Â that illustrates breadth-first search.

Â