Hello again. Now that we've introduced you to sitemaps, let's discuss another method for controlling how search engines crawl your website and discover new content. The robots.txt file is a protocol that allows you to specify which pages should or should not be crawled, and to specify the rate at which files are accessed by search engine robots. In this lesson we'll learn about the importance of these files, but also recognize their limitations. By looking at an example robots.txt file, we'll learn how to create these files and where to place them on your website.

When we discussed sitemaps, we learned that a sitemap is an inclusionary file, meaning it tells search engines which information you want included in search results. You also have the ability to exclude specific pages from being crawled. You can do this through a robots.txt file. The robots.txt file is a protocol that was created in the early days of the Internet to prevent robots from crawling areas of the web they were not supposed to access. Today that protocol is often referred to simply as robots.txt. It is a simple text file that can be uploaded to your server.

If a robot wants to visit a page on a website, let's say example.com/bluewidgets, the robot will first go to the website's robots.txt file. This file will be located at example.com/robots.txt. The robot will then check to see if there are any instructions prohibiting it from crawling that page.

It's important to note that while robots.txt files provide search engine bots with your preferences for what should or should not be crawled, robots can choose to ignore the information in this file. Major search engines, like Google or Bing, tend to respect it. But people create robots for many reasons, and those reasons can sometimes be malicious. Some robots are created to scan the web for vulnerabilities so people can hack into a person's website. Other robots will crawl websites and harvest email addresses to sell to companies or spammers. Also remember that robots.txt is a publicly available file, meaning anyone can see which sections of your server you do not want robots to access.

For example, we can view the robots.txt file of UC Davis Extension's website. If you go to extension.ucdavis.edu/robots.txt, you can see what their robots.txt file looks like. If you take a look at their file, you will see a few different areas that provide key information to search engine crawlers.

First is the introductory text that explains what the robots.txt file is. This information isn't necessary, but many website content management systems include it by default. You will also see that the file points out that these instructions will be ignored unless the file is placed at the root of your host. This means that you always access a site's robots.txt file at example.com/robots.txt. If it is placed anywhere else on the site, such as example.com/site/robots.txt, it will be ignored.

Under that area, you will see the specific instructions. The User-agent line specifies which robots the instructions apply to. For example, you can set up specific instructions for Google, Bing, or Yahoo. Most of the time, however, there is no real reason to do so. The asterisk works as a wildcard and means that all robots should follow the instructions below it. The crawl delay sets how many seconds a robot should wait before accessing a new page on your site. This can help prevent overload of server resources, which could crash your website.
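To make this concrete, here is a minimal sketch of the kind of instructions you might see in such a file. The crawl-delay value and the comment wording are illustrative rather than copied from the actual UC Davis Extension file; the /includes directory is the one we'll refer to in a moment:

User-agent: *
Crawl-delay: 10
# Directories
Disallow: /includes/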
The pound symbol, or hashtag, is used to make a note. This is for humans to read and is not read by robots. In this example, it's a note by the webmaster to remind himself, or other webmasters, that the items below it are specific directories of the site. The Disallow directive instructs robots not to crawl the content it references. For example, do not crawl the /includes directory.

So now that we know what the asterisk and Disallow directives mean, if we wanted to exclude robots from accessing our entire site, we would simply enter the instruction User-agent followed by an asterisk, and then on a new line enter Disallow followed by a forward slash. If we wanted to allow robots complete access to the site, we would simply leave the Disallow field blank. For example, we would enter User-agent, followed by an asterisk or wildcard, and then on a new line type Disallow and not include any information after that. You could also simply not use a robots.txt file at all, which leaves all of your content open to crawling.

You should now understand what a robots.txt file is used for, how to create and read a robots.txt file, and where this file should be placed on your server. That completes the video portion of this lesson.
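For written reference, the two configurations described above would look like this when typed into the file.

To block all robots from the entire site:

User-agent: *
Disallow: /

To allow all robots complete access:

User-agent: *
Disallow: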