How To Stop Search Engines From Indexing Certain Pages
Did you know that you can ask search engines like Google not to crawl certain pages on your website? You can, with a simple text file called “robots.txt”.
Robots.txt, as you might have already guessed, is a text file that you place on your website to give directions to the programs that crawl the web (web crawling bots, or “robots”). If you have a page that you don’t want any web crawler to access, you can ask crawlers to stay away from it.
Let’s say you were running a personal website for MC Hammer, and you had a webpage concerning his finances that you didn’t want any search engines to crawl. Here’s what you would do:
1) Create a file called robots.txt in the root directory of your website. If your site were www.mchammer.com, the file would live at http://www.mchammer.com/robots.txt
2) To block web crawler access to “seenBetterDays.html”, give the file these contents:
User-agent: *
Disallow: /seenBetterDays.html
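If you’d like to sanity-check your rules before uploading the file, Python’s standard library includes a robots.txt parser. Here’s a small sketch using the hypothetical hostname and filename from this article:

```python
# Verify the rules above with Python's built-in robots.txt parser.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /seenBetterDays.html
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# The blocked page is off-limits to every crawler...
print(parser.can_fetch("*", "http://www.mchammer.com/seenBetterDays.html"))  # False
# ...but the rest of the site remains crawlable.
print(parser.can_fetch("*", "http://www.mchammer.com/index.html"))  # True
```

This is the same logic a well-behaved crawler applies when it fetches your robots.txt, so it’s a quick way to catch typos in a rule before they cost you search traffic.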
If the file were in a subdirectory, the rule would look like this:
User-agent: *
Disallow: /subdirectoryname/seenBetterDays.html
The asterisk after User-agent means the rule applies to all robots, not just Googlebot, Yahoo’s crawler, or any one robot specifically.
If you wanted to exclude an entire subdirectory, it would look like this:
User-agent: *
Disallow: /subdirectoryname/
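A directory rule like the one above can be checked the same way. This sketch (again using the article’s hypothetical names) confirms that the rule covers every path under the directory, not just one file:

```python
# Check that a trailing-slash Disallow rule blocks the whole directory.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /subdirectoryname/"])

# Any file inside the directory is blocked...
print(parser.can_fetch("*", "http://www.mchammer.com/subdirectoryname/seenBetterDays.html"))  # False
# ...while pages outside it are unaffected.
print(parser.can_fetch("*", "http://www.mchammer.com/about.html"))  # True
```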
This will block web crawler access to all files within /subdirectoryname/. If you wanted to disallow access to the entire subdirectory except for the file “exception.html,” you would put this (Allow was originally a nonstandard extension, but it is honored by major crawlers like Googlebot):
User-agent: *
Disallow: /subdirectoryname/
Allow: /subdirectoryname/exception.html
Finally, if you decided that you’ve had enough of the internet and all its pervasive indexing and searching, you would put in this content:
User-agent: *
Disallow: /
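One last sanity check: with `Disallow: /` in place, no path on the site may be fetched by a rule-abiding crawler.

```python
# The "block everything" rule, verified with the standard-library parser.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /"])

print(parser.can_fetch("*", "http://www.mchammer.com/"))          # False
print(parser.can_fetch("*", "http://www.mchammer.com/anything"))  # False
```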
This means that no rule-abiding robot will visit www.mchammer.com again until you remove the file.
There are many more uses for robots.txt, but we’ve covered the basics in this article: if you have a webpage or group of webpages that you don’t want crawled, you can add one small file and keep your information a little more private. Just remember that robots.txt is a voluntary standard — it asks crawlers to stay away, but it does not enforce anything, so pages that must stay truly private should be protected by authentication.