<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>WEBii.net News &#38; Tips &#187; robots.txt</title>
	<atom:link href="http://webii.net/blog/tag/robotstxt/feed/" rel="self" type="application/rss+xml" />
	<link>http://webii.net/blog</link>
	<description>web design . development . marketing . hosting . domains</description>
	<lastBuildDate>Tue, 15 May 2012 16:11:16 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=</generator>
		<item>
		<title>How To: Stop Search Engines From Indexing Certain Pages</title>
		<link>http://webii.net/blog/2009/02/how-to-stop-search-engines-from-indexing-certain-pages/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=how-to-stop-search-engines-from-indexing-certain-pages</link>
		<comments>http://webii.net/blog/2009/02/how-to-stop-search-engines-from-indexing-certain-pages/#comments</comments>
		<pubDate>Thu, 26 Feb 2009 22:27:55 +0000</pubDate>
		<dc:creator>Bobby Martinez</dc:creator>
				<category><![CDATA[Austin Web Design]]></category>
		<category><![CDATA[How To]]></category>
		<category><![CDATA[robots.txt]]></category>
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://webii.net/blog/?p=517</guid>
		<description><![CDATA[Did you know that you can stop Google from indexing certain pages in your website? You can, with a certain tool called &#8220;robots.txt&#8221; Robots.txt , as you might have already guessed, is a text file that you can put on your website that will direct programs that crawl the web (web crawling bots) and give [...]]]></description>
			<content:encoded><![CDATA[<p>Did you know that you can stop Google from indexing certain pages on your website? You can, with a file called &#8220;robots.txt&#8221;.</p>
<p>Robots.txt, as you might have already guessed, is a text file that you put on your website to give directions to the programs that crawl the web (web crawling bots). If you had a page that you didn&#8217;t want any web crawler to access, you could instruct web crawlers not to crawl it.</p>
<p>Let&#8217;s say you were running a personal website for MC Hammer, and you had  a webpage concerning his finances that you didn&#8217;t want any search  engines to crawl. Here&#8217;s what you would do:</p>
<p style="padding-left: 30px;">1) Create a file called robots.txt in the root directory of your website. If your site were <a href="http://www.mchammer.com/">www.mchammer.com</a>, the file&#8217;s address would be <a href="http://www.mchammer.com/robots.txt">http://www.mchammer.com/robots.txt</a>.</p>
<p style="padding-left: 30px;">2) Add your rules to the file. To block web crawler access to &#8220;seenBetterDays.html&#8221;, the contents of the file would look like:</p>
<p style="padding-left: 30px;">User-agent: *<br />
Disallow: /seenBetterDays.html</p>
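<p style="padding-left: 30px;">One way to sanity-check a rule like this before uploading it is to feed it to a robots.txt parser. The sketch below uses Python&#8217;s standard-library <code>urllib.robotparser</code>; the bot name <code>ExampleBot</code> and the page <code>index.html</code> are made up for illustration, and other crawlers may interpret edge cases differently.</p>

```python
from urllib.robotparser import RobotFileParser

# The same two lines shown above, as they would appear in robots.txt.
rules = """\
User-agent: *
Disallow: /seenBetterDays.html
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# "ExampleBot" and index.html are hypothetical names used only for this check.
finances_ok = parser.can_fetch("ExampleBot", "http://www.mchammer.com/seenBetterDays.html")
home_ok = parser.can_fetch("ExampleBot", "http://www.mchammer.com/index.html")

print(finances_ok, home_ok)  # False True
```

<p style="padding-left: 30px;">The blocked page comes back as not fetchable, while the rest of the site stays open to crawlers.</p>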
<p style="padding-left: 30px;">If the file were in a subdirectory, it would look like:</p>
<p style="padding-left: 30px;">User-agent: *<br />
Disallow: /subdirectoryname/seenBetterDays.html</p>
<p style="padding-left: 30px;">The asterisk after User-agent means that the rule applies to all robots, not just Google, Yahoo, or any one robot specifically.<br />
If you wanted to exclude an entire subdirectory, it would look like this:</p>
<p style="padding-left: 30px;">User-agent: *<br />
Disallow: <em>/subdirectoryname/</em></p>
<p style="padding-left: 30px;">This will block web crawler access to all files within /subdirectoryname/. If you wanted to disallow access to the entire subdirectory except for the file &#8220;exception.html&#8221;, you would put this:</p>
<p style="padding-left: 30px;">User-agent: *<br />
Disallow: <em>/subdirectoryname/</em><br />
Allow: /subdirectoryname/exception.html</p>
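<p style="padding-left: 30px;">One caveat: not every parser reads a mix of Allow and Disallow the same way. Google picks the most specific (longest) matching rule, so the order of the lines does not matter there, but some parsers stop at the first rule that matches. Python&#8217;s standard-library <code>urllib.robotparser</code> is one of those, which makes it a handy way to see the difference. In this sketch the helper <code>build_parser</code>, the bot name <code>ExampleBot</code>, and the page <code>other.html</code> are made up for illustration.</p>

```python
from urllib.robotparser import RobotFileParser


def build_parser(rules: str) -> RobotFileParser:
    """Parse robots.txt text held in a string (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser


# Exception listed before the broader Disallow rule.
allow_first = build_parser("""\
User-agent: *
Allow: /subdirectoryname/exception.html
Disallow: /subdirectoryname/
""")

# Broader Disallow rule listed before the exception.
disallow_first = build_parser("""\
User-agent: *
Disallow: /subdirectoryname/
Allow: /subdirectoryname/exception.html
""")

exception_url = "http://www.mchammer.com/subdirectoryname/exception.html"
other_url = "http://www.mchammer.com/subdirectoryname/other.html"

# A first-match parser honours the exception only when Allow comes first.
exception_when_allow_first = allow_first.can_fetch("ExampleBot", exception_url)
other_when_allow_first = allow_first.can_fetch("ExampleBot", other_url)
exception_when_disallow_first = disallow_first.can_fetch("ExampleBot", exception_url)

print(exception_when_allow_first)     # True
print(other_when_allow_first)         # False
print(exception_when_disallow_first)  # False
```

<p style="padding-left: 30px;">If you want the exception to hold for first-match parsers as well as for Google, putting the Allow line above the Disallow line is the safer order.</p>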
<p style="padding-left: 30px;">Finally, if you decided that you&#8217;ve had enough of the internets and all  its pervasive indexing and searching, you would put in this content:</p>
<p style="padding-left: 30px;">User-agent: *<br />
Disallow: /</p>
<p>This asks every robot to stay away from <a href="http://www.mchammer.com/">www.mchammer.com</a> until you remove the file. Keep in mind that robots.txt is a request, not an enforcement mechanism: well-behaved crawlers honor it, but nothing forces a robot to obey.</p>
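<p>From the crawler&#8217;s side, honoring robots.txt is a check the robot runs on itself before it fetches anything. Here is a minimal sketch of that check, again using Python&#8217;s standard-library <code>urllib.robotparser</code>; the function <code>allowed_urls</code> and the bot name <code>ExampleBot</code> are hypothetical, and a real crawler would download the file from the site instead of using a string.</p>

```python
from urllib.robotparser import RobotFileParser

# The "block everything" file from above.
BLOCK_EVERYTHING = """\
User-agent: *
Disallow: /
"""


def allowed_urls(robots_txt, user_agent, urls):
    """Return only the URLs this robots.txt lets the given bot fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


candidates = [
    "http://www.mchammer.com/",
    "http://www.mchammer.com/seenBetterDays.html",
]

# With "Disallow: /" in place, a polite bot has nothing left to visit.
remaining = allowed_urls(BLOCK_EVERYTHING, "ExampleBot", candidates)
# With an empty robots.txt, every page is fair game.
unrestricted = allowed_urls("", "ExampleBot", candidates)

print(remaining)          # []
print(len(unrestricted))  # 2
```

<p>A robot that skips this step will crawl the site regardless, which is why robots.txt should not be the only thing protecting a sensitive page.</p>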
<p>There are many, many more uses for robots.txt, but we&#8217;ve covered the basics in this article: if you have a webpage or group of webpages that you don&#8217;t want to be crawled, you can add this one file and keep that information a little bit more private.</p>
]]></content:encoded>
			<wfw:commentRss>http://webii.net/blog/2009/02/how-to-stop-search-engines-from-indexing-certain-pages/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

