Excluding The Spider From Your Site



The Ultraseek search engine spiders UT web space, starting at your campus homepage and working downward, following the links it finds on each page as it proceeds through the university network. If you want to exclude your pages from this spidering, you can do so by giving the engine instructions when it arrives at your web site. These instructions can take one of two forms:
  • Include the following meta tag in the pages which you don't want spidered:

    <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

    "NOINDEX" signals the engine not to index the page while "NOFOLLOW" prevents any harvesting of links from the page.

  • Alternatively, add a plain-text file in your web-server's document root called "robots.txt". Inside, include something like the following:

User-agent: *
Disallow: /duplicate/
Disallow: /tmp/
Disallow: /cgi-bin/
Disallow: /dev/


The wild-card character following "User-agent" signals that you want all engines (not just Ultraseek's) to take notice of what follows. The "Disallow" statements tell the engine not to spider through the pages in the listed directories (duplicate, tmp, cgi-bin, and dev). Note that a separate "Disallow" statement is required for each directory; syntax constraints prevent listing more than one directory per statement.
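
If you would like to check how a compliant engine is likely to read your rules before you publish them, the following is a minimal sketch using Python's standard urllib.robotparser module. It is not part of Ultraseek. The robot name "ExampleBot" and the test paths are hypothetical, chosen only for illustration, and the rules are the sample ones shown above.

```python
from urllib.robotparser import RobotFileParser

# Sample rules from this page; substitute the contents of your own robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /duplicate/
Disallow: /tmp/
Disallow: /cgi-bin/
Disallow: /dev/
"""


def build_parser(text):
    """Return a RobotFileParser loaded from robots.txt text (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is a made-up robot name; the "*" record applies to any name.
for path in ("/index.html", "/tmp/scratch.html", "/cgi-bin/search"):
    verdict = "allowed" if parser.can_fetch("ExampleBot", path) else "disallowed"
    print(path, verdict)
```

With the sample rules, /index.html should be reported as allowed and the two paths under /tmp/ and /cgi-bin/ as disallowed. This only shows how a well-behaved parser interprets the file; it does not guarantee how any particular engine behaves.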

Be aware that while Ultraseek's Search engine abides by these directives, others aren't so cordial and obliging and may roguishly spider your site despite your best efforts.
Web Robots Pages