If you're reading this, chances are you noticed the SiteSell robot, "SBIder," visiting your site while looking through your server logs. Our software obeys robots.txt files and robots META tags in HTML, which are the
standard mechanisms Webmasters use to tell Web robots which portions of
a site they are welcome to access.
Why Are We Crawling The Web?
SiteSell is gathering a statistical picture of the topics covered
on the Web as a whole. Each Web page visited is categorized under
the topics it covers, which lets our customers know what
percentage of Web pages are about any particular topic.
The actual content of each Web page is deleted from all SiteSell systems
once the page has been spidered, categorized and scored, usually within
48 hours of SBIder's visit.
Sysadmins/robots.txt
The SiteSell "SBIder" is based on the open source Nutch project. If
you have concerns or questions about the spider software itself, please
contact: nutch-agent@lucene.apache.org.
Our software obeys the robots.txt exclusion standard, described at
http://www.robotstxt.org/wc/exclusion.html#robotstxt. To ban the
SiteSell spider from your entire website, place the following
in your robots.txt file:
User-agent: SBIder
Disallow: /
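If you want to confirm that these two lines do what you expect before publishing them, you can run them through Python's standard-library urllib.robotparser, which implements the robots.txt exclusion standard. This is a minimal sketch for testing your own file, not how SBIder itself is implemented; the example.com URLs and the second agent name, OtherBot, are hypothetical placeholders.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules shown above: ban SBIder from the whole site.
ROBOTS_TXT = """\
User-agent: SBIder
Disallow: /
"""


def is_allowed(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no HTTP fetch
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # SBIder is refused everywhere; an unnamed robot (hypothetical "OtherBot")
    # is unaffected because no record in this file applies to it.
    for agent in ("SBIder", "OtherBot"):
        print(agent, is_allowed(agent, "http://example.com/articles/page.html"))
```

Because the record names only SBIder, other robots remain free to crawl unless you add further User-agent records for them.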
To ban the SiteSell spider from only a portion of your website,
replace the "/" in the Disallow line with the path of that portion of
the website; the spider will then skip every URL whose path begins with it.
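As an illustration of a partial ban, the sketch below excludes a hypothetical /private/ directory and again checks the result with Python's urllib.robotparser. The directory name and the example.com URLs are placeholders; substitute the portion of your own site you want to keep SBIder out of.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps SBIder out of /private/ only.
ROBOTS_TXT = """\
User-agent: SBIder
Disallow: /private/
"""


def is_allowed(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Pages outside /private/ stay crawlable; pages inside it are skipped.
    for url in ("http://example.com/index.html",
                "http://example.com/private/notes.html"):
        print(url, is_allowed("SBIder", url))
```

Disallow values are matched as path prefixes, so "Disallow: /private" (without the trailing slash) would also block a URL such as /private-notes.html, while "Disallow: /private/" blocks only what is inside that directory.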
For information on how to ban all Nutch-based spiders (including those
run by companies other than SiteSell), please see:
http://lucene.apache.org/nutch/bot.html
Webmasters/Robots META
If you do not have permission to edit the /robots.txt file on your
server, you can still tell robots not to index your pages or follow your
links. The standard mechanism for this is the robots META tag, as
described at http://www.robotstxt.org/wc/meta-user.html.
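The robots META tag goes in the head of each page; for example, <meta name="robots" content="noindex, nofollow"> asks robots neither to index the page nor to follow its links. The sketch below, which uses only Python's standard html.parser, shows one way a robot might read that tag. It illustrates the convention and is not SBIder's or Nutch's actual code; the helper names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Set


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag and attribute names.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        # CONTENT is a comma-separated list such as "NOINDEX, NOFOLLOW".
        for token in attributes.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html: str) -> Set[str]:
    """Return the lower-cased robots META directives found in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


def may_index(html: str) -> bool:
    """True unless the page asks robots not to index it."""
    return "noindex" not in robots_directives(html)


def may_follow_links(html: str) -> bool:
    """True unless the page asks robots not to follow its links."""
    return "nofollow" not in robots_directives(html)


if __name__ == "__main__":
    page = ('<html><head>'
            '<meta name="robots" content="noindex, nofollow">'
            '</head><body><a href="/next.html">next</a></body></html>')
    print(sorted(robots_directives(page)))
    print(may_index(page), may_follow_links(page))
```

Matching is case-insensitive, so NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW" is read the same way as its lower-case equivalent.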
Contact Us
If SBIder is causing problems for your site, or you have further questions
about what we do with the information we collect, please contact us:
http://support.sitesell.com/contact-support.html