If you're reading this, chances are that, while looking through your server logs, you've seen the SiteSell robot, "SBIder," visiting your site. Our software obeys robots.txt files and robot META tags in HTML. These are the standard mechanisms webmasters use to tell Web robots which portions of a site they are welcome to access.
Why Are We Crawling the Web?
SiteSell is gathering a statistical representation of the topics presented on the Web as a whole. Each Web page visited is categorized under the topics that it represents, so that our customers can see what percentage of Web pages are about any particular topic.
The actual content of all Web pages is removed from all SiteSell systems after being spidered, categorized and scored, usually within 48 hours of being visited by SBIder.
The SiteSell "SBIder" is based upon the open source Nutch project. If you have a concern about, or an interest in, the spider software itself, please contact: firstname.lastname@example.org.
Our software obeys the robots.txt exclusion standard, described at http://www.robotstxt.org/wc/exclusion.html#robotstxt. To ban the SiteSell spider from visiting all of your website, place the following in your robots.txt file:
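A minimal sketch of such rules, assuming the robot matches the user-agent token "SBIder" (taken from the robot's name and not confirmed here; check the user-agent string in your server logs). The rules are embedded in a short Python check built on the standard-library urllib.robotparser module, so you can confirm what they block before deploying them. The helper name is_allowed and the example.com URLs are illustrative only.

```python
from urllib.robotparser import RobotFileParser

# Assumption: the crawler identifies itself with the token "SBIder".
# These two rule lines are what would go into /robots.txt.
ROBOTS_TXT = """\
User-agent: SBIder
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The named robot is excluded from the whole site.
    print(is_allowed(ROBOTS_TXT, "SBIder", "http://www.example.com/index.html"))    # False
    # Robots that are not named in the file are unaffected.
    print(is_allowed(ROBOTS_TXT, "OtherBot", "http://www.example.com/index.html"))  # True
```

Only the two lines inside ROBOTS_TXT belong in the robots.txt file itself; the rest is a local test harness.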
To ban the SiteSell spider from visiting a portion of your website, adjust the URL in the Disallow line to that specific portion of the website.
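As an illustration of a partial ban, the sketch below assumes the same "SBIder" user-agent token (an assumption, as above) and a hypothetical /private/ directory. It uses Python's standard-library urllib.robotparser to show which URLs the rule would and would not cover.

```python
from urllib.robotparser import RobotFileParser

# Assumptions: the user-agent token "SBIder" and the /private/ directory
# are placeholders; substitute the portion of your own site.
ROBOTS_TXT = """\
User-agent: SBIder
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Anything under the disallowed prefix is off limits.
    print(is_allowed(ROBOTS_TXT, "SBIder", "http://www.example.com/private/report.html"))  # False
    # The rest of the site remains open to the robot.
    print(is_allowed(ROBOTS_TXT, "SBIder", "http://www.example.com/public/index.html"))    # True
```

Because a Disallow value is matched as a path prefix, the trailing slash matters: "/private/" covers only that directory, while "/private" would also match a page such as /private-notes.html.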
For information on how to ban all Nutch-based spiders (those run by companies other than SiteSell), please see:
If you do not have permission to edit the /robots.txt file on your server, you can still tell robots not to index your pages or follow your links. The standard mechanism for this is the robots META tag.
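For illustration, a robots META tag that asks robots neither to index a page nor to follow its links is placed in the page's head section, as in the sample page below. The Python sketch around it is only a rough approximation of how a well-behaved robot might read the tag; it uses the standard-library html.parser module, and the names RobotsMetaParser and robots_directives are hypothetical, not part of SBIder or Nutch.

```python
from html.parser import HTMLParser

# Sample page: the META tag in the head carries the robot directives.
PAGE = """\
<html>
<head>
<title>Example page</title>
<meta name="robots" content="noindex, nofollow">
</head>
<body><p>Hello</p></body>
</html>
"""


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for directive in attr_map.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html: str) -> set:
    """Return the set of robots directives declared in html."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    print(sorted(robots_directives(PAGE)))  # ['nofollow', 'noindex']
```

Using "noindex" alone keeps the page out of indexes while still letting robots follow its links; "nofollow" alone does the reverse.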
If your site has problems or you have further questions about what we do with the information collected, please contact us: