|
The HelpSpy Spider (user-agent name HelpSpy) is currently crawling the Internet, indexing sites
that will be listed in the new HelpSpy directory.
The Spider has been written to place as small a load on external sites
as possible. Site owners may choose not to have the Spider crawl their
sites by creating a robots.txt file in the site root directory.
Details of how to create robots.txt files that stop web directories
from crawling sites can be found here.
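As a rough illustration of how such a file behaves, the sketch below checks a
sample robots.txt with Python's standard urllib.robotparser module. It assumes
that the Spider matches the user-agent token HelpSpy in robots.txt and follows
the usual exclusion rules; the file content, the example.com URL, and the
OtherBot name are hypothetical and used only for demonstration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: blocks only the HelpSpy user agent
# from the whole site and leaves every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: HelpSpy
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Example URL is an assumption, not a real HelpSpy target.
helpspy_allowed = is_allowed(ROBOTS_TXT, "HelpSpy", "http://www.example.com/index.html")
other_allowed = is_allowed(ROBOTS_TXT, "OtherBot", "http://www.example.com/index.html")
```

Placing the two ROBOTS_TXT lines in a file named robots.txt in the site root
should be enough to exclude the Spider, provided it honours the token as assumed.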
Allowing the HelpSpy Spider to crawl your site (by not restricting access
with a robots.txt file) will ease the task of keeping
entries within the directory current, and will allow bad links to be removed
automatically.
To reduce unnecessary requests, the HelpSpy Spider may cache whether a
robots.txt file exists for up to an hour.
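The caching behaviour described above could look something like the following
sketch. This is not the Spider's actual code: the class name, the per-host
keying, and the injected check_exists and clock callables are all assumptions
made for illustration. Only the one-hour limit comes from the text.

```python
import time
from typing import Callable, Dict, Tuple

CACHE_TTL_SECONDS = 3600  # "up to an hour", as stated in the text


class RobotsExistenceCache:
    """Remembers, per host, whether a robots.txt file was found (hypothetical)."""

    def __init__(
        self,
        check_exists: Callable[[str], bool],
        clock: Callable[[], float] = time.monotonic,
        ttl: float = CACHE_TTL_SECONDS,
    ) -> None:
        self._check_exists = check_exists  # performs the real request
        self._clock = clock                # injectable so expiry can be tested
        self._ttl = ttl
        self._entries: Dict[str, Tuple[float, bool]] = {}

    def has_robots(self, host: str) -> bool:
        now = self._clock()
        cached = self._entries.get(host)
        # Reuse the stored answer while it is younger than the TTL.
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        # Otherwise ask the site again and record when we did so.
        exists = self._check_exists(host)
        self._entries[host] = (now, exists)
        return exists
```

One practical consequence, if the cache works roughly like this, is that a
newly added or removed robots.txt file may take up to an hour to be noticed.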
It is not currently possible to add URLs to the list of crawled sites;
we expect to add this feature soon.
|