In some cases, the default settings in Netpeak Spider are not suitable for crawling your website. This usually happens when crawling:
- Slow websites – with low request processing speed.
- Websites with additional protection from crawling – for example, a restriction on the number of simultaneous requests to the server.
To reduce the load on a website, you can use the following settings:
- On the ‘General’ tab:
- Reduce the number of crawling threads to the minimum (for instance, one or two threads). By default, Netpeak Spider uses 10 threads, which provides fairly fast crawling for most websites. If this doesn’t help, try setting one thread with a 1,500–3,000 ms delay between requests to minimize the crawling speed.
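The effect of the single-thread-plus-delay setting can be illustrated with a minimal Python sketch (the `fetch` stub and URL list are placeholders, not Netpeak Spider’s internals): requests go out strictly one at a time, separated by a fixed pause, so the server never sees more than one request per delay interval.

```python
import time

DELAY_MS = 200  # scaled down for the demo; the article suggests 1500-3000 ms

def fetch(url):
    # Placeholder for a real HTTP request.
    return f"fetched {url}"

urls = ["https://example.com/", "https://example.com/a", "https://example.com/b"]

start = time.monotonic()
results = []
for url in urls:
    results.append(fetch(url))   # one request at a time, i.e. a single thread
    time.sleep(DELAY_MS / 1000)  # fixed pause before the next request
elapsed = time.monotonic() - start  # at least len(urls) * DELAY_MS in total
```

With a real 1,500–3,000 ms delay, total crawl time grows linearly with the number of URLs, which is exactly the trade-off this setting makes: slower crawling in exchange for a lighter load on the server.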
- On the ‘Restrictions’ tab, you can set:
- Max number of crawled URLs;
- Max depth of crawling (the distance from the initial URL to a target URL, measured by clicks);
- Max URL depth (the number of segments in a URL of an analyzed page).
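These three restrictions can be pictured with a small sketch (the link graph, `crawl`, and `url_depth` are hypothetical illustrations, not part of Netpeak Spider): a breadth-first crawl that stops after a maximum number of URLs or a maximum click depth, plus a helper that counts URL path segments. Note the difference between the two depth notions: crawling depth is measured in clicks from the initial URL, while URL depth is measured in path segments of a single URL.

```python
from collections import deque
from urllib.parse import urlparse

# A toy in-memory link graph standing in for a real site (hypothetical pages).
LINKS = {
    "/": ["/a", "/b"],
    "/a": ["/a/x"],
    "/b": ["/b/y", "/b/z"],
    "/a/x": [], "/b/y": [], "/b/z": [],
}

def crawl(start, max_urls=100, max_depth=3):
    # Breadth-first crawl limited by a max number of URLs and a max click depth.
    seen = {start}
    queue = deque([(start, 0)])  # (url, clicks away from the initial URL)
    order = []
    while queue and len(order) < max_urls:  # 'Max number of crawled URLs'
        url, depth = queue.popleft()
        order.append(url)
        if depth >= max_depth:              # 'Max depth of crawling'
            continue
        for nxt in LINKS.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return order

def url_depth(url):
    # 'Max URL depth': the number of non-empty path segments in the URL.
    return len([seg for seg in urlparse(url).path.split("/") if seg])
```

For example, `crawl("/", max_urls=3)` visits only the first three pages found, `crawl("/", max_depth=1)` stops one click away from the initial URL, and `url_depth("https://example.com/blog/2023/post")` counts three segments.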
- On the ‘Rules’ tab:
- Restrict the crawling area by excluding specific folders or entire sections of the site.
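As a rough sketch of such an exclusion rule (the folder names and `should_crawl` helper are hypothetical, not Netpeak Spider’s API), a crawler can simply skip any URL whose path starts with an excluded prefix:

```python
from urllib.parse import urlparse

EXCLUDED_PREFIXES = ("/admin/", "/search/")  # hypothetical folders to exclude

def should_crawl(url):
    # Skip URLs whose path falls inside an excluded folder.
    path = urlparse(url).path
    return not any(path.startswith(prefix) for prefix in EXCLUDED_PREFIXES)
```

Excluding crawler traps such as faceted search or calendar pages this way reduces both the total number of requests and the load they place on the server.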