On the ‘General’ settings tab, you can change the interface language, crawling speed, and basic crawling settings.
1. Interface language
Netpeak Spider supports English and Russian interface languages. Click the button with the name of the current language and choose the option you need from the dropdown list.
Please note that the program must be restarted for the change to take full effect.
2. Crawling speed
2.1. Number of threads
Each thread creates a separate connection to the website, so be careful: websites that are sensitive to load may have trouble serving pages. You can adjust the number of threads during crawling to find the optimal value for the analyzed website. The default number of threads is 10.
2.2. Delay between requests
This is the amount of time between requests to a web server. For protected websites and websites that are sensitive to high load, it is recommended to set this parameter to avoid overloading the server or to bypass website protection.
The delay is applied to each thread separately, so to imitate user behavior it is recommended to use one thread and a delay of 1500-3000 ms between requests.
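Because the delay applies per thread, the overall load on the server grows with the number of threads. The sketch below is a rough estimate only, not Netpeak Spider's internal logic: the function name and the optional `avg_response_ms` parameter are assumptions made for illustration.

```python
def approx_requests_per_second(threads: int, delay_ms: int,
                               avg_response_ms: int = 0) -> float:
    """Estimate the request rate when every thread pauses delay_ms between requests.

    avg_response_ms is a hypothetical average server response time; with the
    default of 0 the result is an upper bound on the rate.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    per_request_ms = delay_ms + avg_response_ms
    if per_request_ms <= 0:
        raise ValueError("delay_ms + avg_response_ms must be positive")
    # Each thread completes one request every per_request_ms milliseconds.
    return threads * 1000.0 / per_request_ms


# One thread with a 2000 ms delay sends at most one request every two seconds,
# while 10 threads with the same delay send up to ten times as many.
single = approx_requests_per_second(threads=1, delay_ms=2000)
default_threads = approx_requests_per_second(threads=10, delay_ms=2000)
```

With these assumptions, keeping the same delay but raising the thread count multiplies the load, which is why a single thread is suggested for imitating user behavior.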
2.3. Response timeout
This is the maximum time, in milliseconds, that the crawler waits for a server response before it marks the page as broken with the ‘Timeout’ response code and moves on to the next URL. This setting also affects detection of ‘Connection Error’.
- Minimum possible value – 50 ms.
- Maximum possible value – 90 000 ms.
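The limits above can be expressed as a simple range check. This is a minimal sketch, assuming that a value outside the range is rejected and that a page counts as timed out once the wait exceeds the configured value; the function names are hypothetical and do not come from Netpeak Spider.

```python
MIN_TIMEOUT_MS = 50       # minimum value stated in the settings
MAX_TIMEOUT_MS = 90_000   # maximum value stated in the settings


def validate_timeout_ms(timeout_ms: int) -> int:
    """Return timeout_ms unchanged if it lies within the allowed range."""
    if not MIN_TIMEOUT_MS <= timeout_ms <= MAX_TIMEOUT_MS:
        raise ValueError(
            f"timeout must be between {MIN_TIMEOUT_MS} and {MAX_TIMEOUT_MS} ms"
        )
    return timeout_ms


def is_timeout(elapsed_ms: float, timeout_ms: int) -> bool:
    """True if the wait for a response exceeded the configured timeout."""
    return elapsed_ms > validate_timeout_ms(timeout_ms)
```

For example, with a 30 000 ms timeout, a server that has not answered after 31 000 ms would be reported as ‘Timeout’ under these assumptions.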
3. Basic crawling settings
3.1. Crawl only in directory
The program crawls the site only inside a particular category and does not leave it.
Please note that Netpeak Spider relies on the URL segment of a page, so the website must have an appropriate URL structure for this mode to work. For example, when crawling a product category at example.com/category-1, products at example.com/category-1/product will be included in reports, but product pages at example.com/product will not, because their URLs start with a different segment, even if the crawled category links to these pages.
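The directory rule can be illustrated with a path-prefix check. This is a sketch under stated assumptions, not the program's actual matching logic: the function name is hypothetical, URLs are given with a scheme so they parse correctly, and a sibling such as /category-10 is assumed to fall outside /category-1.

```python
from urllib.parse import urlsplit


def in_directory(start_url: str, candidate_url: str) -> bool:
    """Check whether candidate_url lies inside the directory of start_url."""
    start = urlsplit(start_url)
    candidate = urlsplit(candidate_url)
    # Pages on another host can never be inside the crawled directory.
    if (start.hostname or "").lower() != (candidate.hostname or "").lower():
        return False
    base = start.path.rstrip("/")
    path = candidate.path.rstrip("/")
    # Match the directory itself or anything below it, segment by segment.
    return path == base or path.startswith(base + "/")


start = "https://example.com/category-1"
inside = in_directory(start, "https://example.com/category-1/product")   # True
outside = in_directory(start, "https://example.com/product")             # False
```

Comparing whole segments rather than raw string prefixes keeps similarly named sections apart.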
3.2. Crawl all subdomains
If this option is checked, subdomains are treated as part of the analyzed website and links to them are considered internal. Otherwise, results received from subdomains are not considered part of the crawled website and links to them are considered external.
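The effect of this option on link classification can be sketched as follows. The function and parameter names are hypothetical, and the host comparison is a simplification that does not reflect how Netpeak Spider determines the root domain.

```python
from urllib.parse import urlsplit


def is_internal(link_url: str, site_host: str, crawl_subdomains: bool) -> bool:
    """Classify a link as internal (True) or external (False)."""
    host = (urlsplit(link_url).hostname or "").lower()
    site_host = site_host.lower()
    if host == site_host:
        return True
    # A subdomain ends with ".<site_host>"; it is internal only when
    # the "Crawl all subdomains" option is enabled.
    return crawl_subdomains and host.endswith("." + site_host)


blog_link = "https://blog.example.com/post"
with_option = is_internal(blog_link, "example.com", crawl_subdomains=True)
without_option = is_internal(blog_link, "example.com", crawl_subdomains=False)
```

Here the same link to blog.example.com is internal with the option enabled and external without it.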
3.3. Crawl external links
Enable this option to add all external links to the main table. Note that the same parameters and issues are checked for external links as for internal ones, so the ‘Issues’ panel will show the total number of issues for both internal and external links. However, you can create a report for external links only by using the segmentation feature.
3.5. Check images
We recommend enabling this configuration because:
- It allows the program to collect common SEO parameters for images.
- It affects detection of the ‘Broken images’ and ‘Max image size’ issues.
3.6. Check other MIME types
This setting enables collecting information about documents, video and audio files, etc. As with images, Netpeak Spider doesn’t scan their content but collects their common SEO parameters.
You can use built-in templates for specific crawling methods: from the default template, suitable for most standard SEO tasks, to a template that crawls websites in a way similar to search engine robots.
4. Multi-domain crawling
Tick to enable the feature that allows crawling multiple domains simultaneously.
The program starts crawling each domain from the URLs with a click depth of 0. To set them, add the list of required URLs to the main table.
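When preparing the URL list for multi-domain crawling, it can help to see which domains the list covers. The sketch below groups starting URLs by host; it is an illustration only, the function name is hypothetical, and it says nothing about how Netpeak Spider schedules the domains internally.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_seeds_by_host(urls: list[str]) -> dict[str, list[str]]:
    """Group starting URLs (click depth 0) by their host name."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        host = (urlsplit(url).hostname or "").lower()
        if host:  # skip entries that have no parsable host
            groups[host].append(url)
    return dict(groups)


seeds = [
    "https://example.com/",
    "https://example.org/catalog",
    "https://example.com/blog",
]
by_host = group_seeds_by_host(seeds)
```

Each key of `by_host` corresponds to one domain that would be crawled starting from its listed URLs.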
5. Data backup
Tick to let the program back up the collected data automatically. This is useful when there is a risk of a sudden computer shutdown and data loss.
The data will be saved at the intervals you specify, as well as when you stop (or pause) crawling and when it is complete. Please note: the shorter the interval, the more often a copy is made and the longer the analysis takes.
If the program closes unexpectedly, the next time you run Netpeak Spider it will open a temporary project that was saved during the last backup. To save the temporary project, go to ‘Project’ → ‘Save’ in the menu and specify the path where the file will be stored.