
General Settings in Netpeak Spider

  1. Language.
  2. Crawling speed.
  3. Basic crawling settings.

On the ‘General’ settings tab, you can change the interface language, crawling speed, and basic crawling settings.

General Settings

1. Language

You can choose either English or Russian as the interface language in Netpeak Spider. Click the button showing the current language and select the desired option from the dropdown list.

Please note that the program must be restarted for the new setting to take full effect.

2. Crawling speed

2.1. Number of threads

Each thread creates a separate connection to the website, so be careful: load-sensitive websites may struggle to serve pages. You can adjust the number of threads during crawling to find the optimal value for the analyzed website. By default, the number of threads is 10.
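As a rough illustration of this idea (not Netpeak Spider's actual implementation), a thread pool sketch in Python shows how the thread count caps the number of simultaneous connections; `fetch` here is a placeholder for an HTTP request function:

```python
from concurrent.futures import ThreadPoolExecutor

def crawl_concurrent(urls, fetch, threads=10):
    # Each worker thread opens its own connection to the site, so
    # `threads` bounds how many requests run at the same time.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map preserves input order, so results line up with urls.
        return dict(zip(urls, pool.map(fetch, urls)))
```

Lowering `threads` reduces the load on the target server at the cost of a slower crawl.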


2.2. Delay between requests

This is the amount of time between consecutive requests to the web server. For load-sensitive or protected websites, it is recommended to set this parameter to avoid overloading the server or triggering its anti-bot protection.

The delay is applied separately to each thread, which is why, to imitate user behavior, it is recommended to use one thread with a 1,500–3,000 ms delay between requests.
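The single-thread, delayed crawl described above can be sketched in Python (an illustration of the idea, not the program's code; `fetch` is a placeholder request function):

```python
import time

def crawl_single_thread(urls, fetch, delay_ms=1500):
    # Fetch URLs one at a time, pausing between requests to mimic
    # a single user browsing the site.
    results = {}
    for i, url in enumerate(urls):
        results[url] = fetch(url)
        if i < len(urls) - 1:        # no need to wait after the last URL
            time.sleep(delay_ms / 1000)
    return results
```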


2.3. Response timeout

This is the maximum time, in milliseconds, the crawler waits for a server response before marking a page as broken with the ‘Timeout’ response code and moving on to the next URL. This setting also affects detection of ‘Connection Error’.

  • Minimum possible value – 50 ms.
  • Maximum possible value – 90 000 ms.
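To illustrate how a crawler might apply such a timeout and classify failures, here is a sketch using Python's standard library (the labels mirror the ‘Timeout’ and ‘Connection Error’ statuses mentioned above; the function name is hypothetical, and this is not Netpeak Spider's code):

```python
import socket
import urllib.error
import urllib.request

def fetch_status(url, timeout_ms=30000):
    # Wait at most timeout_ms for a response, then classify the outcome.
    try:
        with urllib.request.urlopen(url, timeout=timeout_ms / 1000) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code                 # 4xx/5xx responses still carry a status code
    except socket.timeout:
        return "Timeout"              # read timed out mid-response
    except urllib.error.URLError as e:
        if isinstance(e.reason, socket.timeout):
            return "Timeout"          # connection attempt timed out
        return "Connection Error"     # refused, DNS failure, etc.
```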


2.4. JavaScript rendering

Tick to enable JavaScript rendering. This is useful when part of the content is generated with JavaScript or the whole site is built on JS frameworks.

JavaScript rendering in Netpeak Spider is implemented via a built-in Chromium web browser. JS is executed only on compliant HTML pages (those returning a 200 OK status code); analytics scripts are blocked, and images and iframes are not loaded.

‘AJAX timeout’ is the main setting here: it defines how long JavaScript is allowed to execute after the page and all resource files (JS/CSS) have loaded. Note that the higher the AJAX timeout, the longer the crawl takes. In most cases, the default of 2 seconds is enough for JavaScript execution. However, if the crawled site issues AJAX requests that need more time, you can set a custom value. We do not recommend setting the value too low, as the code may not have enough time to fully execute.


3. Basic crawling settings

3.1. Crawl only in directory

The program will crawl the site inside a particular category without leaving it.

Please take into account that Netpeak Spider relies on the path segments in a page's URL, so the website must have an appropriate URL structure for this mode to work. For example, when crawling inside the category example.com/category-1, product pages such as example.com/category-1/product will be included in reports, but pages like example.com/product will not, because their URLs start from a different path segment, even if the crawled category links to them.
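The segment-based check described above can be sketched in Python (an illustration of the idea, not Netpeak Spider's code; `in_directory` is a hypothetical helper):

```python
from urllib.parse import urlparse

def in_directory(url, directory_url):
    # True if `url` lives inside the directory of `directory_url`,
    # judged by comparing URL path segments.
    base, target = urlparse(directory_url), urlparse(url)
    if target.netloc != base.netloc:
        return False
    base_segments = [s for s in base.path.split("/") if s]
    target_segments = [s for s in target.path.split("/") if s]
    return target_segments[:len(base_segments)] == base_segments
```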


3.2. Crawl all subdomains

If this option is checked, subdomains are treated as part of the analyzed website, and links to them are considered internal. Otherwise, results from subdomains are not treated as part of the crawled website, and links to them are considered external.
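A minimal sketch of how such internal/external classification might work (illustrative only; the function name and logic are assumptions, not the program's implementation):

```python
from urllib.parse import urlparse

def is_internal(link, site, crawl_subdomains=False):
    # A link is internal if its host matches the site's host, or, when
    # subdomain crawling is on, if its host is a subdomain of the site.
    site_host = urlparse(site).netloc.lower()
    link_host = urlparse(link).netloc.lower()
    if link_host == site_host:
        return True
    if crawl_subdomains:
        return link_host.endswith("." + site_host)
    return False
```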


3.3. Crawl external links

Choose this option to add all external links to the main table. Note that external links are checked against the same parameters and issues as internal ones, so the ‘Issues’ panel shows the total number of issues for both. However, you can build a report for external links only by using the segmentation feature.


3.4. Check JavaScript, CSS, and PDF

The program gathers information (response code, size, etc.) about JavaScript, CSS, and PDF files found on the website. Note that Netpeak Spider doesn't analyze their content.


3.5. Check images

We recommend enabling this option because:

  • It allows the program to collect common SEO parameters for images.
  • It affects detection of the ‘Broken images’ and ‘Max image size’ issues.


3.6. Check other MIME types

This setting enables collecting information about documents, video and audio files, etc. As with the file types above, Netpeak Spider doesn't scan their content but collects their common SEO parameters.

You can use the built-in templates for a specific crawling method: from the default template, suitable for most standard SEO tasks, to a method that crawls websites in a way similar to search engine robots.


