Netpeak Spider allows you to set crawling limits and adjust issue restrictions on the ‘Restrictions’ tab of its settings.
1. Crawling Restrictions
This section contains the following settings:
- Max number of crawled URLs.
Some websites are so large that there is no need to wait for the crawl to finish: the main types of issues become apparent early on. By limiting the max number of crawled URLs, you can set a number of pages sufficient to analyze the website.
- Max crawling depth.
Crawling depth is the number of clicks from the initial URL to the analyzed one. This setting lets you control how deep the program will crawl the website.
- Max URL depth.
URL depth is the number of segments in the URL of the analyzed page. This setting lets you control how deep the program will crawl the website’s folders. Note that URL depth is a static parameter: it does not depend on the initial URL.
- Max number of redirects → this value impacts several parameters:
- How many redirects will be considered in a chain (the program will follow them).
- The number of redirects after which the corresponding issue is reported in the sidebar.
For instance, if the value is 5, Netpeak Spider will follow only 4 redirects, and the 5th one will be reported as an issue.
Take into consideration that a value of ‘0’ means no restriction is applied.
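The two depth parameters and the redirect limit can be sketched in code. This is an illustrative model only, not Netpeak Spider’s internals: `url_depth` counts path segments (the static URL depth described above), and `follow_redirects` follows a chain until the limit is hit, matching the example where a value of 5 means 4 redirects are followed and the 5th is flagged. The `fetch` callable is a hypothetical stand-in that returns the redirect target or `None`.

```python
from urllib.parse import urlparse

def url_depth(url: str) -> int:
    """URL depth: number of path segments; does not depend on the initial URL."""
    path = urlparse(url).path
    return len([seg for seg in path.split("/") if seg])

def follow_redirects(start_url: str, fetch, max_redirects: int = 5):
    """Follow a redirect chain, flagging an issue when the limit is reached.

    `fetch` is a stand-in callable: it returns the redirect target URL,
    or None if the current URL does not redirect. A limit of 0 means
    no restriction, mirroring the setting described above.
    """
    url, hops = start_url, 0
    while (target := fetch(url)) is not None:
        hops += 1
        if max_redirects and hops >= max_redirects:
            return url, True   # chain too long: report as an issue
        url = target           # follow this redirect
    return url, False

print(url_depth("https://example.com/blog/2021/post/"))  # → 3
```

With the default of 5, a chain of 6 redirects stops after the 4th hop and is flagged, which is the behavior the example above describes.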
2. Issue Restrictions
Exceeding the restrictions set in the ‘Issue Restrictions’ section generates the corresponding issue reports in the program. Issues are detected automatically after crawling finishes or is stopped, provided the corresponding parameters are enabled. Detection can also be started from the ‘Analysis’ tab.
The following issue restrictions can be set:
- Title length → the number of characters in the <title> tag, from 10 to 70. If the title contains fewer characters than the minimum, the program adds the page to the ‘Short Title’ report; if it contains more than the maximum, the page is added to the ‘Max Title Length’ report.
- Description length → the number of characters in the <meta name="description" /> tag, from 50 to 320. The ‘Short Description’ and ‘Max Description Length’ reports are generated similarly to the corresponding reports for the Title tag.
- Content size → the number of characters in the content of the <body> tag, from 500 to 50,000. If the content size is less than the minimum number of characters, the program adds the page to the ‘Min Content Size’ report; if it is more than the maximum, the page is added to the ‘Max Content Size’ report.
- Max H1 length → the number of characters in the <h1> header should not exceed 65; otherwise, the ‘Max H1 Length’ report will be generated.
- Max HTML size → pages containing more characters than the set value will be included in the ‘Max HTML Size’ report.
- Max URL length → the URL of a page should not exceed 2,000 characters by default; otherwise, such pages will be added to the ‘Max URL Length’ report.
- Max response time → server response time should not exceed 500 ms by default; otherwise, such pages will be added to the ‘Long Server Response Time’ report.
- Max internal links → a page should contain fewer than 100 internal links; otherwise, it will be included in the ‘Max Internal Links’ report.
- Max external links → a page should contain fewer than 10 external links; otherwise, it will be included in the ‘Max External Links’ report.
- Max image size → the size of an image on a page should not exceed 100 kB; otherwise, the ‘Max Image Size’ report will be generated.
- Min text/HTML ratio → the ratio of plain text (the ‘Content Size’ parameter) to all content on a page (the ‘HTML Size’ parameter) should exceed 10%; otherwise, the ‘Min text/HTML ratio’ report will be generated.
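The threshold checks above all follow the same min/max pattern. As a minimal sketch (illustrative only, using the default values listed in this section; the function names are not part of Netpeak Spider), here is how two of them could be expressed:

```python
def title_issues(title: str, min_len: int = 10, max_len: int = 70) -> list[str]:
    """Check the Title length restriction (defaults: 10 to 70 characters)."""
    issues = []
    if len(title) < min_len:
        issues.append("Short Title")       # below the minimum length
    elif len(title) > max_len:
        issues.append("Max Title Length")  # above the maximum length
    return issues

def text_html_ratio_issue(content_size: int, html_size: int,
                          min_ratio: float = 0.10) -> bool:
    """True when the text-to-HTML ratio falls below the minimum (default 10%)."""
    return html_size > 0 and content_size / html_size < min_ratio

print(title_issues("Home"))                # → ['Short Title']
print(text_html_ratio_issue(800, 50_000))  # → True (800/50,000 = 1.6% < 10%)
```

Raising a threshold (for example, `max_len`) simply moves the boundary at which a page lands in the corresponding report, which is how changing any of these restrictions behaves.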
All default issue restrictions in Netpeak Spider follow Google’s recommendations, but you can change them to fit your needs. For example, increasing the max image size changes the threshold at which the corresponding issue is detected, so it will be reported differently once crawling is restarted. The same applies to the other issue restrictions.