How can I limit crawling by folder, depth, or part of a URL in Netpeak Spider?


If you need to limit the website crawl (e.g. exclude subdomains or crawl only one folder), you can use the following settings (a short illustrative sketch follows the list):

  • Checkboxes on the ‘General’ tab:
    • Crawl only in directory – crawls only the specified website directory without leaving it.
    • Crawl all subdomains – turn this option off to treat pages on subdomains outside the host specified in ‘Initial URL’ as external.

  • On the ‘Restrictions’ tab you can set:
    • Max number of crawled URLs – limits the number of pages the program crawls.
    • Max crawling depth – determines how deep the program crawls a website, measured as the number of clicks from the initial URL to the crawled one.
    • Max URL depth – determines how deep the program crawls into the directories of a website, measured as the number of segments in a URL.
    • Max number of redirects – this value affects two things:
      • the number of redirects the program follows to reach the target URL;
      • the number of redirects that triggers the corresponding issue in the sidebar.
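
In practice, these scope limits come down to simple URL checks. Below is a minimal, hypothetical Python sketch of how such checks might work (this is not Netpeak Spider's actual code; the initial URL and all function names are assumptions for illustration):

    from urllib.parse import urlparse

    # Hypothetical sketch only; INITIAL_URL and all names are assumptions.
    INITIAL_URL = "https://example.com/blog/"

    def in_directory(url: str) -> bool:
        # 'Crawl only in directory': keep only URLs inside the
        # directory of the initial URL.
        return url.startswith(INITIAL_URL)

    def is_internal(url: str, crawl_subdomains: bool) -> bool:
        # 'Crawl all subdomains': when off, only the exact host of the
        # initial URL counts as internal; blog.example.com is external.
        host = urlparse(url).hostname or ""
        initial_host = urlparse(INITIAL_URL).hostname or ""
        if crawl_subdomains:
            return host == initial_host or host.endswith("." + initial_host)
        return host == initial_host

    def url_depth(url: str) -> int:
        # 'Max URL depth': the number of segments in the URL path,
        # e.g. /blog/2023/post/ has a depth of 3. ('Max crawling depth'
        # is different: it counts clicks from the initial URL and is
        # tracked during the crawl, so it cannot be derived from the
        # URL alone.)
        return len([seg for seg in urlparse(url).path.split("/") if seg])

    def within_redirect_limit(redirect_chain: list[str], max_redirects: int) -> bool:
        # 'Max number of redirects': skip a page whose redirect chain
        # exceeds the limit.
        return len(redirect_chain) <= max_redirects

    print(in_directory("https://example.com/blog/2023/post/"))  # True
    print(in_directory("https://example.com/shop/"))            # False
    print(is_internal("https://blog.example.com/", False))      # False
    print(url_depth("https://example.com/blog/2023/post/"))     # 3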

You can save these settings as a template to reuse them in future crawls.

If you still have questions after reading this article, please contact our Customer Support.
