
How can I limit crawling by folder, depth, or part of a URL in Netpeak Spider?

If you need to restrict website crawling (e.g. exclude subdomains or crawl only one folder), use the following settings:

  • Checkboxes on the ‘General’ tab:
    • Crawl only in directory – restricts the crawl to the directory of the initial URL, so the program never leaves it.
    • Crawl all subdomains – when enabled, pages on the website's subdomains are crawled as internal. Turn it off to treat pages on subdomains other than the host specified in ‘Initial URL’ as external.

  • On the ‘Restrictions’ tab you can set:
    • Max number of crawled URLs – limits the total number of pages the program crawls.
    • Max crawling depth – sets how deep the program crawls a website, measured by the number of clicks from the initial URL to the crawled page.
    • Max URL depth – sets how deep the program crawls into a website's directories, measured by the number of segments in a URL.
    • Max number of redirects – this value affects two things:
      • The number of redirects the program follows to reach the target URL.
      • The redirect count at which the corresponding issue is reported in the sidebar.
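To make these restrictions more concrete, here is a minimal Python sketch of how they could interact during a breadth-first crawl. It is only an illustration, not Netpeak Spider's implementation. The link graph, the redirect map, the helper names (`url_depth`, `is_internal`, `in_directory`, `crawl`, `follow_redirects`) and the counting rules are all assumptions. In particular, it assumes that URL depth is the number of non-empty path segments, that subdomains are detected by stripping a leading `www.`, and that a redirect chain longer than the limit is flagged as an issue.

```python
from collections import deque
from typing import Optional
from urllib.parse import urlsplit


def url_depth(url: str) -> int:
    """Assumed rule: URL depth = number of non-empty path segments."""
    return len([seg for seg in urlsplit(url).path.split("/") if seg])


def base_domain(host: str) -> str:
    # Simplification: strip a leading "www." instead of using the Public Suffix List.
    return host[4:] if host.startswith("www.") else host


def is_internal(url: str, initial_url: str, crawl_all_subdomains: bool) -> bool:
    host = urlsplit(url).hostname or ""
    initial_host = urlsplit(initial_url).hostname or ""
    if host == initial_host:
        return True
    if crawl_all_subdomains:
        # With the option on, any subdomain of the base domain counts as internal.
        root = base_domain(initial_host)
        return host == root or host.endswith("." + root)
    return False


def in_directory(url: str, initial_url: str) -> bool:
    # Directory of the initial URL: its path up to and including the last "/".
    start = urlsplit(initial_url).path
    directory = start[: start.rfind("/") + 1] or "/"
    return urlsplit(url).path.startswith(directory)


def crawl(initial_url: str, links: dict, *, only_in_directory: bool = False,
          crawl_all_subdomains: bool = True, max_urls: Optional[int] = None,
          max_crawl_depth: Optional[int] = None,
          max_url_depth: Optional[int] = None) -> list:
    """Breadth-first walk over a hypothetical link graph (URL -> linked URLs).
    External URLs are simply skipped here."""
    crawled = []
    seen = {initial_url}
    queue = deque([(initial_url, 0)])  # (url, clicks from the initial URL)
    while queue:
        if max_urls is not None and len(crawled) >= max_urls:
            break  # "Max number of crawled URLs" reached
        url, clicks = queue.popleft()
        crawled.append(url)
        if max_crawl_depth is not None and clicks >= max_crawl_depth:
            continue  # do not follow links past the click-depth limit
        for target in links.get(url, []):
            if target in seen:
                continue
            if not is_internal(target, initial_url, crawl_all_subdomains):
                continue
            if only_in_directory and not in_directory(target, initial_url):
                continue
            if max_url_depth is not None and url_depth(target) > max_url_depth:
                continue
            seen.add(target)
            queue.append((target, clicks + 1))
    return crawled


def follow_redirects(url: str, redirects: dict, max_redirects: int) -> tuple:
    """Follow a hypothetical redirect map (URL -> Location) up to max_redirects hops.
    Returns (last_url, hops, too_many); too_many stands in for the sidebar issue."""
    hops = 0
    while url in redirects:
        if hops >= max_redirects:
            return url, hops, True
        url = redirects[url]
        hops += 1
    return url, hops, False
```

For example, with `only_in_directory=True` and an initial URL of `https://example.com/blog/`, a link to `https://example.com/shop/` is skipped. With `crawl_all_subdomains=False`, a link to `https://news.example.com/` is skipped as well.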

Save settings as a template

If you still have questions after reading this article, please contact our Customer Support.
