Overview (sidebar)

Modified on Mon, 09 Oct 2023 at 07:41 PM

Overview in Netpeak Spider is a summary report of a crawled website. It groups pages by different criteria and shows the absolute and relative number of pages in each group. You can find it under ‘Reports → Overview’ in the sidebar.


All groups and subgroups of pages in the ‘Overview’ tab are presented below.


Group

Description

Page Status

Internal

All internal pages of the crawled website.

External

Pages from external websites linked from the crawled website.

Refresh Redirected

Pages that contain a Refresh directive (in the HTTP response headers or in a Meta Refresh tag in the <head> block) pointing to another URL. If the Refresh directive points to the page it is located on, the page is not included in this category.

Compliant

HTML pages returning a 2xx status code and not hidden from search engine robots by indexing instructions (robots.txt, Canonical, Meta Robots, etc.).

Non-compliant

HTML pages returning a status code different from 2xx or pages that are hidden from search engine robots by indexing instructions.

Noindex

Pages that contain the ‘noindex’ directive in the ‘content’ attribute of the Meta Robots tag.


Nofollow

Pages that contain directives restricting link following (these may be located in the HTTP response headers or in the <head> block).

Disallowed

Pages that are hidden from search engine robots by directives in the robots.txt file.

Canonicalized

Pages containing a Canonical tag that points to another URL (a page whose Canonical tag points to itself does not appear in this category).

2xx HTML

All HTML pages with a 2xx status code.

Broken

Pages with 4xx or 5xx status codes.


Page Type

• HTML
• JavaScript
• Redirect
• CSS
• Images
• PDF
• XML
• PlainText
• GZIP

JSON

JSON stands for JavaScript Object Notation. It is an open-standard text format for data interchange, derived from JavaScript syntax.

Other

Pages whose type Netpeak Spider can’t process (e.g. .pptx, .dmg, etc.).

Unknown

Pages whose type could not be determined because they did not return a valid response (e.g. Timeout).

Protocol

HTTP

Pages with HTTP protocol.

HTTPS

Pages with HTTPS protocol.
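The ‘Compliant’ and ‘Canonicalized’ definitions above can be read as a simple rule set. The sketch below is only an illustration of that reading, not Netpeak Spider's actual implementation: the `Page` class and its field names are hypothetical, and it assumes the indexing instructions that matter are robots.txt, noindex, and a Canonical tag pointing elsewhere.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    # Hypothetical record of one crawled page; field names are assumptions.
    url: str
    status_code: int
    is_html: bool = True
    disallowed: bool = False              # hidden by a robots.txt directive
    noindex: bool = False                 # carries a 'noindex' directive
    canonical_url: Optional[str] = None   # target of the Canonical tag, if any


def is_canonicalized(page: Page) -> bool:
    """True when the Canonical tag points to a different URL.

    A Canonical tag that points to the page itself does not count.
    """
    return page.canonical_url is not None and page.canonical_url != page.url


def is_compliant(page: Page) -> bool:
    """HTML page with a 2xx status code that is not hidden from robots."""
    returns_2xx = 200 <= page.status_code < 300
    hidden = page.disallowed or page.noindex or is_canonicalized(page)
    return page.is_html and returns_2xx and not hidden
```

Under these assumptions, a page returning 200 with a self-referencing Canonical tag is compliant, while the same page with a Canonical tag pointing to another URL is not.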


  • Host → page grouping by domain (including subdomains), with the number of pages shown for each host. Hosts are sorted by the number of dots in the URL in ascending order, which is why domain.com appears higher than blog.domain.com.

  • Status Code → page grouping according to their status codes (2xx Success, 3xx Redirection, 4xx Client Error, 5xx Server Error, Timeout, etc.).

  • Content-Type → page grouping by content type.

  • Robots.txt, Meta Robots, and X-Robots-Tag → page grouping by the directives in robots.txt, Meta Robots, and X-Robots-Tag.

  • AMP HTML → page grouping based on AMP technology.

  • Click Depth → page grouping by click depth, i.e. the number of clicks from the initial URL.
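The Host ordering described in the list above can be sketched in a few lines of Python. This is an illustration rather than the tool's own code: the function name `group_by_host` is hypothetical, dots are counted in the host name, and the alphabetical tie-break between hosts with the same number of dots is an assumption.

```python
from collections import Counter
from urllib.parse import urlsplit


def group_by_host(urls):
    """Count pages per host and order hosts by the number of dots."""
    hosts = (urlsplit(url).hostname for url in urls)
    counts = Counter(host for host in hosts if host)  # skip URLs without a host
    # Fewer dots first; ties are broken alphabetically (an assumption).
    return sorted(counts.items(), key=lambda item: (item[0].count("."), item[0]))


urls = [
    "https://blog.domain.com/post",
    "https://domain.com/",
    "https://domain.com/about",
]
print(group_by_host(urls))
# [('domain.com', 2), ('blog.domain.com', 1)]
```

Because domain.com contains one dot and blog.domain.com contains two, the main domain is listed first regardless of how many pages each host has.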

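Click depth, as described in the list above, can be computed with a breadth-first traversal of the link graph. The sketch below assumes that click depth means the minimum number of clicks needed to reach a page from the initial URL; the `links` mapping and the function name `click_depths` are hypothetical.

```python
from collections import deque


def click_depths(links, start):
    """Return the minimum number of clicks from `start` to each reachable page.

    `links` maps a URL to the list of URLs it links to.
    """
    depths = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        for target in links.get(url, []):
            if target not in depths:  # first visit is the shortest path in BFS
                depths[target] = depths[url] + 1
                queue.append(target)
    return depths


links = {
    "/": ["/a", "/b"],
    "/a": ["/c"],
    "/b": ["/c"],
    "/c": ["/"],
}
print(click_depths(links, "/"))
# {'/': 0, '/a': 1, '/b': 1, '/c': 2}
```

Pages that cannot be reached from the initial URL do not appear in the result.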

All categories presented in the summary are interactive, so if you click on any group, Netpeak Spider will show you the report containing all the pages that belong to it.

This panel also offers the following functions:

  • Apply → updates data during crawling.

  • Collapse All → collapses all data.

  • Extended Copy (Ctrl+Shift+C) → copies data from the sidebar in an extended view, which allows you to paste it into any tabular editor.
