XML Sitemap Validator

  1. How to open the tool and start working with it.
  2. Tool features.
  3. Issues detected by the ‘XML sitemap validator’.
XML sitemap validator is a built-in tool that helps you with the following tasks:

  • Finding issues in a sitemap.
  • Extracting a list of links from an XML sitemap and moving them to the main program menu for further crawling.
  • Pinging search engines to notify them of changes in XML sitemaps.

A sitemap check can be performed without crawling the website itself.

    1. How to open the tool and start working with it

    The tool can be opened in three ways:

  • Via ‘Tools/Run → XML sitemap validator’ in the control panel

  • By using the ‘Alt+X’ hotkey
  • Via ‘List of URLs → Download from the sitemap’ in the main menu

    To start searching for issues:

    1. Enter the sitemap URL in the corresponding field and click the ‘Start’ button. When crawling is complete, the main table will display a list of pages contained in the sitemap. The tool has two viewing modes:

  • URL (sitemap content) → displays all pages contained in the sitemap;
  • Sitemaps → displays all sitemaps contained in the index sitemap.

    2. The table lets you examine the following attributes of each sitemap entry:
  • Loc → URL of the page
  • Lastmod → date of the last modification of the file
  • Changefreq → how frequently the page is likely to change
  • Priority → priority of the page relative to other pages on the site
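To make the four attributes concrete, here is a minimal sketch that reads them from a sitemap with Python's standard library. It is not how the tool itself works; the `SAMPLE` document and the `read_entries` helper are illustrative assumptions, while the namespace URI is the one defined by the Sitemap Protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap Protocol (sitemaps.org).
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample sitemap used only for illustration.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>https://example.com/about</loc>
  </url>
</urlset>"""


def read_entries(xml_text):
    """Return one dict per <url> entry; missing optional tags become None."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    rows = []
    for url in root.findall("sm:url", NS):
        row = {}
        for field in ("loc", "lastmod", "changefreq", "priority"):
            value = url.findtext(f"sm:{field}", default=None, namespaces=NS)
            row[field] = value.strip() if value else None
        rows.append(row)
    return rows


for row in read_entries(SAMPLE):
    print(row)
```

Only Loc is required by the protocol, so the sketch returns `None` for any optional attribute that an entry omits.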

    3. You can find sitemap issue reports on the corresponding tab on the right side of the tool. The issues presented in this tool are based on the official Standard Sitemap Protocol documentation, which is supported by Google, Yandex, and Bing.

    4. Click on the issue title to filter the results and see the list of pages containing this issue. The ‘Information’ panel also shows a description of each issue and its target parameter.

    5. To set custom filter settings, reset the applied filter and click on the ‘Set filter’ button. You will see a window where you can set the filtering conditions.

    2. Tool features

    The following features are available in the tool:

  • Apply → applies the current filter and updates the data in the table.
  • Extended copy → copies the data from the sidebar to the clipboard, so you can paste it into an external table.
  • If you want to ping your sitemap to Google and Bing, you can do it using the corresponding button.


    Export sitemap

    The results can be exported by several methods:

  • By using the ‘Export’ button → exports the current table with all results.
  • By using the ‘Save URLs to File’ button → saves the list of sitemap URLs to a text document.


By using the ‘Transfer’ and ‘Transfer URLs and Close’ buttons you can also move the results of the sitemap analysis to the main table. 

When the work is finished, you can delete the results using one of two methods:

  • Click on the ‘New sitemap’ button.
  • Use the ‘Clear’ button in the main menu of the tool.


3. Issues detected by the ‘XML sitemap validator’


Issues

Description

Errors

Broken Sitemap

Indicates sitemaps that are unavailable or return a 4xx or higher status code, so their data cannot be retrieved.


Target parameter: Status code

Invalid Sitemap Parent Tag

Indicates sitemaps with an invalid parent tag: according to the protocol, the root element must be <sitemapindex> or <urlset>.


Target parameter: URL 

XML Document Parsing Error

Indicates XML documents the program was unable to parse.


Target parameter: URL 
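The two checks above (parent tag and parseability) can be sketched with the standard library as follows. This is an assumption about how such a check might look, not the tool's implementation; `classify_document` and its return labels are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Root elements allowed by the Sitemap Protocol.
ALLOWED_ROOTS = {"urlset", "sitemapindex"}


def classify_document(xml_bytes):
    """Return 'parse_error', 'invalid_parent_tag', or 'ok' for a document."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError:
        return "parse_error"
    # ElementTree renders namespaced tags as "{namespace}localname".
    local_name = root.tag.rsplit("}", 1)[-1]
    return "ok" if local_name in ALLOWED_ROOTS else "invalid_parent_tag"


print(classify_document(b"<feed></feed>"))
```

The sketch checks only the local name of the root element; whether the namespace must also match is left out here.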

Content-Type is Invalid

Indicates sitemaps in a sitemap index file whose Content-Type field in the HTTP response header does not contain 'text/xml', 'application/xml', or 'text/plain'. If the files are compressed using gzip, the 'Content-Type' field should contain 'application/gzip'.


Target parameter: Content-Type
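As a rough illustration of this rule, the sketch below tests a Content-Type header value against the types listed above. The function name and the `is_gzipped` flag are assumptions made for the example; the substring match mirrors the "does not contain" wording of the description.

```python
# Media types named in the issue description above.
ALLOWED_XML_TYPES = ("text/xml", "application/xml", "text/plain")
GZIP_TYPE = "application/gzip"


def content_type_is_valid(header_value, is_gzipped=False):
    """Check a Content-Type header value against the types listed above."""
    value = (header_value or "").lower()
    if is_gzipped:
        return GZIP_TYPE in value
    return any(media_type in value for media_type in ALLOWED_XML_TYPES)


print(content_type_is_valid("application/xml; charset=UTF-8"))
```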

Compression Error

Indicates sitemaps that were corrupted during compression or compressed in a format other than gzip.


Target parameter: Status Code
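A compression problem of this kind can be reproduced with a small sketch that tries to decompress the downloaded bytes. The helper name `try_gunzip` is hypothetical, and the tool may detect the issue differently.

```python
import gzip
import zlib


def try_gunzip(data):
    """Return the decompressed bytes, or None if the data is not valid gzip."""
    try:
        return gzip.decompress(data)
    except (OSError, EOFError, zlib.error):
        # OSError covers a wrong file format (gzip.BadGzipFile),
        # EOFError a truncated stream, zlib.error corrupted data.
        return None


print(try_gunzip(b"plain text, not gzip"))
```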

Charset Is Not UTF-8 

Indicates sitemaps with encoding different from UTF-8.


Target parameter: Encoding

Sitemap Blocked by Robots.txt

Indicates sitemaps disallowed in the robots.txt file.


Target parameter: Disallowed
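To see how a robots.txt rule can block a sitemap, here is a minimal sketch based on Python's `urllib.robotparser`. The robots.txt content, the URLs, and the `sitemap_is_blocked` helper are made up for the example, and the parser's matching rules may differ in detail from those of the tool or of a particular search engine.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content used only for illustration.
ROBOTS_TXT = """User-agent: *
Disallow: /private/
"""


def sitemap_is_blocked(robots_txt, sitemap_url, user_agent="*"):
    """Return True if robots.txt disallows fetching the sitemap URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, sitemap_url)


print(sitemap_is_blocked(ROBOTS_TXT, "https://example.com/private/sitemap.xml"))
```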

Max Sitemap File Size

Indicates sitemaps larger than 49.9 MB.


Target parameter: File Size

Max URLs In Sitemap Index File

Indicates sitemap index files that contain more than 49,999 links to sitemaps.


Target parameter: Number of URLs

Max URLs in Sitemap 

Indicates sitemaps that contain more than 49,999 URLs.


Target parameter: Number of URLs

Missing Links in Sitemap

Indicates sitemaps with no links found. It happens if a sitemap is empty or its content is excluded on the 'Rules' tab of crawling settings.


Target parameter: Number of URLs

Bad Sitemap URL Format 

Indicates sitemap addresses in a sitemap index file that do not comply with the standard URL syntax: scheme://[login:password@]host[:port][/]path[?parameters][#anchor].


Target parameter: Loc

Bad URL Format

Indicates page addresses that do not comply with the standard URL syntax: scheme://[user:password@]host[:port][/]path[?query][#fragment].


Target parameter: Loc
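A rough version of this URL syntax check can be written with `urllib.parse`. The sketch assumes that only the http and https schemes are acceptable and that a host must be present; both are assumptions for illustration, and the tool's own rules may be stricter or looser.

```python
from urllib.parse import urlsplit


def url_format_is_valid(url):
    """Check that a URL has an http(s) scheme, a host, and a usable port."""
    try:
        parts = urlsplit(url)
        parts.port  # raises ValueError for a non-numeric or out-of-range port
    except ValueError:
        return False
    return parts.scheme in ("http", "https") and bool(parts.hostname)


print(url_format_is_valid("https://example.com/page?x=1#top"))
```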

Max Sitemap URL Length

Indicates sitemaps whose URL is longer than 2000 characters (by default). Note that you can change the default value on the 'Restrictions' tab of crawling settings.


Target parameter: URL

Max URL Length

Indicates all pages whose URL is longer than 2000 characters (by default). Note that you can change the default value on the 'Restrictions' tab of crawling settings.


Target parameter: URL

Percent-Encoded Sitemap URLs

Indicates sitemaps that contain percent-encoded (non-ASCII) characters in URL. For instance, the URL https://example.com/例.xml is encoded as https://example.com/%E4%BE%8B.xml.


Target parameter: URL

Non-Percent-Encoded URLs in Sitemap

Indicates URLs that contain non-ASCII characters that are not percent-encoded. For instance, the URL https://example.com/例 has to be encoded as https://example.com/%E4%BE%8B.


Target parameter: Loc
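The encoding step from the example above can be sketched with `urllib.parse.quote`. The helper names and the `SAFE` character set are assumptions for illustration; the sketch treats '%' as safe so that already encoded URLs are left unchanged, and it does not handle non-ASCII host names, which use a different encoding.

```python
from urllib.parse import quote

# URL delimiters and '%' are kept as-is; everything else outside the
# unreserved ASCII set is percent-encoded as UTF-8 bytes.
SAFE = ":/?#[]@!$&'()*+,;=%"


def has_raw_non_ascii(url):
    """Return True if the URL contains characters that still need encoding."""
    return not url.isascii()


def percent_encode(url):
    """Percent-encode non-ASCII characters, leaving URL delimiters intact."""
    return quote(url, safe=SAFE)


print(percent_encode("https://example.com/例"))
```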

Special Characters in URL



Indicates URLs that contain '*', '{', '}' characters.


Target parameter: URL

Duplicate Sitemap 

Indicates addresses of the sitemaps that were repeatedly found in a single or several sitemap index files.


Target parameter: URL

Link to Sitemap Index File

Indicates sitemaps that contain a link to a sitemap index file.


Target parameter: Link Source

Warnings

Redirected Sitemap

Indicates sitemaps redirected with a 3xx status code. Note that in contrast to the main table, here you can see the final URLs.


Target parameter: Status Code

Invalid Sitemap Location

Indicates sitemaps that break the location rules of the Standard Sitemap Protocol. A sitemap must be placed on the same host and protocol as its content.


Target parameter: URL

Invalid URL Location

Indicates URLs that break the location rules of the Standard Sitemap Protocol. URLs in a sitemap must be placed on the same host and protocol as the sitemap.


Target parameter: URL
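The host and protocol rule described in the two issues above can be illustrated by comparing the scheme and host parts of two URLs. This sketch covers only that part of the location rules; the function name and sample URLs are hypothetical.

```python
from urllib.parse import urlsplit


def same_host_and_protocol(sitemap_url, page_url):
    """Return True if both URLs share the same scheme and host (with port)."""
    a, b = urlsplit(sitemap_url), urlsplit(page_url)
    return (a.scheme.lower(), a.netloc.lower()) == (b.scheme.lower(), b.netloc.lower())


print(same_host_and_protocol("https://example.com/sitemap.xml", "http://example.com/page"))
```

Note that under this comparison www.example.com and example.com count as different hosts.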

URL Priority Is Invalid

Indicates URLs that have incoming links from the sitemap with bad <priority> tag format.


Target parameter: Priority

Priority out of Range

Indicates URLs that have incoming links from the sitemap with the <priority> tag that is out of range (0.0 to 1.0).


Target parameter: Priority

URL Changefreq Is Invalid

Indicates URLs that have incoming links from the sitemap with bad <changefreq> tag format.


Target parameter: Changefreq
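The priority and changefreq checks above can be sketched as follows. The list of changefreq values comes from the Sitemap Protocol; the helper names, the return labels, and the strict lowercase comparison are assumptions made for this example.

```python
import re

# Values allowed for <changefreq> by the Sitemap Protocol.
CHANGEFREQ_VALUES = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}

# A plain decimal number with an optional sign, such as "0.8" or "-1".
_DECIMAL = re.compile(r"[+-]?\d+(\.\d+)?")


def check_priority(text):
    """Return 'invalid', 'out_of_range', or 'ok' for a <priority> value."""
    text = (text or "").strip()
    if not _DECIMAL.fullmatch(text):
        return "invalid"
    return "ok" if 0.0 <= float(text) <= 1.0 else "out_of_range"


def changefreq_is_valid(text):
    """Return True if the value is one of the protocol's changefreq values."""
    return (text or "").strip() in CHANGEFREQ_VALUES


print(check_priority("1.5"), changefreq_is_valid("weekly"))
```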

URL Lastmod Is Invalid 

Indicates URLs that have incoming links from the sitemap with bad <lastmod> tag date format.


Target parameter: Lastmod

Sitemap Lastmod Is Invalid

Indicates sitemaps that have incoming links from the sitemap index file with bad <lastmod> date format.


Target parameter: Lastmod
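The Sitemap Protocol requires <lastmod> values in W3C Datetime format (for example 2024-01-15 or 2024-01-15T10:30:00+02:00). The sketch below is one possible validator for that format, not the tool's own check; `lastmod_is_valid` is a hypothetical name, and the range of the time zone offset is not verified.

```python
import re
from datetime import date

# W3C Datetime: YYYY, YYYY-MM, YYYY-MM-DD, or a full date with a time
# (hh:mm, optional :ss and fraction) followed by a zone designator.
W3C_DATETIME = re.compile(
    r"(?P<y>\d{4})"
    r"(?:-(?P<m>\d{2})"
    r"(?:-(?P<d>\d{2})"
    r"(?:T(?P<H>\d{2}):(?P<M>\d{2})(?::(?P<S>\d{2})(?:\.\d+)?)?"
    r"(?:Z|[+-]\d{2}:\d{2}))?)?)?"
)


def lastmod_is_valid(text):
    """Return True if the value is a well-formed W3C Datetime string."""
    match = W3C_DATETIME.fullmatch((text or "").strip())
    if not match:
        return False
    try:
        # Missing month or day defaults to 1; date() rejects e.g. Feb 30.
        date(int(match["y"]), int(match["m"] or 1), int(match["d"] or 1))
    except ValueError:
        return False
    if match["H"] is not None:
        if int(match["H"]) > 23 or int(match["M"]) > 59 or int(match["S"] or 0) > 59:
            return False
    return True


print(lastmod_is_valid("2024-01-15T10:30:00+02:00"))
```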

Long Server Response Time 

Indicates addresses of the pages with TTFB (time to first byte) exceeding 500 ms (by default). Note that you can change the default value on the 'Restrictions' tab of crawling settings.


Target parameter: Response Time

Robots.txt Does Not Contain a Sitemap Index

Indicates sitemap index files that are not listed in the corresponding robots.txt file.


Target parameter: Specified in robots.txt

Duplicated URLs

Indicates URLs that were repeatedly found in all sitemaps. Data in this report are grouped by the 'URL' parameter.


Target parameter: URL
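Grouping repeated URLs, as this report does, can be sketched with `collections.Counter`. The input list and the `find_duplicates` helper are hypothetical, and URLs are compared as exact strings without any normalization.

```python
from collections import Counter


def find_duplicates(urls):
    """Return a dict of URL -> occurrence count for URLs seen more than once."""
    counts = Counter(urls)
    return {url: n for url, n in counts.items() if n > 1}


# URLs collected from all sitemaps (illustrative data).
all_urls = [
    "https://example.com/",
    "https://example.com/about",
    "https://example.com/",
]
print(find_duplicates(all_urls))
```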

Contains Byte-Order Mark 

Indicates sitemaps that contain a Byte-Order Mark (BOM) – a Unicode character used for text stream byte order indication. It causes problems with sitemap crawling, so it's highly recommended to avoid the BOM.


Target parameter: Encoding
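A BOM sits at the very start of the file, so it can be detected by inspecting the first bytes. The sketch below uses the constants from Python's `codecs` module; the `detect_bom` helper and its labels are assumptions for illustration.

```python
import codecs

# UTF-32 marks are checked first because the UTF-32 LE mark
# begins with the same two bytes as the UTF-16 LE mark.
BOMS = [
    ("UTF-32 LE", codecs.BOM_UTF32_LE),
    ("UTF-32 BE", codecs.BOM_UTF32_BE),
    ("UTF-8", codecs.BOM_UTF8),
    ("UTF-16 LE", codecs.BOM_UTF16_LE),
    ("UTF-16 BE", codecs.BOM_UTF16_BE),
]


def detect_bom(data):
    """Return the name of the BOM at the start of the bytes, or None."""
    for name, mark in BOMS:
        if data.startswith(mark):
            return name
    return None


print(detect_bom(b"\xef\xbb\xbf<?xml version='1.0'?>"))
```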

Notices

Percent-Encoded URLs

Indicates URLs that contain percent-encoded (non-ASCII) characters and spaces. For instance, the URL https://example.com/例 is encoded as https://example.com/%E4%BE%8B.


Target parameter: URL

Robots.txt Does Not Contain a Sitemap

Indicates sitemap files that are not listed in the corresponding robots.txt file.


Target parameter: Specified in robots.txt
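Whether a sitemap is referenced in robots.txt can be checked by reading its Sitemap directives. The sketch below relies on `RobotFileParser.site_maps()`, available in Python 3.8 and later; the robots.txt content and the helper name are made up, and URLs are compared as exact strings.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content used only for illustration.
ROBOTS_WITH_SITEMAP = """User-agent: *
Disallow:
Sitemap: https://example.com/sitemap.xml
"""


def sitemap_listed_in_robots(robots_txt, sitemap_url):
    """Return True if robots.txt has a Sitemap directive for this URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    listed = parser.site_maps() or []  # site_maps() returns None if empty
    return sitemap_url in listed


print(sitemap_listed_in_robots(ROBOTS_WITH_SITEMAP, "https://example.com/sitemap.xml"))
```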
