- Sitemap generator and types of sitemaps
- Work with the tool
- Sitemap generator configuration
- Submit an XML sitemap to a website
1. Sitemap generator and types of sitemaps
Sitemap generator is a built-in tool that lets you generate sitemaps based on Google recommendations in just a few clicks.
Netpeak Spider can generate the following types of sitemaps:
- XML Sitemap → a common type of sitemap that contains the crawled links. It is generated according to the official Standard Sitemap Protocol documentation.
- Image Sitemap → a file containing links to crawled pages and to all unique images on these pages. Pages without images are not included in the Image Sitemap.
Before generating an Image Sitemap, make sure that images were taken into account during crawling. To do this, open the ‘Parameters’ tab in the sidebar and check that the ‘Images’ parameter in the ‘Content’ group is selected. Also, open the ‘General’ tab of the program settings and make sure that the ‘Check images’ option is enabled.
- HTML Sitemap → an HTML file containing links to all crawled pages, which you can embed into the corresponding category of your website.
- TXT Sitemap → a sitemap in .TXT format. It is less popular but still a valid way to help search engines index your site.
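For reference, the layout of three of these formats can be sketched with Python's standard library. This is a minimal illustration of the file structure, not how Netpeak Spider builds its files; the function names are hypothetical, and the image namespace is the one Google documents for image sitemaps.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"
XML_DECLARATION = '<?xml version="1.0" encoding="UTF-8"?>\n'


def _serialize(root: ET.Element) -> str:
    # ElementTree escapes &, < and > in text, so URLs with query strings stay valid.
    return XML_DECLARATION + ET.tostring(root, encoding="unicode")


def xml_sitemap(urls: list[str]) -> str:
    """Standard XML sitemap: one <url><loc> entry per crawled URL."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return _serialize(urlset)


def image_sitemap(pages: dict[str, list[str]]) -> str:
    """Image sitemap: each page URL plus its unique images; pages without images are skipped."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS, "xmlns:image": IMAGE_NS})
    for page_url, images in pages.items():
        unique_images = list(dict.fromkeys(images))  # drop duplicates, keep order
        if not unique_images:
            continue
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = page_url
        for image_url in unique_images:
            image_el = ET.SubElement(url_el, "image:image")
            ET.SubElement(image_el, "image:loc").text = image_url
    return _serialize(urlset)


def txt_sitemap(urls: list[str]) -> str:
    """TXT sitemap: one absolute URL per line, nothing else."""
    return "".join(f"{url}\n" for url in urls)
```

An HTML sitemap is just a page of ordinary links, so it needs no special namespace.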
The tool can add both compliant and non-compliant pages to a sitemap, so it covers a wider range of tasks. If you need to add only compliant URLs, they must meet the following conditions:
- HTML files returning strictly the 200 OK status code
- Access is allowed in the robots.txt file (or in the virtual robots.txt)
- The Canonical tag is missing or points to the same URL
- Meta Refresh is missing or points to the same URL
- Indexing in X-Robots-Tag or Meta Robots (index) is allowed
- Links are allowed in X-Robots-Tag or Meta Robots (follow)
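The conditions above amount to a single boolean check per URL. The sketch below assumes a hypothetical `CrawledPage` record holding what a crawler would have collected; it mirrors the listed conditions and is not Netpeak Spider's internal logic. It also treats the `none` robots directive as `noindex, nofollow`, which is how Google defines it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrawledPage:
    """Hypothetical record of what the crawler learned about one URL."""
    url: str
    content_type: str
    status_code: int
    allowed_by_robots_txt: bool          # real or virtual robots.txt
    canonical: Optional[str] = None      # None means the tag is missing
    meta_refresh: Optional[str] = None   # target URL of Meta Refresh, if any
    robots_directives: frozenset[str] = frozenset()  # X-Robots-Tag + Meta Robots, lowercased


def is_compliant(page: CrawledPage) -> bool:
    """True only if the page meets every condition from the list above."""
    blocking = {"noindex", "nofollow", "none"}  # 'none' is shorthand for 'noindex, nofollow'
    return (
        page.content_type.split(";")[0].strip() == "text/html"
        and page.status_code == 200
        and page.allowed_by_robots_txt
        and page.canonical in (None, page.url)
        and page.meta_refresh in (None, page.url)
        and not (blocking & page.robots_directives)
    )
```

A self-referencing canonical counts as compliant, while a canonical pointing elsewhere does not.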
2. Work with the tool
Follow these steps to generate a sitemap:
1. Crawl the necessary pages.
2. Open the ‘Sitemap Generator’ tool. You can do it in several ways:
2.1. Main menu → Tools
2.2. Control panel → Run
2.3. Using the ‘Alt+G’ hotkey
3. Configure the necessary parameters.
4. Click the ‘Generate...’ button and choose the path where the file will be saved.
3. Sitemap generator configuration
You can generate a sitemap for a selected host or for all hosts at once, in which case separate files are generated for each host. The ‘Generation target host’ dropdown menu shows how many URLs each host passes to the ‘Sitemap generator’. You can also configure segmentation and create a sitemap for a particular category.
You can transfer at most 100 hosts from the main table to the ‘Sitemap generator’.
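Per-host generation boils down to grouping crawled URLs by host name. In this sketch, `group_by_host` is a hypothetical helper, and raising an error when more than 100 hosts are found is an assumption: the article states the 100-host limit but not how extra hosts are handled.

```python
from collections import defaultdict
from urllib.parse import urlsplit

MAX_HOSTS = 100  # transfer limit stated in this article


def group_by_host(urls: list[str]) -> dict[str, list[str]]:
    """Group URLs by host so that each host gets its own set of sitemap files."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        # .hostname is lowercased and has no port, so Example.com and example.com merge
        groups[urlsplit(url).hostname or ""].append(url)
    if len(groups) > MAX_HOSTS:
        # Assumption: refuse instead of silently dropping hosts.
        raise ValueError(f"{len(groups)} hosts found, at most {MAX_HOSTS} are supported")
    return dict(groups)
```

The length of each list corresponds to the per-host URL count shown in the dropdown.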
Please note that if a sitemap exceeds 49.9 MB or contains more than 49,999 URLs, Netpeak Spider will split it into several files and generate a sitemap index file.
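The splitting rule can be sketched as a greedy loop that starts a new file whenever the next entry would break either limit, plus a sitemap index that lists the resulting files. Assumptions: 49.9 MB is read as 49.9 × 1024 × 1024 bytes of uncompressed XML, and `split_urls` and `sitemap_index` are hypothetical names.

```python
from xml.sax.saxutils import escape

MAX_URLS = 49_999
MAX_BYTES = int(49.9 * 1024 * 1024)  # assumption: "49.9 MB" means MiB, uncompressed
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
HEADER = f'<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="{NS}">'
FOOTER = "</urlset>"


def split_urls(urls: list[str], max_urls: int = MAX_URLS, max_bytes: int = MAX_BYTES) -> list[str]:
    """Greedily pack <url> entries into as few sitemap files as the limits allow."""
    base_size = len(HEADER.encode("utf-8")) + len(FOOTER.encode("utf-8"))
    files: list[list[str]] = []
    current: list[str] = []
    size = base_size
    for url in urls:
        entry = f"<url><loc>{escape(url)}</loc></url>"
        entry_size = len(entry.encode("utf-8"))
        # Close the current file if adding this entry would break either limit.
        if current and (len(current) >= max_urls or size + entry_size > max_bytes):
            files.append(current)
            current, size = [], base_size
        current.append(entry)
        size += entry_size
    if current:
        files.append(current)
    return [HEADER + "".join(entries) + FOOTER for entries in files]


def sitemap_index(sitemap_urls: list[str]) -> str:
    """Sitemap index file pointing at each generated sitemap."""
    entries = "".join(f"<sitemap><loc>{escape(u)}</loc></sitemap>" for u in sitemap_urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<sitemapindex xmlns="{NS}">{entries}</sitemapindex>'
    )
```

Calling `split_urls(urls, max_urls=1000)` gives the same kind of fixed-size segments as the Segmentation option described for XML sitemaps.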
Generation target host → you can choose all hosts or one of the listed ones. If you have a segment configured, it will automatically be used as the target host. Learn more about working with segments here → ‘Segments and how to work with them’.
For XML and Image XML sitemaps, you can configure the following optional parameters:
- Last modified date → the ‘lastmod’ parameter, which specifies the date the page was last modified. It tells search engines whether the page needs to be crawled again or can be skipped because its content hasn’t changed. The value can be taken from the ‘Last-Modified’ column or set manually for all URLs.
- Change frequency → the ‘changefreq’ parameter, which tells search engines how often the content of the corresponding pages changes. The protocol allows the values always, hourly, daily, weekly, monthly, yearly, and never; the two extremes are used as follows:
  - Always (for documents that change each time they are accessed)
  - Never (for archived URLs)
- Priority → the ‘priority’ parameter, which indicates the priority (from 0.0 to 1.0) of a URL relative to other URLs on your site, based on the number of its incoming or outgoing links (depending on what you selected). If this parameter is enabled, pages in the sitemap are sorted from the highest priority value to the lowest.
- Traffic saving → reduces traffic by compressing files into a .gz archive and removing all spaces and indents between tags. The same feature is available for the TXT Sitemap.
- Segmentation → generates sitemaps of 1,000 URLs each for use with services such as Google Search Console.
- Compliance → if enabled, only compliant URLs are added to sitemap files, namely HTML files with a 2xx status code that are not disallowed by crawling and indexing instructions (robots.txt, canonical, Meta Robots, etc.). These are the most important pages on your website because they can potentially bring organic traffic.
- Hreflang → if enabled, hreflang instructions found during crawling are added to sitemap files. Note that only instructions without issues are added.
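Put together, `<url>` entries with these optional tags might be built as follows. The sketch assumes a hypothetical `PageInfo` record and a single change frequency for all URLs; the priority formula (incoming links divided by the highest incoming-link count, rounded to one decimal) is an illustrative assumption, not Netpeak Spider's documented calculation. The `xhtml:link` element is the hreflang format Google documents for sitemaps, and `lastmod` is written as a W3C date derived from the `Last-Modified` header.

```python
from dataclasses import dataclass, field
from email.utils import parsedate_to_datetime
from typing import Optional
from xml.etree import ElementTree as ET

CHANGEFREQ_VALUES = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}


@dataclass
class PageInfo:
    """Hypothetical per-URL crawl data used to fill the optional tags."""
    url: str
    incoming_links: int
    last_modified: Optional[str] = None  # raw value of the Last-Modified header
    hreflang: dict[str, str] = field(default_factory=dict)  # language code -> alternate URL


def build_urlset(pages: list[PageInfo], changefreq: str = "weekly") -> ET.Element:
    if changefreq not in CHANGEFREQ_VALUES:
        raise ValueError(f"unsupported changefreq: {changefreq!r}")
    top = max((p.incoming_links for p in pages), default=0) or 1  # avoid division by zero
    # Assumed priority formula: links relative to the best-linked page, one decimal.
    scored = [(round(p.incoming_links / top, 1), p) for p in pages]
    scored.sort(key=lambda item: item[0], reverse=True)  # highest priority first (stable)
    urlset = ET.Element("urlset", {
        "xmlns": "http://www.sitemaps.org/schemas/sitemap/0.9",
        "xmlns:xhtml": "http://www.w3.org/1999/xhtml",
    })
    for priority, page in scored:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = page.url
        if page.last_modified:
            # e.g. "Wed, 21 Oct 2015 07:28:00 GMT" -> "2015-10-21"
            lastmod = parsedate_to_datetime(page.last_modified).date().isoformat()
            ET.SubElement(url_el, "lastmod").text = lastmod
        ET.SubElement(url_el, "changefreq").text = changefreq
        ET.SubElement(url_el, "priority").text = f"{priority:.1f}"
        for lang, href in page.hreflang.items():
            ET.SubElement(url_el, "xhtml:link", rel="alternate", hreflang=lang, href=href)
    return urlset
```

The child order (loc, lastmod, changefreq, priority, then foreign-namespace elements) follows the sitemap schema's sequence.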
For an HTML sitemap, you can configure the following parameters:
- Anchor text source → you can use the URL, the Title tag, or the H1 header as the link text (anchor). Note that to use the Title or H1 options, the corresponding parameters must be enabled in the sidebar.
- Segmentation → this feature allows you to split the list of pages into separate files containing 100 or 1000 URLs each or save them all as a single file.
- Additional content → allows you to add the content of the meta description tag next to each link.
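The three HTML sitemap options can be sketched as follows. The page dictionaries (keys `url`, `title`, `h1`, `description`) and both function names are hypothetical, and falling back to the URL when the chosen anchor field is empty is an assumption rather than documented Netpeak Spider behavior.

```python
from html import escape
from typing import Optional

ANCHOR_SOURCES = ("url", "title", "h1")


def html_sitemap(pages: list[dict[str, str]], anchor_source: str = "url",
                 with_description: bool = False) -> str:
    """Render one HTML sitemap page as a plain <ul> list of links."""
    if anchor_source not in ANCHOR_SOURCES:
        raise ValueError(f"anchor_source must be one of {ANCHOR_SOURCES}")
    items = []
    for page in pages:
        # Fall back to the URL when the chosen field is empty or was not collected.
        anchor = page.get(anchor_source) or page["url"]
        item = f'<li><a href="{escape(page["url"])}">{escape(anchor)}</a>'
        if with_description and page.get("description"):
            item += " " + escape(page["description"])  # meta description next to the link
        items.append(item + "</li>")
    return (
        '<!DOCTYPE html>\n<html>\n<head><meta charset="utf-8"><title>Sitemap</title></head>\n'
        "<body>\n<ul>\n" + "\n".join(items) + "\n</ul>\n</body>\n</html>\n"
    )


def html_sitemaps(pages: list[dict[str, str]], per_file: Optional[int] = None,
                  **options) -> list[str]:
    """Split pages into files of `per_file` links (e.g. 100 or 1000), or one file if None."""
    if per_file is None:
        return [html_sitemap(pages, **options)]
    return [html_sitemap(pages[i:i + per_file], **options)
            for i in range(0, len(pages), per_file)]
```

`html.escape` keeps titles containing `&`, `<` or quotes from breaking the markup.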
Traffic saving and compliance settings are also available for the TXT Sitemap.
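Traffic saving combines two steps that can be sketched with the standard library: stripping whitespace between tags and gzip-compressing the result. This is an illustration under the assumption that all whitespace between tags is formatting (true for sitemaps); the function names are hypothetical. A TXT sitemap keeps its line breaks because they separate the URLs, so only compression applies to it.

```python
import gzip
import re


def minify_xml(xml_text: str) -> str:
    """Drop whitespace that sits only between tags (indentation and line breaks)."""
    return re.sub(r">\s+<", "><", xml_text.strip())


def compress(text: str) -> bytes:
    """Gzip the text, e.g. before saving it as sitemap.xml.gz or sitemap.txt.gz."""
    return gzip.compress(text.encode("utf-8"))


# Usage (hypothetical file name):
# from pathlib import Path
# Path("sitemap.xml.gz").write_bytes(compress(minify_xml(xml_text)))
```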
Once you have configured the necessary settings, click the ‘Generate’ button, choose the path for saving your file, and press ‘OK’.
4. Submit an XML sitemap to a website
After the sitemap is generated, follow these steps:
1. Copy the files from the XML Sitemap folder to the root folder of the site on the server.
2. Add the ‘Sitemap’ directive with the address of the uploaded sitemap to the robots.txt file.
3. Check your new sitemap with the XML Sitemap Validator in Netpeak Spider and ping it to Google and Bing.
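The robots.txt step and a basic sanity check of the uploaded file can be sketched in Python. This is a rough check, not a replacement for the XML Sitemap Validator: it only confirms that robots.txt declares a `Sitemap` directive (using the standard `urllib.robotparser` module, Python 3.8+) and that the file is well-formed XML with a `urlset` or `sitemapindex` root. It does not validate against the sitemap schema, and the helper names are hypothetical.

```python
from urllib.robotparser import RobotFileParser
from xml.etree import ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemaps_declared(robots_txt: str) -> list[str]:
    """Return the URLs listed in 'Sitemap:' lines of a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []  # site_maps() returns None when there are none


def looks_like_sitemap(xml_bytes: bytes) -> bool:
    """Well-formed XML whose root is <urlset> or <sitemapindex> (no schema validation)."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError:
        return False
    return root.tag in (f"{SITEMAP_NS}urlset", f"{SITEMAP_NS}sitemapindex")
```

In practice, the robots.txt body and the sitemap bytes would be downloaded from the live site first; the sketch works on already-fetched content to stay offline.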