- Sitemap generator and types of sitemaps
- Working with the tool
- Sitemap generator configuration
- Uploading an XML sitemap on a website
1. Sitemap generator and types of sitemaps
The Sitemap generator is a built-in tool that lets you generate sitemaps based on Google recommendations in just a few clicks.
Netpeak Spider can generate the following types of sitemaps:
- XML Sitemap → a common type of sitemap that contains crawled links. It is generated according to the official Standard Sitemap Protocol documentation.
- Image Sitemap → a file containing links to crawled pages and links to all unique images on these pages that return the 200 OK response code and are not blocked from indexing. If a page has no images, its link will not be included in the Image Sitemap.
Before generating the Image Sitemap, make sure that images are taken into account during crawling. To do this, go to the ‘Parameters’ tab in the sidebar and check that the ‘Images’ parameter in the ‘Content’ section is selected. Also, go to the ‘General’ tab of the program settings and make sure that the ‘Check Images’ option is enabled.
- HTML Sitemap → an HTML file containing links to all crawled pages that allows you to embed a sitemap into a corresponding website category.
- TXT Sitemap → a sitemap in .TXT format. It is less popular, but it is still a valid way to help search engines index your site.
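To make the difference between the XML and TXT formats concrete, here is a minimal sketch in Python. It is not how Netpeak Spider builds its files; the function names are hypothetical, and the only assumption taken from the Sitemap Protocol is the `urlset`/`url`/`loc` structure and its namespace.

```python
from xml.etree import ElementTree as ET

# Namespace defined by the Sitemap Protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_txt_sitemap(urls):
    """Return a TXT sitemap: one URL per line."""
    return "\n".join(urls) + "\n"


def build_xml_sitemap(urls):
    """Return a minimal XML sitemap (only <loc> entries) as a string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    pages = ["https://example.com/", "https://example.com/blog/"]
    print(build_txt_sitemap(pages))
    print(build_xml_sitemap(pages))
```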
Netpeak Spider will only add to a sitemap URLs that match the following conditions:
- HTML files returning strictly the 200 OK status code
- Access is allowed in the robots.txt file (or in the virtual robots.txt)
- The Canonical tag is missing or points to the same URL
- Meta Refresh is missing or points to the same URL
- Indexing in X-Robots-Tag or Meta Robots (index) is allowed
- Links are allowed in X-Robots-Tag or Meta Robots (follow).
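The conditions above can be read as a single filter applied to every crawled page. The sketch below illustrates that logic in Python; the `CrawledPage` record and its field names are hypothetical and do not reflect Netpeak Spider's internal data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrawledPage:
    # Hypothetical record of what a crawler knows about one page.
    url: str
    status_code: int
    is_html: bool
    allowed_by_robots_txt: bool
    canonical: Optional[str] = None     # None means the tag is missing
    meta_refresh: Optional[str] = None  # None means the tag is missing
    index_allowed: bool = True          # X-Robots-Tag / Meta Robots: index
    follow_allowed: bool = True         # X-Robots-Tag / Meta Robots: follow


def is_sitemap_candidate(page):
    """Return True only if the page meets every listed condition."""
    return (
        page.is_html
        and page.status_code == 200
        and page.allowed_by_robots_txt
        and page.canonical in (None, page.url)
        and page.meta_refresh in (None, page.url)
        and page.index_allowed
        and page.follow_allowed
    )
```

For example, a page whose Canonical tag points to a different URL fails the filter even if it returns 200 OK.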
2. Working with the tool
You can generate a sitemap for a selected subdomain or for all subdomains at once, in which case a separate set of files is generated for each of them. You can also configure segmentation and create a sitemap for a particular category.
You can transfer no more than 100 hosts from the main table to the Sitemap generator.
Note that if a sitemap exceeds 49.9 MB or contains more than 49,999 URLs, Netpeak Spider splits it into several files and generates a sitemap index file.
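The splitting behavior can be sketched as follows. This is an illustration only: it splits by URL count and ignores the file-size limit, and the `sitemapN.xml` naming scheme is an assumption, not the naming Netpeak Spider uses.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 49_999  # URL limit quoted in the text above


def chunk_urls(urls, max_urls=MAX_URLS_PER_FILE):
    """Split the URL list into consecutive chunks of at most max_urls."""
    return [urls[i:i + max_urls] for i in range(0, len(urls), max_urls)]


def build_sitemap_index(base_url, file_count):
    """Return a sitemap index that points to sitemap1.xml..sitemapN.xml.

    The file naming is hypothetical and chosen for the example.
    """
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for n in range(1, file_count + 1):
        sitemap_el = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap_el, "loc").text = f"{base_url}/sitemap{n}.xml"
    return ET.tostring(index, encoding="unicode")


if __name__ == "__main__":
    urls = [f"https://example.com/page-{i}" for i in range(120_000)]
    chunks = chunk_urls(urls)
    print(len(chunks), "sitemap files")
    print(build_sitemap_index("https://example.com", len(chunks)))
```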
Do the following steps to generate a sitemap:
1. Crawl the necessary pages.
2. Open the ‘Sitemap Generator’ tool. You can do it in several ways:
2.1. Main menu → Tools
2.2. Control panel → Run
2.3. Using the ‘Alt+G’ hotkey
3. Configure the necessary parameters
4. Click on the ‘Generate...’ button and choose the path where the file will be saved.
3. Sitemap generator configuration
Generation target host → you can choose all the hosts or one of the offered ones. In case you have a segment configured, it will automatically be used as the target host. Learn more about working with segments here → ‘Segments and how to work with them’.
For the XML sitemap and image XML sitemap you can configure the following optional parameters:
- Last modified date → the ‘lastmod’ parameter, which specifies the date when the page was last modified. It tells search engines whether the page needs to be crawled again or whether its content has not changed. The value can be taken from the ‘Last-Modified’ column or set manually for all URLs.
- Change frequency → the ‘changefreq’ parameter, which tells search engines how often the content of the corresponding pages changes. Its values include:
- Always (should be used to describe documents that change each time they are accessed)
- Never (should be used to describe archived URLs)
- Priority → the ‘priority’ parameter, which indicates the priority (from 0.0 to 1.0) of a URL relative to other URLs on your site. It is based on the number of the URL's incoming or outgoing links, depending on what you selected. If this parameter is enabled, pages in the sitemap are ordered from the highest priority value to the lowest.
- Saving traffic → a function that allows you to reduce traffic by compressing files into a .gz archive and removing all spaces and indents between tags. The same feature is available for a TXT Sitemap.
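The sketch below shows how these optional parameters map onto the XML output. The priority formula (incoming links divided by the site-wide maximum) is an assumption made for the example, since the exact calculation is not described here, and the `pages` dictionaries are hypothetical input.

```python
import gzip
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(pages, changefreq="weekly"):
    """Build a compact XML sitemap (bytes) with the optional tags.

    pages: list of dicts with 'url', 'lastmod' (YYYY-MM-DD) and 'inlinks'.
    Priority is assumed to be inlinks scaled to the 0.0-1.0 range.
    """
    max_links = max((p["inlinks"] for p in pages), default=0) or 1
    scored = sorted(
        ((round(p["inlinks"] / max_links, 1), p) for p in pages),
        key=lambda item: item[0],
        reverse=True,  # highest priority first
    )
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for priority, page in scored:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = page["url"]
        ET.SubElement(url_el, "lastmod").text = page["lastmod"]
        ET.SubElement(url_el, "changefreq").text = changefreq
        ET.SubElement(url_el, "priority").text = f"{priority:.1f}"
    # No indentation is added, so the output has no spaces between tags.
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


def compress_sitemap(xml_bytes):
    """Compress the sitemap as a .gz payload to save traffic."""
    return gzip.compress(xml_bytes)
```

Writing the result of `compress_sitemap` to a file such as `sitemap.xml.gz` gives a file that search engines can read directly.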
For an HTML sitemap you can configure the following parameters:
- Anchor text source → you can select the URL, the Title tag, or the H1 heading as the link text (anchor). Note that the corresponding parameters must be enabled in the sidebar before they can be used.
- Segmentation → this feature allows you to split the list of pages into separate files containing 100 or 1000 URLs each, or to save them all as a single file.
- Additional content → allows you to add the content of the meta Description tag next to each link.
Once you have configured the necessary settings, click the ‘Generate’ button, choose the path for saving your file, and press ‘OK’.
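As an illustration of the HTML sitemap options, the following sketch builds a simple list of links with a selectable anchor text source and an optional description. The function, its parameters, and the page dictionaries are hypothetical; the markup Netpeak Spider actually produces may differ.

```python
from html import escape


def build_html_sitemap(pages, anchor_source="title", with_description=False):
    """Return an HTML list of links.

    pages: list of dicts with 'url' and optionally 'title', 'h1',
    'description'. If the chosen anchor source is empty, the URL is used.
    """
    items = []
    for page in pages:
        href = escape(page["url"], quote=True)
        anchor = escape(page.get(anchor_source) or page["url"])
        item = f'<li><a href="{href}">{anchor}</a>'
        if with_description and page.get("description"):
            item += " " + escape(page["description"])
        items.append(item + "</li>")
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    demo = [{"url": "https://example.com/", "title": "Home", "description": "Start page"}]
    print(build_html_sitemap(demo, with_description=True))
```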
4. Uploading an XML sitemap on a website
After the sitemap has been generated, follow these steps:
1. Copy the files from the XML Sitemap folder to the root folder of the site on the server.
2. Add the ‘Sitemap’ directive with the address of the uploaded sitemap to the robots.txt file.
3. Check your new sitemap with the XML Sitemap Validator in Netpeak Spider and ping it to Google and Bing.
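Adding the ‘Sitemap’ directive means appending a line of the form `Sitemap: <absolute URL>` to robots.txt. A minimal sketch of doing this programmatically is shown below; the function name and the example URL are assumptions made for illustration.

```python
def add_sitemap_directive(robots_txt, sitemap_url):
    """Append a 'Sitemap:' line to robots.txt content if it is not there yet."""
    directive = f"Sitemap: {sitemap_url}"
    lines = robots_txt.splitlines()
    # Skip the change if the same directive is already present.
    if any(line.strip().lower() == directive.lower() for line in lines):
        return robots_txt
    lines.append(directive)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    current = "User-agent: *\nDisallow: /admin/\n"
    print(add_sitemap_directive(current, "https://example.com/sitemap.xml"))
```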