
A well-structured XML sitemap speeds up the indexing of your pages by search engines. However, the quality of this file varies significantly depending on the platform used to build the site, the number of pages, and the settings applied. Comparing an optimized sitemap with a default-generated one shows where the real SEO gains lie.
Automatically generated sitemap or custom sitemap: measurable gaps
Most CMS and no-code platforms produce a sitemap.xml file upon installation. The content of this default file differs significantly from a manually refined sitemap.
| Criterion | Auto-generated sitemap (default) | Manually optimized sitemap |
|---|---|---|
| Included URLs | All pages, including drafts, utility pages, duplicates | Only indexable pages, filtered by status and canonical |
| Lastmod tag | Often missing or file generation date | Actual date of last content modification |
| Priority tag | Identical value for all URLs | Hierarchy reflecting the site’s structure |
| URL duplicates | Frequent (UTM parameters, versions with/without slash) | Removed or redirected before inclusion |
| File size | Can exceed the protocol limit (50,000 URLs or 50 MB uncompressed per file) if not segmented | Split into several files referenced by a sitemap index if necessary |
This table shows that the default file sends a mixed signal to indexing bots. Google allocates a limited crawl budget to each site. Submitting unnecessary URLs spends part of that budget on content without SEO value.
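The filtering described in the "Manually optimized" column can be sketched in a few lines of Python. This is a minimal illustration, not the way any particular CMS does it: the `Page` record, its field names, and the `"published"` / `"draft"` status values are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


@dataclass
class Page:
    """Hypothetical page record exported from a CMS."""
    url: str
    canonical: str        # URL declared in the page's canonical tag
    status: str           # assumed values: "published" or "draft"
    noindex: bool
    last_modified: date   # date of the last real content change


def is_indexable(page: Page) -> bool:
    # Keep only published pages that are indexable and self-canonical.
    return (
        page.status == "published"
        and not page.noindex
        and page.canonical == page.url
    )


def build_sitemap(pages: list[Page]) -> str:
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    seen = set()
    for page in pages:
        if not is_indexable(page) or page.url in seen:
            continue  # skip drafts, noindex pages, variants, and repeats
        seen.add(page.url)
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = page.url
        # lastmod comes from the content, not from the generation time.
        ET.SubElement(entry, "lastmod").text = page.last_modified.isoformat()
    return ET.tostring(urlset, encoding="unicode")
```

A draft, a noindex page, or a parameterized variant whose canonical points elsewhere never reaches the output, so the file lists only URLs you actually want indexed.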
To concretely observe the structure of a well-organized sitemap, you can visit the homepage of niklasson.net, which illustrates a clear division between content categories.

Sitemaps on no-code platforms: duplicates and ghost URLs on Webflow and Framer
No-code tools like Webflow or Framer attract users with their quick deployment. However, their management of the XML sitemap presents specific issues that traditional CMS do not encounter in the same way.
Duplicates created by automatic generators
Webflow automatically generates a sitemap.xml upon publication. Each page, each CMS collection item, and each utility page (404, search, password) is included. Framer behaves similarly by including URL variants related to interactions or page states.
Modern crawlers like Googlebot detect these duplicates and may decide to ignore part of the sitemap. A sitemap containing non-indexable URLs loses credibility with the bots.
Cleaning a no-code site’s sitemap
- Exclude utility pages (404, search, password-protected pages) via the platform’s SEO settings or a properly configured robots.txt file
- Ensure that each URL in the sitemap has a canonical tag pointing to itself, not to another variant
- Remove tracking parameters or URL fragments added by third-party integrations before submission to Google Search Console
- Use an external crawl tool to compare the generated sitemap with the pages that are actually accessible and indexable
On Webflow, deleting a page does not always immediately remove the URL from the sitemap. A post-publication check of the sitemap.xml file is necessary to avoid submitting URLs that return a 404 code.
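As a rough sketch of the parameter and fragment cleanup listed above, the following Python helper normalizes URLs before they are compared or submitted. The list of tracking parameters and the choice to strip trailing slashes are assumptions; adapt both to the conventions your site actually uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking parameters; extend to match your integrations.
TRACKING_PREFIXES = ("utm_",)
TRACKING_KEYS = {"gclid", "fbclid"}


def normalize_url(url: str) -> str:
    parts = urlsplit(url)
    # Drop tracking parameters, keep the ones that change the content.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in TRACKING_KEYS
    ]
    # Assumed policy: no trailing slash, except for the root path.
    path = parts.path.rstrip("/") or "/"
    # The empty last element removes any #fragment.
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), path, urlencode(query), "")
    )


def deduplicate(urls):
    # dict.fromkeys keeps the first occurrence and preserves order.
    return list(dict.fromkeys(normalize_url(url) for url in urls))
```

Running `deduplicate` on the URLs extracted from the generated sitemap and comparing the result with the original list gives a quick count of how many entries are variants of the same page.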
Lastmod and priority tags: what Google actually uses
The sitemap protocol specification includes several optional tags. Their actual usefulness for SEO does not always match what their name suggests.
Google has repeatedly confirmed that the priority tag is ignored by Googlebot. This tag, which accepts values from 0.0 to 1.0, does not influence crawl order or crawl frequency. Keeping it does no harm, but spending optimization time on it brings no measurable benefit.
The lastmod tag, on the other hand, retains its usefulness as long as it reflects the actual date of content modification. When a CMS updates this date with each regeneration of the file (without content change), Google learns to ignore it for that specific site. A reliable lastmod tag helps Google prioritize the recrawl of recently modified pages.
The changefreq tag (with values such as daily, weekly, or monthly) suffers the same fate as priority: Google ignores it as well.
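One way to spot an unreliable lastmod is to check whether nearly every URL carries the same date, which usually means the CMS writes the generation date instead of the modification date. The sketch below is illustrative only: the function name and the 90% threshold are arbitrary assumptions, not values published by Google.

```python
from collections import Counter
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def lastmod_report(sitemap_xml: str, threshold: float = 0.9) -> dict:
    root = ET.fromstring(sitemap_xml)
    entries = root.findall("sm:url", NS)
    # Keep only the YYYY-MM-DD part so timestamps from one run match.
    dates = [
        entry.findtext("sm:lastmod", default="", namespaces=NS).strip()[:10]
        for entry in entries
    ]
    counts = Counter(d for d in dates if d)
    top_date, top_count = counts.most_common(1)[0] if counts else (None, 0)
    return {
        "urls": len(entries),
        "missing_lastmod": dates.count(""),
        "most_common_date": top_date,
        # Flag files where almost every URL shares a single date.
        "suspicious": len(entries) > 1 and top_count / len(entries) >= threshold,
    }
```

A flagged file is not proof of a problem (a site can legitimately be updated in bulk), but it is a good reason to check where the CMS takes the date from.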

XML sitemap and robots.txt file: consistency between the two files
An XML sitemap works in conjunction with the robots.txt file. Inconsistencies between these two files create conflicting signals for indexing bots.
If a URL is in the sitemap but blocked by a Disallow directive in robots.txt, Google will not be able to crawl it. However, the URL remains “declared” as important. This conflict wastes a line in the sitemap and can generate errors in Google Search Console.
The reverse situation also poses a problem: a page allowed in robots.txt but absent from the sitemap will not necessarily be ignored (Google can find it via internal links), but its indexing may be slower than with an explicit declaration in the sitemap.
- Each URL in the sitemap must return an HTTP 200 code and not be blocked by robots.txt
- The location of the sitemap must be declared in robots.txt via the Sitemap: directive followed by the complete URL of the file
- Pages with a noindex meta tag should not appear in the sitemap, even if they are crawlable
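The first two checks can be approximated offline with Python's standard `urllib.robotparser` module. This is a sketch under simplifying assumptions: the helper names are invented for the example, the standard-library parser matches rules by simple path prefix and does not interpret wildcard patterns the way Googlebot does, and the HTTP 200 and noindex checks require fetching each page, so they are left out.

```python
from urllib.robotparser import RobotFileParser


def _parse(robots_txt: str) -> RobotFileParser:
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def blocked_urls(robots_txt: str, sitemap_urls, user_agent: str = "Googlebot"):
    """Return the sitemap URLs that robots.txt forbids for user_agent."""
    parser = _parse(robots_txt)
    return [url for url in sitemap_urls if not parser.can_fetch(user_agent, url)]


def declared_sitemaps(robots_txt: str):
    """Return the URLs listed in Sitemap: directives (empty list if none)."""
    return _parse(robots_txt).site_maps() or []
```

For example, with a robots.txt containing `Disallow: /search` and a `Sitemap:` line, `blocked_urls` returns every sitemap entry under `/search`, and `declared_sitemaps` confirms that the file location is declared.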
Submitting the sitemap via Google Search Console remains the most direct method to signal the file to Google. The declaration in robots.txt serves as a safety net for other search engines and crawlers, which can discover the file there without a manual submission.
An XML sitemap does not compensate for a faulty internal linking structure or low-quality content. Its role is limited to facilitating the discovery and prioritization of pages. The difference between a default sitemap and a cleaned sitemap is measurable in the Search Console coverage report: fewer reported errors, fewer pages excluded due to duplication, and an indexed-page count closer to the actual number of useful pages on the site.