Andrew Welch
SEO Myths: Top 5 Sitemap Myths Demystified
Sitemaps have been around since 2005, but there are still many myths and misconceptions surrounding them. We’ll demystify five SEO myths in this article.
Sitemaps have been around since 2005, when Google introduced them as a way to let Googlebot know what pages it should crawl to index your site.
Since then, they’ve been widely adopted by search engines and other tools that want to consume the web, and codified in the Sitemap protocol.
Despite this widespread adoption, sitemaps are often misunderstood.
This article will discuss some common misconceptions about sitemaps and demystify what they actually do.
We’ll use the term “Google” generically to refer to anything that might want to consume your sitemaps.
SEO Myth #1: Sitemaps are orders
Sitemaps, like most signals to search engines, are not orders or edicts that Google must follow explicitly.
Instead, they are hints that search engines can choose to use, or ignore entirely.
First, Google has to know about your sitemap by one of the following methods:
- Finding it in the usual location of /sitemap.xml
- Finding it in the location specified in your /robots.txt file
- Being told its location when you submit it via Google Search Console
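For the robots.txt method, the Sitemap protocol defines a Sitemap: directive that can appear anywhere in the file. A minimal example, using a hypothetical example.com domain:
User-agent: *
Disallow:
Sitemap: https://example.com/sitemap.xml
You can list more than one Sitemap: line, and the URL can point at a sitemap index as well as a plain sitemap.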
Then, on its own time, Google will digest the sitemap, and potentially use it to prioritize indexing (or re-indexing) of the pages specified in the sitemap.
It’s still entirely up to the whims of Google’s black box algorithm what to do with them.
SEO Myth #2: Your site must have a sitemap
While having a sitemap is never a bad thing, it isn’t a requirement, or in some cases, even particularly helpful.
Here’s when Google says you might need a sitemap:
- Your site is large. Generally, on large sites it’s more difficult to make sure that every page is linked by at least one other page on the site. As a result, it’s more likely Googlebot might not discover some of your new pages.
- Your site is new and has few external links to it. Googlebot and other web crawlers crawl the web by following links from one page to another. As a result, Googlebot might not discover your pages if no other sites link to them.
- Your site has a lot of rich media content (video, images) or is shown in Google News. Google can take additional information from sitemaps into account for Search.
Here’s when Google says you might not need a sitemap:
- Your site is “small”. By small, we mean about 500 pages or fewer on your site. (Only pages that you think need to be in search results count toward this total.)
- Your site is comprehensively linked internally. This means that Google can find all the important pages on your site by following links starting from the home page.
- You don’t have many media files (video, image) or news pages that you want to show in search results. Sitemaps can help Google find and understand video and image files, or news articles, on your site. If you don’t need these results to appear in Search you might not need a sitemap.
Indeed, in terms of page discovery, sitemaps are second to external and internal links.
SEO Myth #3: If a URL is not in your Sitemap, it isn’t indexed
As noted above, Google usually will find pages on your website to index regardless of whether they are in a sitemap or not.
Which brings up an incredibly important point about sitemaps: sitemaps are about discoverability, not indexing.
If a URL is indexed by Google already, removing it from your sitemap does not remove it from Google’s index.
The term “sitemap” implies that it’s a canonical map of your website, whereas the reality is that a sitemap is closer to a Googlebot “to do” list.
The URLs in a sitemap are things Google may keep in mind when it prioritizes indexing or re-indexing the pages on your website.
If you’re constantly telling Google (via your sitemap) to prioritize indexing pages that haven’t changed and are already in its index, you’re not actually accomplishing anything.
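To make that concrete, a sitemap entry is just a URL plus optional metadata such as <lastmod>, which is the main hint Google can use to decide whether a page is worth re-crawling. A minimal sitemap, with hypothetical URLs and dates, looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/blog/brand-new-article</loc>
    <lastmod>2023-07-14T16:59:33-04:00</lastmod>
  </url>
  <url>
    <loc>https://example.com/blog/recently-updated-article</loc>
    <lastmod>2023-07-01T09:12:00-04:00</lastmod>
  </url>
</urlset>
Entries like these are only hints that something was added or changed recently; Google still decides whether and when to act on them.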
SEO Myth #4: Every URL on your website must be in your sitemap
Given that we’ve already learned that sitemaps are about discoverability, not about indexing, it should be clear that it isn’t necessary to have every URL on your website in your sitemap.
Once Google has indexed a page, it no longer needs to be in your sitemap until and unless the page is updated and needs to be re-indexed.
If you have a large website with thousands of entries in the sitemap, there are some negative effects:
- The sitemap can be resource-intensive for your server to generate and regenerate when a page changes
- The sitemap will be very large, requiring more of Googlebot’s time to download
- The large sitemap will also require more of Googlebot’s time to parse and traverse
- Most of the sitemap will usually be ignored by Googlebot anyway
For exactly this reason, tools like SEOmatic for Craft CMS have a Sitemap Limit setting that limits the number of entries in a given sitemap to only the XX most recently changed pages. Other SEO tools have similar functionality.
This makes Googlebot’s crawl budget more effective: it gets a smaller sitemap to parse, containing only new or recently changed pages, which it should prioritize indexing.
You should likely start utilizing a Sitemap Limit once you have over a few hundred entries in a given sitemap, and set it to something reasonable based on how frequently new pages are added or existing pages updated with meaningful changes.
For the vast majority of websites, setting this to 50 or lower is all you’ll ever need.
This optimizes Googlebot’s crawl budget and reduces the load on your server from constantly regenerating & serving up huge sitemaps.
SEO Myth #5: There can be only one
It’s commonly thought that a sitemap is a singleton, that there is one and only one sitemap for each site, or that it’s preferable to have it be that way.
Actually, there is a concept of a Sitemap Index, which is essentially a sitemap of sitemaps.
This lets you break your sitemap out into several smaller sitemaps, so Googlebot can be more efficient about checking them for changes.
Here’s an example sitemap index from this very site:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="sitemap.xsl"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://nystudio107.com/sitemaps-1-section-blog-1-sitemap.xml</loc>
    <lastmod>2023-07-14T16:59:33-04:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://nystudio107.com/sitemaps-1-categorygroup-blogCategories-1-sitemap.xml</loc>
    <lastmod>2019-04-05T09:47:35-04:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://nystudio107.com/sitemaps-1-section-blogIndex-1-sitemap.xml</loc>
    <lastmod>2020-09-02T22:07:56-04:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://nystudio107.com/sitemaps-1-section-homepage-1-sitemap.xml</loc>
    <lastmod>2020-09-02T22:07:56-04:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://nystudio107.com/sitemaps-1-section-plugins-1-sitemap.xml</loc>
    <lastmod>2023-06-13T16:03:43-04:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://nystudio107.com/sitemaps-1-section-pluginsIndex-1-sitemap.xml</loc>
    <lastmod>2020-09-02T22:07:56-04:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://nystudio107.com/sitemaps-1-section-privacyPolicy-1-sitemap.xml</loc>
    <lastmod>2020-09-02T22:07:56-04:00</lastmod>
  </sitemap>
</sitemapindex>
Imagine this scenario: you have a site with a news blog that has thousands of entries, and your site also has a Featured Product page that lists, well, a featured product.
You post a new blog entry once a month but update the Featured Product page every day.
In this scenario, if you had one sitemap, every time you updated the Featured Product page, the “one sitemap to rule them all” would need to get regenerated, too, with the thousands of blog entries that didn’t change.
If, instead, you had a separate sitemap for each logical section of your website, only the sitemap for the Featured Product page would get updated when you change it.
This allows Googlebot to spend its time more efficiently because the sitemaps are more finely grained. It only needs to look at the subset of pages that actually were added or changed.
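To illustrate, with hypothetical file names, URLs, and dates, the Featured Product section’s own sitemap might contain a single entry, and it is the only file that gets regenerated when that page changes (along with its <lastmod> entry in the sitemap index):
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/featured-product</loc>
    <lastmod>2023-07-15T08:00:00-04:00</lastmod>
  </url>
</urlset>
The thousands of blog entries stay untouched in their own sitemap, whose <lastmod> in the index doesn’t move.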
Sometimes a sitemap is just a sitemap
Imagine if your significant other gave you a home “to do” list that they rewrote by hand every week.
Next, imagine if they kept every single thing you’d already done on that list.
In addition to it being a grand waste of their time and yours, it’d also grow to fill several notepads’ worth of irrelevant tasks, and become unwieldy.
This is how you should think of a sitemap.
If you want more on this topic, check out the CraftQuest.io video 3 Myths of Sitemaps that we did.