Andrew Welch · Insights · #SEO #Sitemaps #Myths

SEO Myths: Top 5 Sitemap Myths Demystified

Sitemaps have been around since 2005, but there are still many myths and mis­con­cep­tions sur­round­ing them. We’ll demys­ti­fy five SEO myths in this article.

Sitemaps have been around since 2005, when Google intro­duced them as a way to let Google­bot know what pages it should crawl to index your site.

Since then, they’ve been wide­ly adopt­ed by search engines and oth­er tools that want to con­sume the web, and cod­i­fied in the Sitemap pro­to­col.

Despite this wide­spread adop­tion, sitemaps are often mis­un­der­stood in terms of what they actu­al­ly do.

This arti­cle will dis­cuss some com­mon mis­con­cep­tions about sitemaps, and demys­ti­fy what they actu­al­ly do.

We’ll use the term “Google” generically to refer to anything that might want to consume your sitemaps.

SEO Myth #1: Sitemaps are orders

Sitemaps, like most sig­nals to search engines, are not orders or edicts that Google must fol­low explicitly.

Instead, they are hints that a search engine can choose to use, or ignore entirely.

First, Google has to know about your sitemap by one of the fol­low­ing methods:

  • Find­ing it in the usu­al loca­tion of /sitemap.xml
  • Find­ing it in the loca­tion spec­i­fied in your /robots.txt file
  • Being told its location when you submit it via Google Search Console

Then, on its own time, Google will digest the sitemap, and poten­tial­ly use it to pri­or­i­tize index­ing (or re-index­ing) of the pages spec­i­fied in the sitemap.

It’s still entire­ly up to the whims of Google’s black box algo­rithm what to do with them.
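To make the /robots.txt method a bit more concrete, here’s a minimal Python sketch of how a crawler might discover sitemap locations from that file. It uses the standard library’s urllib.robotparser; the robots.txt content and the example.com URL are made up for illustration, and this is certainly not how Googlebot itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only
ROBOTS_TXT = """\
User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml
"""


def sitemap_urls(robots_txt: str) -> list[str]:
    """Return every sitemap URL declared via a Sitemap: line."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap: lines are present
    return parser.site_maps() or []


discovered = sitemap_urls(ROBOTS_TXT)
```

Note that all this gives a crawler is a location. What it does with the sitemap afterwards is still entirely up to the crawler.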

SEO Myth #2: Your site must have a sitemap

While having a sitemap is never a bad thing, it isn’t a requirement, and in some cases it isn’t even particularly helpful.

Here’s when Google says you might need a sitemap:

  • Your site is large. Gen­er­al­ly, on large sites it’s more dif­fi­cult to make sure that every page is linked by at least one oth­er page on the site. As a result, it’s more like­ly Google­bot might not dis­cov­er some of your new pages.
  • Your site is new and has few exter­nal links to it. Google­bot and oth­er web crawlers crawl the web by fol­low­ing links from one page to anoth­er. As a result, Google­bot might not dis­cov­er your pages if no oth­er sites link to them.
  • Your site has a lot of rich media con­tent (video, images) or is shown in Google News. Google can take addi­tion­al infor­ma­tion from sitemaps into account for Search.

Here’s when Google says you might not need a sitemap:

  • Your site is “small”. By small, we mean about 500 pages or fewer on your site. (Only pages that you think need to be in search results count toward this total.)
  • Your site is com­pre­hen­sive­ly linked inter­nal­ly. This means that Google can find all the impor­tant pages on your site by fol­low­ing links start­ing from the home page.
  • You don’t have many media files (video, image) or news pages that you want to show in search results. Sitemaps can help Google find and under­stand video and image files, or news arti­cles, on your site. If you don’t need these results to appear in Search you might not need a sitemap.

Indeed, in terms of page dis­cov­ery, sitemaps are sec­ond to exter­nal and inter­nal links.

SEO Myth #3: If a URL is not in your Sitemap, it isn’t indexed

As not­ed above, Google usu­al­ly will find pages on your web­site to index regard­less of whether they are in a sitemap or not.

Which brings up an incred­i­bly impor­tant point about sitemaps: sitemaps are about dis­cov­er­abil­i­ty, not indexing.

If a URL is indexed by Google already, remov­ing it from your sitemap does not remove it from Google’s index.

The term “sitemap” implies that it’s a canonical map of your website, whereas the reality is that a sitemap is closer to a Googlebot “to do” list.

The URLs in a sitemap are things Google may keep in mind when it pri­or­i­tizes index­ing or re-index­ing the pages on your website.

If you’re con­stant­ly telling Google (via your sitemap) to pri­or­i­tize index­ing pages that haven’t changed, and are already in its index, it’s not actu­al­ly accom­plish­ing anything.

SEO Myth #4: Every URL on your website must be in your sitemap

Giv­en that we’ve already learned that sitemaps are about dis­cov­er­abil­i­ty, not about index­ing, it should be clear that it isn’t nec­es­sary to have every URL on your web­site in your sitemap.

Once Google has indexed a page, it no longer needs to be in your sitemap until and unless the page is updat­ed and needs to be re-indexed.

If you have a large web­site with thou­sands of entries in the sitemap, there are some neg­a­tive effects:

  • The sitemap can be resource-inten­sive for your serv­er to gen­er­ate and regen­er­ate when a page changes
  • The sitemap will be very large, requiring more of Googlebot’s time to download
  • The large sitemap will then also require more of Google­bot’s time to parse and tra­verse through
  • Most of the sitemap will usu­al­ly be ignored by Google­bot anyway

For exact­ly this rea­son, tools like SEO­mat­ic for Craft CMS have a Sitemap Lim­it set­ting to lim­it the num­ber of entries in a giv­en sitemap to only the XX most recent­ly changed pages:

SEO­mat­ic Sitemap Lim­it setting

Oth­er SEO tools have sim­i­lar func­tion­al­i­ty. Using it, you can lim­it the entries in the sitemap to only the XX most recent­ly changed pages.

What this does is make Google­bot’s crawl bud­get more effec­tive by giv­ing it a small­er sitemap to parse through, and only pro­vid­ing it new pages or recent­ly changed pages, which it should pri­or­i­tize indexing.

You should like­ly start uti­liz­ing a Sitemap Lim­it once you have over a few hun­dred entries in a giv­en sitemap, and set it to some­thing rea­son­able based on how fre­quent­ly new pages are added or exist­ing pages updat­ed with mean­ing­ful changes.

For the vast major­i­ty of web­sites, set­ting this to 50 or low­er is all you’ll ever need.

This opti­mizes Google­bot’s crawl bud­get, and reduces the load on your serv­er con­stant­ly regen­er­at­ing & serv­ing up huge sitemaps.
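If you’re curious what a limit like this might look like in practice, here’s a rough Python sketch that builds a sitemap from only the most recently changed pages. This is not SEOmatic’s actual implementation (SEOmatic is a Craft CMS plugin); the build_limited_sitemap function, the pages list, and the example.com URLs are all hypothetical names made up for illustration.

```python
from datetime import datetime, timezone
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_limited_sitemap(pages, limit=50):
    """Return sitemap XML for only the `limit` most recently changed pages.

    `pages` is an iterable of (url, last_modified) tuples.
    """
    # Sort newest first, then keep only the first `limit` entries
    recent = sorted(pages, key=lambda page: page[1], reverse=True)[:limit]
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified in recent:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "lastmod").text = last_modified.isoformat()
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages; a real site would pull these from its CMS
pages = [
    ("https://example.com/blog/old-post", datetime(2019, 4, 5, tzinfo=timezone.utc)),
    ("https://example.com/blog/new-post", datetime(2023, 7, 14, tzinfo=timezone.utc)),
    ("https://example.com/featured-product", datetime(2023, 7, 15, tzinfo=timezone.utc)),
]
sitemap_xml = build_limited_sitemap(pages, limit=2)
```

With a limit of 2, the oldest page is left out of the generated sitemap. If it’s already in Google’s index, leaving it out doesn’t remove it from there.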

SEO Myth #5: There can be only one

It’s com­mon­ly thought that a sitemap is a sin­gle­ton, that there is one and only one sitemap for each site, or that it’s prefer­able to have it be that way.

Actu­al­ly, there is a con­cept of a Sitemap Index, which is essen­tial­ly a sitemap of sitemaps.

What this allows you to do is break out your sitemaps into sev­er­al small­er sitemaps, which allows Google­bot to be more effi­cient about pars­ing your sitemap for changes.

Here’s an exam­ple sitemap index from this very site:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="sitemap.xsl"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://nystudio107.com/sitemaps-1-section-blog-1-sitemap.xml</loc>
    <lastmod>2023-07-14T16:59:33-04:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://nystudio107.com/sitemaps-1-categorygroup-blogCategories-1-sitemap.xml</loc>
    <lastmod>2019-04-05T09:47:35-04:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://nystudio107.com/sitemaps-1-section-blogIndex-1-sitemap.xml</loc>
    <lastmod>2020-09-02T22:07:56-04:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://nystudio107.com/sitemaps-1-section-homepage-1-sitemap.xml</loc>
    <lastmod>2020-09-02T22:07:56-04:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://nystudio107.com/sitemaps-1-section-plugins-1-sitemap.xml</loc>
    <lastmod>2023-06-13T16:03:43-04:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://nystudio107.com/sitemaps-1-section-pluginsIndex-1-sitemap.xml</loc>
    <lastmod>2020-09-02T22:07:56-04:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://nystudio107.com/sitemaps-1-section-privacyPolicy-1-sitemap.xml</loc>
    <lastmod>2020-09-02T22:07:56-04:00</lastmod>
  </sitemap>
</sitemapindex>

Imag­ine this sce­nario: you have a site with a news blog that has thou­sands of entries, and your site also has a Fea­tured Prod­uct page that lists, well, a fea­tured product.

You post a new blog entry once a month but update the Fea­tured Prod­uct page every day.

In this scenario, if you had one sitemap, every time you updated the Featured Product page, the “one sitemap to rule them all” would need to get regenerated, too, with the thousands of blog entries that didn’t change.

If, instead, you had a sep­a­rate sitemap for each log­i­cal sec­tion of your web­site, only the sitemap for the Fea­tured Prod­uct page would get updat­ed when you change it.

This allows Google­bot to spend its time more effi­cient­ly because the sitemaps are more fine­ly grained. It only needs to look at the sub­set of pages that actu­al­ly were added or changed.
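Here’s a small Python sketch of why that helps, from the consumer’s point of view: given a sitemap index, a crawler can look at each lastmod and only fetch the child sitemaps that changed since its last visit. The changed_sitemaps function and the cutoff date are assumptions for illustration (we have no idea how Googlebot actually does it); the two entries are taken from the sitemap index shown above.

```python
from datetime import datetime, timezone
from xml.etree import ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Two entries copied from the sitemap index shown above
SAMPLE_INDEX = """\
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://nystudio107.com/sitemaps-1-section-blog-1-sitemap.xml</loc>
    <lastmod>2023-07-14T16:59:33-04:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://nystudio107.com/sitemaps-1-section-homepage-1-sitemap.xml</loc>
    <lastmod>2020-09-02T22:07:56-04:00</lastmod>
  </sitemap>
</sitemapindex>
"""


def changed_sitemaps(index_xml: str, since: datetime) -> list[str]:
    """Return the child sitemap URLs whose lastmod is newer than `since`."""
    root = ET.fromstring(index_xml)
    changed = []
    for entry in root.findall("sm:sitemap", NS):
        loc = entry.findtext("sm:loc", namespaces=NS)
        lastmod = entry.findtext("sm:lastmod", namespaces=NS)
        # Skip entries that are missing either field
        if loc and lastmod and datetime.fromisoformat(lastmod) > since:
            changed.append(loc)
    return changed


# Hypothetical "last crawled" date
last_crawl = datetime(2023, 1, 1, tzinfo=timezone.utc)
to_fetch = changed_sitemaps(SAMPLE_INDEX, last_crawl)
```

In this sketch, only the blog sitemap would need to be fetched and parsed. The homepage sitemap hasn’t changed since 2020, so it can be skipped.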

Sometimes a sitemap is just a sitemap

Imagine if your significant other gave you a home “to do” list that they rewrote by hand every week.

Next, imag­ine if they kept every sin­gle thing you’d already done on that list.

In addi­tion to it being a grand waste of their time and yours, it’d also grow to fill sev­er­al notepads’ worth of irrel­e­vant tasks, and become unwieldy.

This is how you should think of a sitemap.

If you want more on this topic, check out the CraftQuest.io video “3 Myths of Sitemaps” that we did.