Sitemaps probably sound familiar because they’re usually that one link you can find in a website’s footer, right? However, that’s not the only sitemap you need to know about. There’s another one that lists all the pages on your website, and it is written for search engines rather than for visitors. As XML Sitemaps are pretty technical, let’s investigate this topic in detail.
What are XML Sitemaps?
An XML Sitemap is an XML file that lists the URLs of a website. You submit it in Google Search Console to tell Google about those URLs and to track their indexation; it’s as simple as that. Each URL can be accompanied by optional tags that give search engines hints about how to treat the page.
Let’s look at an XML Sitemap example from sitemaps.org that uses several of these tags; we’ll analyze them right after.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2005-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>http://www.example.com/catalog?item=12&amp;desc=vacation_hawaii</loc>
    <changefreq>weekly</changefreq>
  </url>
  <url>
    <loc>http://www.example.com/catalog?item=73&amp;desc=vacation_porto</loc>
    <lastmod>2004-12-23</lastmod>
    <changefreq>weekly</changefreq>
  </url>
  <url>
    <loc>http://www.example.com/catalog?item=74&amp;desc=vacation_paris</loc>
    <lastmod>2004-12-23T18:00:15+00:00</lastmod>
    <priority>0.3</priority>
  </url>
</urlset>
Apart from the list of URLs, you can see these other elements, which may be relevant for your SEO strategy:
- Priority: this tells search engines which pages you consider more important relative to the other pages on your site. The value ranges from 0.0 to 1.0 (the default is 0.5), with 1.0 usually reserved for your most crucial page: your Home. It has no influence over rankings, and Google’s documentation states that Google ignores this value, so treat it as a hint for other search engines at most.
- Changefreq: this tells search engines how often the page is likely to change (always, hourly, daily, weekly, monthly, yearly, or never), so crawlers can visit frequently updated pages more often than others. It is a hint rather than a command, and Google’s documentation states that Google ignores it as well.
- Lastmod: logically, it’s the date the page was last modified, written in W3C Datetime format (for example 2005-01-01 or 2004-12-23T18:00:15+00:00). Google relies on it only when it is consistently accurate, and it matters most for a digital newspaper or something similar where the latest news has a bigger impact.
These tags are optional, but remember that you always have to include the urlset tag to encapsulate the file, a url tag to introduce each URL’s information block, and a loc tag containing the actual URL.
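To make the required and optional tags more concrete, here is a minimal Python sketch that writes a sitemap with the standard library. The function name build_sitemap and the dictionary keys are our own choices for illustration, not part of the sitemap protocol or of any tool mentioned in this article.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML as bytes.

    `entries` is a list of dicts (a hypothetical input shape): 'loc' is
    required, 'lastmod', 'changefreq' and 'priority' are optional.
    """
    # <urlset> encapsulates the whole file.
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        # One <url> block per page, with a mandatory <loc>.
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = entry["loc"]
        # Optional hints are only written when they were provided.
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    # ElementTree escapes "&" as "&amp;" for us.
    # The xml_declaration argument needs Python 3.8 or newer.
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    pages = [
        {"loc": "http://www.example.com/", "lastmod": "2005-01-01",
         "changefreq": "monthly", "priority": 0.8},
        {"loc": "http://www.example.com/catalog?item=12&desc=vacation_hawaii",
         "changefreq": "weekly"},
    ]
    print(build_sitemap(pages).decode("utf-8"))
```

Letting the XML library do the escaping is the main point here: an ampersand in a URL has to appear as &amp;amp; inside the file, and that is easy to forget when sitemaps are assembled by hand.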
Now you know that the XML Sitemap has nothing to do with the one on the footer that lists some links, right? That one is called an HTML Sitemap.
What’s the URL limit on an XML Sitemap?
A single XML Sitemap can contain at most 50,000 URLs and must not exceed 50 MB (52,428,800 bytes) uncompressed. If you’re working with huge websites, you’ll have to split them into several files, which can also be split by category to keep your web pages more organized. It’s good to have them separated, so this URL limit is actually a good excuse to organize your sets of pages.
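As a rough illustration of the splitting described above, the following Python sketch chunks a URL list at the 50,000 URL limit and, optionally, groups URLs by category first. Treating the first path segment as the category is purely an assumption for this example; your site may define categories differently.

```python
from collections import defaultdict
from urllib.parse import urlparse

MAX_URLS_PER_SITEMAP = 50_000


def split_into_sitemaps(urls, limit=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most `limit` URLs."""
    return [urls[i:i + limit] for i in range(0, len(urls), limit)]


def group_by_category(urls):
    """Group URLs by their first path segment (an assumed category rule).

    URLs without a path segment, such as the home page, go under "root".
    """
    groups = defaultdict(list)
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        groups[segments[0] if segments else "root"].append(url)
    return dict(groups)


if __name__ == "__main__":
    urls = [f"http://www.example.com/shoes/{i}" for i in range(120_001)]
    # Each category is chunked separately, so no file mixes categories.
    for category, members in group_by_category(urls).items():
        for number, chunk in enumerate(split_into_sitemaps(members), start=1):
            print(f"sitemap-{category}-{number}.xml: {len(chunk)} URLs")
```

Grouping before chunking keeps each file tied to a single section of the site, which is what makes a drop in indexed pages easy to trace back to one category.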
But why is it good to separate pages into sets? Thanks to segmented XML Sitemaps, you can spot indexability issues easily on Google Search Console. If you see a category losing indexed pages, you’ll be able to pinpoint the problem more easily in a smaller section than when you see a global drop and can’t tell which pages are affected by looking at the whole picture.
Similarly, if you analyze your pages by category, it is easier to spot trends, so you can identify the strategy or technology affecting that group and apply it to the other pages.
Please find more SEO tips about how to optimize your XML sitemap in our blog post.
When do you need an XML Index Sitemap?
So, based on what we mentioned above, when you have a large website that needs multiple sitemaps because there are more than 50,000 URLs, you’ll need to upload a sitemap index as well. Here is an XML Index Sitemap example for two sitemaps.
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://www.example.com/sitemap1.xml.gz</loc>
    <lastmod>2004-10-01T18:23:17+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://www.example.com/sitemap2.xml.gz</loc>
    <lastmod>2005-01-01</lastmod>
  </sitemap>
</sitemapindex>
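If you generate your sitemaps with a script, the index can be produced the same way. The Python sketch below builds a sitemap index from a list of sitemap URLs; build_sitemap_index is a hypothetical helper name, and using one shared lastmod value for every entry is a simplification made for this example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls, lastmod=None):
    """Return sitemap index XML as bytes.

    `lastmod` is optional and, in this simplified sketch, is applied to
    every listed sitemap.
    """
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        # Each child sitemap gets its own <sitemap> block with a <loc>.
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = sitemap_url
        if lastmod is not None:
            ET.SubElement(sitemap, "lastmod").text = lastmod
    # The xml_declaration argument needs Python 3.8 or newer.
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    files = [
        "http://www.example.com/sitemap1.xml.gz",
        "http://www.example.com/sitemap2.xml.gz",
    ]
    print(build_sitemap_index(files, lastmod="2005-01-01").decode("utf-8"))
```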
Which websites need an XML Sitemap?
Basically, every website should have at least one XML Sitemap, from the smallest to the biggest, and especially the biggest. It helps you keep track of how many pages make up a website, and you can compare this number of URLs to the number that is actually being indexed!
There are also other types of sitemaps that help search engines discover images or videos.
Which pages should be in your XML Sitemap?
We keep saying that you have to list all your URLs in the XML file, but this is assuming that those pages are actually relevant to your strategy. Before listing any URL, you’ll want to run a website crawl with FandangoSEO to identify the pages that return a 200 HTTP Response Code (meaning, they are OK).
Avoid listing pages that return a 404 Not Found error or a 301 or 302 redirect, because that will confuse search engines and, you know, nobody wants to do that.
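Once you have crawl results, the filtering itself is simple. The Python sketch below assumes the crawl data is available as (URL, status code) pairs; that input shape is hypothetical and is not the export format of FandangoSEO or any other crawler.

```python
def indexable_urls(crawl_results):
    """Keep only the URLs that answered with a 200 status code.

    `crawl_results` is an iterable of (url, status_code) pairs, which is
    an assumed shape for this example.
    """
    return [url for url, status in crawl_results if status == 200]


if __name__ == "__main__":
    crawl = [
        ("http://www.example.com/", 200),
        ("http://www.example.com/old-page", 301),
        ("http://www.example.com/promo", 302),
        ("http://www.example.com/missing", 404),
        ("http://www.example.com/catalog", 200),
    ]
    for url in indexable_urls(crawl):
        print(url)
```

Redirects and missing pages are dropped, so only the pages that answer with 200 make it into the XML file.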
How to make Google find an XML Sitemap?
Once you’ve created the file or files, you can submit the XML Sitemap in Google Search Console (formerly Google Webmaster Tools) to kick off the indexation tracking and to spot any major drop or increase in indexed pages (hopefully it’s the second one 😉).
Remember that one of the first things a search engine does when it finds a website is check the robots.txt file, so if you want to make sure that your sitemap is easily found, you can also add the sitemap URL there on a line that starts with Sitemap:.
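To check that the sitemap reference in robots.txt is readable by a parser, you can use Python’s standard urllib.robotparser module. In the sketch below, the robots.txt content and the sitemap_index.xml file name are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content; the sitemap URL is a made-up placeholder.
ROBOTS_TXT = """\
User-agent: *
Allow: /

Sitemap: http://www.example.com/sitemap_index.xml
"""


def sitemaps_from_robots(robots_txt):
    """Return the sitemap URLs declared in a robots.txt string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() (Python 3.8+) returns None when no Sitemap line exists.
    return parser.site_maps() or []


if __name__ == "__main__":
    print(sitemaps_from_robots(ROBOTS_TXT))
```

If the list comes back empty, the Sitemap line is missing or malformed, which is worth fixing before you rely on crawlers discovering the file on their own.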
Generate XML Sitemaps
There’s a lot of information in this article, and it all sounds quite overwhelming; we know that. That’s why FandangoSEO has created an easy cloud-based XML Sitemaps generator, which builds them automatically, and in just a few clicks, they’re ready to be uploaded! Oh, and don’t worry about the 50,000 URL limit: once this number of pages is reached, it’ll jump to another XML file, and you’ll have the entire website listed correctly in the blink of an eye!