Duplicate content is a problem for SEO, although, as we shall see, not for the reason usually assumed. In this article we look at its real consequences and how to avoid them. Follow our guidelines to get rid of it and protect your SEO efforts.
First, let’s start with the basics.
What is duplicate content?
According to Google’s definition, duplicate content refers to pieces of content within or between domains that match entirely or are remarkably similar. Put another way, it is the same content living at more than one URL. The next questions are what kinds of duplicate content exist and why they arise, which is what we address next.
How is duplicate content defined?
From Google’s definition, we infer that there are two types of duplicate content:
- Duplicate content within the same website.
- Duplicate content found between two different domains. In this case, Google treats it as scraped (copied) content.
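As a rough illustration of what "highly similar" can mean in practice, the sketch below compares two texts by the overlap of their three-word sequences (shingles) and returns a Jaccard score between 0 and 1. This is only one common way to estimate similarity, not Google's method; the function names and the shingle size are assumptions chosen for the example.

```python
from __future__ import annotations

import re


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word n-grams (shingles) in the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        # Short texts become a single shingle (or nothing if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a: str, b: str, size: int = 3) -> float:
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)
```

Pages that score close to 1.0 against each other are candidates for a closer look; where to draw the line is a judgment call for your own site.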
Common causes of duplicate content
Duplicate content typically arises for the following reasons:
Different URLs with the same content
If we create the same content at different URLs, whether deliberately or by mistake, search engines will see it as duplicate content.
Session IDs in the URL
This issue often arises when each visitor is assigned a session ID that is stored in the URL, typically to track the user or to support a shopping cart. Every visitor then sees the same page under a different URL.
The CMS does not use clean URLs
The home page is the main page of a website, since it is the one with the most links, both internal and external. With so many inlinks and outlinks, it is essential to keep its URL clean. Sometimes the CMS generates URLs with parameters that point to the home page, which creates duplicate content.
Some CMSs such as WordPress include the option to paginate comments, resulting in duplicate content if the number of comments is large.
Inconsistent URL structure / URL parameters used for sorting and tracking
Pay attention to these URLs:
To you they look the same, and in fact they lead to the same destination. But search engine bots don’t see it that way and treat them as different URLs. So, when bots find the same content at two URLs, they treat it as a duplicate.
The same problem arises with tracking parameters in URLs, which can also be identified as duplicate content.
This is the reason why you need to pay special attention to your URL structure.
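To make this concrete, here is a minimal sketch of URL normalization using Python's standard `urllib.parse`. The function name and the rules it applies (lowercase scheme and host, drop default ports and fragments, remove the trailing slash) are assumptions for illustration; your site may prefer a different trailing-slash policy. Paths are left case-sensitive, and IPv6 hosts are not handled.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Reduce trivially different spellings of a URL to one form."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()

    # Keep the port only when it is not the default for the scheme.
    default_ports = {"http": 80, "https": 443}
    if parts.port and parts.port != default_ports.get(scheme):
        host = f"{host}:{parts.port}"

    # Assumed policy: no trailing slash, except for the root path.
    path = parts.path.rstrip("/") or "/"

    # The fragment (#...) never reaches the server, so it is dropped.
    return urlunsplit((scheme, host, path, parts.query, ""))
```

Under these rules, the hypothetical URLs `HTTPS://Example.com:443/Shoes/` and `https://example.com/Shoes` normalize to the same string, which is the form you would then use consistently in internal links.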
www vs. non-www / http vs. https
This is an old problem, but search engines can still get confused when a website is accessible both with and without www. The same happens, to a lesser extent, with HTTP and HTTPS.
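A site reachable under all four combinations (http or https, with or without www) should settle on one and redirect the rest. The sketch below computes where a given URL should be redirected; `PREFERRED_SCHEME`, `PREFERRED_HOST`, and the `example.com` domain are hypothetical values, and in practice the redirect itself would be configured in your web server or CMS rather than in a script like this.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

PREFERRED_SCHEME = "https"           # assumption: the site prefers HTTPS
PREFERRED_HOST = "www.example.com"   # hypothetical preferred host


def bare(host: str) -> str:
    """Strip a leading 'www.' so both host variants compare equal."""
    return host[4:] if host.startswith("www.") else host


def redirect_target(url: str) -> Optional[str]:
    """Return the preferred URL to redirect to, or None if no redirect applies.

    None is returned both when the URL is already in the preferred form
    and when it belongs to a different site.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if bare(host) != bare(PREFERRED_HOST):
        return None  # a different site: not ours to redirect
    if parts.scheme.lower() == PREFERRED_SCHEME and host == PREFERRED_HOST:
        return None  # already the preferred variant
    return urlunsplit(
        (PREFERRED_SCHEME, PREFERRED_HOST, parts.path or "/", parts.query, "")
    )
```

For example, `redirect_target("http://example.com/shoes?size=9")` yields the https and www variant of the same path and query, while the preferred URL itself yields `None`.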
Use of parameters in e-commerce websites
This happens when parameters for search, pagination, session IDs, filtering, or sorting are added to URLs. The product page is the same, but Googlebot sees different URLs. Let’s see it with an example:
When creating URLs for tracking purposes, this problem also occurs.
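As an illustration, the sketch below strips parameters that do not change the page content and sorts the remaining ones, so that several variants of a product URL collapse into one. The parameter names in `IGNORED_PARAMS` and the shop URL are hypothetical; which parameters are safe to drop depends entirely on your own site. Pagination parameters are kept here on the assumption that each page shows different content.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that do not change what the page shows.
IGNORED_PARAMS = {"sessionid", "sid", "sort", "order", "ref"}


def strip_params(url: str) -> str:
    """Drop session, sorting, and tracking parameters; sort the rest."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
        and not key.lower().startswith("utm_")  # common tracking prefix
    ]
    kept.sort()  # a stable order, so ?a=1&b=2 and ?b=2&a=1 match
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )


variants = [
    "https://shop.example.com/shoes?color=red&sort=price",
    "https://shop.example.com/shoes?sessionid=abc123&color=red",
    "https://shop.example.com/shoes?utm_source=news&color=red",
]
# All three variants reduce to the same URL.
cleaned = {strip_params(u) for u in variants}
```

The single URL left in `cleaned` is the natural candidate for the canonical version of the page.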
Why is duplicate content bad for SEO?
Generating duplicate content is bad for SEO for two main reasons:
- Search engines do not know which version to show in the results, so the versions compete with each other and all of them perform worse. Within a single site this harms your SEO, although it does not lead to a penalty from Google.
- Search engines also struggle to consolidate link metrics (page authority, trust, and relevance) for that content, especially when other sites link to several versions of it.
Can I receive a Google penalty for duplicate content?
If you generate duplicate content within the same site, your SEO will be harmed, but you will not receive a penalty from Google. This dispels a common myth, since it is often thought that the search engine ‘punishes’ sites that contain duplicate content.
However, the search engine can penalize you if you take content from someone else’s site. In this case, as we have already pointed out, we are talking about “copied content” rather than duplicate content, since the content is not really ‘duplicated’ but plagiarized.
How to prevent duplicate content from becoming a problem
Now that we know what the problem is, here are the solutions you can apply:
- First things first, audit your content with a Duplicate Content Checker to detect pages with highly similar content.
- Make sure your URLs are consistent, and signal your preferred domain through redirects and canonical tags. Google Search Console used to offer a preferred domain setting, but it has since been retired.
- Create original content for every page that you want to appear in the rankings.
- If you cannot avoid keeping pages with very similar content, add a canonical tag pointing to the preferred page so that Google can identify it.
- Keep irrelevant pages out of the index with the ‘noindex’ meta tag. Note that robots must still be able to crawl the page in order to see the tag.
- Eliminate the pages that are not of interest and redirect them to those that are. This last point is fundamental, since if you don’t assign a redirect, robots may encounter a 404 error the next time they try to crawl those URLs.
- Establish 301 redirects from the non-preferred URLs to the preferred URLs to alert the search engine to your preferences.
- Use the removals tool in Google Search Console to hide URLs from search results more quickly. The removal is temporary, so combine it with one of the permanent fixes above.
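To check that the canonical and noindex signals from the list above are actually present on a page, you can inspect its HTML. The following is a minimal sketch using Python's built-in `html.parser`; the class and function names are made up for this example, and the page snippet is hypothetical. It only reads the `<link rel="canonical">` and robots meta tags, and does not look at HTTP headers.

```python
from html.parser import HTMLParser


class IndexingSignals(HTMLParser):
    """Collect the canonical URL and any noindex directive from a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attributes = {name: (value or "") for name, value in attrs}
        rel = attributes.get("rel", "").lower().split()
        if tag == "link" and "canonical" in rel:
            # If a page declares several canonicals, the last one wins here.
            self.canonical = attributes.get("href") or None
        elif tag == "meta" and attributes.get("name", "").lower() in (
            "robots",
            "googlebot",
        ):
            content = attributes.get("content", "").lower()
            directives = [d.strip() for d in content.split(",")]
            if "noindex" in directives:
                self.noindex = True


def indexing_signals(html: str):
    """Return (canonical_url_or_None, noindex_flag) for an HTML document."""
    parser = IndexingSignals()
    parser.feed(html)
    parser.close()
    return parser.canonical, parser.noindex


# Hypothetical page head used only to demonstrate the function.
page = (
    '<html><head>'
    '<link rel="canonical" href="https://www.example.com/shoes">'
    '<meta name="robots" content="noindex, follow">'
    '</head><body></body></html>'
)
canonical, noindex = indexing_signals(page)
```

A page that carries both a canonical tag and noindex, as in this snippet, sends mixed signals, so an audit script like this is a quick way to spot such conflicts.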