Avoid duplicate content penalties and improve SEO. Sebastian Cowie covers some of the fundamentals of duplicate content and shares tips and tricks to make sure your site doesn’t suffer from these issues.
Unless you’ve been living under a rock for the past year and a half, you’ll probably have heard of the devastation that Google Panda has left in its wake. Affiliate marketers and SEO agencies using less than kosher tactics have seen their organic visibility plummet, resulting in lost revenue, lost custom and, in extreme cases, the loss of their livelihood.
What is Google Panda?
Google Panda (originally known as Google Farmer) was unleashed upon the world back in February 2011. The prime objective of this update was to address the ever-increasing level of spam prevalent within Google’s index and improve the user experience. We were recently subjected to another data refresh of the Panda update on 20 August, making this latest release Panda 3.9.1.
As a result of this update, there were two major changes to the fundamentals of SEO and website design:
Removal of low-quality link building strategies from article and content farm sites.
- Article sites including: Ezine Articles, GoArticles, and the like took a substantial hit.
- How-to sites, low-quality wikis and unmoderated blogs and forums were also hit particularly hard.
Subsequently, if the majority of your link profile originated from these sites, then you would have seen a decrease in organic visibility. Your site may not have been directly hit by Panda, but because of the drop in authority from these sites you may have been hit in the fallout.
Sites must have unique content that is informative and engaging for users (read: quality content).
- Crack down on sites with duplicate content, content-thin sites, or made-for-AdSense (MFA) sites (affiliate sites and ecommerce platforms were hit particularly hard).
- Sites with an abnormally high ad / content ratio.
- Restrictions on automatically generated pages (auto-blogs / aggregator and ecommerce platforms again).
If you were hit by Panda, as a considerable proportion of webmasters were (11.8 per cent of queries in the United States were affected by Panda 1.0), then read on and discover how you can improve your on-site SEO.
Developer tips and tricks
Design and development agencies and SEO agencies have often been seen as two entirely separate entities, with neither holding much regard for the other because of the changes in process that each party required. However, as SEO and optimised platforms are quickly becoming the norm, most agencies are now factoring in some, if not all, elements of the design, development and optimisation process when creating a new build.
Owing to the uproar surrounding the Panda update, Google published a guide on “building high quality sites” – targeted primarily towards content quality, but it also broaches a number of points that designers and developers need to take into consideration:
Does the site have duplicate, overlapping, or redundant articles on the same or similar topics with slightly different keyword variations?
We all know that duplicate content is the enemy, but further scrutiny is now in place for sites that repeat elements of content across multiple pages without providing additional information or enhancing the user experience.
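To get a rough feel for how much two of your own pages overlap, you can compare their text with word shingles and a Jaccard score. The sketch below is only an illustration of that general technique; it is not how Panda measures duplication, and the three page texts are invented for the example.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word sequences ("shingles") in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b, size=3):
    """Share of shingles the two texts have in common (0.0 = none, 1.0 = identical)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


# Hypothetical page copy, invented for this example.
page_a = "Black leather Oxford shoe with flat laces, available in sizes 7 to 10."
page_b = "Brown leather Oxford shoe with flat laces, available in sizes 7 to 10."
page_c = "Our returns policy lets you send back unworn shoes within 30 days."

print(round(jaccard(page_a, page_b), 2))  # 0.83: only the colour differs
print(jaccard(page_a, page_c))            # 0.0: no shared three-word sequences
```

A high score between two URLs suggests they are candidates for consolidation or a canonical tag; where you draw the line is a judgement call for your own site.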
Ecommerce example and tips
Ecommerce platforms are particularly vulnerable to Panda, as more often than not, the platform that you’re using will have multiple paths or query strings that will enable you to get to the same product and in some instances the same product with a minor modification.
Let’s pretend you’re running an online shoe shop. Your shop has multiple variables for each shoe, including colour, size and even shoe lace type. Your URL structure and the number of possible duplicate pages depend on how your ecommerce platform handles these variables.
Most modern ecommerce platforms have the capacity to handle the above scenario correctly, although you may have to purchase or install ‘SEO’ plug-ins. I am finding that sites on dated platforms, or on ones that have been poorly configured or installed, often end up indexed under a number of URLs:
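To see how quickly those variables multiply, here is a minimal sketch that lists every URL a parameter-driven platform could expose for one shoe. The domain, the parameter names and the option values are all hypothetical; your platform will use its own.

```python
from itertools import product
from urllib.parse import urlencode

# Hypothetical option values for a single shoe; real catalogues will differ.
OPTIONS = {
    "colour": ["black", "brown", "tan"],
    "size": ["7", "8", "9", "10"],
    "laces": ["flat", "round"],
}


def variant_urls(base_url, options):
    """Return one URL per combination of option values."""
    names = list(options)
    urls = []
    for values in product(*(options[name] for name in names)):
        query = urlencode(dict(zip(names, values)))
        urls.append(f"{base_url}?{query}")
    return urls


urls = variant_urls("https://shop.example/shoes/oxford", OPTIONS)
print(len(urls))  # 24: 3 colours x 4 sizes x 2 lace types
print(urls[0])    # https://shop.example/shoes/oxford?colour=black&size=7&laces=flat
```

That is 24 addresses for what is essentially one product page, and the figure grows further if the platform also accepts the same parameters in a different order.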
In certain instances, if you were to start your search based on shoe size and colour including laces, you may find the following or similar URL structure.
How is this relevant to Panda?
While Google is getting better at differentiating between query strings and parameters that offer the user something new and those that merely track actions, it’s still not perfect. Help is at hand, however.
There is a range of potential ways to resolve duplication within your site and help Google along its merry way.
Implementation of canonical tags
While the canonical tag is only a hint and not a directive, most major search engines will attempt to utilise the data within this tag.
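As a minimal sketch of what this could look like in practice, the snippet below strips variant-only parameters from a URL and builds the canonical link element for the page head. The parameter names and the shop.example domain are assumptions made for the example; which parameters are safe to drop depends entirely on your own site.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these parameters only select a product variant and do not
# deserve an indexed page of their own.
VARIANT_PARAMS = {"colour", "size", "laces"}


def canonical_url(url, drop=VARIANT_PARAMS):
    """Return the URL with variant-only parameters and any fragment removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in drop
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_tag(url):
    """Build the link element that points search engines at the preferred URL."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}">'


print(canonical_tag("https://shop.example/shoes/oxford?colour=black&size=9&laces=flat"))
# <link rel="canonical" href="https://shop.example/shoes/oxford">
```

Every variant of the page then carries the same tag, so the search engine has a single preferred URL to consolidate them under.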
“Noindex, Follow” meta implementation or Robots.txt
To help PageRank (PR) flow through the site, the “noindex, follow” meta tag should be used in preference to robots.txt wherever possible.
This may not be possible in all instances, in which case an instruction within robots.txt is a suitable alternative.
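Both options can be sketched in a few lines. The snippet below returns the meta tag for a duplicate view, then uses Python’s standard robots.txt parser to check what a sample rule would block. The /search/ path, the domain and the `robots_meta` helper are hypothetical and only there for illustration.

```python
from urllib.robotparser import RobotFileParser

NOINDEX_FOLLOW = '<meta name="robots" content="noindex, follow">'


def robots_meta(is_duplicate_view):
    """Return the meta tag for a duplicate view, or an empty string otherwise."""
    return NOINDEX_FOLLOW if is_duplicate_view else ""


# Fallback: a hypothetical robots.txt that blocks a filtered-search path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(robots_meta(True))
print(parser.can_fetch("*", "https://shop.example/search/?size=9"))  # False
print(parser.can_fetch("*", "https://shop.example/shoes/oxford"))    # True
```

The difference matters because a page blocked in robots.txt is not crawled at all, so the links on it cannot be followed, whereas a “noindex, follow” page stays out of the index but can still be crawled.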
Specifying URL parameters within GWT
An effective way to remove duplicate content fast is to utilise the URL parameters feature within Google Webmaster Tools (GWT). Selecting ‘No’ for a parameter effectively tells Google that the content under it is duplicated, while ‘Yes’ means that Google should be indexing the content under that parameter.
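Before you work through the settings, it can help to know which parameters actually appear on your site. The following is a small sketch that counts parameter names across a list of crawled URLs; the URLs are made up, and in practice you would feed in an export from your own crawl or server logs.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def parameter_counts(urls):
    """Count how many URLs each query-string parameter name appears on."""
    counts = Counter()
    for url in urls:
        query = urlsplit(url).query
        names = {name for name, _ in parse_qsl(query, keep_blank_values=True)}
        counts.update(names)
    return counts


# Hypothetical crawl output, for illustration only.
crawled = [
    "https://shop.example/shoes/oxford?colour=black&size=9",
    "https://shop.example/shoes/oxford?colour=brown&sessionid=abc123",
    "https://shop.example/shoes/brogue?size=8&sessionid=def456",
]

for name, count in parameter_counts(crawled).most_common():
    print(name, count)
```

A tracking parameter such as the hypothetical sessionid above is a likely candidate for ‘No’, while anything that genuinely changes what the user sees deserves a closer look.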
GWT is a developer’s best friend.
Looking at the duplicate title / content feature:
Design and content tips to increase trustworthiness
Grammar and spelling
Although not a direct design issue, poor spelling and grammar could account for millions of pounds of lost sales around the globe, according to this BBC News article. Ask yourself, would you purchase something from a site with poor spelling and grammar? Or would you associate it with a possible scam or offshore site, thereby reducing the trust factor of the site substantially?
Google Panda misconceptions
‘Create good quality content and you’ll rank well in SERPs’
SEO evangelists are often ringing the ‘build good quality content’ bell and the theory behind generating high quality content that engages users is sound. However, the philosophical question ‘If a tree falls in a forest and no one is around to hear it, does it make a sound?’ rings true in this instance.
If you neglect to promote your content through social or traditional (read: link building) mediums and have no existing user base, then creating good quality content simply isn’t enough.
‘Panda looks at the amount of content above the fold’
It’s all too easy to bundle the Panda update with the ‘Page layout algorithm’ update because they focus on content and user-experience; however, they were separate updates.
Have you been affected by either duplicate content or the latest rollout of Panda? Were you hit by a previous release and are still struggling to recover, or have you seen a recovery over the past few days? If so, I’d love to hear what you’ve tried so far to revive your site from any penalty it may have been suffering from.