How Does Google Decide What Is Duplicate Content?
I was watching a very interesting webinar yesterday by Leslie Rohde about the effects of the Panda algorithm update and how Google decides what is duplicate content on a website.
This aspect of the Panda update had a major effect on a large number of sites, some of whose owners just could not work out what they had done wrong when their pages were full of original, carefully researched content.
For many of us, avoiding duplicate content means three things: not copying or plagiarising other people’s work; not having large numbers of pages that are essentially the same but with just the town or location changed, a favourite trick of many local search engine optimisers; and not building an entire website by ‘borrowing’ articles from a large number of other sites without changing the angle, adding any value to the content or giving any credit back to the original author.
Leslie went through the process of how the search engines decide what constitutes a low quality site and the concept of Panda, which Google has said was named after one of its engineers, Navneet Panda.
What they did was to ask humans a series of questions – Does the site have too many advertisements? Is there valuable and relevant content? Would you trust this site with your credit card details? Then they tried to develop an algorithm which allowed their computers to produce the same results. The problem is that, invariably, there is more than one answer to any question – particularly when it comes to humans.
So, for example, because the chap in the video had a friend over six feet tall whom he trusted, he made the assumption ‘All men that are six feet tall are good people’. Cue Osama Bin Laden. Oops. Ok, so ‘all men that are six feet tall and have a beard are bad people.’ No, wrong again – Abraham Lincoln. Assessments of standards cannot be left to an algorithm; there are too many variables.
He then went on to address how Google decides what is and isn’t duplicate content. Curiously, they don’t seem to just check the entire page; they look at the snippet. And that’s why many websites were adversely affected by the Panda update when they had done nothing wrong. Most of us would assume that the snippet is the information placed in the meta description of a page, but we would be wrong. Google takes the snippet pretty much from wherever it likes on the page.
This means that if you have quoted a large chunk of someone else’s work word for word, even as a reference which you then dissect with your own views, and that’s the bit Google has decided to take as the snippet on both your page and the original, then – in the eyes of the search engine – your page is duplicate content.
When Google produces a SERP (search engine results page) for a particular query, sometimes at the bottom you will see a note from Google to the effect that they have some more pages but they’re not showing them to you because they are essentially the same as the ones they have given you. These are often pages which have the same snippet.
If there are 400 pages talking about a specific type of camera and the snippet for each is the same, then 399 pages are going to lose out when it comes to appearing on the SERPs – and, usually, Google will take the page with the highest domain authority. The other 399 will be left with a black mark against their site… and the more black marks you accumulate, the more likely it is that your site will be affected by the Panda algorithm.
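To make that idea concrete, here is a rough sketch in Python of the filtering as it was described in the webinar: pages are grouped by snippet, and only the page with the highest authority in each group is shown. This is purely an illustration and not Google’s actual algorithm. The `authority` score, the example URLs, the made-up ‘X100’ camera and the function names are all assumptions of mine.

```python
from collections import defaultdict


def normalise(snippet: str) -> str:
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(snippet.lower().split())


def pick_winners(pages):
    """Group pages by snippet and keep the highest-authority page in each group.

    Returns (shown, filtered): the pages that would be displayed and the
    pages that would be hidden as 'essentially the same'.
    """
    groups = defaultdict(list)
    for page in pages:
        groups[normalise(page["snippet"])].append(page)

    shown, filtered = [], []
    for group in groups.values():
        # Hypothetical rule: the page with the highest authority score wins.
        ranked = sorted(group, key=lambda p: p["authority"], reverse=True)
        shown.append(ranked[0])
        filtered.extend(ranked[1:])
    return shown, filtered


# Made-up example data: two pages share a snippet, one does not.
pages = [
    {"url": "https://bigshop.example/camera", "authority": 90,
     "snippet": "The X100 camera has a 12MP sensor."},
    {"url": "https://myblog.example/review", "authority": 35,
     "snippet": "The X100 camera  has a 12MP sensor."},
    {"url": "https://other.example/hands-on", "authority": 20,
     "snippet": "My week shooting with the X100."},
]

shown, filtered = pick_winners(pages)
print("Shown:", [p["url"] for p in shown])
print("Filtered:", [p["url"] for p in filtered])
```

In this toy example the blog review is filtered out even though the rest of its page might be entirely original, simply because its snippet matches that of a stronger site.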
Originally posted 2011-10-14 11:02:27. Republished by Blog Post Promoter