We always hear about how Google doesn’t like duplicate content, and will penalize a page that has the same content as another. There are plenty of articles on optimizing sites to avoid having duplicate content internally, and articles ranting about scrapers.
What I want to know is what Google thinks about duplicate content cases such as Reference.com or the Associated Press.
Head over to Reference.com, the encyclopedia branch of the Ask.com network of reference sites. Enter a search term. Now go over to Wikipedia and enter the same term. The articles are identical! Reference.com is pulling Wikipedia articles onto their site and throwing in a few ads. (How are they doing this? Does Wikipedia have some sort of API?) What does Google think of this?
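To partly answer my own aside: Wikipedia's content is freely licensed for reuse, and MediaWiki (the software behind Wikipedia) does expose an API at /w/api.php, plus full database dumps for bulk republishing. I don't know which route Reference.com actually takes, but here's a rough Python sketch of pulling one article's rendered HTML through that API — the article title and the way the output is handled are purely illustrative:

```python
import requests

# Sketch: fetch a single Wikipedia article's rendered HTML via the
# MediaWiki API. A site republishing at scale would more likely work
# from the periodic database dumps instead of per-page API calls.
API_URL = "https://en.wikipedia.org/w/api.php"

params = {
    "action": "parse",   # render one page
    "page": "IPod",      # article title -- example only
    "prop": "text",      # return just the HTML body
    "format": "json",
}

resp = requests.get(API_URL, params=params, timeout=10)
resp.raise_for_status()

# The rendered article HTML lives under parse -> text -> "*"
html = resp.json()["parse"]["text"]["*"]
print(html[:300])  # first few hundred characters of the article
```

Wrap that output in your own template, sprinkle some ads around it, and you've more or less got what Reference.com is serving.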
Or what about Associated Press articles? They're syndicated to many newspapers and appear on each paper's website. That means the same article on multiple sites, no?
Is Google demoting these pages in its results, or are they getting a free pass? It's hard to tell. Reference.com as a whole has a toolbar PageRank of 8, but its iPod article shows N/A (the original Wikipedia article has a PageRank of 7). That would lead us to believe the copies are being demoted. There's not really any way to tell for sure though, is there?
It seems the algorithm is working and filtering out pages like these, but I'd like to know the search giant's official stance on them. Is duplicate content simply duplicate content?