Understanding Duplicate Content and the Various Ways It Can Occur
Over at SEOMoz, we learn right away that duplicate content online confuses search engines. As soon as a search engine finds duplicated text, it has to decide which version to include in its index and which to leave out, which versions to rank higher or lower, and where to direct link metrics. This is an issue not only of canonicalization but also of plagiarism and scraping, since stolen content is so common these days.
So after the search engine identifies multiple copies, it has to work out which one is the original and which is most relevant, because it usually prefers not to show more than one page with the same content (in fact, it rarely does). It will try to determine which version was published first (the original author, judged by date), which has the highest relevancy (perhaps the original author has more content than the scraper sites) and, ultimately, which version is the “best.”
Search engines do penalize plagiarists and scrapers when they are discovered, but they do not penalize republished material when the same author is credited. In fact, you might see a reprint ranked much higher than the original if, for example, the first version appeared on a blog and the second on a respectable news site.
Remember too that URL parameters, printer-friendly pages and session IDs can cause problems, since each can expose the same content under more than one address. What can you do to avoid problems with duplicate content? First, never copy your own content across your own site just to fill space. If you do republish content on article directories (or use several directories for the same article), always use the same author name so that search bots can see you are the same author.
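To make the URL parameter and session ID problem concrete, here is a minimal Python sketch of how several addresses for one article might be collapsed into a single form. The parameter names in IGNORED_PARAMS are hypothetical examples, not a standard list, and the rule of sorting the remaining parameters is just one reasonable convention; adjust both to what your own site actually emits.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names that do not change the page content.
# Replace these with the ones your own site really uses.
IGNORED_PARAMS = {"sid", "sessionid", "phpsessid", "print"}


def normalize_url(url: str) -> str:
    """Return one consistent form for URLs that differ only by ignorable parameters."""
    parts = urlsplit(url)
    # Keep only the query parameters that actually select different content.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    # Sort so that ?a=1&b=2 and ?b=2&a=1 end up identical.
    kept.sort()
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path or "/",
        urlencode(kept),
        "",  # drop the fragment, which never reaches the server
    ))


if __name__ == "__main__":
    for candidate in (
        "http://Example.com/article?sid=abc123&id=7",
        "http://example.com/article?id=7&print=1",
        "http://example.com/article?id=7",
    ):
        print(normalize_url(candidate))
```

All three example addresses reduce to the same string, which is the version you would then link to everywhere.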
Strive to make your pages consistent in canonicalization practice. Make sure that all of the links on your site point to one version of each page, either by redirecting the alternate versions or by correcting the links by hand. Publishing through a CMS can also help keep things standardized throughout.
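Checking links by hand gets tedious, so a small script can flag the ones that point at the wrong version of your own site. The sketch below is only an illustration using Python's standard library: it assumes the only variation you care about is the "www." prefix, and the function name find_inconsistent_links and the example hosts are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def _strip_www(host: str) -> str:
    host = host.lower()
    return host[4:] if host.startswith("www.") else host


def find_inconsistent_links(html: str, page_url: str, preferred_host: str) -> list:
    """Return links to our own site that do not use the preferred host name."""
    collector = LinkCollector()
    collector.feed(html)
    flagged = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        host = urlsplit(absolute).hostname or ""
        same_site = _strip_www(host) == _strip_www(preferred_host)
        if same_site and host != preferred_host.lower():
            flagged.append(absolute)
    return flagged


if __name__ == "__main__":
    page = '<a href="http://example.com/a">A</a> <a href="/b">B</a>'
    print(find_inconsistent_links(page, "http://www.example.com/", "www.example.com"))
```

Each flagged link is a candidate for retyping, and the address it points to is a candidate for a redirect to the preferred version.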
If you need your original content republished on your own site for various purposes, you can tell the search engines so by adding a special tag that informs search bots you want only one page credited for traffic and metrics. You can also instruct search bots to take a specific action on specific duplicate pages, or even to skip them entirely, so that your whole site is not penalized.
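The article does not name the tags it has in mind. A reasonable reading, offered here as an assumption, is the rel="canonical" link element for crediting one page and the robots meta tag for telling bots how to treat a duplicate. The Python sketch below simply builds those two pieces of HTML for the head of a page; the helper names canonical_link_tag and robots_meta_tag are made up for this example.

```python
from html import escape


def canonical_link_tag(preferred_url: str) -> str:
    """Build a <link rel="canonical"> tag naming the page that should get the credit."""
    return '<link rel="canonical" href="{}">'.format(escape(preferred_url, quote=True))


def robots_meta_tag(index: bool = True, follow: bool = True) -> str:
    """Build a robots meta tag telling bots whether to index a page and follow its links."""
    directives = [
        "index" if index else "noindex",
        "follow" if follow else "nofollow",
    ]
    return '<meta name="robots" content="{}">'.format(", ".join(directives))


if __name__ == "__main__":
    # On a duplicate page, point bots at the preferred version...
    print(canonical_link_tag("http://www.example.com/article?id=7"))
    # ...or ask them not to index the duplicate while still following its links.
    print(robots_meta_tag(index=False))
```

In practice you would use one approach or the other on a given duplicate page: the canonical tag when you want the metrics consolidated, the noindex directive when you simply want the page left out of results.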
All of these standards are in place to protect readers and webmasters from spam.

