As part of the process of search engine optimization, duplicate content is a fundamental issue that needs to be dealt with, much like 404 errors. In a nutshell, duplicate content is a situation where the same content can be found at multiple URLs, either within the same website or across different domains. The problem this causes for SEO is that search engine robots will see multiple versions of the same content but will only index one copy, or may stop indexing a website altogether if they find too many copies of the same content on a single site.

There are several ways in which one can end up with duplicate content.

  • Creating copies of the same page with different URLs
    – This often happens with e-commerce sites or product pages where one product can fall into multiple categories
  • Submitting the same press release / article to multiple external websites
  • Using multiple domain names for the same website
  • Incorrect web server setup that allows a website to be accessed both with and without “www” before the domain name

Here are some ways you can handle duplicate content issues.

  1. Implement a 301 redirect
    This is effective when the same website can be accessed through different URLs (for example, if http://www.example.com/ can also be reached at http://example.com/) – see the .htaccess example after this list.
  2. Prepare different versions of articles or press releases
    This will prevent duplicate content from being created on external websites.
  3. Use canonical tags
    A canonical tag placed in the <head> section of the HTML tells search engines which URL should be indexed – this is particularly useful when limitations prevent the implementation of a 301 redirect.
    Example of a canonical tag: <link rel="canonical" href="http://www.example.com/prod.php?item=fish" />
  4. Use Google Webmaster Tools
    Google Webmaster Tools (now called Google Search Console) includes a feature that allows you to specify URL parameters to ignore, and it will even suggest parameters that it has found while crawling your website.
    This can be particularly useful for websites that include a session or user ID in the URL, or for websites where sorting content (for example, sorting a list by price) adds a parameter to the URL.
  5. Block duplicate content in robots.txt
    When search engine robots crawl a website, they read the robots.txt file to see whether any pages should be ignored or whether the entire website is blocked from being indexed.
    By listing the URLs (or URL paths) of duplicate pages in robots.txt, you can prevent search engines from crawling those duplicate pages – see the robots.txt example after this list.
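
As mentioned in point 1 above, a 301 redirect can consolidate the “www” and non-“www” versions of a website. As a minimal sketch, assuming an Apache web server with mod_rewrite enabled (example.com is a placeholder domain), the following lines in an .htaccess file send all non-“www” requests to the “www” version with a permanent (301) redirect:

    # Permanently redirect http://example.com/... to http://www.example.com/...
    RewriteEngine On
    RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
    RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]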
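
As mentioned in point 5 above, duplicate pages can also be kept out of search engine crawls with robots.txt. In the simple sketch below, the /print/ directory is a hypothetical location for printer-friendly duplicates of existing pages:

    # Block all crawlers from the duplicate printer-friendly pages
    User-agent: *
    Disallow: /print/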

For more help on dealing with duplicate content and other search engine optimization issues, as well as other internet marketing strategies, contact Endai using the form on the right.