Monday, May 18, 2009

CANONICAL LINK ELEMENT – Definition & Practical Uses

CANONICAL LINK ELEMENT – Uses of Tag and VMware Communities Case Study
by Chris Pantages – WebMama.com

In short, “A canonical tag is a simple piece of HTML code that you insert into the <head> section of a duplicate page, letting the search engines know that they are on a duplicate page and they need to find the original content elsewhere, and guide them there.” (Daily SEO Blog)

Matt Cutts, head of the Google Webspam team, consistently refers to it as the Canonical Link Element, but the popular name seems to be the canonical tag. The tag itself looks something like this:

<link rel="canonical" href="http://www.website.com/original-content.html" />

The value of the tag is that pages serving duplicate or nearly identical content confuse search engines and risk siphoning page popularity away from the main or “canonical” page. By using the canonical tag, website developers can ensure search engines see the original page (the parent page) and not numerous identical or almost identical pages, which could look like a sign of spam or take up valuable space in search results.

The tag goes on the duplicate page and tells the search engines to credit any link and content metrics to the original page. In this respect, the effect of the canonical tag is similar to that of a 301 redirect. The canonical tag must be added to each duplicate page whose search engine value you want credited to the original.
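
Since the tag has to be added to every duplicate page, it usually makes sense to generate it in a template. Here is a minimal sketch of that idea; the helper name canonical_link_tag is made up for illustration and is not part of any particular CMS.

```python
from html import escape


def canonical_link_tag(canonical_url: str) -> str:
    """Build the <link> element for the <head> of a duplicate page.

    canonical_link_tag is a hypothetical helper name used for illustration.
    """
    # Escape &, <, > and quotes so the URL cannot break out of the attribute.
    href = escape(canonical_url, quote=True)
    return '<link rel="canonical" href="{}" />'.format(href)


print(canonical_link_tag("http://www.website.com/original-content.html"))
```

Escaping matters mostly for URLs with query strings, where a bare & would otherwise end up in the attribute value.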

Unlike a 301 redirect, the canonical tag can only redirect the page popularity within one domain. Also, the canonical tag does not redirect visitors.
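
Because visitors are never redirected, the tag only matters when a crawler reads it out of the page markup. The sketch below shows one way to pull the canonical URL out of a page with Python's standard html.parser module, which is handy for spot-checking your own pages. The names CanonicalFinder and find_canonical are assumptions for this example, and the sketch does not check that the tag actually sits inside the head.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # rel may hold several space-separated values, or be missing entirely.
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attributes.get("href")


def find_canonical(html_text):
    """Return the canonical URL declared in html_text, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonical
```

Run it against a duplicate page and the original: the duplicate should report the original's URL, and a page without the tag returns None.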

The pages themselves must be identical or nearly identical. There is no need to use the canonical tag if your site's structure does not let users reach the same content through numerous paths, or if you have already addressed this possibility by setting the preferred URL in Google Webmaster Tools, your CMS, or any other program.

It is part of the WebMama methodology to never leave it up to the search engines to decide which pages achieve high visibility in search results. The canonical tag can help direct the engines to the parent or main page.

All major search engines (Google, Yahoo, MSN and Ask) have agreed to honor the tag.

Uses for the Canonical Link Element

Basic Uses

The most basic use of the canonical tag is to mask systemic differences in a site’s URL structure. The canonical tag can be used on the non-preferred URL to transfer the visibility to the preferred URL.

www.example.com vs. example.com
http://www.example.com vs. https://www.example.com
www.example.com vs. www.example.com/print (for “print only” versions of pages)

It can also work between secure (https) and non-secure (http) versions of a page.
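
The three variants above can all be mapped to one preferred URL with a few lines of code. This is only a sketch under stated assumptions: it treats http and www.example.com as the preferred form and assumes print versions end in /print, so adjust the constants to your own site. It expects full URLs including the scheme.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for this example: the site prefers http and the www host.
PREFERRED_SCHEME = "http"
PREFERRED_HOST = "www.example.com"
KNOWN_HOSTS = {"example.com", "www.example.com"}
PRINT_SUFFIX = "/print"


def preferred_url(url):
    """Map www/non-www, http/https and /print variants to one preferred URL."""
    parts = urlsplit(url)
    if parts.netloc.lower() not in KNOWN_HOSTS:
        return url  # Not one of our hosts, so leave it untouched.
    path = parts.path
    if path.endswith(PRINT_SUFFIX):
        path = path[: -len(PRINT_SUFFIX)]  # Drop the print-only suffix.
    path = path or "/"
    # The fragment is dropped because it never identifies a separate page.
    return urlunsplit((PREFERRED_SCHEME, PREFERRED_HOST, path, parts.query, ""))
```

The value this returns is what would go into the href of the canonical tag on each non-preferred page.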

A preferred location can also be set in Google Webmaster Tools and Yahoo Site Explorer. These preferences can also be expressed in a sitemap.

Advanced Uses

The primary use for the canonical tag is for pages that have unavoidable duplicate content issues:
  • You have different landing pages (A/B testing, etc.) for the same content
  • Different URLs depending on the path you take to get to a page. For example:
    - A site that generates different URLs for products based on “sort by” choices
    - Pages that allow breadcrumbs to alter the URL – different paths create different URLs for the same terminal page.
  • You add tracking codes or session IDs to track the user’s path through the site, resulting in different URLs for each parent page
  • In community discussion forums, replies/comments to each thread/question may be set up to look like separate files while the content is almost identical. This leads to lots of pages that look like duplicate content.
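
For the sort-order, tracking-code, and session-ID cases above, the canonical URL is usually just the page URL with the noisy parameters removed. A minimal sketch follows; the parameter names in NON_CANONICAL_PARAMS are hypothetical examples, and the right list depends entirely on your own site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names; replace them with the ones your site uses.
NON_CANONICAL_PARAMS = {"sort", "sessionid", "utm_source", "utm_medium", "utm_campaign"}


def strip_noise_params(url):
    """Drop sort, session and tracking parameters, keeping everything else."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in NON_CANONICAL_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

Parameters that really do change the content, such as a product category, are kept, so only the variations that produce duplicate pages collapse into one URL.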

Case Study: VMware Communities –
http://communities.vmware.com/community/developer?view=discussions

The VMware Communities website has an architecture that unfortunately generated duplicate content, because "thread" pages include a number of messages, and each individual message was also duplicated on its own "message" page.

A discussion thread has a URL like this:
http://communities.vmware.com/thread/208441
Its three replies have URLs like these:
http://communities.vmware.com/message/1243852#1243852
http://communities.vmware.com/message/1243992#1243992
http://communities.vmware.com/message/1244024#1244024

VMware decided to add the canonical tag to the thread page and to each of the message pages in order to tell the engines explicitly which page was the main/parent page: in this case, the single thread page. It was an easy implementation: just a small amount of code to calculate the /thread URL for each message, plus one line added to the <head> section of each page. It appears that Google picked up the change in less than four days.
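
To give a feel for how small that calculation can be, here is an illustrative sketch. It is not VMware's actual code: the lookup table MESSAGE_TO_THREAD simply hard-codes the sample IDs from the URLs above, where a real forum would query its database, and the function name canonical_for is made up for this example.

```python
import re

BASE_URL = "http://communities.vmware.com"

# Sample data copied from the example URLs in this article. A real forum
# would look the thread up in its database instead (assumption).
MESSAGE_TO_THREAD = {1243852: 208441, 1243992: 208441, 1244024: 208441}

MESSAGE_PATH = re.compile(r"^/message/(\d+)$")
THREAD_PATH = re.compile(r"^/thread/(\d+)$")


def canonical_for(path):
    """Return the thread URL to use as canonical for a request path, or None."""
    message = MESSAGE_PATH.match(path)
    if message:
        thread_id = MESSAGE_TO_THREAD.get(int(message.group(1)))
        if thread_id is None:
            return None  # Unknown message, so emit no tag.
        return f"{BASE_URL}/thread/{thread_id}"
    if THREAD_PATH.match(path):
        return BASE_URL + path  # A thread page points at itself.
    return None
```

The #1243852 part of a message URL is a fragment that the browser never sends to the server, so the sketch only has to look at the path.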

Miscellaneous Items

The original page URL should be an absolute URL as a best practice.
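
If your templates only know a relative path, it can be turned into an absolute URL before it is written into the tag. A small sketch using Python's standard urljoin; the function name absolute_canonical is an assumption for this example.

```python
from urllib.parse import urljoin


def absolute_canonical(page_url, href):
    """Resolve a possibly relative canonical href against the page's own URL."""
    return urljoin(page_url, href)


print(absolute_canonical("http://www.example.com/products/print", "/products"))
```

An href that is already absolute passes through unchanged, so the helper is safe to apply to every page.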

Per Google, the canonical tag is not a directive but it is “a hint that we honor strongly. We'll take your preference into account, in conjunction with other signals, when calculating the most relevant page to display in search results.” Canonical tags can be chained (e.g. page 3 refers to page 2, which refers to page 1 → link power from pages 2 and 3 goes to page 1). This practice, per Google, is permitted but not recommended.
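
Since chains are discouraged, it can be worth auditing your own site for them and pointing every duplicate straight at the final page. The sketch below follows a chain to its end. It illustrates the idea only and does not describe how any search engine resolves chains; the canonical_of mapping and the example URLs are hypothetical.

```python
def resolve_canonical(url, canonical_of, max_hops=10):
    """Follow canonical references from url until a page has no further target.

    canonical_of maps a page URL to the URL named in its canonical tag.
    Raises ValueError on a loop or when the chain exceeds max_hops.
    """
    seen = {url}
    current = url
    for _ in range(max_hops):
        target = canonical_of.get(current)
        if target is None or target == current:
            return current  # End of the chain: no tag, or a self-reference.
        if target in seen:
            raise ValueError("canonical loop detected at " + target)
        seen.add(target)
        current = target
    raise ValueError("canonical chain longer than max_hops")


# Hypothetical site data: page 3 refers to page 2, which refers to page 1.
chain = {
    "http://www.example.com/page3": "http://www.example.com/page2",
    "http://www.example.com/page2": "http://www.example.com/page1",
}
print(resolve_canonical("http://www.example.com/page3", chain))
```

Any page whose resolved URL differs from the URL in its own tag is part of a chain and is a candidate for cleanup.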

Some content management systems already have plug-ins that let you specify canonical pages from the front end. Drupal, WordPress, and Magento already have them. If the tag gains traction, expect there to be more.

Other Resources

SEOmoz has one of the best summaries/articles.

For the more intrepid, Matt Cutts' Blog has a link to a 20-minute video he did explaining the tag and the slides he uses as part of the presentation.


posted by Barbara 'webmama' Coll @ 7:00 AM