Tuesday 30 August 2011

Google News Content Now Crawled via Googlebot

Google News is now using the Googlebot instead of a dedicated crawler to examine web content. While most elements of crawling remain the same, webmasters should be aware of some minor differences.

Changes with Googlebot Crawling
As announced on the Official Google Webmaster blog, starting immediately all content for Google News will be crawled by the standard Googlebot. The change presumably applies globally.

How should your behavior change? That depends on how your site is currently set up.

For the most part, Googlebot has been set up to behave exactly like the old news crawler where news content is concerned. A robots.txt rule that disallows the "Googlebot-News" user-agent will still block only news crawling, sitemaps will still be crawled, and analytics for real visitors will remain the same.
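To see how a Googlebot-News rule can block news crawling while leaving ordinary Googlebot crawling alone, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents and the example.com URL are hypothetical. Python's parser only approximates how Google itself interprets robots.txt, so treat this as an illustration, not a test of Google's behavior.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of Google News only, allow everything else.
ROBOTS_TXT = """\
User-agent: Googlebot-News
Disallow: /

User-agent: *
Disallow:
"""


def can_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if `user_agent` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    story = "https://www.example.com/news/story.html"  # hypothetical URL
    for agent in ("Googlebot-News", "Googlebot"):
        print(agent, can_crawl(agent, story))
```

Pass the short product tokens ("Googlebot-News", "Googlebot") rather than full user-agent strings: Python's parser keeps only the text before the first "/" when matching, so a full browser-style string would not match either group.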

There are a couple of differences, however.

First, when you examine your site logs, you'll see only the Googlebot user-agent. This makes it slightly harder to tell whether your site is being indexed and included in Google News. Google suggests that "You can always check whether your site is included in Google News by searching with the 'site:' operator." To do this, go to news.google.com, search for "site:yoursite.com", and check whether any results come back. If they do, your site is being indexed for news content.
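As a rough illustration of what changes in the logs, the sketch below tallies crawler hits by user-agent in lines written in the common Apache/Nginx "combined" log format. The sample lines, IP addresses and exact user-agent strings are hypothetical. After the switch, you would expect the "Googlebot-News" count to stop growing while news pages keep showing up under plain Googlebot, which is why the logs alone no longer tell you whether a page is headed for Google News.

```python
import re
from collections import Counter

# In the "combined" log format the user-agent is the last double-quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def classify_crawler(line: str) -> str:
    """Label a log line as 'Googlebot-News', 'Googlebot' or 'other'."""
    match = UA_PATTERN.search(line)
    ua = match.group(1) if match else ""
    # Check the more specific token first: "Googlebot-News" contains "Googlebot".
    if "Googlebot-News" in ua:
        return "Googlebot-News"
    if "Googlebot" in ua:
        return "Googlebot"
    return "other"


def count_crawlers(lines):
    """Count log lines per crawler label."""
    return Counter(classify_crawler(line) for line in lines)


# Hypothetical log lines: one from before the change, two from after it.
SAMPLE_LOG = [
    '203.0.113.5 - - [29/Aug/2011:09:12:01 +0000] "GET /news/story.html HTTP/1.1" 200 5120 "-" "Googlebot-News"',
    '203.0.113.7 - - [30/Aug/2011:09:12:01 +0000] "GET /news/story.html HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '198.51.100.2 - - [30/Aug/2011:09:13:44 +0000] "GET / HTTP/1.1" 200 830 "-" '
    '"Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20100101 Firefox/6.0"',
]

if __name__ == "__main__":
    print(count_crawlers(SAMPLE_LOG))
```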

Additionally, the Googlebot guidelines now apply in full to news content. This means that pages requiring payment or a login before they can be viewed will not be fully indexed; Google's crawler will see only the title and the snippet that is visible to users. Google has published a webmaster help article with information and possible solutions for subscription publishers.

By and large, this is only a change to Google's back end. Your ranking won't change, requests to be excluded from Google News will still be respected, and your content will be handled essentially as it was before. Still, knowing the details of these minor changes should be of some help to news sites.
