How often does Google crawl?
Sol Jakubowicz
Internet marketing success depends almost entirely on effective search engine optimization. You can do everything right in terms of website promotion, follow all the best and most up-to-date SEO strategies, and provide users with original material and value-added content. But there are millions of websites out there. Unless your site is built on a rare concept, it is probably one of many competing for the attention of search engines. I have always struggled with whether to look at search engine placement techniques as art or science. (This would make an interesting discussion for a future blog post. Any input would be welcome.)
After all your SEO groundwork is done, there is one last thing that needs to happen: the site has to wait to be crawled by search engines. The big question is, how often does Google (and the other search engines) crawl? Is there a way to make it happen sooner or more frequently?
I decided to study this question in an old-fashioned, yet scientific, manner. Rather than just relying on my web stats, I searched for my site on Google using keyword terms I knew would bring it up, and checked the cached date presented (see the word "Cached" under each result). This gave me the date and time of the last snapshot Google took of my site. I repeated this for 10 other sites. I chose some of the largest, most popular sites on the web, with ever-changing content, some very small, newer sites, and a few medium-sized sites. I then repeated this survey every day at the same time of day.
After a few days, I found out what I had suspected, and in fact what Google tells us openly in its webmaster guidelines: http://www.google.com/support/webmasters/bin/answer.py?answer=35769 How often Google crawls a site… depends. I could not find hard and fast rules, but PageRank, the frequency of updating and refreshing content, the relevancy of content and other factors do play a decidedly crucial role.
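The survey itself boils down to simple date arithmetic. As a minimal sketch, assuming the cached dates are written down by hand during each daily check (not fetched automatically), the gaps between snapshots could be tallied like this. The site names, dates and function names are all made up for illustration and are not my actual results.

```python
from datetime import date

# Hypothetical hand-recorded log: for each site, the cached date
# seen on each daily check. These values are invented examples.
observations = {
    "small-static-site.example": [
        date(2000, 1, 1), date(2000, 1, 1), date(2000, 1, 13), date(2000, 1, 13),
    ],
    "large-news-site.example": [
        date(2000, 1, 1), date(2000, 1, 2), date(2000, 1, 3), date(2000, 1, 4),
    ],
}

def recrawl_gaps(cache_dates):
    """Return the gaps, in days, between successive distinct cached dates."""
    distinct = sorted(set(cache_dates))
    return [(later - earlier).days for earlier, later in zip(distinct, distinct[1:])]

def average_gap(cache_dates):
    """Return the mean gap in days, or None if fewer than two snapshots were seen."""
    gaps = recrawl_gaps(cache_dates)
    return sum(gaps) / len(gaps) if gaps else None

for site, dates in observations.items():
    print(site, average_gap(dates))
```

Repeated sightings of the same cached date are collapsed first, so checking a site on several days between two snapshots does not distort the gap.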
Some of the newer sites with static content seemed to get cached about every 12 or 13 days, whereas some other sites seemed to be cached daily. However, the truly shocking revelation was something else I learned. On the fourth day in a row of accessing this openly available information, after checking 8 sites, I was suddenly denied access to the last cache date result. Upon clicking on the Cached link, I received the message: "We’re sorry, but the computer or network you are using may be sending automated queries. To protect our users, we can’t process your request right now."
I had to contact Google and explain that I was searching for this information for 11 sites only, without any use of software or any mechanism that would breach their terms of service. Google had picked up on my pattern of searching for cache date information and assumed that my searching was tantamount to automated queries. My first reaction was to take this as a compliment to my efficient and quick searching skills. However, that feeling quickly disappeared when it dawned on me that I was being blocked from further accessing cache date information by an automated monitoring mechanism.
This experience was in fact a bigger lesson to me. It showed me the extent to which Google, and presumably other search engines, guard the criteria and procedures for their search engine spiders. Though on the one hand we know a great deal about search engine optimization, on the other hand we lack even a basic understanding of some parts of the topic. For example, does a high PageRank tell the search engines to spider a page more often, or does a page that is spidered more often (due to other criteria) end up with a higher PageRank? These uncertainties, and the lack of full disclosure about search engine crawling or spidering, are part of the genius of Google and others. They make us, internet marketing professionals and aficionados, work harder, smarter and constantly. They help ensure that the system is not manipulated. Providing a good user experience to a searcher is their business, and ultimately this is good for all of us. It just means that we have to continue to use our own genius to get great search engine placement in the name of ongoing website promotion.