Site Design Impact on Robots: An Examination of Search Engine Crawler Behavior at Deep and Wide Websites
D-Lib Magazine. March/April 2008.
J.A. Smith and M.L. Nelson.
Conventional wisdom holds that search engines "prefer" sites that are wide
rather than deep, and that having a site index will result in more thorough
crawling by the Big Three crawlers – Google, Yahoo, and MSN. We created a
series of live websites, two dot-com sites and two dot-edu sites, that were
very wide and very deep. We analyzed the logs of these sites for a full year to
see if the conventional wisdom holds true. We noted some interesting site
access patterns by Google, Yahoo, and MSN crawlers, which we include in this
article as GIF animations. We found that each spider exhibited different
behavior and crawl persistence. In general, width does appear to be crawled
more thoroughly than depth, and providing links on one or two "index" pages
improves crawler penetration. Google was quick to reach and explore the new
sites, whereas MSN and Yahoo were slow to arrive, and the percentage of site
coverage varied by site structure and by top-level domain. Google is clearly
king of the crawl: its lowest site coverage was 99%, whereas MSN's worst
coverage was 2.5% and Yahoo's worst coverage of a site was 3%.
@article{jas:dlibJan08,
author = {Joan A. Smith and Michael L. Nelson},
title = {Site Design Impact on Robots: An Examination of Search Engine Crawler Behavior at Deep and Wide Websites},
journal = {{D-Lib Magazine}},
volume = {14},
number = {3/4},
month = {March/April},
year = {2008},
doi = {10.1045/march2008-smith},
note = {\url{http://dlib.org/dlib/march08/smith/03smith.html}}
}