Wikimedia Visitor Log Analysis Report - Crawler requests

Daily averages, based on sample period: 1 Nov 2009 - 30 Nov 2009

 This analysis is based on a 1:1000 sampled server log (squids) ⇒ all counts x 1000.
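
The scaling implied by the 1:1000 sampling can be sketched as follows. This is a minimal illustration, not the report's own tooling: the function names are hypothetical, and the error estimate assumes Poisson-like sampling noise, which the report does not state.

```python
import math

# 1:1000 sampling factor, as stated in the report.
SAMPLING_FACTOR = 1000


def estimate_requests(sampled_lines: int, factor: int = SAMPLING_FACTOR) -> int:
    """Scale a count of sampled log lines up to an estimated request count."""
    return sampled_lines * factor


def relative_error(sampled_lines: int) -> float:
    """Rough relative standard error of the estimate.

    Assumption (not from the report): sampling noise is Poisson-like,
    so the relative error is about 1 / sqrt(number of sampled lines).
    """
    if sampled_lines <= 0:
        raise ValueError("need at least one sampled line")
    return 1.0 / math.sqrt(sampled_lines)


if __name__ == "__main__":
    print(estimate_requests(46040))
    print(round(relative_error(100), 3))
```

Under that assumption, small counts carry a large relative uncertainty, so the lowest rows in the tables should be read as rough indications only.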
 
The following overview of crawler (aka bot) page requests is based on the user agent information that accompanies most server requests. Unfortunately this user agent information follows rather loosely defined guidelines.
Also please bear in mind that the most popular crawler names may be somewhat overrepresented. This is the result of so-called user agent spoofing (where a requester supplies false credentials, e.g. to bypass web server filters).
GoogleBot seems to be a favorite for spoofing. Therefore requests that claim to be GoogleBot are color coded by origin: those from an IP address registered to Google (see below) in one color, all others in another.

For this report page requests are considered to be issued by a crawler in two cases:
1 The user agent string contains a web address (only crawlers should have that, but there are some false positives, where a browser sends a user agent string with a web address, e.g. because of an ill-behaved plug-in; the main offenders have been eliminated)
2 The user agent string contains the term bot, spider or crawl[er]
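
The two rules above can be sketched as a small classifier. This is only an illustration under stated assumptions: the report does not say how a web address is detected or whether matching is case sensitive, so both patterns below, and the function name, are hypothetical.

```python
import re
from typing import Optional

# Rule 1: the user agent string contains a web address.
# Assumption: a web address is approximated as "http(s)://" or "www.".
_WEB_ADDRESS = re.compile(r"https?://|www\.", re.IGNORECASE)

# Rule 2: the user agent string contains bot, spider or crawl[er].
# Assumption: matching is case-insensitive and not bound to whole words.
_KEYWORD = re.compile(r"bot|spider|crawl", re.IGNORECASE)


def classify_user_agent(user_agent: str) -> Optional[str]:
    """Return "url" or "keyword" for a probable crawler, else None."""
    if _WEB_ADDRESS.search(user_agent):
        return "url"
    if _KEYWORD.search(user_agent):
        return "keyword"
    return None


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "wikiwix-bot-3.0",
        "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) Gecko/20091102 Firefox/3.5.5",
    ]
    for ua in samples:
        print(classify_user_agent(ua), "<-", ua)
```

Rule 1 is checked first, which mirrors the split into the two tables in this report: agents with a web address in one, agents recognized only by keyword in the other. The sample strings are made up for the example; the report itself shows web addresses abbreviated to "url".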

In total 46,040,000 page requests per day (mime type text/html only!) are considered crawler requests, out of 265,101,000 external requests, which is 17.4%.
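
The percentage follows directly from the two daily totals. A minimal check, with a hypothetical helper name:

```python
def crawler_share(crawler_requests: int, external_requests: int) -> float:
    """Fraction of external requests that are attributed to crawlers."""
    return crawler_requests / external_requests


if __name__ == "__main__":
    # Daily totals quoted in the report.
    share = crawler_share(46_040_000, 265_101_000)
    print(f"{share:.1%}")  # 17.4%
```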

Page requests for crawlers that specify a url in the agent string
Count (x 1000) | Secondary domain (~site) name
  Count (x 1000) | URL | Mime type | User agent
23,629 | google
  18,125 | www.google.com/bot.html | text/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  2,237 | www.google.com/bot.html | - | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  1,397 | desktop.google.com/ | application/xml | Mozilla/5.0 (compatible; Google Desktop/5.9.909.30391; url)
  584 | desktop.google.com/ | application/xml | Mozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
  194 | www.google.com/bot.html | image/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  190 | desktop.google.com/ | image/.. | Mozilla/5.0 (compatible; Google Desktop/5.9.909.30391; url)
  107 | www.google.com/feedfetcher.html | - | FeedFetcher-Google; (url)
  105 | desktop.google.com/ | application/xml | Mozilla/5.0 (compatible; Google Desktop/5.9.909.8267; url)
  97 | www.google.com/bot.html | text/.. | DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
  97 | desktop.google.com/ | image/.. | Mozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
  78 | www.google.com/feedfetcher.html | text/.. | Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
  44 | desktop.google.com/ | text/.. | Mozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
  43 | desktop.google.com/ | text/.. | Mozilla/5.0 (compatible; Google Desktop/5.9.909.30391; url)
  36 | www.google.com/feedfetcher.html | application/xml | FeedFetcher-Google; (url)
  33 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: nwikiproxy)
  33 | www.google.com/bot.html | text/.. | SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
  31 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: npiv82)
  22 | www.google.com/feedfetcher.html | text/.. | FeedFetcher-Google; (url)
  20 | docs.google.com | image/.. | Mozilla/5.0 (compatible; GoogleDocs; url)
  19 | www.google.com/feedfetcher.html | application/xml | Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
  18 | code.google.com/appengine | application/xml | AppEngine-Google; (url; appid: nwikiproxy)
  15 | desktop.google.com/ | image/.. | Mozilla/5.0 (compatible; Google Desktop/5.9.909.8267; url)
  15 | www.google.com/feedfetcher.html | application/json | Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
  13 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: mrictx)
  13 | www.google.com/bot.html | application/x-javascript | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  13 | desktop.google.com/ | - | Mozilla/5.0 (compatible; Google Desktop/5.9.909.30391; url)
  9 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: finchproxy)
  7 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: findadvise)
  6 | code.google.com/appengine | image/.. | AppEngine-Google; (url; appid: mrictx)
  4 | www.google.com/coop/cse/cref | text/.. | FeedFetcher-Google-CoOp; (url)
  3 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: job-info)
  3 | desktop.google.com/ | - | Mozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
11,672 | msn
  7,883 | search.msn.com/msnbot.htm | text/.. | msnbot/2.0b (url)
  3,462 | search.msn.com/msnbot.htm | - | msnbot/2.0b (url)
  266 | search.msn.com/msnbot.htm | text/.. | msnbot/1.1 (url)
  28 | search.msn.com/msnbot.htm | - | msnbot/1.1 (url)
  23 | search.msn.com/msnbot.htm | text/.. | msnbot-Products/1.0 (url)
  3 | search.msn.com/msnbot.htm | text/.. | msnbot-NewsBlogs/1.1 (url)
  3 | search.msn.com/msnbot.htm | image/.. | msnbot/2.0b (url)
  3 | search.msn.com/msnbot.htm | image/.. | msnbot/1.1 (url)
8,704 | yahoo
  6,477 | help.yahoo.com/help/us/ysearch/slurp | text/.. | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  1,727 | help.yahoo.com/help/us/ysearch/slurp | text/.. | Mozilla/5.0 (compatible; Yahoo! DE Slurp; url)
  147 | help.yahoo.com/help/us/ysearch/slurp | text/.. | Mozilla/5.0 (compatible; Yahoo! Slurp; url)
  123 | misc.yahoo.com.cn/help.html | text/.. | Mozilla/5.0 (compatible; Yahoo! Slurp China; url)
  97 | help.yahoo.com/help/us/ysearch/slurp | - | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  51 | help.yahoo.co.jp/help/jp/search/indexing/indexing-15.html | image/.. | 'Mozilla/5.0 (compatible; Y!J SearchMonkey/1.0 (Y!J-AGENT; url))'
  19 | help.yahoo.com/help/us/ysearch/slurp | image/.. | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  13 | help.yahoo.co.jp/help/jp/search/indexing/indexing-15.html | text/.. | Y!J-BRI/0.0.1 crawler ( url )
  10 | misc.yahoo.com.cn/help.html | - | Mozilla/5.0 (compatible; Yahoo! Slurp China; url)
  10 | developer.yahoo.com/searchmonkey/useragent | text/.. | Mozilla/5.0 (compatible; Yahoo! SearchMonkey 1.0; url)
  9 | help.yahoo.com/help/us/ysearch/slurp | - | Mozilla/5.0 (compatible; Yahoo! DE Slurp; url)
  9 | help.yahoo.com/help/us/ysearch/slurp | application/ogg | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  4 | help.yahoo.com/help/us/ysearch/slurp | application/x-javascript | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  3 | help.yahoo.com | text/.. | Mozilla/5.0 (YahooYSMcm/3.0.0; url)
1,295 | google?
  1,066 | www.google.com/bot.html | text/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  67 | www.google.com/bot.html | text/.. | SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
  60 | www.google.com/bot.html | text/.. | GoogleBot/2.1 (url)
  35 | www.google.com/bot.html | text/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  15 | www.google.com/bot.html | text/.. | DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
  13 | www.google.com/bot.html | - | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  8 | www.google.com/bot.html | image/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  8 | www.google.com/bot.html | application/xml | DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
  6 | www.google.com/bot.html | application/xml | SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
  5 | www.google.com/bot.html | application/xml | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  4 | www.google.com/bot.html | text/.. | Mozilla/5.0(compatible;GoogleBot/2.1;url)
  3 | www.google.com/bot.html | text/.. | KDDI-CA34 UP.Browser/6.2.0.10.2.2 (GUI) MMP/2.0 (compatible; KDDI-GoogleBot-Mobile/2.1; url)
  3 | www.google.com/bot.html | application/xml | KDDI-CA34 UP.Browser/6.2.0.10.2.2 (GUI) MMP/2.0 (compatible; KDDI-GoogleBot-Mobile/2.1; url)
855 | naver
  774 | help.naver.com/robots/ | text/.. | Yeti/1.0 (NHN Corp.; url)
  29 | help.naver.com/delete_main.asp | text/.. | Mozilla/4.0 (compatible; NaverBot/1.0; url)
  21 | help.naver.com/robots/ | image/.. | Yeti/1.0 (NHN Corp.; url)
  16 | help.naver.com/robots/ | image/.. | Mozilla/5.0 (compatible; Yeti/1.0; NHN Corp.; url)
  12 | help.naver.com/customer_webtxt_02.jsp | text/.. | Mozilla/4.0 (compatible; NaverBot/1.0; url)
715 | ask
  599 | about.ask.com/en/docs/about/webmasters.shtml | text/.. | Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
  89 | about.ask.com/en/docs/about/webmasters.shtml | - | Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
  9 | about.ask.com/en/docs/about/webmasters.shtml | text/.. | Mozilla/5.0 (compatible;Ask Jeeves/Teoma; url)
  9 | about.ask.com/en/docs/about/webmasters.shtml | image/.. | Mozilla/5.0 (compatible;Ask Jeeves/Teoma; url)
  4 | about.ask.com/en/docs/about/webmasters.shtml | image/.. | Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
  4 | about.ask.com/en/docs/about/webmasters.shtml | text/.. | Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url) (via Web-Blaster/2.21 (http://www.assoziations-blaster.de/web-blast.html))
654 | teesoft
  205 | www.teesoft.info/ | image/.. | Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
  128 | www.teesoft.info/ | text/.. | Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
  101 | www.teesoft.info/ | image/.. | Mozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
  62 | www.teesoft.info/ | text/.. | Mozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
  26 | www.teesoft.info/ | application/x-javascript | Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
  15 | www.teesoft.info/ | application/x-javascript | Mozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
  15 | www.teesoft.info/ | image/.. | Mozilla/5.0 (X11; Linux i686; [lang code]; rv:[..]) Gecko/.. etc (url)
  11 | www.teesoft.info/ | image/.. | Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; [lang code]; rv:[..]) Gecko/.. etc (url)
  11 | www.teesoft.info/ | image/.. | Mozilla/5.0 (Windows; Windows NT 6.1; [lang code]; rv:[..]) Gecko/.. etc (url)
  9 | www.teesoft.info/ | application/json | Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
  7 | www.teesoft.info/ | text/.. | Mozilla/5.0 (X11; Linux i686; [lang code]; rv:[..]) Gecko/.. etc (url)
  7 | www.teesoft.info/ | text/.. | Mozilla/5.0 (Windows; Windows NT 6.1; [lang code]; rv:[..]) Gecko/.. etc (url)
  7 | www.teesoft.info/ | text/.. | Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; [lang code]; rv:[..]) Gecko/.. etc (url)
  6 | www.teesoft.info/ | image/.. | Mozilla/5.0 (X11; Linux x86_64; [lang code]; rv:[..]) Gecko/.. etc (url)
  5 | www.teesoft.info/ | application/xml | Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
  4 | www.teesoft.info/ | application/json | Mozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
  3 | www.teesoft.info/ | image/.. | Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; [lang code]; rv:[..]) Gecko/.. etc (url)
  3 | www.teesoft.info/ | application/xml | Mozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
  3 | www.teesoft.info/ | image/.. | Mozilla/5.0 (Macintosh; Intel Mac OS X 10.4; [lang code]; rv:[..]) Gecko/.. etc (url)
  3 | www.teesoft.info/ | text/.. | Mozilla/5.0 (X11; Linux x86_64; [lang code]; rv:[..]) Gecko/.. etc (url)
535 | exabot
  277 | www.exabot.com/go/robot | text/.. | Mozilla/5.0 (compatible; Exabot/3.0; url)
  234 | www.exabot.com/go/robot | text/.. | Mozilla/5.0 (compatible; Exabot/3.0 (BiggerBetter); url)
  15 | www.exabot.com/go/robot | - | Mozilla/5.0 (compatible; Exabot/3.0; url)
  8 | www.exabot.com/go/robot | text/.. | Mozilla/5.0 (compatible; Exabot-Images/3.0; url)
534 | pipl
  534 | www.pipl.com/bot/ | text/.. | Mozilla/5.0(compatible;PiplBot;url)
461 | cuil
  446 | www.cuil.com/twiceler/robot.html | text/.. | Mozilla/5.0 (Twiceler-0.9 url)
  7 | www.cuil.com | image/.. | Mozilla/5.0 (compatible; heritrix/1.14.0 url)
  6 | www.cuil.com | text/.. | Mozilla/5.0 (compatible; heritrix/1.14.0 url)
457 | soso
  448 | help.soso.com/webspider.htm | text/.. | Sosospider(url)
  4 | help.soso.com/soso-image-spider.htm | image/.. | Sosoimagespider(url)
446 | baidu
  273 | www.baidu.com/search/spider.htm | text/.. | Baiduspider(url)
  75 | www.baidu.jp/spider/ | text/.. | Baiduspider(url)
  34 | www.baidu.jp/spider/ | text/.. | DoCoMo/2.0 P05A(c100;TB;W24H15) (compatible; BaiduMobaider/1.0; url)
  13 | www.baidu.com/search/spider.htm | - | Baiduspider(url)
  12 | www.baidu.jp/spider/ | image/.. | BaiduImagespider(url)
  11 | www.baidu.jp/spider/ | application/xml | DoCoMo/2.0 P05A(c100;TB;W24H15) (compatible; BaiduMobaider/1.0; url)
  11 | www.baidu.jp/spider/ | text/.. | BaiduImagespider(url)
  6 | www.baidu.com/search/spider.htm | text/.. | Baiduspider(url) (via babelfish.yahoo.com)
  6 | www.baidu.jp/spider/ | - | Baiduspider(url)
313 | spinn3r
  306 | spinn3r.com/robot | text/.. | Mozilla/5.0 (X11; Linux i686; en-US; rv:1.2.1; aggregator:Spinn3r (Spinn3r 3.1); url) Gecko/20021130
  5 | spinn3r.com/robot | - | Mozilla/5.0 (X11; Linux i686; en-US; rv:1.2.1; aggregator:Spinn3r (Spinn3r 3.1); url) Gecko/20021130
310 | yacy
  81 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.28-16-generic; java 1.6.0_0; Europe/en) url
  64 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.28-16-generic; java 1.6.0_0; Europe/en) url
  33 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.28-16-generic; java 1.6.0_16; Europe/en) url
  14 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.28-16-generic; java 1.6.0_16; Europe/en) url
  12 | yacy.net/bot.html | text/.. | yacybot (amd64 Windows Vista 6.0; java 1.6.0_16; Europe/de) url
  11 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.28-15-generic; java 1.6.0_0; Europe/en) url
  10 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.18-164.6.1.el5; java 1.6.0; Europe/de) url
  6 | yacy.net/bot.html | text/.. | yacybot (i386 Mac OS X 10.5.8; java 1.5.0_20; Europe/fr) url
  6 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.31.5; java 1.6.0_0; Europe/en) url
  6 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.24-etchnhalf.1-amd64; java 1.6.0_0; Europe/de) url
  6 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.31-ARCH; java 1.6.0_17; Europe/de) url
  5 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.26-2-xen-amd64; java 1.5.0_17; Europe/en) url
  5 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.26-2-openvz-686; java 1.6.0_0; Europe/de) url
  3 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.18-6-686; java 1.6.0_16; Europe/en) url
  3 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.30-gentoo-r6-090907; java 1.6.0_17; GMT/de) url
  3 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.26-2-686; java 1.6.0_0; Europe/de) url
  3 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.18-164.el5; java 1.6.0; Europe/en) url
251 | wikimedia
  249 | tools.wikimedia.de/~daniel/ | text/.. | WikiSense (url)
251 | youdao
  219 | www.youdao.com/help/webmaster/spider/ | text/.. | Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
  12 | www.youdao.com/help/webmaster/spider/ | - | Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
  7 | www.youdao.com/help/webmaster/spider/ | text/.. | Mozilla/5.0 (compatible; YodaoBot/1.0; url; )
  3 | www.youdao.com/help/webmaster/spider/ | text/.. | Mozilla/5.0 (compatible; YodaoBot/1.0; url; ) (via Web-Blaster/2.21 (http://www.a-blast.org/web-blast.html))
  3 | www.youdao.com/help/webmaster/spider/ | application/xml | Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
226 | dotnetdotcom
  226 | www.dotnetdotcom.org/ | text/.. | Mozilla/5.0 (compatible; DotBot/1.1; url, mail address )
225 | php
  67 | pear.php.net/ | application/vnd.php.serialized | PEAR HTTP_Request class ( url )
  34 | pear.php.net/package/http_request2 | text/.. | HTTP_Request2/0.4.1 (url) PHP/5.2.11
  29 | pear.php.net/ | text/.. | PEAR HTTP_Request class ( url )
  27 | pear.php.net/ | application/json | PEAR HTTP_Request class ( url )
  24 | pear.php.net/ | application/xml | PEAR HTTP_Request class ( url )
  19 | pear.php.net/package/http_request2 | text/.. | HTTP_Request2/0.4.0 (url) PHP/5.2.10
  14 | pear.php.net/package/http_request2 | text/.. | HTTP_Request2/0.3.0 (url) PHP/5.3.1RC4-dev
  6 | pear.php.net/package/http_request2 | text/.. | HTTP_Request2/0.3.0 (url) PHP/5.3.1
201 | sblog
  148 | fulltext.sblog.cz/screenshot/ | image/.. | Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
  28 | fulltext.sblog.cz/screenshot/ | text/.. | Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
  19 | fulltext.sblog.cz/robot/ | text/.. | SeznamBot/2.0 (url)
  5 | fulltext.sblog.cz/screenshot/ | application/x-javascript | Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
183 | wikipedia
  96 | en.wikipedia.org/wiki/Wikipedia:Huggle | text/.. | Huggle/0.9.2 url
  24 | en.wikipedia.org/wiki/Wapedia | application/vnd.php.serialized | wapedia.mobi liveupdate (url)
  22 | zh.wikipedia.org/w/index.php?title=台湾互联网&variant=zh-cn | text/.. | url
  14 | zh.wikipedia.org/w/index.php?title=Wikipedia:历史上的今天/11月17日/Link2&variant=zh-cn | text/.. | url
  10 | zh.wikipedia.org/w/index.php?title=quanzimuqun&variant=zh-cn | text/.. | url
  7 | en.wikipedia.org | text/.. | url
  4 | ko.wikipedia.org | text/.. | url
178 | ellerdale
  177 | www.ellerdale.com/crawler.html | text/.. | Mozilla/5.0 (compatible; winnie/1.0; url)
152 | boardreader
  152 | spider.boardreader.com | text/.. | BoardReader Rating Builder/1.0 url
136 | sogou
  124 | www.sogou.com/docs/help/webmasters.htm#07 | text/.. | Sogou web spider/4.0(url)
  7 | www.sogou.com/docs/help/webmasters.htm#07 | application/xml | Sogou web spider/4.0(url)
129 | emining
  129 | emining.jp/ | text/.. | emBot-GalaBuzz/Nutch-1.0 (url; mail address )
120 | goo
  116 | help.goo.ne.jp/contact/ | text/.. | goo wikipedia (url)
118 | facebook
  76 | www.facebook.com/externalhit_uatext.php | image/.. | facebookexternalhit/1.0 (url)
  33 | www.facebook.com/externalhit_uatext.php | text/.. | facebookexternalhit/1.0 (url)
  5 | developers.facebook.com | text/.. | facebookplatform/1.0 (url)
  4 | developers.facebook.com | image/.. | facebookplatform/1.0 (url)
96 | daum
  96 | ws.daum.net/aboutWebSearch.html | text/.. | Mozilla/5.0 (compatible; MSIE or Firefox mutant; not on Windows server; url) Daumoa/2.0
94 | majestic12
  46 | www.majestic12.co.uk/bot.php? | text/.. | Mozilla/5.0 (compatible; MJ12bot/v1.3.1; url)
  39 | www.majestic12.co.uk/bot.php? | text/.. | Mozilla/5.0 (compatible; MJ12bot/v1.2.5; url)
  4 | www.majestic12.co.uk/bot.php? | text/.. | Mozilla/5.0 (compatible; MJ12bot/v1.3.0; url)
92 | asterpix
  92 | www.asterpix.com/ | text/.. | Mozilla/5.0 (compatible; Asterbot; url)
79 | gigablast
  79 | www.gigablast.com/spider.html | text/.. | Gigabot/3.0 (url)
67 | att
  66 | tibesti.research.att.com/research-crawler.html | text/.. | Mozilla/5.0 (compatible; heritrix/2.0.1 url)
65 | commoncrawl
  58 | www.commoncrawl.org/bot.html | text/.. | CCBot/1.0 (url)
  5 | www.commoncrawl.org/bot.html | - | CCBot/1.0 (url)
64 | activepeople
  64 | www.activepeople.net | text/.. | WordPress/2.8.4; url
63 | fairshare
  59 | fairshare.cc | text/.. | Mozilla/5.0 url (X11; FreeBSD i386; en-US; rv:1.2a) Gecko/20021021
  3 | fairshare.cc | text/.. | Mozilla crawl/5.0 (compatible; fairshare.cc url)
52 | www.
  23 | www. | text/.. | GoogleBot/2.1 ( urlGoogleBot.com/bot.html)
  22 | www. | text/.. | GoogleBot-Image/1.0 ( urlGoogleBot.com/bot.html)
  5 | www. | text/.. | GoogleBot/2.1 (urlGoogleBot.com/bot.html)
47 | entireweb
  41 | www.entireweb.com/about/search_tech/speedy_spider/ | text/.. | Speedy Spider (url)
  6 | www.entireweb.com/about/search_tech/speedy_spider/ | - | Speedy Spider (url)
46 | snap
  46 | www.snap.com | text/.. | Snapbot/1.0 (Snap Shots, url)
46 | archive-it
  29 | www.archive-it.org | image/.. | Mozilla/5.0 (compatible;archive.org_bot/1.7.0; Archive-It; url)
  17 | www.archive-it.org | text/.. | Mozilla/5.0 (compatible;archive.org_bot/1.7.0; Archive-It; url)
42 | wordpress
  31 | support.wordpress.com/contact/ | text/.. | WordPress.com mShots; url
41 | textdigger
  40 | textdigger.com | text/.. | Mozilla/5.0 (url) Gecko/20061208 Firefox/2.0.0.1
37 | mnemoo
  37 | www.mnemoo.com/about/spider | text/.. | Mnemoo Intelligent Spider/0.1alpha (compatible; See url)
31 | heartrails
  16 | capture.heartrails.com/ | text/.. | Mozilla/5.0 (X11; Linux i686 (x86_64); en-US; rv:1.8.1.20) Gecko/20090429 HeartRails_Capture/0.6 (url) BonEcho/2.0.0.20
  9 | capture.heartrails.com/ | image/.. | Mozilla/5.0 (X11; Linux i686 (x86_64); en-US; rv:1.8.1.20) Gecko/20090429 HeartRails_Capture/0.6 (url) BonEcho/2.0.0.20
  6 | capture.heartrails.com/ | application/x-javascript | Mozilla/5.0 (X11; Linux i686 (x86_64); en-US; rv:1.8.1.20) Gecko/20090429 HeartRails_Capture/0.6 (url) BonEcho/2.0.0.20
31 | 80legs
  29 | www.80legs.com/spider.html | text/.. | Mozilla/5.0 (compatible; 008/0.83; url;) Gecko/2008032620
29 | scoutjet
  29 | www.scoutjet.com/ | text/.. | Mozilla/5.0 (compatible; ScoutJet; url)
28 | froute
  21 | labs.froute.jp/pc2m/help.html | text/.. | Froute Mobile Gateway/1.0 (url)
  7 | labs.froute.jp/pc2m/help.html | image/.. | Froute Mobile Gateway/1.0 (url)
27 | guruji
  23 | www.guruji.com/en/WebmasterFAQ.html | text/.. | Mozilla/5.0 (compatible; GurujiBot/1.0; url)
  4 | www.guruji.com/en/WebmasterFAQ.html | text/.. | GurujiBot/1.0 (url)
26 | qdos
  26 | qdos.com/ | text/.. | qdos/1.1 (url)
25 | traslated
  25 | mymemory.traslated.net/doc/ | text/.. | Mozilla/5.0 (MyMemory Bot url)
25 | rcdtokyo
  19 | www.rcdtokyo.com/pc2m/ | text/.. | Mozilla/5.0 (compatible; PEAR HTTP_Request class; url)
  6 | www.rcdtokyo.com/pc2m/ | image/.. | Mozilla/5.0 (compatible; PEAR HTTP_Request class; url)
24 | setooz
  24 | www.setooz.com/oozbot.html | text/.. | OOZBOT/0.20 ( url ; mail address )
24 | tourist-information-berlin
  24 | www.tourist-information-berlin.com | text/.. | WordPress/2.8.4; url
24 | Anonymouse
  13 | Anonymouse.org/ | image/.. | url (Unix)
  8 | Anonymouse.org/ | text/.. | url (Unix)
  3 | Anonymouse.org/ | application/x-javascript | url (Unix)
24 | xrss
  24 | www.xrss.eu/robot | text/.. | Mozilla/5.0 (compatible; xrss; robot; url; version 2.0)
21 | aport
  21 | www.aport.ru/help | text/.. | Mozilla/5.0 (compatible; AportWorm/3.2; url)
21 | simplepie
  11 | simplepie.org | application/xml | SimplePie/1.2 (Feed Parser; url; Allow like Gecko) Build/20090627192103
  3 | simplepie.org | text/.. | SimplePie/1.1.1 (Feed Parser; url; Allow like Gecko) Build/20080315205903
20 | alexa
  20 | www.alexa.com/site/help/webmasters | text/.. | ia_archiver (url; mail address )
20 | mixi
  10 | mixi.jp/ | text/.. | mixi-mobile-converter/1.0 (url)
  10 | mixi.jp/ | image/.. | mixi-mobile-converter/1.0 (url)
20 | junglekey
  20 | www.junglekey.fr/ | text/.. | JungleKeyBot/1.1 (url)
20 | ac
  16 | www.tkl.iis.u-tokyo.ac.jp/~crawler/ | text/.. | Mozilla/5.0 (compatible; Steeler/3.5; url)
17 | isara
  10 | www.isara.org | text/.. | Isara/Isara-1.0 (A non-profit search engine benefiting charity.; url; mail address )
  5 | www.isara.org | text/.. | Isara/Isara-1.0 (Non-profit search engine that benefits charity.; url; mail address )
16 | ronzoo
  16 | www.ronzoo.com/about.php | text/.. | Ronzoobot/1.2 (url)
16 | globalspec
  16 | www.globalspec.com/Ocelli | text/.. | Ocelli/1.4 (url)
16 | flaptor
  16 | www.flaptor.com/ | text/.. | HounderCrawl/Nutch-0.9 (Hounder Search Bot; url)
16 | aafter
  15 | aafter.com/crawler.htm | text/.. | AAfter.com Crawler/AAfter-1.0 (This bot is very focused, well-behaved, and wants to do good to internet community. For any questions, please call collect 1 214-714-2224. Team AAfter, Dallas, TX, USA; url; crawler at aafter.com)
16 | phonifier
  13 | www.phonifier.com | text/.. | Mozilla/5.0 (compatible; Phonifier; url)
  3 | www.phonifier.com | text/.. | aNti_miSa$puAsa/5.0 (compatible; Phonifier; url)
16 | hatena
  8 | a.hatena.ne.jp/help | text/.. | Hatena Antenna/0.5 (url)
  8 | mgw.hatena.ne.jp/help | text/.. | DoCoMo/2.0 D903i(c100;TB;W28H20) (compatible; Hatena-Mobile-Gateway/1.2; url)
15 | newsgator
  7 | www.newsgator.com/Individuals/NetNewsWire/ | - | NetNewsWire/3.2.3 (Mac OS X; url)
15 | FeedBurner
  14 | www.FeedBurner.com | text/.. | FeedBurner/1.0 (url)
15 | picsearch
  12 | www.picsearch.com/bot.html | text/.. | psbot/0.1 (url)
  3 | www.picsearch.com/bot.html | image/.. | psbot/0.1 (url)
14 | topsy
  14 | labs.topsy.com/butterfly.html | text/.. | Mozilla/5.0 (compatible; Butterfly/1.0; url) Gecko/2009032608 Firefox/3.0.8
13 | sfdsdfsdf
  12 | sfdsdfsdf.com | text/.. | Mozilla/5.0 (compatible; heritrix/2.0.2 url)
12 | dium
  12 | me.dium.com | text/.. | Mozilla/5.0 (Windows; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0 Me.dium/1.0 (url)
12 | bloglines
  7 | www.bloglines.com | - | Bloglines/3.1 (url; 1 subscriber)
  3 | www.bloglines.com | text/.. | Bloglines/3.1 (url; 1 subscriber)
11 | emusic
  7 | www.emusic.com/ | text/.. | Mozilla/5.0 (Windows; Windows NT 5.1; en-US; rv:1.8.1.8pre) Gecko/20071017 eMusic DLM/4.0/1.0.0.2 (url) (like Firefox/2.0.0.8)
11 | moreover
  11 | www.moreover.com | text/.. | Moreoverbot/5.00 (url; mail address )
10 | email
  6 | www.email.com | text/.. | Mozilla/5.0 (compatible; mail address url)
  4 | www.email.com | image/.. | Mozilla/5.0 (compatible; mail address url)
10 | hoqsearch
  10 | www.hoqsearch.com | text/.. | hoqBot/hoqBot-1.0 (hoqsearch - community based finding; url; mail address )
10 | quus
  10 | fx.quus.net/ | text/.. | url
10 | webzdarma
  8 | praso.webzdarma.cz | text/.. | Mozilla/5.0 (compatible; heritrix/1.12.1 url)
10 | freebase
  10 | www.freebase.com | text/.. | metaweb/Nutch-1.0-dev (url; help_at_metaweb.com)
54,411 | total

Page requests for probable crawlers, recognized by keyword
Count (x 1000) | Agent string
  Count (x 1000) | Mime type (count ≥ 3)
2,801 | PythonWikipediaBot/1.0
  1,889 | application/json
  537 | application/xml
  374 | text/..
  1 | image/..
  1 | -
  1 | application/ogg
1,155 | GoogleBot-Image/1.0
  510 | text/..
  333 | image/..
  311 | -
  1 | application/pdf
393 | Mozilla/5.0 (Windows; Windows NT 5.1; fr; rv:1.8.1) VoilaBot BETA 1.2 ( mail address )
  393 | text/..
  1 | -
  1 | application/pdf
  1 | application/ogg
279 | Answersbot
  279 | text/..
244 | Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (Exabot-Thumbnails)
  130 | text/..
  78 | image/..
  36 | application/x-javascript
  1 | -
193 | wikiwix-bot-3.0
  163 | text/..
  30 | image/..
  1 | -
183 | gsa-crawler (Enterprise; S5-MS8QQPJ5BGWAA; mail address )
  183 | text/..
103 | php wikibot classes
  103 | application/vnd.php.serialized
100 | Mozilla/5.0 (SnapPreviewBot) Gecko/20061206 Firefox/1.5.0.9
  55 | image/..
  34 | text/..
  11 | application/x-javascript
  1 | application/json
88 | GoogleBot-Image/1.0
  88 | text/..
  1 | -
  1 | image/..
75 | Tawbot (public svn release; plwiki)
  75 | text/..
46 | Test Webbot
  46 | text/..
43 | FAST Enterprise Crawler 6 used by Techinal Test ( mail address )
  42 | text/..
  1 | -
40 | DotNetWikiBot/2.72 (Microsoft Windows NT 5.1.2600 Service Pack 2; )
  40 | text/..
38 | Szukacz/1.5 (robot; www.szukacz.pl/jakdzialarobot.html; mail address )
  38 | text/..
38 | SineBot/1.5.15(User:SineBot)
  37 | application/vnd.php.serialized
  1 | text/..
  1 | -
37 | DotNetWikiBot/2.72 (Microsoft Windows NT 5.2.3790 Service Pack 2; )
  37 | text/..
34 | web18bot
  34 | text/..
33 | plantspedia data crawler
  33 | text/..
32 | Pywikipediabot/2.0
  32 | application/json
29 | DotNetWikiBot/2.71 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
  22 | text/..
  7 | application/xml
26 | MLBot (www.metadatalabs.com/mlbot)
  26 | text/..
  1 | -
  1 | image/..
25 | Linguee Bot ( mail address )
  25 | text/..
23 | zomba-bot/0.1
  22 | text/..
  1 | image/..
  1 | application/x-javascript
  1 | application/opensearchdescription+xml
23 | AarghBot Linux
  23 | text/..
  1 | -
23 | AnomieBOT 1.0 (OrphanReferenceFixer)
  23 | application/json
22 | CorenSearchBot/1.4 en libwww-perl/5.808
  22 | text/..
20 | GoogleBot
  20 | text/..
  1 | application/json
  1 | image/..
  1 | application/opensearchdescription+xml
20 | dicbot 1.0
  20 | text/..
19 | dictionary-bot
  16 | application/xml
  3 | text/..
19 | COIBot/1.00
  19 | text/..
15 | Web Crawler
  15 | text/..
15 | OpenLink Virtuoso RDF crawler
  15 | text/..
  1 | image/..
14 | testcrawler
  14 | text/..
12 | DotNetWikiBot/2.71 (Microsoft Windows NT 5.2.3790 Service Pack 2; )
  12 | text/..
11 | FAST Enterprise Crawler 6 used by Daimler AG ( mail address )
  11 | text/..
  1 | -
  1 | application/x-javascript
11 | c0rwin/Nutch-1.0 (Nutch spiderman; MyEmail)
  10 | text/..
  1 | image/..
11 | SurakWare MediaWiki Bot/1.0
  11 | text/..
11 | gsa-crawler (Enterprise; S5-HSPD6VX6S2NJB; mail address )
  11 | text/..
10 | Geni ircpybot 1.0
  6 | text/..
  2 | application/json
  2 | application/xml
9 | Bot/WP/EN/Daniel/MediationBot1/1.2
  9 | text/..
9 | crawler mail address
  9 | text/..
9 | Mozilla/5.0 (Bgbot 0.5)
  9 | text/..
8 | 4am-spider/1.0
  8 | text/..
8 | Bot/WP/EN/Alex_Bakharev/AlexNewArtBot
  8 | text/..
7 | FlickySearchBot/1.0 (testMode)
  7 | text/..
  1 | application/java-archive
  1 | application/opensearchdescription+xml
  1 | video/ogg
7 | GNAA-bot
  7 | text/..
7 | spider
  7 | text/..
  1 | application/xml
6 | CrawlerTest/Nutch-1.0-dev
  6 | text/..
6 | DotNetWikiBot/2.53 (Unix 2.6.26.2; )
  6 | text/..
6 | SuperBot/4.7.0.72 (Windows XP)
  6 | text/..
  1 | image/..
6 | Nokia3100/1.0 (compatible; WukongBot)
  6 | text/..
  1 | -
6 | TerraSpider
  6 | text/..
6 | Freebase Deathbot
  6 | text/..
6 | YaDirectBot/1.0
  6 | text/..
6 | DotNetWikiBot/2.72 (Microsoft Windows NT 6.0.6002 Service Pack 2; )
  6 | text/..
  1 | application/xml
6 | Jyxobot/1
  6 | text/..
5 | Draicone's bot
  5 | text/..
5 | MSR-ISRCCrawler
  4 | text/..
  1 | application/x-javascript
4 | Jabse.com Crawler v.2.0 www.jabse.com/crawler.php
  4 | text/..
4 | XLinkBot/1.00
  4 | text/..
4 | FAST Enterprise Crawler 6 used by Wanadoo ( mail address )
  3 | application/x-javascript
  1 | text/..
  1 | -
4 | AnomieBOT 1.0 (AFDMergeFromCleaner)
  4 | application/json
4 | SONIVIS MediaWiki API Bot 0.1.3
  4 | text/..
4 | AnomieBOT 1.0 (SourceUploader)
  4 | application/json
4 | CheMoBot/1.00
  4 | text/..
4 | gsa-crawler (Enterprise; M2-EZTD55SKVA2JA; mail address )
  4 | text/..
3 | lssbot
  3 | text/..
3 | bitlybot
  3 | text/..
  1 | image/..
3 | FAST Enterprise Crawler 6 used by MSN ( mail address )
  3 | text/..
3 | AnomieBOT 1.0 (PUICloser)
  3 | application/json
3 | DotSpotsBot/0.2 (crawler; support at dotspots.com)
  3 | text/..
3 | Codeton Software RSS Bot/1.0
  3 | text/..
  1 | application/xml
3 | AnomieBOT 1.0 (TemplateReplacer15)
  3 | application/json
3 | Mozilla/5.0 (Yahoo-MMCrawler/4.0; mail address )
  3 | image/..
  1 | text/..
3 | Keybot Translation-Search-Machine
  3 | text/..
3 | Xaldon WebSpider 2.7.b6
  3 | text/..
3 | mail address (Mozilla compatible)
  3 | text/..
  1 | image/..
3 | DotNetWikiBot/2.71 (Microsoft Windows NT 6.1.7600.0; )
  3 | text/..
  1 | application/xml
3 | unblockbot/1.00
  3 | text/..
3 | SuperBot/4.7.0.72 (Win32)
  3 | text/..
  1 | image/..
6,518 | total

IP ranges: known IP ranges for Google are 64.233.[160.0-191.255], 66.249.[64.0-95.255], 66.102.[0.0-15.255], 72.14.[192.0-255.255],
74.125.[0.0-255.255], 209.85.[128.0-255.255], 216.239.[32.0-63.255] and a few other minor subranges.
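
A check against the ranges listed above might look like the sketch below. It is an illustration only: the CIDR blocks are a direct translation of the listed ranges (reading 209.085 as 209.85), the unnamed minor subranges are not covered, and the function names are hypothetical.

```python
import ipaddress

# CIDR equivalents of the ranges listed in the report (as of 2009).
# The "few minor other subranges" are not specified there and are omitted.
GOOGLE_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "64.233.160.0/19",  # 64.233.[160.0-191.255]
        "66.249.64.0/19",   # 66.249.[64.0-95.255]
        "66.102.0.0/20",    # 66.102.[0.0-15.255]
        "72.14.192.0/18",   # 72.14.[192.0-255.255]
        "74.125.0.0/16",    # 74.125.[0.0-255.255]
        "209.85.128.0/17",  # 209.85.[128.0-255.255]
        "216.239.32.0/19",  # 216.239.[32.0-63.255]
    )
]


def is_google_ip(ip: str) -> bool:
    """True if the address falls inside one of the listed Google ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in GOOGLE_NETWORKS)


def googlebot_claim_is_plausible(user_agent: str, ip: str) -> bool:
    """True if the agent claims to be GoogleBot and the IP is in a listed range."""
    return "googlebot" in user_agent.lower() and is_google_ip(ip)


if __name__ == "__main__":
    print(is_google_ip("66.249.65.1"))   # inside 66.249.64.0/19
    print(is_google_ip("66.249.96.1"))   # just outside that range
```

A request that names GoogleBot in its user agent but fails this check is a candidate for the spoofing described earlier; because the list is incomplete, a failed check is a hint rather than proof.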

Generated on Tuesday December 15, 2009
Author: Erik Zachte
Mail: ezachte@### (no spam: ### = wikimedia.org)
All data and images on this page are in the public domain.