Wikimedia Visitor Log Analysis Report - Crawler requests

Daily averages, based on sample period: 1 Dec 2009 - 31 Dec 2009

 This analysis is based on a 1:1000 sampled server log (squids) ⇒ all counts x 1000.

The following overview of crawler (aka bot) page requests is based on the user agent information that accompanies most server requests. Unfortunately this user agent information follows rather loosely defined guidelines.
Also please bear in mind that the most popular crawler names may be somewhat overrepresented. This is the result of so-called user agent spoofing (where a requester supplies false credentials, e.g. to bypass web server filters).
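GoogleBot seems to be a favorite for spoofing. Therefore GoogleBot requests from an IP address registered to Google (see the IP ranges below) are color coded differently from other GoogleBot requests. In the first table these appear as the groups google and google? respectively.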

For this report page requests are considered to be issued by a crawler in two cases:
1 The user agent string contains a web address. Only crawlers should have that, but there are some false positives, where a browser sends a user agent string with a web address (ill-behaved plug-ins; the main offenders have been eliminated).
2 The user agent string contains the term bot, spider or crawl[er].
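
The two rules above can be sketched in a few lines of Python. This is only an illustration, not the script that produced this report. It assumes that a "web address" means an http:// or https:// URL, and it does not model the elimination of false positives mentioned in rule 1. The function name and the returned labels are made up for this example.

```python
import re

# Assumption: rule 1 ("contains a web address") is approximated by looking
# for an http:// or https:// URL anywhere in the user agent string.
URL_PATTERN = re.compile(r"https?://", re.IGNORECASE)

# Rule 2: the user agent contains the term bot, spider or crawl[er].
KEYWORD_PATTERN = re.compile(r"bot|spider|crawl", re.IGNORECASE)


def classify_user_agent(user_agent: str) -> str:
    """Return 'url', 'keyword' or 'other' for a user agent string.

    Hypothetical helper: rule 1 is checked first, so an agent that matches
    both rules is counted once, under 'url'.
    """
    if URL_PATTERN.search(user_agent):
        return "url"
    if KEYWORD_PATTERN.search(user_agent):
        return "keyword"
    return "other"


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "PythonWikipediaBot/1.0",
        "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) Firefox/3.5.6",
    ]
    for agent in samples:
        print(classify_user_agent(agent), "<-", agent)
```

A plain substring match on "bot" is deliberately crude, which is one reason a keyword-based list can only be called "probable" crawlers.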

In total 50,359,000 page requests (mime type text/html only!) per day are considered crawler requests, out of 242,276,000 external requests, which is 20.8%
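
As a quick check of the arithmetic, the sketch below scales a sampled count by the 1:1000 sampling factor and recomputes the crawler share from the two totals quoted above. The function names are illustrative only.

```python
SAMPLING_FACTOR = 1000  # the log is sampled 1:1000, so every count is multiplied by 1000


def scale_sampled(sampled_count: int) -> int:
    """Estimate the real number of requests from a count in the sampled log."""
    return sampled_count * SAMPLING_FACTOR


def crawler_share(crawler_requests: int, external_requests: int) -> float:
    """Return crawler requests as a percentage of all external requests."""
    return 100 * crawler_requests / external_requests


if __name__ == "__main__":
    crawler = scale_sampled(50_359)     # 50,359,000 page requests per day
    external = scale_sampled(242_276)   # 242,276,000 external requests per day
    print(f"{crawler_share(crawler, external):.1f}%")  # prints 20.8%
```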

Page requests for crawlers that specify a url in the agent string
Group rows: count (x 1000), secondary domain (~site) name
Detail rows: count (x 1000), URL, mime type, user agent
28,071google
19,572 www.google.com/bot.htmltext/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
3,323 desktop.google.com/application/xmlMozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
2,112 www.google.com/bot.html-Mozilla/5.0 (compatible; GoogleBot/2.1; url)
903 desktop.google.com/application/xmlMozilla/5.0 (compatible; Google Desktop/5.9.909.30391; url)
535 desktop.google.com/image/..Mozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
252 desktop.google.com/text/..Mozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
220 www.google.com/bot.htmltext/..DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
175 www.google.com/bot.htmltext/..SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
171 desktop.google.com/image/..Mozilla/5.0 (compatible; Google Desktop/5.9.909.30391; url)
143 www.google.com/bot.htmlimage/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
106 www.google.com/feedfetcher.html-FeedFetcher-Google; (url)
78 code.google.com/appenginetext/..AppEngine-Google; (url; appid: npiv82)
66 www.google.com/feedfetcher.htmltext/..Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
62 desktop.google.com/text/..Mozilla/5.0 (compatible; Google Desktop/5.9.909.30391; url)
55 code.google.com/appenginetext/..AppEngine-Google; (url; appid: nwikiproxy)
36 www.google.com/feedfetcher.htmlapplication/xmlFeedFetcher-Google; (url)
33 www.google.com/bot.htmlapplication/x-javascriptMozilla/5.0 (compatible; GoogleBot/2.1; url)
31 desktop.google.com/application/xmlMozilla/5.0 (compatible; Google Desktop/5.9.909.8267; url)
31 desktop.google.com/-Mozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
28 code.google.com/appengineapplication/xmlAppEngine-Google; (url; appid: nwikiproxy)
26 www.google.com/feedfetcher.htmltext/..FeedFetcher-Google; (url)
18 www.google.com/feedfetcher.htmlapplication/xmlMozilla/5.0 (compatible) FeedFetcher-Google; (url)
18 www.google.com/feedfetcher.htmlapplication/jsonMozilla/5.0 (compatible) FeedFetcher-Google; (url)
13 code.google.com/appenginetext/..AppEngine-Google; (url; appid: finchproxy)
12 desktop.google.com/image/..Mozilla/5.0 (compatible; Google Desktop/5.9.909.8267; url)
8 docs.google.comimage/..Mozilla/5.0 (compatible; GoogleDocs; url)
8 desktop.google.com/-Mozilla/5.0 (compatible; Google Desktop/5.9.909.30391; url)
8 code.google.com/appenginetext/..AppEngine-Google; (url; appid: search)
3 www.google.com/coop/cse/creftext/..FeedFetcher-Google-CoOp; (url)
13,664msn
9,483 search.msn.com/msnbot.htmtext/..msnbot/2.0b (url)
4,028 search.msn.com/msnbot.htm-msnbot/2.0b (url)
97 search.msn.com/msnbot.htmtext/..msnbot/1.1 (url)
37 search.msn.com/msnbot.htmtext/..msnbot-Products/1.0 (url)
10 search.msn.com/msnbot.htm-msnbot/1.1 (url)
4 search.msn.com/msnbot.htmtext/..msnbot-NewsBlogs/1.1 (url)
8,693yahoo
6,674 help.yahoo.com/help/us/ysearch/slurptext/..Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
1,382 help.yahoo.com/help/us/ysearch/slurptext/..Mozilla/5.0 (compatible; Yahoo! DE Slurp; url)
143 help.yahoo.com/help/us/ysearch/slurptext/..Mozilla/5.0 (compatible; Yahoo! Slurp; url)
122 misc.yahoo.com.cn/help.htmltext/..Mozilla/5.0 (compatible; Yahoo! Slurp China; url)
108 developer.yahoo.com/searchmonkey/useragentimage/..Mozilla/5.0 (compatible; Yahoo! SearchMonkey 1.0; url)
65 developer.yahoo.com/searchmonkey/useragenttext/..Mozilla/5.0 (compatible; Yahoo! SearchMonkey 1.0; url)
59 help.yahoo.com/help/us/ysearch/slurp-Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
29 help.yahoo.com/help/us/ysearch/slurpimage/..Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
28 developer.yahoo.com/searchmonkey/useragentapplication/x-javascriptMozilla/5.0 (compatible; Yahoo! SearchMonkey 1.0; url)
22 help.yahoo.co.jp/help/jp/search/indexing/indexing-15.htmlimage/..'Mozilla/5.0 (compatible; Y!J SearchMonkey/1.0 (Y!J-AGENT; url))'
13 misc.yahoo.com.cn/help.html-Mozilla/5.0 (compatible; Yahoo! Slurp China; url)
13 help.yahoo.co.jp/help/jp/search/indexing/indexing-15.htmltext/..Y!J-BRI/0.0.1 crawler ( url )
10 help.yahoo.com/help/us/ysearch/slurpapplication/oggMozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
8 help.yahoo.com/help/us/ysearch/slurp-Mozilla/5.0 (compatible; Yahoo! DE Slurp; url)
7 help.yahoo.comtext/..Mozilla/5.0 (YahooYSMcm/3.0.0; url)
3 help.yahoo.com/help/us/ysearch/slurpapplication/x-javascriptMozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
1,435google?
1,006 www.google.com/bot.htmltext/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
186 www.google.com/bot.htmltext/..GoogleBot/2.1 (url)
71 www.google.com/bot.htmltext/..SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
54 www.google.com/bot.htmltext/..DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
25 www.google.com/bot.html-Mozilla/5.0 (compatible; GoogleBot/2.1; url)
24 www.google.com/bot.htmlapplication/xmlDoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
20 www.google.com/bot.htmltext/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
19 www.google.com/bot.htmlapplication/xmlSAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
8 www.google.com/bot.htmlimage/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
7 www.google.com/bot.htmlapplication/xmlMozilla/5.0 (compatible; GoogleBot/2.1; url)
4 www.google.com/bot.htmltext/..KDDI-CA34 UP.Browser/6.2.0.10.2.2 (GUI) MMP/2.0 (compatible; KDDI-GoogleBot-Mobile/2.1; url)
4 www.google.com/bot.htmlapplication/xmlKDDI-CA34 UP.Browser/6.2.0.10.2.2 (GUI) MMP/2.0 (compatible; KDDI-GoogleBot-Mobile/2.1; url)
3 www.google.com/bot.htmltext/..Mozilla/5.0(compatible;GoogleBot/2.1;url)
1,432soso
718 help.soso.com/webspider.htmapplication/x-javascriptSosospider(url)
709 help.soso.com/webspider.htmtext/..Sosospider(url)
863naver
804 help.naver.com/robots/text/..Yeti/1.0 (NHN Corp.; url)
30 help.naver.com/delete_main.asptext/..Mozilla/4.0 (compatible; NaverBot/1.0; url)
19 help.naver.com/robots/image/..Yeti/1.0 (NHN Corp.; url)
8 help.naver.com/customer_webtxt_02.jsptext/..Mozilla/4.0 (compatible; NaverBot/1.0; url)
829spinn3r
820 spinn3r.com/robottext/..Mozilla/5.0 (X11; Linux i686; en-US; rv:1.2.1; aggregator:Spinn3r (Spinn3r 3.1); url) Gecko/20021130
6 spinn3r.com/robot-Mozilla/5.0 (X11; Linux i686; en-US; rv:1.2.1; aggregator:Spinn3r (Spinn3r 3.1); url) Gecko/20021130
596ask
486 about.ask.com/en/docs/about/webmasters.shtmltext/..Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
103 about.ask.com/en/docs/about/webmasters.shtml-Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
4 about.ask.com/en/docs/about/webmasters.shtmlimage/..Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
576cuil
566 www.cuil.com/twiceler/robot.htmltext/..Mozilla/5.0 (Twiceler-0.9 url)
9 www.cuil.com/twiceler/robot.htmlapplication/xmlMozilla/5.0 (Twiceler-0.9 url)
508pipl
508 www.pipl.com/bot/text/..Mozilla/5.0(compatible;PiplBot;url)
505exabot
306 www.exabot.com/go/robottext/..Mozilla/5.0 (compatible; Exabot/3.0 (BiggerBetter); url)
176 www.exabot.com/go/robottext/..Mozilla/5.0 (compatible; Exabot/3.0; url)
12 www.exabot.com/go/robot-Mozilla/5.0 (compatible; Exabot/3.0; url)
8 www.exabot.com/go/robottext/..Mozilla/5.0 (compatible; Exabot-Images/3.0; url)
455baidu
272 www.baidu.com/search/spider.htmtext/..Baiduspider(url)
86 www.baidu.jp/spider/text/..Baiduspider(url)
42 www.baidu.jp/spider/text/..DoCoMo/2.0 P05A(c100;TB;W24H15) (compatible; BaiduMobaider/1.0;url)
13 www.baidu.jp/spider/application/xmlDoCoMo/2.0 P05A(c100;TB;W24H15) (compatible; BaiduMobaider/1.0;url)
12 www.baidu.jp/spider/text/..BaiduImagespider(url)
9 www.baidu.jp/spider/image/..BaiduImagespider(url)
8 www.baidu.com/search/spider.htm-Baiduspider(url)
7 www.baidu.jp/spider/-Baiduspider(url)
441teesoft
127 www.teesoft.info/image/..Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
87 www.teesoft.info/text/..Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
67 www.teesoft.info/image/..Mozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
42 www.teesoft.info/text/..Mozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
27 www.teesoft.info/application/x-javascriptMozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
15 www.teesoft.info/application/x-javascriptMozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
8 www.teesoft.info/image/..Mozilla/5.0 (X11; Linux i686; [lang code]; rv:[..]) Gecko/.. etc (url)
7 www.teesoft.info/image/..Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; [lang code]; rv:[..]) Gecko/.. etc (url)
6 www.teesoft.info/text/..Mozilla/5.0 (X11; Linux i686; [lang code]; rv:[..]) Gecko/.. etc (url)
6 www.teesoft.info/image/..Mozilla/5.0 (Windows; Windows NT 6.1; [lang code]; rv:[..]) Gecko/.. etc (url)
5 www.teesoft.info/text/..Mozilla/5.0 (Windows; Windows NT 6.1; [lang code]; rv:[..]) Gecko/.. etc (url)
5 www.teesoft.info/text/..Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; [lang code]; rv:[..]) Gecko/.. etc (url)
5 www.teesoft.info/application/jsonMozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
4 www.teesoft.info/image/..Mozilla/5.0 (X11; Linux x86_64; [lang code]; rv:[..]) Gecko/.. etc (url)
3 www.teesoft.info/image/..Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; [lang code]; rv:[..]) Gecko/.. etc (url)
3 www.teesoft.info/application/jsonMozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
3 www.teesoft.info/application/xmlMozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
238youdao
212 www.youdao.com/help/webmaster/spider/text/..Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
10 www.youdao.com/help/webmaster/spider/-Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
5 www.youdao.com/help/webmaster/spider/application/xmlMozilla/5.0 (compatible; YoudaoBot/1.0; url; )
4 www.youdao.com/help/webmaster/spider/text/..Mozilla/5.0 (compatible; YodaoBot/1.0; url; )
3 www.youdao.com/help/webmaster/spider/audio/midiMozilla/5.0 (compatible; YodaoBot/1.0; url; )
224sblog
163 fulltext.sblog.cz/screenshot/image/..Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
31 fulltext.sblog.cz/screenshot/text/..Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
19 fulltext.sblog.cz/robot/text/..SeznamBot/2.0 (url)
10 fulltext.sblog.cz/screenshot/application/x-javascriptMozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
216php
68 pear.php.net/application/vnd.php.serializedPEAR HTTP_Request class ( url )
45 pear.php.net/package/http_request2text/..HTTP_Request2/0.4.1 (url) PHP/5.2.11
32 pear.php.net/package/http_request2text/..HTTP_Request2/0.3.0 (url) PHP/5.3.1
31 pear.php.net/application/xmlPEAR HTTP_Request class ( url )
20 pear.php.net/text/..PEAR HTTP_Request class ( url )
19 pear.php.net/application/jsonPEAR HTTP_Request class ( url )
213ellerdale
212 www.ellerdale.com/crawler.htmltext/..Mozilla/5.0 (compatible; winnie/1.0; url)
161sogou
154 www.sogou.com/docs/help/webmasters.htm#07text/..Sogou web spider/4.0(url)
7 www.sogou.com/docs/help/webmasters.htm#07application/xmlSogou web spider/4.0(url)
147wikipedia
66 en.wikipedia.org/wiki/Wikipedia:Huggletext/..Huggle/0.9.2 url
35 zh.wikipedia.org/w/index.php?title=李恪&variant=zh-cntext/..url
25 en.wikipedia.org/wiki/Wapediaapplication/vnd.php.serializedwapedia.mobi liveupdate (url)
13 en.wikipedia.orgtext/..url
4 zh.wikipedia.org/w/index.php?title=xingzhengmingling_(meiguotext/..url)&variant=zh-cn
146entireweb
142 www.entireweb.com/about/search_tech/speedy_spider/text/..Speedy Spider (url)
4 www.entireweb.com/about/search_tech/speedy_spider/-Speedy Spider (url)
137wikimedia
135 tools.wikimedia.de/~daniel/text/..WikiSense (url)
127yacy
17 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.31-17-generic; java 1.6.0_0; Europe/en) url
17 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.28-17-generic; java 1.6.0_0; Europe/en) url
9 yacy.net/bot.htmltext/..yacybot (x86_64 Mac OS X 10.6.2; java 1.6.0_17; Europe/de) url
9 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.28-17-generic; java 1.6.0_0; Europe/en) url
8 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.31-16-generic; java 1.6.0_0; Europe/en) url
8 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.26-2-openvz-686; java 1.6.0_0; Europe/de) url
4 yacy.net/bot.htmltext/..yacybot (x86 Windows XP 5.1; java 1.6.0_17; Europe/de) url
4 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.31-15-generic; java 1.6.0_0; Europe/en) url
4 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.26-2-openvz-686; java 1.6.0_0; UTC/en) url
3 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.31-sabayon; java 1.6.0_17; UTC/en) url
3 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.28-17-generic; java 1.6.0_16; Europe/en) url
125fairshare
123 fairshare.cctext/..Mozilla/5.0 url (X11; FreeBSD i386; en-US; rv:1.2a) Gecko/20021021
124dotnetdotcom
124 www.dotnetdotcom.org/text/..Mozilla/5.0 (compatible; DotBot/1.1; url, mail address )
122goo
119 help.goo.ne.jp/contact/text/..goo wikipedia (url)
115emining
115 emining.jp/text/..emBot-GalaBuzz/Nutch-1.0 (url; mail address )
113facebook
83 www.facebook.com/externalhit_uatext.phpimage/..facebookexternalhit/1.0 (url)
20 www.facebook.com/externalhit_uatext.phptext/..facebookexternalhit/1.0 (url)
5 developers.facebook.comtext/..facebookplatform/1.0 (url)
5 developers.facebook.comimage/..facebookplatform/1.0 (url)
97ronzoo
51 www.ronzoo.com/about.phptext/..Ronzoobot/1.3 (url)
24 www.ronzoo.com/about.phptext/..Ronzoobot/1.4 (url)
22 www.ronzoo.com/about/text/..Ronzoobot/1.4 (url)
97textdigger
96 textdigger.comtext/..Mozilla/5.0 (url) Gecko/20061208 Firefox/2.0.0.1
96majestic12
58 www.majestic12.co.uk/bot.php?text/..Mozilla/5.0 (compatible; MJ12bot/v1.3.2; url)
26 www.majestic12.co.uk/bot.php?text/..Mozilla/5.0 (compatible; MJ12bot/v1.3.1; url)
10 www.majestic12.co.uk/bot.php?text/..Mozilla/5.0 (compatible; MJ12bot/v1.2.5; url)
94guruji
41 www.guruji.com/WebmasterFAQ.htmlapplication/xmlGurujiBot/1.0 (url)
21 www.guruji.com/WebmasterFAQ.htmltext/..GurujiBot/1.0 (url)
20 www.guruji.com/en/WebmasterFAQ.htmltext/..Mozilla/5.0 (compatible; GurujiBot/1.0; url)
6 www.guruji.com/en/WebmasterFAQ.htmlimage/..GurujiImageBot/1.0 (url)
6 www.guruji.com/en/WebmasterFAQ.htmltext/..GurujiImageBot/1.0 (url)
88www.
38 www.text/..GoogleBot-Image/1.0 ( urlGoogleBot.com/bot.html)
37 www.text/..GoogleBot/2.1 ( urlGoogleBot.com/bot.html)
7 www.text/..Google - GoogleBot/2.1 ( urlGoogleBot.com/bot.html)
3 www.text/..GoogleBot/2.1 (urlGoogleBot.com/bot.html)
3 www.image/..GoogleBot/2.1 (urlGoogleBot.com/bot.html)
85daum
85 ws.daum.net/aboutWebSearch.htmltext/..Mozilla/5.0 (compatible; MSIE or Firefox mutant; not on Windows server; url) Daumoa/2.0
84asterpix
84 www.asterpix.com/text/..Mozilla/5.0 (compatible; Asterbot; url)
78activepeople
78 www.activepeople.nettext/..WordPress/2.8.4; url
59mnemoo
59 www.mnemoo.com/about/spidertext/..Mnemoo Intelligent Spider/0.1alpha (compatible; See url)
58freebase
58 www.freebase.comtext/..metaweb/Nutch-1.0-dev (url; help_at_metaweb.com)
52att
45 tibesti.research.att.com/research-crawler.htmltext/..Mozilla/5.0 (compatible; heritrix/2.0.2 url)
7 tibesti.research.att.com/research-crawler.htmltext/..Mozilla/5.0 (compatible; heritrix/2.0.1 url)
49commoncrawl
47 www.commoncrawl.org/bot.htmltext/..CCBot/1.0 (url)
46snap
46 www.snap.comtext/..Snapbot/1.0 (Snap Shots, url)
45wordpress
31 support.wordpress.com/contact/text/..WordPress.com mShots; url
43moose
43 www.moose.at/text/..Mozilla/5.0 (compatible; MooseBot/1.1; Linux i686; de; url)
42heartrails
21 capture.heartrails.com/text/..Mozilla/5.0 (X11; Linux i686 (x86_64); en-US; rv:1.8.1.20) Gecko/20090429 HeartRails_Capture/0.6 (url) BonEcho/2.0.0.20
11 capture.heartrails.com/application/x-javascriptMozilla/5.0 (X11; Linux i686 (x86_64); en-US; rv:1.8.1.20) Gecko/20090429 HeartRails_Capture/0.6 (url) BonEcho/2.0.0.20
10 capture.heartrails.com/image/..Mozilla/5.0 (X11; Linux i686 (x86_64); en-US; rv:1.8.1.20) Gecko/20090429 HeartRails_Capture/0.6 (url) BonEcho/2.0.0.20
42traslated
42 mymemory.traslated.net/doc/text/..Mozilla/5.0 (MyMemory Bot url)
41ac
21 www.yama.info.waseda.ac.jp/~takuya/en/aboutSRVC.htmltext/..SearchEngineVerificationCrawler/Nutch-1.0 (The purpose of this crawling is to collect web pages for verifying search engines.; url; mail address dot waseda dot ac dot jp)
16 www.tkl.iis.u-tokyo.ac.jp/~crawler/text/..Mozilla/5.0 (compatible; Steeler/3.5; url)
35 80legs
32 www.80legs.com/spider.htmltext/..Mozilla/5.0 (compatible; 008/0.83; url;) Gecko/2008032620
3 www.80legs.com/spider.htmlimage/..Mozilla/5.0 (compatible; 008/0.83; url;) Gecko/2008032620
34netarkivet
13 netarkivet.dk/website/info.htmltext/..Mozilla/5.0 (compatible; heritrix/1.12.1b url)
13 netarkivet.dk/website/info.htmlimage/..Mozilla/5.0 (compatible; heritrix/1.12.1b url)
6 netarkivet.dk/website/info.htmltext/..Mozilla/5.0 (compatible; heritrix/1.5.0-200506132127 url)
30xrss
30 www.xrss.eu/robottext/..Mozilla/5.0 (compatible; xrss; robot; url; version 2.0)
27z-add
25 w3.z-add.co.uk/linkcheck/text/..Z-Add Link Checker (url)
26setooz
26 www.setooz.com/oozbot.htmltext/..OOZBOT/0.20 ( url ; mail address )
26alexa
26 www.alexa.com/site/help/webmasterstext/..ia_archiver (url; mail address )
26tourist-information-berlin
26 www.tourist-information-berlin.comtext/..WordPress/2.8.4; url
24buddybuzz
18 www.buddybuzz.net/yptrinotext/..Mozilla/5.0 (compatible; heritrix/1.14.3.r6601url)
6 www.buddybuzz.net/yptrinotext/..Mozilla/5.0 (compatible; heritrix/1.14.3.r6601 url)
24Anonymouse
13 Anonymouse.org/image/..url (Unix)
8 Anonymouse.org/text/..url (Unix)
3 Anonymouse.org/application/x-javascripturl (Unix)
23froute
18 labs.froute.jp/pc2m/help.htmltext/..Froute Mobile Gateway/1.0 (url)
5 labs.froute.jp/pc2m/help.htmlimage/..Froute Mobile Gateway/1.0 (url)
23rcdtokyo
17 www.rcdtokyo.com/pc2m/text/..Mozilla/5.0 (compatible; PEAR HTTP_Request class; url)
6 www.rcdtokyo.com/pc2m/image/..Mozilla/5.0 (compatible; PEAR HTTP_Request class; url)
23aport
23 www.aport.ru/helptext/..Mozilla/5.0 (compatible; AportWorm/3.2; url)
23gigablast
23 www.gigablast.com/spider.htmltext/..Gigabot/3.0 (url)
22qdos
22 qdos.com/text/..qdos/1.1 (url)
20oneriot
15 www.oneriot.comtext/..Mozilla/5.0 (Windows; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0 OneRiot/1.0 (url)
5 www.oneriot.comimage/..Mozilla/5.0 (Windows; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0 OneRiot/1.0 (url)
19simplepie
11 simplepie.orgapplication/xmlSimplePie/1.2 (Feed Parser; url; Allow like Gecko) Build/20090627192103
3 simplepie.orgtext/..SimplePie/1.1.1 (Feed Parser; url; Allow like Gecko) Build/20080315205903
18mira
18 mira.com/bot.htmltext/..BluuBot/1.00 (url)
18phonifier
11 www.phonifier.comtext/..Mozilla/5.0 (compatible; Phonifier; url)
7 www.phonifier.comtext/..aNti_miSa$puAsa/5.0 (compatible; Phonifier; url)
17weblio
16 www.weblio.jp/text/..Mozilla/5.0 (compatible; WeblioBot; url)
16scoutjet
16 www.scoutjet.com/text/..Mozilla/5.0 (compatible; ScoutJet; url)
16mixi
8 mixi.jp/text/..mixi-mobile-converter/1.0 (url)
8 mixi.jp/image/..mixi-mobile-converter/1.0 (url)
16aafter
15 aafter.com/crawler.htmtext/..AAfter.com Crawler/AAfter-1.0 (This bot is very focused, well-behaved, and wants to do good to internet community. For any questions, please call collect 1 214-714-2224. Team AAfter, Dallas, TX, USA; url; crawler at aafter.com)
14mashget
8 www.mashget.comtext/..Mashgetbot/2.1 (url)
5 www.mashget.comapplication/jsonMashGetBot1.0(url)
14hatena
7 a.hatena.ne.jp/helptext/..Hatena Antenna/0.5 (url)
7 mgw.hatena.ne.jp/helptext/..DoCoMo/2.0 D903i(c100;TB;W28H20) (compatible; Hatena-Mobile-Gateway/1.2; url)
13picmole
13 www.picmole.comtext/..Mozilla/5.0 (compatible;picmole/1.0 url)
13FeedBurner
12 www.FeedBurner.comtext/..FeedBurner/1.0 (url)
12newsgator
8 www.newsgator.com/Individuals/NetNewsWire/-NetNewsWire/3.2.3 (Mac OS X; url)
12topsy
12 labs.topsy.com/butterfly.htmltext/..Mozilla/5.0 (compatible; Butterfly/1.0; url) Gecko/2009032608 Firefox/3.0.8
12kalooga
7 www.kalooga.com/info.html?page=crawlertext/..Mozilla/5.0 (compatible; KaloogaBot; url)
5 www.kalooga.com/info.html?page=crawlerimage/..Mozilla/5.0 (compatible; KaloogaBot; url)
11cesca
11 www.cesca.cattext/..Mozilla/5.0 (compatible; heritrix/1.14.1 url)
11tineye
8 tineye.com/crawler.htmlimage/..TinEye/1.1 (url)
10bloglines
6 www.bloglines.com-Bloglines/3.1 (url; 1 subscriber)
62,109total

Page requests for probable crawlers, recognized by keyword
Group rows: count (x 1000), agent string
Detail rows: count (x 1000), mime type (count ≥ 3)
1,785PythonWikipediaBot/1.0
904 application/json
548 application/xml
332 text/..
1 image/..
1 -
1 application/x-javascript
1,007GoogleBot-Image/1.0
411 text/..
299 image/..
297 -
474Mozilla/5.0 (Windows; Windows NT 5.1; fr; rv:1.8.1) VoilaBot BETA 1.2 ( mail address )
474 text/..
1 -
1 application/ogg
380LinkParser/2.0
380 text/..
315Answersbot
315 text/..
199Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (Exabot-Thumbnails)
101 text/..
53 image/..
45 application/x-javascript
1 application/json
191gsa-crawler (Enterprise; S5-MS8QQPJ5BGWAA; mail address )
191 text/..
169php wikibot classes
169 application/vnd.php.serialized
121wikiwix-bot-3.0
119 text/..
1 -
1 image/..
99MPUploadBot; PHP 5.2.6-3ubuntu4.2
99 application/vnd.php.serialized
1 -
96GoogleBot-Image/1.0
95 text/..
1 image/..
1 -
1 application/x-javascript
91Mozilla/5.0 (SnapPreviewBot) Gecko/20061206 Firefox/1.5.0.9
45 image/..
32 text/..
14 application/x-javascript
1 application/json
49lssbot
49 text/..
1 image/..
49rdfbot/1.0 (Indian Language Web Search Engine; Rediff.com; rdfbot mail address )
49 text/..
1 application/x-javascript
46Test Webbot
46 text/..
43Szukacz/1.5 (robot; www.szukacz.pl/jakdzialarobot.html; mail address )
43 text/..
41TerraSpider
41 text/..
39Pywikipediabot/2.0
39 application/json
39MLBot (www.metadatalabs.com/mlbot)
39 text/..
1 -
1 image/..
35MPUploadBot; PHP 5.2.6-3ubuntu4.4
35 application/vnd.php.serialized
1 -
35UnitiveBot
32 image/..
3 text/..
35plantspedia data crawler
35 text/..
34SineBot/1.5.15(User:SineBot)
33 application/vnd.php.serialized
1 text/..
1 -
26zomba-bot/0.1
20 text/..
6 image/..
1 application/x-javascript
26dictionary-bot
23 application/xml
3 text/..
25AnomieBOT 1.0 (OrphanReferenceFixer)
25 application/json
24GoogleBot
24 text/..
1 image/..
24DotNetWikiBot/2.71 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
16 text/..
8 application/xml
24msramlbot
24 text/..
23testcrawler
23 text/..
23MSR-ISRCCrawler
19 text/..
4 application/x-javascript
1 image/..
22LinkParser/1.00
22 text/..
21CorenSearchBot/1.4 en libwww-perl/5.808
21 text/..
17COIBot/1.00
17 text/..
16dicbot 1.0
16 text/..
15web18bot
15 text/..
14HTMLParser/1.6
14 text/..
1 application/json
14Geni ircpybot 1.0
10 text/..
2 application/json
2 application/xml
12TweetMemeBot (Feed Parser; Allow like Gecko)
6 application/xml
6 text/..
1 image/..
1 application/ogg
12SurakWare MediaWiki Bot/1.0
12 text/..
12OpenLink Virtuoso RDF crawler
12 text/..
1 image/..
11FLMBot
11 text/..
10Tawbot (public svn release; plwiki)
10 text/..
10Bub's wikibot (Wikibot/2009092504; JWBF/1.2; Java/1.6)
10 text/..
9c0rwin/Nutch-1.0 (Nutch spiderman; MyEmail)
9 text/..
1 image/..
9crawler mail address
9 text/..
8QuickFinder Crawler
8 text/..
1 -
8Bot/WP/EN/Daniel/MediationBot1/1.2
8 text/..
8Mozilla/5.0 (Bgbot 0.5)
8 text/..
7GNAA-bot
7 text/..
7spider
7 text/..
1 application/xml
1 image/..
7AOL Reference Center Bot/1.0
7 text/..
1 -
6 4am-spider/1.0
6 text/..
6DotNetWikiBot/2.53 (Unix 2.6.26.2; )
6 text/..
6Bot/WP/EN/Alex_Bakharev/AlexNewArtBot
6 text/..
6YaDirectBot/1.0
6 text/..
1 image/..
6DotNetWikiBot/2.71 (Microsoft Windows NT 5.2.3790 Service Pack 2; )
6 text/..
5FAST Enterprise Crawler/6.6.17 ( mail address )
5 text/..
1 -
1 application/x-javascript
5XLinkBot/1.00
5 text/..
5HTMLParser/2.0
5 text/..
5bot.rumba.kz
5 text/..
5DotNetWikiBot/2.72 (Microsoft Windows NT 6.0.6002 Service Pack 2; )
5 text/..
1 application/xml
5CheMoBot/1.00
5 text/..
4Baiduspider/Nutch-1.0 (robot_nutch_ics_ict; mail address )
4 text/..
1 -
4AnomieBOT 1.0 (WikiProjectWorker)
4 application/json
4Jabse.com Crawler v.2.0 www.jabse.com/crawler.php
4 text/..
1 application/xml
4Freebase Deathbot
4 text/..
4FAST Enterprise Crawler 6 used by National Instruments ( mail address )
4 text/..
3XMLecho/Palomar 1.0 HTTP Crawler
3 text/..
3bitlybot
3 text/..
1 image/..
3DotNetWikiBot/2.72 (Microsoft Windows NT 5.2.3790 Service Pack 2; )
3 text/..
3DotSpotsBot/0.2 (crawler; support at dotspots.com)
3 text/..
3Moholibot
2 text/..
1 image/..
3FAST Enterprise Crawler 6 used by Wanadoo ( mail address )
2 application/x-javascript
1 text/..
1 -
3FlickySearchBot/1.0 (testMode)
3 text/..
3AnomieBOT 1.0 (AFDMergeFromCleaner)
3 application/json
3AnomieBOT 1.0 (SourceUploader)
3 application/json
3SmartieSpider
3 text/..
3TVersity Media Robot
3 text/..
3Xaldon WebSpider 2.7.b6
3 text/..
3Jyxobot/1
3 text/..
3unblockbot/1.00
3 text/..
5,940total

IP ranges: known IP ranges for Google are 64.233.[160.0-191.255], 66.249.[64.0-95.255], 66.102.[0.0-15.255], 72.14.[192.0-255.255],
74.125.[0.0-255.255], 209.85.[128.0-255.255], 216.239.[32.0-63.255] and a few minor other subranges
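
A check against these ranges could look like the sketch below, which uses Python's standard ipaddress module. It is an illustration under stated assumptions, not the code behind this report: only the seven ranges listed above are included (the "few minor other subranges" are not), the octets are written without leading zeros, and the names GOOGLE_RANGES and is_google_ip are made up for this example.

```python
from ipaddress import IPv4Address

# The seven ranges listed above, as inclusive (first, last) address pairs.
# The unnamed "minor other subranges" are not covered here.
GOOGLE_RANGES = [
    (IPv4Address("64.233.160.0"), IPv4Address("64.233.191.255")),
    (IPv4Address("66.249.64.0"), IPv4Address("66.249.95.255")),
    (IPv4Address("66.102.0.0"), IPv4Address("66.102.15.255")),
    (IPv4Address("72.14.192.0"), IPv4Address("72.14.255.255")),
    (IPv4Address("74.125.0.0"), IPv4Address("74.125.255.255")),
    (IPv4Address("209.85.128.0"), IPv4Address("209.85.255.255")),
    (IPv4Address("216.239.32.0"), IPv4Address("216.239.63.255")),
]


def is_google_ip(address: str) -> bool:
    """Return True if the IPv4 address falls inside one of the listed ranges."""
    ip = IPv4Address(address)
    return any(first <= ip <= last for first, last in GOOGLE_RANGES)


if __name__ == "__main__":
    for candidate in ("66.249.64.1", "8.8.8.8"):
        print(candidate, is_google_ip(candidate))
```

A request whose user agent claims to be GoogleBot but whose address fails this check would fall into the "google?" group rather than "google".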

Generated on Fri, Feb 26, 2010 13:49
Author: Erik Zachte
Mail: ezachte@### (no spam: ### = wikimedia.org)
All data and images on this page are in the public domain.