Wikimedia Visitor Log Analysis Report - Crawler requests

Daily averages, based on sample period: 1 Jan 2010 - 31 Jan 2010

 This analysis is based on a 1:1000 sampled server log (squids) ⇒ all counts x 1000.

The following overview of crawler (aka bot) page requests is based on the user agent information that accompanies most server requests. Unfortunately this user agent information follows rather loosely defined guidelines.
Also please bear in mind that the most popular crawler names may be somewhat overrepresented. This is the result of so-called user agent spoofing (where a requester supplies false credentials, e.g. to bypass web server filters).
GoogleBot seems to be a favorite for spoofing. Therefore requests from an IP address registered to Google (see below) are color coded differently from other requests that identify themselves as GoogleBot.

For this report page requests are considered to be issued by a crawler in two cases:
1 The user agent string contains a web address (only crawlers should have that, but there are some false positives, where a browser sends a user agent string with a web address, e.g. because of an ill-behaved plug-in; the main offenders have been eliminated)
2 The user agent string contains the term bot, spider or crawl[er]
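
The two rules above can be sketched in a few lines of Python. This is only an illustration, not the script that produced this report: the regular expressions, the function name crawler_rule and the choice to treat "http://", "https://" or "www." as a web address are all assumptions, and the elimination of known false positives is not modeled.

```python
import re
from typing import Optional

# Assumption: a "web address" is anything containing http://, https:// or www.
URL_PATTERN = re.compile(r"https?://|www\.", re.IGNORECASE)
# Rule 2: the terms bot, spider or crawl[er], matched case-insensitively.
KEYWORD_PATTERN = re.compile(r"bot|spider|crawl", re.IGNORECASE)


def crawler_rule(user_agent: str) -> Optional[str]:
    """Return which rule marks this user agent as a crawler, or None.

    Rule 1 (web address) is checked first, so an agent string that
    matches both rules is reported under "url".
    """
    if URL_PATTERN.search(user_agent):
        return "url"
    if KEYWORD_PATTERN.search(user_agent):
        return "keyword"
    return None
```

Checking the web address first mirrors the split into two tables below: agents that specify a url are listed in the first table, agents recognized only by keyword in the second.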

In total 45,673,000 page requests (mime type text/html only!) per day are considered crawler requests, out of 253,426,000 external requests, which is 18.0%
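
As a quick check of the arithmetic, the sketch below scales sampled counts by the 1:1000 sampling factor and recomputes the crawler share. The sampled figures 45,673 and 253,426 are inferred by dividing the totals above by 1000; the function names are hypothetical.

```python
SAMPLING_FACTOR = 1000  # 1:1000 sampled log: each sampled request stands for 1000


def scaled(sampled_count: int) -> int:
    """Estimate the real number of requests from a sampled count."""
    return sampled_count * SAMPLING_FACTOR


def share_percent(part: int, total: int) -> float:
    """Share of part in total, as a percentage rounded to one decimal."""
    return round(100.0 * part / total, 1)


crawler_requests = scaled(45_673)     # assumed sampled daily average
external_requests = scaled(253_426)   # assumed sampled daily average

print(crawler_requests)                                     # 45673000
print(share_percent(crawler_requests, external_requests))   # 18.0
```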

Page requests for crawlers that specify a url in the agent string
Count (x 1000) | Secondary domain (~site) name
  Count (x 1000) | URL | Mime type | User agent
28,397 | google
  20,375 | www.google.com/bot.html | text/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  3,145 | desktop.google.com/ | application/xml | Mozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
  1,967 | www.google.com/bot.html | - | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  836 | desktop.google.com/ | application/xml | Mozilla/5.0 (compatible; Google Desktop/5.9.909.30391; url)
  656 | desktop.google.com/ | image/.. | Mozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
  225 | www.google.com/bot.html | text/.. | SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
  205 | desktop.google.com/ | text/.. | Mozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
  161 | desktop.google.com/ | image/.. | Mozilla/5.0 (compatible; Google Desktop/5.9.909.30391; url)
  161 | www.google.com/bot.html | text/.. | DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
  132 | www.google.com/bot.html | image/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  96 | www.google.com/feedfetcher.html | - | FeedFetcher-Google; (url)
  67 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: npiv82)
  65 | www.google.com/feedfetcher.html | text/.. | Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
  37 | www.google.com/feedfetcher.html | application/xml | FeedFetcher-Google; (url)
  29 | desktop.google.com/ | text/.. | Mozilla/5.0 (compatible; Google Desktop/5.9.909.30391; url)
  27 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: nwikiproxy)
  25 | desktop.google.com/ | - | Mozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
  23 | www.google.com/feedfetcher.html | text/.. | FeedFetcher-Google; (url)
  18 | www.google.com/feedfetcher.html | application/xml | Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
  18 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: finchproxy)
  15 | www.google.com/feedfetcher.html | application/json | Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
  15 | desktop.google.com/ | application/xml | Mozilla/5.0 (compatible; Google Desktop/5.9.909.8267; url)
  15 | code.google.com/appengine | application/xml | AppEngine-Google; (url; appid: nwikiproxy)
  12 | www.google.com/bot.html | application/x-javascript | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  10 | desktop.google.com/ | image/.. | Mozilla/5.0 (compatible; Google Desktop/5.9.909.8267; url)
  10 | desktop.google.com/ | - | Mozilla/5.0 (compatible; Google Desktop/5.9.909.30391; url)
  8 | docs.google.com | image/.. | Mozilla/5.0 (compatible; GoogleDocs; url)
  5 | www.google.org | text/.. | Naveen/Nutch-1.0 (Naveen; url; mail address )
  4 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: lullar-data),gzip(gfe) (via translate.google.com)
  3 | www.google.com/coop/cse/cref | text/.. | FeedFetcher-Google-CoOp; (url)
9,462 | msn
  6,028 | search.msn.com/msnbot.htm | text/.. | msnbot/2.0b (url)
  3,267 | search.msn.com/msnbot.htm | - | msnbot/2.0b (url)
  54 | search.msn.com/msnbot.htm | text/.. | msnbot/1.1 (url)
  39 | search.msn.com/msnbot.htm | text/.. | msnbot-media/1.1 (url)
  37 | search.msn.com/msnbot.htm | text/.. | msnbot-Products/1.0 (url)
  12 | search.msn.com/msnbot.htm | image/.. | msnbot-media/1.1 (url)
  9 | search.msn.com/msnbot.htm | image/.. | msnbot/2.0b (url)
  6 | search.msn.com/msnbot.htm | - | msnbot/1.1 (url)
  4 | search.msn.com/msnbot.htm | text/.. | msnbot-NewsBlogs/1.1 (url)
7,875 | yahoo
  7,035 | help.yahoo.com/help/us/ysearch/slurp | text/.. | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  203 | help.yahoo.com/help/us/ysearch/slurp | text/.. | Mozilla/5.0 (compatible; Yahoo! DE Slurp; url)
  131 | help.yahoo.com/help/us/ysearch/slurp | text/.. | Mozilla/5.0 (compatible; Yahoo! Slurp; url)
  112 | misc.yahoo.com.cn/help.html | text/.. | Mozilla/5.0 (compatible; Yahoo! Slurp China; url)
  86 | developer.yahoo.com/searchmonkey/useragent | image/.. | Mozilla/5.0 (compatible; Yahoo! SearchMonkey 1.0; url)
  73 | help.yahoo.com/help/us/ysearch/slurp | - | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  67 | developer.yahoo.com/searchmonkey/useragent | text/.. | Mozilla/5.0 (compatible; Yahoo! SearchMonkey 1.0; url)
  64 | help.yahoo.co.jp/help/jp/search/indexing/indexing-15.html | image/.. | 'Mozilla/5.0 (compatible; Y!J SearchMonkey/1.0 (Y!J-AGENT; url))'
  23 | help.yahoo.com/help/us/ysearch/slurp | image/.. | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  17 | developer.yahoo.com/searchmonkey/useragent | application/x-javascript | Mozilla/5.0 (compatible; Yahoo! SearchMonkey 1.0; url)
  14 | help.yahoo.co.jp/help/jp/search/indexing/indexing-15.html | text/.. | Y!J-BRI/0.0.1 crawler ( url )
  10 | help.yahoo.com/help/us/ysearch/slurp | application/ogg | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  10 | misc.yahoo.com.cn/help.html | - | Mozilla/5.0 (compatible; Yahoo! Slurp China; url)
  8 | help.yahoo.co.jp/help/jp/search/indexing/indexing-15.html | text/.. | Y!J-PSC/1.0 (url)
  6 | help.yahoo.com/help/us/ysearch/slurp | - | Mozilla/5.0 (compatible; Yahoo! DE Slurp; url)
  5 | help.yahoo.com/help/us/ysearch/slurp | text/.. | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0 ; url)
  3 | help.yahoo.com | text/.. | Mozilla/5.0 (YahooYSMcm/3.0.0; url)
1,372 | google?
  1,019 | www.google.com/bot.html | text/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  134 | www.google.com/bot.html | text/.. | SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
  78 | www.google.com/bot.html | text/.. | DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
  35 | www.google.com/bot.html | text/.. | GoogleBot/2.1 (url)
  22 | www.google.com/bot.html | text/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  18 | www.google.com/bot.html | application/xml | DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
  17 | www.google.com/bot.html | - | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  16 | www.google.com/bot.html | application/xml | SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
  8 | www.google.com/bot.html | image/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  7 | www.google.com/bot.html | application/xml | KDDI-CA34 UP.Browser/6.2.0.10.2.2 (GUI) MMP/2.0 (compatible; KDDI-GoogleBot-Mobile/2.1; url)
  6 | www.google.com/bot.html | text/.. | KDDI-CA34 UP.Browser/6.2.0.10.2.2 (GUI) MMP/2.0 (compatible; KDDI-GoogleBot-Mobile/2.1; url)
  5 | www.google.com/bot.html | application/xml | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  4 | www.google.com/bot.html | text/.. | Mozilla/5.0(compatible;GoogleBot/2.1;url)
926 | naver
  853 | help.naver.com/robots/ | text/.. | Yeti/1.0 (NHN Corp.; url)
  30 | help.naver.com/delete_main.asp | text/.. | Mozilla/4.0 (compatible; NaverBot/1.0; url)
  28 | help.naver.com/robots/ | image/.. | Yeti/1.0 (NHN Corp.; url)
  13 | help.naver.com/customer_webtxt_02.jsp | text/.. | Mozilla/4.0 (compatible; NaverBot/1.0; url)
678 | soso
  429 | help.soso.com/webspider.htm | text/.. | Sosospider(url)
  236 | help.soso.com/webspider.htm | application/x-javascript | Sosospider(url)
  4 | help.soso.com/soso-image-spider.htm | text/.. | Sosoimagespider(url)
  4 | help.soso.com/webspider.htm | - | Sosospider(url)
629 | cuil
  612 | www.cuil.com/twiceler/robot.html | text/.. | Mozilla/5.0 (Twiceler-0.9 url)
  16 | www.cuil.com/twiceler/robot.html | application/xml | Mozilla/5.0 (Twiceler-0.9 url)
622 | ask
  502 | about.ask.com/en/docs/about/webmasters.shtml | text/.. | Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
  113 | about.ask.com/en/docs/about/webmasters.shtml | - | Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
  4 | about.ask.com/en/docs/about/webmasters.shtml | text/.. | Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url) (via Web-Blaster/2.21 (http://www.assoziations-blaster.de/web-blast.html))
487 | baidu
  305 | www.baidu.com/search/spider.htm | text/.. | Baiduspider(url)
  70 | www.baidu.jp/spider/ | text/.. | Baiduspider(url)
  44 | www.baidu.jp/spider/ | text/.. | DoCoMo/2.0 P05A(c100;TB;W24H15) (compatible; BaiduMobaider/1.0;url)
  21 | www.baidu.jp/spider/ | text/.. | BaiduImagespider(url)
  13 | www.baidu.jp/spider/ | application/xml | DoCoMo/2.0 P05A(c100;TB;W24H15) (compatible; BaiduMobaider/1.0;url)
  13 | www.baidu.jp/spider/ | - | Baiduspider(url)
  8 | www.baidu.com/search/spider.htm | - | Baiduspider(url)
  5 | www.baidu.jp/spider/ | image/.. | BaiduImagespider(url)
  5 | help.baidu.jp/system/05.html | text/.. | Baiduspider(url)
471 | yacy
  121 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.28-17-generic; java 1.6.0_0; Europe/en) url
  46 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.31-18-generic; java 1.6.0_0; Europe/en) url
  43 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.28-17-generic; java 1.6.0_0; Europe/en) url
  38 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.32.2; java 1.6.0_17; Europe/de) url
  30 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.23.17-dbserv; java 1.6.0_04; Europe/en) url
  18 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.28-15-generic; java 1.6.0_0; Europe/en) url
  14 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.31-17-generic; java 1.6.0_0; Europe/en) url
  13 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.28-17-generic; java 1.6.0_16; GMT/en) url
  10 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.18-164.6.1.el5; java 1.6.0; Europe/de) url
  10 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.28-17-generic; java 1.6.0_16; Europe/en) url
  9 | yacy.net/bot.html | text/.. | yacybot (x86 Windows XP 5.1; java 1.6.0_17; Europe/de) url
  9 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.26-2-amd64; java 1.6.0_12; Europe/en) url
  6 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.24-23-server; java 1.6.0_17; Europe/de) url
  6 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.31-18-generic; java 1.6.0_0; Europe/en) url
  4 | yacy.net/bot.html | text/.. | yacybot (x86 Windows 7 6.1; java 1.6.0_17; Europe/de) url
  4 | yacy.net/bot.html | text/.. | yacybot (amd64 Windows 7 6.1; java 1.6.0_14; Europe/de) url
  4 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.31.8-0.1-desktop; java 1.6.0_0; Europe/de) url
  3 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.30-2-amd64; java 1.6.0_0; SystemV/en) url
  3 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.26-2-openvz-686; java 1.6.0_0; UTC/en) url
  3 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.31.5; java 1.6.0_0; Europe/en) url
463 | pipl
  463 | www.pipl.com/bot/ | text/.. | Mozilla/5.0(compatible;PiplBot;url)
393 | exabot
  234 | www.exabot.com/go/robot | text/.. | Mozilla/5.0 (compatible; Exabot/3.0 (BiggerBetter); url)
  148 | www.exabot.com/go/robot | text/.. | Mozilla/5.0 (compatible; Exabot/3.0; url)
  10 | www.exabot.com/go/robot | - | Mozilla/5.0 (compatible; Exabot/3.0; url)
323 | spinn3r
  316 | spinn3r.com/robot | text/.. | Mozilla/5.0 (X11; Linux i686; en-US; rv:1.2.1; aggregator:Spinn3r (Spinn3r 3.1); url) Gecko/20021130
  5 | spinn3r.com/robot | - | Mozilla/5.0 (X11; Linux i686; en-US; rv:1.2.1; aggregator:Spinn3r (Spinn3r 3.1); url) Gecko/20021130
251 | teesoft
  75 | www.teesoft.info/ | image/.. | Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
  50 | www.teesoft.info/ | text/.. | Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
  41 | www.teesoft.info/ | image/.. | Mozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
  26 | www.teesoft.info/ | text/.. | Mozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
  11 | www.teesoft.info/ | application/x-javascript | Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
  6 | www.teesoft.info/ | application/x-javascript | Mozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
  5 | www.teesoft.info/ | image/.. | Mozilla/5.0 (X11; Linux i686; [lang code]; rv:[..]) Gecko/.. etc (url)
  4 | www.teesoft.info/ | image/.. | Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; [lang code]; rv:[..]) Gecko/.. etc (url)
  4 | www.teesoft.info/ | image/.. | Mozilla/5.0 (Windows; Windows NT 6.1; [lang code]; rv:[..]) Gecko/.. etc (url)
  4 | www.teesoft.info/ | application/json | Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
  3 | www.teesoft.info/ | text/.. | Mozilla/5.0 (X11; Linux i686; [lang code]; rv:[..]) Gecko/.. etc (url)
  3 | www.teesoft.info/ | text/.. | Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; [lang code]; rv:[..]) Gecko/.. etc (url)
229 | youdao
  215 | www.youdao.com/help/webmaster/spider/ | text/.. | Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
  6 | www.youdao.com/help/webmaster/spider/ | - | Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
  3 | www.youdao.com/help/webmaster/spider/ | text/.. | Mozilla/5.0 (compatible; YodaoBot/1.0; url; )
  3 | www.youdao.com/help/webmaster/spider/ | application/xml | Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
208 | php
  65 | pear.php.net/ | application/vnd.php.serialized | PEAR HTTP_Request class ( url )
  38 | pear.php.net/package/http_request2 | text/.. | HTTP_Request2/0.4.1 (url) PHP/5.2.11
  30 | pear.php.net/ | text/.. | PEAR HTTP_Request class ( url )
  28 | pear.php.net/ | application/xml | PEAR HTTP_Request class ( url )
  27 | pear.php.net/ | application/json | PEAR HTTP_Request class ( url )
  10 | pear.php.net/package/http_request2 | text/.. | HTTP_Request2/0.3.0 (url) PHP/5.3.1
  8 | pear.php.net/package/http_request2 | text/.. | HTTP_Request2/0.4.1 (url) PHP/5.2.12
203 | entireweb
  198 | www.entireweb.com/about/search_tech/speedy_spider/ | text/.. | Speedy Spider (url)
  5 | www.entireweb.com/about/search_tech/speedy_spider/ | - | Speedy Spider (url)
193 | sblog
  132 | fulltext.sblog.cz/screenshot/ | image/.. | Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
  35 | fulltext.sblog.cz/screenshot/ | text/.. | Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
  18 | fulltext.sblog.cz/robot/ | text/.. | SeznamBot/2.0 (url)
  5 | fulltext.sblog.cz/screenshot/ | application/x-javascript | Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
  3 | fulltext.sblog.cz/robot/ | - | SeznamBot/2.0 (url)
162 | wikimedia
  159 | tools.wikimedia.de/~daniel/ | text/.. | WikiSense (url)
157 | sogou
  147 | www.sogou.com/docs/help/webmasters.htm#07 | text/.. | Sogou web spider/4.0(url)
  7 | www.sogou.com/docs/help/webmasters.htm#07 | application/xml | Sogou web spider/4.0(url)
130 | fairshare
  128 | fairshare.cc | text/.. | Mozilla/5.0 url (X11; FreeBSD i386; en-US; rv:1.2a) Gecko/20021021
123 | emining
  123 | emining.jp/ | text/.. | emBot-GalaBuzz/Nutch-1.0 (url; mail address )
116 | facebook
  85 | www.facebook.com/externalhit_uatext.php | image/.. | facebookexternalhit/1.0 (url)
  21 | www.facebook.com/externalhit_uatext.php | text/.. | facebookexternalhit/1.0 (url)
  5 | developers.facebook.com | text/.. | facebookplatform/1.0 (url)
  5 | developers.facebook.com | image/.. | facebookplatform/1.0 (url)
113 | wikipedia
  87 | en.wikipedia.org/wiki/Wikipedia:Huggle | text/.. | Huggle/0.9.2 url
  12 | en.wikipedia.org/wiki/Wapedia | application/vnd.php.serialized | wapedia.mobi liveupdate (url)
  7 | en.wikipedia.org | text/.. | url
  5 | ko.wikipedia.org | text/.. | url
109 | goo
  108 | help.goo.ne.jp/contact/ | text/.. | goo wikipedia (url)
95 | www.
  39 | www. | text/.. | GoogleBot/2.1 ( urlGoogleBot.com/bot.html)
  37 | www. | text/.. | GoogleBot-Image/1.0 ( urlGoogleBot.com/bot.html)
  13 | www. | text/.. | Google - GoogleBot/2.1 ( urlGoogleBot.com/bot.html)
  4 | www. | text/.. | GoogleBot/2.1 (urlGoogleBot.com/bot.html)
90 | dotnetdotcom
  89 | www.dotnetdotcom.org/ | text/.. | Mozilla/5.0 (compatible; DotBot/1.1; url, mail address )
89 | majestic12
  79 | www.majestic12.co.uk/bot.php? | text/.. | Mozilla/5.0 (compatible; MJ12bot/v1.3.2; url)
  6 | www.majestic12.co.uk/bot.php? | text/.. | Mozilla/5.0 (compatible; MJ12bot/v1.3.1; url)
84 | textdigger
  83 | textdigger.com | text/.. | Mozilla/5.0 (url) Gecko/20061208 Firefox/2.0.0.1
76 | activepeople
  76 | www.activepeople.net | text/.. | WordPress/2.8.4; url
72 | gigablast
  72 | www.gigablast.com/spider.html | text/.. | Gigabot/3.0 (url)
70 | daum
  69 | ws.daum.net/aboutWebSearch.html | text/.. | Mozilla/5.0 (compatible; MSIE or Firefox mutant; not on Windows server; url) Daumoa/2.0
61 | traslated
  61 | mymemory.traslated.net/doc/ | text/.. | Mozilla/5.0 (MyMemory Bot url)
58 | 80legs
  54 | www.80legs.com/spider.html | text/.. | Mozilla/5.0 (compatible; 008/0.83; url;) Gecko/2008032620
  4 | www.80legs.com/spider.html | image/.. | Mozilla/5.0 (compatible; 008/0.83; url;) Gecko/2008032620
52 | guruji
  21 | www.guruji.com/en/WebmasterFAQ.html | image/.. | GurujiImageBot/1.0 (url)
  19 | www.guruji.com/en/WebmasterFAQ.html | text/.. | GurujiImageBot/1.0 (url)
  8 | www.guruji.com/WebmasterFAQ.html | application/xml | GurujiBot/1.0 (url)
  4 | www.guruji.com/WebmasterFAQ.html | text/.. | GurujiBot/1.0 (url)
50 | freebase
  50 | www.freebase.com | text/.. | metaweb/Nutch-1.0-dev (url; help_at_metaweb.com)
48 | wordpress
  32 | support.wordpress.com/contact/ | text/.. | WordPress.com mShots; url
  4 | josefboberg.wordpress.com | text/.. | WordPress/MU; url
37 | tourist-information-berlin
  37 | www.tourist-information-berlin.com | text/.. | WordPress/2.8.4; url
34 | picmole
  34 | www.picmole.com | text/.. | Mozilla/5.0 (compatible;picmole/1.0 url)
33 | z-add
  29 | w3.z-add.co.uk/linkcheck/ | text/.. | Z-Add Link Checker (url)
30 | aport
  30 | www.aport.ru/help | text/.. | Mozilla/5.0 (compatible; AportWorm/3.2; url)
29 | archive-it
  20 | archive-it.org/files/site-owners.html | image/.. | Mozilla/5.0 (compatible;archive.org_bot/1.7.0; Archive-It; url)
  9 | archive-it.org/files/site-owners.html | text/.. | Mozilla/5.0 (compatible;archive.org_bot/1.7.0; Archive-It; url)
28 | kplaces
  17 | www.kplaces.com | image/.. | Mozilla/5.0 (compatible; heritrix/1.14.0 url)
  11 | www.kplaces.com | text/.. | Mozilla/5.0 (compatible; heritrix/1.14.0 url)
28 | Anonymouse
  15 | Anonymouse.org/ | image/.. | url (Unix)
  10 | Anonymouse.org/ | text/.. | url (Unix)
  3 | Anonymouse.org/ | application/x-javascript | url (Unix)
28 | snap
  26 | www.snap.com | text/.. | Snapbot/1.0 (Snap Shots, url)
26 | froute
  21 | labs.froute.jp/pc2m/help.html | text/.. | Froute Mobile Gateway/1.0 (url)
  5 | labs.froute.jp/pc2m/help.html | image/.. | Froute Mobile Gateway/1.0 (url)
26 | heartrails
  16 | capture.heartrails.com/ | text/.. | Mozilla/5.0 (X11; Linux i686 (x86_64); en-US; rv:1.8.1.20) Gecko/20090429 HeartRails_Capture/0.6 (url) BonEcho/2.0.0.20
  7 | capture.heartrails.com/ | image/.. | Mozilla/5.0 (X11; Linux i686 (x86_64); en-US; rv:1.8.1.20) Gecko/20090429 HeartRails_Capture/0.6 (url) BonEcho/2.0.0.20
  3 | capture.heartrails.com/ | application/x-javascript | Mozilla/5.0 (X11; Linux i686 (x86_64); en-US; rv:1.8.1.20) Gecko/20090429 HeartRails_Capture/0.6 (url) BonEcho/2.0.0.20
25 | qdos
  25 | qdos.com/ | text/.. | qdos/1.1 (url)
25 | alexa
  25 | www.alexa.com/site/help/webmasters | text/.. | ia_archiver (url; mail address )
22 | rcdtokyo
  17 | www.rcdtokyo.com/pc2m/ | text/.. | Mozilla/5.0 (compatible; PEAR HTTP_Request class; url)
  5 | www.rcdtokyo.com/pc2m/ | image/.. | Mozilla/5.0 (compatible; PEAR HTTP_Request class; url)
20 | asterpix
  20 | www.asterpix.com/ | text/.. | Mozilla/5.0 (compatible; Asterbot; url)
19 | scoutjet
  19 | www.scoutjet.com/ | text/.. | Mozilla/5.0 (compatible; ScoutJet; url)
19 | simplepie
  9 | simplepie.org | application/xml | SimplePie/1.2 (Feed Parser; url; Allow like Gecko) Build/20090627192103
  5 | simplepie.org | text/.. | SimplePie/1.1.1 (Feed Parser; url; Allow like Gecko) Build/20080315205903
19 | oneriot
  12 | www.oneriot.com | text/.. | Mozilla/5.0 (Windows; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0 OneRiot/1.0 (url)
  4 | www.oneriot.com | image/.. | Mozilla/5.0 (Windows; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0 OneRiot/1.0 (url)
18 | proximic
  18 | www.proximic.com | text/.. | Mozilla/5.0 (compatible; proximic; url)
18 | chenli
  18 | chenli.com.cn | text/.. | Chen Li/Nutch-1.0 (Nutch spiderman; url; mail address )
17 | flaptor
  17 | www.flaptor.com/ | text/.. | HounderCrawl/Nutch-0.9 (Hounder Search Bot; url)
17 | mixi
  10 | mixi.jp/ | text/.. | mixi-mobile-converter/1.0 (url)
  7 | mixi.jp/ | image/.. | mixi-mobile-converter/1.0 (url)
16 | mnemoo
  16 | www.mnemoo.com/about/spider | text/.. | Mnemoo Intelligent Spider/0.1alpha (compatible; See url)
16 | moose
  16 | www.moose.at/ | text/.. | Mozilla/5.0 (compatible; MooseBot/1.1; Linux i686; de; url)
14 | kosmix
  10 | www.kosmix.com/crawler.html | text/.. | voyager/2.0 (url)
  4 | www.kosmix.com/html/kosmos.html | text/.. | Mozilla/5.0(compatible;Kosmos/1.0;url)
14 | aafter
  13 | aafter.com/crawler.htm | text/.. | AAfter.com Crawler/AAfter-1.0 (This bot is very focused, well-behaved, and wants to do good to internet community. For any questions, please call collect 1 214-714-2224. Team AAfter, Dallas, TX, USA; url; crawler at aafter.com)
14 | FeedBurner
  14 | www.FeedBurner.com | text/.. | FeedBurner/1.0 (url)
14 | hatena
  7 | mgw.hatena.ne.jp/help | text/.. | DoCoMo/2.0 D903i(c100;TB;W28H20) (compatible; Hatena-Mobile-Gateway/1.2; url)
  7 | a.hatena.ne.jp/help | text/.. | Hatena Antenna/0.5 (url)
13 | phonifier
  9 | www.phonifier.com | text/.. | Mozilla/5.0 (compatible; Phonifier; url)
  4 | www.phonifier.com | text/.. | aNti_miSa$puAsa/5.0 (compatible; Phonifier; url)
12 | newsgator
  9 | www.newsgator.com/Individuals/NetNewsWire/ | - | NetNewsWire/3.2.3 (Mac OS X; url)
12 | setooz
  12 | www.setooz.com/oozbot.html | text/.. | OOZBOT/0.20 ( url ; mail address )
12 | memidex
  12 | www.memidex.com/_bot | text/.. | Mozilla/5.0 (compatible; Memibot/1.0; url )
12 | xrss
  12 | www.xrss.eu/robot | text/.. | Mozilla/5.0 (compatible; xrss; robot; url; version 2.0)
11 | zscho
  11 | zscho.de/ | text/.. | Zscho.de Crawler/Nutch-1.0-Zscho.de-semantic_patch (Zscho.de Crawler, collecting for machine learning; url )
11 | ac
  8 | www.yama.info.waseda.ac.jp/~takuya/en/aboutSRVC.html | text/.. | SearchEngineVerificationCrawler/Nutch-1.0 (The purpose of this crawling is to collect web pages for verifying search engines.; url; mail address dot waseda dot ac dot jp)
11 | tineye
  7 | tineye.com/crawler.html | image/.. | TinEye/1.1 (url)
  3 | tineye.com/crawler.html | text/.. | TinEye/1.1 (url)
10 | github
  8 | github.com/pauldix/typhoeus/tree/master | text/.. | Typhoeus - url
10 | bloglines
  6 | www.bloglines.com | - | Bloglines/3.1 (url; 1 subscriber)
10 | princexml
  9 | www.princexml.com | image/.. | Prince/7.1 (url)
10 | kalooga
  6 | www.kalooga.com/info.html?page=crawler | text/.. | Mozilla/5.0 (compatible; KaloogaBot; url)
  4 | www.kalooga.com/info.html?page=crawler | image/.. | Mozilla/5.0 (compatible; KaloogaBot; url)
55,686 | total

Page requests for probable crawlers, recognized by keyword
Count (x 1000) | Agent string
  Count (x 1000) | Mime type (count ≥ 3)
1,670 | PythonWikipediaBot/1.0
  838 | application/json
  534 | application/xml
  297 | text/..
  1 | image/..
  1 | -
873 | GoogleBot-Image/1.0
  382 | text/..
  256 | image/..
  235 | -
343 | LinkParser/2.0
  343 | text/..
294 | Mozilla/5.0 (Windows; Windows NT 5.1; fr; rv:1.8.1) VoilaBot BETA 1.2 ( mail address )
  294 | text/..
  1 | -
  1 | application/pdf
  1 | application/ogg
222 | Answersbot
  222 | text/..
194 | Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (Exabot-Thumbnails)
  100 | text/..
  51 | image/..
  43 | application/x-javascript
  1 | application/json
179 | gsa-crawler (Enterprise; S5-MS8QQPJ5BGWAA; mail address )
  179 | text/..
137 | wikiwix-bot-3.0
  134 | text/..
  2 | image/..
  1 | -
132 | php wikibot classes
  132 | application/vnd.php.serialized
  1 | -
  1 | text/..
103 | MPUploadBot; PHP 5.2.6-3ubuntu4.4
  103 | application/vnd.php.serialized
  1 | -
81 | GoogleBot-Image/1.0
  80 | text/..
  1 | image/..
  1 | -
68 | crawler mail address
  68 | text/..
  1 | -
64 | Mozilla/5.0 (SnapPreviewBot) Gecko/20061206 Firefox/1.5.0.9
  32 | image/..
  26 | text/..
  6 | application/x-javascript
  1 | application/json
59 | MPUploadBot; PHP 5.2.6-3ubuntu4.5
  59 | application/vnd.php.serialized
45 | Test Webbot
  45 | text/..
44 | spider
  44 | text/..
  1 | application/xml
  1 | image/..
36 | Pywikipediabot/2.0
  36 | application/json
35 | TerraSpider
  35 | text/..
34 | SineBot/1.5.15(User:SineBot)
  33 | application/vnd.php.serialized
  1 | text/..
33 | plantspedia data crawler
  33 | text/..
31 | MLBot (www.metadatalabs.com/mlbot)
  31 | text/..
  1 | -
  1 | image/..
25 | dictionary-bot
  18 | application/xml
  7 | text/..
23 | HTMLParser/1.6
  16 | text/..
  7 | application/json
20 | DotNetWikiBot/2.71 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
  16 | text/..
  4 | application/xml
19 | CorenSearchBot/1.4 en libwww-perl/5.808
  19 | text/..
19 | AnomieBOT 1.0 (OrphanReferenceFixer)
  19 | application/json
18 | UnitiveBot
  16 | image/..
  2 | text/..
17 | LinkParser/1.00
  17 | text/..
17 | msramlbot
  17 | text/..
14 | GoogleBot
  14 | text/..
  1 | application/x-javascript
  1 | image/..
14 | MSR-ISRCCrawler
  9 | text/..
  3 | application/x-javascript
  2 | image/..
14 | Bub's wikibot (Wikibot/2009092504; JWBF/1.2; Java/1.6)
  14 | text/..
13 | SurakWare MediaWiki Bot/1.0
  13 | text/..
12 | Mozilla/5.0 (compatible; Nigma.ru/3.0; mail address )
  12 | text/..
12 | COIBot/1.00
  12 | text/..
10 | KIT webcrawler/0.2.4
  10 | text/..
  1 | application/ogg
10 | dicbot 1.0
  10 | text/..
9 | Bot/WP/EN/Daniel/MediationBot1/1.2
  9 | text/..
9 | SuperBot/4.7.0.72 (Windows XP)
  9 | text/..
  1 | image/..
9 | Tawbot (public svn release; plwiki)
  9 | text/..
9 | mail address (Mozilla compatible)
  9 | text/..
  1 | image/..
9 | Mozilla/5.0 (Bgbot 0.5)
  9 | text/..
8 | DotNetWikiBot/2.72 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
  8 | text/..
  1 | application/xml
7 | DotNetWikiBot/2.72 (Microsoft Windows NT 5.1.2600 Service Pack 2; )
  7 | text/..
  1 | application/xml
7 | testcrawler
  7 | text/..
7 | TweetMemeBot (Feed Parser; Allow like Gecko)
  4 | text/..
  3 | application/xml
  1 | -
  1 | audio/midi
7 | DotNetWikiBot/2.53 (Unix 2.6.26.2; )
  7 | text/..
7 | Bot/WP/EN/Alex_Bakharev/AlexNewArtBot
  7 | text/..
6 | DotNetWikiBot/2.72 (Microsoft Windows NT 5.2.3790 Service Pack 2; )
  6 | text/..
6 | zomba-bot/0.1
  6 | text/..
  1 | image/..
6 | XLinkBot/1.00
  6 | text/..
6 | HTMLParser/2.0
  6 | text/..
  1 | -
6 | GNAA-bot
  6 | text/..
6 | msnbot
  6 | text/..
  1 | -
  1 | image/..
6 | YaDirectBot/1.0
  6 | text/..
6 | IssueCrawler
  6 | text/..
5 | 4am-spider/1.0
  5 | text/..
5 | FAST Enterprise Crawler 6 used by Wanadoo ( mail address )
  4 | application/x-javascript
  1 | text/..
  1 | -
5 | AnomieBOT 1.0 (AFDMergeFromCleaner)
  5 | application/json
  1 | text/..
5 | Mozilla/5.0 (Apibot 0.19)
  5 | application/vnd.php.serialized
5 | AOL Reference Center Bot/1.0
  5 | text/..
4 | IScraperBot/0.1
  3 | application/xml
  1 | text/..
  1 | -
4 | ListasBot 3
  4 | text/..
4 | Geni ircpybot 1.0
  2 | application/json
  2 | text/..
  1 | application/xml
4 | OpenLink Virtuoso RDF crawler
  4 | text/..
4 | SONIVIS MediaWiki API Bot 0.1.3
  4 | text/..
4 | SmartieSpider
  4 | text/..
4 | CheMoBot/1.00
  4 | text/..
3 | bitlybot
  3 | text/..
  1 | image/..
3 | DotNetWikiBot/2.71 (Microsoft Windows NT 6.0.6001 Service Pack 1; )
  2 | text/..
  1 | application/xml
3 | DoCoMo/2.0 SH904i(c100;TB;W24H16)(Y!J-AGENT)(robot)
  3 | text/..
  1 | -
  1 | image/..
3 | DotNetWikiBot/2.71 (Microsoft Windows NT 6.0.6002 Service Pack 2; )
  2 | text/..
  1 | application/xml
3 | Mozilla/2.0 mzscheme-webbot/1.1
  2 | text/..
  1 | application/yaml
3 | DotNetWikiBot/2.6 (Microsoft Windows NT 6.0.6002 Service Pack 2; )
  3 | text/..
3 | RMI Digital Add Tagging Crawler
  3 | text/..
3 | Freebase Deathbot
  3 | text/..
3 | AnomieBOT 1.0 (SourceUploader)
  3 | application/json
3 | unblockbot/1.00
  3 | text/..
5,155 | total

IP ranges: known IP ranges for Google are 64.233.[160.0-191.255], 66.249.[64.0-95.255], 66.102.[0.0-15.255], 72.14.[192.0-255.255], 74.125.[0.0-255.255], 209.85.[128.0-255.255], 216.239.[32.0-63.255] and a few minor other subranges.
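
A minimal sketch of how a request claiming to be GoogleBot could be checked against these ranges, using Python's standard ipaddress module. The ranges are rewritten here in CIDR notation, and the "few minor other subranges" are not included. The function names are hypothetical, and returning the labels "google" and "google?" (borrowed from the group names in the first table) is an assumption about how the report separates the two cases.

```python
import ipaddress
from typing import Optional

# The seven ranges listed above, in CIDR notation
# (e.g. 64.233.160.0 - 64.233.191.255 is 64.233.160.0/19).
GOOGLE_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "64.233.160.0/19",
        "66.249.64.0/19",
        "66.102.0.0/20",
        "72.14.192.0/18",
        "74.125.0.0/16",
        "209.85.128.0/17",
        "216.239.32.0/19",
    )
]


def is_google_ip(address: str) -> bool:
    """Return True if the address falls in one of the listed Google ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in GOOGLE_NETWORKS)


def googlebot_label(user_agent: str, address: str) -> Optional[str]:
    """Label a request that identifies itself as GoogleBot.

    Returns "google" when the IP is in a listed range, "google?" when it
    is not (possible spoofing), and None for other user agents.
    The labels are an assumption, see the note above.
    """
    if "googlebot" not in user_agent.lower():
        return None
    return "google" if is_google_ip(address) else "google?"
```

For example, googlebot_label("Mozilla/5.0 (compatible; Googlebot/2.1)", "66.249.70.10") would return "google", while the same agent string from an address outside the listed ranges would return "google?".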

Generated on Thu, Mar 11, 2010 1:57
Author: Erik Zachte
Mail: ezachte@### (no spam: ### = wikimedia.org)
All data and images on this page are in the public domain.