Wikimedia Visitor Log Analysis Report - Crawler requests

Daily averages, based on sample period: 1 Jun 2009 - 30 Jun 2009

 This analysis is based on a 1:1000 sampled server log (squids) ⇒ all counts x 1000.
 
The following overview of crawler (aka bot) page requests is based on the user agent information that accompanies most server requests. Unfortunately this user agent information follows rather loosely defined guidelines.
Also please bear in mind that the most popular crawler names may be somewhat overrepresented. This is the result of so-called user agent spoofing (where a requester supplies false credentials, e.g. to bypass web server filters).
GoogleBot seems to be a favorite for spoofing. Therefore requests from an IP address registered by Google (see below) are color coded differently from other requests that present themselves as GoogleBot.

For this report page requests are considered to be issued by a crawler in two cases:
1 The user agent string contains a web address. Only crawlers should have that, but there are some false positives where a browser sends a user agent string with a web address (ill-behaved plug-in; the main offenders have been eliminated).
2 The user agent string contains the term bot, spider or crawl[er].
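
The two rules above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the script that produced this report: the name is_crawler and both regular expressions are hypothetical, and the elimination of known false positives (such as ill-behaved plug-ins) is not modelled.

```python
import re

# Rule 1: the user agent string contains a web address.
# Assumed pattern: an explicit http(s) scheme or a "www." host name.
WEB_ADDRESS = re.compile(r"(https?://|www\.)", re.IGNORECASE)

# Rule 2: the user agent string contains "bot", "spider" or "crawl[er]".
KEYWORD = re.compile(r"(bot|spider|crawl)", re.IGNORECASE)


def is_crawler(user_agent: str) -> bool:
    """Return True if either rule marks the user agent as a crawler."""
    return bool(WEB_ADDRESS.search(user_agent) or KEYWORD.search(user_agent))


# A crawler that advertises a url, and an ordinary browser string.
print(is_crawler("msnbot/1.1 (+http://search.msn.com/msnbot.htm)"))  # True
print(is_crawler("Mozilla/5.0 (X11; Linux i686) Gecko/2009 Firefox/3.0"))  # False
```

A plain substring match such as this one also flags any browser string that happens to contain "bot", which is one reason the counts in this report should be read as estimates.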

In total 36,047,000 page requests (mime type text/html only!) per day are considered crawler requests, out of 246,349,000 external requests, which is 14.6%
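
As a worked example of the arithmetic, the sketch below scales sampled counts by the 1:1000 sampling factor and recomputes the crawler share. The names SAMPLING_FACTOR and scale are hypothetical; the two input numbers are simply the daily averages quoted above divided by 1000.

```python
SAMPLING_FACTOR = 1000  # the log keeps 1 out of every 1000 requests


def scale(sampled_count: int) -> int:
    """Estimate the real number of requests from a sampled count."""
    return sampled_count * SAMPLING_FACTOR


# Sampled daily averages, i.e. the figures quoted in the report divided by 1000.
crawler_requests = scale(36_047)
external_requests = scale(246_349)

share = 100 * crawler_requests / external_requests
print(f"{crawler_requests:,} of {external_requests:,} = {share:.1f}%")
# prints: 36,047,000 of 246,349,000 = 14.6%
```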

Page requests for crawlers that specify a url in the agent string
Count (x 1000) | Secondary domain (~site) name | URL | Mime type | User agent
19,640google
16,839 www.google.com/bot.htmltext/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
1,967 www.google.com/bot.html-Mozilla/5.0 (compatible; GoogleBot/2.1; url)
167 www.google.com/feedfetcher.htmltext/..Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
144 www.google.com/feedfetcher.html-FeedFetcher-Google; (url)
110 www.google.com/feedfetcher.htmltext/..FeedFetcher-Google; (url)
72 www.google.com/feedfetcher.htmlapplication/xmlMozilla/5.0 (compatible) FeedFetcher-Google; (url)
71 www.google.com/feedfetcher.htmlapplication/jsonMozilla/5.0 (compatible) FeedFetcher-Google; (url)
68 www.google.com/bot.htmltext/..SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
50 code.google.com/appenginetext/..AppEngine-Google; (url)
43 www.google.com/feedfetcher.htmlapplication/xmlFeedFetcher-Google; (url)
30 code.google.com/appengineapplication/jsonpython-wikitools/0.1.1 AppEngine-Google; (url)
29 www.google.com/bot.htmlimage/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
15 code.google.com/appengineapplication/xmlAppEngine-Google; (url)
7 www.google.com/feedfetcher.htmlapplication/jsonFeedFetcher-Google; (url)
7 www.google.com/bot.htmltext/..DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
6 www.google.com/feedfetcher.htmltext/..Google OpenSocial agent (url)
4 code.google.com/appengineimage/..AppEngine-Google; (url)
3 www.google.com/bot.htmlapplication/xmlMozilla/5.0 (compatible; GoogleBot/2.1; url)
6,434yahoo
5,076 help.yahoo.com/help/us/ysearch/slurptext/..Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
652 help.yahoo.com/help/us/ysearch/slurptext/..Mozilla/5.0 (compatible; Yahoo! DE Slurp; url)
224 help.yahoo.com/help/us/ysearch/slurp-Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
120 developer.yahoo.com/searchmonkey/useragenttext/..Mozilla/5.0 (compatible; Yahoo! SearchMonkey 1.0; url)
111 misc.yahoo.com.cn/help.htmltext/..Mozilla/5.0 (compatible; Yahoo! Slurp China; url)
78 help.yahoo.com/help/us/ysearch/slurp-Mozilla/5.0 (compatible; Yahoo! DE Slurp; url)
65 help.yahoo.com/help/us/ysearch/slurptext/..Mozilla/5.0 (compatible; Yahoo! Slurp; url)
31 help.yahoo.com/help/us/ysearch/slurpimage/..Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
18 misc.yahoo.com.cn/help.html-Mozilla/5.0 (compatible; Yahoo! Slurp China; url)
13 help.yahoo.co.jp/help/jp/search/indexing/indexing-15.htmltext/..Y!J-BRI/0.0.1 crawler ( url )
13 help.yahoo.com/help/us/ysearch/slurpapplication/x-javascriptMozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
12 help.yahoo.com/help/us/ysearch/crawling/crawling-01.htmltext/..Nokia6682/2.0 (3.01.1) SymbianOS/8.0 Series60/2.6 Profile/MIDP-2.0 configuration/CLDC-1.1 UP.Link/6.3.0.0.0 (compatible;YahooSeeker/M1A1-R2D2; url)
9 help.yahoo.com/help/us/ysearch/slurpapplication/oggMozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
5 help.yahoo.co.jp/help/jp/search/indexing/indexing-15.htmltext/..Y!J-PSC/1.0 (url)
3 developer.yahoo.com/searchmonkey/useragentimage/..Mozilla/5.0 (compatible; Yahoo! SearchMonkey 1.0; url)
2,894msn
1,486 search.msn.com/msnbot.htmtext/..msnbot/1.1 (url)
668 search.msn.com/msnbot.htmtext/..msnbot/2.0b (url)
195 search.msn.com/msnbot.htm-msnbot/1.1 (url)
191 search.msn.com/msnbot.htm-msnbot/2.0b (url)
182 search.msn.com/msnbot.htmtext/..msnbot-media/1.1 (url)
77 search.msn.com/msnbot.htmtext/..renlifangbot/1.0 (url)
49 search.msn.com/msnbot.htmimage/..msnbot-media/1.1 (url)
23 search.msn.com/msnbot.htmimage/..msnbot/1.1 (url)
7 search.msn.com/msnbot.htmapplication/oggmsnbot/1.1 (url)
6 search.msn.com/msnbot.htmimage/..msnbot/2.0b (url)
4 search.msn.com/msnbot.htmapplication/oggmsnbot/2.0b (url)
1,801google?
1,583 www.google.com/bot.htmltext/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
88 www.google.com/bot.htmltext/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
44 www.google.com/bot.htmltext/..GoogleBot/2.1 (url)
26 www.google.com/bot.htmltext/..DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
23 www.google.com/bot.htmltext/..SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
10 www.google.com/bot.html-Mozilla/5.0 (compatible; GoogleBot/2.1; url)
6 www.google.com/bot.htmlimage/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
6 www.google.com/bot.htmlapplication/xmlMozilla/5.0 (compatible; GoogleBot/2.1; url)
4 www.google.com/bot.htmltext/..Mozilla/5.0(compatible;GoogleBot/2.1;url)
3 www.google.com/bot.htmlapplication/xmlDoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
794exabot
673 www.exabot.com/go/robottext/..Mozilla/5.0 (compatible; Exabot/3.0; url)
102 www.exabot.com/go/robottext/..Mozilla/5.0 (compatible; Exabot-Images/3.0; url)
13 www.exabot.com/go/robot-Mozilla/5.0 (compatible; Exabot/3.0; url)
5 www.exabot.com/go/robotimage/..Mozilla/5.0 (compatible; Exabot-Images/3.0; url)
565yanga
464 www.yanga.co.uk/text/..Yanga WorldSearch Bot v1.1/beta (url)
101 www.yanga.co.uk/image/..Yanga WorldSearch Bot v1.1/beta (url)
490naver
419 help.naver.com/robots/text/..Yeti/1.0 (NHN Corp.; url)
26 help.naver.com/delete_main.asptext/..Mozilla/4.0 (compatible; NaverBot/1.0; url)
17 help.naver.com/robots/image/..Yeti/1.0 (NHN Corp.; url)
13 help.naver.com/customer_webtxt_02.jsptext/..Mozilla/4.0 (compatible; NaverBot/1.0; url)
7 help.naver.com/robots/text/..Mozilla/5.0 (compatible; Yeti/1.0; NHN Corp.; url)
4 help.naver.com/robots/image/..Mozilla/5.0 (compatible; Yeti/1.0; NHN Corp.; url)
434searchme
166 www.searchme.com/support/text/..Mozilla/5.0 (compatible; Charlotte/1.0t; url)
135 www.searchme.com/support/text/..Mozilla/5.0 (compatible; Charlotte/1.1; url)
91 www.searchme.com/support/image/..Mozilla/5.0 (compatible; Charlotte/1.0t; url)
35 www.searchme.com/support/application/x-javascriptMozilla/5.0 (compatible; Charlotte/1.0t; url)
6 www.searchme.com/support/-Mozilla/5.0 (compatible; Charlotte/1.1; url)
380pipl
380 www.pipl.com/bot/text/..Mozilla/5.0(compatible;PiplBot;url)
330baidu
191 www.baidu.com/search/spider.htmtext/..Baiduspider(url)
80 www.baidu.jp/spider/text/..Baiduspider(url)
26 www.baidu.jp/spider/text/..DoCoMo/2.0 P05A(c100;TB;W24H15) (compatible; BaiduMobaider/1.0; url)
14 www.baidu.com/search/spider.htm-Baiduspider(url)
8 www.baidu.jp/spider/-Baiduspider(url)
5 www.baidu.jp/spider/application/xmlDoCoMo/2.0 P05A(c100;TB;W24H15) (compatible; BaiduMobaider/1.0; url)
310teesoft
79 www.teesoft.info/image/..Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
58 www.teesoft.info/text/..Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
48 www.teesoft.info/image/..Mozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
32 www.teesoft.info/text/..Mozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
13 www.teesoft.info/application/x-javascriptMozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
10 www.teesoft.info/image/..Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; [lang code]; rv:[..]) Gecko/.. etc (url)
8 www.teesoft.info/image/..Mozilla/5.0 (Windows; Windows NT 6.1; [lang code]; rv:[..]) Gecko/.. etc (url)
7 www.teesoft.info/application/x-javascriptMozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
7 www.teesoft.info/image/..Mozilla/5.0 (X11; Linux i686; [lang code]; rv:[..]) Gecko/.. etc (url)
6 www.teesoft.info/text/..Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; [lang code]; rv:[..]) Gecko/.. etc (url)
5 www.teesoft.info/text/..Mozilla/5.0 (X11; Linux i686; [lang code]; rv:[..]) Gecko/.. etc (url)
5 www.teesoft.info/text/..Mozilla/5.0 (Windows; Windows NT 6.1; [lang code]; rv:[..]) Gecko/.. etc (url)
4 www.teesoft.info/application/xmlMozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
4 www.teesoft.info/application/jsonMozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
4 www.teesoft.info/application/xmlMozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
3 www.teesoft.info/image/..Mozilla/5.0 (X11; Linux x86_64; [lang code]; rv:[..]) Gecko/.. etc (url)
271cuil
267 www.cuil.com/twiceler/robot.htmltext/..Mozilla/5.0 (Twiceler-0.9 url)
246wikipedia
91 en.wikipedia.org/wiki/Wikipedia:Huggletext/..Huggle/0.9.1 url
49 en.wikipedia.org/wiki/Wikipedia:Huggletext/..Huggle/0.9.0 url
21 en.wikipedia.org/wiki/Wikipedia:Huggletext/..Huggle/0.9.0.0 url
20 en.wikipedia.org/wiki/Wapediaapplication/vnd.php.serializedwapedia.mobi liveupdate (url)
15 en.wikipedia.orgtext/..url
12 zh.wikipedia.org/w/index.php?title=File:DOG.jpg&variant=zh-cntext/..url
11 en.wikipedia.org/wiki/Wikipedia:Huggletext/..Huggle/0.9.2 url
8 zh.wikipedia.org/w/index.php?title=Special:sousuo/zhaitenglongfu&variant=zh-cntext/..url
5 zh.wikipedia.org/w/index.php?title=阿古姆二世&variant=zh-cntext/..url
4 zh.wikipedia.org/w/index.php?title=那一年幸福時光&variant=zh-cntext/..url
4 ko.wikipedia.orgtext/..url
3 ms.wikipedia.orgtext/..url
235yacy
27 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.28-12-generic; java 1.6.0_13; Europe/en) url
24 yacy.net/bot.htmltext/..yacybot (x86 Windows XP 5.1; java 1.6.0_05; Europe/de) url
18 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.24-16-server; java 1.6.0_07; Europe/en) url
15 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.28-12-generic; java 1.6.0_0; Europe/en) url
14 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.24-24-generic; java 1.6.0_07; Europe/en) url
12 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.28-11-generic; java 1.6.0_0; Europe/en) url
10 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.24-23-generic; java 1.6.0_07; Europe/de) url
9 yacy.net/bot.htmltext/..yacybot (x86 Windows Vista 6.1; java 1.6.0_13; Europe/de) url
8 yacy.net/bot.htmltext/..yacybot (x86 Windows Vista 6.1; java 1.6.0_13; America/en) url
8 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.28-11-generic; java 1.6.0_13; Europe/en) url
7 yacy.net/bot.htmltext/..yacybot (i386 Mac OS X 10.5.7; java 1.5.0_16; Europe/de) url
6 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.18-128.1.10.el5PAE; java 1.5.0_18; Europe/en) url
5 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.26-2-amd64; java 1.6.0_12; Europe/en) url
5 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.27.24-170.2.68.fc10.i686; java 1.6.0_0; Europe/de) url
4 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.28-12-generic; java 1.6.0_13; GMT/en) url
4 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.28-12-generic; java 1.6.0_0; Europe/de) url
3 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.28-12-netbook; java 1.6.0_13; Europe/de) url
3 yacy.net/bot.htmltext/..yacybot (x86 Windows 2003 5.2; java 1.6.0_13; Europe/de) url
3 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.24.2dedibox-r8-1-c7; java 1.6.0_0; Europe/fr) url
3 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.9-023stab048.4-smp; java 1.6.0; GMT/en) url
3 yacy.net/bot.htmltext/..yacybot (x86 Windows 2003 5.2; java 1.6.0_13; GMT01:00/de) url
3 yacy.net/bot.htmltext/..yacybot (amd64 Windows Server 2008 6.0; java 1.6.0_14; Europe/de) url
3 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.24-16-server; java 1.6.0_07; Zulu/en) url
3 yacy.net/bot.htmltext/..yacybot (amd64 FreeBSD 7.1-PRERELEASE; java 1.6.0_07; Europe/en) url
3 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.28-11-generic; java 1.6.0_0; Europe/en) url
206dotnetdotcom
206 www.dotnetdotcom.org/text/..Mozilla/5.0 (compatible; DotBot/1.1; url, mail address )
191wikimedia
189 tools.wikimedia.de/~daniel/text/..WikiSense (url)
187soso
176 help.soso.com/webspider.htmtext/..Sosospider(url)
10 help.soso.com/soso-image-spider.htmimage/..Sosoimagespider(url)
160scoutjet
160 www.scoutjet.com/text/..Mozilla/5.0 (compatible; ScoutJet; url)
158sblog
100 fulltext.sblog.cz/screenshot/image/..Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
25 fulltext.sblog.cz/screenshot/text/..Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
24 fulltext.sblog.cz/robot/text/..SeznamBot/2.0 (url)
5 fulltext.sblog.cz/screenshot/application/x-javascriptMozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
158ask
152 about.ask.com/en/docs/about/webmasters.shtmltext/..Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
4 about.ask.com/en/docs/about/webmasters.shtmlimage/..Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
149majestic12
104 www.majestic12.co.uk/bot.php?text/..Mozilla/5.0 (compatible; MJ12bot/v1.2.5; url)
41 www.majestic12.co.uk/bot.php?text/..Mozilla/5.0 (compatible; MJ12bot/v1.2.4; url)
3 www.majestic12.co.uk/bot.php?text/..Mozilla/5.0 (compatible; MJ12bot/v1.2.3; url)
135daum
135 ws.daum.net/aboutWebSearch.htmltext/..Mozilla/5.0 (compatible; MSIE or Firefox mutant; not on Windows server; url) Daumoa/2.0
135youdao
93 www.youdao.com/help/webmaster/spider/text/..Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
14 www.youdao.com/help/webmaster/spider/image/..Mozilla/5.0 (compatible;YodaoBot-Image/1.0;url;)
10 www.youdao.com/help/webmaster/spider/text/..Mozilla/5.0 (compatible; YodaoBot/1.0; url; )
4 www.youdao.com/help/webmaster/spider/-Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
4 www.youdao.com/help/webmaster/spider/-Mozilla/5.0 (compatible;YodaoBot-Image/1.0;url;)
3 www.youdao.com/help/webmaster/spider/audio/midiMozilla/5.0 (compatible; YodaoBot/1.0; url; )
121gigablast
121 www.gigablast.com/spider.htmltext/..Gigabot/3.0 (url)
120sogou
104 www.sogou.com/docs/help/webmasters.htm#07text/..Sogou web spider/4.0(url)
6 www.sogou.com/docs/help/webmasters.htm#07application/xmlSogou web spider/4.0(url)
5 www.sogou.com/docs/help/webmasters.htm#07text/..Sogou web spider/4.0(url) (via Web-Blaster/2.21 (http://www.a-blast.org/web-blast.html))
118mnemoo
118 www.mnemoo.com/about/spidertext/..Mnemoo Intelligent Spider/0.1alpha (compatible; See url)
118php
56 pear.php.net/text/..PEAR HTTP_Request class ( url )
29 pear.php.net/application/vnd.php.serializedPEAR HTTP_Request class ( url )
17 pear.php.net/package/http_request2text/..HTTP_Request2/0.4.0 (url) PHP/5.2.8
9 pear.php.net/application/jsonPEAR HTTP_Request class ( url )
5 pear.php.net/package/http_request2text/..HTTP_Request2/0.4.0 (url) PHP/5.2.9
110cydral
105 www.cydral.comtext/..CydralSpider/3.0 (Cydral Image Search; url)
5 www.cydral.comimage/..CydralSpider/3.0 (Cydral Image Search; url)
103setooz
82 www.setooz.com/oozbot.htmltext/..OOZBOT/0.20 ( -- ; url ; mail address )
21 www.setooz.com/oozbot.htmltext/..OOZBOT/0.20 ( url ; mail address )
96goo
94 help.goo.ne.jp/contact/text/..goo wikipedia (url)
80kosmix
45 www.kosmix.com/crawler.htmlapplication/xmlvoyager/1.0 url
35 www.kosmix.com/html/kosmos.htmltext/..Mozilla/5.0(compatible;Kosmos/1.0;url)
63loc
61 www.loc.gov/minerva/crawl.htmltext/..Mozilla/5.0 (compatible; archive.org_bot/1.6.0 url)
58facebook
37 www.facebook.com/externalhit_uatext.phpimage/..facebookexternalhit/1.0 (url)
11 www.facebook.com/externalhit_uatext.phptext/..facebookexternalhit/1.0 (url)
6 developers.facebook.comtext/..facebookplatform/1.0 (url)
46boardreader
46 spider.boardreader.comtext/..BoardReader Rating Builder/1.0 url
42freebase
42 www.freebase.comtext/..metaweb/Nutch-1.0-dev (url; help_at_metaweb.com)
38entireweb
34 www.entireweb.com/about/search_tech/speedy_spider/text/..Speedy Spider (url)
4 www.entireweb.com/about/search_tech/speedy_spider/-Speedy Spider (url)
36edu
27 iws.seu.edu.cn/services/falcons/contact_us.jsptext/..Mozilla/5.0 (compatible; Falconsbot; url)
7 iws.seu.edu.cn/services/falcons/contact_us.jspimage/..Mozilla/5.0 (compatible; Falconsbot; url)
3580legs
28 www.80legs.com/spider.htmltext/..Mozilla/5.0 (compatible; 80bot/0.71; url;) Gecko/2008032620
7 www.80legs.com/spider.htmlimage/..Mozilla/5.0 (compatible; 80bot/0.71; url;) Gecko/2008032620
30qdos
30 qdos.com/text/..qdos/1.1 (url)
30guruji
21 www.guruji.com/en/WebmasterFAQ.htmltext/..Mozilla/5.0 (compatible; GurujiBot/1.0; url)
6 www.guruji.com/WebmasterFAQ.htmlapplication/xmlGurujiBot/1.0 (url)
28alexa
28 www.alexa.com/site/help/webmasterstext/..ia_archiver (url; mail address )
27froute
21 labs.froute.jp/pc2m/help.htmltext/..Froute Mobile Gateway/1.0 (url)
6 labs.froute.jp/pc2m/help.htmlimage/..Froute Mobile Gateway/1.0 (url)
27snap
27 www.snap.comtext/..Snapbot/1.0 (Snap Shots, url)
26wordpress
21 support.wordpress.com/contact/text/..WordPress.com mShots; url
24emining
24 emining.jp/text/..emBot-GalaBuzz/Nutch-1.0 (url; mail address )
23ellerdale
23 www.ellerdale.com/crawler.htmltext/..Mozilla/5.0 (compatible; EllerdaleBot/ 1.0; url)
23rcdtokyo
15 www.rcdtokyo.com/pc2m/text/..Mozilla/5.0 (compatible; PEAR HTTP_Request class; url)
8 www.rcdtokyo.com/pc2m/image/..Mozilla/5.0 (compatible; PEAR HTTP_Request class; url)
23dium
14 me.dium.comtext/..Mozilla/5.0 (Windows; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0 Me.dium/1.0 (url)
9 me.dium.comimage/..Mozilla/5.0 (Windows; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0 Me.dium/1.0 (url)
23hatena
12 mgw.hatena.ne.jp/helptext/..DoCoMo/2.0 D903i(c100;TB;W28H20) (compatible; Hatena-Mobile-Gateway/1.2; url)
11 a.hatena.ne.jp/helptext/..Hatena Antenna/0.5 (url)
23mixi
12 mixi.jp/image/..mixi-mobile-converter/1.0 (url)
11 mixi.jp/text/..mixi-mobile-converter/1.0 (url)
22newsgator
7 www.newsgator.comtext/..NewsGator/2.0 Bot (url)
7 www.newsgator.comtext/..NewsGatorOnline/2.0 (url) bot
3 www.newsgator.com/application/xmlFeedDemon/2.7 (url; Microsoft Windows)
22greenivory
22 greenivory.frtext/..GreenIvory/Nutch-0.9 (GreenIvory-BlueCrane; url; mail address )
22emusic
14 www.emusic.com/application/jsonMozilla/5.0 (Windows; Windows NT 5.1; en-US; rv:1.8.1.8pre) Gecko/20071017 eMusic DLM/4.0/1.0.0.2 (url) (like Firefox/2.0.0.8)
5 www.emusic.com/application/jsonMozilla/5.0 (Windows; Windows NT 6.0; en-US; rv:1.8.1.8pre) Gecko/20071017 eMusic DLM/4.0/1.0.0.2 (url) (like Firefox/2.0.0.8)
22Anonymouse
12 Anonymouse.org/image/..url (Unix)
8 Anonymouse.org/text/..url (Unix)
21spinn3r
21 spinn3r.com/robottext/..Mozilla/5.0 (X11; Linux i686; en-US; rv:1.2.1; aggregator:Spinn3r (Spinn3r 3.0); url) Gecko/20021130
20weblio
18 www.weblio.jp/text/..Mozilla/5.0 (compatible; WeblioBot; url)
18justsystems
18 www.justsystems.com/jp/tech/crawler/text/..JUST-CRAWLER(url)
18jumptap
15 www.jumptap.com/jumpbottext/..Nokia6820/2.0 (5.88) Profile/MIDP-1.0 Configuration/CLDC-1.0/1.0 (Jumpbot; url; mail address )
3 www.jumptap.com/jumpbottext/..MOT-T720/G_05.07.23R MIB/2.0 Profile/MIDP-1.0 Configuration/CLDC-1.0 UP.Link/1.1/1.0 (Jumpbot; url; mail address )
18mashget
14 www.mashget.comtext/..Mashgetbot/2.1 (url)
4 www.mashget.comapplication/jsonMashGet(url)
16parliament
13 www.parliament.uktext/..uk_parliament (url; mail address )
3 www.parliament.ukimage/..uk_parliament (url; mail address )
16FeedBurner
16 www.FeedBurner.comtext/..FeedBurner/1.0 (url)
16feedparser
16 feedparser.org/application/xmlUniversalFeedParser/4.1 url
12apache
6 lucene.apache.org/nutch/text/..NYU CS Nutch 1.0/Nutch-1.0 (NYU CS Nutch 1.0; url; mail address dot edu)
6 lucene.apache.org/nutch/text/..NYU CS Nutch 1.0/Nutch-0.9 (NYU CS Nutch 1.0; url; mail address dot edu)
12fooooo
12 fooooo.com/bot.htmltext/..Mozilla/4.0 (compatible; Fooooo_Web_Video_Crawl url)
12xrss
12 www.xrss.eu/robottext/..Mozilla/5.0 (compatible; xrss; robot; url; version 2.0)
11linkaider
11 linkaider.com/crawler/text/..LinkAider (url)
11abonti
11 www.abonti.comtext/..Mozilla/5.0 (compatible; Abonti/0.8 - url)
11picsearch
7 www.picsearch.com/bot.htmlimage/..psbot/0.1 (url)
4 www.picsearch.com/bot.htmltext/..psbot/0.1 (url)
11www.
4 www.image/..GoogleBot/2.1 (urlGoogleBot.com/bot.html)
3 www.text/..GoogleBot/2.1 ( urlGoogleBot.com/bot.html)
3 www.text/..GoogleBot/2.1 (urlGoogleBot.com/bot.html)
11holmes
11 holmes.getext/..HolmesBot (url)
10moreover
10 www.moreover.comtext/..Moreoverbot/5.00 (url; mail address )
10heartrails
3 capture.heartrails.com/text/..Mozilla/5.0 (X11; Linux x86_64; en-US; rv:1.8.1.16) Gecko/20080416 HeartRails_Capture/0.6 (url) BonEcho/2.0.0.16
3 capture.heartrails.com/image/..Mozilla/5.0 (X11; Linux x86_64; en-US; rv:1.8.1.16) Gecko/20080416 HeartRails_Capture/0.6 (url) BonEcho/2.0.0.16
10pi
10 pi.comtext/..Mozilla/5.0 (compatible; heritrix/1.14.3 url)
10virtual-presence
10 lms.virtual-presence.orgtext/..Firebat 2.9.1 (url)
10acont
10 hilfe.acont.de/bot.htmltext/..url ACONTBOT
10duckduckgo
7 duckduckgo.com/duckduckbot.htmltext/..DuckDuckBot/1.1; (url)
10topsy
10 labs.topsy.com/butterfly.htmltext/..Mozilla/5.0 (compatible; Butterfly/1.0; url) Gecko/2009032608 Firefox/3.0.8
38,195total

Page requests for probable crawlers, recognized by keyword
Count (x 1000) | Agent string (count ≥ 3) | Mime type
2,293PythonWikipediaBot/1.0
1,633 text/..
591 application/xml
69 application/json
1 -
1 image/..
1,292Answersbot
1,292 text/..
1 -
1,138GoogleBot-Image/1.0
451 image/..
447 text/..
240 -
934Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (Exabot-Thumbnails)
596 text/..
197 image/..
141 application/x-javascript
342Mozilla/5.0 (Windows; Windows NT 5.1; fr; rv:1.8.1) VoilaBot BETA 1.2 ( mail address )
342 text/..
1 -
1 application/ogg
240php wikibot classes
240 application/vnd.php.serialized
1 -
201wikiwix-bot-3.0
199 text/..
1 -
1 image/..
170Crawler2
170 text/..
145Mozilla/5.0 (SnapPreviewBot) Gecko/20061206 Firefox/1.5.0.9
71 image/..
60 text/..
14 application/x-javascript
119UniFind Site Spider; email mail address
119 text/..
1 -
108gsa-crawler (Enterprise; T1-FDM9ASJ5TESAT; mail address )
108 text/..
98AarghBot
98 text/..
98Tawbot (public svn release; plwiki)
98 text/..
77DotNetWikiBot/2.64 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
49 text/..
28 application/xml
61ListasBot 3
61 text/..
56GoogleBot-Image/1.0
55 text/..
1 image/..
1 -
55 mail address (Mozilla compatible)
29 image/..
26 text/..
46DefaultsortBot
46 text/..
45crawler mail address
45 text/..
1 -
39Mozilla/5.0 (Apibot 0.01)
39 application/vnd.php.serialized
38Test Webbot
38 text/..
34SineBot/1.5.13(User:SineBot)
33 application/vnd.php.serialized
1 text/..
33AnomieBOT 1.0 (OrphanReferenceFixer)
33 application/json
29CorenSearchBot/1.4 en libwww-perl/5.808
29 text/..
29COIBot/1.00
29 text/..
25GoogleBot
25 text/..
1 application/x-javascript
1 image/..
25gigabot
22 image/..
2 text/..
1 -
25MSIndianWebcrawl
25 text/..
1 image/..
1 application/ogg
25MSR-ISRCCrawler
18 text/..
6 application/x-javascript
1 image/..
24Bot/WP/EN/E/EBot
24 text/..
21Plagiat Web Spider WSIiZ wsiz.rzeszow.pl
19 text/..
1 image/..
1 application/ogg
18dictionary-bot
17 application/xml
1 text/..
17plantspedia data crawler
17 text/..
17Jyxobot/1
17 text/..
14FAST Enterprise Crawler 6 used by Lenovo ( mail address )
14 text/..
1 -
14topyx-crawler
14 text/..
1 -
12rdfbot/1.0 (Indian Language Web Search Engine; Rediff.com; rdfbot mail address )
12 text/..
12DotNetWikiBot/2.61 (Microsoft Windows NT 5.2.3790 Service Pack 2; )
12 text/..
1 application/xml
12SurakWare MediaWiki Bot/1.0
12 text/..
1 application/xml
10Legobot
10 application/json
10Freebase Deathbot
10 text/..
10Mozilla/5.0 (Bgbot 0.5)
9 text/..
1 application/xml
9GingerCrawler/1.0 (Language Assistant for Dyslexics; www.gingersoftware.com/crawler_agent.htm; support at ginger software dot com)
9 text/..
1 image/..
9YaDirectBot/1.0
9 text/..
9FAST Enterprise Crawler 6 used by MICROLINK ( mail address )
9 text/..
8XLinkBot/1.00
8 text/..
8Bot/WP/EN/Alex_Bakharev/AlexNewArtBot
8 text/..
8Geni ircpybot 1.0
4 text/..
2 application/json
2 application/xml
7Pywikipediabot/2.0
7 application/json
1 text/..
7kindsight/Nutch-1.0 (kscrawler; www.projectrialto.com; mail address )
7 text/..
1 -
7Draicone's bot
7 text/..
6AnomieBOT 1.0 (WikiProjectWorker)
6 application/json
6zoegle/Nutch-1.0 (spider)
6 text/..
1 application/ogg
6FAST Enterprise Crawler 6 used by ss (ss)
6 text/..
6Mozilla/4.0 (compatible; focuseekbot)
6 text/..
5DotNetWikiBot/2.7 (Microsoft Windows NT 6.0.6002 Service Pack 2; )
5 text/..
5volverine/Nutch-0.9 (agentspider; mail address )
5 text/..
1 -
1 image/..
5MLBot (www.metadatalabs.com/mlbot)
5 text/..
1 -
5Codeton Software RSS Bot/1.0
5 text/..
5SineBot/1.5.14(User:SineBot)
5 application/vnd.php.serialized
1 text/..
5DotNetWikiBot/2.7 (Microsoft Windows NT 5.1.2600 Service Pack 2; )
4 text/..
1 application/xml
5DotNetWikiBot/2.53 (Unix 2.6.26.2; )
5 text/..
5TestBot
5 text/..
5Mozilla/5.0 (compatible; Voluniabot/0.0.5; mail address )
5 text/..
5websitethumbnail.de snapshot spider
5 text/..
5Xaldon WebSpider 2.7.b6
5 text/..
1 application/x-javascript
4WebCrawler
4 text/..
4Neat Web Crawler
4 text/..
4Mozilla/5.0 (Windows; Windows NT 5.1; en-GB; rv:1.9.0.6) Gecko/2009011913 Firefox/3.0.6 TweetmemeBot
4 text/..
1 image/..
4QuickFinder Crawler
4 text/..
4Pybot 1.0 mail address
3 text/..
1 application/xml
4DotNetWikiBot/2.3 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
4 text/..
1 application/xml
4AnomieBOT 1.0 (SourceUploader)
4 application/json
4unblockbot/1.00
4 text/..
3AnomieBOT 1.0 (no task)
3 application/json
3rdfbot/1.0 (rdfbot mail address )
3 text/..
1 -
1 image/..
1 application/ogg
3GNAA-bot
3 text/..
3beast/Nutch-0.9 (agentspider; mail address )
3 text/..
1 image/..
1 application/ogg
3Bot/WP/EN/Quadell/polbot
3 text/..
3Mozilla/5.0 (Yahoo-MMCrawler/4.0; mail address )
2 image/..
1 text/..
3WikiNEWSticsBOT (by user Melancholie)
3 text/..
3TKBot 1.0 ( mail address )
3 application/xml
3GinioSpider
3 text/..
3Somebody 1.2 ( mail address - tell me if I am going too fast - no edit rate specified in robots.txt, I can slow down)
3 text/..
8,226total

IP ranges: known IP ranges for Google are 64.233.[160.0-191.255], 66.249.[64.0-95.255], 66.102.[0.0-15.255], 72.14.[192.0-255.255], 74.125.[0.0-255.255], 209.85.[128.0-255.255], 216.239.[32.0-63.255] and a few minor other subranges.
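
The spoofing check can be sketched with Python's standard ipaddress module. The ranges above are rewritten here as CIDR blocks; the "few minor other subranges" are left out because the report does not list them. The function names and the labels "google", "google?" and "other" are assumptions chosen for illustration, not the report's actual implementation.

```python
import ipaddress

# The listed Google ranges expressed as CIDR blocks.
GOOGLE_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "64.233.160.0/19",  # 64.233.160.0 - 64.233.191.255
        "66.249.64.0/19",   # 66.249.64.0  - 66.249.95.255
        "66.102.0.0/20",    # 66.102.0.0   - 66.102.15.255
        "72.14.192.0/18",   # 72.14.192.0  - 72.14.255.255
        "74.125.0.0/16",    # 74.125.0.0   - 74.125.255.255
        "209.85.128.0/17",  # 209.85.128.0 - 209.85.255.255
        "216.239.32.0/19",  # 216.239.32.0 - 216.239.63.255
    )
]


def is_google_ip(address: str) -> bool:
    """Return True if the address falls inside one of the listed ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in GOOGLE_NETWORKS)


def classify_googlebot(user_agent: str, address: str) -> str:
    """Label a request as 'google', 'google?' (possibly spoofed) or 'other'."""
    if "googlebot" not in user_agent.lower():
        return "other"
    return "google" if is_google_ip(address) else "google?"


print(classify_googlebot("Mozilla/5.0 (compatible; GoogleBot/2.1)", "66.249.64.1"))
print(classify_googlebot("Mozilla/5.0 (compatible; GoogleBot/2.1)", "10.0.0.1"))
```

With these inputs the first request is labelled "google" and the second "google?", since 10.0.0.1 lies outside every listed range.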

Generated on Friday August 21, 2009
Author: Erik Zachte
Mail: ezachte@### (no spam: ### = wikimedia.org)
All data and images on this page are in the public domain.