Wikimedia Visitor Log Analysis Report - Crawler requests

Daily averages, based on sample period: 1 Oct 2009 - 31 Oct 2009

 This analysis is based on a 1:1000 sampled server log (squids) ⇒ all counts x 1000.
 
The following overview of crawler (aka bot) page requests is based on the user agent information that accompanies most server requests. Unfortunately this user agent information follows rather loosely defined guidelines.
Also, please bear in mind that the most popular crawler names may be somewhat overrepresented. This is the result of so-called user agent spoofing (where a requester supplies false credentials, e.g. to bypass web server filters).
GoogleBot seems to be a favorite for spoofing. Therefore requests from an IP address registered to Google (see the IP ranges below) are color coded differently from other requests that claim to be GoogleBot (in the first table below these appear as the groups google and google?).

For this report page requests are considered to be issued by a crawler in two cases:
1 The user agent string contains a web address. Only crawlers should have that, but there are some false positives where a browser sends a user agent string with a web address (ill-behaved plug-in; the main offenders have been eliminated).
2 The user agent string contains the term bot, spider or crawl[er].
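
The two rules above can be sketched roughly as follows. This is a minimal illustration, not the script that produced this report: the function name classify_user_agent, the patterns used to recognize a web address (http://, https:// or www.), and the case-insensitive keyword matching are all assumptions made for the example.

```python
import re

# Assumption: a "web address" is approximated by an explicit scheme or a www. prefix.
URL_PATTERN = re.compile(r"https?://|www\.", re.IGNORECASE)
# Rule 2 keywords; "crawl" also covers "crawler". Case-insensitivity is an assumption.
KEYWORD_PATTERN = re.compile(r"bot|spider|crawl", re.IGNORECASE)


def classify_user_agent(user_agent):
    """Return 'url', 'keyword' or None for a raw user agent string."""
    if URL_PATTERN.search(user_agent):
        return "url"        # rule 1: agent string contains a web address
    if KEYWORD_PATTERN.search(user_agent):
        return "keyword"    # rule 2: agent string contains bot, spider or crawl[er]
    return None             # not considered a crawler


if __name__ == "__main__":
    samples = [
        "ExampleBot/1.0 (+http://www.example.org/bot.html)",  # hypothetical agent
        "PythonWikipediaBot/1.0",
        "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) Gecko/2008052906 Firefox/3.0",
    ]
    for agent in samples:
        print(classify_user_agent(agent), "<-", agent)
```

Rule 1 is checked first, so an agent that carries both a web address and a keyword lands in the first table. Plain substring matching will also flag browsers whose agent string happens to contain one of the keywords, which is the kind of false positive mentioned above.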

In total, 48,824,000 page requests per day (mime type text/html only!) are considered crawler requests, out of 272,271,000 external requests, which is 17.9%.
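
The arithmetic behind this figure can be checked with a few lines. The sketch assumes that the daily totals quoted above correspond to 48,824 and 272,271 sampled log lines, each standing for 1000 real requests; the variable names are illustrative only.

```python
SAMPLING_FACTOR = 1000  # 1:1000 sampled log, so each sampled line represents 1000 requests

# Assumption: sampled daily averages, obtained by dividing the quoted totals by 1000.
crawler_sampled = 48_824
external_sampled = 272_271

crawler_requests = crawler_sampled * SAMPLING_FACTOR
external_requests = external_sampled * SAMPLING_FACTOR
crawler_share = 100 * crawler_requests / external_requests

print(f"{crawler_requests:,} of {external_requests:,} requests = {crawler_share:.1f}%")
```

The sampling factor cancels out in the ratio, so the percentage is the same whether it is computed on sampled or on scaled counts.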

Page requests for crawlers that specify a url in the agent string
Count (x 1000) | Secondary domain (~site) name | URL | Mime type | User agent
19,636google
17,047 www.google.com/bot.htmltext/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
1,615 www.google.com/bot.html-Mozilla/5.0 (compatible; GoogleBot/2.1; url)
196 www.google.com/bot.htmlimage/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
129 desktop.google.com/application/xmlMozilla/5.0 (compatible; Google Desktop/5.9.909.30391; url)
115 www.google.com/feedfetcher.html-FeedFetcher-Google; (url)
78 www.google.com/feedfetcher.htmltext/..Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
58 www.google.com/bot.htmltext/..DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
46 code.google.com/appenginetext/..AppEngine-Google; (url; appid nwikiproxy)
39 desktop.google.com/application/xmlMozilla/5.0 (compatible; Google Desktop/5.9.909.8267; url)
37 www.google.com/feedfetcher.htmlapplication/xmlFeedFetcher-Google; (url)
35 www.google.com/bot.htmltext/..SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
29 code.google.com/appenginetext/..AppEngine-Google; (url; appid: nwikiproxy)
28 www.google.com/feedfetcher.htmltext/..FeedFetcher-Google; (url)
21 code.google.com/appengineapplication/xmlAppEngine-Google; (url; appid nwikiproxy)
21 desktop.google.com/image/..Mozilla/5.0 (compatible; Google Desktop/5.9.909.30391; url)
20 desktop.google.com/image/..Mozilla/5.0 (compatible; Google Desktop/5.9.909.8267; url)
18 www.google.com/feedfetcher.htmlapplication/jsonMozilla/5.0 (compatible) FeedFetcher-Google; (url)
17 www.google.com/feedfetcher.htmlapplication/xmlMozilla/5.0 (compatible) FeedFetcher-Google; (url)
14 code.google.com/appengineapplication/xmlAppEngine-Google; (url; appid: nwikiproxy)
7 desktop.google.com/application/xmlMozilla/5.0 (compatible; Google Desktop/5.9.909.2235; url)
6 code.google.com/appenginetext/..AppEngine-Google; (url; appid: mrictx)
5 code.google.com/appenginetext/..AppEngine-Google; (url; appid: img-proxy)
5 code.google.com/appenginetext/..AppEngine-Google; (url; appid img-proxy)
5 desktop.google.com/text/..Mozilla/5.0 (compatible; Google Desktop/5.9.909.8267; url)
4 code.google.com/appenginetext/..AppEngine-Google; (url; appid finchproxy)
4 code.google.com/appenginetext/..AppEngine-Google; (url; appid: finchproxy)
4 www.google.com/coop/cse/creftext/..FeedFetcher-Google-CoOp; (url)
3 code.google.com/appengineimage/..AppEngine-Google; (url; appid: mrictx)
3 docs.google.comimage/..Mozilla/5.0 (compatible; GoogleDocs; url)
3 desktop.google.com/text/..Mozilla/5.0 (compatible; Google Desktop/5.9.909.30391; url)
3 desktop.google.com/image/..Mozilla/5.0 (compatible; Google Desktop/5.9.909.2235; url)
15,502msn
9,880 search.msn.com/msnbot.htmtext/..msnbot/2.0b (url)
4,047 search.msn.com/msnbot.htm-msnbot/2.0b (url)
1,379 search.msn.com/msnbot.htmtext/..msnbot/1.1 (url)
145 search.msn.com/msnbot.htm-msnbot/1.1 (url)
28 search.msn.com/msnbot.htmimage/..msnbot/1.1 (url)
12 search.msn.com/msnbot.htmtext/..msnbot-Products/1.0 (url)
7 search.msn.com/msnbot.htmapplication/oggmsnbot/1.1 (url)
8,860yahoo
5,799 help.yahoo.com/help/us/ysearch/slurptext/..Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
2,326 help.yahoo.com/help/us/ysearch/slurptext/..Mozilla/5.0 (compatible; Yahoo! DE Slurp; url)
217 help.yahoo.com/help/us/ysearch/slurp-Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
147 help.yahoo.com/help/us/ysearch/slurp-Mozilla/5.0 (compatible; Yahoo! DE Slurp; url)
140 help.yahoo.com/help/us/ysearch/slurptext/..Mozilla/5.0 (compatible; Yahoo! Slurp; url)
130 misc.yahoo.com.cn/help.htmltext/..Mozilla/5.0 (compatible; Yahoo! Slurp China; url)
33 developer.yahoo.com/searchmonkey/useragenttext/..Mozilla/5.0 (compatible; Yahoo! SearchMonkey 1.0; url)
14 help.yahoo.co.jp/help/jp/search/indexing/indexing-15.htmltext/..Y!J-BRI/0.0.1 crawler ( url )
13 misc.yahoo.com.cn/help.html-Mozilla/5.0 (compatible; Yahoo! Slurp China; url)
13 help.yahoo.com/help/us/ysearch/slurpimage/..Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
11 help.yahoo.com/help/us/ysearch/slurpapplication/oggMozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
9 help.yahoo.com/help/us/ysearch/crawling/crawling-01.htmltext/..Nokia6682/2.0 (3.01.1) SymbianOS/8.0 Series60/2.6 Profile/MIDP-2.0 configuration/CLDC-1.1 UP.Link/6.3.0.0.0 (compatible;YahooSeeker/M1A1-R2D2; url)
3 help.yahoo.com/help/us/ysearch/slurp-Mozilla/5.0 (compatible; Yahoo! Slurp; url)
1,299google?
1,092 www.google.com/bot.htmltext/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
57 www.google.com/bot.htmltext/..GoogleBot/2.1 (url)
56 www.google.com/bot.htmltext/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
27 www.google.com/bot.htmltext/..SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
18 www.google.com/bot.htmltext/..DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
15 www.google.com/bot.html-Mozilla/5.0 (compatible; GoogleBot/2.1; url)
7 www.google.com/bot.htmltext/..Mozilla/5.0(compatible;GoogleBot/2.1;url)
7 www.google.com/bot.htmlimage/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
6 www.google.com/bot.htmlapplication/xmlMozilla/5.0 (compatible; GoogleBot/2.1; url)
4 www.google.com/bot.htmlapplication/xmlSAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
4 www.google.com/bot.htmlapplication/xmlDoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
1,236exabot
734 www.exabot.com/go/robottext/..Mozilla/5.0 (compatible; Exabot/3.0 (BiggerBetter); url)
485 www.exabot.com/go/robottext/..Mozilla/5.0 (compatible; Exabot/3.0; url)
12 www.exabot.com/go/robot-Mozilla/5.0 (compatible; Exabot/3.0; url)
4 www.exabot.com/go/robottext/..Mozilla/5.0 (compatible; Exabot-Images/3.0; url)
1,101teesoft
328 www.teesoft.info/image/..Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
223 www.teesoft.info/text/..Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
164 www.teesoft.info/image/..Mozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
106 www.teesoft.info/text/..Mozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
41 www.teesoft.info/application/x-javascriptMozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
29 www.teesoft.info/image/..Mozilla/5.0 (X11; Linux i686; [lang code]; rv:[..]) Gecko/.. etc (url)
20 www.teesoft.info/application/x-javascriptMozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
20 www.teesoft.info/image/..Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; [lang code]; rv:[..]) Gecko/.. etc (url)
18 www.teesoft.info/text/..Mozilla/5.0 (X11; Linux i686; [lang code]; rv:[..]) Gecko/.. etc (url)
17 www.teesoft.info/application/jsonMozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
16 www.teesoft.info/image/..Mozilla/5.0 (Windows; Windows NT 6.1; [lang code]; rv:[..]) Gecko/.. etc (url)
14 www.teesoft.info/image/..Mozilla/5.0 (X11; Linux x86_64; [lang code]; rv:[..]) Gecko/.. etc (url)
13 www.teesoft.info/text/..Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; [lang code]; rv:[..]) Gecko/.. etc (url)
11 www.teesoft.info/application/xmlMozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
10 www.teesoft.info/text/..Mozilla/5.0 (Windows; Windows NT 6.1; [lang code]; rv:[..]) Gecko/.. etc (url)
8 www.teesoft.info/application/jsonMozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
8 www.teesoft.info/text/..Mozilla/5.0 (X11; Linux x86_64; [lang code]; rv:[..]) Gecko/.. etc (url)
6 www.teesoft.info/image/..Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; [lang code]; rv:[..]) Gecko/.. etc (url)
6 www.teesoft.info/application/xmlMozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
4 www.teesoft.info/text/..Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; [lang code]; rv:[..]) Gecko/.. etc (url)
4 www.teesoft.info/image/..Mozilla/5.0 (Macintosh; Intel Mac OS X 10.4; [lang code]; rv:[..]) Gecko/.. etc (url)
3 www.teesoft.info/application/x-javascriptMozilla/5.0 (X11; Linux i686; [lang code]; rv:[..]) Gecko/.. etc (url)
3 www.teesoft.info/image/..Mozilla/5.0 (Windows; Windows NT 5.2; [lang code]; rv:[..]) Gecko/.. etc (url)
3 www.teesoft.info/-Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
3 www.teesoft.info/text/..Mozilla/5.0 (Macintosh; Intel Mac OS X 10.4; [lang code]; rv:[..]) Gecko/.. etc (url)
831naver
724 help.naver.com/robots/text/..Yeti/1.0 (NHN Corp.; url)
35 help.naver.com/delete_main.asptext/..Mozilla/4.0 (compatible; NaverBot/1.0; url)
28 help.naver.com/robots/image/..Yeti/1.0 (NHN Corp.; url)
17 help.naver.com/customer_webtxt_02.jsptext/..Mozilla/4.0 (compatible; NaverBot/1.0; url)
15 help.naver.com/robots/image/..Mozilla/5.0 (compatible; Yeti/1.0; NHN Corp.; url)
8 help.naver.com/robots/text/..Mozilla/5.0 (compatible; Yeti/1.0; NHN Corp.; url)
633soso
625 help.soso.com/webspider.htmtext/..Sosospider(url)
4 help.soso.com/soso-image-spider.htmimage/..Sosoimagespider(url)
590pipl
590 www.pipl.com/bot/text/..Mozilla/5.0(compatible;PiplBot;url)
487ask
408 about.ask.com/en/docs/about/webmasters.shtmltext/..Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
69 about.ask.com/en/docs/about/webmasters.shtml-Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
5 about.ask.com/en/docs/about/webmasters.shtmlimage/..Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
4 about.ask.com/en/docs/about/webmasters.shtmltext/..Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url) (via Web-Blaster/2.21 (http://www.assoziations-blaster.de/web-blast.html))
433baidu
305 www.baidu.com/search/spider.htmtext/..Baiduspider(url)
52 www.baidu.jp/spider/text/..Baiduspider(url)
27 www.baidu.jp/spider/text/..DoCoMo/2.0 P05A(c100;TB;W24H15) (compatible; BaiduMobaider/1.0; url)
14 www.baidu.jp/spider/image/..BaiduImagespider(url)
13 www.baidu.com/search/spider.htm-Baiduspider(url)
7 www.baidu.jp/spider/application/xmlDoCoMo/2.0 P05A(c100;TB;W24H15) (compatible; BaiduMobaider/1.0; url)
5 www.baidu.jp/spider/text/..BaiduImagespider(url)
4 www.baidu.com/search/spider.htmtext/..Baiduspider(url) (via babelfish.yahoo.com)
3 www.baidu.jp/spider/-Baiduspider(url)
423cuil
403 www.cuil.com/twiceler/robot.htmltext/..Mozilla/5.0 (Twiceler-0.9 url)
9 www.cuil.comimage/..Mozilla/5.0 (compatible; heritrix/1.14.0 url)
8 www.cuil.comtext/..Mozilla/5.0 (compatible; heritrix/1.14.0 url)
383dotnetdotcom
383 www.dotnetdotcom.org/text/..Mozilla/5.0 (compatible; DotBot/1.1; url, mail address )
348youdao
297 www.youdao.com/help/webmaster/spider/text/..Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
20 www.youdao.com/help/webmaster/spider/text/..Mozilla/5.0 (compatible; YodaoBot/1.0; url; )
17 www.youdao.com/help/webmaster/spider/-Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
3 www.youdao.com/help/webmaster/spider/audio/midiMozilla/5.0 (compatible; YodaoBot/1.0; url; )
3 www.youdao.com/help/webmaster/spider/application/xmlMozilla/5.0 (compatible; YoudaoBot/1.0; url; )
254wikimedia
251 tools.wikimedia.de/~daniel/text/..WikiSense (url)
205yacy
32 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.28-15-generic; java 1.6.0_0; Europe/en) url
28 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.28-15-generic; java 1.6.0_0; Europe/en) url
20 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.28-15-generic; java 1.6.0_16; Europe/en) url
18 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.18-6-amd64; java 1.5.0_14; Europe/de) url
14 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.28-16-generic; java 1.6.0_0; Europe/en) url
10 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.24-8-pve; java 1.6.0_12; Etc/en) url
8 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.28-16-generic; java 1.6.0_0; Europe/en) url
7 yacy.net/bot.htmltext/..yacybot (x86 Windows Vista 6.0; java 1.6.0_07; Europe/de) url
6 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.28-16-generic; java 1.6.0_16; Europe/en) url
5 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.18-128.7.1.el5; java 1.6.0; Europe/de) url
5 yacy.net/bot.htmltext/..yacybot (x86 Windows XP 5.1; java 1.6.0_15; Europe/de) url
4 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.18-164.el5; java 1.6.0; Europe/de) url
4 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.4.21-4.EL; java 1.6.0_16; Europe/de) url
4 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.28-11-generic; java 1.6.0_0; Europe/de) url
3 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.28-15-generic; java 1.6.0_16; Europe/en) url
3 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.28-16-generic; java 1.6.0_16; Europe/en) url
204boardreader
204 spider.boardreader.comtext/..BoardReader Rating Builder/1.0 url
199wikipedia
99 en.wikipedia.org/wiki/Wikipedia:Huggletext/..Huggle/0.9.2 url
26 en.wikipedia.org/wiki/Wapediaapplication/vnd.php.serializedwapedia.mobi liveupdate (url)
23 en.wikipedia.orgtext/..url
22 zh.wikipedia.org/w/index.php?title=苏西特纳河&variant=zh-cntext/..url
21 zh.wikipedia.org/w/index.php?title=quanzimuqun&variant=zh-cntext/..url
185sblog
121 fulltext.sblog.cz/screenshot/image/..Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
32 fulltext.sblog.cz/screenshot/text/..Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
22 fulltext.sblog.cz/robot/text/..SeznamBot/2.0 (url)
6 fulltext.sblog.cz/screenshot/application/x-javascriptMozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
4 fulltext.sblog.cz/robot/-SeznamBot/2.0 (url)
167scoutjet
167 www.scoutjet.com/text/..Mozilla/5.0 (compatible; ScoutJet; url)
162activepeople
162 www.activepeople.nettext/..WordPress/2.8.4; url
156php
55 pear.php.net/application/vnd.php.serializedPEAR HTTP_Request class ( url )
38 pear.php.net/package/http_request2text/..HTTP_Request2/0.4.0 (url) PHP/5.2.10
31 pear.php.net/text/..PEAR HTTP_Request class ( url )
25 pear.php.net/application/jsonPEAR HTTP_Request class ( url )
4 pear.php.net/package/http_request2text/..HTTP_Request2/0.3.0 (url) PHP/5.3.0
128daum
128 ws.daum.net/aboutWebSearch.htmltext/..Mozilla/5.0 (compatible; MSIE or Firefox mutant; not on Windows server; url) Daumoa/2.0
128facebook
78 www.facebook.com/externalhit_uatext.phpimage/..facebookexternalhit/1.0 (url)
41 www.facebook.com/externalhit_uatext.phptext/..facebookexternalhit/1.0 (url)
5 developers.facebook.comtext/..facebookplatform/1.0 (url)
4 developers.facebook.comimage/..facebookplatform/1.0 (url)
113mnemoo
113 www.mnemoo.com/about/spidertext/..Mnemoo Intelligent Spider/0.1alpha (compatible; See url)
106sogou
93 www.sogou.com/docs/help/webmasters.htm#07text/..Sogou web spider/4.0(url)
5 www.sogou.com/docs/help/webmasters.htm#07application/xmlSogou web spider/4.0(url)
4 www.sogou.com/docs/help/webmasters.htm#07text/..Sogouwebrobot(url)
3 www.sogou.com/docs/help/webmasters.htm#07image/..Sogou Pic Spider/3.0(url)
100goo
94 help.goo.ne.jp/contact/text/..goo wikipedia (url)
4 help.goo.ne.jp/door/crawler.htmltext/..ichiro/3.0 (url)
95majestic12
64 www.majestic12.co.uk/bot.php?text/..Mozilla/5.0 (compatible; MJ12bot/v1.2.5; url)
13 www.majestic12.co.uk/bot.php?text/..Mozilla/5.0 (compatible; MJ12bot/v1.3.1; url)
12 www.majestic12.co.uk/bot.php?text/..Mozilla/5.0 (compatible; MJ12bot/v1.3.0; url)
3 www.majestic12.co.uk/bot.php?text/..Mozilla/5.0 (compatible; MJ12bot/v1.2.4; url)
90gigablast
90 www.gigablast.com/spider.htmltext/..Gigabot/3.0 (url)
89emining
89 emining.jp/text/..emBot-GalaBuzz/Nutch-1.0 (url; mail address )
87yanga
71 www.yanga.co.uk/text/..Yanga WorldSearch Bot v1.1/beta (url)
16 www.yanga.co.uk/image/..Yanga WorldSearch Bot v1.1/beta (url)
64entireweb
55 www.entireweb.com/about/search_tech/speedy_spider/text/..Speedy Spider (url)
9 www.entireweb.com/about/search_tech/speedy_spider/-Speedy Spider (url)
57 80legs
50 www.80legs.com/spider.htmltext/..Mozilla/5.0 (compatible; 008/0.83; url;) Gecko/2008032620
7 www.80legs.com/spider.htmlimage/..Mozilla/5.0 (compatible; 008/0.83; url;) Gecko/2008032620
55kinolexikon
55 www.kinolexikon.comtext/..WordPress/2.8.4; url
51xrss
51 www.xrss.eu/robottext/..Mozilla/5.0 (compatible; xrss; robot; url; version 2.0)
50snap
50 www.snap.comtext/..Snapbot/1.0 (Snap Shots, url)
48spinn3r
44 spinn3r.com/robottext/..Mozilla/5.0 (X11; Linux i686; en-US; rv:1.2.1; aggregator:Spinn3r (Spinn3r 3.1); url) Gecko/20021130
3 spinn3r.com/robot-Mozilla/5.0 (X11; Linux i686; en-US; rv:1.2.1; aggregator:Spinn3r (Spinn3r 3.1); url) Gecko/20021130
48telehouse
48 telehouse.ru/crawler.htmltext/..Mozilla/5.0 (compatible; Dolphin/1.0; url)
44textdigger
43 textdigger.comtext/..Mozilla/5.0 (url) Gecko/20061208 Firefox/2.0.0.1
43wordpress
33 support.wordpress.com/contact/text/..WordPress.com mShots; url
41aport
41 www.aport.ru/helptext/..Mozilla/5.0 (compatible; AportWorm/3.2; url)
39freebase
39 www.freebase.comtext/..metaweb/Nutch-1.0-dev (url; help_at_metaweb.com)
38diveintopython
38 diveintopython.org/http_web_services/text/..OpenAnything/1.6 url
35traslated
35 mymemory.traslated.net/doc/text/..Mozilla/5.0 (MyMemory Bot url)
35www.
14 www.text/..GoogleBot/2.1 ( urlGoogleBot.com/bot.html)
12 www.text/..GoogleBot-Image/1.0 ( urlGoogleBot.com/bot.html)
5 www.text/..GoogleBot/2.1 (urlGoogleBot.com/bot.html)
3 www.image/..GoogleBot/2.1 (urlGoogleBot.com/bot.html)
32junglekey
32 www.junglekey.fr/text/..JungleKeyBot/1.1 (url)
31webzdarma
30 praso.webzdarma.cztext/..Mozilla/5.0 (compatible; heritrix/1.12.1 url)
29qdos
29 qdos.com/text/..qdos/1.1 (url)
27discoveryengine
25 discoveryengine.com/discobot.htmltext/..Mozilla/5.0 (compatible; discobot/1.1; url)
26simplepie
10 simplepie.orgapplication/xmlSimplePie/1.2 (Feed Parser; url; Allow like Gecko) Build/20090627192103
5 simplepie.orgtext/..SimplePie/1.1.1 (Feed Parser; url; Allow like Gecko) Build/20080315205903
5 simplepie.orgapplication/xmlSimplePie/1.1.1 (Feed Parser; url; Allow like Gecko) Build/20080315205903
25froute
19 labs.froute.jp/pc2m/help.htmltext/..Froute Mobile Gateway/1.0 (url)
6 labs.froute.jp/pc2m/help.htmlimage/..Froute Mobile Gateway/1.0 (url)
25dium
24 me.dium.comtext/..Mozilla/5.0 (Windows; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0 Me.dium/1.0 (url)
23rcdtokyo
17 www.rcdtokyo.com/pc2m/text/..Mozilla/5.0 (compatible; PEAR HTTP_Request class; url)
6 www.rcdtokyo.com/pc2m/image/..Mozilla/5.0 (compatible; PEAR HTTP_Request class; url)
21setooz
21 www.setooz.com/oozbot.htmltext/..OOZBOT/0.20 ( url ; mail address )
21Anonymouse
12 Anonymouse.org/image/..url (Unix)
8 Anonymouse.org/text/..url (Unix)
20
20 text/..Mozilla/5.0 (Windows; Windows NT 5.1; en-US; rv:1.9) Gecko/2008052906 Firefox/3.0/1.0 (bot; url)
20newsgator
4 www.newsgator.comtext/..NewsGatorOnline/2.0 (url) bot
4 www.newsgator.com/Individuals/NetNewsWire/-NetNewsWire/3.2 (Mac OS X; url)
3 www.newsgator.comtext/..NewsGator/2.0 Bot (url)
20princeton
20 www.cs.princeton.edu/cass/text/..nu_tch-princeton/Nu_tch-1.0-dev (princeton crawler for cass project; url; zhewang a_t cs ddot princeton dot edu)
20kosmix
17 www.kosmix.com/crawler.htmlapplication/xmlvoyager/1.0 url
3 www.kosmix.com/html/kosmos.htmltext/..Mozilla/5.0(compatible;Kosmos/1.0;url)
20heartrails
13 capture.heartrails.com/text/..Mozilla/5.0 (X11; Linux i686 (x86_64); en-US; rv:1.8.1.20) Gecko/20090429 HeartRails_Capture/0.6 (url) BonEcho/2.0.0.20
4 capture.heartrails.com/application/x-javascriptMozilla/5.0 (X11; Linux i686 (x86_64); en-US; rv:1.8.1.20) Gecko/20090429 HeartRails_Capture/0.6 (url) BonEcho/2.0.0.20
3 capture.heartrails.com/image/..Mozilla/5.0 (X11; Linux i686 (x86_64); en-US; rv:1.8.1.20) Gecko/20090429 HeartRails_Capture/0.6 (url) BonEcho/2.0.0.20
20mixi
11 mixi.jp/text/..mixi-mobile-converter/1.0 (url)
9 mixi.jp/image/..mixi-mobile-converter/1.0 (url)
19alexa
19 www.alexa.com/site/help/webmasterstext/..ia_archiver (url; mail address )
19holmes
19 holmes.getext/..HolmesBot (url)
18iiit
18 research.iiit.nettext/..Image Retrieval/Nutch-0.9 (Image Retrieval System; url; mail address )
17isara
13 www.isara.orgtext/..Isara/Isara-1.0 (Non-profit search engine that benefits charity.; url; mail address )
4 www.isara.orgtext/..Isara/Isara-1.0 (A non-profit search engine for the benefit of charity.; url; mail address )
17asterpix
17 www.asterpix.com/text/..Mozilla/5.0 (compatible; Asterbot; url)
17moose
17 www.moose.at/text/..Mozilla/5.0 (compatible; MooseBot/1.1; Linux i686; de; url)
17hatena
9 a.hatena.ne.jp/helptext/..Hatena Antenna/0.5 (url)
7 mgw.hatena.ne.jp/helptext/..DoCoMo/2.0 D903i(c100;TB;W28H20) (compatible; Hatena-Mobile-Gateway/1.2; url)
16tourist-information-berlin
16 www.tourist-information-berlin.comtext/..WordPress/2.8.4; url
16kalooga
9 www.kalooga.com/info.html?page=crawlertext/..Mozilla/5.0 (compatible; KaloogaBot; url)
7 www.kalooga.com/info.html?page=crawlerimage/..Mozilla/5.0 (compatible; KaloogaBot; url)
15uniqs
14 uniqs.infotext/..WordPress/2.8.4; url
14shopwiki
14 www.shopwiki.com/wiki/Help:Bottext/..ShopWiki/1.0 ( url)
14jumptap
12 www.jumptap.com/jumpbottext/..Nokia6820/2.0 (5.88) Profile/MIDP-1.0 Configuration/CLDC-1.0/1.0 (Jumpbot; url; mail address )
13phonifier
11 www.phonifier.comtext/..Mozilla/5.0 (compatible; Phonifier; url)
13bloglines
7 www.bloglines.com-Bloglines/3.1 (url; 1 subscriber)
3 www.bloglines.comtext/..Bloglines/3.1 (url; 1 subscriber)
12emusic
9 www.emusic.com/text/..Mozilla/5.0 (Windows; Windows NT 5.1; en-US; rv:1.8.1.8pre) Gecko/20071017 eMusic DLM/4.0/1.0.0.2 (url) (like Firefox/2.0.0.8)
12fairshare
10 fairshare.cctext/..Mozilla crawl/5.0 (compatible; fairshare.cc url)
12aafter
11 aafter.com/crawler.htmtext/..AAfter.com Crawler/AAfter-1.0 (This bot is very focused, well-behaved, and wants to do good to internet community. For any questions, please call collect 1 214-714-2224. Team AAfter, Dallas, TX, USA; url; crawler at aafter.com)
12moreover
12 www.moreover.comtext/..Moreoverbot/5.00 (url; mail address )
12FeedBurner
12 www.FeedBurner.comtext/..FeedBurner/1.0 (url)
12picsearch
9 www.picsearch.com/bot.htmltext/..psbot/0.1 (url)
3 www.picsearch.com/bot.htmlimage/..psbot/0.1 (url)
12babaloo
12 www.babaloo.sitext/..BabalooSpider/1.3 (BabalooSpider; url; mail address )
11att
11 tibesti.research.att.com/research-crawler.htmltext/..Mozilla/5.0 (compatible; heritrix/2.0.1 url)
11mashget
7 www.mashget.comtext/..Mashgetbot/2.1 (url)
4 www.mashget.comapplication/jsonMashGetBot1.0(url)
11delfi
6 search.delfi.lt/?c=crawlertext/..Dolphin/1.4 (url)
5 otsing.delfi.ee/?c=crawlertext/..Dolphin/1.4 (url)
11search2
11 search2.nettext/..S2Bot/1.0 (url; mail address )
10lenky
10 www.lenky.frtext/..LenkyBot/1.1 (url)
10topsy
10 labs.topsy.com/butterfly.htmltext/..Mozilla/5.0 (compatible; Butterfly/1.0; url) Gecko/2009032608 Firefox/3.0.8
55,640total

Page requests for probable crawlers, recognized by keyword
Count (x 1000) | Agent string / Mime type (count ≥ 3)
2,364PythonWikipediaBot/1.0
985 application/json
751 text/..
628 application/xml
1 -
1 image/..
1,301GoogleBot-Image/1.0
538 text/..
463 image/..
297 -
3 application/pdf
475Mozilla/5.0 (Windows; Windows NT 5.1; fr; rv:1.8.1) VoilaBot BETA 1.2 ( mail address )
475 text/..
1 -
1 image/..
1 application/ogg
371Answersbot
371 text/..
260Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (Exabot-Thumbnails)
142 text/..
84 image/..
34 application/x-javascript
1 -
215gsa-crawler (Enterprise; S5-MS8QQPJ5BGWAA; mail address )
215 text/..
204wikiwix-bot-3.0
170 text/..
34 image/..
1 -
169php wikibot classes
169 application/vnd.php.serialized
1 text/..
113rdfbot/1.0 (Indian Language Web Search Engine; Rediff.com; rdfbot mail address )
113 text/..
1 image/..
97Mozilla/5.0 (SnapPreviewBot) Gecko/20061206 Firefox/1.5.0.9
53 image/..
35 text/..
9 application/x-javascript
1 application/json
89Tawbot (public svn release; plwiki)
89 text/..
85GoogleBot-Image/1.0
85 text/..
1 -
1 image/..
66crawler mail address
66 text/..
46Test Webbot
46 text/..
45COIBot/1.00
45 text/..
42SineBot/1.5.15(User:SineBot)
41 application/vnd.php.serialized
1 text/..
37web18bot
37 text/..
34DotNetWikiBot/2.71 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
26 text/..
8 application/xml
33JavaCrawler/1.1
33 text/..
1 -
1 image/..
1 application/ogg
30CorenSearchBot/1.4 en libwww-perl/5.808
30 text/..
29Nokia3100/1.0 (compatible; WukongBot)
29 text/..
1 -
28plantspedia data crawler
28 text/..
23GoogleBot
23 text/..
1 application/x-javascript
1 image/..
23MLBot (www.metadatalabs.com/mlbot)
23 text/..
1 image/..
22rdfbot/1.0 (rdfbot mail address )
22 text/..
1 image/..
21dictionary-bot
18 application/xml
3 text/..
21AarghBot Linux
21 text/..
20gsa-crawler (Enterprise; T1-EW7TFYE5SGSAS; mail address )
20 text/..
1 application/ogg
20DotNetWikiBot/2.72 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
20 text/..
18MSR-ISRCCrawler
16 text/..
2 application/x-javascript
16AnomieBOT 1.0 (OrphanReferenceFixer)
16 application/json
15 mail address (Mozilla compatible)
14 text/..
1 image/..
15Jyxobot/1
15 text/..
14Mozilla/5.0 (Apibot 0.01)
14 application/vnd.php.serialized
14topyx-crawler
14 text/..
1 -
13DotNetWikiBot/2.72 (Microsoft Windows NT 5.1.2600 Service Pack 2; )
13 text/..
1 -
1 application/xml
11SurakWare MediaWiki Bot/1.0
11 text/..
1 application/xml
11Szukacz/1.5 (robot; www.szukacz.pl/html/jak_dziala_robot.html; mail address )
11 text/..
1 image/..
10Bot/WP/EN/Daniel/MediationBot1/1.2
10 text/..
10Geni ircpybot 1.0
6 text/..
2 application/json
2 application/xml
10spider
10 text/..
1 application/xml
10Mozilla/5.0 (Bgbot 0.5)
10 text/..
9Codeton Software RSS Bot/1.0
9 text/..
9AnomieBOT 1.0 (WikiProjectTagger)
9 application/json
8Pywikipediabot/2.0
8 application/json
8atuahene-robot/0.1
7 text/..
1 image/..
1 application/x-javascript
1 application/xml
1 application/opensearchdescription+xml
8Szukacz/1.5 (robot; www.szukacz.pl/jakdzialarobot.html; mail address )
8 text/..
8GNAA-bot
8 text/..
8Bot/WP/EN/Alex_Bakharev/AlexNewArtBot
8 text/..
8Mozilla/5.0 (Yahoo-MMCrawler/4.0; mail address )
7 text/..
1 image/..
7 4am-spider/1.0
7 text/..
7FAST Enterprise Crawler 6 used by Lenovo ( mail address )
7 text/..
1 -
7testcrawler
7 text/..
7Mozilla/5.0 (compatible; Crawling; mail address )
7 text/..
7OpenLink Virtuoso RDF crawler
7 text/..
1 application/xml
1 image/..
7DotNetWikiBot/2.71 (Microsoft Windows NT 5.2.3790 Service Pack 2; )
7 text/..
7Mozilla/5.0 compatible (Solsoft Crawler)
7 text/..
1 image/..
6DotNetWikiBot/2.7 (Microsoft Windows NT 6.1.7600.0; )
4 text/..
2 application/xml
6XLinkBot/1.00
6 text/..
6DotNetWikiBot/2.53 (Unix 2.6.26.2; )
6 text/..
6Draicone's bot
6 text/..
6CheMoBot/1.00
6 text/..
5beast/Nutch-1.0 (agentspider; mail address )
5 text/..
1 image/..
5YaDirectBot/1.0
5 text/..
4bitlybot
4 text/..
1 image/..
4infraEnterprise v8 Web Crawler
4 text/..
4ealbum/Nutch-1.0 (ealbum crawler; www.ealbum.com; mail address )
4 text/..
1 image/..
4FAST Enterprise Crawler 6 used by Wanadoo ( mail address )
3 application/x-javascript
1 text/..
1 -
4Freebase Deathbot
4 text/..
4SONIVIS MediaWiki API Bot 0.1.3
4 text/..
1 -
4AnomieBOT 1.0 (SourceUploader)
4 application/json
4AOL Reference Center Bot/1.0
4 text/..
4Xaldon WebSpider 2.7.b6
4 text/..
1 application/x-javascript
3SitiosEnMexico/Spider-1.0
3 text/..
3AnomieBOT 1.0 (WikiProjectWorker)
3 application/json
1 text/..
3Jabse.com Crawler v.2.0 www.jabse.com/crawler.php
3 text/..
3Perfectbot
3 text/..
3Abot
3 text/..
3AnomieBOT 1.0 (AFDMergeFromCleaner)
3 application/json
3unblockbot/1.00
3 text/..
6,664total

IP ranges: known IP ranges for Google are 64.233.[160.0-191.255], 66.249.[64.0-95.255], 66.102.[0.0-15.255], 72.14.[192.0-255.255], 74.125.[0.0-255.255], 209.85.[128.0-255.255], 216.239.[32.0-63.255] and a few minor other subranges.
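
As an illustration of how these ranges can separate genuine from possibly spoofed GoogleBot requests, here is a minimal sketch. It is not the report's own code: the function names are hypothetical, the labels google and google? are borrowed from the first table, the "few minor other subranges" are left out, and only IPv4 addresses are handled.

```python
import ipaddress

# The seven ranges listed above as (first address, last address).
# 209.085 is written as 209.85 here; the minor subranges are not included.
GOOGLE_RANGES = [
    ("64.233.160.0", "64.233.191.255"),
    ("66.249.64.0", "66.249.95.255"),
    ("66.102.0.0", "66.102.15.255"),
    ("72.14.192.0", "72.14.255.255"),
    ("74.125.0.0", "74.125.255.255"),
    ("209.85.128.0", "209.85.255.255"),
    ("216.239.32.0", "216.239.63.255"),
]
_PARSED_RANGES = [
    (ipaddress.IPv4Address(first), ipaddress.IPv4Address(last))
    for first, last in GOOGLE_RANGES
]


def is_google_ip(address):
    """True if the IPv4 address falls inside one of the listed Google ranges."""
    ip = ipaddress.IPv4Address(address)
    return any(first <= ip <= last for first, last in _PARSED_RANGES)


def label_googlebot(user_agent, address):
    """Return 'google', 'google?' or None (agent does not claim to be GoogleBot)."""
    if "googlebot" not in user_agent.lower():
        return None
    return "google" if is_google_ip(address) else "google?"


if __name__ == "__main__":
    agent = "Mozilla/5.0 (compatible; GoogleBot/2.1; url)"
    print(label_googlebot(agent, "66.249.70.5"))  # address inside a listed range
    print(label_googlebot(agent, "192.0.2.1"))    # documentation address, outside all ranges
```

Comparing against explicit first and last addresses keeps the list identical to the ranges as written above, without converting them to CIDR notation.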

Generated on Monday December 14, 2009
Author: Erik Zachte
Mail: ezachte@### (no spam: ### = wikimedia.org)
All data and images on this page are in the public domain.