Wikimedia Visitor Log Analysis Report - Crawler requests

Daily averages, based on sample period: 1 Sep 2009 - 30 Sep 2009

 This analysis is based on a 1:1000 sampled server log (squids) ⇒ all counts x 1000.
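
As a rough illustration of this scaling, a daily average could be recovered from the sampled log along the following lines. This is only a sketch: the function name and the input figure are hypothetical, and only the 1:1000 sampling rate and the 30-day sample period come from this report.

```python
SAMPLING_FACTOR = 1000  # 1:1000 sampled log, as stated in the report
DAYS_IN_PERIOD = 30     # 1 Sep 2009 - 30 Sep 2009


def estimated_daily_requests(sampled_lines: int) -> float:
    """Scale a count of sampled log lines up to an estimated daily average."""
    return sampled_lines * SAMPLING_FACTOR / DAYS_IN_PERIOD


# Hypothetical input: 3,000 matching lines found in the sampled log
print(estimated_daily_requests(3_000))
```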
 
The following overview of crawler (aka bot) page requests is based on the user agent information that accompanies most server requests. Unfortunately this user agent information follows rather loosely defined guidelines.
Also please bear in mind that the most popular crawler names may be somewhat overrepresented. This is the result of so-called user agent spoofing (where a requester supplies false credentials, e.g. to bypass web server filters).
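GoogleBot seems to be a favorite for spoofing. Therefore requests that claim to be GoogleBot are split in two: those from an IP address registered by Google (see below) and all others. The original page color-codes the two; in the first table below they appear as the groups "google" and "google?".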

For this report page requests are considered to be issued by a crawler in two cases:
1 The user agent string contains a web address (only crawlers should have that, but there are some false positives, where a browser sends a user agent string with a web address, e.g. because of an ill-behaved plug-in; the main offenders have been eliminated)
2 The user agent string contains the term bot, spider or crawl[er]
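
The two rules above can be sketched as follows. The regular expressions are assumptions made for illustration; the report does not publish its exact matching patterns, and the function name is hypothetical.

```python
import re
from typing import Optional

# Assumed patterns: the report does not state how it detects a web address
# or whether keyword matching is case-insensitive.
WEB_ADDRESS = re.compile(r"https?://|www\.", re.IGNORECASE)
CRAWLER_KEYWORD = re.compile(r"bot|spider|crawl", re.IGNORECASE)


def crawler_rule(user_agent: str) -> Optional[str]:
    """Return which rule flags the agent as a crawler, or None if neither does."""
    if WEB_ADDRESS.search(user_agent):
        return "url"      # rule 1: agent string contains a web address
    if CRAWLER_KEYWORD.search(user_agent):
        return "keyword"  # rule 2: agent string contains bot, spider or crawl[er]
    return None


print(crawler_rule("msnbot/2.0b (+http://search.msn.com/msnbot.htm)"))
print(crawler_rule("PythonWikipediaBot/1.0"))
print(crawler_rule("Mozilla/5.0 (Windows; U; Windows NT 5.1) Gecko/2009 Firefox/3.5"))
```

Checking rule 1 first mirrors the two tables in this report, where agents with a url are listed separately from agents recognized by keyword only.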

In total 39,255,000 page requests (mime type text/html only!) per day are considered crawler requests, out of 260,603,000 external requests, which is 15.1%
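
The percentage can be checked directly from the two totals quoted above; the variable names are only illustrative.

```python
crawler_requests = 39_255_000    # daily crawler page requests (text/html only)
external_requests = 260_603_000  # daily external requests

share = 100 * crawler_requests / external_requests
print(f"{share:.1f}%")  # 15.1%
```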

Page requests for crawlers that specify a url in the agent string
Group rows: count (x 1000), secondary domain (~site) name
Detail rows: count (x 1000), URL, mime type, user agent
17,541google
15,646 www.google.com/bot.htmltext/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
1,116 www.google.com/bot.html-Mozilla/5.0 (compatible; GoogleBot/2.1; url)
182 www.google.com/bot.htmlimage/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
117 www.google.com/feedfetcher.html-FeedFetcher-Google; (url)
85 www.google.com/feedfetcher.htmltext/..Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
74 www.google.com/bot.htmltext/..DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
54 code.google.com/appenginetext/..AppEngine-Google; (url; appid nwikiproxy)
43 desktop.google.com/application/xmlMozilla/5.0 (compatible; Google Desktop/5.9.909.8267; url)
40 www.google.com/feedfetcher.htmlapplication/xmlFeedFetcher-Google; (url)
30 www.google.com/feedfetcher.htmltext/..FeedFetcher-Google; (url)
26 code.google.com/appengineapplication/xmlAppEngine-Google; (url; appid nwikiproxy)
24 desktop.google.com/application/xmlMozilla/5.0 (compatible; Google Desktop/5.9.909.2235; url)
22 www.google.com/feedfetcher.htmlapplication/xmlMozilla/5.0 (compatible) FeedFetcher-Google; (url)
18 www.google.com/feedfetcher.htmlapplication/jsonMozilla/5.0 (compatible) FeedFetcher-Google; (url)
14 code.google.com/appengineapplication/jsonPython-urllib/1.17 AppEngine-Google; (url; appid vittyo-site)
8 www.google.com/coop/cse/creftext/..FeedFetcher-Google-CoOp; (url)
6 desktop.google.com/image/..Mozilla/5.0 (compatible; Google Desktop/5.9.909.8267; url)
3 code.google.com/appenginetext/..AppEngine-Google; (url; appid finchproxy)
3 desktop.google.com/image/..Mozilla/5.0 (compatible; Google Desktop/5.9.909.2235; url)
3 www.google.com/bot.htmlapplication/xmlMozilla/5.0 (compatible; GoogleBot/2.1; url)
7,711yahoo
5,850 help.yahoo.com/help/us/ysearch/slurptext/..Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
1,298 help.yahoo.com/help/us/ysearch/slurptext/..Mozilla/5.0 (compatible; Yahoo! DE Slurp; url)
164 help.yahoo.com/help/us/ysearch/slurp-Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
102 help.yahoo.com/help/us/ysearch/slurp-Mozilla/5.0 (compatible; Yahoo! DE Slurp; url)
83 misc.yahoo.com.cn/help.htmltext/..Mozilla/5.0 (compatible; Yahoo! Slurp China; url)
81 help.yahoo.com/help/us/ysearch/slurptext/..Mozilla/5.0 (compatible; Yahoo! Slurp; url)
40 help.yahoo.co.jp/help/jp/search/indexing/indexing-15.htmlimage/..'Mozilla/5.0 (compatible; Y!J SearchMonkey/1.0 (Y!J-AGENT; url))'
40 developer.yahoo.com/searchmonkey/useragenttext/..Mozilla/5.0 (compatible; Yahoo! SearchMonkey 1.0; url)
15 help.yahoo.com/help/us/ysearch/slurpimage/..Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
11 help.yahoo.co.jp/help/jp/search/indexing/indexing-15.htmltext/..Y!J-BRI/0.0.1 crawler ( url )
9 help.yahoo.com/help/us/ysearch/slurpapplication/oggMozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
4 misc.yahoo.com.cn/help.html-Mozilla/5.0 (compatible; Yahoo! Slurp China; url)
3 help.yahoo.com/help/us/ysearch/slurpapplication/x-javascriptMozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
3 help.yahoo.comtext/..Mozilla/5.0 (YahooYSMcm/3.0.0; url)
5,530msn
3,098 search.msn.com/msnbot.htmtext/..msnbot/2.0b (url)
1,286 search.msn.com/msnbot.htmtext/..msnbot/1.1 (url)
909 search.msn.com/msnbot.htm-msnbot/2.0b (url)
159 search.msn.com/msnbot.htm-msnbot/1.1 (url)
32 search.msn.com/msnbot.htmimage/..msnbot/1.1 (url)
20 search.msn.com/msnbot.htmtext/..librabot/2.0/1.0 (url)
8 search.msn.com/msnbot.htmapplication/oggmsnbot/1.1 (url)
7 search.msn.com/msnbot.htmimage/..msnbot/2.0b (url)
7 search.msn.com/msnbot.htmtext/..msnbot-Products/1.0 (url)
3,696teesoft
1,012 www.teesoft.info/image/..Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
687 www.teesoft.info/text/..Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
567 www.teesoft.info/image/..Mozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
371 www.teesoft.info/text/..Mozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
125 www.teesoft.info/application/x-javascriptMozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
98 www.teesoft.info/image/..Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; [lang code]; rv:[..]) Gecko/.. etc (url)
96 www.teesoft.info/image/..Mozilla/5.0 (Windows; Windows NT 6.1; [lang code]; rv:[..]) Gecko/.. etc (url)
86 www.teesoft.info/image/..Mozilla/5.0 (X11; Linux i686; [lang code]; rv:[..]) Gecko/.. etc (url)
65 www.teesoft.info/application/x-javascriptMozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
59 www.teesoft.info/text/..Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; [lang code]; rv:[..]) Gecko/.. etc (url)
57 www.teesoft.info/text/..Mozilla/5.0 (Windows; Windows NT 6.1; [lang code]; rv:[..]) Gecko/.. etc (url)
51 www.teesoft.info/text/..Mozilla/5.0 (X11; Linux i686; [lang code]; rv:[..]) Gecko/.. etc (url)
50 www.teesoft.info/application/jsonMozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
44 www.teesoft.info/application/xmlMozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
34 www.teesoft.info/image/..Mozilla/5.0 (X11; Linux x86_64; [lang code]; rv:[..]) Gecko/.. etc (url)
30 www.teesoft.info/application/xmlMozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
28 www.teesoft.info/application/jsonMozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
25 www.teesoft.info/image/..Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; [lang code]; rv:[..]) Gecko/.. etc (url)
21 www.teesoft.info/text/..Mozilla/5.0 (X11; Linux x86_64; [lang code]; rv:[..]) Gecko/.. etc (url)
17 www.teesoft.info/image/..Mozilla/5.0 (Macintosh; Intel Mac OS X 10.4; [lang code]; rv:[..]) Gecko/.. etc (url)
15 www.teesoft.info/text/..Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; [lang code]; rv:[..]) Gecko/.. etc (url)
11 www.teesoft.info/image/..Mozilla/5.0 (Windows; Windows NT 5.2; [lang code]; rv:[..]) Gecko/.. etc (url)
11 www.teesoft.info/application/x-javascriptMozilla/5.0 (Windows; Windows NT 6.1; [lang code]; rv:[..]) Gecko/.. etc (url)
11 www.teesoft.info/-Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
10 www.teesoft.info/application/x-javascriptMozilla/5.0 (Macintosh; Intel Mac OS X 10.5; [lang code]; rv:[..]) Gecko/.. etc (url)
9 www.teesoft.info/application/x-javascriptMozilla/5.0 (X11; Linux i686; [lang code]; rv:[..]) Gecko/.. etc (url)
9 www.teesoft.info/text/..Mozilla/5.0 (Macintosh; Intel Mac OS X 10.4; [lang code]; rv:[..]) Gecko/.. etc (url)
8 www.teesoft.info/image/..Mozilla/5.0 (Windows; Windows NT 5.0; [lang code]; rv:[..]) Gecko/.. etc (url)
7 www.teesoft.info/text/..Mozilla/5.0 (Windows; Windows NT 5.2; [lang code]; rv:[..]) Gecko/.. etc (url)
6 www.teesoft.info/application/xmlMozilla/5.0 (Macintosh; Intel Mac OS X 10.5; [lang code]; rv:[..]) Gecko/.. etc (url)
6 www.teesoft.info/text/..Mozilla/5.0 (Windows; Windows NT 5.0; [lang code]; rv:[..]) Gecko/.. etc (url)
5 www.teesoft.info/application/x-javascriptMozilla/5.0 (X11; Linux x86_64; [lang code]; rv:[..]) Gecko/.. etc (url)
5 www.teesoft.info/image/..Mozilla/5.0 (Macintosh; PPC Mac OS X 10.4; [lang code]; rv:[..]) Gecko/.. etc (url)
5 www.teesoft.info/-Mozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
5 www.teesoft.info/application/xmlMozilla/5.0 (Windows; Windows NT 6.1; [lang code]; rv:[..]) Gecko/.. etc (url)
4 www.teesoft.info/text/..Mozilla/5.0 (Macintosh; PPC Mac OS X 10.4; [lang code]; rv:[..]) Gecko/.. etc (url)
4 www.teesoft.info/application/jsonMozilla/5.0 (Macintosh; Intel Mac OS X 10.5; [lang code]; rv:[..]) Gecko/.. etc (url)
3 www.teesoft.info/application/jsonMozilla/5.0 (X11; Linux i686; [lang code]; rv:[..]) Gecko/.. etc (url)
3 www.teesoft.info/image/..Mozilla/5.0 (Macintosh; PPC Mac OS X 10.5; [lang code]; rv:[..]) Gecko/.. etc (url)
3 www.teesoft.info/application/jsonMozilla/5.0 (Windows; Windows NT 6.1; [lang code]; rv:[..]) Gecko/.. etc (url)
1,747google?
1,574 www.google.com/bot.htmltext/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
55 www.google.com/bot.htmltext/..GoogleBot/2.1 (url)
49 www.google.com/bot.htmltext/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
12 www.google.com/bot.htmltext/..DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
11 www.google.com/bot.htmltext/..SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
10 www.google.com/bot.htmlapplication/xmlMozilla/5.0 (compatible; GoogleBot/2.1; url)
7 www.google.com/bot.html-Mozilla/5.0 (compatible; GoogleBot/2.1; url)
7 www.google.com/bot.htmltext/..KDDI-CA34 UP.Browser/6.2.0.10.2.2 (GUI) MMP/2.0 (compatible; KDDI-GoogleBot-Mobile/2.1; url)
6 www.google.com/bot.htmlimage/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
6 www.google.com/bot.htmlapplication/xmlKDDI-CA34 UP.Browser/6.2.0.10.2.2 (GUI) MMP/2.0 (compatible; KDDI-GoogleBot-Mobile/2.1; url)
5 www.google.com/bot.htmltext/..Mozilla/5.0(compatible;GoogleBot/2.1;url)
693naver
592 help.naver.com/robots/text/..Yeti/1.0 (NHN Corp.; url)
34 help.naver.com/delete_main.asptext/..Mozilla/4.0 (compatible; NaverBot/1.0; url)
23 help.naver.com/robots/image/..Yeti/1.0 (NHN Corp.; url)
19 help.naver.com/customer_webtxt_02.jsptext/..Mozilla/4.0 (compatible; NaverBot/1.0; url)
13 help.naver.com/robots/text/..Mozilla/5.0 (compatible; Yeti/1.0; NHN Corp.; url)
8 help.naver.com/robots/image/..Mozilla/5.0 (compatible; Yeti/1.0; NHN Corp.; url)
547soso
539 help.soso.com/webspider.htmtext/..Sosospider(url)
4 help.soso.com/soso-image-spider.htmimage/..Sosoimagespider(url)
536exabot
500 www.exabot.com/go/robottext/..Mozilla/5.0 (compatible; Exabot/3.0; url)
21 www.exabot.com/go/robottext/..Mozilla/5.0 (compatible; Exabot/3.0 (BiggerBetter/tests); url)
14 www.exabot.com/go/robot-Mozilla/5.0 (compatible; Exabot/3.0; url)
455pipl
455 www.pipl.com/bot/text/..Mozilla/5.0(compatible;PiplBot;url)
410cuil
396 www.cuil.com/twiceler/robot.htmltext/..Mozilla/5.0 (Twiceler-0.9 url)
5 www.cuil.comimage/..Mozilla/5.0 (compatible; heritrix/1.14.0 url)
4 www.cuil.comtext/..Mozilla/5.0 (compatible; heritrix/1.14.0 url)
3 www.cuil.com/twiceler/robot.htmlapplication/xmlMozilla/5.0 (Twiceler-0.9 url)
389baidu
249 www.baidu.com/search/spider.htmtext/..Baiduspider(url)
64 www.baidu.jp/spider/text/..Baiduspider(url)
41 www.baidu.jp/spider/text/..DoCoMo/2.0 P05A(c100;TB;W24H15) (compatible; BaiduMobaider/1.0; url)
15 www.baidu.jp/spider/application/xmlDoCoMo/2.0 P05A(c100;TB;W24H15) (compatible; BaiduMobaider/1.0; url)
9 www.baidu.com/search/spider.htm-Baiduspider(url)
5 www.baidu.jp/spider/-Baiduspider(url)
277youdao
251 www.youdao.com/help/webmaster/spider/text/..Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
11 www.youdao.com/help/webmaster/spider/-Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
4 www.youdao.com/help/webmaster/spider/text/..Mozilla/5.0 (compatible; YodaoBot/1.0; url; )
3 www.youdao.com/help/webmaster/spider/audio/midiMozilla/5.0 (compatible; YodaoBot/1.0; url; )
3 www.youdao.com/help/webmaster/spider/image/..Mozilla/5.0 (compatible;YodaoBot-Image/1.0;url;)
274yanga
178 www.yanga.co.uk/text/..Yanga WorldSearch Bot v1.1/beta (url)
96 www.yanga.co.uk/image/..Yanga WorldSearch Bot v1.1/beta (url)
249sblog
161 fulltext.sblog.cz/screenshot/image/..Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
45 fulltext.sblog.cz/screenshot/text/..Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
31 fulltext.sblog.cz/robot/text/..SeznamBot/2.0 (url)
10 fulltext.sblog.cz/screenshot/application/x-javascriptMozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
237yacy
56 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.28-15-generic; java 1.6.0_0; Europe/en) url
17 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.18-6-amd64; java 1.5.0_14; Europe/de) url
16 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.18-128.7.1.el5; java 1.6.0; Europe/de) url
15 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.28-15-generic; java 1.6.0_16; Europe/en) url
12 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.28-15-generic; java 1.6.0_0; Europe/en) url
11 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.28-15-generic; java 1.6.0_13; Europe/en) url
10 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.26-2-686; java 1.6.0_0; Europe/de) url
9 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.28-15-generic; java 1.6.0_14; Europe/en) url
7 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.28-15-generic; java 1.6.0_0; Europe/de) url
7 yacy.net/bot.htmltext/..yacybot (amd64 FreeBSD 7.1-STABLE; java 1.6.0_07; Europe/en) url
7 yacy.net/bot.htmltext/..yacybot (amd64 FreeBSD 7.1-PRERELEASE; java 1.6.0_07; Europe/en) url
6 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.26-2-686; java 1.5.0_17; Europe/de) url
5 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.16.46-0.12-smp; java 1.6.0_15; Europe/en) url
4 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.26-2-amd64; java 1.6.0_12; Europe/en) url
4 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.24_UB; java 1.6.0_0; Europe/en) url
4 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.28-15-generic; java 1.6.0_14; Europe/en) url
3 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.24-7-pve; java 1.6.0_13; Europe/en) url
3 yacy.net/bot.htmltext/..yacybot (x86 Windows 7 6.1; java 1.6.0_16; Europe/de) url
226ask
206 about.ask.com/en/docs/about/webmasters.shtmltext/..Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
8 about.ask.com/en/docs/about/webmasters.shtmlimage/..Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
8 about.ask.com/en/docs/about/webmasters.shtml-Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
4 about.ask.com/en/docs/about/webmasters.shtmltext/..Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url) (via Web-Blaster/2.21 (http://www.assoziations-blaster.de/web-blast.html))
190wikipedia
99 en.wikipedia.org/wiki/Wikipedia:Huggletext/..Huggle/0.9.2 url
34 zh.wikipedia.org/w/index.php?title=苏西特纳河&variant=zh-cntext/..url
27 en.wikipedia.org/wiki/Wapediaapplication/vnd.php.serializedwapedia.mobi liveupdate (url)
15 en.wikipedia.orgtext/..url
8 zh.wikipedia.org/w/index.php?title=File:BSicon_exBR?CKE2.svg&variant=zh-cntext/..url
173boardreader
173 spider.boardreader.comtext/..BoardReader Rating Builder/1.0 url
164wikimedia
162 tools.wikimedia.de/~daniel/text/..WikiSense (url)
162mnemoo
162 www.mnemoo.com/about/spidertext/..Mnemoo Intelligent Spider/0.1alpha (compatible; See url)
151dotnetdotcom
151 www.dotnetdotcom.org/text/..Mozilla/5.0 (compatible; DotBot/1.1; url, mail address )
149php
63 pear.php.net/application/vnd.php.serializedPEAR HTTP_Request class ( url )
34 pear.php.net/package/http_request2text/..HTTP_Request2/0.4.0 (url) PHP/5.2.10
26 pear.php.net/application/jsonPEAR HTTP_Request class ( url )
24 pear.php.net/text/..PEAR HTTP_Request class ( url )
131daum
131 ws.daum.net/aboutWebSearch.htmltext/..Mozilla/5.0 (compatible; MSIE or Firefox mutant; not on Windows server; url) Daumoa/2.0
128facebook
81 www.facebook.com/externalhit_uatext.phpimage/..facebookexternalhit/1.0 (url)
38 www.facebook.com/externalhit_uatext.phptext/..facebookexternalhit/1.0 (url)
5 developers.facebook.comtext/..facebookplatform/1.0 (url)
4 developers.facebook.comimage/..facebookplatform/1.0 (url)
114goo
108 help.goo.ne.jp/contact/text/..goo wikipedia (url)
4 help.goo.ne.jp/door/crawler.htmltext/..ichiro/3.0 (url)
111sogou
94 www.sogou.com/docs/help/webmasters.htm#07text/..Sogou web spider/4.0(url)
12 www.sogou.com/docs/help/webmasters.htm#07application/xmlSogou web spider/4.0(url)
108activepeople
108 www.activepeople.nettext/..WordPress/2.8.4; url
107majestic12
82 www.majestic12.co.uk/bot.php?text/..Mozilla/5.0 (compatible; MJ12bot/v1.2.5; url)
16 www.majestic12.co.uk/bot.php?text/..Mozilla/5.0 (compatible; MJ12bot/v1.3.0; url)
6 www.majestic12.co.uk/bot.php?text/..Mozilla/5.0 (compatible; MJ12bot/v1.2.4; url)
94gigablast
94 www.gigablast.com/spider.htmltext/..Gigabot/3.0 (url)
81tourist-information-berlin
81 www.tourist-information-berlin.comtext/..WordPress/2.8.4; url
62freebase
62 www.freebase.comtext/..metaweb/Nutch-1.0-dev (url; help_at_metaweb.com)
56kosmix
26 www.kosmix.com/crawler.htmlapplication/xmlvoyager/1.0 url
18 www.kosmix.com/crawler.htmltext/..voyager/2.0 (url)
12 www.kosmix.com/html/kosmos.htmltext/..Mozilla/5.0(compatible;Kosmos/1.0;url)
51emining
51 emining.jp/text/..emBot-GalaBuzz/Nutch-1.0 (url; mail address )
50xrss
50 www.xrss.eu/robottext/..Mozilla/5.0 (compatible; xrss; robot; url; version 2.0)
45wordpress
31 support.wordpress.com/contact/text/..WordPress.com mShots; url
5 josefboberg.wordpress.comtext/..WordPress/MU; url
4 mrsmvp.wordpress.comtext/..WordPress/MU; url
3 jamesmessig.wordpress.comtext/..WordPress/MU; url
44moose
44 www.moose.at/text/..Mozilla/5.0 (compatible; MooseBot/1.1; Linux i686; de; url)
39entireweb
34 www.entireweb.com/about/search_tech/speedy_spider/text/..Speedy Spider (url)
5 www.entireweb.com/about/search_tech/speedy_spider/-Speedy Spider (url)
38spinn3r
34 spinn3r.com/robottext/..Mozilla/5.0 (X11; Linux i686; en-US; rv:1.2.1; aggregator:Spinn3r (Spinn3r 3.1); url) Gecko/20021130
3 spinn3r.com/robot-Mozilla/5.0 (X11; Linux i686; en-US; rv:1.2.1; aggregator:Spinn3r (Spinn3r 3.1); url) Gecko/20021130
38gastspiele
38 www.gastspiele.comtext/..WordPress/2.8.4; url
34aport
34 www.aport.ru/helptext/..Mozilla/5.0 (compatible; AportWorm/3.2; url)
33weitz
33 weitz.de/drakma/text/..Drakma/0.11.5 (LispWorks 5.0.2; Linux; 2.6.18-5-686-bigmem; url)
33snap
31 www.snap.comtext/..Snapbot/1.0 (Snap Shots, url)
30qdos
30 qdos.com/text/..qdos/1.1 (url)
29princeton
28 www.cs.princeton.edu/cass/text/..nu_tch-princeton/Nu_tch-1.0-dev (princeton crawler for cass project; url; zhewang a_t cs ddot princeton dot edu)
29dium
27 me.dium.comtext/..Mozilla/5.0 (Windows; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0 Me.dium/1.0 (url)
2880legs
24 www.80legs.com/spider.htmltext/..Mozilla/5.0 (compatible; 008/0.83; url;) Gecko/2008032620
4 www.80legs.com/spider.htmlimage/..Mozilla/5.0 (compatible; 008/0.83; url;) Gecko/2008032620
27linkedin
27 www.linkedin.comtext/..LinkedInBot/1.0 (compatible; Mozilla/5.0; Jakarta Commons-HttpClient/3.1 url)
26textdigger
23 textdigger.comtext/..Mozilla/5.0 (url) Gecko/20061208 Firefox/2.0.0.1
3 textdigger.comimage/..Mozilla/5.0 (url) Gecko/20061208 Firefox/2.0.0.1
25traslated
25 mymemory.traslated.net/doc/text/..Mozilla/5.0 (MyMemory Bot url)
24froute
19 labs.froute.jp/pc2m/help.htmltext/..Froute Mobile Gateway/1.0 (url)
5 labs.froute.jp/pc2m/help.htmlimage/..Froute Mobile Gateway/1.0 (url)
24guruji
24 www.guruji.com/en/WebmasterFAQ.htmltext/..Mozilla/5.0 (compatible; GurujiBot/1.0; url)
23heartrails
16 capture.heartrails.com/text/..Mozilla/5.0 (X11; Linux i686 (x86_64); en-US; rv:1.8.1.20) Gecko/20090429 HeartRails_Capture/0.6 (url) BonEcho/2.0.0.20
4 capture.heartrails.com/image/..Mozilla/5.0 (X11; Linux i686 (x86_64); en-US; rv:1.8.1.20) Gecko/20090429 HeartRails_Capture/0.6 (url) BonEcho/2.0.0.20
3 capture.heartrails.com/application/x-javascriptMozilla/5.0 (X11; Linux i686 (x86_64); en-US; rv:1.8.1.20) Gecko/20090429 HeartRails_Capture/0.6 (url) BonEcho/2.0.0.20
23Anonymouse
13 Anonymouse.org/image/..url (Unix)
8 Anonymouse.org/text/..url (Unix)
22mixi
12 mixi.jp/text/..mixi-mobile-converter/1.0 (url)
10 mixi.jp/image/..mixi-mobile-converter/1.0 (url)
22simplepie
6 simplepie.orgapplication/xmlSimplePie/1.2 (Feed Parser; url; Allow like Gecko) Build/20090627192103
6 simplepie.orgapplication/xmlSimplePie/1.1.1 (Feed Parser; url; Allow like Gecko) Build/20080315205903
21searchme
7 www.searchme.com/support/text/..Mozilla/5.0 (compatible; Charlotte/1.1; url)
7 www.searchme.com/support/text/..Mozilla/5.0 (compatible; Charlotte/1.0t; url)
4 www.searchme.com/support/image/..Mozilla/5.0 (compatible; Charlotte/1.0t; url)
21rcdtokyo
14 www.rcdtokyo.com/pc2m/text/..Mozilla/5.0 (compatible; PEAR HTTP_Request class; url)
7 www.rcdtokyo.com/pc2m/image/..Mozilla/5.0 (compatible; PEAR HTTP_Request class; url)
21cmu
12 boston.lti.cs.cmu.edu/crawler/text/..SapphireWebCrawler/Nutch-1.0-dev (Sapphire Web Crawler using Nutch; url; mail address )
9 boston.lti.cs.cmu.edu/crawler/image/..SapphireWebCrawler/1.0 (Sapphire Web Crawler using Nutch; url; mail address )
21mashget
12 www.mashget.comtext/..Mashgetbot/2.1 (url)
9 www.mashget.comapplication/jsonMashGetBot1.0(url)
20alexa
20 www.alexa.com/site/help/webmasterstext/..ia_archiver (url; mail address )
20www.
8 www.text/..GoogleBot-Image/1.0 ( urlGoogleBot.com/bot.html)
7 www.text/..GoogleBot/2.1 ( urlGoogleBot.com/bot.html)
3 www.image/..GoogleBot/2.1 (urlGoogleBot.com/bot.html)
20holmes
20 holmes.getext/..HolmesBot (url)
20greenivory
20 greenivory.frtext/..GreenIvory/Nutch-0.9 (GreenIvory-BlueCrane; url; mail address )
19newsgator
5 www.newsgator.comtext/..NewsGator/2.0 Bot (url)
4 www.newsgator.comtext/..NewsGatorOnline/2.0 (url) bot
19shopwiki
19 www.shopwiki.com/wiki/Help:Bottext/..ShopWiki/1.0 ( url)
19kinolexikon
16 www.kinolexikon.comtext/..WordPress/2.8.4; url
3 www.kinolexikon.detext/..WordPress/2.8.4; url
19hatena
10 a.hatena.ne.jp/helptext/..Hatena Antenna/0.5 (url)
9 mgw.hatena.ne.jp/helptext/..DoCoMo/2.0 D903i(c100;TB;W28H20) (compatible; Hatena-Mobile-Gateway/1.2; url)
18setooz
18 www.setooz.com/oozbot.htmltext/..OOZBOT/0.20 ( url ; mail address )
18attheblogzone
18 tagitlink.attheblogzone.infotext/..WordPress/2.8.4; url
17emusic
6 www.emusic.com/application/jsonMozilla/5.0 (Windows; Windows NT 5.1; en-US; rv:1.8.1.8pre) Gecko/20071017 eMusic DLM/4.0/1.0.0.2 (url) (like Firefox/2.0.0.8)
4 www.emusic.com/text/..Mozilla/5.0 (Windows; Windows NT 5.1; en-US; rv:1.8.1.8pre) Gecko/20071017 eMusic DLM/4.0/1.0.0.2 (url) (like Firefox/2.0.0.8)
3 www.emusic.com/application/jsonMozilla/5.0 (Windows; Windows NT 6.0; en-US; rv:1.8.1.8pre) Gecko/20071017 eMusic DLM/4.0/1.0.0.2 (url) (like Firefox/2.0.0.8)
17edu
13 iws.seu.edu.cn/services/falcons/contact_us.jsptext/..Mozilla/5.0 (compatible; Falconsbot; url)
4 iws.seu.edu.cn/services/falcons/contact_us.jspimage/..Mozilla/5.0 (compatible; Falconsbot; url)
16picmole
16 www.picmole.comtext/..Mozilla/5.0 (compatible;picmole/1.0 url)
16telehouse
16 telehouse.ru/crawler.htmltext/..Mozilla/5.0 (compatible; Dolphin/1.0; url)
15
15 text/..Mozilla/5.0 (Windows; Windows NT 5.1; en-US; rv:1.9) Gecko/2008052906 Firefox/3.0/1.0 (bot; url)
15moreover
15 www.moreover.comtext/..Moreoverbot/5.00 (url; mail address )
15sitescooper
14 sitescooper.orgtext/..sitescooper/3.1.2 (url) libwww-perl/5.79
15weblio
12 www.weblio.jp/text/..Mozilla/5.0 (compatible; WeblioBot; url)
3 www.weblio.jp/info/crawler.jsptext/..Mozilla/5.0 (compatible; Webliobot/0.1; url)
14aafter
13 aafter.com/crawler.htmtext/..AAfter.com Crawler/AAfter-1.0 (This bot is very focused, well-behaved, and wants to do good to internet community. For any questions, please call collect 1 214-714-2224. Team AAfter, Dallas, TX, USA; url; crawler at aafter.com)
14topsy
14 labs.topsy.com/butterfly.htmltext/..Mozilla/5.0 (compatible; Butterfly/1.0; url) Gecko/2009032608 Firefox/3.0.8
14archive-it
9 www.archive-it.orgimage/..Mozilla/5.0 (compatible;archive.org_bot/1.7.0; Archive-It; url)
5 www.archive-it.orgtext/..Mozilla/5.0 (compatible;archive.org_bot/1.7.0; Archive-It; url)
13phonifier
11 www.phonifier.comtext/..Mozilla/5.0 (compatible; Phonifier; url)
13accelobot
13 www.accelobot.comtext/..Mozilla/5.0 (compatible; heritrix/1.14.3 url)
13domain
11 yourwebsite.domain.com/text/..Mozilla/5.0 (compatible; heritrix/1.8.0 url)
12jumptap
10 www.jumptap.com/jumpbottext/..Nokia6820/2.0 (5.88) Profile/MIDP-1.0 Configuration/CLDC-1.0/1.0 (Jumpbot; url; mail address )
11bloglines
7 www.bloglines.com-Bloglines/3.1 (url; 1 subscriber)
11wikipediavspredator
11 wikipediavspredator.com/text/..WikipediaVsPredator/1.0 url
10picsearch
6 www.picsearch.com/bot.htmltext/..psbot/0.1 (url)
4 www.picsearch.com/bot.htmlimage/..psbot/0.1 (url)
10babaloo
10 www.babaloo.sitext/..BabalooSpider/1.3 (BabalooSpider; url; mail address )
10kalooga
6 www.kalooga.com/info.html?page=crawlertext/..Mozilla/5.0 (compatible; KaloogaBot; url)
4 www.kalooga.com/info.html?page=crawlerimage/..Mozilla/5.0 (compatible; KaloogaBot; url)
43,792total

Page requests for probable crawlers, recognized by keyword
Agent rows: count (x 1000), agent string
Detail rows: count (x 1000), mime type (count ≥ 3)
2,303PythonWikipediaBot/1.0
1,113 text/..
634 application/json
556 application/xml
1 -
1 application/x-javascript
1 image/..
1,719GoogleBot-Image/1.0
656 image/..
639 text/..
423 -
1 application/pdf
467Mozilla/5.0 (Windows; Windows NT 5.1; fr; rv:1.8.1) VoilaBot BETA 1.2 ( mail address )
467 text/..
1 -
1 image/..
1 application/ogg
372Answersbot
372 text/..
1 -
248Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (Exabot-Thumbnails)
134 text/..
78 image/..
36 application/x-javascript
1 -
217gsa-crawler (Enterprise; S5-MS8QQPJ5BGWAA; mail address )
217 text/..
190wikiwix-bot-3.0
182 text/..
8 image/..
1 -
127GoogleBot-Image/1.0
126 text/..
1 image/..
1 -
1 application/x-javascript
103Mozilla/5.0 (SnapPreviewBot) Gecko/20061206 Firefox/1.5.0.9
54 image/..
39 text/..
10 application/x-javascript
1 application/json
1 application/ogg
97Tawbot (public svn release; plwiki)
97 text/..
86DotNetWikiBot/2.71 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
72 text/..
14 application/xml
78AarghBot Linux
78 text/..
1 -
55COIBot/1.00
55 text/..
46Test Webbot
46 text/..
43MPUploadBot; PHP 5.2.6-3ubuntu4.2
43 application/vnd.php.serialized
42php wikibot classes
42 application/vnd.php.serialized
40web18bot
40 text/..
39crawler mail address
39 text/..
36CorenSearchBot/1.4 en libwww-perl/5.808
36 text/..
31FAST Enterprise Crawler/6.7.0 ( mail address )
31 text/..
1 application/xml
1 image/..
28SineBot/1.5.14(User:SineBot)
27 application/vnd.php.serialized
1 text/..
28CrawlerTest/Nutch-1.0-dev
28 text/..
1 -
27GoogleBot
27 text/..
1 application/x-javascript
1 image/..
25MSR-ISRCCrawler
21 text/..
4 application/x-javascript
1 image/..
24plantspedia data crawler
24 text/..
22dictionary-bot
19 application/xml
3 text/..
21Jyxobot/1
21 text/..
20MLBot (www.metadatalabs.com/mlbot)
20 text/..
1 -
1 image/..
18FAST Enterprise Crawler 6 used by FAST ( mail address )
16 text/..
2 -
16SD Crawler/Nutch-0.9 (automated Crawler)
16 text/..
16AnomieBOT 1.0 (OrphanReferenceFixer)
16 application/json
16Mozilla/5.0 (Yahoo-MMCrawler/4.0; mail address )
12 image/..
4 text/..
1 -
15 mail address (Mozilla compatible)
15 text/..
1 image/..
13dicbot 1.0
13 text/..
12Nokia3100/1.0 (compatible; WukongBot)
12 text/..
1 -
11SurakWare MediaWiki Bot/1.0
11 text/..
11FAST Enterprise Crawler 6 used by a (a)
11 text/..
1 -
10Bot/WP/EN/Daniel/MediationBot1/1.2
10 text/..
10OpenLink Virtuoso RDF crawler
10 text/..
1 image/..
10Mozilla/5.0 (Bgbot 0.5)
10 text/..
9FAST Enterprise Crawler 6 used by Lenovo ( mail address )
9 text/..
1 -
1 application/xml
9spider
8 text/..
1 application/xml
9Mozilla/5.0 compatible (Solsoft Crawler)
9 text/..
8rdfbot/1.0 (Indian Language Web Search Engine; Rediff.com; rdfbot mail address )
8 text/..
8YaDirectBot/1.0
8 text/..
1 -
8FAST Enterprise Crawler 6 used by MICROLINK ( mail address )
8 text/..
7Pope Web Crawler
7 text/..
1 -
1 application/opensearchdescription+xml
7Geni ircpybot 1.0
4 text/..
2 application/json
1 application/xml
7DotNetWikiBot/2.71 (Microsoft Windows NT 5.2.3790 Service Pack 2; )
7 text/..
1 application/xml
6gsa-crawler (Enterprise; S5-KUKT7ERTD8NJB; mail address )
6 text/..
1 -
6c0rwin/Nutch-1.0 (Nutch spiderman; MyEmail)
6 text/..
1 application/x-javascript
1 image/..
6Mozilla/5.0 (Apibot 0.01)
6 application/vnd.php.serialized
6XLinkBot/1.00
6 text/..
6DotNetWikiBot/2.53 (Unix 2.6.26.2; )
6 text/..
6GNAA-bot
6 text/..
6Draicone's bot
6 text/..
6Acre/acre/dev/23:79812 spencerbots.spencermountain.user.dev.freebaseapps.com
6 application/json
5LinguaBot/v0.001-dev (MultiLinual Sarch Engine v0.001; LinguaSeek; mail address )
5 text/..
1 -
5beast/Nutch-1.0 (agentspider; mail address )
5 text/..
1 image/..
5Bot/WP/EN/Alex_Bakharev/AlexNewArtBot
5 text/..
5JavaCrawler/1.1
5 text/..
1 -
5Szukacz/1.5 (robot; www.szukacz.pl/html/jak_dziala_robot.html; mail address )
5 text/..
5Freebase Deathbot
5 text/..
5GingerCrawler/1.0 (Language Assistant for Dyslexics; www.gingersoftware.com/crawler_agent.htm; support at ginger software dot com)
5 text/..
5websitethumbnail.de snapshot spider
5 text/..
5WikiNEWSticsBOT (by user Melancholie)
5 text/..
5Plagiat Web Spider WSIiZ wsiz.rzeszow.pl
5 text/..
1 application/pdf
1 image/..
1 application/ogg
5CheMoBot/1.00
5 text/..
4bitlybot
4 text/..
1 image/..
4PR Crawler/Nutch-1.0 (data mining develpment project; mail address )
4 text/..
4FAST Enterprise Crawler 6 used by Wanadoo ( mail address )
3 application/x-javascript
1 text/..
4ListasBot 3
4 text/..
4SONIVIS MediaWiki API Bot 0.1.3
4 text/..
4AnomieBOT 1.0 (SourceUploader)
4 application/json
4Mozilla/4.0 (compatible; focuseekbot)
4 text/..
3Anomebot v2.0
2 application/json
1 text/..
3infraEnterprise v8 Web Crawler
3 text/..
3GoogleBot/2.1
2 image/..
1 text/..
1 application/x-javascript
3FAST Enterprise Crawler 6 used by MSN ( mail address )
3 text/..
3Jabse.com Crawler v.2.0 www.jabse.com/crawler.php
3 text/..
3QuickFinder Crawler
3 text/..
3Perfectbot
3 text/..
3DotNetWikiBot/2.3 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
3 text/..
3AnomieBOT 1.0 (AFDMergeFromCleaner)
3 application/json
3AOL Reference Center Bot/1.0
3 text/..
3gsa-crawler (Enterprise; S5-MMAT6R2TG8JJT; mail address )
3 text/..
3FAST Enterprise Crawler 6 used by
3 text/..
3SineBot/1.5.15(User:SineBot)
3 application/vnd.php.serialized
1 text/..
3unblockbot/1.00
3 text/..
6,939total

IP ranges: known IP ranges for Google are 64.233.[160.0-191.255], 66.249.[64.0-95.255], 66.102.[0.0-15.255], 72.14.[192.0-255.255], 74.125.[0.0-255.255], 209.85.[128.0-255.255], 216.239.[32.0-63.255] and a few minor other subranges
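
A minimal sketch of how a request claiming to be GoogleBot could be checked against these ranges, using Python's standard `ipaddress` module. The CIDR blocks are direct equivalents of the ranges listed above; the "few minor other subranges" are not given in this report and are therefore omitted. The function name and the returned labels (borrowed from the "google" and "google?" groups in the first table) are assumptions, not the report's actual implementation.

```python
import ipaddress
from typing import Optional

# CIDR equivalents of the ranges listed in the report (minor subranges omitted).
GOOGLE_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "64.233.160.0/19",  # 64.233.[160.0-191.255]
        "66.249.64.0/19",   # 66.249.[64.0-95.255]
        "66.102.0.0/20",    # 66.102.[0.0-15.255]
        "72.14.192.0/18",   # 72.14.[192.0-255.255]
        "74.125.0.0/16",    # 74.125.[0.0-255.255]
        "209.85.128.0/17",  # 209.85.[128.0-255.255]
        "216.239.32.0/19",  # 216.239.[32.0-63.255]
    )
]


def googlebot_group(user_agent: str, ip: str) -> Optional[str]:
    """Label a request that claims to be GoogleBot by where it comes from."""
    if "googlebot" not in user_agent.lower():
        return None  # does not claim to be GoogleBot at all
    address = ipaddress.ip_address(ip)
    if any(address in network for network in GOOGLE_NETWORKS):
        return "google"   # claim backed by a known Google IP range
    return "google?"      # claim not backed by a known range: possibly spoofed


print(googlebot_group("Mozilla/5.0 (compatible; GoogleBot/2.1; url)", "66.249.65.10"))
print(googlebot_group("Mozilla/5.0 (compatible; GoogleBot/2.1; url)", "192.0.2.1"))
```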

Generated on Friday October 30, 2009
Author: Erik Zachte
Mail: ezachte@### (no spam: ### = wikimedia.org)
All data and images on this page are in the public domain.