Wikimedia Visitor Log Analysis Report - Crawler requests

Daily averages, based on sample period: 1 Aug 2009 - 31 Aug 2009

 This analysis is based on a 1:1000 sampled server log (squids) ⇒ all counts x 1000.
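
The scaling from sampled log lines to daily averages can be sketched as follows. This is a minimal illustration, assuming that each sampled line stands for 1000 requests and that the monthly total is divided by the 31 days of the sample period. The function name, its parameters, and the input value are hypothetical and are not taken from the scripts that produced this report.

```python
def estimate_daily_requests(sampled_lines: int,
                            sampling_factor: int = 1000,
                            days_in_period: int = 31) -> float:
    """Estimate real requests per day from a count of sampled log lines.

    sampling_factor=1000 mirrors the 1:1000 sampling stated in the report;
    days_in_period=31 mirrors the sample period 1 Aug 2009 - 31 Aug 2009.
    """
    estimated_total = sampled_lines * sampling_factor  # undo the 1:1000 sampling
    return estimated_total / days_in_period            # average over the period


# Hypothetical input: 31,000 sampled lines seen during the whole month.
daily = estimate_daily_requests(31_000)
print(f"{daily:,.0f} requests per day")
```

The tables in this report show such daily averages in units of 1000 requests.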
 
The following overview of crawler (aka bot) page requests is based on the user agent information that accompanies most server requests. Unfortunately this user agent information follows rather loosely defined guidelines.
Also please bear in mind that the most popular crawler names may be somewhat overrepresented. This is the result of so-called user agent spoofing (where a requester supplies false credentials, e.g. to bypass web server filters).
GoogleBot seems to be a favorite for spoofing. Therefore requests that claim to be GoogleBot are split into two groups: those from an IP address registered to Google (see below) and all others. In the first table the two groups appear as "google" and "google?".

For this report page requests are considered to be issued by a crawler in two cases:
1 The user agent string contains a web address. Only crawlers should have that, but there are some false positives, where a browser sends a user agent string with a web address (ill-behaved plug-in; the main offenders have been eliminated).
2 The user agent string contains the term bot, spider or crawl[er].
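
The two rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the regular expression used to detect a web address and the label strings are hypothetical choices, and the report does not say how its own scripts detect addresses or which false positives they filter out.

```python
import re

# Rule 1 (assumed pattern): something that looks like a web address.
URL_PATTERN = re.compile(r"(?:https?://|www\.)[\w.-]+", re.IGNORECASE)

# Rule 2: the terms named in the report; "crawl" also covers "crawler".
KEYWORD_PATTERN = re.compile(r"bot|spider|crawl", re.IGNORECASE)


def classify_user_agent(user_agent: str) -> str:
    """Return a hypothetical label for one user agent string."""
    if URL_PATTERN.search(user_agent):
        return "crawler (url)"
    if KEYWORD_PATTERN.search(user_agent):
        return "crawler (keyword)"
    return "other"


examples = [
    "Mozilla/5.0 (compatible; GoogleBot/2.1; +http://www.google.com/bot.html)",
    "PythonWikipediaBot/1.0",
    "Mozilla/5.0 (Windows; Windows NT 5.1; en-US) Gecko/2009 Firefox/3.5",
]
for agent in examples:
    print(classify_user_agent(agent), "<-", agent)
```

Rule 1 is checked first, which matches the split into the two tables below: agents with a web address go to the first table, and the remaining agents that match a keyword go to the second.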

In total 38,801,000 page requests per day (mime type text/html only!) are considered crawler requests, out of 234,902,000 external requests, which is 16.5%.
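
The percentage follows directly from the two daily totals. A short check, using only the figures quoted above (the function name is a hypothetical helper):

```python
def crawler_share(crawler_requests: int, external_requests: int) -> float:
    """Share of crawler requests among all external requests, in percent."""
    return 100.0 * crawler_requests / external_requests


share = crawler_share(38_801_000, 234_902_000)
print(f"{share:.1f}%")  # prints "16.5%"
```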

Page requests for crawlers that specify a URL in the agent string

Group lines: total count (x 1000), secondary domain (~site) name
Detail lines: count (x 1000), URL, mime type, user agent
18,724google
16,204 www.google.com/bot.htmltext/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
1,800 www.google.com/bot.html-Mozilla/5.0 (compatible; GoogleBot/2.1; url)
179 www.google.com/bot.htmlimage/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
143 www.google.com/feedfetcher.html-FeedFetcher-Google; (url)
92 www.google.com/bot.htmltext/..DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
78 www.google.com/feedfetcher.htmltext/..Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
45 www.google.com/feedfetcher.htmlapplication/xmlFeedFetcher-Google; (url)
43 www.google.com/bot.htmltext/..SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
37 www.google.com/feedfetcher.htmltext/..FeedFetcher-Google; (url)
30 code.google.com/appenginetext/..AppEngine-Google; (url)
21 www.google.com/feedfetcher.htmlapplication/xmlMozilla/5.0 (compatible) FeedFetcher-Google; (url)
14 www.google.com/feedfetcher.htmlapplication/jsonMozilla/5.0 (compatible) FeedFetcher-Google; (url)
12 www.google.com/coop/cse/creftext/..FeedFetcher-Google-CoOp; (url)
11 code.google.com/appengineapplication/xmlAppEngine-Google; (url)
4 code.google.com/appengineapplication/jsonPython-urllib/1.17 AppEngine-Google; (url)
8,177teesoft
2,079 www.teesoft.info/image/..Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
1,498 www.teesoft.info/text/..Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
1,256 www.teesoft.info/image/..Mozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
904 www.teesoft.info/text/..Mozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
301 www.teesoft.info/application/x-javascriptMozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
241 www.teesoft.info/image/..Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; [lang code]; rv:[..]) Gecko/.. etc (url)
239 www.teesoft.info/image/..Mozilla/5.0 (Windows; Windows NT 6.1; [lang code]; rv:[..]) Gecko/.. etc (url)
170 www.teesoft.info/application/x-javascriptMozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
163 www.teesoft.info/image/..Mozilla/5.0 (X11; Linux i686; [lang code]; rv:[..]) Gecko/.. etc (url)
160 www.teesoft.info/text/..Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; [lang code]; rv:[..]) Gecko/.. etc (url)
143 www.teesoft.info/text/..Mozilla/5.0 (Windows; Windows NT 6.1; [lang code]; rv:[..]) Gecko/.. etc (url)
126 www.teesoft.info/image/..Mozilla/5.0 (Macintosh; Intel Mac OS X 10.4; [lang code]; rv:[..]) Gecko/.. etc (url)
108 www.teesoft.info/text/..Mozilla/5.0 (X11; Linux i686; [lang code]; rv:[..]) Gecko/.. etc (url)
95 www.teesoft.info/application/xmlMozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
94 www.teesoft.info/application/jsonMozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
76 www.teesoft.info/application/xmlMozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
59 www.teesoft.info/image/..Mozilla/5.0 (X11; Linux x86_64; [lang code]; rv:[..]) Gecko/.. etc (url)
57 www.teesoft.info/application/jsonMozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
38 www.teesoft.info/text/..Mozilla/5.0 (X11; Linux x86_64; [lang code]; rv:[..]) Gecko/.. etc (url)
31 www.teesoft.info/application/x-javascriptMozilla/5.0 (Macintosh; Intel Mac OS X 10.5; [lang code]; rv:[..]) Gecko/.. etc (url)
28 www.teesoft.info/application/x-javascriptMozilla/5.0 (Windows; Windows NT 6.1; [lang code]; rv:[..]) Gecko/.. etc (url)
27 www.teesoft.info/-Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
24 www.teesoft.info/image/..Mozilla/5.0 (Windows; Windows NT 5.2; [lang code]; rv:[..]) Gecko/.. etc (url)
21 www.teesoft.info/application/x-javascriptMozilla/5.0 (X11; Linux i686; [lang code]; rv:[..]) Gecko/.. etc (url)
19 www.teesoft.info/text/..Mozilla/5.0 (Macintosh; Intel Mac OS X 10.4; [lang code]; rv:[..]) Gecko/.. etc (url)
18 www.teesoft.info/application/xmlMozilla/5.0 (Macintosh; Intel Mac OS X 10.5; [lang code]; rv:[..]) Gecko/.. etc (url)
17 www.teesoft.info/text/..Mozilla/5.0 (Windows; Windows NT 5.2; [lang code]; rv:[..]) Gecko/.. etc (url)
16 www.teesoft.info/image/..Mozilla/5.0 (Windows; Windows NT 5.0; [lang code]; rv:[..]) Gecko/.. etc (url)
12 www.teesoft.info/text/..Mozilla/5.0 (Windows; Windows NT 5.0; [lang code]; rv:[..]) Gecko/.. etc (url)
12 www.teesoft.info/image/..Mozilla/5.0 (Macintosh; PPC Mac OS X 10.4; [lang code]; rv:[..]) Gecko/.. etc (url)
12 www.teesoft.info/application/xmlMozilla/5.0 (Windows; Windows NT 6.1; [lang code]; rv:[..]) Gecko/.. etc (url)
11 www.teesoft.info/-Mozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
9 www.teesoft.info/text/..Mozilla/5.0 (Macintosh; PPC Mac OS X 10.4; [lang code]; rv:[..]) Gecko/.. etc (url)
8 www.teesoft.info/image/..Mozilla/5.0 (Macintosh; PPC Mac OS X 10.5; [lang code]; rv:[..]) Gecko/.. etc (url)
8 www.teesoft.info/application/jsonMozilla/5.0 (Macintosh; Intel Mac OS X 10.5; [lang code]; rv:[..]) Gecko/.. etc (url)
8 www.teesoft.info/text/..Mozilla/4.0 (compatible) Greasemonkey AutoPager/0.5.2.2 (url)
7 www.teesoft.info/application/x-javascriptMozilla/5.0 (X11; Linux x86_64; [lang code]; rv:[..]) Gecko/.. etc (url)
7 www.teesoft.info/application/jsonMozilla/5.0 (Windows; Windows NT 6.1; [lang code]; rv:[..]) Gecko/.. etc (url)
5 www.teesoft.info/application/jsonMozilla/5.0 (X11; Linux i686; [lang code]; rv:[..]) Gecko/.. etc (url)
4 www.teesoft.info/image/..Mozilla/5.0 (X11; Linux i686 (x86_64); [lang code]; rv:[..]) Gecko/.. etc (url)
4 www.teesoft.info/application/xmlMozilla/5.0 (X11; Linux i686; [lang code]; rv:[..]) Gecko/.. etc (url)
4 www.teesoft.info/text/..Mozilla/5.0 (Macintosh; PPC Mac OS X 10.5; [lang code]; rv:[..]) Gecko/.. etc (url)
4 www.teesoft.info/image/..Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; [lang code]; rv:[..]) Gecko/.. etc (url)
3 www.teesoft.info/application/x-javascriptMozilla/5.0 (Macintosh; Intel Mac OS X 10.4; [lang code]; rv:[..]) Gecko/.. etc (url)
3 www.teesoft.info/image/..12345 AutoPager/0.5.2.2 (url)
3 www.teesoft.info/application/x-javascriptMozilla/5.0 (Windows; Windows NT 5.2; [lang code]; rv:[..]) Gecko/.. etc (url)
3 www.teesoft.info/text/..Mozilla/5.0 (X11; Linux i686 (x86_64); [lang code]; rv:[..]) Gecko/.. etc (url)
3 www.teesoft.info/image/..Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url) AutoPager/0.5.2.2 (http://www.teesoft.info/)
3 www.teesoft.info/text/..Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; [lang code]; rv:[..]) Gecko/.. etc (url)
3 www.teesoft.info/-Mozilla/5.0 (Windows; Windows NT 6.1; [lang code]; rv:[..]) Gecko/.. etc (url)
6,333yahoo
5,483 help.yahoo.com/help/us/ysearch/slurptext/..Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
257 help.yahoo.com/help/us/ysearch/slurp-Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
182 help.yahoo.com/help/us/ysearch/slurptext/..Mozilla/5.0 (compatible; Yahoo! DE Slurp; url)
128 misc.yahoo.com.cn/help.htmltext/..Mozilla/5.0 (compatible; Yahoo! Slurp China; url)
83 help.yahoo.com/help/us/ysearch/slurptext/..Mozilla/5.0 (compatible; Yahoo! Slurp; url)
67 help.yahoo.co.jp/help/jp/search/indexing/indexing-15.htmlimage/..'Mozilla/5.0 (compatible; Y!J SearchMonkey/1.0 (Y!J-AGENT; url))'
49 developer.yahoo.com/searchmonkey/useragenttext/..Mozilla/5.0 (compatible; Yahoo! SearchMonkey 1.0; url)
24 help.yahoo.com/help/us/ysearch/slurpimage/..Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
16 misc.yahoo.com.cn/help.html-Mozilla/5.0 (compatible; Yahoo! Slurp China; url)
12 help.yahoo.co.jp/help/jp/search/indexing/indexing-15.htmltext/..Y!J-BRI/0.0.1 crawler ( url )
10 help.yahoo.com/help/us/ysearch/slurpapplication/oggMozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
7 help.yahoo.comtext/..Mozilla/5.0 (YahooYSMcm/3.0.0; url)
4 help.yahoo.com/help/us/ysearch/slurpapplication/x-javascriptMozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
4 help.yahoo.com/help/us/ysearch/crawling/crawling-01.htmltext/..Nokia6682/2.0 (3.01.1) SymbianOS/8.0 Series60/2.6 Profile/MIDP-2.0 configuration/CLDC-1.1 UP.Link/6.3.0.0.0 (compatible;YahooSeeker/M1A1-R2D2; url)
5,145msn
2,897 search.msn.com/msnbot.htmtext/..msnbot/2.0b (url)
1,409 search.msn.com/msnbot.htmtext/..msnbot/1.1 (url)
567 search.msn.com/msnbot.htm-msnbot/2.0b (url)
196 search.msn.com/msnbot.htm-msnbot/1.1 (url)
34 search.msn.com/msnbot.htmimage/..msnbot/1.1 (url)
15 search.msn.com/msnbot.htmtext/..librabot/1.0 (url)
10 search.msn.com/msnbot.htmtext/..msrabot/2.0/1.0 (url)
8 search.msn.com/msnbot.htmapplication/oggmsnbot/1.1 (url)
5 search.msn.com/msnbot.htmimage/..msnbot/2.0b (url)
1,897google?
1,674 www.google.com/bot.htmltext/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
57 www.google.com/bot.htmltext/..GoogleBot/2.1 (url)
34 www.google.com/bot.htmltext/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
30 www.google.com/bot.html-Mozilla/5.0 (compatible; GoogleBot/2.1; url)
26 www.google.com/bot.htmltext/..SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
25 www.google.com/bot.htmltext/..DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
13 www.google.com/bot.htmlapplication/xmlMozilla/5.0 (compatible; GoogleBot/2.1; url)
10 www.google.com/bot.htmltext/..Mozilla/5.0(compatible;GoogleBot/2.1;url)
6 www.google.com/bot.htmlapplication/x-wikiMozilla/5.0(compatible;GoogleBot/2.1;url)
6 www.google.com/bot.htmlimage/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
5 www.google.com/bot.htmlapplication/xmlDoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
4 www.google.com/bot.htmltext/..KDDI-CA34 UP.Browser/6.2.0.10.2.2 (GUI) MMP/2.0 (compatible; KDDI-GoogleBot-Mobile/2.1; url)
3 www.google.com/bot.htmlapplication/xmlKDDI-CA34 UP.Browser/6.2.0.10.2.2 (GUI) MMP/2.0 (compatible; KDDI-GoogleBot-Mobile/2.1; url)
555exabot
529 www.exabot.com/go/robottext/..Mozilla/5.0 (compatible; Exabot/3.0; url)
11 www.exabot.com/go/robottext/..Mozilla/5.0 (compatible; Exabot/3.0 (BiggerBetter/tests); url)
9 www.exabot.com/go/robot-Mozilla/5.0 (compatible; Exabot/3.0; url)
4 www.exabot.com/go/robottext/..Mozilla/5.0 (compatible; Exabot-Images/3.0; url)
479naver
400 help.naver.com/robots/text/..Yeti/1.0 (NHN Corp.; url)
34 help.naver.com/delete_main.asptext/..Mozilla/4.0 (compatible; NaverBot/1.0; url)
18 help.naver.com/customer_webtxt_02.jsptext/..Mozilla/4.0 (compatible; NaverBot/1.0; url)
12 help.naver.com/robots/image/..Yeti/1.0 (NHN Corp.; url)
8 help.naver.com/robots/text/..Mozilla/5.0 (compatible; Yeti/1.0; NHN Corp.; url)
4 help.naver.com/robots/image/..Mozilla/5.0 (compatible; Yeti/1.0; NHN Corp.; url)
431kosmix
384 www.kosmix.com/html/kosmos.htmltext/..Mozilla/5.0(compatible;Kosmos/1.0;url)
43 www.kosmix.com/crawler.htmlapplication/xmlvoyager/1.0 url
4 www.kosmix.com/crawler.htmltext/..voyager/2.0 (url)
431soso
419 help.soso.com/webspider.htmtext/..Sosospider(url)
6 help.soso.com/soso-image-spider.htmimage/..Sosoimagespider(url)
3 help.soso.com/soso-blog-spider.htmtext/..Sosoblogspider(url)
365cuil
360 www.cuil.com/twiceler/robot.htmltext/..Mozilla/5.0 (Twiceler-0.9 url)
4 www.cuil.com/twiceler/robot.htmlapplication/xmlMozilla/5.0 (Twiceler-0.9 url)
364pipl
364 www.pipl.com/bot/text/..Mozilla/5.0(compatible;PiplBot;url)
361baidu
198 www.baidu.com/search/spider.htmtext/..Baiduspider(url)
69 www.baidu.jp/spider/text/..Baiduspider(url)
38 www.baidu.jp/spider/text/..DoCoMo/2.0 P05A(c100;TB;W24H15) (compatible; BaiduMobaider/1.0; url)
17 www.baidu.jp/spider/application/xmlDoCoMo/2.0 P05A(c100;TB;W24H15) (compatible; BaiduMobaider/1.0; url)
10 www.baidu.jp/spider/text/..BaiduImagespider(url)
9 www.baidu.com/search/spider.htm-Baiduspider(url)
9 www.baidu.jp/spider/-Baiduspider(url)
8 www.baidu.jp/spider/image/..BaiduImagespider(url)
271yanga
158 www.yanga.co.uk/text/..Yanga WorldSearch Bot v1.1/beta (url)
112 www.yanga.co.uk/image/..Yanga WorldSearch Bot v1.1/beta (url)
269sblog
177 fulltext.sblog.cz/screenshot/image/..Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
50 fulltext.sblog.cz/screenshot/text/..Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
26 fulltext.sblog.cz/robot/text/..SeznamBot/2.0 (url)
12 fulltext.sblog.cz/screenshot/application/x-javascriptMozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
4 fulltext.sblog.cz/robot/-SeznamBot/2.0 (url)
223ask
213 about.ask.com/en/docs/about/webmasters.shtmltext/..Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
6 about.ask.com/en/docs/about/webmasters.shtmlimage/..Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
4 about.ask.com/en/docs/about/webmasters.shtmltext/..Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url) (via Web-Blaster/2.21 (http://www.assoziations-blaster.de/web-blast.html))
216youdao
189 www.youdao.com/help/webmaster/spider/text/..Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
7 www.youdao.com/help/webmaster/spider/text/..Mozilla/5.0 (compatible; YodaoBot/1.0; url; )
6 www.youdao.com/help/webmaster/spider/-Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
5 www.youdao.com/help/webmaster/spider/image/..Mozilla/5.0 (compatible;YodaoBot-Image/1.0;url;)
3 www.youdao.com/help/webmaster/spider/audio/midiMozilla/5.0 (compatible; YodaoBot/1.0; url; )
209wikipedia
107 en.wikipedia.org/wiki/Wikipedia:Huggletext/..Huggle/0.9.2 url
27 en.wikipedia.orgtext/..url
24 en.wikipedia.org/wiki/Wapediaapplication/vnd.php.serializedwapedia.mobi liveupdate (url)
15 zh.wikipedia.org/w/index.php?title=徐昌先&variant=zh-cntext/..url
13 zh.wikipedia.org/w/index.php?title=zhankaluo・feisiqiela&variant=zh-cntext/..url
7 zh.wikipedia.org/w/index.php?title=东风10型柴油机车&variant=zh-cntext/..url
5 zh.wikipedia.org/w/index.php?title=鄭忠&variant=zh-cntext/..url
191adsafemedia
191 www.adsafemedia.comtext/..Mozilla/5.0 (compatible; heritrix/${pom.version} url)
190yacy
20 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.28-14-generic; java 1.6.0_0; Europe/en) url
15 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.28-15-generic; java 1.6.0_0; Europe/en) url
14 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.18-128.4.1.el5; java 1.6.0_14; Europe/de) url
14 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.18-128.2.1.el5; java 1.6.0_14; Europe/de) url
12 yacy.net/bot.htmltext/..yacybot (i386 Mac OS X 10.5.8; java 1.5.0_19; Europe/de) url
10 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.28-15-generic; java 1.6.0_0; Europe/de) url
9 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.28-15-generic; java 1.6.0_14; Europe/en) url
8 yacy.net/bot.htmltext/..yacybot (amd64 Windows Vista 6.1; java 1.6.0_13; Europe/de) url
7 yacy.net/bot.htmltext/..yacybot (x86 Windows 2003 5.2; java 1.6.0_14; Europe/en) url
7 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.26-2-amd64; java 1.6.0_12; Europe/en) url
6 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.28-15-generic; java 1.6.0_14; Europe/en) url
5 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.27.25-0.1-default; java 1.6.0_15; Europe/de) url
4 yacy.net/bot.htmltext/..yacybot (x86 Windows Vista 6.0; java 1.6.0_15; Europe/de) url
4 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.18-6-amd64; java 1.5.0_14; Europe/de) url
4 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.30.1; java 1.6.0_0; Europe/en) url
4 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.18-128.7.1.el5; java 1.6.0; Europe/de) url
4 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.28-14-generic; java 1.6.0_14; Europe/en) url
4 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.9-023stab048.4-smp; java 1.5.0_14; Zulu/en) url
3 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.28-14-generic; java 1.6.0_0; Europe/en) url
3 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.28-14-generic; java 1.6.0_14; Europe/en) url
176wikimedia
174 tools.wikimedia.de/~daniel/text/..WikiSense (url)
169boardreader
169 spider.boardreader.comtext/..BoardReader Rating Builder/1.0 url
150mnemoo
150 www.mnemoo.com/about/spidertext/..Mnemoo Intelligent Spider/0.1alpha (compatible; See url)
145daum
145 ws.daum.net/aboutWebSearch.htmltext/..Mozilla/5.0 (compatible; MSIE or Firefox mutant; not on Windows server; url) Daumoa/2.0
135sogou
108 www.sogou.com/docs/help/webmasters.htm#07text/..Sogou web spider/4.0(url)
12 www.sogou.com/docs/help/webmasters.htm#07application/xmlSogou web spider/4.0(url)
7 www.sogou.com/docs/help/webmasters.htm#07text/..Sogou web spider/4.0(url) (via Web-Blaster/2.21 (http://www.a-blast.org/web-blast.html))
5 www.sogou.com/docs/help/webmasters.htm#07text/..Sogouwebrobot(url)
125emining
125 emining.jp/text/..emBot-GalaBuzz/Nutch-1.0 (url; mail address )
114traslated
114 mymemory.traslated.net/doc/text/..Mozilla/5.0 (MyMemory Bot url)
113goo
111 help.goo.ne.jp/contact/text/..goo wikipedia (url)
104php
41 pear.php.net/application/vnd.php.serializedPEAR HTTP_Request class ( url )
22 pear.php.net/application/jsonPEAR HTTP_Request class ( url )
21 pear.php.net/package/http_request2text/..HTTP_Request2/0.4.0 (url) PHP/5.2.10
18 pear.php.net/text/..PEAR HTTP_Request class ( url )
103facebook
69 www.facebook.com/externalhit_uatext.phpimage/..facebookexternalhit/1.0 (url)
28 www.facebook.com/externalhit_uatext.phptext/..facebookexternalhit/1.0 (url)
4 developers.facebook.comtext/..facebookplatform/1.0 (url)
103gigablast
103 www.gigablast.com/spider.htmltext/..Gigabot/3.0 (url)
101majestic12
86 www.majestic12.co.uk/bot.php?text/..Mozilla/5.0 (compatible; MJ12bot/v1.2.5; url)
12 www.majestic12.co.uk/bot.php?text/..Mozilla/5.0 (compatible; MJ12bot/v1.2.4; url)
94dotnetdotcom
94 www.dotnetdotcom.org/text/..Mozilla/5.0 (compatible; DotBot/1.1; url, mail address )
71freebase
71 www.freebase.comtext/..metaweb/Nutch-1.0-dev (url; help_at_metaweb.com)
59 80legs
48 www.80legs.com/spider.htmltext/..Mozilla/5.0 (compatible; 008/0.83; url;) Gecko/2008032620
11 www.80legs.com/spider.htmlimage/..Mozilla/5.0 (compatible; 008/0.83; url;) Gecko/2008032620
54tourist-information-berlin
54 www.tourist-information-berlin.comtext/..WordPress/2.8.4; url
49linkaider
49 linkaider.com/crawler/text/..LinkAider (url)
43justsystems
43 www.justsystems.com/jp/tech/crawler/text/..JUST-CRAWLER(url)
41askpeter
41 www.askpeter.infotext/..Mozilla/5.0 (compatible; askpeter_bot/5.1; url)
40wordpress
30 support.wordpress.com/contact/text/..WordPress.com mShots; url
4 josefboberg.wordpress.comtext/..WordPress/MU; url
39edu
32 iws.seu.edu.cn/services/falcons/contact_us.jsptext/..Mozilla/5.0 (compatible; Falconsbot; url)
7 iws.seu.edu.cn/services/falcons/contact_us.jspimage/..Mozilla/5.0 (compatible; Falconsbot; url)
34snap
34 www.snap.comtext/..Snapbot/1.0 (Snap Shots, url)
32cmu
27 boston.lti.cs.cmu.edu/crawler/text/..SapphireWebCrawler/Nutch-1.0-dev (Sapphire Web Crawler using Nutch; url; mail address )
5 boston.lti.cs.cmu.edu/crawler/image/..SapphireWebCrawler/1.0 (Sapphire Web Crawler using Nutch; url; mail address )
32entireweb
29 www.entireweb.com/about/search_tech/speedy_spider/text/..Speedy Spider (url)
3 www.entireweb.com/about/search_tech/speedy_spider/-Speedy Spider (url)
30qdos
30 qdos.com/text/..qdos/1.1 (url)
30greenivory
30 greenivory.frtext/..GreenIvory/Nutch-0.9 (GreenIvory-BlueCrane; url; mail address )
29spinn3r
27 spinn3r.com/robottext/..Mozilla/5.0 (X11; Linux i686; en-US; rv:1.2.1; aggregator:Spinn3r (Spinn3r 3.1); url) Gecko/20021130
29froute
22 labs.froute.jp/pc2m/help.htmltext/..Froute Mobile Gateway/1.0 (url)
7 labs.froute.jp/pc2m/help.htmlimage/..Froute Mobile Gateway/1.0 (url)
29gastspiele
29 www.gastspiele.comtext/..WordPress/2.8.4; url
28archive-it
18 www.archive-it.orgimage/..Mozilla/5.0 (compatible;archive.org_bot/1.7.0; Archive-It; url)
10 www.archive-it.orgtext/..Mozilla/5.0 (compatible;archive.org_bot/1.7.0; Archive-It; url)
27weitz
27 weitz.de/drakma/text/..Drakma/0.11.5 (LispWorks 5.0.2; Linux; 2.6.18-5-686-bigmem; url)
27weblio
24 www.weblio.jp/text/..Mozilla/5.0 (compatible; WeblioBot; url)
26princeton
25 www.cs.princeton.edu/cass/text/..nu_tch-princeton/Nu_tch-1.0-dev (princeton crawler for cass project; url; zhewang a_t cs ddot princeton dot edu)
26Anonymouse
13 Anonymouse.org/image/..url (Unix)
11 Anonymouse.org/text/..url (Unix)
24hoqsearch
24 www.hoqsearch.comtext/..hoqBot/hoqBot-1.0 (hoqsearch - community based finding; url; mail address )
23alexa
23 www.alexa.com/site/help/webmasterstext/..ia_archiver (url; mail address )
22setooz
22 www.setooz.com/oozbot.htmltext/..OOZBOT/0.20 ( url ; mail address )
22rcdtokyo
15 www.rcdtokyo.com/pc2m/text/..Mozilla/5.0 (compatible; PEAR HTTP_Request class; url)
7 www.rcdtokyo.com/pc2m/image/..Mozilla/5.0 (compatible; PEAR HTTP_Request class; url)
21mashget
12 www.mashget.comtext/..Mashgetbot/2.1 (url)
5 www.mashget.comapplication/jsonMashGetBot1.0(url)
4 www.mashget.comapplication/jsonMashGetApp(url)
20mixi
10 mixi.jp/text/..mixi-mobile-converter/1.0 (url)
10 mixi.jp/image/..mixi-mobile-converter/1.0 (url)
20proximic
20 www.proximic.comtext/..Mozilla/5.0 (compatible; proximic; url)
20hatena
11 mgw.hatena.ne.jp/helptext/..DoCoMo/2.0 D903i(c100;TB;W28H20) (compatible; Hatena-Mobile-Gateway/1.2; url)
9 a.hatena.ne.jp/helptext/..Hatena Antenna/0.5 (url)
19newsgator
6 www.newsgator.comtext/..NewsGatorOnline/2.0 (url) bot
5 www.newsgator.comtext/..NewsGator/2.0 Bot (url)
3 www.newsgator.com/application/xmlFeedDemon/2.7 (url; Microsoft Windows)
19holmes
19 holmes.getext/..HolmesBot (url)
19picsearch
15 www.picsearch.com/bot.htmltext/..psbot/0.1 (url)
4 www.picsearch.com/bot.htmlimage/..psbot/0.1 (url)
17emusic
11 www.emusic.com/application/jsonMozilla/5.0 (Windows; Windows NT 5.1; en-US; rv:1.8.1.8pre) Gecko/20071017 eMusic DLM/4.0/1.0.0.2 (url) (like Firefox/2.0.0.8)
4 www.emusic.com/application/jsonMozilla/5.0 (Windows; Windows NT 6.0; en-US; rv:1.8.1.8pre) Gecko/20071017 eMusic DLM/4.0/1.0.0.2 (url) (like Firefox/2.0.0.8)
17topsy
17 labs.topsy.com/butterfly.htmltext/..Mozilla/5.0 (compatible; Butterfly/1.0; url) Gecko/2009032608 Firefox/3.0.8
17feedparser
14 feedparser.org/application/xmlUniversalFeedParser/4.1 url
16heartrails
9 capture.heartrails.com/text/..Mozilla/5.0 (X11; Linux i686 (x86_64); en-US; rv:1.8.1.20) Gecko/20090429 HeartRails_Capture/0.6 (url) BonEcho/2.0.0.20
16kinolexikon
16 www.kinolexikon.detext/..WordPress/2.8.4; url
16xrss
16 www.xrss.eu/robottext/..Mozilla/5.0 (compatible; xrss; robot; url; version 2.0)
16domain
9 yourwebsite.domain.com/text/..Mozilla/5.0 (compatible; heritrix/1.14.0 url)
4 yourwebsite.domain.com/text/..Mozilla/5.0 (compatible; heritrix/1.8.0 url)
15scoutjet
15 www.scoutjet.com/text/..Mozilla/5.0 (compatible; ScoutJet; url)
15dium
13 me.dium.comtext/..Mozilla/5.0 (Windows; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0 Me.dium/1.0 (url)
15arste
15 www.arste.infotext/..Mozilla/5.0 (compatible; arste.info_bot/1.1; url)
15googlepages
15 peterpuwang.googlepages.comtext/..Peter Wang/Nutch-0.9 (Nutch spiderman; url ; MyEmail)
15aafter
14 aafter.com/crawler.htmtext/..AAfter.com Crawler/AAfter-1.0 (This bot is very focused, well-behaved, and wants to do good to internet community. For any questions, please call collect 1 214-714-2224. Team AAfter, Dallas, TX, USA; url; crawler at aafter.com)
15phonifier
10 www.phonifier.comtext/..Mozilla/5.0 (compatible; Phonifier; url)
5 www.phonifier.comtext/..aNti_miSa$puAsa/5.0 (compatible; Phonifier; url)
14likaholix
11 likaholix.com/about/crawlerimage/..Mozilla/5.0 (compatible; LikaholixCrawler/1.0; url)
3 likaholix.com/about/crawlertext/..Mozilla/5.0 (compatible; LikaholixCrawler/1.0; url)
14simplepie
6 simplepie.orgapplication/xmlSimplePie/1.1.1 (Feed Parser; url; Allow like Gecko) Build/20080315205903
4 simplepie.orgtext/..SimplePie/1.1.3 (Feed Parser; url; Allow like Gecko) Build/20081219
13www.
3 www.text/..GoogleBot/2.1 ( urlGoogleBot.com/bot.html)
3 www.text/..GoogleBot/2.1 (urlGoogleBot.com/bot.html)
3 www.image/..GoogleBot/2.1 (urlGoogleBot.com/bot.html)
13guruji
13 www.guruji.com/en/WebmasterFAQ.htmltext/..Mozilla/5.0 (compatible; GurujiBot/1.0; url)
12shopwiki
12 www.shopwiki.com/wiki/Help:Bottext/..ShopWiki/1.0 ( url)
12FeedBurner
11 www.FeedBurner.comtext/..FeedBurner/1.0 (url)
12babaloo
12 www.babaloo.sitext/..BabalooSpider/1.3 (BabalooSpider; url; mail address )
11estsoft
11 www.estsoft.com/text/..Mozilla/5.0 (compatible; Estbot/1.0; url)
11cyberin-consultants
9 www.cyberin-consultants.comtext/..nutch-solr-integration/Nutch-1.0 (test; url; mail address )
10aport
10 www.aport.ru/helptext/..Mozilla/5.0 (compatible; AportWorm/3.2; url)
10mediawiki
10 www.mediawiki.org/text/..MediaWiki OAI Harvester 0.2 (url)
47,813total

Page requests for probable crawlers, recognized by keyword

Group lines: total count (x 1000), agent string
Detail lines: count (x 1000), mime type (count ≥ 3)
2,598PythonWikipediaBot/1.0
1,478 text/..
630 application/xml
490 application/json
1 -
1 image/..
1,202GoogleBot-Image/1.0
513 text/..
441 image/..
248 -
661Answersbot
661 text/..
1 -
357Mozilla/5.0 (Windows; Windows NT 5.1; fr; rv:1.8.1) VoilaBot BETA 1.2 ( mail address )
357 text/..
1 -
1 application/xml
1 application/ogg
242Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (Exabot-Thumbnails)
138 text/..
65 image/..
39 application/x-javascript
1 -
235gsa-crawler (Enterprise; S5-MS8QQPJ5BGWAA; mail address )
235 text/..
215wikiwix-bot-3.0
192 text/..
22 image/..
1 -
121AarghBot Linux
121 text/..
1 -
100Mozilla/5.0 (SnapPreviewBot) Gecko/20061206 Firefox/1.5.0.9
52 image/..
39 text/..
9 application/x-javascript
1 -
1 application/ogg
93Tawbot (public svn release; plwiki)
93 text/..
88php wikibot classes
88 application/vnd.php.serialized
86GoogleBot-Image/1.0
85 text/..
1 image/..
1 -
1 application/x-javascript
1 application/xml
54Mozilla/5.0 (Yahoo-MMCrawler/4.0; mail address )
45 image/..
9 text/..
1 -
44ListasBot 3
44 text/..
38SineBot/1.5.14(User:SineBot)
37 application/vnd.php.serialized
1 text/..
36FAST Enterprise Crawler 6 used by FAST ( mail address )
33 text/..
3 -
36COIBot/1.00
36 text/..
35CorenSearchBot/1.4 en libwww-perl/5.808
35 text/..
34Test Webbot
34 text/..
34DotNetWikiBot/2.71 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
31 text/..
3 application/xml
34MPUploadBot; PHP 5.2.6-3ubuntu4.2
34 application/vnd.php.serialized
31Plagiat Web Spider WSIiZ wsiz.rzeszow.pl
28 text/..
2 image/..
1 application/ogg
1 application/pdf
30Jyxobot/1
30 text/..
25crawler mail address
25 text/..
24GingerCrawler/1.0 (Language Assistant for Dyslexics; www.gingersoftware.com/crawler_agent.htm; support at ginger software dot com)
24 text/..
22GoogleBot
22 text/..
1 application/json
1 application/x-javascript
1 image/..
22dictionary-bot
18 application/xml
4 text/..
22SD Crawler/Nutch-0.9 (automated Crawler)
22 text/..
1 -
21MPUploadBot; PHP 5.2.6-3ubuntu4.1
21 application/vnd.php.serialized
1 -
19Mozilla/5.0 (compatible; crawltest/0.1)
19 text/..
18MLBot (www.metadatalabs.com/mlbot)
18 text/..
1 -
18plantspedia data crawler
18 text/..
17web18bot
17 text/..
15DotNetWikiBot/2.64 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
14 text/..
1 application/xml
13beast/Nutch-1.0 (agentspider; mail address )
13 text/..
1 image/..
1 application/ogg
11FAST Enterprise Crawler 6 used by Lenovo ( mail address )
11 text/..
1 -
10CrawlerTest/Nutch-1.0-dev
10 text/..
10SurakWare MediaWiki Bot/1.0
10 text/..
1 application/xml
10Geni ircpybot 1.0
5 text/..
3 application/json
2 application/xml
10dicbot 1.0
10 text/..
9Network-search.net [ZSEBOT]
9 text/..
9YaDirectBot/1.0
9 text/..
8OMGCrawler 1.0
8 text/..
8Bot/WP/EN/Daniel/MediationBot1/1.2
8 text/..
8Mozilla/5.0 (Bgbot 0.5)
8 text/..
7Mozilla/5.0 (Apibot 0.01)
7 application/vnd.php.serialized
6Bot/WP/EN/Alex_Bakharev/AlexNewArtBot
6 text/..
6Draicone's bot
6 text/..
6OpenLink Virtuoso RDF crawler
6 text/..
1 application/xml
1 image/..
6websitethumbnail.de snapshot spider
6 text/..
6Mozilla/4.0 (compatible; focuseekbot)
6 text/..
5bitlybot
5 text/..
1 image/..
5PR Crawler/Nutch-1.0 (data mining develpment project; mail address )
5 text/..
5Mozilla/5.0 (Windows; Windows NT 5.1; en-GB; rv:1.9.0.6) Gecko/2009011913 Firefox/3.0.6 TweetmemeBot
5 text/..
1 image/..
5DotNetWikiBot/2.53 (Unix 2.6.26.2; )
5 text/..
5FlickySearchBot/1.0 (testMode)
5 text/..
1 application/xml
1 application/opensearchdescription+xml
1 image/..
5GNAA-bot
5 text/..
5FAST Enterprise Crawler 6 used by a (a)
5 text/..
1 -
5DotNetWikiBot/2.71 (Microsoft Windows NT 5.2.3790 Service Pack 2; )
5 text/..
4XLinkBot/1.00
4 text/..
4DotNetWikiBot/2.71 (Microsoft Windows NT 6.0.6002 Service Pack 2; )
4 text/..
4A .NET Web Crawler
4 text/..
4Freebase Deathbot
4 text/..
4AnomieBOT 1.0 (AFDMergeFromCleaner)
4 application/json
4WikiNEWSticsBOT (by user Melancholie)
4 text/..
4 mail address (Mozilla compatible)
4 text/..
1 image/..
4CheMoBot/1.00
4 text/..
3infraEnterprise v8 Web Crawler
3 text/..
3FAST Enterprise Crawler 6 used by MSN ( mail address )
3 text/..
3DotNetWikiBot/2.7 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
3 text/..
1 application/xml
3FAST Enterprise Crawler 6 used by Wanadoo ( mail address )
2 application/x-javascript
1 text/..
1 -
3Perfectbot
3 text/..
3AnomieBOT 1.0 (SourceUploader)
3 application/json
3Inlibris.com XMLBot/1.0
3 text/..
3Bot/WP/EN/E/EBot
3 text/..
6,868total

IP ranges: known IP ranges for Google are 64.233.[160.0-191.255], 66.249.[64.0-95.255], 66.102.[0.0-15.255], 72.14.[192.0-255.255], 74.125.[0.0-255.255], 209.85.[128.0-255.255], 216.239.[32.0-63.255] and a few other minor subranges.
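
A check against these ranges could look like the sketch below. The CIDR blocks are a direct translation of the seven ranges listed above; the "few minor other subranges" are not specified in the report and are therefore left out. The function names and the returned labels are hypothetical, and the mapping of the labels to the "google" and "google?" groups in the first table is an assumption.

```python
import ipaddress

# The seven ranges from the report, written as CIDR blocks.
GOOGLE_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "64.233.160.0/19",   # 64.233.[160.0-191.255]
        "66.249.64.0/19",    # 66.249.[64.0-95.255]
        "66.102.0.0/20",     # 66.102.[0.0-15.255]
        "72.14.192.0/18",    # 72.14.[192.0-255.255]
        "74.125.0.0/16",     # 74.125.[0.0-255.255]
        "209.85.128.0/17",   # 209.85.[128.0-255.255]
        "216.239.32.0/19",   # 216.239.[32.0-63.255]
    )
]


def is_google_ip(ip: str) -> bool:
    """True if the address falls inside one of the listed Google ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in GOOGLE_NETWORKS)


def googlebot_label(user_agent: str, ip: str) -> str:
    """Hypothetical labels: 'google' for a matching IP, 'google?' otherwise."""
    if "googlebot" not in user_agent.lower():
        return "not GoogleBot"
    return "google" if is_google_ip(ip) else "google?"


agent = "Mozilla/5.0 (compatible; GoogleBot/2.1; url)"
print(googlebot_label(agent, "66.249.64.1"))   # inside 66.249.64.0/19
print(googlebot_label(agent, "192.0.2.10"))    # documentation address, outside all ranges
```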

Generated on Sunday October 4, 2009
Author: Erik Zachte
Mail: ezachte@### (no spam: ### = wikimedia.org)
All data and images on this page are in the public domain.