Wikimedia Visitor Log Analysis Report - Crawler requests

Daily averages, based on sample period: 1 Jul 2009 - 31 Jul 2009

 This analysis is based on a 1:1000 sampled server log (squids) ⇒ all counts x 1000.
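The scaling implied by the 1:1000 sampling can be sketched as follows. This is a minimal illustration, not the report's actual script: the function name and the assumption that sampled hits are totalled over the 31-day period before averaging are hypothetical.

```python
SAMPLING_FACTOR = 1000  # 1:1000 sampled log, as stated in the report
DAYS_IN_PERIOD = 31     # sample period: 1 Jul 2009 - 31 Jul 2009


def estimated_daily_requests(sampled_hits, factor=SAMPLING_FACTOR, days=DAYS_IN_PERIOD):
    """Scale a count from the sampled log up to an estimated daily average.

    Assumption: sampled_hits is the total number of matching lines found in
    the sampled log over the whole period.
    """
    return sampled_hits * factor / days


if __name__ == "__main__":
    # 31 sampled hits in the period correspond to roughly 1000 requests per day.
    print(estimated_daily_requests(31))
```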
 
The following overview of crawler (aka bot) page requests is based on the user agent information that accompanies most server requests. Unfortunately this user agent information follows rather loosely defined guidelines.
Also please bear in mind that the most popular crawler names may be somewhat overrepresented. This is the result of so-called user agent spoofing (where a requester supplies false credentials, e.g. to bypass web server filters).
GoogleBot seems to be a favorite for spoofing. Therefore requests from an IP address registered by Google (see below) are color coded differently from other requests that identify themselves as GoogleBot.

For this report page requests are considered to be issued by a crawler in two cases:
1 The user agent string contains a web address. Only crawlers should have that, but there are some false positives where a browser sends a user agent string with a web address (an ill-behaved plug-in; the main offenders have been eliminated).
2 The user agent string contains the term bot, spider or crawl[er].
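The two rules above could be approximated as in the sketch below. It is an illustration only: the regular expressions and the function name crawler_rule are assumptions, and the report's actual matching (including its elimination of known false positives) is not reproduced here.

```python
import re

# Assumption: a "web address" is approximated by an http(s):// or www. prefix.
URL_PATTERN = re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)
# "crawl" also matches "crawler", which covers the crawl[er] case.
KEYWORD_PATTERN = re.compile(r"bot|spider|crawl", re.IGNORECASE)


def crawler_rule(user_agent):
    """Return which rule marks the user agent as a crawler, or None.

    "url"     -> rule 1: the string contains a web address
    "keyword" -> rule 2: the string contains bot, spider or crawl[er]
    """
    if URL_PATTERN.search(user_agent):
        return "url"
    if KEYWORD_PATTERN.search(user_agent):
        return "keyword"
    return None


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "PythonWikipediaBot/1.0",
        "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.6) Gecko/2009011913 Firefox/3.0.6",
    ]
    for ua in samples:
        print(crawler_rule(ua), "<-", ua)
```

Checking the web address first mirrors the order of the two rules; a string that satisfies both is reported under rule 1.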

In total 37,249,000 page requests (mime type text/html only!) per day are considered crawler requests, out of 231,682,000 external requests, which is 16.1%.
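The quoted share can be checked with a few lines of arithmetic. The function name below is hypothetical; the two input figures are the ones given above.

```python
def crawler_share(crawler_requests, external_requests):
    """Return crawler requests as a percentage of all external requests."""
    return 100 * crawler_requests / external_requests


if __name__ == "__main__":
    share = crawler_share(37_249_000, 231_682_000)
    print(f"{share:.1f}%")
```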

Page requests for crawlers that specify a url in the agent string
Count (x 1000) | Secondary domain (~site) name | URL | Mime type | User agent
18,528google
15,948 www.google.com/bot.htmltext/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
1,866 www.google.com/bot.html-Mozilla/5.0 (compatible; GoogleBot/2.1; url)
148 www.google.com/feedfetcher.html-FeedFetcher-Google; (url)
141 www.google.com/bot.htmltext/..DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
107 www.google.com/bot.htmltext/..SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
82 www.google.com/feedfetcher.htmltext/..Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
43 www.google.com/feedfetcher.htmlapplication/xmlFeedFetcher-Google; (url)
42 www.google.com/bot.htmlimage/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
41 www.google.com/feedfetcher.htmltext/..FeedFetcher-Google; (url)
39 code.google.com/appenginetext/..AppEngine-Google; (url)
15 www.google.com/feedfetcher.htmlapplication/xmlMozilla/5.0 (compatible) FeedFetcher-Google; (url)
15 code.google.com/appengineapplication/xmlAppEngine-Google; (url)
14 www.google.com/feedfetcher.htmlapplication/jsonMozilla/5.0 (compatible) FeedFetcher-Google; (url)
13 code.google.com/appengineapplication/jsonpython-wikitools/0.1.1 AppEngine-Google; (url)
3 code.google.com/appengineimage/..AppEngine-Google; (url)
7,232teesoft
1,763 www.teesoft.info/image/..Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
1,233 www.teesoft.info/text/..Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
1,070 www.teesoft.info/image/..Mozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
705 www.teesoft.info/text/..Mozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
477 www.teesoft.info/application/x-javascriptMozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
280 www.teesoft.info/application/x-javascriptMozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
213 www.teesoft.info/image/..Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; [lang code]; rv:[..]) Gecko/.. etc (url)
173 www.teesoft.info/image/..Mozilla/5.0 (X11; Linux i686; [lang code]; rv:[..]) Gecko/.. etc (url)
172 www.teesoft.info/image/..Mozilla/5.0 (Windows; Windows NT 6.1; [lang code]; rv:[..]) Gecko/.. etc (url)
137 www.teesoft.info/text/..Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; [lang code]; rv:[..]) Gecko/.. etc (url)
107 www.teesoft.info/text/..Mozilla/5.0 (Windows; Windows NT 6.1; [lang code]; rv:[..]) Gecko/.. etc (url)
102 www.teesoft.info/text/..Mozilla/5.0 (X11; Linux i686; [lang code]; rv:[..]) Gecko/.. etc (url)
90 www.teesoft.info/application/xmlMozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
78 www.teesoft.info/application/jsonMozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
74 www.teesoft.info/application/xmlMozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
52 www.teesoft.info/application/x-javascriptMozilla/5.0 (Macintosh; Intel Mac OS X 10.5; [lang code]; rv:[..]) Gecko/.. etc (url)
52 www.teesoft.info/image/..Mozilla/5.0 (X11; Linux x86_64; [lang code]; rv:[..]) Gecko/.. etc (url)
48 www.teesoft.info/application/jsonMozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
42 www.teesoft.info/application/x-javascriptMozilla/5.0 (Windows; Windows NT 6.1; [lang code]; rv:[..]) Gecko/.. etc (url)
36 www.teesoft.info/application/x-javascriptMozilla/5.0 (X11; Linux i686; [lang code]; rv:[..]) Gecko/.. etc (url)
35 www.teesoft.info/text/..Mozilla/5.0 (X11; Linux x86_64; [lang code]; rv:[..]) Gecko/.. etc (url)
27 www.teesoft.info/-Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
21 www.teesoft.info/image/..Mozilla/5.0 (Windows; Windows NT 5.2; [lang code]; rv:[..]) Gecko/.. etc (url)
21 www.teesoft.info/image/..Mozilla/5.0 (Macintosh; Intel Mac OS X 10.4; [lang code]; rv:[..]) Gecko/.. etc (url)
16 www.teesoft.info/image/..Mozilla/5.0 (Windows; Windows NT 5.0; [lang code]; rv:[..]) Gecko/.. etc (url)
15 www.teesoft.info/application/xmlMozilla/5.0 (Macintosh; Intel Mac OS X 10.5; [lang code]; rv:[..]) Gecko/.. etc (url)
15 www.teesoft.info/text/..Mozilla/5.0 (Windows; Windows NT 5.2; [lang code]; rv:[..]) Gecko/.. etc (url)
15 www.teesoft.info/text/..Mozilla/5.0 (Macintosh; Intel Mac OS X 10.4; [lang code]; rv:[..]) Gecko/.. etc (url)
14 www.teesoft.info/-Mozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
13 www.teesoft.info/application/x-javascriptMozilla/5.0 (X11; Linux x86_64; [lang code]; rv:[..]) Gecko/.. etc (url)
11 www.teesoft.info/text/..Mozilla/5.0 (Windows; Windows NT 5.0; [lang code]; rv:[..]) Gecko/.. etc (url)
11 www.teesoft.info/application/xmlMozilla/5.0 (Windows; Windows NT 6.1; [lang code]; rv:[..]) Gecko/.. etc (url)
10 www.teesoft.info/image/..Mozilla/5.0 (Macintosh; PPC Mac OS X 10.4; [lang code]; rv:[..]) Gecko/.. etc (url)
7 www.teesoft.info/application/jsonMozilla/5.0 (Macintosh; Intel Mac OS X 10.5; [lang code]; rv:[..]) Gecko/.. etc (url)
7 www.teesoft.info/text/..Mozilla/4.0 (compatible) Greasemonkey AutoPager/0.5.2.2 (url)
6 www.teesoft.info/application/x-javascriptMozilla/5.0 (Macintosh; Intel Mac OS X 10.4; [lang code]; rv:[..]) Gecko/.. etc (url)
6 www.teesoft.info/text/..Mozilla/5.0 (Macintosh; PPC Mac OS X 10.4; [lang code]; rv:[..]) Gecko/.. etc (url)
5 www.teesoft.info/image/..Mozilla/5.0 (Macintosh; PPC Mac OS X 10.5; [lang code]; rv:[..]) Gecko/.. etc (url)
5 www.teesoft.info/application/x-javascriptMozilla/5.0 (Windows; Windows NT 5.2; [lang code]; rv:[..]) Gecko/.. etc (url)
5 www.teesoft.info/application/jsonMozilla/5.0 (X11; Linux i686; [lang code]; rv:[..]) Gecko/.. etc (url)
5 www.teesoft.info/application/jsonMozilla/5.0 (Windows; Windows NT 6.1; [lang code]; rv:[..]) Gecko/.. etc (url)
4 www.teesoft.info/application/x-javascriptMozilla/5.0 (Windows; Windows NT 5.0; [lang code]; rv:[..]) Gecko/.. etc (url)
4 www.teesoft.info/image/..Mozilla/5.0 (X11; Linux i686 (x86_64); [lang code]; rv:[..]) Gecko/.. etc (url)
4 www.teesoft.info/text/..Mozilla/5.0 (Macintosh; PPC Mac OS X 10.5; [lang code]; rv:[..]) Gecko/.. etc (url)
3 www.teesoft.info/image/..12345 AutoPager/0.5.2.2 (url)
3 www.teesoft.info/-Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; [lang code]; rv:[..]) Gecko/.. etc (url)
3 www.teesoft.info/application/x-javascriptMozilla/5.0 (Macintosh; PPC Mac OS X 10.4; [lang code]; rv:[..]) Gecko/.. etc (url)
3 www.teesoft.info/text/..Mozilla/5.0 (X11; Linux i686 (x86_64); [lang code]; rv:[..]) Gecko/.. etc (url)
3 www.teesoft.info/application/xmlMozilla/5.0 (X11; Linux i686; [lang code]; rv:[..]) Gecko/.. etc (url)
6,400yahoo
5,513 help.yahoo.com/help/us/ysearch/slurptext/..Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
251 help.yahoo.com/help/us/ysearch/slurp-Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
184 help.yahoo.com/help/us/ysearch/slurptext/..Mozilla/5.0 (compatible; Yahoo! DE Slurp; url)
132 misc.yahoo.com.cn/help.htmltext/..Mozilla/5.0 (compatible; Yahoo! Slurp China; url)
106 developer.yahoo.com/searchmonkey/useragenttext/..Mozilla/5.0 (compatible; Yahoo! SearchMonkey 1.0; url)
67 help.yahoo.com/help/us/ysearch/slurptext/..Mozilla/5.0 (compatible; Yahoo! Slurp; url)
58 help.yahoo.com/help/us/ysearch/slurpimage/..Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
26 help.yahoo.com/help/us/ysearch/slurpapplication/x-javascriptMozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
18 misc.yahoo.com.cn/help.html-Mozilla/5.0 (compatible; Yahoo! Slurp China; url)
13 help.yahoo.co.jp/help/jp/search/indexing/indexing-15.htmltext/..Y!J-BRI/0.0.1 crawler ( url )
11 help.yahoo.com/help/us/ysearch/crawling/crawling-01.htmltext/..Nokia6682/2.0 (3.01.1) SymbianOS/8.0 Series60/2.6 Profile/MIDP-2.0 configuration/CLDC-1.1 UP.Link/6.3.0.0.0 (compatible;YahooSeeker/M1A1-R2D2; url)
10 help.yahoo.com/help/us/ysearch/slurpapplication/oggMozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
4 developer.yahoo.com/searchmonkey/useragentimage/..Mozilla/5.0 (compatible; Yahoo! SearchMonkey 1.0; url)
3 help.yahoo.co.jp/help/jp/search/indexing/indexing-15.htmltext/..Y!J-PSC/1.0 (url)
3,396msn
1,588 search.msn.com/msnbot.htmtext/..msnbot/1.1 (url)
1,174 search.msn.com/msnbot.htmtext/..msnbot/2.0b (url)
253 search.msn.com/msnbot.htm-msnbot/2.0b (url)
253 search.msn.com/msnbot.htm-msnbot/1.1 (url)
46 search.msn.com/msnbot.htmtext/..msnbot-media/1.1 (url)
29 search.msn.com/msnbot.htmimage/..msnbot/1.1 (url)
17 search.msn.com/msnbot.htmtext/..msrabot/2.0/1.0 (url)
9 search.msn.com/msnbot.htmapplication/oggmsnbot/1.1 (url)
9 search.msn.com/msnbot.htmimage/..msnbot-media/1.1 (url)
7 search.msn.com/msnbot.htmtext/..librabot/1.0 (url)
6 search.msn.com/msnbot.htmimage/..msnbot/2.0b (url)
1,903google?
1,694 www.google.com/bot.htmltext/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
57 www.google.com/bot.htmltext/..GoogleBot/2.1 (url)
36 www.google.com/bot.htmltext/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
31 www.google.com/bot.htmltext/..SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
25 www.google.com/bot.htmltext/..DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
22 www.google.com/bot.html-Mozilla/5.0 (compatible; GoogleBot/2.1; url)
11 www.google.com/bot.htmlapplication/xmlMozilla/5.0 (compatible; GoogleBot/2.1; url)
7 www.google.com/bot.htmlimage/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
5 www.google.com/bot.htmlapplication/xmlDoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
4 www.google.com/bot.htmltext/..Mozilla/5.0(compatible;GoogleBot/2.1;url)
4 www.google.com/bot.htmlapplication/xmlKDDI-CA34 UP.Browser/6.2.0.10.2.2 (GUI) MMP/2.0 (compatible; KDDI-GoogleBot-Mobile/2.1; url)
3 www.google.com/bot.htmltext/..KDDI-CA34 UP.Browser/6.2.0.10.2.2 (GUI) MMP/2.0 (compatible; KDDI-GoogleBot-Mobile/2.1; url)
919exabot
786 www.exabot.com/go/robottext/..Mozilla/5.0 (compatible; Exabot/3.0; url)
113 www.exabot.com/go/robottext/..Mozilla/5.0 (compatible; Exabot-Images/3.0; url)
9 www.exabot.com/go/robotimage/..Mozilla/5.0 (compatible; Exabot-Images/3.0; url)
6 www.exabot.com/go/robot-Mozilla/5.0 (compatible; Exabot/3.0; url)
5 www.exabot.com/go/robottext/..Mozilla/5.0 (compatible; Exabot/3.0 (BiggerBetter/tests); url)
460naver
396 help.naver.com/robots/text/..Yeti/1.0 (NHN Corp.; url)
26 help.naver.com/delete_main.asptext/..Mozilla/4.0 (compatible; NaverBot/1.0; url)
13 help.naver.com/customer_webtxt_02.jsptext/..Mozilla/4.0 (compatible; NaverBot/1.0; url)
12 help.naver.com/robots/image/..Yeti/1.0 (NHN Corp.; url)
8 help.naver.com/robots/text/..Mozilla/5.0 (compatible; Yeti/1.0; NHN Corp.; url)
3 help.naver.com/robots/image/..Mozilla/5.0 (compatible; Yeti/1.0; NHN Corp.; url)
389kosmix
338 www.kosmix.com/html/kosmos.htmltext/..Mozilla/5.0(compatible;Kosmos/1.0;url)
39 www.kosmix.com/crawler.htmlapplication/xmlvoyager/1.0 url
10 www.kosmix.com/crawler.htmltext/..voyager/1.0url
360searchme
131 www.searchme.com/support/text/..Mozilla/5.0 (compatible; Charlotte/1.0t; url)
107 www.searchme.com/support/text/..Mozilla/5.0 (compatible; Charlotte/1.1; url)
77 www.searchme.com/support/image/..Mozilla/5.0 (compatible; Charlotte/1.0t; url)
44 www.searchme.com/support/application/x-javascriptMozilla/5.0 (compatible; Charlotte/1.0t; url)
350pipl
350 www.pipl.com/bot/text/..Mozilla/5.0(compatible;PiplBot;url)
348cuil
345 www.cuil.com/twiceler/robot.htmltext/..Mozilla/5.0 (Twiceler-0.9 url)
309baidu
169 www.baidu.com/search/spider.htmtext/..Baiduspider(url)
51 www.baidu.jp/spider/text/..Baiduspider(url)
32 www.baidu.jp/spider/text/..DoCoMo/2.0 P05A(c100;TB;W24H15) (compatible; BaiduMobaider/1.0; url)
17 www.baidu.jp/spider/application/xmlDoCoMo/2.0 P05A(c100;TB;W24H15) (compatible; BaiduMobaider/1.0; url)
13 www.baidu.com/search/spider.htm-Baiduspider(url)
11 www.baidu.jp/spider/image/..BaiduImagespider(url)
9 www.baidu.jp/spider/text/..BaiduImagespider(url)
5 www.baidu.jp/spider/-Baiduspider(url)
279scoutjet
279 www.scoutjet.com/text/..Mozilla/5.0 (compatible; ScoutJet; url)
273justsystems
272 www.justsystems.com/jp/tech/crawler/text/..JUST-CRAWLER(url)
232yacy
47 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.28-13-generic; java 1.6.0_0; Europe/en) url
21 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.28-13-generic; java 1.6.0_13; Europe/en) url
17 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.28-13-generic; java 1.6.0_13; Europe/en) url
13 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.28-13-generic; java 1.6.0_14; Europe/en) url
12 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.26-2-amd64; java 1.6.0_12; Europe/en) url
11 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.24-23-generic; java 1.6.0_07; Europe/de) url
9 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.24-etchnhalf.1-amd64; java 1.5.0_14; UTC/en) url
8 yacy.net/bot.htmltext/..yacybot (amd64 Windows 7 6.1; java 1.6.0_14; Europe/de) url
7 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.28-13-generic; java 1.6.0_0; SystemV/en) url
6 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.27.21-0.1-default; java 1.6.0_0; Europe/de) url
5 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.24-etchnhalf.1-amd64; java 1.6.0_07; UTC/en) url
5 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.18.8-xen-domU_yacy_v10; java 1.6.0_12; Europe/de) url
4 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.28-13-generic; java 1.6.0_0; Europe/en) url
4 yacy.net/bot.htmltext/..yacybot (x86 Windows 2003 5.2; java 1.6.0_14; Europe/de) url
3 yacy.net/bot.htmltext/..yacybot (x86 Windows XP 5.1; java 1.6.0_14; Europe/de) url
3 yacy.net/bot.htmltext/..yacybot (x86 Windows Vista 6.0; java 1.6.0_14; Europe/de) url
3 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.28-13-generic; java 1.6.0_0; Europe/de) url
3 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.25-gentoo-r7; java 1.6.0_07; UTC/en) url
224soso
212 help.soso.com/webspider.htmtext/..Sosospider(url)
10 help.soso.com/soso-image-spider.htmimage/..Sosoimagespider(url)
217wikimedia
216 tools.wikimedia.de/~daniel/text/..WikiSense (url)
212dotnetdotcom
212 www.dotnetdotcom.org/text/..Mozilla/5.0 (compatible; DotBot/1.1; url, mail address )
205wikipedia
102 en.wikipedia.org/wiki/Wikipedia:Huggletext/..Huggle/0.9.2 url
24 zh.wikipedia.org/w/index.php?title=File:Ko?obrzeg10.jpg&variant=zh-cntext/..url
22 zh.wikipedia.org/w/index.php?title=Wikipedia:首页&variant=zh-cntext/..url
19 en.wikipedia.orgtext/..url
17 en.wikipedia.org/wiki/Wapediaapplication/vnd.php.serializedwapedia.mobi liveupdate (url)
5 zh.wikipedia.org/w/index.php?title=Special:sousuo/zhaitenglongfu&variant=zh-cntext/..url
3 en.wikipedia.org/wiki/Wikipedia:Huggletext/..Huggle/0.9.1 url
3 ko.wikipedia.orgtext/..url
3 en.wikipedia.org/wiki/Wikipedia:Huggletext/..Huggle/0.9.2.0 url
3 en.wikipedia.org/wiki/Web_crawlertext/..GoogleBot/Nutch-1.0-dev (Prototype; url; mail address )
196youdao
126 www.youdao.com/help/webmaster/spider/text/..Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
26 www.youdao.com/help/webmaster/spider/text/..Mozilla/5.0 (compatible; YodaoBot/1.0; url; )
18 www.youdao.com/help/webmaster/spider/image/..Mozilla/5.0 (compatible;YodaoBot-Image/1.0;url;)
8 www.youdao.com/help/webmaster/spider/-Mozilla/5.0 (compatible;YodaoBot-Image/1.0;url;)
6 www.youdao.com/help/webmaster/spider/text/..Mozilla/5.0 (compatible; YodaoBot/1.0; url; ) (via Web-Blaster/2.21 (http://www.a-blast.org/web-blast.html))
3 www.youdao.com/help/webmaster/spider/-Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
193sblog
108 fulltext.sblog.cz/screenshot/image/..Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
50 fulltext.sblog.cz/screenshot/text/..Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
17 fulltext.sblog.cz/robot/text/..SeznamBot/2.0 (url)
16 fulltext.sblog.cz/screenshot/application/x-javascriptMozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
192mnemoo
191 www.mnemoo.com/about/spidertext/..Mnemoo Intelligent Spider/0.1alpha (compatible; See url)
189ask
180 about.ask.com/en/docs/about/webmasters.shtmltext/..Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
5 about.ask.com/en/docs/about/webmasters.shtmlimage/..Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
3 about.ask.com/en/docs/about/webmasters.shtmltext/..Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url) (via Web-Blaster/2.21 (http://www.assoziations-blaster.de/web-blast.html))
159yanga
102 www.yanga.co.uk/text/..Yanga WorldSearch Bot v1.1/beta (url)
55 www.yanga.co.uk/image/..Yanga WorldSearch Bot v1.1/beta (url)
146boardreader
146 spider.boardreader.comtext/..BoardReader Rating Builder/1.0 url
127goo
122 help.goo.ne.jp/contact/text/..goo wikipedia (url)
3 help.goo.ne.jp/door/crawler.htmltext/..ichiro/3.0 (url)
114daum
113 ws.daum.net/aboutWebSearch.htmltext/..Mozilla/5.0 (compatible; MSIE or Firefox mutant; not on Windows server; url) Daumoa/2.0
112loc
108 www.loc.gov/minerva/crawl.htmltext/..Mozilla/5.0 (compatible; archive.org_bot/1.6.0 url)
103sogou
85 www.sogou.com/docs/help/webmasters.htm#07text/..Sogou web spider/4.0(url)
8 www.sogou.com/docs/help/webmasters.htm#07application/xmlSogou web spider/4.0(url)
5 www.sogou.com/docs/help/webmasters.htm#07text/..Sogou web spider/4.0(url) (via Web-Blaster/2.21 (http://www.a-blast.org/web-blast.html))
88php
34 pear.php.net/application/vnd.php.serializedPEAR HTTP_Request class ( url )
23 pear.php.net/text/..PEAR HTTP_Request class ( url )
14 pear.php.net/package/http_request2text/..HTTP_Request2/0.4.0 (url) PHP/5.2.9
11 pear.php.net/application/jsonPEAR HTTP_Request class ( url )
5 pear.php.net/package/http_request2text/..HTTP_Request2/0.4.0 (url) PHP/5.2.10
85gigablast
85 www.gigablast.com/spider.htmltext/..Gigabot/3.0 (url)
83majestic12
70 www.majestic12.co.uk/bot.php?text/..Mozilla/5.0 (compatible; MJ12bot/v1.2.5; url)
11 www.majestic12.co.uk/bot.php?text/..Mozilla/5.0 (compatible; MJ12bot/v1.2.4; url)
80facebook
53 www.facebook.com/externalhit_uatext.phpimage/..facebookexternalhit/1.0 (url)
20 www.facebook.com/externalhit_uatext.phptext/..facebookexternalhit/1.0 (url)
4 developers.facebook.comtext/..facebookplatform/1.0 (url)
70freebase
70 www.freebase.comtext/..metaweb/Nutch-1.0-dev (url; help_at_metaweb.com)
53commoncrawl
53 www.commoncrawl.org/bot.htmltext/..CCBot/1.0 (url)
45paxle
45 www.paxle.net/en/bottext/..Mozilla/5.0 (compatible; PaxleFramework/0.1.1; url)
39emining
39 emining.jp/text/..emBot-GalaBuzz/Nutch-1.0 (url; mail address )
36setooz
17 www.setooz.com/oozbot.htmltext/..OOZBOT/0.20 ( url ; mail address )
12 www.setooz.com/oozbot.htmltext/..OOZBOT/0.20 ( -- ; url ; mail address )
7 www.setooz.com/oozbot.htmltext/..OOZBOT/0.20 ( Setooz wymawiane jako say-th-uuz, oznacza mosty. ; url ; mail address )
36traslated
36 mymemory.traslated.net/doc/text/..Mozilla/5.0 (MyMemory Bot url)
36linkaider
36 linkaider.com/crawler/text/..LinkAider (url)
35ellerdale
24 www.ellerdale.com/crawler.htmltext/..Mozilla/5.0 (compatible; Scarlett/ 1.0; url)
11 www.ellerdale.com/crawler.htmltext/..Mozilla/5.0 (compatible; EllerdaleBot/ 1.0; url)
34edu
24 iws.seu.edu.cn/services/falcons/contact_us.jsptext/..Mozilla/5.0 (compatible; Falconsbot; url)
9 iws.seu.edu.cn/services/falcons/contact_us.jspimage/..Mozilla/5.0 (compatible; Falconsbot; url)
33qdos
33 qdos.com/text/..qdos/1.1 (url)
33wordpress
27 support.wordpress.com/contact/text/..WordPress.com mShots; url
32snap
32 www.snap.comtext/..Snapbot/1.0 (Snap Shots, url)
31entireweb
27 www.entireweb.com/about/search_tech/speedy_spider/text/..Speedy Spider (url)
4 www.entireweb.com/about/search_tech/speedy_spider/-Speedy Spider (url)
28froute
22 labs.froute.jp/pc2m/help.htmltext/..Froute Mobile Gateway/1.0 (url)
6 labs.froute.jp/pc2m/help.htmlimage/..Froute Mobile Gateway/1.0 (url)
27babaloo
27 www.babaloo.sitext/..BabalooSpider/1.3 (BabalooSpider; url; mail address )
25greenivory
25 greenivory.frtext/..GreenIvory/Nutch-0.9 (GreenIvory-BlueCrane; url; mail address )
25Anonymouse
12 Anonymouse.org/image/..url (Unix)
10 Anonymouse.org/text/..url (Unix)
3 Anonymouse.org/application/x-javascripturl (Unix)
24rcdtokyo
15 www.rcdtokyo.com/pc2m/text/..Mozilla/5.0 (compatible; PEAR HTTP_Request class; url)
9 www.rcdtokyo.com/pc2m/image/..Mozilla/5.0 (compatible; PEAR HTTP_Request class; url)
23emusic
15 www.emusic.com/application/jsonMozilla/5.0 (Windows; Windows NT 5.1; en-US; rv:1.8.1.8pre) Gecko/20071017 eMusic DLM/4.0/1.0.0.2 (url) (like Firefox/2.0.0.8)
5 www.emusic.com/application/jsonMozilla/5.0 (Windows; Windows NT 6.0; en-US; rv:1.8.1.8pre) Gecko/20071017 eMusic DLM/4.0/1.0.0.2 (url) (like Firefox/2.0.0.8)
23mixi
12 mixi.jp/text/..mixi-mobile-converter/1.0 (url)
11 mixi.jp/image/..mixi-mobile-converter/1.0 (url)
23weblio
21 www.weblio.jp/text/..Mozilla/5.0 (compatible; WeblioBot; url)
22spinn3r
21 spinn3r.com/robottext/..Mozilla/5.0 (X11; Linux i686; en-US; rv:1.2.1; aggregator:Spinn3r (Spinn3r 3.1); url) Gecko/20021130
22weitz
22 weitz.de/drakma/text/..Drakma/0.11.5 (LispWorks 5.0.2; Linux; 2.6.18-5-686-bigmem; url)
22mashget
14 www.mashget.comtext/..Mashgetbot/2.1 (url)
8 www.mashget.comapplication/jsonMashGetApp(url)
20alexa
20 www.alexa.com/site/help/webmasterstext/..ia_archiver (url; mail address )
20hatena
10 a.hatena.ne.jp/helptext/..Hatena Antenna/0.5 (url)
10 mgw.hatena.ne.jp/helptext/..DoCoMo/2.0 D903i(c100;TB;W28H20) (compatible; Hatena-Mobile-Gateway/1.2; url)
18picsearch
15 www.picsearch.com/bot.htmltext/..psbot/0.1 (url)
3 www.picsearch.com/bot.htmlimage/..psbot/0.1 (url)
17 80legs
7 www.80legs.com/spider.htmltext/..Mozilla/5.0 (compatible; 008/0.83; url;) Gecko/2008032620
7 www.80legs.com/spider.htmltext/..Mozilla/5.0 (compatible; 80bot/0.71; url;) Gecko/2008032620
17holmes
17 holmes.getext/..HolmesBot (url)
17guruji
13 www.guruji.com/en/WebmasterFAQ.htmltext/..Mozilla/5.0 (compatible; GurujiBot/1.0; url)
16www.
6 www.text/..GoogleBot 2.X (urlGoogleBot.com/bot.html) (compatible; heritrix/2.0.0 http://www.xyz.com)
3 www.text/..GoogleBot/2.1 (urlGoogleBot.com/bot.html)
3 www.image/..GoogleBot/2.1 (urlGoogleBot.com/bot.html)
15newsgator
5 www.newsgator.comtext/..NewsGatorOnline/2.0 (url) bot
4 www.newsgator.comtext/..NewsGator/2.0 Bot (url)
15discoveryengine
14 discoveryengine.com/discobot.htmltext/..Mozilla/5.0 (compatible; discobot/1.1; url)
15feedparser
14 feedparser.org/application/xmlUniversalFeedParser/4.1 url
14googlepages
14 peterpuwang.googlepages.comtext/..Peter Wang/Nutch-0.9 (Nutch spiderman; url ; MyEmail)
14flatlandindustries
14 www.flatlandindustries.com/flatlandbottext/..flatlandbot/wikibot (Flatland Industries Web Spider; url; mail address )
12dium
10 me.dium.comtext/..Mozilla/5.0 (Windows; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0 Me.dium/1.0 (url)
12phonifier
10 www.phonifier.comtext/..Mozilla/5.0 (compatible; Phonifier; url)
12FeedBurner
11 www.FeedBurner.comtext/..FeedBurner/1.0 (url)
11ibm
11 domino.research.ibm.com/comm/research_projects.nsf/pages/sai-crawler.callingcard.htmltext/..url
11abonti
11 www.abonti.comtext/..Mozilla/5.0 (compatible; Abonti/0.8 - url)
11meta-spinner
11 www.meta-spinner.de/text/..Metaspinner/1.0 (Metaspinner Search Engine; url; mail address )
11shopwiki
11 www.shopwiki.com/wiki/Help:Bottext/..ShopWiki/1.0 ( url)
11simplepie
4 simplepie.orgtext/..SimplePie/1.1.3 (Feed Parser; url; Allow like Gecko) Build/20081219
4 simplepie.orgapplication/xmlSimplePie/1.1.1 (Feed Parser; url; Allow like Gecko) Build/20080315205903
10princeton
10 www.cs.princeton.edu/cass/text/..nu_tch-princeton/Nu_tch-1.0-dev (princeton crawler for cass project; url; zhewang a_t cs ddot princeton dot edu)
10moreover
10 www.moreover.comtext/..Moreoverbot/5.00 (url; mail address )
45,198total

Page requests for probable crawlers, recognized by keyword
Count (x 1000) | Agent string
  Mime type (count ≥ 3)
2,514PythonWikipediaBot/1.0
1,719 text/..
528 application/xml
266 application/json
1 image/..
1 -
1 application/ogg
1,539GoogleBot-Image/1.0
663 image/..
558 text/..
318 -
509Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (Exabot-Thumbnails)
287 text/..
116 application/x-javascript
105 image/..
1 -
441Mozilla/5.0 (Windows; Windows NT 5.1; fr; rv:1.8.1) VoilaBot BETA 1.2 ( mail address )
440 text/..
1 -
1 application/pdf
1 application/ogg
309php wikibot classes
309 application/vnd.php.serialized
1 text/..
215wikiwix-bot-3.0
199 text/..
15 image/..
1 -
178Answersbot
178 text/..
1 -
134Mozilla/5.0 (SnapPreviewBot) Gecko/20061206 Firefox/1.5.0.9
64 image/..
51 text/..
19 application/x-javascript
1 -
1 application/json
1 application/ogg
129GoogleBot-Image/1.0
128 text/..
1 image/..
1 -
1 application/x-javascript
118Tawbot (public svn release; plwiki)
118 text/..
1 -
96UniFind Site Spider; email mail address
96 text/..
1 -
85AarghBot
85 text/..
79AarghBot Linux
79 text/..
1 -
52DotNetWikiBot/2.64 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
42 text/..
10 application/xml
52COIBot/1.00
52 text/..
49gsa-crawler (Enterprise; S5-MS8QQPJ5BGWAA; mail address )
49 text/..
38SineBot/1.5.14(User:SineBot)
37 application/vnd.php.serialized
1 text/..
36MSIndianWebcrawl
36 text/..
1 image/..
1 application/ogg
34CorenSearchBot/1.4 en libwww-perl/5.808
34 text/..
33Test Webbot
33 text/..
28GoogleBot
28 text/..
1 application/json
1 application/x-javascript
1 image/..
28plantspedia data crawler
28 text/..
1 -
27dictionary-bot
26 application/xml
1 text/..
27Plagiat Web Spider WSIiZ wsiz.rzeszow.pl
24 text/..
2 image/..
1 application/ogg
1 application/pdf
25Mozilla/5.0 (Yahoo-MMCrawler/4.0; mail address )
22 image/..
3 text/..
1 -
22Mozilla/5.0 (Apibot 0.01)
22 application/vnd.php.serialized
22MSR-ISRCCrawler
15 text/..
4 application/x-javascript
3 image/..
1 -
21Bot/WP/EN/E/EBot
21 text/..
17gigabot
14 image/..
2 text/..
1 -
17Jyxobot/1
17 text/..
15DoubleVerify Crawler
7 image/..
7 text/..
1 application/x-javascript
14RootzaCrawler 0.1ALPHA (Experimental crawler, contact adminbox AT ivent.com.au for more info
14 text/..
14crawler mail address
14 text/..
1 -
14 mail address (Mozilla compatible)
7 image/..
7 text/..
1 -
12PR Crawler/Nutch-1.0 (data mining develpment project; mail address )
12 text/..
1 -
11FAST Enterprise Crawler 6 used by Lenovo ( mail address )
11 text/..
1 -
10DotNetWikiBot/2.7 (Microsoft Windows NT 6.0.6001 Service Pack 1; )
10 text/..
10MLBot (www.metadatalabs.com/mlbot)
10 text/..
1 -
10GingerCrawler/1.0 (Language Assistant for Dyslexics; www.gingersoftware.com/crawler_agent.htm; support at ginger software dot com)
10 text/..
1 -
1 image/..
9Bot/WP/EN/Daniel/MediationBot1/1.2
9 text/..
9SurakWare MediaWiki Bot/1.0
9 text/..
9Bot/WP/EN/Alex_Bakharev/AlexNewArtBot
9 text/..
9YaDirectBot/1.0
9 text/..
9AnomieBOT 1.0 (WikiProjectTagger)
9 application/json
9Mozilla/5.0 (Bgbot 0.5)
8 text/..
1 application/xml
8A .NET Web Crawler
8 text/..
8ListasBot 3
8 text/..
8Draicone's bot
8 text/..
1 -
8Geni ircpybot 1.0
5 text/..
2 application/json
1 application/xml
7XLinkBot/1.00
7 text/..
7FAST Enterprise Crawler 6 used by a (a)
7 text/..
1 -
7beast/Nutch-0.9 (agentspider; mail address )
7 text/..
1 image/..
7Loserbot/1.0
7 text/..
7AnomieBOT 1.0 (OrphanReferenceFixer)
7 application/json
1 text/..
7dicbot 1.0
7 text/..
7FAST Enterprise Crawler 6 used by MICROLINK ( mail address )
7 text/..
6gsa-crawler (Enterprise; T1-FDM9ASJ5TESAT; mail address )
6 text/..
6CrawlerTest/Nutch-1.0-dev
6 text/..
6DotNetWikiBot/2.53 (Unix 2.6.26.2; )
6 text/..
6GNAA-bot
6 text/..
6websitethumbnail.de snapshot spider
6 text/..
5Mozilla/4.0 (compatible; focuseekbot)
5 text/..
4WebCrawler
4 text/..
4Jabse.com Crawler v.2.0 www.jabse.com/crawler.php
4 text/..
4DotNetWikiBot/2.61 (Microsoft Windows NT 5.2.3790 Service Pack 2; )
4 text/..
4rdfbot/1.0 (rdfbot mail address )
4 text/..
1 -
4AnomieBOT 1.0 (AFDMergeFromCleaner)
4 application/json
4AnomieBOT 1.0 (SourceUploader)
4 application/json
3bitlybot
3 text/..
1 image/..
3volverine/Nutch-0.9 (agentspider; mail address )
3 text/..
1 image/..
3DotNetWikiBot/2.7 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
3 text/..
1 -
1 application/xml
3Codeton Software RSS Bot/1.0
3 text/..
1 -
1 application/opensearchdescription+xml
3Mozilla/5.0 (Windows; Windows NT 5.1; en-GB; rv:1.9.0.6) Gecko/2009011913 Firefox/3.0.6 TweetmemeBot
3 text/..
1 image/..
3QuickFinder Crawler
3 text/..
3AnomieBOT 1.0 (TemplateReplacer15)
3 application/json
3SuperBot/4.7.0.72 (Windows XP)
3 text/..
1 application/x-javascript
1 image/..
1 application/x-external-editor
3SmartAndSimpleWebCrawler/1.2 (https://crawler.dev.java.net)
3 text/..
3DotNetWikiBot/2.7 (Microsoft Windows NT 6.1.7100.0; )
3 text/..
3Freebase Deathbot
3 text/..
3OpenLink Virtuoso RDF crawler
3 text/..
1 application/opensearchdescription+xml
3topyx-crawler
2 text/..
1 -
3WikiNEWSticsBOT (by user Melancholie)
3 text/..
3unblockbot/1.00
3 text/..
7,225total

IP ranges: known IP ranges for Google are 64.233.[160.0-191.255], 66.249.[64.0-95.255], 66.102.[0.0-15.255], 72.14.[192.0-255.255], 74.125.[0.0-255.255], 209.85.[128.0-255.255], 216.239.[32.0-63.255] and a few minor other subranges.
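A check of a GoogleBot claim against these ranges might look like the sketch below. It is an illustration under stated assumptions: the seven main ranges are rewritten as CIDR blocks, the "few minor other subranges" are not covered, and the function name and the labels "google" and "google?" are chosen here for illustration rather than taken from the report's code. The ranges date from 2009 and should not be treated as current.

```python
import ipaddress

# The seven main ranges listed above, written as CIDR blocks (2009 data).
GOOGLE_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "64.233.160.0/19",  # 64.233.[160.0-191.255]
        "66.249.64.0/19",   # 66.249.[64.0-95.255]
        "66.102.0.0/20",    # 66.102.[0.0-15.255]
        "72.14.192.0/18",   # 72.14.[192.0-255.255]
        "74.125.0.0/16",    # 74.125.[0.0-255.255]
        "209.85.128.0/17",  # 209.85.[128.0-255.255]
        "216.239.32.0/19",  # 216.239.[32.0-63.255]
    )
]


def classify_googlebot(user_agent, ip):
    """Label a request that claims to be GoogleBot.

    Returns "google" when the IP lies in a listed range, "google?" when the
    user agent claims GoogleBot but the IP does not, and None when the user
    agent makes no such claim. The labels are illustrative assumptions.
    """
    if "googlebot" not in user_agent.lower():
        return None
    address = ipaddress.ip_address(ip)
    if any(address in network for network in GOOGLE_NETWORKS):
        return "google"
    return "google?"


if __name__ == "__main__":
    ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    print(classify_googlebot(ua, "66.249.65.1"))
    print(classify_googlebot(ua, "10.0.0.1"))
```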

Generated on Friday August 21, 2009
Author: Erik Zachte (Web site)
Mail: ezachte@### (no spam: ### = wikimedia.org)
All data and images on this page are in the public domain.