Wikimedia Visitor Log Analysis Report - Crawler requests

Daily averages, based on sample period: 1 Feb 2010 - 28 Feb 2010

This analysis is based on a 1:1000 sampled server log (from the Squid caching servers), so all counts are multiplied by 1000.

The following overview of crawler (aka bot) page requests is based on the user agent information that accompanies most server requests. Unfortunately, this user agent information follows only loosely defined guidelines.
Also, please bear in mind that the most popular crawler names may be somewhat overrepresented. This is the result of so-called user agent spoofing, where a requester supplies false credentials, e.g. to bypass web server filters.
GoogleBot seems to be a favorite for spoofing. Therefore, requests from an IP address registered to Google (see the IP ranges below) are listed under "google", while other requests that claim to be GoogleBot are listed under "google?".

For this report, page requests are considered to be issued by a crawler in two cases:
1. The user agent string contains a web address. Only crawlers should include one, but there are some false positives where a browser sends a user agent string with a web address (e.g. through an ill-behaved plug-in; the main offenders have been eliminated).
2. The user agent string contains the term bot, spider or crawl[er].
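The two rules can be sketched as a small classifier. This is a minimal sketch, not the report's actual code: it assumes that "contains a web address" means an `http://` or `https://` prefix (agents such as "MLBot (www.metadatalabs.com/mlbot)" appear only in the keyword list below, which suggests a bare `www.` does not count) and that the keyword match is case-insensitive. The function name `crawler_rule` is hypothetical, and the sketch has no exclusion list for the known false-positive plug-ins.

```python
import re
from typing import Optional

# Assumed patterns: the report does not publish its exact rules.
# Rule 1: a web address, approximated here as an http:// or https:// prefix.
URL_PATTERN = re.compile(r"https?://", re.IGNORECASE)
# Rule 2: the terms bot, spider or crawl[er], matched case-insensitively.
KEYWORD_PATTERN = re.compile(r"bot|spider|crawl", re.IGNORECASE)


def crawler_rule(user_agent: str) -> Optional[int]:
    """Return 1 or 2 for the first rule that matches, or None for a browser."""
    if URL_PATTERN.search(user_agent):
        return 1
    if KEYWORD_PATTERN.search(user_agent):
        return 2
    return None


examples = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "PythonWikipediaBot/1.0",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7",
]
for ua in examples:
    print(crawler_rule(ua), ua)
```

Checking rule 1 before rule 2 mirrors the split into the two tables: an agent with a URL lands in the first table even if it also contains "bot".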

In total, 50,987,000 page requests per day (mime type text/html only!) are considered crawler requests, out of 300,287,000 external requests, which is 17.0%.
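The scaling and the percentage above are simple arithmetic. This minimal sketch assumes the sampled daily averages were 50,987 crawler lines and 300,287 external lines (the report's figures divided by the 1:1000 sampling factor); the helper names `scale` and `share_percent` are hypothetical.

```python
SAMPLE_RATE = 1000  # one log line kept per 1000 requests


def scale(sampled_count: int, rate: int = SAMPLE_RATE) -> int:
    """Estimate the true number of requests from a sampled count."""
    return sampled_count * rate


def share_percent(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


crawler_per_day = scale(50_987)    # assumed sampled lines/day -> 50,987,000
external_per_day = scale(300_287)  # assumed sampled lines/day -> 300,287,000

print(f"{crawler_per_day:,} of {external_per_day:,} = "
      f"{share_percent(crawler_per_day, external_per_day)}%")
```

The exact ratio is about 16.98%, which rounds to the 17.0% quoted in the report.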

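Page requests for crawlers that specify a URL in the agent string
Count (x 1000) | Secondary domain (~site) name
  Count (x 1000) | URL | Mime type | User agent
In the user agent column, "url" and "mail address" stand in for the addresses that were removed from the agent string.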
23,738 google
  15,792 | www.google.com/bot.html | text/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  2,994 | desktop.google.com/ | application/xml | Mozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
  1,145 | www.google.com/bot.html | - | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  984 | code.google.com/p/crawler4j/ | text/.. | crawler4j (url)
  740 | desktop.google.com/ | image/.. | Mozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
  658 | desktop.google.com/ | application/xml | Mozilla/5.0 (compatible; Google Desktop/5.9.909.30391; url)
  260 | desktop.google.com/ | text/.. | Mozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
  159 | www.google.com/bot.html | text/.. | SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
  148 | desktop.google.com/ | image/.. | Mozilla/5.0 (compatible; Google Desktop/5.9.909.30391; url)
  147 | www.google.com/bot.html | image/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  140 | www.google.com/bot.html | text/.. | DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
  101 | www.google.com/feedfetcher.html | - | FeedFetcher-Google; (url)
  82 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: wort-des-tages)
  64 | www.google.com/feedfetcher.html | text/.. | Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
  48 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: npiv82)
  42 | www.google.com/feedfetcher.html | application/xml | FeedFetcher-Google; (url)
  32 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: nwikiproxy)
  25 | www.google.com/feedfetcher.html | text/.. | FeedFetcher-Google; (url)
  24 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: finchproxy)
  19 | www.google.com/feedfetcher.html | application/xml | Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
  19 | desktop.google.com/ | text/.. | Mozilla/5.0 (compatible; Google Desktop/5.9.909.30391; url)
  18 | desktop.google.com/ | - | Mozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
  16 | www.google.com/feedfetcher.html | application/json | Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
  15 | code.google.com/appengine | application/xml | AppEngine-Google; (url; appid: nwikiproxy)
  10 | docs.google.com | image/.. | Mozilla/5.0 (compatible; GoogleDocs; url)
  7 | www.google.com/feedfetcher.html | image/.. | Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
  5 | desktop.google.com/ | image/.. | Mozilla/5.0 (compatible; Google Desktop/5.9.909.8267; url)
  5 | desktop.google.com/ | - | Mozilla/5.0 (compatible; Google Desktop/5.9.909.30391; url)
  4 | code.google.com/appengine | text/.. | Python-urllib/1.17 AppEngine-Google; (url; appid: lusosfera)
  4 | www.google.org | text/.. | Naveen/Nutch-1.0 (Naveen; url; mail address )
  4 | desktop.google.com/ | application/xml | Mozilla/5.0 (compatible; Google Desktop/5.9.909.8267; url)
  4 | code.google.com/appengine | application/json | Python-urllib/2.5 AppEngine-Google; (url; appid: loeschmonitor)
  3 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: job-info)
  3 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: lullar-data),gzip(gfe) (via translate.google.com)
  3 | www.google.com/coop/cse/cref | text/.. | FeedFetcher-Google-CoOp; (url)
13,005 yahoo
  12,344 | help.yahoo.com/help/us/ysearch/slurp | text/.. | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  162 | misc.yahoo.com.cn/help.html | text/.. | Mozilla/5.0 (compatible; Yahoo! Slurp China; url)
  113 | help.yahoo.com/help/us/ysearch/slurp | text/.. | Mozilla/5.0 (compatible; Yahoo! Slurp; url)
  109 | help.yahoo.com/help/us/ysearch/slurp | - | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  94 | help.yahoo.co.jp/help/jp/search/indexing/indexing-15.html | image/.. | 'Mozilla/5.0 (compatible; Y!J SearchMonkey/1.0 (Y!J-AGENT; url))'
  36 | help.yahoo.com/help/us/ysearch/slurp | text/.. | Mozilla/5.0 (compatible; Yahoo! DE Slurp; url)
  36 | developer.yahoo.com/searchmonkey/useragent | text/.. | Mozilla/5.0 (compatible; Yahoo! SearchMonkey 1.0; url)
  31 | developer.yahoo.com/searchmonkey/useragent | image/.. | Mozilla/5.0 (compatible; Yahoo! SearchMonkey 1.0; url)
  29 | help.yahoo.com/help/us/ysearch/slurp | image/.. | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  14 | help.yahoo.com/help/us/ysearch/slurp | application/ogg | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  13 | help.yahoo.co.jp/help/jp/search/indexing/indexing-15.html | text/.. | Y!J-BRI/0.0.1 crawler ( url )
  8 | misc.yahoo.com.cn/help.html | - | Mozilla/5.0 (compatible; Yahoo! Slurp China; url)
  5 | help.yahoo.com/help/us/ysearch/slurp | application/xml | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  4 | help.yahoo.com/help/us/ysearch/crawling/crawling-01.html | text/.. | Nokia6682/2.0 (3.01.1) SymbianOS/8.0 Series60/2.6 Profile/MIDP-2.0 configuration/CLDC-1.1 UP.Link/6.3.0.0.0 (compatible;YahooSeeker/M1A1-R2D2; url)
12,065 msn
  7,047 | search.msn.com/msnbot.htm | text/.. | msnbot/2.0b (url)
  4,252 | search.msn.com/msnbot.htm | - | msnbot/2.0b (url)
  313 | search.msn.com/msnbot.htm | text/.. | msnbot-media/1.1 (url)
  189 | search.msn.com/msnbot.htm | image/.. | msnbot-media/1.1 (url)
  153 | search.msn.com/msnbot.htm | text/.. | msnbot/2.0b (url).
  50 | search.msn.com/msnbot.htm | text/.. | msnbot/1.1 (url)
  33 | search.msn.com/msnbot.htm | text/.. | msnbot-Products/1.0 (url)
  10 | search.msn.com/msnbot.htm | image/.. | msnbot/2.0b (url)
  6 | search.msn.com/msnbot.htm | text/.. | msnbot-NewsBlogs/1.1 (url)
  3 | search.msn.com/msnbot.htm | application/ogg | msnbot/2.0b (url)
  3 | search.msn.com/msnbot.htm | - | msnbot-media/1.1 (url)
2,097 google?
  1,770 | www.google.com/bot.html | text/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  96 | www.google.com/bot.html | text/.. | SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
  52 | www.google.com/bot.html | text/.. | DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
  42 | www.google.com/bot.html | text/.. | GoogleBot/2.1 (url)
  33 | www.google.com/bot.html | - | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  27 | www.google.com/bot.html | text/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  24 | www.google.com/bot.html | application/vnd.php.serialized | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  15 | www.google.com/bot.html | application/xml | DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
  14 | www.google.com/bot.html | application/xml | SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
  9 | www.google.com/bot.html | text/.. | Mozilla/5.0(compatible;GoogleBot/2.1;url)
  6 | www.google.com/bot.html | application/xml | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  5 | www.google.com/bot.html | image/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
1,130 naver
  1,001 | help.naver.com/robots/ | text/.. | Yeti/1.0 (NHN Corp.; url)
  81 | help.naver.com/robots/ | image/.. | Yeti/1.0 (NHN Corp.; url)
  32 | help.naver.com/delete_main.asp | text/.. | Mozilla/4.0 (compatible; NaverBot/1.0; url)
  12 | help.naver.com/customer_webtxt_02.jsp | text/.. | Mozilla/4.0 (compatible; NaverBot/1.0; url)
828 cuil
  805 | www.cuil.com/twiceler/robot.html | text/.. | Mozilla/5.0 (Twiceler-0.9 url)
  16 | www.cuil.com/twiceler/robot.html | application/xml | Mozilla/5.0 (Twiceler-0.9 url)
  7 | www.cuil.com/twiceler/robot.html | application/vnd.php.serialized | Mozilla/5.0 (Twiceler-0.9 url)
692 ask
  596 | about.ask.com/en/docs/about/webmasters.shtml | text/.. | Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
  91 | about.ask.com/en/docs/about/webmasters.shtml | - | Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
547 baidu
  378 | www.baidu.com/search/spider.htm | text/.. | Baiduspider(url)
  82 | www.baidu.jp/spider/ | text/.. | Baiduspider(url)
  39 | www.baidu.jp/spider/ | text/.. | DoCoMo/2.0 P05A(c100;TB;W24H15) (compatible; BaiduMobaider/1.0;url)
  14 | www.baidu.jp/spider/ | application/xml | DoCoMo/2.0 P05A(c100;TB;W24H15) (compatible; BaiduMobaider/1.0;url)
  13 | www.baidu.jp/spider/ | text/.. | BaiduImagespider(url)
  7 | www.baidu.com/search/spider.htm | - | Baiduspider(url)
  7 | www.baidu.jp/spider/ | - | Baiduspider(url)
  5 | www.baidu.jp/spider/ | image/.. | BaiduImagespider(url)
469 pipl
  469 | www.pipl.com/bot/ | text/.. | Mozilla/5.0(compatible;PiplBot;url)
405 yacy
  85 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.23.17-dbserv; java 1.6.0_04; Europe/en) url
  42 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.28-18-generic; java 1.6.0_0; Europe/en) url
  31 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.28-18-generic; java 1.6.0_0; Europe/en) url
  30 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.31-20-generic; java 1.6.0_0; Europe/en) url
  30 | yacy.net/bot.html | text/.. | yacybot (x86 Windows 7 6.1; java 1.6.0_16; America/en) url
  16 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.31-gentoo-r6; java 1.5.0_22; Europe/el) url
  13 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.28-18-generic; java 1.6.0_16; Europe/en) url
  11 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.24-26-generic; java 1.6.0_0; Europe/de) url
  10 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.31-18-generic; java 1.6.0_0; Europe/en) url
  9 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.28-15-generic; java 1.6.0_0; Europe/en) url
  8 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.31-19-generic; java 1.6.0_0; Europe/en) url
  8 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.28-17-generic; java 1.6.0_0; Europe/en) url
  6 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.28-17-generic; java 1.6.0_0; Europe/en) url
  6 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.18-164.9.1.el5xen; java 1.6.0; Europe/en) url
  5 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.28-18-generic; java 1.6.0_16; GMT/en) url
  5 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.32-trunk-686; java 1.6.0_17; Europe/de) url
  5 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.31-19-generic; java 1.6.0_16; Europe/de) url
  4 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.18-164.11.1.el5; java 1.6.0; Europe/de) url
  4 | yacy.net/bot.html | text/.. | yacybot (x86 Windows Vista 6.0; java 1.6.0_18; Europe/it) url
  3 | yacy.net/bot.html | text/.. | yacybot (x86 Windows XP 5.1; java 1.6.0_18; Europe/de) url
  3 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.23.17-dbserv; java 1.6.0_04; Europe/de) url
  3 | yacy.net/bot.html | text/.. | yacybot (amd64 Windows Server 2008 6.0; java 1.6.0_18; America/en) url
  3 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.28-11-server; java 1.6.0_0; Europe/en) url
396 youdao
  375 | www.youdao.com/help/webmaster/spider/ | text/.. | Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
  6 | www.youdao.com/help/webmaster/spider/ | - | Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
  5 | www.youdao.com/help/webmaster/spider/ | text/.. | Mozilla/5.0 (compatible; YodaoBot/1.0; url; )
  4 | www.youdao.com/help/webmaster/spider/ | application/xml | Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
391 exabot
  214 | www.exabot.com/go/robot | text/.. | Mozilla/5.0 (compatible; Exabot/3.0 (BiggerBetter); url)
  164 | www.exabot.com/go/robot | text/.. | Mozilla/5.0 (compatible; Exabot/3.0; url)
  9 | www.exabot.com/go/robot | - | Mozilla/5.0 (compatible; Exabot/3.0; url)
  4 | www.exabot.com/go/robot | image/.. | Mozilla/5.0 (compatible; Exabot/3.0 (BiggerBetter); url)
390 php
  184 | pear.php.net/ | text/.. | PEAR HTTP_Request class ( url )
  87 | pear.php.net/ | application/vnd.php.serialized | PEAR HTTP_Request class ( url )
  38 | pear.php.net/package/http_request2 | text/.. | HTTP_Request2/0.4.1 (url) PHP/5.2.12
  28 | pear.php.net/ | application/json | PEAR HTTP_Request class ( url )
  23 | pear.php.net/package/http_request2 | application/vnd.php.serialized | HTTP_Request2/0.5.1 (url) PHP/5.2.10-2ubuntu6.4
  20 | pear.php.net/ | application/xml | PEAR HTTP_Request class ( url )
  7 | pear.php.net/package/http_request2 | text/.. | HTTP_Request2/0.5.1 (url) PHP/5.2.12
270 soso
  258 | help.soso.com/webspider.htm | text/.. | Sosospider(url)
  7 | help.soso.com/webspider.htm | - | Sosospider(url)
  3 | help.soso.com/webspider.htm | image/.. | Sosospider(url)
191 wikimedia
  188 | tools.wikimedia.de/~daniel/ | text/.. | WikiSense (url)
188 sogou
  172 | www.sogou.com/docs/help/webmasters.htm#07 | text/.. | Sogou web spider/4.0(url)
  10 | www.sogou.com/docs/help/webmasters.htm#07 | application/xml | Sogou web spider/4.0(url)
  5 | www.sogou.com/docs/help/webmasters.htm#07 | image/.. | Sogou Pic Spider/3.0(url)
177 sblog
  120 | fulltext.sblog.cz/screenshot/ | image/.. | Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
  32 | fulltext.sblog.cz/screenshot/ | text/.. | Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
  22 | fulltext.sblog.cz/robot/ | text/.. | SeznamBot/2.0 (url)
176 seoprofiler
  159 | www.seoprofiler.com/bot/ | text/.. | Mozilla/5.0 (compatible; spbot/1.0; url )
  10 | www.seoprofiler.com/bot/ | text/.. | Mozilla/5.0 (compatible; spbot/1.2; url )
  4 | www.seoprofiler.com/bot/ | text/.. | Mozilla/5.0 (compatible; spbot/2.0; url )
  3 | www.seoprofiler.com/bot/ | text/.. | Mozilla/5.0 (compatible; spbot/1.1; url )
175 teesoft
  55 | www.teesoft.info/ | image/.. | Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
  36 | www.teesoft.info/ | text/.. | Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
  28 | www.teesoft.info/ | image/.. | Mozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
  18 | www.teesoft.info/ | text/.. | Mozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
  5 | www.teesoft.info/ | application/x-javascript | Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
  4 | www.teesoft.info/ | image/.. | Mozilla/5.0 (X11; Linux i686; [lang code]; rv:[..]) Gecko/.. etc (url)
  3 | www.teesoft.info/ | image/.. | Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; [lang code]; rv:[..]) Gecko/.. etc (url)
  3 | www.teesoft.info/ | application/x-javascript | Mozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
  3 | www.teesoft.info/ | application/json | Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
144 80legs
  126 | www.80legs.com/spider.html | text/.. | Mozilla/5.0 (compatible; 008/0.83; url;) Gecko/2008032620
  18 | www.80legs.com/spider.html | image/.. | Mozilla/5.0 (compatible; 008/0.83; url;) Gecko/2008032620
143 entireweb
  142 | www.entireweb.com/about/search_tech/speedy_spider/ | text/.. | Speedy Spider (url)
139 facebook
  102 | www.facebook.com/externalhit_uatext.php | image/.. | facebookexternalhit/1.0 (url)
  25 | www.facebook.com/externalhit_uatext.php | text/.. | facebookexternalhit/1.0 (url)
  7 | developers.facebook.com | image/.. | facebookplatform/1.0 (url)
  5 | developers.facebook.com | text/.. | facebookplatform/1.0 (url)
133 www.
  35 | www. | text/.. | GoogleBot/2.1 ( urlGoogleBot.com/bot.html)
  34 | www. | text/.. | GoogleBot-Image/1.0 ( urlGoogleBot.com/bot.html)
  31 | www. | image/.. | GoogleBot-Image/1.0 (urlGoogleBot.com/bot.html)
  16 | www. | text/.. | GoogleBot/2.1 (urlGoogleBot.com/bot.html)
  15 | www. | text/.. | Google - GoogleBot/2.1 ( urlGoogleBot.com/bot.html)
127 activepeople
  127 | www.activepeople.net | text/.. | WordPress/2.8.4; url
125 fairshare
  123 | fairshare.cc | text/.. | Mozilla/5.0 url (X11; FreeBSD i386; en-US; rv:1.2a) Gecko/20021021
120 wikipedia
  108 | en.wikipedia.org/wiki/Wikipedia:Huggle | text/.. | Huggle/0.9.2 url
  6 | en.wikipedia.org | text/.. | url
119 scoutjet
  119 | www.scoutjet.com/ | text/.. | Mozilla/5.0 (compatible; ScoutJet; url)
118 daum
  118 | ws.daum.net/aboutWebSearch.html | text/.. | Mozilla/5.0 (compatible; MSIE or Firefox mutant; not on Windows server; url) Daumoa/2.0
108 traslated
  108 | mymemory.traslated.net/doc/ | text/.. | Mozilla/5.0 (MyMemory Bot url)
104 dotnetdotcom
  103 | www.dotnetdotcom.org/ | text/.. | Mozilla/5.0 (compatible; DotBot/1.1; url, mail address )
103 majestic12
  98 | www.majestic12.co.uk/bot.php? | text/.. | Mozilla/5.0 (compatible; MJ12bot/v1.3.2; url)
  4 | www.majestic12.co.uk/bot.php? | text/.. | Mozilla/5.0 (compatible; MJ12bot/v1.3.1; url)
99 emining
  99 | emining.jp/ | text/.. | emBot-GalaBuzz/Nutch-1.0 (url; mail address )
97 toolserver
  78 | wiki.toolserver.org/view/GeoHack | text/.. | Geohack (url)
  10 | toolserver.org/~bayo/ | text/.. | LudoThecaire/1.0 (url)
  5 | toolserver.org/~dcoetzee/contributionsurveyor/ | text/.. | Contribution Surveyor (url)
91 goo
  88 | help.goo.ne.jp/contact/ | text/.. | goo wikipedia (url)
88 setooz
  81 | www.setooz.com/oozbot.html | text/.. | OOZBOT/0.20 ( url ; mail address )
  7 | www.setooz.com/oozbot.html | text/.. | OOZBOT/0.20 ( -- ; url ; mail address )
79 textdigger
  78 | textdigger.com | text/.. | Mozilla/5.0 (url) Gecko/20061208 Firefox/2.0.0.1
74 ronzoo
  74 | www.ronzoo.com/about/ | text/.. | Ronzoobot/1.4 (url)
70 spinn3r
  64 | spinn3r.com/robot | text/.. | Mozilla/5.0 (X11; Linux i686; en-US; rv:1.2.1; aggregator:Spinn3r (Spinn3r 3.1); url) Gecko/20021130
  5 | spinn3r.com/robot | - | Mozilla/5.0 (X11; Linux i686; en-US; rv:1.2.1; aggregator:Spinn3r (Spinn3r 3.1); url) Gecko/20021130
64 kosmix
  37 | www.kosmix.com/html/kosmos.html | application/xml | Mozilla/5.0(compatible;Kosmos/1.0;url)
  27 | www.kosmix.com/html/kosmos.html | text/.. | Mozilla/5.0(compatible;Kosmos/1.0;url)
60 proximic
  60 | www.proximic.com | text/.. | Mozilla/5.0 (compatible; proximic; url)
57 gigablast
  57 | www.gigablast.com/spider.html | text/.. | Gigabot/3.0 (url)
55 tourist-information-berlin
  55 | www.tourist-information-berlin.com | text/.. | WordPress/2.8.4; url
49 wordpress
  22 | support.wordpress.com/contact/ | text/.. | WordPress.com mShots; url
  8 | josefboberg.wordpress.com | text/.. | WordPress/MU; url
  5 | montseantares.wordpress.com | text/.. | WordPress/MU; url
40 oneriot
  21 | www.oneriot.com | text/.. | Mozilla/5.0 (Windows; Windows NT 5.1; en-US; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 OneRiot/1.0 (url)
  18 | www.oneriot.com | image/.. | Mozilla/5.0 (Windows; Windows NT 5.1; en-US; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 OneRiot/1.0 (url)
37 mnemoo
  37 | www.mnemoo.com/about/spider | text/.. | Mnemoo Intelligent Spider/0.1alpha (compatible; See url)
34 freebase
  34 | www.freebase.com | text/.. | metaweb/Nutch-1.0-dev (url; help_at_metaweb.com)
34 FeedBurner
  33 | www.FeedBurner.com | text/.. | FeedBurner/1.0 (url)
32 archive
  22 | www.archive.org | text/.. | Mozilla/5.0 (compatible; archive.org_bot/heritrix-1.15.4 url)
  8 | www.archive.org | image/.. | Mozilla/5.0 (compatible; archive.org_bot/heritrix-1.15.4 url)
32 aport
  32 | www.aport.ru/help | text/.. | Mozilla/5.0 (compatible; AportWorm/3.2; url)
30 snap
  30 | www.snap.com | text/.. | Snapbot/1.0 (Snap Shots, url)
30 picsearch
  30 | www.picsearch.com/bot.html | text/.. | psbot/0.1 (url)
28 qdos
  28 | qdos.com/ | text/.. | qdos/1.1 (url)
26 Anonymouse
  14 | Anonymouse.org/ | image/.. | url (Unix)
  10 | Anonymouse.org/ | text/.. | url (Unix)
25 globalspec
  25 | www.globalspec.com/Ocelli | text/.. | Ocelli/1.4 (url)
25 rcdtokyo
  20 | www.rcdtokyo.com/pc2m/ | text/.. | Mozilla/5.0 (compatible; PEAR HTTP_Request class; url)
  5 | www.rcdtokyo.com/pc2m/ | image/.. | Mozilla/5.0 (compatible; PEAR HTTP_Request class; url)
25 kalooga
  17 | www.kalooga.com/info.html?page=crawler | image/.. | Mozilla/5.0 (compatible; KaloogaBot; url)
  8 | www.kalooga.com/info.html?page=crawler | text/.. | Mozilla/5.0 (compatible; KaloogaBot; url)
24 archive-it
  14 | archive-it.org/files/site-owners.html | image/.. | Mozilla/5.0 (compatible;archive.org_bot/1.7.0; Archive-It; url)
  10 | archive-it.org/files/site-owners.html | text/.. | Mozilla/5.0 (compatible;archive.org_bot/1.7.0; Archive-It; url)
23 froute
  18 | labs.froute.jp/pc2m/help.html | text/.. | Froute Mobile Gateway/1.0 (url)
  5 | labs.froute.jp/pc2m/help.html | image/.. | Froute Mobile Gateway/1.0 (url)
22 alexa
  22 | www.alexa.com/site/help/webmasters | text/.. | ia_archiver (url; mail address )
20 z-add
  19 | w3.z-add.co.uk/linkcheck/ | text/.. | Z-Add Link Checker (url)
19 quus
  19 | fx.quus.net/ | text/.. | url
18 phonifier
  12 | www.phonifier.com | text/.. | Mozilla/5.0 (compatible; Phonifier; url)
  6 | www.phonifier.com | text/.. | aNti_miSa$puAsa/5.0 (compatible; Phonifier; url)
17 bin-co
  17 | www.bin-co.com/php/scripts/load/ | text/.. | BinGet/1.00.A (url)
16 mixi
  9 | mixi.jp/ | text/.. | mixi-mobile-converter/1.0 (url)
  7 | mixi.jp/ | image/.. | mixi-mobile-converter/1.0 (url)
16 aafter
  15 | aafter.com/crawler.htm | text/.. | AAfter.com Crawler/AAfter-1.0 (This bot is very focused, well-behaved, and wants to do good to internet community. For any questions, please call collect 1 214-714-2224. Team AAfter, Dallas, TX, USA; url; crawler at aafter.com)
16 simplepie
  8 | simplepie.org | application/xml | SimplePie/1.2 (Feed Parser; url; Allow like Gecko) Build/20090627192103
  4 | simplepie.org | text/.. | SimplePie/1.1.1 (Feed Parser; url; Allow like Gecko) Build/20080315205903
15 newsgator
  6 | www.newsgator.com/Individuals/NetNewsWire/ | - | NetNewsWire/3.2.4 (Mac OS X; url; gzip-happy)
15 microsoft
  15 | academic.research.microsoft.com/ | text/.. | librabot/2.0 (url)
15 discoveryengine
  14 | discoveryengine.com/discobot.html | text/.. | Mozilla/5.0 (compatible; discobot/1.1; url)
13 picmole
  13 | www.picmole.com | text/.. | Mozilla/5.0 (compatible;picmole/1.0 url)
13 heartrails
  9 | capture.heartrails.com/ | text/.. | Mozilla/5.0 (X11; Linux i686 (x86_64); en-US; rv:1.8.1.20) Gecko/20090429 HeartRails_Capture/0.6 (url) BonEcho/2.0.0.20
  4 | capture.heartrails.com/ | image/.. | Mozilla/5.0 (X11; Linux i686 (x86_64); en-US; rv:1.8.1.20) Gecko/20090429 HeartRails_Capture/0.6 (url) BonEcho/2.0.0.20
13 topsy
  13 | labs.topsy.com/butterfly.html | text/.. | Mozilla/5.0 (compatible; Butterfly/1.0; url) Gecko/2009032608 Firefox/3.0.8
13 github
  8 | github.com/pauldix/typhoeus/tree/master | text/.. | Typhoeus - url
  3 | github.com/pauldix/typhoeus/tree/master | - | Typhoeus - url
13 tineye
  9 | tineye.com/crawler.html | image/.. | TinEye/1.1 (url)
  3 | tineye.com/crawler.html | text/.. | TinEye/1.1 (url)
13 hatena
  7 | a.hatena.ne.jp/help | text/.. | Hatena Antenna/0.5 (url)
  6 | mgw.hatena.ne.jp/help | text/.. | DoCoMo/2.0 D903i(c100;TB;W28H20) (compatible; Hatena-Mobile-Gateway/1.2; url)
12 webzdarma
  10 | praso.webzdarma.cz | text/.. | Mozilla/5.0 (compatible; heritrix/1.12.1 url)
12 edu
  10 | ws.nju.edu.cn/falcons/ | text/.. | Mozilla/5.0 (compatible; Falconsbot; url)
11 asterpix
  11 | www.asterpix.com/ | text/.. | Mozilla/5.0 (compatible; Asterbot; url)
11 delfi
  7 | search.delfi.lt/?c=crawler | text/.. | Dolphin/1.4 (url)
  4 | otsing.delfi.ee/?c=crawler | text/.. | Dolphin/1.4 (url)
10 holmes
  10 | holmes.ge | text/.. | HolmesBot (url)
10 vbseo
  10 | www.vbseo.com | text/.. | Mozilla/4.0 (vBSEO; url)
10 bloglines
  6 | www.bloglines.com | - | Bloglines/3.1 (url; 1 subscriber)
60,493 total
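The report does not say how the "secondary domain (~site) name" for each group is derived from the URLs. One heuristic that reproduces most of the group names above is to drop the path and the top-level domain, then also drop a generic second-level label such as "co" in "co.jp". This is a minimal sketch under that assumption; the function `site_name` and the label set `GENERIC_SECOND_LEVEL` are hypothetical.

```python
# Generic second-level labels that sit between a site name and a country
# code (e.g. "co" in yahoo.co.jp). This set is an assumption, not the
# report's actual list.
GENERIC_SECOND_LEVEL = {"co", "com", "ne", "or", "ac"}


def site_name(url: str) -> str:
    """Guess the secondary domain (~site) name for a URL from the table."""
    host = url.split("/", 1)[0]  # drop the path (everything after the first "/")
    labels = [label for label in host.split(".") if label]
    if not labels:
        return host
    if len(labels) > 1:
        labels = labels[:-1]  # drop the top-level domain ("com", "jp", ...)
    if len(labels) > 1 and labels[-1] in GENERIC_SECOND_LEVEL:
        labels = labels[:-1]  # drop "co" in "co.jp", "com" in "com.cn", ...
    return labels[-1]


for url in ("www.google.com/bot.html",
            "help.yahoo.co.jp/help/jp/search/indexing/indexing-15.html",
            "misc.yahoo.com.cn/help.html",
            "ws.nju.edu.cn/falcons/"):
    print(url, "->", site_name(url))
```

The heuristic happens to reproduce the table's "edu" group for ws.nju.edu.cn. The separate "www." group shows that the report's real rule differs in at least some cases.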

Page requests for probable crawlers, recognized by keyword
Count (x 1000) | Agent string
  Count (x 1000) | Mime type (count ≥ 3)
2,205 PythonWikipediaBot/1.0
  1,185 | application/json
  634 | application/xml
  384 | text/..
  2 | image/..
  1 | -
  1 | application/ogg
1,442 GoogleBot-Image/1.0
  644 | text/..
  576 | image/..
  222 | -
  1 | application/pdf
337 LinkParser/2.0
  337 | text/..
335 Mozilla/5.0 (Windows; Windows NT 5.1; fr; rv:1.8.1) VoilaBot BETA 1.2 ( mail address )
  335 | text/..
  1 | -
  1 | application/ogg
280 Answersbot
  280 | text/..
  1 | -
241 MPUploadBot; PHP 5.2.6-3ubuntu4.5
  241 | application/vnd.php.serialized
  1 | -
184 gsa-crawler (Enterprise; S5-MS8QQPJ5BGWAA; mail address )
  184 | text/..
175 php wikibot classes
  163 | application/vnd.php.serialized
  12 | text/..
147 wikiwix-bot-3.0
  144 | text/..
  3 | image/..
  1 | -
136 ClueBot/1.1
  118 | application/vnd.php.serialized
  18 | text/..
125 Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (Exabot-Thumbnails)
  68 | text/..
  41 | image/..
  16 | application/x-javascript
125 GoogleBot-Image/1.0
  123 | text/..
  2 | image/..
  1 | -
75 crawler mail address
  75 | text/..
69 AarghBot Linux
  69 | text/..
  1 | -
65 SONIVIS MediaWiki API Bot 0.1.3
  65 | text/..
56 Mozilla/5.0 (SnapPreviewBot) Gecko/20061206 Firefox/1.5.0.9
  33 | image/..
  23 | text/..
  1 | application/json
  1 | application/x-javascript
50 plantspedia data crawler
  50 | text/..
48 gsa-crawler (Enterprise; S5-FTM7CJX3FUJAS; mail address )
  48 | text/..
  1 | -
48 HTMLParser/2.0
  48 | text/..
40 Pywikipediabot/2.0
  40 | application/json
40 Test Webbot
  40 | text/..
36 SineBot/1.5.15(User:SineBot)
  35 | application/vnd.php.serialized
  1 | text/..
32 DotNetWikiBot/2.72 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
  32 | text/..
  1 | application/xml
30 MLBot (www.metadatalabs.com/mlbot)
  30 | text/..
  1 | -
  1 | image/..
29 dictionary-bot
  21 | application/xml
  8 | text/..
28 zomba-bot/0.1
  28 | text/..
28 DotNetWikiBot/2.72 (Microsoft Windows NT 5.1.2600 Service Pack 2; )
  28 | text/..
  1 | application/xml
19 LinkParser/1.00
  19 | text/..
19 DotNetWikiBot/2.71 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
  16 | text/..
  3 | application/xml
18 CorenSearchBot/1.4 en libwww-perl/5.808
  18 | text/..
18 AnomieBOT 1.0 (OrphanReferenceFixer)
  18 | application/json
16 GoogleBot
  16 | text/..
  1 | image/..
16 DotNetWikiBot/2.7 (Microsoft Windows NT 6.1.7600.0; )
  12 | text/..
  4 | application/xml
  1 | image/..
14 HTMLParser/1.6
  14 | text/..
  1 | application/json
14 Citation_bot; mail address
  14 | text/..
14 spider
  13 | text/..
  1 | image/..
14 COIBot/1.00
  14 | text/..
13 DotNetWikiBot/2.72 (Microsoft Windows NT 5.2.3790 Service Pack 2; )
  13 | text/..
  1 | application/xml
13 ZanranCrawler/0.2 ( mail address )
  13 | text/..
12 SurakWare MediaWiki Bot/1.0
  12 | text/..
  1 | application/xml
11 GoogleBot/2.1
  7 | text/..
  4 | image/..
  1 | -
11 YaDirectBot/1.0
  11 | text/..
  1 | image/..
11 rdfbot/1.0 ( Indian Language Web Search Engine ; Rediff.com ; rdfbot mail address )
  11 | text/..
  1 | application/xml
10 Mozilla/5.0 (Bgbot 0.5)
  10 | text/..
9 DoCoMo/2.0 SH904i(c100;TB;W24H16)(Y!J-AGENT)(robot)
  9 | text/..
  1 | image/..
9 Bot/WP/EN/Daniel/MediationBot1/1.2
  9 | text/..
9 GNAA-bot
  9 | text/..
9 Tawbot (public svn release; plwiki)
  9 | text/..
8 mail address (Mozilla compatible)
  8 | text/..
  1 | image/..
8 Jyxobot/1
  8 | text/..
7 HTMLParser/1.42
  7 | text/..
6 AdultsVisit.us/Nutch-1.0 (www.AdultsVisit.us; mail address )
  6 | text/..
6 FAST Enterprise Crawler 6 used by Viacom (Viacom)
  6 | text/..
  1 | -
6 WordSpider1.0
  6 | text/..
6 Bot/WP/EN/Alex_Bakharev/AlexNewArtBot
  6 | text/..
6 TVersity Media Robot
  6 | text/..
6 Mozilla/5.0 (Yahoo-MMCrawler/4.0; mail address )
  5 | text/..
  1 | image/..
6 Bub's wikibot (Wikibot/2009092504; JWBF/1.2; Java/1.6)
  6 | text/..
5 ~Bot ([[:fr:w:User:TildeBot]] by [[:fr:w:User:Alphos]] mail address )
  5 | text/..
5 XLinkBot/1.00
  5 | text/..
5 QuickFinder Crawler
  5 | text/..
5 DotNetWikiBot/2.53 (Unix 2.6.26.2; )
  5 | text/..
5 SuperBot/4.7.0.72 (Windows XP)
  5 | text/..
5 Freebase Deathbot
  5 | text/..
5 IssueCrawler
  5 | text/..
4 bitlybot
  4 | text/..
  1 | image/..
4 4am-spider/1.0
  4 | text/..
4 DotNetWikiBot/2.8 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
  3 | text/..
  1 | application/xml
4 DotNetWikiBot/2.81 (Microsoft Windows NT 6.1.7600.0; )
  4 | text/..
  1 | application/xml
4 Begun Robot Crawler
  4 | text/..
4 Netvibes Wasabi-bot v1.0
  2 | application/xml
  1 | -
  1 | text/..
4 FAST Enterprise Crawler 6 used by a (a)
  4 | text/..
4 msnbot
  4 | text/..
4 Geni ircpybot 1.0
  2 | application/json
  2 | text/..
  1 | application/xml
4 OpenLink Virtuoso RDF crawler
  4 | text/..
  1 | image/..
4 AnomieBOT 1.0 (AFDMergeFromCleaner)
  4 | application/json
4 topyx-crawler
  4 | text/..
  1 | -
4 AOL Reference Center Bot/1.0
  4 | text/..
4 TKBot 1.0 ( mail address )
  4 | application/xml
4 Xaldon WebSpider 2.7.b6
  4 | text/..
4 CheMoBot/1.00
  4 | text/..
3 IScraperBot/0.1
  2 | application/xml
  1 | text/..
3 'citeseerxbot'
  3 | text/..
3 Jabse.com Crawler v.2.0 www.jabse.com/crawler.php
  3 | text/..
  1 | application/xml
3 Joycrawler Robot
  3 | text/..
3 Mozilla/5.0 (Apibot 0.01)
  3 | application/vnd.php.serialized
3 Mozilla/4.0 (compatible; MSIE is not me; DAUMOA/1.0.0; DAUM Web Robot; Daum Communications Corp., Korea)
  2 | image/..
  1 | text/..
3 TweetMemeBot (Feed Parser; Allow like Gecko)
  3 | text/..
  1 | application/xml
3 menteeworld_crawler
  3 | text/..
3 FAST Enterprise Crawler 6 used by mohamedazmil ( mail address )
  3 | text/..
3 AnomieBOT 1.0 (SourceUploader)
  3 | application/json
6,881 total
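Both tables group sampled log lines by agent and then break each group down by mime type. This is a minimal sketch of that grouping, assuming each sampled line has already been reduced to a (user agent, mime type) pair; the function `agent_breakdown` and the toy agents "BotA" and "BotB" are hypothetical. Because one sampled line stands for about 1000 requests, sampled lines per day are already in the tables' "x 1000" unit.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, int, List[Tuple[str, int]]]


def agent_breakdown(sampled: Iterable[Tuple[str, str]], days: int) -> List[Row]:
    """Group sampled (user agent, mime type) log lines into per-agent rows.

    With 1:1000 sampling each sampled line stands for about 1000 requests,
    so sampled lines per day equal thousands of requests per day.
    """
    per_agent: Counter = Counter()
    per_mime: Dict[str, Counter] = defaultdict(Counter)
    for agent, mime in sampled:
        per_agent[agent] += 1
        per_mime[agent][mime] += 1

    rows: List[Row] = []
    for agent, lines in per_agent.most_common():  # largest agents first
        mimes = [(mime, round(n / days)) for mime, n in per_mime[agent].most_common()]
        rows.append((agent, round(lines / days), mimes))
    return rows


# Toy input: two days of sampled lines for two hypothetical agents.
sample = ([("BotA", "text/..")] * 6 + [("BotA", "application/json")] * 2
          + [("BotB", "text/..")] * 4)
for agent, total, mimes in agent_breakdown(sample, days=2):
    print(total, agent)
    for mime, count in mimes:
        print("  ", count, "|", mime)
```

The agent total and each mime count are rounded separately, so a breakdown computed this way need not add up exactly to its total. Small differences of that kind also appear in the tables above.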

IP ranges: known IP ranges for Google are 64.233.[160.0-191.255], 66.249.[64.0-95.255], 66.102.[0.0-15.255], 72.14.[192.0-255.255], 74.125.[0.0-255.255], 209.85.[128.0-255.255] and 216.239.[32.0-63.255], plus a few other minor subranges.
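Checking whether a request that claims to be GoogleBot really comes from Google amounts to a range test on its IP address. This is a minimal sketch using Python's standard `ipaddress` module. It assumes that the ranges listed above are exact and complete, which they are not: the "few other minor subranges" are omitted. The function names and the "google"/"google?" labels, which are borrowed from the table group names, are hypothetical.

```python
import ipaddress
from typing import Optional

# The Google ranges listed above, rewritten as CIDR blocks. The "few other
# minor subranges" are not included, so this list is incomplete.
GOOGLE_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "64.233.160.0/19",   # 64.233.160.0 - 64.233.191.255
        "66.249.64.0/19",    # 66.249.64.0 - 66.249.95.255
        "66.102.0.0/20",     # 66.102.0.0 - 66.102.15.255
        "72.14.192.0/18",    # 72.14.192.0 - 72.14.255.255
        "74.125.0.0/16",     # 74.125.0.0 - 74.125.255.255
        "209.85.128.0/17",   # 209.85.128.0 - 209.85.255.255
        "216.239.32.0/19",   # 216.239.32.0 - 216.239.63.255
    )
]


def from_google(ip: str) -> bool:
    """True if the address falls inside one of the listed Google ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in GOOGLE_NETWORKS)


def googlebot_group(user_agent: str, ip: str) -> Optional[str]:
    """Hypothetical labelling of requests that claim to be GoogleBot."""
    if "googlebot" not in user_agent.lower():
        return None  # not claiming to be GoogleBot
    return "google" if from_google(ip) else "google?"


ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(googlebot_group(ua, "66.249.65.1"))  # inside 66.249.64.0/19
print(googlebot_group(ua, "192.0.2.10"))   # documentation range, not Google
```

Writing the ranges as CIDR blocks works because every listed range starts and ends on a power-of-two boundary. For example, 66.249.64.0 to 66.249.95.255 spans 32 values of the third octet, which is a /19.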

Generated on Thu, Mar 11, 2010 2:37
Author: Erik Zachte
Mail: ezachte@### (no spam: ### = wikimedia.org)
All data and images on this page are in the public domain.