Wikimedia Visitor Log Analysis Report - Crawler requests

Daily averages, based on sample period: 1 Mar 2010 - 31 Mar 2010

 This analysis is based on a 1:1000 sampled server log (squids) ⇒ all counts x 1000.

The following overview of crawler (aka bot) page requests is based on the user agent information that accompanies most server requests. Unfortunately this user agent information follows rather loosely defined guidelines.
Also please bear in mind that the most popular crawler names may be somewhat overrepresented. This is the result of so-called user agent spoofing (where a requester supplies false credentials, e.g. to bypass a web server's filters).
GoogleBot seems to be a favorite for spoofing. Therefore requests from an IP address registered to Google (see below) are distinguished from other requests that claim to be GoogleBot: in the first table the former are grouped as 'google', the latter as 'google?'.

For this report, page requests are considered to be issued by a crawler in two cases:
1 The user agent string contains a web address. Only crawlers should have that, but there are some false positives where a browser sends a user agent string with a web address (an ill-behaved plug-in; the main offenders have been eliminated).
2 The user agent string contains the term bot, spider or crawl[er].
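
The two rules above can be sketched in a few lines of Python. This is only an illustration, not the script that produced this report: the function name classify_user_agent and both regular expressions are assumptions, and a "web address" is approximated here as anything starting with http://, https:// or www.

```python
import re

# Assumption: a "web address" is approximated by an http(s):// or www. prefix.
URL_PATTERN = re.compile(r"https?://|www\.", re.IGNORECASE)
# Rule 2: the terms bot, spider or crawl[er], matched case-insensitively.
KEYWORD_PATTERN = re.compile(r"bot|spider|crawl", re.IGNORECASE)


def classify_user_agent(user_agent: str) -> str:
    """Return which of the two rules (if any) marks the agent as a crawler."""
    if URL_PATTERN.search(user_agent):
        return "crawler (url)"
    if KEYWORD_PATTERN.search(user_agent):
        return "crawler (keyword)"
    return "other"


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "PythonWikipediaBot/1.0",
        "Mozilla/5.0 (Windows NT 5.1) Firefox/3.6",
    ]
    for agent in samples:
        print(classify_user_agent(agent), "<-", agent)
```

A simple substring test like this will share the false positives mentioned in rule 1 (browsers whose plug-ins add a web address), so a real classifier would need an exclusion list on top.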

In total 44,255,000 page requests (mime type text/html only!) per day are considered crawler requests, out of 304,251,000 external requests, which is 14.5%
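
As a quick check of the arithmetic, the sketch below scales the sampled counts by the 1:1000 sampling factor and recomputes the crawler share. It assumes that the sampled log holds 44,255 and 304,251 matching records per day on average; the helper names are made up for this example.

```python
SAMPLING_FACTOR = 1000  # the log is sampled 1:1000, so every count is multiplied by 1000


def scale_sampled_count(sampled_count: int, factor: int = SAMPLING_FACTOR) -> int:
    """Estimate the real number of requests from a sampled count."""
    return sampled_count * factor


def share_percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage rounded to one decimal."""
    return round(100.0 * part / whole, 1)


# Assumed sampled daily averages behind the totals quoted above.
crawler_requests = scale_sampled_count(44_255)
external_requests = scale_sampled_count(304_251)

print(f"{crawler_requests:,} of {external_requests:,} requests")
print(share_percent(crawler_requests, external_requests), "%")
```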

Page requests for crawlers that specify a url in the agent string
Count (x 1000) | Secondary domain (~site) name
  Count (x 1000) | URL | Mime type | User agent
16,540 | google
  12,067 | www.google.com/bot.html | text/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  1,694 | desktop.google.com/ | application/xml | Mozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
  656 | desktop.google.com/ | image/.. | Mozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
  456 | www.google.com/bot.html | - | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  366 | desktop.google.com/ | application/xml | Mozilla/5.0 (compatible; Google Desktop/5.9.909.30391; url)
  188 | www.google.com/bot.html | image/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  181 | desktop.google.com/ | text/.. | Mozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
  126 | www.google.com/bot.html | text/.. | DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
  123 | www.google.com/bot.html | text/.. | SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
  120 | desktop.google.com/ | image/.. | Mozilla/5.0 (compatible; Google Desktop/5.9.909.30391; url)
  102 | www.google.com/feedfetcher.html | - | FeedFetcher-Google; (url)
  62 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: npiv82)
  59 | www.google.com/feedfetcher.html | text/.. | Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
  38 | www.google.com/feedfetcher.html | application/xml | FeedFetcher-Google; (url)
  27 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: finchproxy)
  23 | www.google.com/feedfetcher.html | text/.. | FeedFetcher-Google; (url)
  23 | code.google.com/p/crawler4j/ | text/.. | crawler4j (url)
  19 | www.google.com/feedfetcher.html | application/xml | Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
  18 | www.google.com/feedfetcher.html | application/json | Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
  11 | www.google.com/feedfetcher.html | image/.. | Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
  11 | desktop.google.com/ | - | Mozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
  10 | desktop.google.com/ | text/.. | Mozilla/5.0 (compatible; Google Desktop/5.9.909.30391; url)
  9 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: nwikiproxy)
  9 | docs.google.com | image/.. | Mozilla/5.0 (compatible; GoogleDocs; url)
  6 | desktop.google.com/ | - | Mozilla/5.0 (compatible; Google Desktop/5.9.909.30391; url)
  5 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: acid-rain-cn)
  5 | www.google.org | text/.. | Naveen/Nutch-1.0 (Naveen; url; mail address )
  4 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: myquerylog11)
  4 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: myquerylog12)
  4 | code.google.com/appengine | text/.. | Python-urllib/1.17 AppEngine-Google; (url; appid: lusosfera)
  4 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: myquerylog14)
  4 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: myquerylog15)
  4 | code.google.com/appengine | application/xml | AppEngine-Google; (url; appid: nwikiproxy)
  4 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: myquerylog13)
  4 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: lullar-data),gzip(gfe) (via translate.google.com)
  3 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: myquerylog44)
  3 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: myquerylog06)
  3 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: myquerylog18)
  3 | code.google.com/appengine | application/json | MWBOT GAE Edition AppEngine-Google; (url; appid: philip-bot)
  3 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: myquerylog0)
  3 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: myquerylog08)
  3 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: myquerylog04)
  3 | www.google.com/coop/cse/cref | text/.. | FeedFetcher-Google-CoOp; (url)
  3 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: myquerylog01)
12,527 | msn
  6,026 | search.msn.com/msnbot.htm | text/.. | msnbot/2.0b (url)
  5,969 | search.msn.com/msnbot.htm | - | msnbot/2.0b (url)
  170 | search.msn.com/msnbot.htm | text/.. | msnbot-media/1.1 (url)
  129 | search.msn.com/msnbot.htm | text/.. | msnbot/2.0b (url).
  88 | search.msn.com/msnbot.htm | image/.. | msnbot-media/1.1 (url)
  44 | search.msn.com/msnbot.htm | image/.. | msnbot/2.0b (url)
  31 | search.msn.com/msnbot.htm | text/.. | msnbot/1.1 (url)
  29 | search.msn.com/msnbot.htm | text/.. | msnbot-Products/1.0 (url)
  12 | search.msn.com/msnbot.htm | application/ogg | msnbot/2.0b (url)
  11 | search.msn.com/msnbot.htm | text/.. | msnbot-NewsBlogs/1.1 (url)
  10 | search.msn.com/msnbot.htm | text/.. | msnbot/2.0b (url)._
11,625 | yahoo
  11,001 | help.yahoo.com/help/us/ysearch/slurp | text/.. | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  159 | misc.yahoo.com.cn/help.html | text/.. | Mozilla/5.0 (compatible; Yahoo! Slurp China; url)
  111 | help.yahoo.com/help/us/ysearch/slurp | - | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  87 | help.yahoo.com/help/us/ysearch/slurp | text/.. | Mozilla/5.0 (compatible; Yahoo! Slurp; url)
  62 | help.yahoo.com/help/us/ysearch/slurp | application/vnd.php.serialized | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  38 | developer.yahoo.com/searchmonkey/useragent | text/.. | Mozilla/5.0 (compatible; Yahoo! SearchMonkey 1.0; url)
  34 | help.yahoo.co.jp/help/jp/search/indexing/indexing-15.html | image/.. | 'Mozilla/5.0 (compatible; Y!J SearchMonkey/1.0 (Y!J-AGENT; url))'
  30 | help.yahoo.com/help/us/ysearch/slurp | text/.. | Mozilla/5.0 (compatible; Yahoo! DE Slurp; url)
  27 | developer.yahoo.com/searchmonkey/useragent | image/.. | Mozilla/5.0 (compatible; Yahoo! SearchMonkey 1.0; url)
  20 | help.yahoo.com/help/us/ysearch/slurp | image/.. | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  16 | help.yahoo.com/help/us/ysearch/slurp | application/xml | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  14 | help.yahoo.com/help/us/ysearch/slurp | application/ogg | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  10 | help.yahoo.co.jp/help/jp/search/indexing/indexing-15.html | text/.. | Y!J-BRI/0.0.1 crawler ( url )
  6 | misc.yahoo.com.cn/help.html | - | Mozilla/5.0 (compatible; Yahoo! Slurp China; url)
  4 | help.yahoo.com | text/.. | Mozilla/5.0 (YahooYSMcm/3.0.0; url)
2,245 | google?
  1,949 | www.google.com/bot.html | text/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  68 | www.google.com/bot.html | text/.. | SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
  52 | www.google.com/bot.html | text/.. | GoogleBot/2.1 (url)
  48 | www.google.com/bot.html | application/vnd.php.serialized | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  37 | www.google.com/bot.html | text/.. | DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
  26 | www.google.com/bot.html | text/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  17 | www.google.com/bot.html | - | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  12 | www.google.com/bot.html | application/xml | DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
  11 | www.google.com/bot.html | application/xml | SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
  10 | www.google.com/bot.html | application/xml | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  7 | www.google.com/bot.html | text/.. | Mozilla/5.0(compatible;GoogleBot/2.1;url)
  7 | www.google.com/bot.html | image/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
1,176 | naver
  1,054 | help.naver.com/robots/ | text/.. | Yeti/1.0 (NHN Corp.; url)
  82 | help.naver.com/robots/ | image/.. | Yeti/1.0 (NHN Corp.; url)
  30 | help.naver.com/delete_main.asp | text/.. | Mozilla/4.0 (compatible; NaverBot/1.0; url)
  8 | help.naver.com/customer_webtxt_02.jsp | text/.. | Mozilla/4.0 (compatible; NaverBot/1.0; url)
734 | ask
  588 | about.ask.com/en/docs/about/webmasters.shtml | text/.. | Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
  139 | about.ask.com/en/docs/about/webmasters.shtml | - | Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
  3 | about.ask.com/en/docs/about/webmasters.shtml | image/.. | Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
  3 | about.ask.com/en/docs/about/webmasters.shtml | text/.. | Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url) (via Web-Blaster/2.21 (http://www.assoziations-blaster.de/web-blast.html))
615 | cuil
  590 | www.cuil.com/twiceler/robot.html | text/.. | Mozilla/5.0 (Twiceler-0.9 url)
  14 | www.cuil.com/twiceler/robot.html | application/vnd.php.serialized | Mozilla/5.0 (Twiceler-0.9 url)
  9 | www.cuil.com/twiceler/robot.html | application/xml | Mozilla/5.0 (Twiceler-0.9 url)
607 | php
  462 | pear.php.net/ | text/.. | PEAR HTTP_Request class ( url )
  64 | pear.php.net/ | application/vnd.php.serialized | PEAR HTTP_Request class ( url )
  34 | pear.php.net/package/http_request2 | text/.. | HTTP_Request2/0.5.1 (url) PHP/5.2.12
  28 | pear.php.net/ | application/xml | PEAR HTTP_Request class ( url )
  17 | pear.php.net/ | application/json | PEAR HTTP_Request class ( url )
602 | baidu
  406 | www.baidu.com/search/spider.htm | text/.. | Baiduspider(url)
  107 | www.baidu.jp/spider/ | text/.. | Baiduspider(url)
  43 | www.baidu.jp/spider/ | text/.. | DoCoMo/2.0 P05A(c100;TB;W24H15) (compatible; BaiduMobaider/1.0;url)
  12 | www.baidu.jp/spider/ | application/xml | DoCoMo/2.0 P05A(c100;TB;W24H15) (compatible; BaiduMobaider/1.0;url)
  10 | www.baidu.jp/spider/ | text/.. | BaiduImagespider(url)
  7 | www.baidu.jp/spider/ | - | Baiduspider(url)
  6 | www.baidu.jp/spider/ | image/.. | BaiduImagespider(url)
  5 | www.baidu.jp/spider/ | application/xml | Baiduspider(url)
  5 | www.baidu.com/search/spider.htm | - | Baiduspider(url)
502 | pipl
  502 | www.pipl.com/bot/ | text/.. | Mozilla/5.0(compatible;PiplBot;url)
406 | exabot
  215 | www.exabot.com/go/robot | text/.. | Mozilla/5.0 (compatible; Exabot/3.0 (BiggerBetter); url)
  175 | www.exabot.com/go/robot | text/.. | Mozilla/5.0 (compatible; Exabot/3.0; url)
  8 | www.exabot.com/go/robot | - | Mozilla/5.0 (compatible; Exabot/3.0; url)
  7 | www.exabot.com/go/robot | image/.. | Mozilla/5.0 (compatible; Exabot/3.0 (BiggerBetter); url)
332 | soso
  323 | help.soso.com/webspider.htm | text/.. | Sosospider(url)
  5 | help.soso.com/webspider.htm | - | Sosospider(url)
325 | youdao
  286 | www.youdao.com/help/webmaster/spider/ | text/.. | Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
  18 | www.youdao.com/help/webmaster/spider/ | text/.. | Mozilla/5.0 (compatible; YodaoBot/1.0; url; )
  12 | www.youdao.com/help/webmaster/spider/ | text/.. | Nokia3650/1.0 SymbianOS/6.1 Series60/1.2 Profile/MIDP-1.0 Configuration/CLDC-1.0/ (compatible; YodaoBot-Mobile/1.0; url; )
  4 | www.youdao.com/help/webmaster/spider/ | - | Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
248 | entireweb
  182 | www.entireweb.com/about/search_tech/speedy_spider/ | text/.. | Mozilla/5.0 (Windows; Windows NT 5.1; en-US) Speedy Spider (url)
  30 | www.entireweb.com/about/search_tech/speedy_spider/ | text/.. | Speedy Spider (url)
  29 | www.entireweb.com/about/search_tech/speedy_spider/ | text/.. | Mozilla/5.0 (compatible; Speedy Spider; url)
  4 | www.entireweb.com/about/search_tech/speedy_spider/ | - | Mozilla/5.0 (Windows; Windows NT 5.1; en-US) Speedy Spider (url)
234 | yacy
  38 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.28-18-generic; java 1.6.0_0; Europe/en) url
  35 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.31-20-generic; java 1.6.0_0; Europe/en) url
  29 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.28-18-generic; java 1.6.0_0; Europe/en) url
  9 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.18-164.11.1.el5; java 1.6.0; Europe/en) url
  7 | yacy.net/bot.html | text/.. | yacybot (i386 FreeBSD 8.0-RELEASE-p2; java 1.7.0; GMT/en) url
  6 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.28-18-generic; java 1.6.0_16; GMT/en) url
  6 | yacy.net/bot.html | text/.. | yacybot (amd64 Windows 7 6.1; java 1.6.0_18; Europe/fr) url
  5 | yacy.net/bot.html | text/.. | yacybot (x86 Windows XP 5.1; java 1.6.0_17; Europe/de) url
  4 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.31-gentoo-r6; java 1.6.0_17; Etc/en) url
  4 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.31-20-generic; java 1.6.0_0; Europe/fr) url
  4 | yacy.net/bot.html | text/.. | yacybot (x86 Windows 2003 5.2; java 1.6.0_16; Europe/en) url
  4 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.30.7-libre-fshoppe1; java 1.6.0_0; Europe/en) url
  4 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.31-19-server; java 1.6.0_0; Europe/en) url
  4 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.31-20-generic; java 1.6.0_0; Europe/en) url
  3 | yacy.net/bot.html | text/.. | yacybot (x86 Windows 7 6.1; java 1.6.0_18; Europe/de) url
208 | scoutjet
  208 | www.scoutjet.com/ | text/.. | Mozilla/5.0 (compatible; ScoutJet; url)
187 | sblog
  128 | fulltext.sblog.cz/screenshot/ | image/.. | Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
  35 | fulltext.sblog.cz/screenshot/ | text/.. | Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
  22 | fulltext.sblog.cz/robot/ | text/.. | SeznamBot/2.0 (url)
175 | sogou
  164 | www.sogou.com/docs/help/webmasters.htm#07 | text/.. | Sogou web spider/4.0(url)
  8 | www.sogou.com/docs/help/webmasters.htm#07 | application/xml | Sogou web spider/4.0(url)
  3 | www.sogou.com/docs/help/webmasters.htm#07 | image/.. | Sogou Pic Spider/3.0(url)
173 | wikimedia
  170 | tools.wikimedia.de/~daniel/ | text/.. | WikiSense (url)
170 | traslated
  170 | mymemory.traslated.net/doc/ | text/.. | Mozilla/5.0 (MyMemory Bot url)
145 | toolserver
  95 | wiki.toolserver.org/view/GeoHack | text/.. | Geohack (url)
  28 | toolserver.org/~bayo/ | text/.. | LudoThecaire/1.0 (url)
  18 | toolserver.org/~guandalug/ | application/vnd.php.serialized | Guandalugs PHPWikiBot/1.1 (url;de:User:Guandalug)
145 | facebook
  106 | www.facebook.com/externalhit_uatext.php | image/.. | facebookexternalhit/1.0 (url)
  26 | www.facebook.com/externalhit_uatext.php | text/.. | facebookexternalhit/1.0 (url)
  7 | developers.facebook.com | text/.. | facebookplatform/1.0 (url)
  6 | developers.facebook.com | image/.. | facebookplatform/1.0 (url)
145 | wikipedia
  106 | en.wikipedia.org/wiki/Wikipedia:Huggle | text/.. | Huggle/0.9.2 url
  18 | en.wikipedia.org/wiki/Web_crawler | text/.. | GoogleBot/Nutch-1.0 (Prototype; url; mail address )
  9 | en.wikipedia.org | text/.. | url
  5 | ko.wikipedia.org | text/.. | url
  4 | fr.wikipedia.org/wiki/Utilisateur:Salebot | application/json | Salebot, see url (uses Perl MediaWiki::API)
132 | activepeople
  132 | www.activepeople.net | text/.. | WordPress/2.8.4; url
108 | textdigger
  107 | textdigger.com | text/.. | Mozilla/5.0 (url) Gecko/20061208 Firefox/2.0.0.1
103 | daum
  103 | ws.daum.net/aboutWebSearch.html | text/.. | Mozilla/5.0 (compatible; MSIE or Firefox mutant; not on Windows server; url) Daumoa/2.0
101 | 80legs
  87 | www.80legs.com/spider.html | text/.. | Mozilla/5.0 (compatible; 008/0.83; url;) Gecko/2008032620
  14 | www.80legs.com/spider.html | image/.. | Mozilla/5.0 (compatible; 008/0.83; url;) Gecko/2008032620
100 | teesoft
  32 | www.teesoft.info/ | image/.. | Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
  21 | www.teesoft.info/ | text/.. | Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
  15 | www.teesoft.info/ | image/.. | Mozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
  9 | www.teesoft.info/ | text/.. | Mozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
  3 | www.teesoft.info/ | image/.. | Mozilla/5.0 (X11; Linux i686; [lang code]; rv:[..]) Gecko/.. etc (url)
  3 | www.teesoft.info/ | application/json | Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
98 | majestic12
  96 | www.majestic12.co.uk/bot.php? | text/.. | Mozilla/5.0 (compatible; MJ12bot/v1.3.2; url)
96 | mnemoo
  96 | www.mnemoo.com/about/spider | text/.. | Mnemoo Intelligent Spider/0.1alpha (compatible; See url)
94 | emining
  93 | emining.jp/ | text/.. | emBot-GalaBuzz/Nutch-1.0 (url; mail address )
94 | goo
  91 | help.goo.ne.jp/contact/ | text/.. | goo wikipedia (url)
92 | spinn3r
  86 | spinn3r.com/robot | text/.. | Mozilla/5.0 (X11; Linux i686; en-US; rv:1.2.1; aggregator:Spinn3r (Spinn3r 3.1); url) Gecko/20021130
  4 | spinn3r.com/robot | - | Mozilla/5.0 (X11; Linux i686; en-US; rv:1.2.1; aggregator:Spinn3r (Spinn3r 3.1); url) Gecko/20021130
70 | oneriot
  52 | www.oneriot.com | image/.. | Mozilla/5.0 (Windows; Windows NT 5.1; en-US; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 OneRiot/1.0 (url)
  18 | www.oneriot.com | text/.. | Mozilla/5.0 (Windows; Windows NT 5.1; en-US; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 OneRiot/1.0 (url)
67 | kosmix
  56 | www.kosmix.com/html/kosmos.html | application/xml | Mozilla/5.0(compatible;Kosmos/1.0;url)
  11 | www.kosmix.com/html/kosmos.html | text/.. | Mozilla/5.0(compatible;Kosmos/1.0;url)
65 | dotnetdotcom
  65 | www.dotnetdotcom.org/ | text/.. | Mozilla/5.0 (compatible; DotBot/1.1; url, mail address )
62 | setooz
  62 | www.setooz.com/oozbot.html | text/.. | OOZBOT/0.20 ( url ; mail address )
62 | wordpress
  27 | josefboberg.wordpress.com | text/.. | WordPress/MU; url
  10 | support.wordpress.com/contact/ | text/.. | WordPress.com mShots; url
  6 | benabb.wordpress.com | text/.. | WordPress/MU; url
61 | www.
  29 | www. | text/.. | GoogleBot/2.1 ( urlGoogleBot.com/bot.html)
  27 | www. | text/.. | GoogleBot-Image/1.0 ( urlGoogleBot.com/bot.html)
  3 | www. | text/.. | GoogleBot/2.1 (urlGoogleBot.com/bot.html)
57 | dummy
  57 | www.dummy.com | text/.. | TEST/hoqBot-1.0 (dummy; url; mail address )
56 | freebase
  55 | www.freebase.com | text/.. | metaweb/Nutch-1.0-dev (url; help_at_metaweb.com)
51 | FeedBurner
  49 | www.FeedBurner.com | text/.. | FeedBurner/1.0 (url)
44 | semager
  44 | www.semager.de/blog/semager-bots/ | text/.. | Mozilla/5.0 (compatible; Semager/1.4; url)
38 | proximic
  38 | www.proximic.com | text/.. | Mozilla/5.0 (compatible; proximic; url)
32 | conceptlinkage
  32 | www.conceptlinkage.org | text/.. | c-link wikipedia miner (url) mail address
32 | moose
  32 | www.moose.at/ | text/.. | Mozilla/5.0 (compatible; MooseBot/1.1; Linux i686; de; url)
31 | fairshare
  29 | fairshare.cc | text/.. | Mozilla/5.0 url (X11; FreeBSD i386; en-US; rv:1.2a) Gecko/20021021
29 | yioop
  28 | www.yioop.com/bot.html | text/.. | Mozilla/5.0 (compatible; YioopBot url)
29 | snap
  29 | www.snap.com | text/.. | Snapbot/1.0 (Snap Shots, url)
28 | ronzoo
  28 | www.ronzoo.com/about/ | text/.. | Ronzoobot/1.4 (url)
28 | qdos
  28 | qdos.com/ | text/.. | qdos/1.1 (url)
27 | discoveryengine
  27 | discoveryengine.com/discobot.html | text/.. | Mozilla/5.0 (compatible; discobot/1.1; url
26 | sf
  8 | liferea.sf.net/ | text/.. | Liferea/0.x.x (Linux; en_US.UTF-8; url)
  8 | liferea.sf.net/ | text/.. | Liferea/1.x.x (Linux; es_ES.UTF-8; url)
  7 | magpierss.sf.net | text/.. | MagpieRSS/0.7x (url)
26 | Anonymouse
  13 | Anonymouse.org/ | image/.. | url (Unix)
  11 | Anonymouse.org/ | text/.. | url (Unix)
26 | archive-it
  19 | archive-it.org/files/site-owners.html | image/.. | Mozilla/5.0 (compatible;archive.org_bot/1.7.0; Archive-It; url)
  7 | archive-it.org/files/site-owners.html | text/.. | Mozilla/5.0 (compatible;archive.org_bot/1.7.0; Archive-It; url)
25 | newsgator
  8 | www.newsgator.com/Individuals/NetNewsWire/ | - | NetNewsWire/3.2.5 (Mac OS X; url; gzip-happy)
  7 | www.newsgator.com/ | text/.. | FeedDemon/2.7 (url; Microsoft Windows XP)
  7 | www.newsgator.com | text/.. | NewsGatorOnline/2.0 (url; 1 subscribers)
25 | alexa
  25 | www.alexa.com/site/help/webmasters | text/.. | ia_archiver (url; mail address )
25 | heartrails
  15 | capture.heartrails.com/ | text/.. | Mozilla/5.0 (X11; Linux i686 (x86_64); en-US; rv:1.8.1.20) Gecko/20090429 HeartRails_Capture/0.6 (url) BonEcho/2.0.0.20
  10 | capture.heartrails.com/ | image/.. | Mozilla/5.0 (X11; Linux i686 (x86_64); en-US; rv:1.8.1.20) Gecko/20090429 HeartRails_Capture/0.6 (url) BonEcho/2.0.0.20
24 | puritysearch
  24 | www.puritysearch.net/ | text/.. | Mozilla/5.0 (compatible; Purebot/1.1; url)
23 | rcdtokyo
  18 | www.rcdtokyo.com/pc2m/ | text/.. | Mozilla/5.0 (compatible; PEAR HTTP_Request class; url)
  5 | www.rcdtokyo.com/pc2m/ | image/.. | Mozilla/5.0 (compatible; PEAR HTTP_Request class; url)
23 | steamheavyindustries
  23 | steamheavyindustries.com | text/.. | parsing subset of wikipedia '(url)'
22 | accelobot
  22 | www.accelobot.com | text/.. | Mozilla/5.0 (compatible; heritrix/1.14.3 url)
21 | seoprofiler
  13 | www.seoprofiler.com/bot/ | text/.. | Mozilla/5.0 (compatible; spbot/2.0.2; url )
  7 | www.seoprofiler.com/bot/ | text/.. | Mozilla/5.0 (compatible; spbot/2.0.1; url )
20 | sourceforge
  12 | linkchecker.sourceforge.net/ | text/.. | LinkChecker/4.5 (url)
  6 | sourceforge.net/projects/iw-robot/ | application/json | Alternative MediaWiki Interwiki Robot/20100301 (in development, url) Python 3.1
18 | froute
  14 | labs.froute.jp/pc2m/help.html | text/.. | Froute Mobile Gateway/1.0 (url)
  4 | labs.froute.jp/pc2m/help.html | image/.. | Froute Mobile Gateway/1.0 (url)
18 | vbseo
  18 | www.vbseo.com | text/.. | Mozilla/4.0 (vBSEO; url)
17 | chainn
  14 | www.chainn.com/mxbot.html | text/.. | Mozilla/5.0 (compatible; mxbot/1.0; url)
  3 | www.chainn.com/mxbot.html | image/.. | Mozilla/5.0 (compatible; mxbot/1.0; url)
17 | webzdarma
  17 | praso.webzdarma.cz | text/.. | Mozilla/5.0 (compatible; heritrix/1.14.3 url)
16 | hi
  9 | vefsofnun.bok.hi.is | image/.. | Mozilla/5.0 (compatible; heritrix/1.14.3 url)
  7 | vefsofnun.bok.hi.is | text/.. | Mozilla/5.0 (compatible; heritrix/1.14.3 url)
16 | quus
  16 | fx.quus.net/ | text/.. | url
16 | simplepie
  6 | simplepie.org | application/xml | SimplePie/1.2 (Feed Parser; url; Allow like Gecko) Build/20090627192103
  6 | simplepie.org | text/.. | SimplePie/1.1.1 (Feed Parser; url; Allow like Gecko) Build/20080315205903
16 | bin-co
  16 | www.bin-co.com/php/scripts/load/ | text/.. | BinGet/1.00.A (url)
16 | kalooga
  13 | www.kalooga.com/info.html?page=crawler | image/.. | Mozilla/5.0 (compatible; KaloogaBot; url)
  3 | www.kalooga.com/info.html?page=crawler | text/.. | Mozilla/5.0 (compatible; KaloogaBot; url)
16 | avantbrowser
  8 | www.avantbrowser.com | text/.. | Avant Browser (url)
  7 | www.avantbrowser.com | text/.. | Advanced Browser (url)
16 | github
  10 | github.com/pauldix/typhoeus/tree/master | text/.. | Typhoeus - url
  6 | github.com/pauldix/typhoeus/tree/master | - | Typhoeus - url
15 | weblio
  13 | www.weblio.jp/ | text/.. | Mozilla/5.0 (compatible; WeblioBot; url)
15 | mixi
  8 | mixi.jp/ | text/.. | mixi-mobile-converter/1.0 (url)
  7 | mixi.jp/ | image/.. | mixi-mobile-converter/1.0 (url)
15 | feedshow
  8 | www.feedshow.com | text/.. | FeedshowOnline (url)
  7 | www.feedshow.com | text/.. | Feedshow/x.0 (url; 1 subscriber)
15 | jetbrains
  8 | www.jetbrains.com/omea_reader/ | text/.. | JetBrains Omea Reader 2.0 Release Candidate 1 (url)
  7 | www.jetbrains.com/omea_reader/ | text/.. | JetBrains Omea Reader 1.0.x (url)
14 | z-add
  13 | w3.z-add.co.uk/linkcheck/ | text/.. | Z-Add Link Checker (url)
13 | bloglines
  7 | www.bloglines.com | - | Bloglines/3.1 (url; 1 subscriber)
  3 | www.bloglines.com | text/.. | Bloglines/3.1 (url; 1 subscriber)
13 | phonifier
  9 | www.phonifier.com | text/.. | Mozilla/5.0 (compatible; Phonifier; url)
  4 | www.phonifier.com | text/.. | aNti_miSa$puAsa/5.0 (compatible; Phonifier; url)
13 | hatena
  8 | a.hatena.ne.jp/help | text/.. | Hatena Antenna/0.5 (url)
  5 | mgw.hatena.ne.jp/help | text/.. | DoCoMo/2.0 D903i(c100;TB;W28H20) (compatible; Hatena-Mobile-Gateway/1.2; url)
12 | topsy
  12 | labs.topsy.com/butterfly.html | text/.. | Mozilla/5.0 (compatible; Butterfly/1.0; url) Gecko/2009032608 Firefox/3.0.8
12 | aport
  12 | www.aport.ru/help | text/.. | Mozilla/5.0 (compatible; AportWorm/3.2; url)
12 | aafter
  11 | aafter.com/crawler.htm | text/.. | AAfter.com Crawler/AAfter-1.0 (This bot is very focused, well-behaved, and wants to do good to internet community. For any questions, please call collect 1 214-714-2224. Team AAfter, Dallas, TX, USA; url; crawler at aafter.com)
12 | zscho
  12 | zscho.de/ | text/.. | Zscho.de Crawler/Nutch-1.0-Zscho.de-semantic_patch (Zscho.de Crawler, collecting for machine learning; url )
11 | bnf
  3 | www.bnf.fr/fr/outils/a.dl_web_capture_robot.html | image/.. | Mozilla/5.0 (compatible; bnf.fr_bot; url)
10 | tineye
  7 | tineye.com/crawler.html | image/.. | TinEye/1.1 (url)
10 | 8dorm
  10 | www.8dorm.com | text/.. | nutch_test/Nutch-0.9 (nutch_test; url; mail address )
52,622 | total

Page requests for probable crawlers, recognized by keyword
Count (x 1000) | Agent string
  Count (x 1000) | Mime type (count ≥ 3)
2,198 | PythonWikipediaBot/1.0
  1,387 | application/json
  406 | application/xml
  403 | text/..
  2 | image/..
  1 | -
  1 | application/ogg
1,875 | GoogleBot-Image/1.0
  837 | text/..
  610 | image/..
  428 | -
  1 | application/pdf
763 | ClueBot/1.1
  596 | application/vnd.php.serialized
  167 | text/..
569 | DotNetWikiBot/2.72 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
  569 | text/..
  1 | -
  1 | application/xml
383 | LinkParser/2.0
  383 | text/..
371 | Mozilla/5.0 (Windows; Windows NT 5.1; fr; rv:1.8.1) VoilaBot BETA 1.2 ( mail address )
  371 | text/..
  1 | -
  1 | application/pdf
  1 | application/ogg
275 | php wikibot classes
  262 | application/vnd.php.serialized
  13 | text/..
  1 | -
  1 | application/json
260 | Answersbot
  260 | text/..
159 | GoogleBot-Image/1.0
  156 | text/..
  3 | image/..
  1 | -
158 | wikiwix-bot-3.0
  152 | text/..
  6 | image/..
  1 | -
146 | gsa-crawler (Enterprise; S5-MS8QQPJ5BGWAA; mail address )
  146 | text/..
126 | Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (Exabot-Thumbnails)
  68 | text/..
  42 | image/..
  16 | application/x-javascript
  1 | application/json
96 | AarghBot Linux
  96 | text/..
  1 | -
82 | Mozilla/5.0 (SnapPreviewBot) Gecko/20061206 Firefox/1.5.0.9
  52 | image/..
  30 | text/..
  1 | application/json
  1 | application/x-javascript
73 | DotNetWikiBot/2.72 (Microsoft Windows NT 5.2.3790 Service Pack 2; )
  73 | text/..
72 | MPUploadBot; PHP 5.2.6-3ubuntu4.5
  72 | application/vnd.php.serialized
  1 | -
68 | SONIVIS MediaWiki API Bot 0.1.3
  68 | text/..
61 | DotNetWikiBot/2.72 (Microsoft Windows NT 5.1.2600 Service Pack 2; )
  61 | text/..
  1 | application/xml
61 | Citation_bot; mail address
  61 | text/..
56 | WikiCrawler
  56 | text/..
42 | plantspedia data crawler
  42 | text/..
38 | MLBot (www.metadatalabs.com/mlbot)
  38 | text/..
  1 | -
  1 | image/..
38 | SineBot/1.5.15(User:SineBot)
  37 | application/vnd.php.serialized
  1 | text/..
37 | DotNetWikiBot/2.81 (Microsoft Windows NT 6.1.7600.0; )
  31 | text/..
  4 | application/xml
  2 | image/..
37 | Mozilla/5.0 (wiki bot)
  37 | text/..
32 | DotNetWikiBot/2.8 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
  28 | text/..
  4 | application/xml
31 | dictionary-bot
  22 | application/xml
  9 | text/..
29 | Test Webbot
  29 | text/..
24 | crawler mail address
  24 | text/..
23 | spider
  22 | text/..
  1 | image/..
  1 | application/xml
22 | LinkParser/1.00
  22 | text/..
22 | COIBot/1.00
  22 | text/..
20 | ABBYY Crawler/Nutch-1.0
  20 | text/..
  1 | -
20 | CorenSearchBot/1.4 en libwww-perl/5.808
  20 | text/..
18 | AnomieBOT 1.0 (OrphanReferenceFixer)
  18 | application/json
15 | Ohms Law Bot 2.72.373242786 (using Microsoft Windows NT 5.1.2600 Service Pack 3; )
  15 | text/..
14 | GoogleBot
  14 | text/..
  1 | image/..
14 | DOCODE Spider/Nutch-1.0
  14 | text/..
  1 | application/pdf
  1 | application/ogg
13 | SurakWare MediaWiki Bot/1.0
  13 | text/..
  1 | application/xml
13 | ListasBot 3
  13 | text/..
13 | WukongSpider
  13 | text/..
12 | FAST Search Web Crawler git
  12 | text/..
11 | HTMLParser/2.0
  11 | text/..
11 | YaDirectBot/1.0
  11 | text/..
11 | SoxBot PHP
  10 | application/vnd.php.serialized
  1 | text/..
10 | ~Bot ([[:fr:w:User:TildeBot]] by [[:fr:w:User:Alphos]] mail address )
  10 | text/..
10 | QBot
  10 | text/..
10 | Mozilla/5.0 (Bgbot 0.5)
  10 | text/..
9 | HTMLParser/1.6
  9 | text/..
9 | Bot/WP/EN/Daniel/MediationBot1/1.2
  9 | text/..
9 | Tawbot (public svn release; plwiki)
  9 | text/..
9 | DotNetWikiBot 2.72.37298563 (using Microsoft Windows NT 5.1.2600 Service Pack 3; )
  9 | text/..
9 | dicbot 1.0
  9 | text/..
8 | Ohms Law Bot 2.72.373511026 (using Microsoft Windows NT 5.1.2600 Service Pack 3; )
  8 | text/..
8 | Mozilla/5.0 (compatible; SnapPreviewBot; en-US; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9
  8 | text/..
8 | GNAA-bot
  8 | text/..
8 | Jyxobot/1
  8 | text/..
7 | Mozilla/5.0 (X11; Linux i686; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7 SnapPreviewBot
  7 | text/..
7 | UCMore Crawler App
  7 | text/..
7 | XLinkBot/1.00
  7 | text/..
7 | Bub's wikibot (Wikibot/2009092504; JWBF/1.2; Java/1.6)
  7 | text/..
6 | Pywikipediabot/2.0
  6 | application/json
  1 | text/..
6 | TrueKnowledgeBot bot mail address >
  4 | application/vnd.php.serialized
  2 | application/xml
6 | FAST Enterprise Crawler 6 used by HCL ( mail address )
  6 | text/..
6 | SiocWikiBot/1.0
  5 | application/vnd.php.serialized
  1 | text/..
5 | 4am-spider/1.0
  5 | text/..
5 | DoCoMo/2.0 SH904i(c100;TB;W24H16)(Y!J-AGENT)(robot)
  5 | text/..
  1 | image/..
5 | DotNetWikiBot/2.7 (Microsoft Windows NT 6.1.7600.0; )
  3 | text/..
  2 | application/xml
5 | Begun Robot Crawler
  5 | text/..
5 | Ohms Law Bot 2.72.373210301 (using Microsoft Windows NT 5.1.2600 Service Pack 3; )
  5 | text/..
5 | Freebase Deathbot
  5 | text/..
5 | FLMBot
  5 | text/..
5 | Mozilla/5.0 (Yahoo-MMCrawler/4.0; mail address )
  3 | image/..
  2 | text/..
4 | IScraperBot/0.1
  2 | application/xml
  2 | text/..
4 | KIT webcrawler/0.2.4
  4 | text/..
  1 | image/..
4 | OHMcrawly/OHMcrawly-0.1 (Beta) (Crawler der hochschulinternen Suchmaschine)
  4 | text/..
4 | Mozilla/5.0 (compatible; Feedtrace-bot/0.2; mail address )
  3 | text/..
  1 | image/..
4 | DotNetWikiBot/2.72 (Microsoft Windows NT 6.1.7600.0; )
  4 | text/..
4 | Jbot
  4 | text/..
4 | DotNetWikiBot/2.53 (Unix 2.6.26.2; )
  4 | text/..
4 | Geni ircpybot 1.0
  2 | application/json
  2 | text/..
  1 | application/xml
4 | AnomieBOT 1.0 (AFDMergeFromCleaner)
  4 | application/json
4 | DotNetWikiBot/2.8 (Microsoft Windows NT 6.0.6002 Service Pack 2; )
  4 | text/..
4 | Mozilla/5.0 (compatible; Windows NT 6.0) Gecko/20090624 Firefox/3.5 NjuiceBot
  4 | text/..
  1 | image/..
4 | Keybot Translation-Search-Machine
  4 | text/..
4 | CheMoBot/1.00
  4 | text/..
3 | bitlybot
  3 | text/..
  1 | -
  1 | image/..
3 | DotNetWikiBot/2.7 (Microsoft Windows NT 6.0.6002 Service Pack 2; )
  3 | text/..
  1 | application/xml
3 | Hexabot V1.3 - curl - api.php
  3 | text/..
3 | Erel Bot
  3 | text/..
3 | Mozilla/5.0 (compatible; sgbot v0.01a, mail address )
  3 | text/..
3 | TheKeens bot
  3 | text/..
3 | Netvibes Wasabi-bot v1.0
  1 | -
  1 | application/xml
  1 | text/..
  1 | application/opensearchdescription+xml
3 | Bot/WP/EN/Alex_Bakharev/AlexNewArtBot
  3 | text/..
3 | HTMLParser/1.4
  3 | text/..
3 | ('python-wikitools/1.1 (User:Mr.Z-bot)',)
  3 | application/json
3 | ('python-wikitools/1.2 (User:Mr.Z-bot)',)
  3 | application/json
3 | AnomieBOT 1.0 (SourceUploader)
  3 | application/json
3 | Controller=Hannes Röst; de-wikipedia; PythonWikipediaBot/1.0
  3 | application/xml
  1 | application/json
  1 | text/..
3 | DotNetWikiBot/2.81 (Unix 2.6.31.20; )
  3 | text/..
  1 | application/xml
3 | TVersity Media Robot
  3 | text/..
3 | DotNetWikiBot 2.72.372740614 (using Microsoft Windows NT 5.1.2600 Service Pack 3; )
  3 | text/..
3 | FAST Enterprise Crawler 6 used by Finally ( mail address )
  3 | text/..
  1 | -
3 | unblockbot/1.00
  3 | text/..
8,719 | total

IP ranges: known IP ranges for Google are 64.233.[160.0-191.255], 66.249.[64.0-95.255], 66.102.[0.0-15.255], 72.14.[192.0-255.255],
74.125.[0.0-255.255], 209.85.[128.0-255.255], 216.239.[32.0-63.255] and a few other minor subranges.
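
A membership test against these ranges could look roughly like the sketch below, which uses Python's standard ipaddress module. The CIDR blocks are my own translation of the ranges listed above (the unlisted minor subranges are left out), the ranges reflect the time of this report, and the names GOOGLE_RANGES and is_google_ip are hypothetical.

```python
import ipaddress

# CIDR equivalents of the ranges listed above (assumed translation;
# the "few other minor subranges" are not included).
GOOGLE_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "64.233.160.0/19",  # 64.233.160.0  - 64.233.191.255
        "66.249.64.0/19",   # 66.249.64.0   - 66.249.95.255
        "66.102.0.0/20",    # 66.102.0.0    - 66.102.15.255
        "72.14.192.0/18",   # 72.14.192.0   - 72.14.255.255
        "74.125.0.0/16",    # 74.125.0.0    - 74.125.255.255
        "209.85.128.0/17",  # 209.85.128.0  - 209.85.255.255
        "216.239.32.0/19",  # 216.239.32.0  - 216.239.63.255
    )
]


def is_google_ip(address: str) -> bool:
    """True if the address falls inside one of the listed Google ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in GOOGLE_RANGES)


if __name__ == "__main__":
    for candidate in ("66.249.70.1", "66.249.96.1", "209.85.128.0"):
        print(candidate, is_google_ip(candidate))
```

A request whose user agent claims to be GoogleBot but whose address fails this test would be a candidate for the spoofed group.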

Generated on Wed, May 19, 2010 11:02
Author: Erik Zachte
Mail: ezachte@### (no spam: ### = wikimedia.org)
All data and images on this page are in the public domain.