Wikimedia Traffic Analysis Report - Crawler requests

Daily averages, based on sample period: 1 Sep 2010 - 30 Sep 2010

 This analysis is based on a 1:1000 sampled server log (squids) ⇒ all counts x 1000.

The following overview of crawler (aka bot) page requests is based on the user agent information that accompanies most server requests. Unfortunately this user agent information follows rather loosely defined guidelines.
Also please bear in mind that the most popular crawler names may be somewhat overrepresented. This is the result of so-called user agent spoofing (where a requester supplies false credentials, e.g. to bypass web server filters).
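GoogleBot seems to be a favorite for spoofing. Therefore GoogleBot requests from an IP address registered to Google (see the IP ranges below) are marked differently from the others: in the first table below they are grouped under google, the others under google?.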

For this report page requests are considered to be issued by a crawler in two cases:
1 The user agent string contains a web address. Only crawlers should have that, but there are some false positives where a browser sends a user agent string with a web address (caused by an ill-behaved plug-in; the main offenders have been eliminated).
2 The user agent string contains the term bot, spider or crawl[er].
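
The two rules above can be sketched in Python as follows. This is a minimal illustration, not the script behind this report: the regular expression for "web address" and the function name is_crawler are assumptions, and the elimination of known false positives is not modelled.

```python
import re

# Rule 1: the user agent string contains a web address.
# This pattern is an assumed approximation of "web address".
WEB_ADDRESS = re.compile(r"(https?://|www\.)[\w-]+(\.[\w-]+)+", re.IGNORECASE)

# Rule 2: the user agent string contains bot, spider or crawl[er].
KEYWORD = re.compile(r"bot|spider|crawl", re.IGNORECASE)


def is_crawler(user_agent: str) -> bool:
    """Return True if either of the two rules matches the agent string."""
    return bool(WEB_ADDRESS.search(user_agent) or KEYWORD.search(user_agent))
```

Because rule 2 is a plain substring match, any agent string that happens to contain one of the keywords is counted, which is why the second table speaks of probable crawlers.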

In total, 47,613,000 page requests per day (mime type text/html only!) are considered crawler requests, out of 372,490,000 external requests, which is 12.8%.
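
The arithmetic behind these figures can be retraced with a short Python sketch. It assumes that the daily averages in the 1:1000 sampled log were 47,613 and 372,490 lines; the names scale and share_percent are made up for this illustration.

```python
SAMPLING_FACTOR = 1000  # the sampled log keeps 1 out of every 1000 requests


def scale(sampled_count: int) -> int:
    """Estimate the real number of requests from a sampled count."""
    return sampled_count * SAMPLING_FACTOR


def share_percent(part: int, whole: int) -> float:
    """Share of part in whole, as a percentage rounded to one decimal."""
    return round(100.0 * part / whole, 1)


crawler_requests = scale(47_613)     # assumed sampled daily average
external_requests = scale(372_490)   # assumed sampled daily average
print(crawler_requests, external_requests,
      share_percent(crawler_requests, external_requests))
# prints: 47613000 372490000 12.8
```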

Page requests for crawlers that specify a url in the agent string
Count x 1000 | Secondary domain (~site) name | URL | Mime type | User agent
15,926yahoo
15,377 help.yahoo.com/help/us/ysearch/slurptext/..Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
138 misc.yahoo.com.cn/help.htmltext/..Mozilla/5.0 (compatible; Yahoo! Slurp China; url)
83 help.yahoo.com/help/us/ysearch/slurpimage/..Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
82 help.yahoo.com/help/us/ysearch/slurptext/..Mozilla/5.0 (compatible; Yahoo! Slurp; url)
53 help.yahoo.co.jp/help/jp/search/indexing/indexing-15.htmlimage/..'Mozilla/5.0 (compatible; Y!J SearchMonkey/1.0 (Y!J-AGENT; url))'
42 help.yahoo.com/help/us/ysearch/slurptext/..Mozilla/5.0 (compatible; Yahoo! DE Slurp; url)
23 developer.yahoo.com/searchmonkey/useragenttext/..Mozilla/5.0 (compatible; Yahoo! SearchMonkey 1.0; url)
18 developer.yahoo.com/searchmonkey/useragentimage/..Mozilla/5.0 (compatible; Yahoo! SearchMonkey 1.0; url)
17 help.yahoo.com/help/us/ysearch/slurp-Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
17 help.yahoo.com/help/us/ysearch/slurpapplication/oggMozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
17 help.yahoo.com/help/us/ysearch/slurpapplication/xmlMozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
16 help.yahoo.com/help/us/ysearch/crawling/crawling-01.htmltext/..Nokia6682/2.0 (3.01.1) SymbianOS/8.0 Series60/2.6 Profile/MIDP-2.0 configuration/CLDC-1.1 UP.Link/6.3.0.0.0 (compatible;YahooSeeker/M1A1-R2D2; url)
16 help.yahoo.com/help/us/ysearch/slurpapplication/vnd.php.serializedMozilla/5.0 (compatible Yahoo! Slurp/3.0 url)
14 help.yahoo.co.jp/help/jp/search/indexing/indexing-15.htmltext/..Y!J-BRI/0.0.1 crawler ( url )
3 help.yahoo.com/help/us/ysearch/slurpapplication/vnd.php.serializedMozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
3 developer.yahoo.com/yql/providertext/..Mozilla/5.0 (compatible; Yahoo Pipes 2.0; url) Gecko/20090729 Firefox/3.5.2
11,320google
8,968 www.google.com/bot.htmltext/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
676 desktop.google.com/image/..Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; url)
329 www.google.com/bot.html-Mozilla/5.0 (compatible; GoogleBot/2.1; url)
276 www.google.com/bot.htmlimage/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
212 desktop.google.com/application/xmlMozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; url)
116 www.google.com/feedfetcher.html-FeedFetcher-Google; (url)
107 www.google.com/bot.htmltext/..DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
67 www.google.com/bot.htmltext/..SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
66 www.google.com/feedfetcher.htmltext/..Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
59 code.google.com/appengineapplication/xmlAppEngine-Google; (url; appid: wikipedia-raw)
59 code.google.com/appenginetext/..AppEngine-Google; (url; appid: npiv82)
44 www.google.com/feedfetcher.htmlapplication/xmlFeedFetcher-Google; (url)
34 www.google.com/bot.htmlimage/..DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
34 desktop.google.com/text/..Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; url)
28 desktop.google.com/image/..Mozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
26 www.google.com/feedfetcher.htmlapplication/jsonMozilla/5.0 (compatible) FeedFetcher-Google; (url)
25 code.google.com/appenginetext/..AppEngine-Google; (url; appid: aneproxy)
23 www.google.com/feedfetcher.htmltext/..FeedFetcher-Google; (url)
19 www.google.com/feedfetcher.htmlapplication/xmlMozilla/5.0 (compatible) FeedFetcher-Google; (url)
13 www.google.com/bot.htmlimage/..SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
12 www.google.com/feedfetcher.htmlimage/..Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
10 code.google.com/appenginetext/..AppEngine-Google; (url; appid: ortografia4)
10 code.google.com/appenginetext/..AppEngine-Google; (url; appid: boxapp)
10 desktop.google.com/application/xmlMozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
8 code.google.com/appenginetext/..oohEmbed.com AppEngine-Google; (url; appid: oohembed)
7 code.google.com/appenginetext/..AppEngine-Google; (url; appid: ortopedianew)
7 docs.google.comimage/..Mozilla/5.0 (compatible; GoogleDocs; url)
6 desktop.google.com/-Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; url)
6 code.google.com/p/crawler4j/text/..crawler4j (url)
5 code.google.com/appengineapplication/jsonMWBOT GAE Edition AppEngine-Google; (url; appid: philip-bot)
5 code.google.com/appengineapplication/jsonPython-urllib/2.5 AppEngine-Google; (url; appid: loeschmonitor)
4 code.google.com/appenginetext/..AppEngine-Google; (url; appid: finchproxy)
4 www.google.comtext/..Mozilla/5.0 (compatible; heritrix/2.0.0 url)
4 desktop.google.com/image/..Mozilla/5.0 (compatible; Google Desktop/5.9.909.30391; url)
3 code.google.com/p/crawler4j/image/..crawler4j (url)
3 code.google.com/appenginetext/..AppEngine-Google; (url; appid: dustbunnytycoonmonitor)
3 www.google.com/coop/cse/creftext/..FeedFetcher-Google-CoOp; (url)
7,249facebook
4,546 www.facebook.com/externalhit_uatext.phpimage/..facebookexternalhit/1.0 (url)
2,458 www.facebook.com/externalhit_uatext.phptext/..facebookexternalhit/1.0 (url)
202 www.facebook.com/externalhit_uatext.phptext/..facebookexternalhit/1.1 (url)
20 www.facebook.com/externalhit_uatext.php-facebookexternalhit/1.0 (url)
16 developers.facebook.comimage/..facebookplatform/1.0 (url)
6 www.facebook.com/externalhit_uatext.phpimage/..facebookexternalhit/1.1 (url)
3,592msn
2,166 search.msn.com/msnbot.htmtext/..msnbot/2.0b (url)
912 search.msn.com/msnbot.htm-msnbot/2.0b (url)
264 search.msn.com/msnbot.htmtext/..msnbot/2.0b (url)._
101 search.msn.com/msnbot.htmtext/..msnbot-media/1.1 (url)
89 search.msn.com/msnbot.htmimage/..msnbot-media/1.1 (url)
29 search.msn.com/msnbot.htmtext/..msnbot-Products/1.0 (url)
16 search.msn.com/msnbot.htmimage/..msnbot/2.0b (url)
5 search.msn.com/msnbot.htmtext/..msnbot-UDiscovery/2.0b (url)
3 search.msn.com/msnbot.htmtext/..msnbot-NewsBlogs/2.0b (url)
3,503google?
3,093 www.google.com/bot.htmltext/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
235 www.google.com/bot.htmlapplication/vnd.php.serializedMozilla/5.0 (compatible; GoogleBot/2.1; url)
64 www.google.com/bot.htmltext/..GoogleBot/2.1 (url)
37 www.google.com/bot.htmltext/..DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
28 www.google.com/bot.htmltext/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
14 www.google.com/bot.htmltext/..SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
10 www.google.com/bot.htmlimage/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
9 www.google.com/bot.html-Mozilla/5.0 (compatible; GoogleBot/2.1; url)
1,644naver
1,559 help.naver.com/robots/text/..Yeti/1.0 (NHN Corp.; url)
43 help.naver.com/robots/image/..Yeti/1.0 (NHN Corp.; url)
31 help.naver.com/delete_main.asptext/..Mozilla/4.0 (compatible; NaverBot/1.0; url)
9 help.naver.com/customer_webtxt_02.jsptext/..Mozilla/4.0 (compatible; NaverBot/1.0; url)
1,060baidu
690 www.baidu.com/search/spider.htmtext/..Baiduspider(url)
303 www.baidu.jp/spider/text/..Baiduspider(url)
31 www.baidu.jp/spider/text/..DoCoMo/2.0 P05A(c100;TB;W24H15) (compatible; BaiduMobaider/1.0;url)
12 www.baidu.jp/spider/text/..BaiduImagespider(url)
7 www.baidu.jp/spider/-Baiduspider(url)
6 www.baidu.jp/spider/application/xmlBaiduspider(url)
5 www.baidu.com/search/spider.htm-Baiduspider(url)
975yandex
672 yandex.com/botstext/..Mozilla/5.0 (compatible; YandexBot/3.0; url)
194 yandex.com/botstext/..Mozilla/5.0 (compatible; YandexImages/3.0; url)
70 yandex.com/botsimage/..Mozilla/5.0 (compatible; YandexImages/3.0; url)
13 yandex.com/bots-Mozilla/5.0 (compatible; YandexBot/3.0; url)
9 yandex.com/botsimage/..Mozilla/5.0 (compatible; YandexImageResizer/2.0; url)
6 yandex.com/botstext/..Mozilla/5.0 (compatible; YandexBot/3.0; MirrorDetector; url)
5 yandex.com/bots-Mozilla/5.0 (compatible; YandexImages/3.0; url)
699ask
521 about.ask.com/en/docs/about/webmasters.shtmltext/..Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
173 about.ask.com/en/docs/about/webmasters.shtml-Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
3 about.ask.com/en/docs/about/webmasters.shtmlimage/..Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
551soso
493 help.soso.com/webspider.htmtext/..Sosospider(url)
48 help.soso.com/webspider.htm-Sosospider(url)
8 help.soso.com/soso-blog-spider.htmtext/..Sosoblogspider(url)
544mnemoo
395 www.mnemoo.com/about/spidertext/..Mnemoo WikiSearch Spider/0.1alpha (compatible; See url)
149 www.mnemoo.com/about/spidertext/..Mnemoo Intelligent Spider/0.1alpha (compatible; See url)
480exabot
252 www.exabot.com/go/robottext/..Mozilla/5.0 (compatible; Exabot/3.0; url)
213 www.exabot.com/go/robottext/..Mozilla/5.0 (compatible; Exabot/3.0 (BiggerBetter); url)
14 www.exabot.com/go/robot-Mozilla/5.0 (compatible; Exabot/3.0; url)
434youdao
339 www.youdao.com/help/webmaster/spider/text/..Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
53 www.youdao.com/help/webmaster/spider/text/..Mozilla/5.0 (compatible;YodaoBot-Image/1.0;url;)
13 www.youdao.com/help/webmaster/spider/text/..Mozilla/5.0 (compatible; YodaoBot/1.0; url; )
13 www.youdao.com/help/webmaster/spider/image/..Mozilla/5.0 (compatible;YodaoBot-Image/1.0;url;)
13 www.youdao.com/help/webmaster/spider/-Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
342traslated
342 mymemory.traslated.net/doc/text/..Mozilla/5.0 (MyMemory Bot url)
337php
144 pear.php.net/text/..PEAR HTTP_Request class ( url )
62 pear.php.net/application/vnd.php.serializedPEAR HTTP_Request class ( url )
42 pear.php.net/application/xmlPEAR HTTP_Request class ( url )
33 pear.php.net/image/..PEAR HTTP_Request class ( url )
27 pear.php.net/application/jsonPEAR HTTP_Request class ( url )
10 pear.php.net/package/http_request2text/..HTTP_Request2/0.5.1 (url) PHP/5.2.14
7 pear.php.net/package/http_request2text/..HTTP_Request2/0.5.2 (url) PHP/5.2.14
7 pear.php.net/package/http_request2text/..HTTP_Request2/0.5.1 (url) PHP/5.2.13
5 pear.php.net/package/http_request2text/..HTTP_Request2/0.5.2 (url) PHP/5.2.10-2ubuntu6.4
262scoutjet
262 www.scoutjet.com/text/..Mozilla/5.0 (compatible; ScoutJet; url)
261entireweb
258 www.entireweb.com/about/search_tech/speedy_spider/text/..Mozilla/5.0 (Windows; Windows NT 5.1; en-US) Speedy Spider (url)
254yacy
61 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.32-24-generic; java 1.6.0_18; Europe/en) url
33 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.33.8-149.fc13.x86_64; java 1.6.0_18; Europe/en) url
25 yacy.net/bot.html-yacybot (amd64 Linux 2.6.33.8-149.fc13.x86_64; java 1.6.0_18; Europe/en) url
11 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.30-gentoo-r6-090907; java 1.6.0_17; GMT/de) url
11 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.32-24-generic; java 1.6.0_18; Europe/de) url
9 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.26-2-686; java 1.6.0_0; Europe/en) url
4 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.24-23-server; java 1.6.0_20; Europe/de) url
4 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.26-2-amd64; java 1.6.0_20; Europe/en) url
4 yacy.net/bot.htmltext/..yacybot (amd64 Windows Vista 6.0; java 1.6.0_23-ea; Europe/en) url
4 yacy.net/bot.htmltext/..yacybot (amd64 Windows 7 6.1; java 1.6.0_21; Europe/fr) url
4 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.34.7-56.fc13.x86_64; java 1.6.0_18; America/en) url
3 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.34-12-desktop; java 1.6.0_17; Europe/de) url
3 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.32-24-generic; java 1.6.0_18; America/en) url
3 yacy.net/bot.htmltext/..yacybot (amd64 FreeBSD 8.0-RELEASE-p3; java 1.6.0_07; Europe/en) url
3 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.24-1-686; java 1.6.0_0; Europe/en) url
3 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.28.3-vs2.3.0.36.4; java 1.6.0_20; Universal/en) url
3 yacy.net/bot.htmltext/..yacybot (x86 Windows XP 5.1; java 1.6.0_21; Europe/de) url
3 yacy.net/bot.htmltext/..yacybot (amd64 Windows Server 2008 R2 6.1; java 1.6.0_21; Europe/de) url
3 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.32-24-server; java 1.6.0_18; America/en) url
3 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.32-24-generic; java 1.6.0_20; Europe/de) url
3 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.35; java 1.6.0_18; Europe/en) url
3 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.32-24-generic; java 1.6.0_18; Europe/en) url
3 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.18-194.11.3.el5; java 1.6.0_14; Europe/de) url
3 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.32-24-generic; java 1.6.0_18; Europe/fr) url
222majestic12
210 www.majestic12.co.uk/bot.php?text/..Mozilla/5.0 (compatible; MJ12bot/v1.3.3; url)
8 www.majestic12.co.uk/bot.php?text/..Mozilla/5.0 (compatible; MJ12bot/v1.3.2; url)
204wikipedia
147 en.wikipedia.org/wiki/Wikipedia:Huggletext/..Huggle/0.9.6 url
21 en.wikipedia.org/wiki/User:NicoV/Wikipedia_Cleaner/Documentationtext/..WikiCleaner (url)
21 en.wikipedia.org/wiki/Web_crawlertext/..GoogleBot/Nutch-1.0 (Prototype; url; mail address )
5 en.wikipedia.orgtext/..url
3 en.wikipedia.org/wiki/Wikipedia:Huggletext/..Huggle Custom/0.9.6 url
183sogou
168 www.sogou.com/docs/help/webmasters.htm#07text/..Sogou web spider/4.0(url)
12 www.sogou.com/docs/help/webmasters.htm#07image/..Sogou Pic Spider/3.0(url)
179 80legs
117 www.80legs.com/spider.htmltext/..Mozilla/5.0 (compatible; 008/0.83; url) Gecko/2008032620
34 www.80legs.com/spider.htmltext/..Mozilla/5.0 (compatible; 008/0.83; url;) Gecko/2008032620
18 www.80legs.com/spider.htmlimage/..Mozilla/5.0 (compatible; 008/0.83; url) Gecko/2008032620
8 www.80legs.com/webcrawler.htmltext/..Mozilla/5.0 (compatible; 008/0.83; url) Gecko/2008032620
159toolserver
109 wiki.toolserver.org/view/GeoHacktext/..Geohack (url)
38 toolserver.org/~bayo/text/..LudoThecaire/1.0 (url)
4 toolserver.org/~dispenser/text/..WebWikipedia Python/2.6 (url)
3 toolserver.org/~para/cgi-bin/kmlexporttext/..url libwww-perl/5.835
3 toolserver.org/~guandalug/application/vnd.php.serializedGuandalugs PHPWikiBot/1.1 (url;de:User:Guandalug)
153metamoji
152 www.metamoji.com/jp/crawler.htmltext/..Mozilla/5.0 (compatible; MetamojiCrawler/1.0; url
144wikimedia
142 tools.wikimedia.de/~daniel/text/..WikiSense (url)
136goo
130 help.goo.ne.jp/contact/text/..goo wikipedia (url)
3 help.goo.ne.jp/help/article/1142/text/..DoCoMo/2.0 P900i(c100;TB;W24H11) (compatible; ichiro/mobile goo; url)
131sblog
91 fulltext.sblog.cz/screenshot/image/..Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
26 fulltext.sblog.cz/screenshot/text/..Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
13 fulltext.sblog.cz/robot/text/..SeznamBot/2.0 (url)
126ayna
93 www.ayna.comtext/..Mozilla/5.0 (compatible; Ayna url)
33 www.ayna.comtext/..Mozilla/5.0 (compatible; ayna-crawler url)
124wordpress
14 support.wordpress.com/contact/text/..WordPress.com mShots; url
11 almanac2010.wordpress.comtext/..WordPress/MU; url
9 josefboberg.wordpress.comtext/..WordPress/MU; url
8 zosotruthtalk.wordpress.comtext/..WordPress/MU; url
7 benabb.wordpress.comtext/..WordPress/MU; url
5 spacebarshift.wordpress.comtext/..WordPress/MU; url
4 brianakira.wordpress.comtext/..WordPress/MU; url
3 chopshoptopcop.wordpress.comtext/..WordPress/MU; url
3 terrytao.wordpress.comtext/..WordPress/MU; url
102daum
102 ws.daum.net/aboutWebSearch.htmltext/..Mozilla/5.0 (compatible; MSIE or Firefox mutant; not on Windows server; url) Daumoa/2.0
99semager
93 www.semager.de/blog/semager-bots/text/..Mozilla/5.0 (compatible; Semager/1.4; url)
5 www.semager.de/blog/semager-bots/application/jsonMozilla/5.0 (compatible; Semager/1.4; url)
93github
88 github.com/pauldix/typhoeus/tree/mastertext/..Typhoeus - url
4 github.com/edsu/linkypediaapplication/jsonlinkpyediabot v0.1: url
82kosmix
66 www.kosmix.com/html/kosmos.htmlapplication/xmlMozilla/5.0(compatible;Kosmos/1.0;url)
16 www.kosmix.com/html/kosmos.htmltext/..Mozilla/5.0(compatible;Kosmos/1.0;url)
79emining
77 emining.jp/text/..emBot-GalaBuzz/Nutch-1.0 (url; mail address )
74FeedBurner
74 www.FeedBurner.comtext/..FeedBurner/1.0 (url)
68sf
23 liferea.sf.net/text/..Liferea/1.x.x (Linux; es_ES.UTF-8; url)
22 liferea.sf.net/text/..Liferea/0.x.x (Linux; en_US.UTF-8; url)
22 magpierss.sf.nettext/..MagpieRSS/0.7x (url)
68waw
68 dubi.itinfo.waw.plimage/..WordPress/2.8.6; url
62justsystems
62 www.justsystems.com/jp/tech/crawler/text/..JUST-CRAWLER(url)
61newsgator
23 www.newsgator.com/text/..FeedDemon/2.7 (url; Microsoft Windows XP)
22 www.newsgator.comtext/..NewsGatorOnline/2.0 (url; 1 subscribers)
14 www.newsgator.com/Individuals/NetNewsWire/-NetNewsWire/3.2.7 (Mac OS X; url; gzip-happy)
58rubyforge
57 rubyforge.org/projects/mechanize/text/..WWW-Mechanize/0.9.3 (url)
56chug
56 crawler.chug.nettext/..Mozilla/5.0 (compatible; heritrix/3.0.0 url)
54freebase
54 www.freebase.comtext/..metaweb/Nutch-1.0-dev (url; help_at_metaweb.com)
52textdigger
52 textdigger.comtext/..Mozilla/5.0 (url) Gecko/20061208 Firefox/2.0.0.1
51archive-it
42 archive-it.org/files/site-owners.htmltext/..Mozilla/5.0 (compatible;archive.org_bot; Archive-It; url) Firefox
9 archive-it.org/files/site-owners.htmlimage/..Mozilla/5.0 (compatible;archive.org_bot; Archive-It; url) Firefox
49heartrails
18 capture.heartrails.com/text/..Mozilla/5.0 (X11; Linux i686; en-US; rv:1.9.2.9) Gecko/20100913 HeartRails_Capture/1.0.4 (url) Namoroka/3.6.9
14 capture.heartrails.com/image/..Mozilla/5.0 (X11; Linux i686; en-US; rv:1.9.2.9) Gecko/20100913 HeartRails_Capture/1.0.4 (url) Namoroka/3.6.9
8 capture.heartrails.com/text/..Mozilla/5.0 (X11; Linux i686; en-US; rv:1.9.2.8) Gecko/20100730 HeartRails_Capture/1.0.4 (url) Namoroka/3.6.8
7 capture.heartrails.com/image/..Mozilla/5.0 (X11; Linux i686; en-US; rv:1.9.2.8) Gecko/20100730 HeartRails_Capture/1.0.4 (url) Namoroka/3.6.8
48bsurprised
45 bsurprised.com/text/..BSurprised WikiBox 0.1 (url)
3 bsurprised.com/text/..BSurprised WikiBox 0.1.1 (url)
47jetbrains
24 www.jetbrains.com/omea_reader/text/..JetBrains Omea Reader 2.0 Release Candidate 1 (url)
23 www.jetbrains.com/omea_reader/text/..JetBrains Omea Reader 1.0.x (url)
45avantbrowser
23 www.avantbrowser.comtext/..Advanced Browser (url)
22 www.avantbrowser.comtext/..Avant Browser (url)
45feedshow
23 www.feedshow.comtext/..Feedshow/x.0 (url; 1 subscriber)
22 www.feedshow.comtext/..FeedshowOnline (url)
43www.
24 www.text/..GoogleBot/2.1 ( urlGoogleBot.com/bot.html)
10 www.text/..GoogleBot-Image/1.0 ( urlGoogleBot.com/bot.html)
5 www.text/..Google - GoogleBot/2.1 ( urlGoogleBot.com/bot.html)
3 www.text/..GoogleBot/2.1 (urlGoogleBot.com/bot.html)
39weblio
38 www.weblio.jp/text/..Mozilla/5.0 (compatible; WeblioBot; url)
39spinn3r
35 spinn3r.com/robottext/..Mozilla/5.0 (X11; Linux x86_64; en-US; rv:1.9.0.19; aggregator:Spinn3r (Spinn3r 3.1); url) Gecko/2010040121 Firefox/3.0.19
3 spinn3r.com/robot-Mozilla/5.0 (X11; Linux x86_64; en-US; rv:1.9.0.19; aggregator:Spinn3r (Spinn3r 3.1); url) Gecko/2010040121 Firefox/3.0.19
37accelobot
37 www.accelobot.comtext/..Mozilla/5.0 (compatible; heritrix/1.14.3 url)
34hatena
30 a.hatena.ne.jp/helptext/..Hatena Antenna/0.5 (url)
4 mgw.hatena.ne.jp/helptext/..DoCoMo/2.0 D903i(c100;TB;W28H20) (compatible; Hatena-Mobile-Gateway/1.2; url)
33Anonymouse
19 Anonymouse.org/text/..url (Unix)
14 Anonymouse.org/image/..url (Unix)
32snap
30 www.snap.comtext/..Snapbot/1.0 (Snap Shots, url)
30rcdtokyo
23 www.rcdtokyo.com/pc2m/text/..Mozilla/5.0 (compatible; PEAR HTTP_Request class; url)
7 www.rcdtokyo.com/pc2m/image/..Mozilla/5.0 (compatible; PEAR HTTP_Request class; url)
29teesoft
9 www.teesoft.info/image/..Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
7 www.teesoft.info/text/..Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
5 www.teesoft.info/image/..Mozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
26gigablast
26 www.gigablast.com/spider.htmltext/..Gigabot/3.0 (url)
26oneriot
25 www.oneriot.comtext/..Mozilla/5.0 (Windows; Windows NT 5.1; en-US; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 OneRiot/1.0 (url)
25tinyurl
25 tinyurl.com/64t5ntext/..Rome Client (url) Ver: 0.9
25ponderer
25 ponderer.org/download/annotate_google.user.jstext/..annotate_google; url
24alexa
23 www.alexa.com/site/help/webmasterstext/..ia_archiver (url; mail address )
24graemef
24 graemef.comtext/..NewsGator FetchLinks extension/0.2.0 (url)
23blogbridge
23 www.blogbridge.com/text/..BlogBridge 2.13 (url)
23rssreader
23 www.rssreader.comtext/..RssReader/1.0.xx.x (url) Microsoft Windows NT 5.1.2600.0
23timewe
23 timewe.nettext/..CDR/1.7.1 Simulator/0.7(url) Profile/MIDP-1.0 Configuration/CLDC-1.0
23winpodder
23 winpodder.comtext/..WinPodder (url)
23orcabrowser
23 www.orcabrowser.comtext/..Orca Browser (url)
23plagger
23 plagger.org/text/..Plagger/0.x.xx (url)
23kula
23 kula.jp/endotext/..endo/1.0 (Mac OS X; ppc i386; url)
23it-influentials
23 search.it-influentials.com/bot.htmtext/..Mozilla/5.0 (compatible;FindITAnswersbot/1.0;url)
23seebot
23 seebot.orgtext/..Lynx/2.8 (;url)
22zipcommander
22 www.zipcommander.com/text/..1st ZipCommander (Net) - url
22zootycoon
22 www.zootycoon.comtext/..Zoo Tycoon 2 Client -- url
22snarfware
22 www.snarfware.com/text/..Snarfer/0.x.x (url)
22ranchero
22 ranchero.com/netnewswire/text/..NetNewsWire/2.x (Mac OS X; url)
22rssbandit
22 www.rssbandit.orgtext/..RssBandit/1.5.0.10 (WinNT 5.1.2600.0; url) (WinNT 5.1.2600.0; )
22nemui
22 mozshot.nemui.org/text/..Mozilla/5.0 (Gecko/20070310 Mozshot/0.0.20070628; url)
21abonti
21 www.abonti.comtext/..Mozilla/5.0 (compatible; Abonti/0.91 - url)
21puritysearch
21 www.puritysearch.net/text/..Mozilla/5.0 (compatible; Purebot/1.1; url)
21discoveryengine
16 discoveryengine.com/discobot.htmltext/..Mozilla/5.0 (compatible; discobot/1.1; url
5 discoveryengine.com/discobot.htmlimage/..Mozilla/5.0 (compatible; discobot/1.1; url
21feeds4all
21 www.feeds4all.com/feedzcollectortext/..FeedZcollector v1.x (Platinum) url
18froute
14 labs.froute.jp/pc2m/help.htmltext/..Froute Mobile Gateway/1.0 (url)
4 labs.froute.jp/pc2m/help.htmlimage/..Froute Mobile Gateway/1.0 (url)
18gramtrans
18 gramtrans.com/text/..GramTrans (url)
17memidex
17 www.memidex.com/_bottext/..Mozilla/5.0 (compatible; Memibot/1.0; url )
17webperf
5 www.webperf.deimage/..webPerf 0.02 (visit url for more informations)
3 www.webperf.detext/..webPerf 0.02 (visit url for more informations)
3 www.webperf.deimage/..webPerf 0.03 (visit url for more informations)
17mixi
9 mixi.jp/text/..mixi-mobile-converter/1.0 (url)
8 mixi.jp/image/..mixi-mobile-converter/1.0 (url)
16fairshare
9 fairshare.cctext/..Mozilla/5.0 url (X11; FreeBSD i386; en-US; rv:1.2a) Gecko/20021021
5 fairshare.cctext/..Mozilla crawl/5.0 (compatible; fairshare.cc url)
16holmes
16 holmes.getext/..HolmesBot (url)
16vbseo
16 www.vbseo.comtext/..Mozilla/4.0 (vBSEO; url)
15chainn
13 www.chainn.com/mxbot.htmltext/..Mozilla/5.0 (compatible; mxbot/1.0; url)
15bloglines
8 www.bloglines.com-Bloglines/3.1 (url; 1 subscriber)
3 www.bloglines.comtext/..Bloglines/3.1 (url; 1 subscriber)
3 www.bloglines.comapplication/xmlBloglines/3.1 (url; 1 subscriber)
14topsy
14 labs.topsy.com/butterfly/text/..Mozilla/5.0 (compatible; Butterfly/1.0; url) Gecko/2009032608 Firefox/3.0.8
14wise-guys
9 www.wise-guys.nl/text/..Mozilla/4.0 (compatible; Vagabondo/4.0/CGM; url)
3 webagent.wise-guys.nl/text/..Mozilla/4.0 (compatible; Vagabondo/4.0; mail address ; url; http://www.wise-guys.nl/)
14js-kit
14 js-kit.com/text/..JS-Kit URL Resolver, url
14picsearch
11 www.picsearch.com/bot.htmltext/..psbot/0.1 (url)
3 www.picsearch.com/bot.htmlimage/..psbot/0.1 (url)
13turnitin
13 www.turnitin.com/robot/crawlerinfo.htmltext/..TurnitinBot/2.1 (url)
13mediawiki
8 www.mediawiki.org/wiki/Extension:XMLRCtext/..rc2udp.py (url) Python-urllib/1.17
4 www.mediawiki.org/text/..MediaWiki OAI Harvester 0.2 (url)
13search
13 www.search.ch/rim.htmltext/..UltraSpider3000/1.0 (url)
12simplepie
7 simplepie.orgapplication/xmlSimplePie/1.2 (Feed Parser; url; Allow like Gecko) Build/20090627192103
3 simplepie.orgapplication/xmlSimplePie/1.1.1 (Feed Parser; url; Allow like Gecko) Build/20080315205903
12yioop
11 www.yioop.com/bot.phptext/..Mozilla/5.0 (compatible; YioopBot url)
12moose
12 www.moose.at/about.phptext/..Mozilla/5.0 (compatible; Moose/1.2; Linux i686; de; url)
11cogitoergosum
11 cogitoergosum.co.cctext/..WordPress/MU; url
11rohitkhatkar
11 rohitkhatkar.com/text/..RohitKhatkar Spider/Nutch-1.1 (url; mail address )
10linkedin
6 www.linkedin.comtext/..LinkedInBot/1.0 (compatible; Mozilla/5.0; Jakarta Commons-HttpClient/3.1 url)
4 www.linkedin.comimage/..LinkedInBot/1.0 (compatible; Mozilla/5.0; Jakarta Commons-HttpClient/3.1 url)
10etceterra
10 etceterra.org/text/..etceterra.org (url)
53,723 total
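
The grouping by secondary domain (~site) name in the table above could be approximated as in the sketch below. The rule is inferred from the rows of the table and is an assumption, not the documented method of this report: the top-level domain is dropped, generic second-level labels such as co, com and ne are skipped, and the last remaining label is used. The function name site_name and the set GENERIC_LABELS are hypothetical, and the URL is expected without a scheme, as it appears in the table.

```python
# Assumed set of generic second-level labels to skip
# (as in yahoo.co.jp, yahoo.com.cn or goo.ne.jp).
GENERIC_LABELS = {"co", "com", "ne"}


def site_name(url: str) -> str:
    """Derive a site name from a scheme-less URL such as 'help.yahoo.com/help/'."""
    host = url.split("/", 1)[0]
    labels = [label for label in host.split(".") if label]
    labels = labels[:-1]  # drop the top-level domain
    # Skip generic second-level labels, but keep at least one label.
    while len(labels) > 1 and labels[-1] in GENERIC_LABELS:
        labels.pop()
    return labels[-1] if labels else host
```

With these assumptions, site_name("misc.yahoo.com.cn/help.html") gives yahoo and site_name("dubi.itinfo.waw.pl") gives waw, matching the groups shown above.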

Page requests for probable crawlers, recognized by keyword
Count x 1000 | Agent string | Mime type (count ≥ 3)
3,428PythonWikipediaBot/1.0
2,325 application/json
990 application/xml
112 text/..
1 image/..
1 -
1,539ExactusBot-v0.1
1,539 text/..
1,281ClueBot/1.1
1,048 application/vnd.php.serialized
233 text/..
1 -
1,129GoogleBot-Image/1.0
567 text/..
283 image/..
279 -
1 application/pdf
545Mozilla/5.0 (Windows; Windows NT 5.1; fr; rv:1.8.1) VoilaBot BETA 1.2 ( mail address )
545 text/..
1 -
1 image/..
1 application/ogg
1 application/vnd.php.serialized
387LinkParser/2.0
387 text/..
269Answersbot
269 text/..
250Onespot Crawler
177 application/json
73 text/..
243Citation_bot; mail address
243 text/..
211wikiwix-bot-3.0
192 text/..
18 image/..
1 -
206php wikibot classes
189 application/vnd.php.serialized
17 text/..
1 -
190spider
189 text/..
1 image/..
1 -
1 application/json
162Peachy MediaWiki Bot API Version 1.0
157 application/vnd.php.serialized
5 text/..
1 -
150GoogleBot-Image/1.0
138 text/..
12 image/..
1 -
149GoogleBot-News
148 text/..
1 -
125Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (Exabot-Thumbnails)
68 text/..
56 image/..
1 application/x-javascript
1 application/json
101SiocWikiBot/1.0
98 application/vnd.php.serialized
3 text/..
82Pywikipediabot/2.0
82 application/json
79PywikiBot 1.0 mail address
79 text/..
76Mozilla/5.0 (SnapPreviewBot) Gecko/20061206 Firefox/1.5.0.9
41 image/..
35 text/..
1 application/json
1 application/x-javascript
73crawler mail address
73 text/..
55DotNetWikiBot/2.81 (Microsoft Windows NT 6.1.7600.0; )
41 text/..
10 application/xml
4 image/..
1 application/ogg
54MLBot (www.metadatalabs.com/mlbot)
54 text/..
1 -
1 image/..
1 application/vnd.php.serialized
54Mozilla 5.0 (Apibot 0.20)
54 application/vnd.php.serialized
1 text/..
53Opera/8.01 (J2ME/MIDP; MXit WebBot/1.1.4.0) Opera Mini/3.1
26 image/..
25 application/vnd.wap.xhtml+xml
2 text/..
1 -
51plantspedia data crawler
51 text/..
49TheKeens bot
49 text/..
47MoovidaBot/0.1
47 text/..
46dicbot 1.0
46 text/..
44python-wikitools/1.2 (User:Mr.Z-bot)
44 application/json
41ZanranCrawler/0.3 ( mail address )
41 text/..
40Mozilla/5.0 (iPhone; CPU iPhone OS 4_0_1 like Mac OS X; fr-fr) OrangeBot-Mobile AppleWebKit/532.9 (KHTML
37 image/..
3 text/..
40CorenSearchBot/1.5 en libwww-perl/5.834
40 text/..
39Test Webbot
39 text/..
39SineBot/1.5.17(User:SineBot)
38 application/vnd.php.serialized
1 text/..
35MediaWiki::Bot/3.1.6 (User:SporkBot)
35 application/json
32infraEnterprise v8 Web Crawler
28 -
4 text/..
32COIBot/2.0
32 text/..
27GoogleBot
27 text/..
1 image/..
27Jyxobot/1
27 text/..
26Mozilla/5.0 wtvbot/0.7-snapshot
26 text/..
1 -
1 image/..
1 application/opensearchdescription+xml
23Mozilla/5.0 (compatible; SnapPreviewBot; en-US; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9
23 text/..
1 -
23UCMore Crawler App
23 text/..
1 -
23 8qiu-spider/Nutch-1.0 (this is a crawler of 8qiu; www.8qiu.com; mail address )
22 text/..
1 image/..
22Mozilla/5.0 (X11; Linux i686; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7 SnapPreviewBot
22 text/..
20('python-wikitools/1.2 (User:BernsteinBot)',)
20 application/json
20HTMLParser/1.6
19 text/..
1 application/json
1 image/..
20DotNetWikiBot/2.95 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
20 text/..
1 application/xml
19NATE.ROBOT Mozilla/5.0 (Windows; Windows NT 5.1; en-US) AppleWebKit/533.4 KHTML Chrome/5.0.375.125 Safari/533.4
19 text/..
1 application/xml
18MSR-ISRCCrawler
13 text/..
5 image/..
1 application/json
1 application/x-javascript
17DotNetWikiBot/2.72 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
16 text/..
1 application/xml
17AarghBot Linux
17 text/..
16Web search crawler. For details mail address
16 text/..
16MystBot/1.5 fr libwww-perl/5.836
16 text/..
16AnomieBOT 1.0 (OrphanReferenceFixer)
16 application/json
16COIBot/1.00
16 text/..
15Mozilla 5.0 (Apibot 0.20b)
15 application/vnd.php.serialized
15SurakWare MediaWiki Bot/1.0
15 text/..
1 application/xml
14Bot/WP/EN/Daniel/MediationBot1/1.2
14 text/..
14VWBot - CorenSearchBot/1.5 en derivative
14 text/..
13cdoplab spider/Nutch-1.1
13 text/..
13Tawbot (public svn release; plwiki)
13 text/..
12~Bot ([[:fr:w:User:TildeBot]] by [[:fr:w:User:Alphos]] mail address )
12 text/..
11Opera/8.01 (J2ME/MIDP; MXit WebBot/1.1.3.0) Opera Mini/3.1
5 application/vnd.wap.xhtml+xml
5 image/..
1 text/..
11Jbot
11 text/..
11Twitterbot/0.1
11 text/..
1 image/..
9FAST Enterprise Crawler 6 used by ANS ( mail address )
9 text/..
1 -
9Bub's wikibot (Wikibot/2010040100; JWBF/1.2; Java/1.6)
9 text/..
9Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; en-US; rv:1.9.2.8; flipboard.com/crawler rv:0.0.5) Gecko Firefox
7 image/..
2 text/..
8Mozilla/5.0 (X11; Linux x86_64; de-DE; rv:1.9.0.19) Gecko/2010062510 ThumbShotsBot (KFSW 3.0.6-3)
5 image/..
3 text/..
8Bot/WP/EN/Alex_Bakharev/AlexNewArtBot
8 text/..
8ResCompSpider/Nutch-1.1
8 text/..
8ibo2bot
8 text/..
8TVersity Media Robot
8 text/..
8 mail address (Mozilla compatible)
8 text/..
7 4am-spider/1.0
7 text/..
7XLinkBot/1.00
7 text/..
7DotNetWikiBot/2.94 (Microsoft Windows NT 6.1.7600.0; )
7 text/..
1 application/xml
7Mozilla/4.0 (compatible; MT search portal spider/3.0; mail address )"
7 application/xml
1 text/..
7Crawler 0.0
7 text/..
1 image/..
7Mozilla/5.0 (compatible; Windows NT 6.0) Gecko/20090624 Firefox/3.5 NjuiceBot
7 text/..
1 image/..
6sciencebot/1.0
6 text/..
6CaBot Script (running on nightshade.toolserver.org)
6 application/vnd.php.serialized
6Teoma/Nutch-1.0 ( Question and Answer Search; mail address )
6 text/..
6HTMLParser/2.0
6 text/..
1 -
6COMODOspider/Nutch-1.0
6 text/..
6Geni ircpybot 1.0
3 application/json
3 text/..
1 application/xml
6AnomieBOT 1.0 (SourceUploader)
6 application/json
1 text/..
6DotNetWikiBot/2.9 (Microsoft Windows NT 6.1.7600.0; )
4 text/..
2 application/xml
6Mozilla/5.0 (Bgbot 0.5)
6 text/..
6DotNetWikiBot/2.94 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
6 text/..
1 application/xml
5Jabse.com Crawler v.2.0 www.jabse.com/crawler.php
5 text/..
5msramlbot
5 text/..
5Freebase Deathbot
5 text/..
5SONIVIS MediaWiki API Bot 0.1.3
5 text/..
5Casper Bot Search
5 text/..
1 -
1 application/xml
5LPbot/Nutch-1.1
5 text/..
1 image/..
5('python-wikitools/1.2 (User:LaraBot)',)
5 application/json
4herbertBot
4 text/..
4Handelabra WikiBot
3 text/..
1 application/vnd.php.serialized
1 -
4 .NET Client Parser
4 application/xml
4PicselSpider/1.0
4 text/..
4IIT Bombay CFILT NLP Bot/Nutch-1.1 (IITB CFILT Crawler)
4 text/..
1 image/..
4DotNetWikiBot/2.95 (Unix 2.6.32.25; )
3 text/..
1 application/xml
4AnomieBOT 1.0 (AFDMergeFromCleaner)
4 application/json
4DotNetWikiBot/2.9 (Unix 2.6.26.2; )
4 text/..
4unblockbot/1.00
4 text/..
3bitlybot
3 text/..
1 image/..
3Opera/9.80 (J2ME/MIDP; Opera Mini/5.0 (iPhone; CPU iPhone 0S 3.0 like Mac 0S X; en-us; compatible; GoogleBot/19.916; U; en) Presto/2.5.25
2 image/..
1 text/..
3gsa-crawler (Enterprise; S5-EVJT7EMTU8NAB; mail address )
3 text/..
3Opera/9.80 (J2ME/MIDP; Opera Mini/5.0 (iPhone; CPU iPhone 0S 3.0 like Mac 0S X; en-us; compatible; GoogleBot/20.2463; U; en) Presto/2.5.25
2 image/..
1 text/..
3kmccrew Bot Search
3 text/..
3Mozilla/5.0 (Apibot 0.01)
3 application/vnd.php.serialized
3Mybot1
3 text/..
3FAST Enterprise Crawler 6 used by a (a)
3 text/..
3core-I-bot/1.0
3 text/..
1 image/..
3Twib::Crawler
3 text/..
1 image/..
3AnomieBOT 1.0 (RandomPagePicker)
3 application/json
3DotNetWikiBot/2.9 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
3 text/..
3Mozilla/5.0 (compatible; AMZNKAssocBot/4.0)
3 text/..
3HBC Archive Indexerbot 0.9a
3 text/..
3Mozilla/5.0 (iPhone; CPU iPhone OS 4_0_1 like Mac OS X; fr-fr) OrangeBot-Mobile AppleWebKit/532.9 KHTML Version/4.0.5 Mobile/8A306 Safari/( mail address )
3 text/..
3IssueCrawler
3 text/..
3AnomieBOT 1.0 (DeletionSortingCleaner)
3 application/json
1 text/..
12,222 total
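
The layout of the table above (one line per agent string, followed by its counts per mime type) could be produced from the sampled log roughly as sketched here. The input format, a sequence of (agent, mime type) pairs, and the name tabulate are assumptions made for this illustration.

```python
from collections import Counter, defaultdict


def tabulate(records):
    """Group sampled (agent, mime_type) pairs per agent, largest agent first.

    Returns a list of (total, agent, [(mime_type, count), ...]) rows.
    """
    per_agent = defaultdict(Counter)
    for agent, mime_type in records:
        per_agent[agent][mime_type] += 1

    rows = [
        (sum(mimes.values()), agent, mimes.most_common())
        for agent, mimes in per_agent.items()
    ]
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows
```

Since the log is sampled 1:1000, the totals returned here correspond to the "Count x 1000" column: each sampled line stands for about a thousand real requests.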

IP ranges: known IP ranges for Google are 64.233.[160.0-191.255], 66.249.[64.0-95.255], 66.102.[0.0-15.255], 72.14.[192.0-255.255], 74.125.[0.0-255.255], 209.85.[128.0-255.255], 216.239.[32.0-63.255] and a few minor other subranges.
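
As an illustration, a check against the ranges listed above could look like the following Python sketch. It covers only the seven listed ranges (the minor subranges are not known here), writes the 209.85 range without a leading zero, and uses hypothetical function names; the labels google and google? are the group names from the first table.

```python
from ipaddress import IPv4Address

# First and last address of each range listed in this report.
GOOGLE_RANGES = [
    ("64.233.160.0", "64.233.191.255"),
    ("66.249.64.0", "66.249.95.255"),
    ("66.102.0.0", "66.102.15.255"),
    ("72.14.192.0", "72.14.255.255"),
    ("74.125.0.0", "74.125.255.255"),
    ("209.85.128.0", "209.85.255.255"),
    ("216.239.32.0", "216.239.63.255"),
]
_PARSED = [(IPv4Address(first), IPv4Address(last)) for first, last in GOOGLE_RANGES]


def is_google_ip(ip: str) -> bool:
    """Return True if the address falls inside one of the listed ranges."""
    address = IPv4Address(ip)
    return any(first <= address <= last for first, last in _PARSED)


def label_googlebot(user_agent: str, ip: str):
    """Label a GoogleBot request 'google' or 'google?'; None for other agents."""
    if "googlebot" not in user_agent.lower():
        return None
    return "google" if is_google_ip(ip) else "google?"
```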

Generated on Tue, Oct 19, 2010 3:21
Author: Erik Zachte
Mail: ezachte@### (no spam: ### = wikimedia.org)
All data and images on this page are in the public domain.