Wikimedia Traffic Analysis Report - Crawler requests

Daily averages, based on sample period: 1 Oct 2010 - 31 Oct 2010

 This analysis is based on a 1:1000 sampled server log (squids) ⇒ all counts x 1000.

The following overview of crawler (aka bot) page requests is based on the user agent information that accompanies most server requests. Unfortunately this user agent information follows rather loosely defined guidelines.
Also please bear in mind that the most popular crawler names may be somewhat overrepresented. This is the result of so-called user agent spoofing (where a requester supplies false credentials, e.g. to bypass web server filters).
GoogleBot seems to be a favorite for spoofing. Therefore GoogleBot requests from an IP address registered by Google (see below) are color coded differently from GoogleBot requests from any other address.

For this report page requests are considered to be issued by a crawler in two cases:
1 The user agent string contains a web address. Only crawlers should have that, but there are some false positives, where a browser sends a user agent string with a web address (ill-behaved plug-in; the main offenders have been eliminated).
2 The user agent string contains the term bot, spider or crawl[er].
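
The two rules above can be sketched in a few lines of Python. This is only an illustration, not the script that produced this report: the function name `is_crawler` and both regular expressions are assumptions, and in particular the pattern used to recognize a "web address" is a rough approximation. The elimination of known false positives is not modeled.

```python
import re

# Rule 1: the user agent string contains a web address.
# Assumed approximation: anything starting with http://, https:// or www.
WEB_ADDRESS = re.compile(r"(https?://|www\.)[\w.-]+", re.IGNORECASE)

# Rule 2: the user agent string contains the term bot, spider or crawl[er].
CRAWLER_KEYWORD = re.compile(r"bot|spider|crawl", re.IGNORECASE)


def is_crawler(user_agent: str) -> bool:
    """Return True if either of the two rules matches the user agent string."""
    return bool(WEB_ADDRESS.search(user_agent) or CRAWLER_KEYWORD.search(user_agent))


if __name__ == "__main__":
    samples = [
        "Yeti/1.0 (NHN Corp.; http://help.naver.com/robots/)",      # rule 1
        "PythonWikipediaBot/1.0",                                   # rule 2
        "Mozilla/5.0 (Windows NT 6.1) Gecko/20100101 Firefox/3.6",  # neither rule
    ]
    for agent in samples:
        print(is_crawler(agent), agent)
```

A keyword test this loose also matches substrings inside unrelated words, which is one reason the second table speaks of "probable" crawlers.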

In total 42,646,000 page requests per day (mime type text/html only!) are considered crawler requests, out of 374,455,000 external requests, which is 11.4%.
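
As a quick check of the arithmetic, the following Python sketch scales the sampled counts by the 1:1000 sampling factor and recomputes the crawler share. The variable and function names are illustrative assumptions; the two sampled counts are the daily averages quoted above divided by 1000.

```python
# Each line in the 1:1000 sampled log stands for 1000 real requests.
SAMPLING_FACTOR = 1000


def scale(sampled_count: int) -> int:
    """Convert a count from the sampled log into an estimated real count."""
    return sampled_count * SAMPLING_FACTOR


crawler_requests = scale(42_646)     # text/html page requests by crawlers, per day
external_requests = scale(374_455)   # all external requests, per day

share = 100 * crawler_requests / external_requests
print(f"{crawler_requests:,} of {external_requests:,} requests = {share:.1f}%")
```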

Page requests for crawlers that specify a url in the agent string
Count (x 1000) | Secondary domain (~site) name
  Count (x 1000) | URL | Mime type | User agent
14,882 | yahoo
  14,368 | help.yahoo.com/help/us/ysearch/slurp | text/.. | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  110 | misc.yahoo.com.cn/help.html | text/.. | Mozilla/5.0 (compatible; Yahoo! Slurp China; url)
  93 | help.yahoo.com/help/us/ysearch/slurp | text/.. | Mozilla/5.0 (compatible; Yahoo! Slurp; url)
  59 | help.yahoo.com/help/us/ysearch/slurp | image/.. | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  58 | help.yahoo.co.jp/help/jp/search/indexing/indexing-15.html | image/.. | 'Mozilla/5.0 (compatible; Y!J SearchMonkey/1.0 (Y!J-AGENT; url))'
  43 | help.yahoo.com/help/us/ysearch/slurp | text/.. | Mozilla/5.0 (compatible; Yahoo! DE Slurp; url)
  25 | developer.yahoo.com/searchmonkey/useragent | text/.. | Mozilla/5.0 (compatible; Yahoo! SearchMonkey 1.0; url)
  21 | developer.yahoo.com/searchmonkey/useragent | image/.. | Mozilla/5.0 (compatible; Yahoo! SearchMonkey 1.0; url)
  19 | help.yahoo.com/help/us/ysearch/slurp | application/ogg | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  17 | help.yahoo.com/help/us/ysearch/slurp | - | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  16 | help.yahoo.com/help/us/ysearch/crawling/crawling-01.html | text/.. | Nokia6682/2.0 (3.01.1) SymbianOS/8.0 Series60/2.6 Profile/MIDP-2.0 configuration/CLDC-1.1 UP.Link/6.3.0.0.0 (compatible;YahooSeeker/M1A1-R2D2; url)
  15 | help.yahoo.com/help/us/ysearch/slurp | application/vnd.php.serialized | Mozilla/5.0 (compatible Yahoo! Slurp/3.0 url)
  14 | help.yahoo.co.jp/help/jp/search/indexing/indexing-15.html | text/.. | Y!J-BRI/0.0.1 crawler ( url )
  14 | help.yahoo.com/help/us/ysearch/slurp | application/xml | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  3 | help.yahoo.co.jp/help/jp/search/indexing/indexing-15.html | text/.. | 'Mozilla/5.0 (compatible; Y!J SearchMonkey/1.0 (Y!J-AGENT; url))'
8,755 | google
  5,908 | www.google.com/bot.html | text/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  709 | desktop.google.com/ | image/.. | Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; url)
  486 | www.google.com/bot.html | text/.. | DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
  276 | www.google.com/bot.html | - | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  226 | www.google.com/bot.html | image/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  199 | desktop.google.com/ | application/xml | Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; url)
  175 | www.google.com/bot.html | text/.. | SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
  120 | www.google.com/feedfetcher.html | - | FeedFetcher-Google; (url)
  73 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: npiv82)
  66 | www.google.com/feedfetcher.html | text/.. | Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
  64 | code.google.com/appengine | application/xml | AppEngine-Google; (url; appid: wikipedia-raw)
  51 | www.google.com/feedfetcher.html | application/xml | FeedFetcher-Google; (url)
  40 | desktop.google.com/ | text/.. | Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; url)
  35 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: ortografia4)
  27 | www.google.com/feedfetcher.html | application/json | Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
  25 | www.google.com/bot.html | image/.. | DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
  22 | www.google.com/feedfetcher.html | text/.. | FeedFetcher-Google; (url)
  21 | www.google.com/feedfetcher.html | application/xml | Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
  17 | www.google.com/bot.html | text/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url),gzip(gfe) AppEngine-Google; (http://code.google.com/appengine; appid: image-proxy2)
  15 | www.google.com/bot.html | image/.. | SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
  13 | www.google.com/bot.html | text/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url),gzip(gfe) AppEngine-Google; (http://code.google.com/appengine; appid: jpg-images)
  13 | www.google.com/bot.html | text/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url),gzip(gfe) AppEngine-Google; (http://code.google.com/appengine; appid: gif-images)
  12 | www.google.com/feedfetcher.html | image/.. | Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
  12 | www.google.com/bot.html | text/.. | GoogleBot/2.1 (url)
  12 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: boxapp)
  10 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: ortopedianew)
  10 | code.google.com/p/crawler4j/ | text/.. | crawler4j (url)
  9 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: aneproxy)
  8 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: good-proxy)
  8 | code.google.com/appengine | text/.. | oohEmbed.com AppEngine-Google; (url; appid: oohembed)
  7 | docs.google.com | image/.. | Mozilla/5.0 (compatible; GoogleDocs; url)
  7 | desktop.google.com/ | image/.. | Mozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
  6 | code.google.com/appengine | application/json | MWBOT GAE Edition AppEngine-Google; (url; appid: philip-bot)
  5 | desktop.google.com/ | - | Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; url)
  5 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: lullar-data),gzip(gfe) (via translate.google.com)
  5 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: nwikiproxy)
  4 | code.google.com/appengine | text/.. | Mozilla/5.0 (Windows; Windows NT 5.1; zh-CN; rv:1.8.1.14) Gecko/20080404 (FoxPlus) Firefox/2.0.0.14 AppEngine-Google; (url; appid: 1000hottest)
  4 | code.google.com/appengine | text/.. | WikiBot/0.1 AppEngine-Google; (url; appid: newikipedia)
  3 | code.google.com/p/crawler4j/ | image/.. | crawler4j (url)
  3 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: findadvise)
  3 | desktop.google.com/ | application/xml | Mozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
  3 | www.google.com/coop/cse/cref | text/.. | FeedFetcher-Google-CoOp; (url)
7,326 | facebook
  4,607 | www.facebook.com/externalhit_uatext.php | image/.. | facebookexternalhit/1.0 (url)
  2,397 | www.facebook.com/externalhit_uatext.php | text/.. | facebookexternalhit/1.0 (url)
  270 | www.facebook.com/externalhit_uatext.php | text/.. | facebookexternalhit/1.1 (url)
  26 | www.facebook.com/externalhit_uatext.php | - | facebookexternalhit/1.0 (url)
  14 | developers.facebook.com | image/.. | facebookplatform/1.0 (url)
  9 | www.facebook.com/externalhit_uatext.php | image/.. | facebookexternalhit/1.1 (url)
  3 | developers.facebook.com | text/.. | facebookplatform/1.0 (url)
3,055 | google?
  2,844 | www.google.com/bot.html | text/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  68 | www.google.com/bot.html | text/.. | GoogleBot/2.1 (url)
  46 | www.google.com/bot.html | text/.. | DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
  26 | www.google.com/bot.html | text/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  18 | www.google.com/bot.html | application/vnd.php.serialized | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  12 | www.google.com/bot.html | text/.. | SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
  12 | www.google.com/bot.html | - | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  11 | www.google.com/bot.html | image/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  7 | www.google.com/bot.html | application/xml | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
2,107 | msn
  1,112 | search.msn.com/msnbot.htm | text/.. | msnbot/2.0b (url)
  425 | search.msn.com/msnbot.htm | - | msnbot/2.0b (url)
  355 | search.msn.com/msnbot.htm | text/.. | msnbot/2.0b (url)._
  77 | search.msn.com/msnbot.htm | text/.. | msnbot-media/1.1 (url)
  70 | search.msn.com/msnbot.htm | image/.. | msnbot-media/1.1 (url)
  39 | search.msn.com/msnbot.htm | text/.. | msnbot-Products/1.0 (url)
  8 | search.msn.com/msnbot.htm | text/.. | msnbot-NewsBlogs/2.0b (url)
  7 | search.msn.com/msnbot.htm | image/.. | msnbot/2.0b (url)
  5 | search.msn.com/msnbot.htm | text/.. | msnbot-UDiscovery/2.0b (url)
  3 | search.msn.com/msnbot.htm | application/xml | msnbot/2.0b (url)._
1,648 | naver
  1,549 | help.naver.com/robots/ | text/.. | Yeti/1.0 (NHN Corp.; url)
  53 | help.naver.com/robots/ | image/.. | Yeti/1.0 (NHN Corp.; url)
  34 | help.naver.com/delete_main.asp | text/.. | Mozilla/4.0 (compatible; NaverBot/1.0; url)
  9 | help.naver.com/customer_webtxt_02.jsp | text/.. | Mozilla/4.0 (compatible; NaverBot/1.0; url)
1,466 | bing
  1,142 | www.bing.com/bingbot.htm | text/.. | Mozilla/5.0 (compatible; bingbot/2.0; url)
  313 | www.bing.com/bingbot.htm | - | Mozilla/5.0 (compatible; bingbot/2.0; url)
  9 | www.bing.com/bingbot.htm | image/.. | Mozilla/5.0 (compatible; bingbot/2.0; url)
1,198 | yandex
  859 | yandex.com/bots | text/.. | Mozilla/5.0 (compatible; YandexBot/3.0; url)
  193 | yandex.com/bots | text/.. | Mozilla/5.0 (compatible; YandexImages/3.0; url)
  98 | yandex.com/bots | image/.. | Mozilla/5.0 (compatible; YandexImages/3.0; url)
  13 | yandex.com/bots | - | Mozilla/5.0 (compatible; YandexBot/3.0; url)
  12 | yandex.com/bots | image/.. | Mozilla/5.0 (compatible; YandexImageResizer/2.0; url)
  7 | yandex.com/bots | - | Mozilla/5.0 (compatible; YandexImages/3.0; url)
  7 | yandex.com/bots | text/.. | Mozilla/5.0 (compatible; YandexBot/3.0; MirrorDetector; url)
1,008 | baidu
  582 | www.baidu.com/search/spider.htm | text/.. | Baiduspider(url)
  362 | www.baidu.jp/spider/ | text/.. | Baiduspider(url)
  22 | www.baidu.jp/spider/ | text/.. | DoCoMo/2.0 P05A(c100;TB;W24H15) (compatible; BaiduMobaider/1.0;url)
  14 | www.baidu.jp/spider/ | text/.. | BaiduImagespider(url)
  8 | www.baidu.jp/spider/ | - | Baiduspider(url)
  8 | www.baidu.jp/spider/ | application/xml | Baiduspider(url)
  6 | www.baidu.com/search/spider.htm | - | Baiduspider(url)
513 | ask
  460 | about.ask.com/en/docs/about/webmasters.shtml | text/.. | Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
  45 | about.ask.com/en/docs/about/webmasters.shtml | - | Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
  5 | about.ask.com/en/docs/about/webmasters.shtml | text/.. | Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url) (via Web-Blaster/2.21 (http://www.assoziations-blaster.de/web-blast.html))
425 | youdao
  347 | www.youdao.com/help/webmaster/spider/ | text/.. | Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
  39 | www.youdao.com/help/webmaster/spider/ | text/.. | Mozilla/5.0 (compatible;YodaoBot-Image/1.0;url;)
  15 | www.youdao.com/help/webmaster/spider/ | image/.. | Mozilla/5.0 (compatible;YodaoBot-Image/1.0;url;)
  13 | www.youdao.com/help/webmaster/spider/ | text/.. | Mozilla/5.0 (compatible; YodaoBot/1.0; url; )
  8 | www.youdao.com/help/webmaster/spider/ | - | Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
  3 | toolbar.youdao.com/ | image/.. | Youdao Toolbar (url)
389 | soso
  373 | help.soso.com/webspider.htm | text/.. | Sosospider(url)
  8 | help.soso.com/soso-blog-spider.htm | text/.. | Sosoblogspider(url)
  6 | help.soso.com/webspider.htm | - | Sosospider(url)
382 | exabot
  197 | www.exabot.com/go/robot | text/.. | Mozilla/5.0 (compatible; Exabot/3.0; url)
  174 | www.exabot.com/go/robot | text/.. | Mozilla/5.0 (compatible; Exabot/3.0 (BiggerBetter); url)
  11 | www.exabot.com/go/robot | - | Mozilla/5.0 (compatible; Exabot/3.0; url)
347 | majestic12
  328 | www.majestic12.co.uk/bot.php? | text/.. | Mozilla/5.0 (compatible; MJ12bot/v1.3.3; url)
  11 | www.majestic12.co.uk/bot.php? | text/.. | Mozilla/5.0 (compatible; MJ12bot/v1.3.2; url)
  6 | www.majestic12.co.uk/bot.php? | text/.. | User-Agent :Mozilla/5.0 (compatible; MJ12bot/v1.3.3; url)
329 | traslated
  329 | mymemory.traslated.net/doc/ | text/.. | Mozilla/5.0 (MyMemory Bot url)
260 | entireweb
  256 | www.entireweb.com/about/search_tech/speedy_spider/ | text/.. | Mozilla/5.0 (Windows; Windows NT 5.1; en-US) Speedy Spider (url)
  3 | www.entireweb.com/about/search_tech/speedy_spider/ | - | Mozilla/5.0 (Windows; Windows NT 5.1; en-US) Speedy Spider (url)
239 | scoutjet
  239 | www.scoutjet.com/ | text/.. | Mozilla/5.0 (compatible; ScoutJet; url)
227 | sogou
  212 | www.sogou.com/docs/help/webmasters.htm#07 | text/.. | Sogou web spider/4.0(url)
  12 | www.sogou.com/docs/help/webmasters.htm#07 | image/.. | Sogou Pic Spider/3.0(url)
216 | php
  73 | pear.php.net/ | application/vnd.php.serialized | PEAR HTTP_Request class ( url )
  63 | pear.php.net/ | application/xml | PEAR HTTP_Request class ( url )
  49 | pear.php.net/ | text/.. | PEAR HTTP_Request class ( url )
  29 | pear.php.net/package/http_request2 | text/.. | HTTP_Request2/0.5.2 (url) PHP/5.2.14
211 | wordpress
  33 | driwancybermuseum.wordpress.com | text/.. | WordPress/MU; url
  29 | josefboberg.wordpress.com | text/.. | WordPress/MU; url
  15 | zosotruthtalk.wordpress.com | text/.. | WordPress/MU; url
  8 | bwfloor.wordpress.com | text/.. | WordPress/MU; url
  6 | almanac2010.wordpress.com | text/.. | WordPress/MU; url
  5 | benabb.wordpress.com | text/.. | WordPress/MU; url
  5 | spacebarshift.wordpress.com | text/.. | WordPress/MU; url
  4 | hershlagdesign.wordpress.com | text/.. | WordPress/MU; url
  4 | mannaismayaadventure.wordpress.com | text/.. | WordPress/MU; url
  3 | warorpeace.wordpress.com | text/.. | WordPress/MU; url
  3 | nikolaykot.wordpress.com | text/.. | WordPress/MU; url
  3 | worldwright.wordpress.com | text/.. | WordPress/MU; url
  3 | support.wordpress.com/contact/ | text/.. | WordPress.com mShots; url
  3 | syiahali.wordpress.com | text/.. | WordPress/MU; url
  3 | kterrl.wordpress.com | text/.. | WordPress/MU; url
  3 | bladecyberpunk.wordpress.com | text/.. | WordPress/MU; url
  3 | hongkongwillie.wordpress.com | text/.. | WordPress/MU; url
199 | wikipedia
  74 | en.wikipedia.org/wiki/Wikipedia:Huggle | text/.. | Huggle/0.9.8 url
  55 | en.wikipedia.org/wiki/Wikipedia:Huggle | text/.. | Huggle/0.9.6 url
  19 | en.wikipedia.org/wiki/Wikipedia:Huggle | text/.. | Huggle/0.9.7 url
  19 | en.wikipedia.org/wiki/User:NicoV/Wikipedia_Cleaner/Documentation | text/.. | WikiCleaner (url)
  10 | en.wikipedia.org | text/.. | url
  7 | en.wikipedia.org/wiki/Web_crawler | text/.. | GoogleBot/Nutch-1.0 (Prototype; url; mail address )
  5 | en.wikipedia.org/wiki/Wikipedia:Huggle | text/.. | Huggle/0.9.9 url
  3 | fr.wikipedia.org/wiki/Utilisateur:Salebot | application/json | Salebot, see url (uses Perl MediaWiki::API)
186 | mnemoo
  139 | www.mnemoo.com/about/spider | text/.. | Mnemoo Intelligent Spider/0.1alpha (compatible; See url)
  47 | www.mnemoo.com/about/spider | text/.. | Mnemoo WikiSearch Spider/0.1alpha (compatible; See url)
168 | toolserver
  119 | wiki.toolserver.org/view/GeoHack | text/.. | Geohack (url)
  39 | toolserver.org/~bayo/ | text/.. | LudoThecaire/1.0 (url)
  4 | toolserver.org/~dispenser/ | text/.. | WebWikipedia Python/2.6 (url)
  3 | toolserver.org/~para/cgi-bin/kmlexport | text/.. | url libwww-perl/5.835
157 | 80legs
  103 | www.80legs.com/webcrawler.html | text/.. | Mozilla/5.0 (compatible; 008/0.83; url) Gecko/2008032620
  52 | www.80legs.com/webcrawler.html | image/.. | Mozilla/5.0 (compatible; 008/0.83; url) Gecko/2008032620
153 | yacy
  26 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.32-25-generic; java 1.6.0_18; Europe/en) url
  10 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.32-25-generic; java 1.6.0_18; America/en) url
  9 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.18-028stab070.4; java 1.6.0_0; Europe/en) url
  8 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.32-25-generic; java 1.6.0_18; Europe/en) url
  6 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.32-25-generic; java 1.6.0_18; Europe/de) url
  6 | yacy.net/bot.html | text/.. | yacybot (amd64 Windows 7 6.1; java 1.6.0_21; Europe/fr) url
  5 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.35-22-generic; java 1.6.0_22; Europe/fr) url
  5 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.18-194.11.3.el5xen; java 1.6.0_0; Europe/en) url
  4 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.24-etchnhalf.1-amd64; java 1.5.0_14; Europe/en) url
  4 | yacy.net/bot.html | text/.. | yacybot (amd64 FreeBSD 8.0-RELEASE-p3; java 1.6.0_07; Europe/en) url
  3 | yacy.net/bot.html | text/.. | yacybot (amd64 FreeBSD 8.0-RELEASE; java 1.6.0_07; Europe/ru) url
  3 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.30-gentoo-r6-090907; java 1.6.0_17; GMT/de) url
  3 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.32-25-generic-pae; java 1.6.0_18; Europe/en) url
  3 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.31-22-server; java 1.6.0_20; Europe/en) url
  3 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.26-2-amd64; java 1.6.0_20; Europe/en) url
  3 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.35-22-generic; java 1.6.0_20; Europe/en) url
137 | wikimedia
  135 | tools.wikimedia.de/~daniel/ | text/.. | WikiSense (url)
123 | goo
  112 | help.goo.ne.jp/contact/ | text/.. | goo wikipedia (url)
  5 | help.goo.ne.jp/help/article/1142/ | text/.. | DoCoMo/2.0 P900i(c100;TB;W24H11) (compatible; ichiro/mobile goo; url)
111 | sblog
  59 | fulltext.sblog.cz/screenshot/ | image/.. | Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
  20 | fulltext.sblog.cz/robot/ | text/.. | SeznamBot/2.0 (url)
  18 | fulltext.sblog.cz/screenshot/ | text/.. | Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
  13 | fulltext.sblog.cz/ | text/.. | SeznamBot/3.0-alpha (url)
109 | semager
  83 | www.semager.de/blog/semager-bots/ | text/.. | Mozilla/5.0 (compatible; Semager/1.4; url)
  25 | www.semager.de/blog/semager-bots/ | application/json | Mozilla/5.0 (compatible; Semager/1.4; url)
107 | ayna
  107 | www.ayna.com | text/.. | Mozilla/5.0 (compatible; Ayna url)
101 | daum
  101 | ws.daum.net/aboutWebSearch.html | text/.. | Mozilla/5.0 (compatible; MSIE or Firefox mutant; not on Windows server; url) Daumoa/2.0
83 | freebase
  82 | www.freebase.com | text/.. | metaweb/Nutch-1.0-dev (url; help_at_metaweb.com)
77 | kosmix
  68 | www.kosmix.com/html/kosmos.html | application/xml | Mozilla/5.0(compatible;Kosmos/1.0;url)
  9 | www.kosmix.com/html/kosmos.html | text/.. | Mozilla/5.0(compatible;Kosmos/1.0;url)
74 | sf
  25 | magpierss.sf.net | text/.. | MagpieRSS/0.7x (url)
  24 | liferea.sf.net/ | text/.. | Liferea/0.x.x (Linux; en_US.UTF-8; url)
  24 | liferea.sf.net/ | text/.. | Liferea/1.x.x (Linux; es_ES.UTF-8; url)
65 | textdigger
  65 | textdigger.com | text/.. | Mozilla/5.0 (url) Gecko/20061208 Firefox/2.0.0.1
64 | newsgator
  24 | www.newsgator.com/ | text/.. | FeedDemon/2.7 (url; Microsoft Windows XP)
  23 | www.newsgator.com | text/.. | NewsGatorOnline/2.0 (url; 1 subscribers)
  15 | www.newsgator.com/Individuals/NetNewsWire/ | - | NetNewsWire/3.2.7 (Mac OS X; url; gzip-happy)
63 | emining
  61 | emining.jp/ | text/.. | emBot-GalaBuzz/Nutch-1.0 (url; mail address )
57 | dotnetdotcom
  57 | www.dotnetdotcom.org/ | text/.. | Mozilla/5.0 (compatible; DotBot/1.1; url, mail address )
57 | moose
  57 | www.moose.at/about.php | text/.. | Mozilla/5.0 (compatible; Moose/1.2; Linux i686; de; url)
55 | weblio
  51 | www.weblio.jp/ | text/.. | Mozilla/5.0 (compatible; WeblioBot; url)
  3 | www.weblio.jp/info/crawler.jsp | image/.. | Mozilla/5.0 (compatible; Webliobot/0.1; url)
54 | heartrails
  27 | capture.heartrails.com/ | text/.. | Mozilla/5.0 (X11; Linux i686; en-US; rv:1.9.2.9) Gecko/20100913 HeartRails_Capture/1.0.4 (url) Namoroka/3.6.9
  27 | capture.heartrails.com/ | image/.. | Mozilla/5.0 (X11; Linux i686; en-US; rv:1.9.2.9) Gecko/20100913 HeartRails_Capture/1.0.4 (url) Namoroka/3.6.9
54 | chug
  54 | crawler.chug.net | text/.. | Mozilla/5.0 (compatible; heritrix/3.0.0 url)
53 | Anonymouse
  41 | Anonymouse.org/ | text/.. | url (Unix)
  12 | Anonymouse.org/ | image/.. | url (Unix)
53 | bsurprised
  39 | bsurprised.com/ | text/.. | BSurprised WikiBox 0.1 (url)
  12 | bsurprised.com/ | text/.. | BSurprised WikiBox 0.1.3 (url)
52 | feedshow
  26 | www.feedshow.com | text/.. | FeedshowOnline (url)
  26 | www.feedshow.com | text/.. | Feedshow/x.0 (url; 1 subscriber)
51 | avantbrowser
  25 | www.avantbrowser.com | text/.. | Advanced Browser (url)
  25 | www.avantbrowser.com | text/.. | Avant Browser (url)
49 | www.
  22 | www. | text/.. | GoogleBot/2.1 ( urlGoogleBot.com/bot.html)
  13 | www. | text/.. | GoogleBot-Image/1.0 ( urlGoogleBot.com/bot.html)
  8 | www. | text/.. | Google - GoogleBot/2.1 ( urlGoogleBot.com/bot.html)
  4 | www. | text/.. | GoogleBot/2.1 (urlGoogleBot.com/bot.html)
48 | jetbrains
  24 | www.jetbrains.com/omea_reader/ | text/.. | JetBrains Omea Reader 1.0.x (url)
  24 | www.jetbrains.com/omea_reader/ | text/.. | JetBrains Omea Reader 2.0 Release Candidate 1 (url)
48 | FeedBurner
  47 | www.FeedBurner.com | text/.. | FeedBurner/1.0 (url)
45 | mia
  45 | www.mia.am/bot/ | text/.. | Mozilla/5.0 (compatible; Miabot/1.0; url)
44 | cogitoergosum
  44 | cogitoergosum.co.cc | text/.. | WordPress/MU; url
44 | yioop
  43 | www.yioop.com/bot.php | text/.. | Mozilla/5.0 (compatible; YioopBot url)
43 | discoveryengine
  33 | discoveryengine.com/discobot.html | text/.. | Mozilla/5.0 (compatible; discobot/1.1; url
  7 | discoveryengine.com/discobot.html | image/.. | Mozilla/5.0 (compatible; discobot/1.1; url
  3 | discoveryengine.com/discobot.html | application/ogg | Mozilla/5.0 (compatible; discobot/1.1; url
37 | hatena
  34 | a.hatena.ne.jp/help | text/.. | Hatena Antenna/0.5 (url)
  3 | mgw.hatena.ne.jp/help | text/.. | DoCoMo/2.0 D903i(c100;TB;W28H20) (compatible; Hatena-Mobile-Gateway/1.2; url)
36 | spinn3r
  30 | spinn3r.com/robot | text/.. | Mozilla/5.0 (X11; Linux x86_64; en-US; rv:1.9.0.19; aggregator:Spinn3r (Spinn3r 3.1); url) Gecko/2010040121 Firefox/3.0.19
  4 | spinn3r.com/robot | - | Mozilla/5.0 (X11; Linux x86_64; en-US; rv:1.9.0.19; aggregator:Spinn3r (Spinn3r 3.1); url) Gecko/2010040121 Firefox/3.0.19
31 | rcdtokyo
  21 | www.rcdtokyo.com/pc2m/ | text/.. | Mozilla/5.0 (compatible; PEAR HTTP_Request class; url)
  10 | www.rcdtokyo.com/pc2m/ | image/.. | Mozilla/5.0 (compatible; PEAR HTTP_Request class; url)
28 | github
  26 | github.com/pauldix/typhoeus/tree/master | text/.. | Typhoeus - url
27 | tinyurl
  27 | tinyurl.com/64t5n | text/.. | Rome Client (url) Ver: 0.9
27 | it-influentials
  27 | search.it-influentials.com/bot.htm | text/.. | Mozilla/5.0 (compatible;FindITAnswersbot/1.0;url)
26 | zootycoon
  26 | www.zootycoon.com | text/.. | Zoo Tycoon 2 Client -- url
26 | winpodder
  26 | winpodder.com | text/.. | WinPodder (url)
26 | orcabrowser
  26 | www.orcabrowser.com | text/.. | Orca Browser (url)
26 | kula
  26 | kula.jp/endo | text/.. | endo/1.0 (Mac OS X; ppc i386; url)
26 | ponderer
  26 | ponderer.org/download/annotate_google.user.js | text/.. | annotate_google; url
26 | seebot
  26 | seebot.org | text/.. | Lynx/2.8 (;url)
25 | abonti
  25 | www.abonti.com | text/.. | Mozilla/5.0 (compatible; Abonti/0.91 - url)
25 | teesoft
  9 | www.teesoft.info/ | image/.. | Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
  6 | www.teesoft.info/ | text/.. | Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
  4 | www.teesoft.info/ | image/.. | Mozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
25 | timewe
  25 | timewe.net | text/.. | CDR/1.7.1 Simulator/0.7(url) Profile/MIDP-1.0 Configuration/CLDC-1.0
25 | snarfware
  25 | www.snarfware.com/ | text/.. | Snarfer/0.x.x (url)
25 | rssbandit
  25 | www.rssbandit.org | text/.. | RssBandit/1.5.0.10 (WinNT 5.1.2600.0; url) (WinNT 5.1.2600.0; )
24 | blogbridge
  24 | www.blogbridge.com/ | text/.. | BlogBridge 2.13 (url)
24 | zipcommander
  24 | www.zipcommander.com/ | text/.. | 1st ZipCommander (Net) - url
24 | gigablast
  24 | www.gigablast.com/spider.html | text/.. | Gigabot/3.0 (url)
24 | ranchero
  24 | ranchero.com/netnewswire/ | text/.. | NetNewsWire/2.x (Mac OS X; url)
24 | graemef
  24 | graemef.com | text/.. | NewsGator FetchLinks extension/0.2.0 (url)
24 | nemui
  24 | mozshot.nemui.org/ | text/.. | Mozilla/5.0 (Gecko/20070310 Mozshot/0.0.20070628; url)
23 | rssreader
  23 | www.rssreader.com | text/.. | RssReader/1.0.xx.x (url) Microsoft Windows NT 5.1.2600.0
23 | plagger
  23 | plagger.org/ | text/.. | Plagger/0.x.xx (url)
23 | oneriot
  22 | www.oneriot.com | text/.. | Mozilla/5.0 (Windows; Windows NT 5.1; en-US; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 OneRiot/1.0 (url)
23 | feeds4all
  23 | www.feeds4all.com/feedzcollector | text/.. | FeedZcollector v1.x (Platinum) url
22 | puritysearch
  22 | www.puritysearch.net/ | text/.. | Mozilla/5.0 (compatible; Purebot/1.1; url)
22 | webperf
  10 | www.webperf.de | image/.. | webPerf/0.05 (url)
  4 | www.webperf.de | text/.. | webPerf/0.05 (url)
  3 | www.webperf.de | image/.. | Firefox/3.6.10 webPerf/0.03 (url)
  3 | www.webperf.de | text/.. | Firefox/3.6.10 webPerf/0.03 (url)
21 | fairshare
  16 | fairshare.cc | text/.. | Mozilla/5.0 url (X11; FreeBSD i386; en-US; rv:1.2a) Gecko/20021021
  4 | fairshare.cc | text/.. | Mozilla crawl/5.0 (compatible; fairshare.cc url)
21 | gulliway
  16 | www.gulliway.org/welcome.html | application/xml | Mozzila/5.0 (Windows NT 5.1; GulliwayBot/01 url)
  5 | www.gulliway.org/welcome.html | text/.. | Mozzila/5.0 (Windows NT 5.1; GulliwayBot/01 url)
20 | alexa
  20 | www.alexa.com/site/help/webmasters | text/.. | ia_archiver (url; mail address )
19 | metamoji
  19 | www.metamoji.com/jp/crawler.html | text/.. | Mozilla/5.0 (compatible; MetamojiCrawler/1.0; url
18 | froute
  14 | labs.froute.jp/pc2m/help.html | text/.. | Froute Mobile Gateway/1.0 (url)
  4 | labs.froute.jp/pc2m/help.html | image/.. | Froute Mobile Gateway/1.0 (url)
17 | archive-it
  11 | archive-it.org/files/site-owners.html | image/.. | Mozilla/5.0 (compatible;archive.org_bot; Archive-It; url) Firefox
  6 | archive-it.org/files/site-owners.html | text/.. | Mozilla/5.0 (compatible;archive.org_bot; Archive-It; url) Firefox
17 | mixi
  9 | mixi.jp/ | text/.. | mixi-mobile-converter/1.0 (url)
  8 | mixi.jp/ | image/.. | mixi-mobile-converter/1.0 (url)
17 | holmes
  17 | holmes.ge | text/.. | HolmesBot (url)
17 | blx
  17 | www.blx.pl/crawler | text/.. | Mozilla/5.0 (compatible; heritrix/2.0.2 url)
16 | bloglines
  8 | www.bloglines.com | - | Bloglines/3.1 (url; 1 subscriber)
  3 | www.bloglines.com | text/.. | Bloglines/3.1 (url; 1 subscriber)
  3 | www.bloglines.com | application/xml | Bloglines/3.1 (url; 1 subscriber)
15 | turnitin
  15 | www.turnitin.com/robot/crawlerinfo.html | text/.. | TurnitinBot/2.1 (url)
15 | topsy
  15 | labs.topsy.com/butterfly/ | text/.. | Mozilla/5.0 (compatible; Butterfly/1.0; url) Gecko/2009032608 Firefox/3.0.8
15 | snap
  15 | www.snap.com | text/.. | Snapbot/1.0 (Snap Shots, url)
14 | wise-guys
  9 | www.wise-guys.nl/ | text/.. | Mozilla/4.0 (compatible; Vagabondo/4.0/CGM; url)
  3 | webagent.wise-guys.nl/ | text/.. | Mozilla/4.0 (compatible; Vagabondo/4.0; mail address ; url; http://www.wise-guys.nl/)
14 | gramtrans
  14 | gramtrans.com/ | text/.. | GramTrans (url)
14 | js-kit
  14 | js-kit.com/ | text/.. | JS-Kit URL Resolver, url
14 | picsearch
  10 | www.picsearch.com/bot.html | text/.. | psbot/0.1 (url)
  4 | www.picsearch.com/bot.html | image/.. | psbot/0.1 (url)
13 | chainn
  11 | www.chainn.com/mxbot.html | text/.. | Mozilla/5.0 (compatible; mxbot/1.0; url)
13 | simplepie
  8 | simplepie.org | application/xml | SimplePie/1.2 (Feed Parser; url; Allow like Gecko) Build/20090627192103
  4 | simplepie.org | application/xml | SimplePie/1.1.1 (Feed Parser; url; Allow like Gecko) Build/20080315205903
13 | flipboard
  4 | flipboard.com/browserproxy | image/.. | Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6 (FlipboardBrowserProxy/0.0.5; url)
  4 | flipboard.com/browserproxy | text/.. | Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6 (FlipboardBrowserProxy/1.1; url)
  3 | flipboard.com/crawler | text/.. | Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6 (url)
12 | seoprofiler
  12 | www.seoprofiler.com/bot | text/.. | Mozilla/5.0 (compatible; spbot/2.1; url )
11 | globalspec
  11 | www.globalspec.com/Ocelli | text/.. | Ocelli/1.4 (url)
11 | mediawiki
  9 | www.mediawiki.org/ | text/.. | MediaWiki OAI Harvester 0.2 (url)
11 | creativecommons
  11 | wiki.creativecommons.org/Metadata_Scraper | text/.. | CC Metadata Scaper url
10 | setooz
  10 | www.setooz.com/bot.html | text/.. | Semantifire/0.20 ( compatible; SETOOZBOT/0.30 ; url ; mail address )
10 | memidex
  10 | www.memidex.com/_bot | text/.. | Mozilla/5.0 (compatible; Memibot/1.0; url )
10 | example
  10 | example.com | text/.. | Phonetics (url)
10 | search
  10 | www.search.ch/rim.html | text/.. | UltraSpider3000/1.0 (url)
10 | justsystems
  10 | www.justsystems.com/jp/tech/crawler/ | text/.. | JUST-CRAWLER(url)
10 | rockpeaks
  10 | www.rockpeaks.com/contact | text/.. | RockPeaks/0.1 (url)
10 | vbseo
  10 | www.vbseo.com | text/.. | Mozilla/4.0 (vBSEO; url)
48,961 | total

Page requests for probable crawlers, recognized by keyword
Count (x 1000) | Agent string
  Count (x 1000) | Mime type (count ≥ 3)
3,425 | PythonWikipediaBot/1.0
  2,398 | application/json
  867 | application/xml
  159 | text/..
  1 | image/..
  1 | -
989 | GoogleBot-Image/1.0
  467 | text/..
  267 | image/..
  255 | -
576 | ClueBot/1.1
  435 | application/vnd.php.serialized
  141 | text/..
533 | Mozilla/5.0 (Windows; Windows NT 5.1; fr; rv:1.8.1) VoilaBot BETA 1.2 ( mail address )
  532 | text/..
  1 | -
  1 | application/vnd.php.serialized
310 | LinkParser/2.0
  310 | text/..
277 | Answersbot
  277 | text/..
266 | php wikibot classes
  259 | application/vnd.php.serialized
  7 | text/..
  1 | -
241 | Onespot Crawler
  177 | application/json
  64 | text/..
  1 | -
211 | Peachy MediaWiki Bot API Version 1.0
  205 | application/vnd.php.serialized
  6 | text/..
208 | wikiwix-bot-3.0
  183 | text/..
  25 | image/..
  1 | -
176 | spider
  175 | text/..
  1 | application/yaml
  1 | image/..
159 | GoogleBot-Image/1.0
  146 | text/..
  13 | image/..
  1 | -
  1 | application/xml
125 | GoogleBot-News
  125 | text/..
  1 | -
121 | Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (Exabot-Thumbnails)
  62 | text/..
  58 | image/..
  1 | application/x-javascript
111 | Crawler 0.1
  104 | text/..
  7 | image/..
  1 | application/ogg
77 | Pywikipediabot/2.0
  77 | application/json
66 | MoovidaBot/0.1
  66 | text/..
64 | SiocWikiBot/1.0
  62 | application/vnd.php.serialized
  2 | text/..
63 | plantspedia data crawler
  63 | text/..
60 | PywikiBot 1.0 mail address
  60 | text/..
56 | crawler mail address
  56 | text/..
54 | mail address (Mozilla compatible)
  54 | text/..
51 | Test Webbot
  51 | text/..
  1 | -
50 | Mozilla/5.0 (compatible; 3F/ALL-PLA.NET webcrawler)
  50 | text/..
  1 | -
50 | Opera/8.01 (J2ME/MIDP; MXit WebBot/1.1.5.0) Opera Mini/3.1
  24 | image/..
  23 | application/vnd.wap.xhtml+xml
  3 | text/..
  1 | application/x-httpd-php
44 | AnomieBOT 1.0 (ReplaceExternalLinks2)
  44 | application/json
42 | BHSEOs.com Research Bot
  42 | text/..
41 | CorenSearchBot/1.5 en libwww-perl/5.834
  41 | text/..
41 | DotNetWikiBot/2.81 (Microsoft Windows NT 6.1.7600.0; )
  32 | text/..
  5 | application/xml
  4 | image/..
  1 | application/ogg
40 | Mozilla/5.0 (SnapPreviewBot) Gecko/20061206 Firefox/1.5.0.9
  26 | image/..
  14 | text/..
  1 | application/x-javascript
38 | ibo2bot
  38 | text/..
38 | SineBot/1.5.17(User:SineBot)
  37 | application/vnd.php.serialized
  1 | text/..
  1 | -
36 | dicbot 1.0
  36 | text/..
34 | MLBot (www.metadatalabs.com/mlbot)
  34 | text/..
32 | infraEnterprise v8 Web Crawler
  31 | -
  1 | text/..
32 | GoogleBot
  32 | text/..
  1 | image/..
30 | TheKeens bot
  30 | text/..
26 | Mozilla/5.0 (compatible; SnapPreviewBot; en-US; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9
  26 | text/..
25 | Mozilla/5.0 (X11; Linux i686; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7 SnapPreviewBot
  25 | text/..
  1 | -
25 | MSR WolframAlpha Bot
  19 | text/..
  6 | image/..
25 | FAST Search Web Crawler 14.0.0291.0000
  25 | text/..
25 | MediaWiki::Bot/3.1.6 (User:SporkBot)
  25 | application/json
24 | UCMore Crawler App
  24 | text/..
24 | AnomieBOT 1.0 (OrphanReferenceFixer)
  24 | application/json
21 | musiccrawl/1.0
  21 | text/..
21 | COIBot/2.0
  21 | text/..
20 | ('python-wikitools/1.2 (User:BernsteinBot)',)
  20 | application/json
20 | MystBot/1.5 fr libwww-perl/5.836
  20 | text/..
20 | DotNetWikiBot/2.95 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
  19 | text/..
  1 | application/xml
20 | COIBot/1.00
  20 | text/..
  1 | -
19 | AnomieBOT 1.0 (TemplateSubster)
  19 | application/json
18 | VWBot - CorenSearchBot/1.5 en derivative
  18 | text/..
17 | HTMLParser/1.6
  17 | text/..
16 | DotNetWikiBot/2.72 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
  16 | text/..
  1 | application/xml
15 | AarghBot Linux
  15 | text/..
  1 | -
15 | Spider
  15 | text/..
  1 | image/..
14 | DotNetWikiBot/2.94 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
  12 | text/..
  2 | application/xml
12 | ~Bot ([[:fr:w:User:TildeBot]] by [[:fr:w:User:Alphos]] mail address )
  12 | text/..
12 | ZanranCrawler/0.3 ( mail address )
  12 | text/..
12 | NATE.ROBOT Mozilla/5.0 (Windows; Windows NT 5.1; en-US) AppleWebKit/533.4 KHTML Chrome/5.0.375.125 Safari/533.4
  12 | text/..
11 | Bot/WP/EN/Daniel/MediationBot1/1.2
  11 | text/..
11 | Opera/8.01 (J2ME/MIDP; MXit WebBot/1.1.4.0) Opera Mini/3.1
  5 | application/vnd.wap.xhtml+xml
  5 | image/..
  1 | text/..
11 | Twitterbot/0.1
  11 | text/..
  1 | image/..
10 | Mozilla/5.0 (iPhone; CPU iPhone OS 4_0_1 like Mac OS X; fr-fr) OrangeBot-Mobile AppleWebKit/532.9 (KHTML
  10 | image/..
  1 | text/..
10 | ClueBot/2.0
  10 | application/vnd.php.serialized
9 | HTMLParser/2.0
  8 | text/..
  1 | -
9 | SurakWare MediaWiki Bot/1.0
  9 | text/..
  1 | application/xml
9 | Tawbot (public svn release; plwiki)
  9 | text/..
9 | TVersity Media Robot
  9 | text/..
  1 | -
8 | Teoma/Nutch-1.0 ( Question and Answer Search; mail address )
  8 | text/..
8 | Mozilla/4.0 (compatible; MT search portal spider/3.0; mail address )"
  8 | application/xml
  1 | text/..
8 | Mozilla/5.0 (compatible; Windows NT 6.0) Gecko/20090624 Firefox/3.5 NjuiceBot
  8 | text/..
  1 | image/..
8 | taxobot; mail address
  8 | text/..
  1 | application/json
7 | XLinkBot/1.00
  7 | text/..
7 | Teoma/Nutch-1.0 ( Question and Answer Search; afarm.com; mail address )
  7 | text/..
  1 | application/xml
7 | Citation_bot; mail address
  7 | text/..
7 | GLASS-Bot/Nutch-1.1
  7 | text/..
7 | DUI Research Bot
  7 | text/..
6 | CrowdsourcingBot/1.0
  6 | text/..
6 | Geni ircpybot 1.0
  4 | text/..
  2 | application/json
  1 | application/xml
6 | AnomieBOT 1.0 (SourceUploader)
  6 | application/json
  1 | text/..
6 | LPbot/Nutch-1.1
  6 | text/..
  1 | image/..
6 | HLTC-HKUST Research Bot 0.1 - E. Prochasson
  4 | application/json
  2 | text/..
6 | Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; en-US; rv:1.9.2.8; flipboard.com/crawler rv:0.0.5) Gecko Firefox
  4 | image/..
  2 | text/..
6 | MSR-ISRCCrawler
  3 | image/..
  3 | text/..
5 | 4am-spider/1.0
  5 | text/..
5 | DotNetWikiBot/2.95 (Microsoft Windows NT 5.1.2600 Service Pack 2; )
  5 | text/..
5 | Bot/WP/EN/Alex_Bakharev/AlexNewArtBot
  5 | text/..
5 | Freebase Deathbot
  5 | text/..
5 | Mozilla/5.0 (Bgbot 0.5)
  5 | text/..
5 | ('python-wikitools/1.2 (User:LaraBot)',)
  5 | application/json
4 | .NET Client Parser
  4 | application/xml
  1 | text/..
4 | core-I-bot/1.0
  4 | text/..
  1 | image/..
4 | Twib::Crawler
  3 | text/..
  1 | image/..
4 | AnomieBOT 1.0 (RandomPagePicker)
  4 | application/json
4 | python-wikitools/1.2 (User:Mr.Z-bot)
  4 | application/json
4 | COMODOspider/Nutch-1.0
  4 | text/..
  1 | image/..
  1 | application/ogg
4 | AnomieBOT 1.0 (AFDMergeFromCleaner)
  4 | application/json
4 | DotNetWikiBot/2.9 (Unix 2.6.26.2; )
  4 | text/..
  1 | application/xml
4 | AnomieBOT 1.0 (DeletionSortingCleaner)
  4 | application/json
3 | Anomebot v2.0
  2 | application/json
  1 | text/..
3 | bitlybot
  3 | text/..
  1 | image/..
3 | LinkToThere Research Bot
  2 | text/..
  1 | -
3 | TrueKnowledgeBot bot mail address >
  2 | application/xml
  1 | application/vnd.php.serialized
3 | AnomieBOT 1.0 (AltLinkTemplateSubster)
  3 | application/json
3 | PicselSpider/1.0
  3 | text/..
3 | Mozilla/5.0 (X11; Linux x86_64; de-DE; rv:1.9.0.19) Gecko/2010062510 ThumbShotsBot (KFSW 3.0.6-3)
  2 | image/..
  1 | text/..
  1 | application/x-javascript
3 | KumulBot/0.1
  3 | application/vnd.php.serialized
3 | areabot/1.0
  3 | text/..
3 | DotNetWikiBot/2.94 (Microsoft Windows NT 6.1.7600.0; )
  3 | text/..
3 | HBC Archive Indexerbot 0.9a
  3 | text/..
3 | gulseren_spider/Nutch-1.1
  3 | text/..
3 | Mozilla/5.0 (iPhone; CPU iPhone OS 4_0_1 like Mac OS X; fr-fr) OrangeBot-Mobile AppleWebKit/532.9 KHTML Version/4.0.5 Mobile/8A306 Safari/( mail address )
  3 | text/..
3 | FAST Enterprise Crawler 6 used by Viacom (DEV)
  3 | text/..
3 | DownloadSpider/5.1
  2 | image/..
  1 | text/..
3 | Xaldon WebSpider 2.7.b6
  3 | text/..
  1 | image/..
3 | IssueCrawler
  3 | text/..
3 | Opera/9.80 (J2ME/MIDP; Opera Mini/5.0 (iPhone; CPU iPhone 0S 3.0 like Mac 0S X; en-us; compatible; GoogleBot/21.529; U; en) Presto/2.5.25 Version/10.54
  2 | image/..
  1 | text/..
9,533 | total

IP ranges: known IP ranges for Google are 64.233.[160.0-191.255], 66.249.[64.0-95.255], 66.102.[0.0-15.255], 72.14.[192.0-255.255], 74.125.[0.0-255.255], 209.85.[128.0-255.255], 216.239.[32.0-63.255] and a few minor other subranges.
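
A check against these ranges might look like the following Python sketch. It is not the code used for this report. The ranges are the seven listed above, rewritten in CIDR notation; the "few minor other subranges" are not included. The function names and the use of the labels "google" and "google?" for requests from inside and outside these ranges are assumptions made for illustration.

```python
import ipaddress

# The seven ranges listed above, rewritten as CIDR blocks.
GOOGLE_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "64.233.160.0/19",  # 64.233.[160.0-191.255]
        "66.249.64.0/19",   # 66.249.[64.0-95.255]
        "66.102.0.0/20",    # 66.102.[0.0-15.255]
        "72.14.192.0/18",   # 72.14.[192.0-255.255]
        "74.125.0.0/16",    # 74.125.[0.0-255.255]
        "209.85.128.0/17",  # 209.85.[128.0-255.255]
        "216.239.32.0/19",  # 216.239.[32.0-63.255]
    )
]


def is_google_ip(address: str) -> bool:
    """Return True if the address falls inside one of the listed Google ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in GOOGLE_NETWORKS)


def classify_googlebot(user_agent: str, address: str) -> str:
    """Label a request: 'google', 'google?' (possibly spoofed) or 'other'."""
    if "googlebot" not in user_agent.lower():
        return "other"
    return "google" if is_google_ip(address) else "google?"


if __name__ == "__main__":
    agent = "Mozilla/5.0 (compatible; GoogleBot/2.1; url)"
    print(classify_googlebot(agent, "66.249.64.1"))   # inside a listed range
    print(classify_googlebot(agent, "203.0.113.5"))   # documentation address, outside
```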

Generated on Tue, Jan 11, 2011 6:23
Author: Erik Zachte (Web site)
Mail: ezachte@### (no spam: ### = wikimedia.org)
All data and images on this page are in the public domain.