Wikimedia Traffic Analysis Report - Crawler requests

Daily averages, based on sample period: 1 Aug 2010 - 31 Aug 2010

 This analysis is based on a 1:1000 sampled server log (squids) ⇒ all counts x 1000.
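
As a rough illustration of the scaling described above, the sketch below turns a raw count from a 1:1000 sampled log into an estimated daily average. The function name, the 31-day period length, and the sample input are assumptions made for illustration; they are not taken from the scripts that produced this report.

```python
SAMPLING_FACTOR = 1000  # the sampled log keeps 1 out of every 1000 requests


def estimated_daily_requests(sampled_count: int, days_in_period: int) -> float:
    """Scale a sampled count back up and average it over the sample period."""
    if days_in_period <= 0:
        raise ValueError("days_in_period must be positive")
    return sampled_count * SAMPLING_FACTOR / days_in_period


# Hypothetical input: 31,000 sampled log lines over the 31 days of August.
print(estimated_daily_requests(31_000, 31))
```

Because only one request in a thousand is logged, small counts in the tables below carry a large relative sampling error.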

The following overview of crawler (aka bot) page requests is based on the user agent information that accompanies most server requests. Unfortunately this user agent information follows rather loosely defined guidelines.
Also please bear in mind that the most popular crawler names may be somewhat overrepresented. This is the result of so-called user agent spoofing (where a requester supplies false credentials, e.g. to bypass web server filters).
GoogleBot seems to be a favorite for spoofing. Requests that claim to be GoogleBot are therefore color coded by origin: those from an IP address registered by Google (see below) in one color, all others in another.

For this report page requests are considered to be issued by a crawler in two cases:
1 The user agent string contains a web address. Only crawlers should have that, but there are some false positives, where a browser sends a user agent string with a web address (caused by an ill-behaved plug-in; the main offenders have been eliminated).
2 The user agent string contains the term bot, spider or crawl[er].
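
The two rules above could be sketched roughly as follows. This is a minimal illustration, not the report's actual implementation: the regular expressions, the function name `crawler_rule`, and the way a "web address" is approximated (an `http(s)://` scheme or a `www.` prefix) are all assumptions, and the sketch does not filter out the false positives mentioned in rule 1.

```python
import re

# Rule 1: the user agent string contains a web address.
# Assumption: approximated here by an http(s):// scheme or a "www." prefix.
URL_PATTERN = re.compile(r"https?://|www\.", re.IGNORECASE)

# Rule 2: the user agent string contains "bot", "spider" or "crawl[er]".
KEYWORD_PATTERN = re.compile(r"bot|spider|crawl", re.IGNORECASE)


def crawler_rule(user_agent: str):
    """Return 'url' or 'keyword' for a presumed crawler, else None."""
    if URL_PATTERN.search(user_agent):
        return "url"
    if KEYWORD_PATTERN.search(user_agent):
        return "keyword"
    return None


print(crawler_rule("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
print(crawler_rule("ClueBot/1.1"))
print(crawler_rule("Mozilla/5.0 (Windows NT 6.1) Gecko/20100101 Firefox/3.6"))
```

Checking the web address rule first mirrors the two tables in this report: agents that specify a URL are listed in the first table, and the remaining agents matched by keyword in the second.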

In total 45,711,000 page requests per day (mime type text/html only!) are considered crawler requests, out of 351,393,000 external requests, which is 13.0%.
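
The percentage can be checked with a few lines of arithmetic. The variable names are illustrative only; the two figures are the ones quoted above.

```python
crawler_requests = 45_711_000    # daily page requests attributed to crawlers
external_requests = 351_393_000  # daily external requests overall

share = 100 * crawler_requests / external_requests
print(f"{share:.1f}%")
```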

Page requests for crawlers that specify a url in the agent string
Count (x 1000) | Secondary domain (~site) name | URL | Mime type | User agent
15,346yahoo
14,719 help.yahoo.com/help/us/ysearch/slurptext/..Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
161 misc.yahoo.com.cn/help.htmltext/..Mozilla/5.0 (compatible; Yahoo! Slurp China; url)
145 help.yahoo.com/help/us/ysearch/slurp-Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
76 help.yahoo.com/help/us/ysearch/slurpimage/..Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
55 help.yahoo.com/help/us/ysearch/slurptext/..Mozilla/5.0 (compatible; Yahoo! Slurp; url)
42 help.yahoo.com/help/us/ysearch/slurptext/..Mozilla/5.0 (compatible; Yahoo! DE Slurp; url)
25 developer.yahoo.com/searchmonkey/useragenttext/..Mozilla/5.0 (compatible; Yahoo! SearchMonkey 1.0; url)
19 developer.yahoo.com/searchmonkey/useragentimage/..Mozilla/5.0 (compatible; Yahoo! SearchMonkey 1.0; url)
18 help.yahoo.co.jp/help/jp/search/indexing/indexing-15.htmlimage/..'Mozilla/5.0 (compatible; Y!J SearchMonkey/1.0 (Y!J-AGENT; url))'
17 help.yahoo.com/help/us/ysearch/slurpapplication/oggMozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
17 help.yahoo.com/help/us/ysearch/slurpapplication/xmlMozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
16 help.yahoo.com/help/us/ysearch/crawling/crawling-01.htmltext/..Nokia6682/2.0 (3.01.1) SymbianOS/8.0 Series60/2.6 Profile/MIDP-2.0 configuration/CLDC-1.1 UP.Link/6.3.0.0.0 (compatible;YahooSeeker/M1A1-R2D2; url)
13 help.yahoo.co.jp/help/jp/search/indexing/indexing-15.htmltext/..Y!J-BRI/0.0.1 crawler ( url )
13 help.yahoo.com/help/us/ysearch/slurpapplication/vnd.php.serializedMozilla/5.0 (compatible Yahoo! Slurp/3.0 url)
3 misc.yahoo.com.cn/help.html-Mozilla/5.0 (compatible; Yahoo! Slurp China; url)
12,741google
10,255 www.google.com/bot.htmltext/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
511 www.google.com/bot.html-Mozilla/5.0 (compatible; GoogleBot/2.1; url)
455 desktop.google.com/image/..Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; url)
257 www.google.com/bot.htmlimage/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
212 desktop.google.com/application/xmlMozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; url)
167 desktop.google.com/image/..Mozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
145 www.google.com/bot.htmltext/..DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
124 www.google.com/feedfetcher.html-FeedFetcher-Google; (url)
60 desktop.google.com/application/xmlMozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
60 www.google.com/feedfetcher.htmltext/..Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
50 www.google.com/bot.htmltext/..SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
47 www.google.com/feedfetcher.htmlapplication/xmlFeedFetcher-Google; (url)
37 desktop.google.com/text/..Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; url)
34 code.google.com/appengineapplication/xmlAppEngine-Google; (url; appid: wikipedia-raw)
33 www.google.com/feedfetcher.htmlapplication/jsonMozilla/5.0 (compatible) FeedFetcher-Google; (url)
33 www.google.com/bot.htmlimage/..DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
29 code.google.com/appenginetext/..AppEngine-Google; (url; appid: npiv82)
28 www.google.com/feedfetcher.htmltext/..FeedFetcher-Google; (url)
27 desktop.google.com/image/..Mozilla/5.0 (compatible; Google Desktop/5.9.909.30391; url)
17 www.google.com/bot.htmlimage/..SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
17 www.google.com/feedfetcher.htmlapplication/xmlMozilla/5.0 (compatible) FeedFetcher-Google; (url)
12 www.google.com/feedfetcher.htmlimage/..Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
12 code.google.com/appenginetext/..AppEngine-Google; (url; appid: finchproxy)
10 desktop.google.com/text/..Mozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
9 desktop.google.com/-Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; url)
8 code.google.com/appenginetext/..AppEngine-Google; (url; appid: nwikiproxy)
7 desktop.google.com/application/xmlMozilla/5.0 (compatible; Google Desktop/5.9.909.30391; url)
7 code.google.com/appengineapplication/jsonPython-urllib/2.5 AppEngine-Google; (url; appid: loeschmonitor)
6 code.google.com/appenginetext/..oohEmbed.com AppEngine-Google; (url; appid: oohembed)
6 code.google.com/appengineapplication/jsonMWBOT GAE Edition AppEngine-Google; (url; appid: philip-bot)
5 www.google.com/feedfetcher.htmlimage/..FeedFetcher-Google; (url)
5 docs.google.comimage/..Mozilla/5.0 (compatible; GoogleDocs; url)
5 code.google.com/appenginetext/..AppEngine-Google; (url; appid: boxapp)
3 code.google.com/appenginetext/..AppEngine-Google; (url; appid: honey224test3)
3 code.google.com/p/crawler4j/text/..crawler4j (url)
3 code.google.com/appenginetext/..AppEngine-Google; (url; appid: linksalpha)
3 code.google.com/appenginetext/..AppEngine-Google; (url; appid: dustbunnytycoonmonitor)
3 www.google.comtext/..Mozilla/5.0 (compatible; heritrix/2.0.0 url)
3 www.google.com/bot.htmlapplication/xmlMozilla/5.0 (compatible; GoogleBot/2.1; url)
3 code.google.com/appengineapplication/xmlAppEngine-Google; (url; appid: nwikiproxy)
3 www.google.com/coop/cse/creftext/..FeedFetcher-Google-CoOp; (url)
6,268facebook
4,251 www.facebook.com/externalhit_uatext.phpimage/..facebookexternalhit/1.0 (url)
1,837 www.facebook.com/externalhit_uatext.phptext/..facebookexternalhit/1.0 (url)
129 www.facebook.com/externalhit_uatext.phptext/..facebookexternalhit/1.1 (url)
25 www.facebook.com/externalhit_uatext.php-facebookexternalhit/1.0 (url)
17 developers.facebook.comimage/..facebookplatform/1.0 (url)
6 developers.facebook.comtext/..facebookplatform/1.0 (url)
3 www.facebook.com/externalhit_uatext.phpimage/..facebookexternalhit/1.1 (url)
2,950msn
1,809 search.msn.com/msnbot.htmtext/..msnbot/2.0b (url)
793 search.msn.com/msnbot.htm-msnbot/2.0b (url)
116 search.msn.com/msnbot.htmtext/..msnbot-media/1.1 (url)
108 search.msn.com/msnbot.htmimage/..msnbot-media/1.1 (url)
74 search.msn.com/msnbot.htmtext/..msnbot/2.0b (url)._
18 search.msn.com/msnbot.htmtext/..msnbot-Products/1.0 (url)
17 search.msn.com/msnbot.htmimage/..msnbot/2.0b (url)
5 search.msn.com/msnbot.htmtext/..msnbot-UDiscovery/2.0b (url)
2,724google?
2,551 www.google.com/bot.htmltext/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
44 www.google.com/bot.htmlapplication/vnd.php.serializedMozilla/5.0 (compatible; GoogleBot/2.1; url)
39 www.google.com/bot.htmltext/..GoogleBot/2.1 (url)
25 www.google.com/bot.htmltext/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
14 www.google.com/bot.htmltext/..SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
13 www.google.com/bot.htmltext/..DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
10 www.google.com/bot.htmlimage/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
7 www.google.com/bot.htmltext/..Mozilla/5.0(compatible;GoogleBot/2.1;url)
6 www.google.com/bot.htmlapplication/xmlMozilla/5.0 (compatible; GoogleBot/2.1; url)
3 www.google.com/bot.html-Mozilla/5.0 (compatible; GoogleBot/2.1; url)
3 www.google.com/bot.htmltext/..Mozilla/5.0 (compatible; GoogleBot/2.1; url) Mozilla/5.0 (compatible; GoogleBot/2.1; http://www.google.com/bot.html) VIA SPRACI WAP TRANSLATOR
3 www.google.com/bot.htmltext/..User-Agent :DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
1,608naver
1,514 help.naver.com/robots/text/..Yeti/1.0 (NHN Corp.; url)
50 help.naver.com/robots/image/..Yeti/1.0 (NHN Corp.; url)
33 help.naver.com/delete_main.asptext/..Mozilla/4.0 (compatible; NaverBot/1.0; url)
10 help.naver.com/customer_webtxt_02.jsptext/..Mozilla/4.0 (compatible; NaverBot/1.0; url)
1,049baidu
525 www.baidu.com/search/spider.htmtext/..Baiduspider(url)
463 www.baidu.jp/spider/text/..Baiduspider(url)
20 www.baidu.jp/spider/text/..DoCoMo/2.0 P05A(c100;TB;W24H15) (compatible; BaiduMobaider/1.0;url)
15 www.baidu.jp/spider/text/..BaiduImagespider(url)
9 www.baidu.jp/spider/-Baiduspider(url)
5 www.baidu.com/search/spider.htm-Baiduspider(url)
5 www.baidu.jp/spider/application/xmlBaiduspider(url)
822yandex
670 yandex.com/botstext/..Mozilla/5.0 (compatible; YandexBot/3.0; url)
85 yandex.com/botstext/..Mozilla/5.0 (compatible; YandexImages/3.0; url)
29 yandex.com/botsimage/..Mozilla/5.0 (compatible; YandexImages/3.0; url)
12 yandex.com/bots-Mozilla/5.0 (compatible; YandexBot/3.0; url)
8 yandex.com/botsimage/..Mozilla/5.0 (compatible; YandexImageResizer/2.0; url)
7 yandex.com/botstext/..Mozilla/5.0 (compatible; YandexBot/3.0; MirrorDetector; url)
3 yandex.com/bots-Mozilla/5.0 (compatible; YandexImages/3.0; url)
782ask
573 about.ask.com/en/docs/about/webmasters.shtmltext/..Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
199 about.ask.com/en/docs/about/webmasters.shtml-Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
5 about.ask.com/en/docs/about/webmasters.shtmltext/..Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url) (via Web-Blaster/2.21 (http://www.assoziations-blaster.de/web-blast.html))
3 about.ask.com/en/docs/about/webmasters.shtmlimage/..Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
728soso
696 help.soso.com/webspider.htmtext/..Sosospider(url)
23 help.soso.com/webspider.htm-Sosospider(url)
7 help.soso.com/soso-blog-spider.htmtext/..Sosoblogspider(url)
615mnemoo
464 www.mnemoo.com/about/spidertext/..Mnemoo WikiSearch Spider/0.1alpha (compatible; See url)
150 www.mnemoo.com/about/spidertext/..Mnemoo Intelligent Spider/0.1alpha (compatible; See url)
472exabot
260 www.exabot.com/go/robottext/..Mozilla/5.0 (compatible; Exabot/3.0; url)
200 www.exabot.com/go/robottext/..Mozilla/5.0 (compatible; Exabot/3.0 (BiggerBetter); url)
11 www.exabot.com/go/robot-Mozilla/5.0 (compatible; Exabot/3.0; url)
370youdao
325 www.youdao.com/help/webmaster/spider/text/..Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
17 www.youdao.com/help/webmaster/spider/text/..Mozilla/5.0 (compatible; YodaoBot/1.0; url; )
16 www.youdao.com/help/webmaster/spider/text/..Mozilla/5.0 (compatible;YodaoBot-Image/1.0;url;)
8 www.youdao.com/help/webmaster/spider/-Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
368wikipedia
201 en.wikipedia.org/wiki/Web_crawlertext/..GoogleBot/Nutch-1.0 (Prototype; url; mail address )
115 en.wikipedia.org/wiki/Wikipedia:Huggletext/..Huggle/0.9.6 url
22 en.wikipedia.org/wiki/User:NicoV/Wikipedia_Cleaner/Documentationtext/..WikiCleaner (url)
14 wikipedia.orgtext/..Mozilla/5.0 (compatible; heritrix/1.14.4 url)
11 en.wikipedia.orgtext/..url
330php
170 pear.php.net/text/..PEAR HTTP_Request class ( url )
65 pear.php.net/application/vnd.php.serializedPEAR HTTP_Request class ( url )
35 pear.php.net/application/jsonPEAR HTTP_Request class ( url )
27 pear.php.net/application/xmlPEAR HTTP_Request class ( url )
27 pear.php.net/package/http_request2text/..HTTP_Request2/0.5.1 (url) PHP/5.2.13
3 pear.php.net/package/http_request2text/..HTTP_Request2/0.5.2 (url) PHP/5.2.6-3ubuntu4.5
310traslated
310 mymemory.traslated.net/doc/text/..Mozilla/5.0 (MyMemory Bot url)
288scoutjet
288 www.scoutjet.com/text/..Mozilla/5.0 (compatible; ScoutJet; url)
284pipl
284 www.pipl.com/bot/text/..Mozilla/5.0(compatible;PiplBot;url)
264sogou
252 www.sogou.com/docs/help/webmasters.htm#07text/..Sogou web spider/4.0(url)
9 www.sogou.com/docs/help/webmasters.htm#07image/..Sogou Pic Spider/3.0(url)
191entireweb
185 www.entireweb.com/about/search_tech/speedy_spider/text/..Mozilla/5.0 (Windows; Windows NT 5.1; en-US) Speedy Spider (url)
4 www.entireweb.com/about/search_tech/speedy_spider/-Mozilla/5.0 (Windows; Windows NT 5.1; en-US) Speedy Spider (url)
189majestic12
174 www.majestic12.co.uk/bot.php?text/..Mozilla/5.0 (compatible; MJ12bot/v1.3.3; url)
12 www.majestic12.co.uk/bot.php?text/..Mozilla/5.0 (compatible; MJ12bot/v1.3.2; url)
187yacy
46 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.32-24-generic; java 1.6.0_18; Europe/en) url
14 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.30-gentoo-r6-090907; java 1.6.0_17; GMT/de) url
12 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.26-2-amd64; java 1.6.0_20; Europe/en) url
7 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.32-24-generic; java 1.6.0_18; Europe/de) url
7 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.32-24-server; java 1.6.0_18; Europe/cs) url
7 yacy.net/bot.htmltext/..yacybot (amd64 Windows 7 6.1; java 1.6.0_21; Europe/fr) url
6 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.32-21-generic; java 1.6.0_18; America/en) url
6 yacy.net/bot.htmltext/..yacybot (x86_64 Mac OS X 10.6.4; java 1.6.0_20; Europe/de) url
5 yacy.net/bot.htmltext/..yacybot (x86 Windows Vista 6.0; java 1.6.0_21; Europe/en) url
4 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.31-11-rt; java 1.6.0_18; Europe/en) url
4 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.18-194.8.1.el5; java 1.6.0_0; Europe/en) url
4 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.32-24-server; java 1.6.0_18; America/en) url
4 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.9-5.ELsmp; java 1.6.0_12-rev; Asia/en) url
4 yacy.net/bot.htmltext/..yacybot (x86 Windows Vista 6.0; java 1.6.0_20; Europe/en) url
3 yacy.net/bot.htmltext/..yacybot (amd64 FreeBSD 8.0-RELEASE-p3; java 1.6.0_07; Europe/en) url
3 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.31-21-server; java 1.6.0_0; Asia/en) url
3 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.26-2-amd64; java 1.6.0_0; Europe/en) url
3 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.35; java 1.6.0_18; Europe/en) url
3 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.32-24-generic; java 1.6.0_18; Europe/en) url
183waw
162 dubi.itinfo.waw.plimage/..WordPress/2.8.6; url
21 gienia.itinfo.waw.plimage/..WordPress/2.8.6; url
165semager
164 www.semager.de/blog/semager-bots/text/..Mozilla/5.0 (compatible; Semager/1.4; url)
152sblog
95 fulltext.sblog.cz/screenshot/image/..Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
31 fulltext.sblog.cz/screenshot/text/..Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
25 fulltext.sblog.cz/robot/text/..SeznamBot/2.0 (url)
144toolserver
93 wiki.toolserver.org/view/GeoHacktext/..Geohack (url)
40 toolserver.org/~bayo/text/..LudoThecaire/1.0 (url)
3 toolserver.org/~dispenser/text/..WebWikipedia Python/2.6 (url)
3 toolserver.org/~para/cgi-bin/kmlexporttext/..url libwww-perl/5.835
3 toolserver.org/~guandalug/application/vnd.php.serializedGuandalugs PHPWikiBot/1.1 (url;de:User:Guandalug)
143wordpress
42 navanavonmilita.wordpress.comtext/..WordPress/MU; url
15 josefboberg.wordpress.comtext/..WordPress/MU; url
10 support.wordpress.com/contact/text/..WordPress.com mShots; url
6 zosotruthtalk.wordpress.comtext/..WordPress/MU; url
5 benabb.wordpress.comtext/..WordPress/MU; url
4 ramichael.wordpress.comtext/..WordPress/MU; url
3 tgbp.wordpress.comtext/..WordPress/MU; url
3 spacebarshift.wordpress.comtext/..WordPress/MU; url
125wikimedia
123 tools.wikimedia.de/~daniel/text/..WikiSense (url)
116goo
110 help.goo.ne.jp/contact/text/..goo wikipedia (url)
114justsystems
113 www.justsystems.com/jp/tech/crawler/text/..JUST-CRAWLER(url)
104daum
103 ws.daum.net/aboutWebSearch.htmltext/..Mozilla/5.0 (compatible; MSIE or Firefox mutant; not on Windows server; url) Daumoa/2.0
89kosmix
70 www.kosmix.com/html/kosmos.htmlapplication/xmlMozilla/5.0(compatible;Kosmos/1.0;url)
19 www.kosmix.com/html/kosmos.htmltext/..Mozilla/5.0(compatible;Kosmos/1.0;url)
79emining
77 emining.jp/text/..emBot-GalaBuzz/Nutch-1.0 (url; mail address )
73textdigger
73 textdigger.comtext/..Mozilla/5.0 (url) Gecko/20061208 Firefox/2.0.0.1
71sf
24 liferea.sf.net/text/..Liferea/1.x.x (Linux; es_ES.UTF-8; url)
23 liferea.sf.net/text/..Liferea/0.x.x (Linux; en_US.UTF-8; url)
23 magpierss.sf.nettext/..MagpieRSS/0.7x (url)
67freebase
66 www.freebase.comtext/..metaweb/Nutch-1.0-dev (url; help_at_metaweb.com)
67FeedBurner
66 www.FeedBurner.comtext/..FeedBurner/1.0 (url)
57newsgator
23 www.newsgator.comtext/..NewsGatorOnline/2.0 (url; 1 subscribers)
22 www.newsgator.com/text/..FeedDemon/2.7 (url; Microsoft Windows XP)
12 www.newsgator.com/Individuals/NetNewsWire/-NetNewsWire/3.2.7 (Mac OS X; url; gzip-happy)
54github
29 github.com/pauldix/typhoeus/tree/mastertext/..Typhoeus - url
22 github.com/edsu/linkypediaapplication/jsonlinkpyediabot v0.1: url
48jetbrains
25 www.jetbrains.com/omea_reader/text/..JetBrains Omea Reader 2.0 Release Candidate 1 (url)
23 www.jetbrains.com/omea_reader/text/..JetBrains Omea Reader 1.0.x (url)
47avantbrowser
23 www.avantbrowser.comtext/..Advanced Browser (url)
23 www.avantbrowser.comtext/..Avant Browser (url)
44spinn3r
41 spinn3r.com/robottext/..Mozilla/5.0 (X11; Linux x86_64; en-US; rv:1.9.0.19; aggregator:Spinn3r (Spinn3r 3.1); url) Gecko/2010040121 Firefox/3.0.19
44feedshow
22 www.feedshow.comtext/..FeedshowOnline (url)
22 www.feedshow.comtext/..Feedshow/x.0 (url; 1 subscriber)
42www.
16 www.text/..GoogleBot/2.1 ( urlGoogleBot.com/bot.html)
9 www.text/..GoogleBot-Image/1.0 ( urlGoogleBot.com/bot.html)
9 www.text/..Google - GoogleBot/2.1 ( urlGoogleBot.com/bot.html)
6 www.text/..GoogleBot/2.1 (urlGoogleBot.com/bot.html)
40cogitoergosum
39 cogitoergosum.co.cctext/..WordPress/MU; url
39rcdtokyo
30 www.rcdtokyo.com/pc2m/text/..Mozilla/5.0 (compatible; PEAR HTTP_Request class; url)
6 www.rcdtokyo.com/pc2m/image/..Mozilla/5.0 (compatible; PEAR HTTP_Request class; url)
3 www.rcdtokyo.com/pc2m/application/pdfMozilla/5.0 (compatible; PEAR HTTP_Request class; url)
39oneriot
38 www.oneriot.comtext/..Mozilla/5.0 (Windows; Windows NT 5.1; en-US; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 OneRiot/1.0 (url)
39heartrails
16 capture.heartrails.com/text/..Mozilla/5.0 (X11; Linux i686; en-US; rv:1.9.2.8) Gecko/20100730 HeartRails_Capture/1.0.4 (url) Namoroka/3.6.8
13 capture.heartrails.com/image/..Mozilla/5.0 (X11; Linux i686; en-US; rv:1.9.2.8) Gecko/20100730 HeartRails_Capture/1.0.4 (url) Namoroka/3.6.8
4 capture.heartrails.com/text/..Mozilla/5.0 (X11; Linux i686; en-US; rv:1.9.2.8) Gecko/20100730 HeartRails_Capture/1.0.3 (url) Namoroka/3.6.8
4 capture.heartrails.com/image/..Mozilla/5.0 (X11; Linux i686; en-US; rv:1.9.2.8) Gecko/20100730 HeartRails_Capture/1.0.3 (url) Namoroka/3.6.8
36Anonymouse
20 Anonymouse.org/text/..url (Unix)
16 Anonymouse.org/image/..url (Unix)
35chug
35 crawler.chug.nettext/..Mozilla/5.0 (compatible; heritrix/3.0.0 url)
34teesoft
11 www.teesoft.info/image/..Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
8 www.teesoft.info/text/..Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
5 www.teesoft.info/image/..Mozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
3 www.teesoft.info/image/..Mozilla/5.0 (X11; Linux i686; [lang code]; rv:[..]) Gecko/.. etc (url)
3 www.teesoft.info/text/..Mozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
34dotnetdotcom
34 www.dotnetdotcom.org/text/..Mozilla/5.0 (compatible; DotBot/1.1; url, mail address )
34hatena
29 a.hatena.ne.jp/helptext/..Hatena Antenna/0.5 (url)
5 mgw.hatena.ne.jp/helptext/..DoCoMo/2.0 D903i(c100;TB;W28H20) (compatible; Hatena-Mobile-Gateway/1.2; url)
31baypup
31 beta.baypup.com/about/crawlertext/..baypup/Baypup-1.1 (Baypup Search Engine; url; mail address )
26tinyurl
26 tinyurl.com/64t5ntext/..Rome Client (url) Ver: 0.9
26puritysearch
26 www.puritysearch.net/text/..Mozilla/5.0 (compatible; Purebot/1.1; url)
26graemef
26 graemef.comtext/..NewsGator FetchLinks extension/0.2.0 (url)
26snap
15 www.snap.comtext/..Snapbot/1.0 (Snap Shots, url)
11 www.snap.comtext/..Snapbot/1.0 (url)
25abonti
19 www.abonti.comtext/..Mozilla/5.0 (compatible; Abonti/0.9 - url)
6 www.abonti.comtext/..Mozilla/5.0 (compatible; Abonti/0.91 - url)
24seoprofiler
24 www.seoprofiler.com/bottext/..Mozilla/5.0 (compatible; spbot/2.0.4; url )
24zootycoon
24 www.zootycoon.comtext/..Zoo Tycoon 2 Client -- url
24winpodder
24 winpodder.comtext/..WinPodder (url)
24orcabrowser
24 www.orcabrowser.comtext/..Orca Browser (url)
24plagger
24 plagger.org/text/..Plagger/0.x.xx (url)
24alexa
24 www.alexa.com/site/help/webmasterstext/..ia_archiver (url; mail address )
24ponderer
24 ponderer.org/download/annotate_google.user.jstext/..annotate_google; url
24seebot
24 seebot.orgtext/..Lynx/2.8 (;url)
23blogbridge
23 www.blogbridge.com/text/..BlogBridge 2.13 (url)
23rssreader
23 www.rssreader.comtext/..RssReader/1.0.xx.x (url) Microsoft Windows NT 5.1.2600.0
23zipcommander
23 www.zipcommander.com/text/..1st ZipCommander (Net) - url
23timewe
23 timewe.nettext/..CDR/1.7.1 Simulator/0.7(url) Profile/MIDP-1.0 Configuration/CLDC-1.0
23snarfware
23 www.snarfware.com/text/..Snarfer/0.x.x (url)
23rssbandit
23 www.rssbandit.orgtext/..RssBandit/1.5.0.10 (WinNT 5.1.2600.0; url) (WinNT 5.1.2600.0; )
23kula
23 kula.jp/endotext/..endo/1.0 (Mac OS X; ppc i386; url)
23edu
14 gais.cs.ccu.edu.tw/robot.phptext/..Gaisbot/3.0( mail address ;url)
5 ws.nju.edu.cn/falcons/text/..Mozilla/5.0 (compatible; Falconsbot; url)
4 master.csie.ntu.edu.tw/searchtext/..BEEGOL Bot/Nutch-1.1 (url; mail address )
23it-influentials
23 search.it-influentials.com/bot.htmtext/..Mozilla/5.0 (compatible;FindITAnswersbot/1.0;url)
23nemui
23 mozshot.nemui.org/text/..Mozilla/5.0 (Gecko/20070310 Mozshot/0.0.20070628; url)
23feeds4all
23 www.feeds4all.com/feedzcollectortext/..FeedZcollector v1.x (Platinum) url
22setooz
17 www.setooz.com/bot.htmltext/..Mozilla/5.0 ( compatible; SETOOZBOT/0.30 ; url ; mail address )
3 www.setooz.com/bot.htmltext/..Mozilla/5.0 ( compatible; SETOOZBOT/0.30 ; url )
22chainn
19 www.chainn.com/mxbot.htmltext/..Mozilla/5.0 (compatible; mxbot/1.0; url)
3 www.chainn.com/mxbot.htmlimage/..Mozilla/5.0 (compatible; mxbot/1.0; url)
22wise-guys
19 www.wise-guys.nl/text/..Mozilla/4.0 (compatible; Vagabondo/4.0/CGM; url)
22ranchero
22 ranchero.com/netnewswire/text/..NetNewsWire/2.x (Mac OS X; url)
20gigablast
20 www.gigablast.com/spider.htmltext/..Gigabot/3.0 (url)
20elte
20 nipg.inf.elte.hutext/..Mozilla/5.0 (compatible; heritrix/1.14.3 url)
19weblio
18 www.weblio.jp/text/..Mozilla/5.0 (compatible; WeblioBot; url)
18picmole
18 www.picmole.comtext/..Mozilla/5.0 (compatible;picmole/1.0 url)
18mixi
10 mixi.jp/text/..mixi-mobile-converter/1.0 (url)
8 mixi.jp/image/..mixi-mobile-converter/1.0 (url)
17froute
13 labs.froute.jp/pc2m/help.htmltext/..Froute Mobile Gateway/1.0 (url)
4 labs.froute.jp/pc2m/help.htmlimage/..Froute Mobile Gateway/1.0 (url)
17bloglines
9 www.bloglines.com-Bloglines/3.1 (url; 1 subscriber)
3 www.bloglines.comtext/..Bloglines/3.1 (url; 1 subscriber)
3 www.bloglines.comapplication/xmlBloglines/3.1 (url; 1 subscriber)
17gramtrans
17 gramtrans.com/text/..GramTrans (url)
1780legs
15 www.80legs.com/spider.htmltext/..Mozilla/5.0 (compatible; 008/0.83; url;) Gecko/2008032620
16mp3realm
16 mp3realm.org/mp3bot/text/..Mozilla/5.0 (compatible; Mp3Bot/0.7; url)
16topsy
16 labs.topsy.com/butterfly/text/..Mozilla/5.0 (compatible; Butterfly/1.0; url) Gecko/2009032608 Firefox/3.0.8
16mediawiki
14 www.mediawiki.org/wiki/Extension:XMLRCtext/..rc2udp.py (url) Python-urllib/1.17
16holmes
15 holmes.getext/..HolmesBot (url)
14js-kit
14 js-kit.com/text/..JS-Kit URL Resolver, url
13fairshare
6 fairshare.cctext/..Mozilla/5.0 url (X11; FreeBSD i386; en-US; rv:1.2a) Gecko/20021021
5 fairshare.cctext/..Mozilla crawl/5.0 (compatible; fairshare.cc url)
13knowmore
13 knowmore.com/botstext/..Mozilla/5.0 (compatible; kmbot-62c5/0.0; url)
13archive-it
9 archive-it.org/files/site-owners.htmlimage/..Mozilla/5.0 (compatible;archive.org_bot; Archive-It; url) Firefox
4 archive-it.org/files/site-owners.htmltext/..Mozilla/5.0 (compatible;archive.org_bot; Archive-It; url) Firefox
13vbseo
13 www.vbseo.comtext/..Mozilla/4.0 (vBSEO; url)
12bitvo
8 www.bitvo.comimage/..BitvoUserAgent (url) Bitvo/0.9b
4 www.bitvo.comtext/..BitvoUserAgent (url) Bitvo/0.9b
12bnf
7 www.bnf.fr/fr/outils/a.dl_web_capture_robot.htmlimage/..Mozilla/5.0 (compatible; bnf.fr_bot; url)
3 www.bnf.fr/fr/outils/a.dl_web_capture_robot.htmltext/..Mozilla/5.0 (compatible; bnf.fr_bot; url)
11simplepie
6 simplepie.orgapplication/xmlSimplePie/1.2 (Feed Parser; url; Allow like Gecko) Build/20090627192103
3 simplepie.orgapplication/xmlSimplePie/1.1.1 (Feed Parser; url; Allow like Gecko) Build/20080315205903
11bin-co
10 www.bin-co.com/php/scripts/load/text/..BinGet/1.00.A (url)
11ayna
11 www.ayna.comtext/..Mozilla/5.0 (compatible; ayna-crawler url)
11z-add
10 w3.z-add.co.uk/linkcheck/text/..Z-Add Link Checker (url)
11picsearch
8 www.picsearch.com/bot.htmltext/..psbot/0.1 (url)
3 www.picsearch.com/bot.htmlimage/..psbot/0.1 (url)
10ibm
10 domino.research.ibm.com/comm/research_projects.nsf/pages/sai-crawler.callingcard.htmltext/..url
52,328total

Page requests for probable crawlers, recognized by keyword
Count (x 1000) | Agent string | Mime type (count ≥ 3)
3,389PythonWikipediaBot/1.0
2,364 application/json
911 application/xml
112 text/..
2 image/..
1 -
1 application/ogg
1,533ClueBot/1.1
1,213 application/vnd.php.serialized
320 text/..
1 -
1,309GoogleBot-Image/1.0
568 text/..
401 -
340 image/..
1 application/pdf
572ExactusBot-v0.1
572 text/..
1 -
473LinkParser/2.0
473 text/..
407Mozilla/5.0 (Windows; Windows NT 5.1; fr; rv:1.8.1) VoilaBot BETA 1.2 ( mail address )
407 text/..
1 -
1 application/ogg
388Onespot Crawler
268 application/json
120 text/..
286php wikibot classes
235 application/vnd.php.serialized
50 text/..
1 application/xml
1 -
262Answersbot
262 text/..
220wikiwix-bot-3.0
214 text/..
6 image/..
1 -
178spider
170 text/..
7 application/json
1 application/xml
1 image/..
164GoogleBot-Image/1.0
155 text/..
9 image/..
1 -
161Peachy MediaWiki Bot API Version 0.1beta
155 application/vnd.php.serialized
6 text/..
141SiocWikiBot/1.0
137 application/vnd.php.serialized
4 text/..
1 -
113Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (Exabot-Thumbnails)
57 text/..
54 image/..
2 application/x-javascript
1 application/json
1 application/vnd.php.serialized
105GoogleBot-News
104 text/..
1 -
1 image/..
1 application/xml
98MLBot (www.metadatalabs.com/mlbot)
98 text/..
1 -
1 image/..
1 application/vnd.php.serialized
97AarghBot Linux
97 text/..
1 -
76crawler mail address
76 text/..
74DotNetWikiBot/2.81 (Microsoft Windows NT 6.1.7600.0; )
59 text/..
9 application/xml
6 image/..
1 application/ogg
1 audio/midi
74dicbot 1.0
74 text/..
73Mozilla 5.0 (Apibot 0.20)
73 application/vnd.php.serialized
73Mozilla/5.0 (SnapPreviewBot) Gecko/20061206 Firefox/1.5.0.9
43 image/..
29 text/..
1 application/x-javascript
1 application/json
68TheKeens bot
68 text/..
1 -
60ZanranCrawler/0.3 ( mail address )
60 text/..
54CorenSearchBot/1.5 en libwww-perl/5.834
54 text/..
54HTMLParser/1.6
48 text/..
6 application/json
53Pywikipediabot/2.0
53 application/json
37Test Webbot
37 text/..
36DotNetWikiBot/2.72 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
35 text/..
1 application/xml
36plantspedia data crawler
36 text/..
33infraEnterprise v8 Web Crawler
33 text/..
31MoovidaBot/0.1
31 text/..
30Opera/8.01 (J2ME/MIDP; MXit WebBot/1.1.3.0) Opera Mini/3.1
15 image/..
13 application/vnd.wap.xhtml+xml
2 text/..
28Mozilla/5.0 QunarBot/1.0
28 text/..
28PywikiBot 1.0 mail address
28 text/..
27Opera/8.01 (J2ME/MIDP; MXit WebBot/1.1.2.0) Opera Mini/3.1
14 image/..
11 application/vnd.wap.xhtml+xml
2 text/..
26Jyxobot/1
26 text/..
1 application/vnd.php.serialized
25GoogleBot
25 text/..
1 image/..
24UCMore Crawler App
24 text/..
1 -
23Mozilla/5.0 (X11; Linux i686; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7 SnapPreviewBot
23 text/..
23Mozilla/5.0 (compatible; SnapPreviewBot; en-US; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9
23 text/..
1 -
22COIBot/1.00
22 text/..
1 -
21AnomieBOT 1.0 (OrphanReferenceFixer)
21 application/json
21SineBot/1.5.17(User:SineBot)
20 application/vnd.php.serialized
1 text/..
19MystBot/1.5 fr libwww-perl/5.836
19 text/..
18HRoestBot, de-wikipedia using pywikipedia framework
15 application/xml
2 text/..
1 application/json
16SineBot/1.5.16(User:SineBot)
15 application/vnd.php.serialized
1 text/..
16super happy fun geotagged articles crawler
16 application/json
16SurakWare MediaWiki Bot/1.0
16 text/..
1 application/xml
14dictionary-bot
11 application/xml
3 text/..
14Casper Bot Search
14 text/..
13('python-wikitools/1.2 (User:BernsteinBot)',)
13 application/json
12~Bot ([[:fr:w:User:TildeBot]] by [[:fr:w:User:Alphos]] mail address )
12 text/..
12COIBot/2.0
12 text/..
12NATE.ROBOT Mozilla/5.0 (Windows; Windows NT 5.1; en-US) AppleWebKit/533.4 KHTML Chrome/5.0.375.125 Safari/533.4
12 text/..
11Bot/WP/EN/Daniel/MediationBot1/1.2
11 text/..
11Twitterbot/0.1
11 text/..
1 -
1 image/..
10XLinkBot/1.00
10 text/..
10Peachy MediaWiki Bot API Version 1.0
10 application/vnd.php.serialized
1 text/..
9plaNETWORK Bot Search
9 text/..
8Bub's wikibot (Wikibot/2010040100; JWBF/1.2; Java/1.6)
8 text/..
8CaBot Script (running on nightshade.toolserver.org)
8 application/vnd.php.serialized
8HTMLParser/2.0
8 text/..
1 -
8Citation_bot; mail address
8 text/..
7ResCompSpider/Nutch-1.1
7 text/..
1 application/ogg
6Twib::Crawler/0.02
5 text/..
1 image/..
6Tawbot (public svn release; plwiki)
6 text/..
6Geni ircpybot 1.0
4 text/..
2 application/json
1 application/xml
6TVersity Media Robot
6 text/..
6Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; en-US; rv:1.9.2.8; flipboard.com/crawler rv:0.0.5) Gecko Firefox
6 image/..
5PicselSpider/1.0
5 text/..
5Jbot
5 text/..
1 -
5GNAA-bot
5 text/..
5msramlbot
5 text/..
5Freebase Deathbot
5 text/..
5AnomieBOT 1.0 (SourceUploader)
5 application/json
5MSR-ISRCCrawler
5 text/..
1 application/x-javascript
1 image/..
5Mozilla/5.0 (Bgbot 0.5)
5 text/..
5('python-wikitools/1.2 (User:LaraBot)',)
5 application/json
4bitlybot
4 text/..
1 image/..
4kmccrew Bot Search
4 text/..
4Bot/WP/EN/Alex_Bakharev/AlexNewArtBot
4 text/..
4DotNetWikiBot/2.9 (Unix 2.6.26.2; )
4 text/..
4Opera/9.80 (J2ME/MIDP; Opera Mini/5.0 (iPhone; CPU iPhone 0S 3.0 like Mac 0S X; en-us; compatible; GoogleBot/19.892; U; en) Presto/2.5.25
2 image/..
2 text/..
1 application/x-javascript
4Mozilla/5.0 (compatible; Windows NT 6.0) Gecko/20090624 Firefox/3.5 NjuiceBot
4 text/..
1 image/..
4IScraperBot/0.1 Mozilla/5.0
4 application/xml
1 text/..
4unblockbot/1.00
4 text/..
4AnomieBOT 1.0 (DeletionSortingCleaner)
4 application/json
3Opera/9.80 (J2ME/MIDP; Opera Mini/5.0 (iPhone; CPU iPhone 0S 3.0 like Mac 0S X; en-us; compatible; GoogleBot/19.916; U; en) Presto/2.5.25
2 text/..
1 image/..
1 application/json
3DotNetWikiBot/2.91 (Microsoft Windows NT 5.1.2600 Service Pack 2; )
2 text/..
1 application/xml
3Jabse.com Crawler v.2.0 www.jabse.com/crawler.php
3 text/..
3IMARB-Bot/1.0
3 text/..
3goodwer.com bot
3 application/vnd.php.serialized
3.NET Client Parser
3 application/xml
3Mozilla/4.0 (compatible; MSIE is not me; DAUMOA/1.0.0; DAUM Web Robot; Daum Communications Corp., Korea)
3 image/..
1 text/..
3QuickFinder Crawler
3 text/..
3DotNetWikiBot/2.91 (Microsoft Windows NT 5.2.3790 Service Pack 2; )
3 text/..
1 application/xml
3SiocWikiBot
3 text/..
3AnomieBOT 1.0 (RandomPagePicker)
3 application/json
3DotNetWikiBot/2.9 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
3 text/..
1 application/xml
3AnomieBOT 1.0 (AFDMergeFromCleaner)
3 application/json
3('python-wikitools/1.2 (User:Mr.Z-bot)',)
3 application/json
3Mozilla/4.0 (compatible; MT search portal spider/3.0; mail address )"
3 application/xml
1 text/..
3DotNetWikiBot/2.91 (Microsoft Windows NT 6.1.7600.0; )
3 text/..
3MediaWiki::Bot/3.1.6 (User:SporkBot)
3 application/json
11,442total

IP ranges: known IP ranges for Google are 64.233.[160.0-191.255], 66.249.[64.0-95.255], 66.102.[0.0-15.255], 72.14.[192.0-255.255], 74.125.[0.0-255.255], 209.85.[128.0-255.255], 216.239.[32.0-63.255] and a few minor other subranges.
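
A check against the ranges listed above might look like the sketch below. It uses Python's standard `ipaddress` module and rewrites each range as an equivalent CIDR block (with any leading zeros in an octet dropped). The names `GOOGLE_RANGES` and `is_google_ip` are hypothetical, and the "few minor other subranges" are not covered because the report does not list them.

```python
import ipaddress

# The ranges from this report, expressed as CIDR blocks.
GOOGLE_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "64.233.160.0/19",  # 64.233.[160.0-191.255]
        "66.249.64.0/19",   # 66.249.[64.0-95.255]
        "66.102.0.0/20",    # 66.102.[0.0-15.255]
        "72.14.192.0/18",   # 72.14.[192.0-255.255]
        "74.125.0.0/16",    # 74.125.[0.0-255.255]
        "209.85.128.0/17",  # 209.85.[128.0-255.255]
        "216.239.32.0/19",  # 216.239.[32.0-63.255]
    )
]


def is_google_ip(address: str) -> bool:
    """Return True if the address falls inside one of the listed ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in GOOGLE_RANGES)


print(is_google_ip("66.249.64.1"))  # inside 66.249.[64.0-95.255]
print(is_google_ip("66.249.96.0"))  # just past the end of that range
```

A request whose user agent claims to be GoogleBot but whose address fails this check would be a candidate for the spoofed category.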

Generated on Tue, Oct 19, 2010 3:22
Author: Erik Zachte
Mail: ezachte@### (no spam: ### = wikimedia.org)
All data and images on this page are in the public domain.