Wikimedia Visitor Log Analysis Report - Crawler requests

Daily averages, based on sample period: 1 Apr 2010 - 30 Apr 2010

 This analysis is based on a 1:1000 sampled server log (squids) ⇒ all counts x 1000.

The following overview of crawler (aka bot) page requests is based on the user agent information that accompanies most server requests. Unfortunately this user agent information follows rather loosely defined guidelines.
Also please bear in mind that the most popular crawler names may be somewhat overrepresented. This is the result of so-called user agent spoofing (where a requester supplies false credentials, e.g. to bypass web server filters).
GoogleBot seems to be a favorite for spoofing. Therefore requests from an IP address registered by Google (see below) are color coded differently from other requests that identify themselves as GoogleBot.

For this report page requests are considered to be issued by a crawler in two cases:
1 The user agent string contains a web address. Only crawlers should have that, but there are some false positives, where a browser sends a user agent string with a web address (caused by an ill-behaved plug-in; the main offenders have been eliminated).
2 The user agent string contains the term bot, spider or crawl[er].
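
The two cases above can be sketched as a small classifier. This is only an illustration, not the script behind this report: the function name `crawler_rule` is hypothetical, and the pattern used to recognize a "web address" (an `http://`, `https://` or `www.` prefix) is an assumption, since the report does not state its exact pattern.

```python
import re
from typing import Optional

# Assumption: a "web address" is approximated by an http(s):// or www. prefix.
WEB_ADDRESS = re.compile(r"https?://|www\.", re.IGNORECASE)
# Case 2: the terms bot, spider or crawl[er], matched case-insensitively.
KEYWORD = re.compile(r"bot|spider|crawl", re.IGNORECASE)


def crawler_rule(user_agent: str) -> Optional[str]:
    """Return which rule marks the agent as a crawler, or None for neither."""
    if WEB_ADDRESS.search(user_agent):
        return "url"      # case 1: agent string contains a web address
    if KEYWORD.search(user_agent):
        return "keyword"  # case 2: agent string contains bot/spider/crawl
    return None


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "PythonWikipediaBot/1.0",
        "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3",
    ]
    for ua in samples:
        print(crawler_rule(ua), "<-", ua)
```

Checking the web address first mirrors the order of the two cases, so an agent that matches both is reported under the first one.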

In total 39,524,000 page requests (mime type text/html only!) per day are considered crawler requests, out of 304,015,000 external requests per day, which is 13.0%.
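
As a rough check of this percentage, the arithmetic can be reproduced as follows. The sampled daily counts (39,524 and 304,015) are not quoted in the report; they are derived here by dividing the published totals by the 1:1000 sampling factor, and the names `SAMPLING_FACTOR` and `scale` are illustrative only.

```python
SAMPLING_FACTOR = 1000  # the squid log is sampled 1:1000


def scale(sampled_count: int) -> int:
    """Extrapolate a count from the sampled log to the full request volume."""
    return sampled_count * SAMPLING_FACTOR


# Assumption: sampled daily averages, derived from the totals in the text.
crawler_per_day = scale(39_524)    # 39,524,000 crawler page requests
external_per_day = scale(304_015)  # 304,015,000 external requests

share = 100 * crawler_per_day / external_per_day
print(f"{share:.1f}%")  # prints 13.0%
```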

Page requests for crawlers that specify a url in the agent string
Group rows: count (x 1000), secondary domain (~site) name
Detail rows: count (x 1000), URL, mime type, user agent
12,995google
10,389 www.google.com/bot.htmltext/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
586 desktop.google.com/image/..Mozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
420 desktop.google.com/application/xmlMozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
418 www.google.com/bot.html-Mozilla/5.0 (compatible; GoogleBot/2.1; url)
239 www.google.com/bot.htmlimage/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
170 www.google.com/bot.htmltext/..SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
147 www.google.com/bot.htmltext/..DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
103 desktop.google.com/image/..Mozilla/5.0 (compatible; Google Desktop/5.9.909.30391; url)
102 www.google.com/feedfetcher.html-FeedFetcher-Google; (url)
60 code.google.com/appenginetext/..AppEngine-Google; (url; appid: npiv82)
56 www.google.com/feedfetcher.htmltext/..Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
46 desktop.google.com/text/..Mozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
40 www.google.com/feedfetcher.htmlapplication/xmlFeedFetcher-Google; (url)
34 www.google.com/feedfetcher.htmltext/..FeedFetcher-Google; (url)
30 desktop.google.com/application/xmlMozilla/5.0 (compatible; Google Desktop/5.9.909.30391; url)
28 code.google.com/appenginetext/..AppEngine-Google; (url; appid: finchproxy)
18 www.google.com/feedfetcher.htmlapplication/jsonMozilla/5.0 (compatible) FeedFetcher-Google; (url)
15 www.google.com/feedfetcher.htmlapplication/xmlMozilla/5.0 (compatible) FeedFetcher-Google; (url)
12 code.google.com/p/crawler4j/text/..crawler4j (url)
11 www.google.com/feedfetcher.htmlimage/..Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
10 code.google.com/appengineapplication/xmlAppEngine-Google; (url; appid: wikipedia-raw)
8 docs.google.comimage/..Mozilla/5.0 (compatible; GoogleDocs; url)
7 desktop.google.com/text/..Mozilla/5.0 (compatible; Google Desktop/5.9.909.30391; url)
7 www.google.com/bot.htmlimage/..DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
6 desktop.google.com/-Mozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
4 www.google.com/bot.htmlimage/..SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
4 code.google.com/appenginetext/..AppEngine-Google; (url; appid: boxapp)
3 www.google.orgtext/..Naveen/Nutch-1.0 (Naveen; url; mail address )
3 code.google.com/appengineapplication/jsonMWBOT GAE Edition AppEngine-Google; (url; appid: philip-bot)
10,293yahoo
9,687 help.yahoo.com/help/us/ysearch/slurptext/..Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
157 help.yahoo.com/help/us/ysearch/slurp-Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
131 misc.yahoo.com.cn/help.htmltext/..Mozilla/5.0 (compatible; Yahoo! Slurp China; url)
110 help.yahoo.com/help/us/ysearch/slurptext/..Mozilla/5.0 (compatible; Yahoo! Slurp; url)
45 help.yahoo.co.jp/help/jp/search/indexing/indexing-15.htmlimage/..'Mozilla/5.0 (compatible; Y!J SearchMonkey/1.0 (Y!J-AGENT; url))'
35 help.yahoo.com/help/us/ysearch/slurptext/..Mozilla/5.0 (compatible; Yahoo! DE Slurp; url)
31 developer.yahoo.com/searchmonkey/useragenttext/..Mozilla/5.0 (compatible; Yahoo! SearchMonkey 1.0; url)
25 developer.yahoo.com/searchmonkey/useragentimage/..Mozilla/5.0 (compatible; Yahoo! SearchMonkey 1.0; url)
15 help.yahoo.comtext/..Mozilla/5.0 (YahooYSMcm/3.0.0; url)
15 help.yahoo.com/help/us/ysearch/slurpimage/..Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
13 help.yahoo.com/help/us/ysearch/slurpapplication/oggMozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
10 help.yahoo.co.jp/help/jp/search/indexing/indexing-15.htmltext/..Y!J-BRI/0.0.1 crawler ( url )
10 help.yahoo.com/help/us/ysearch/slurpapplication/xmlMozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
3 misc.yahoo.com.cn/help.html-Mozilla/5.0 (compatible; Yahoo! Slurp China; url)
6,989msn
3,226 search.msn.com/msnbot.htm-msnbot/2.0b (url)
3,220 search.msn.com/msnbot.htmtext/..msnbot/2.0b (url)
156 search.msn.com/msnbot.htmtext/..msnbot-media/1.1 (url)
127 search.msn.com/msnbot.htmtext/..msnbot/2.0b (url).
78 search.msn.com/msnbot.htmtext/..msnbot/2.0b (url)._
46 search.msn.com/msnbot.htmimage/..msnbot-media/1.1 (url)
46 search.msn.com/msnbot.htmtext/..msnbot-Products/1.0 (url)
36 search.msn.com/msnbot.htmimage/..msnbot/2.0b (url)
28 search.msn.com/msnbot.htmtext/..msnbot/1.1 (url)
8 search.msn.com/msnbot.htmapplication/oggmsnbot/2.0b (url)
7 search.msn.com/msnbot.htmtext/..msnbot-NewsBlogs/1.1 (url)
1,658google?
1,409 www.google.com/bot.htmltext/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
85 www.google.com/bot.htmltext/..SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
43 www.google.com/bot.htmltext/..GoogleBot/2.1 (url)
33 www.google.com/bot.htmltext/..DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
31 www.google.com/bot.htmltext/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
13 www.google.com/bot.html-Mozilla/5.0 (compatible; GoogleBot/2.1; url)
12 www.google.com/bot.htmlapplication/xmlDoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
11 www.google.com/bot.htmlapplication/xmlMozilla/5.0 (compatible; GoogleBot/2.1; url)
6 www.google.com/bot.htmltext/..Mozilla/5.0(compatible;GoogleBot/2.1;url)
6 www.google.com/bot.htmlapplication/xmlSAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
6 www.google.com/bot.htmlimage/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
1,571wikipedia
1,428 en.wikipedia.org/wiki/Web_crawlertext/..GoogleBot/Nutch-1.0 (Prototype; url; mail address )
65 en.wikipedia.org/wiki/Wikipedia:Huggletext/..Huggle/0.9.4 url
25 en.wikipedia.org/wiki/Wikipedia:Huggletext/..Huggle/0.9.2 url
13 en.wikipedia.org/wiki/Wikipedia:Huggletext/..Huggle/0.9.5 url
13 en.wikipedia.orgtext/..url
9 en.wikipedia.org/wiki/User:NicoV/Wikipedia_Cleaner/Documentationtext/..WikiCleaner (url)
6 en.wikipedia.org/wiki/Wikipedia:Huggletext/..Huggle/0.9.3 url
3 en.wikipedia.org/wiki/User:Sidonuketext/..Huggle-Sidonuke Build/0.9.4 url
3 fr.wikipedia.org/wiki/Utilisateur:Salebotapplication/jsonSalebot, see url (uses Perl MediaWiki::API)
1,167naver
1,098 help.naver.com/robots/text/..Yeti/1.0 (NHN Corp.; url)
30 help.naver.com/robots/image/..Yeti/1.0 (NHN Corp.; url)
28 help.naver.com/delete_main.asptext/..Mozilla/4.0 (compatible; NaverBot/1.0; url)
8 help.naver.com/customer_webtxt_02.jsptext/..Mozilla/4.0 (compatible; NaverBot/1.0; url)
847cuil
831 www.cuil.com/twiceler/robot.htmltext/..Mozilla/5.0 (Twiceler-0.9 url)
13 www.cuil.com/twiceler/robot.htmlapplication/xmlMozilla/5.0 (Twiceler-0.9 url)
713pipl
713 www.pipl.com/bot/text/..Mozilla/5.0(compatible;PiplBot;url)
699ask
571 about.ask.com/en/docs/about/webmasters.shtmltext/..Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
124 about.ask.com/en/docs/about/webmasters.shtml-Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
696baidu
502 www.baidu.com/search/spider.htmtext/..Baiduspider(url)
137 www.baidu.jp/spider/text/..Baiduspider(url)
31 www.baidu.jp/spider/text/..DoCoMo/2.0 P05A(c100;TB;W24H15) (compatible; BaiduMobaider/1.0;url)
9 www.baidu.jp/spider/application/xmlDoCoMo/2.0 P05A(c100;TB;W24H15) (compatible; BaiduMobaider/1.0;url)
6 www.baidu.jp/spider/-Baiduspider(url)
3 www.baidu.jp/spider/text/..BaiduImagespider(url)
3 www.baidu.jp/spider/application/xmlBaiduspider(url)
631facebook
510 www.facebook.com/externalhit_uatext.phpimage/..facebookexternalhit/1.0 (url)
107 www.facebook.com/externalhit_uatext.phptext/..facebookexternalhit/1.0 (url)
8 developers.facebook.comtext/..facebookplatform/1.0 (url)
6 developers.facebook.comimage/..facebookplatform/1.0 (url)
539php
378 pear.php.net/text/..PEAR HTTP_Request class ( url )
87 pear.php.net/application/vnd.php.serializedPEAR HTTP_Request class ( url )
28 pear.php.net/package/http_request2text/..HTTP_Request2/0.5.1 (url) PHP/5.2.12
22 pear.php.net/application/xmlPEAR HTTP_Request class ( url )
19 pear.php.net/application/jsonPEAR HTTP_Request class ( url )
4 pear.php.net/package/http_request2text/..HTTP_Request2/0.5.1 (url) PHP/5.2.13
440soso
430 help.soso.com/webspider.htmtext/..Sosospider(url)
4 help.soso.com/webspider.htm-Sosospider(url)
383youdao
362 www.youdao.com/help/webmaster/spider/text/..Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
9 www.youdao.com/help/webmaster/spider/text/..Mozilla/5.0 (compatible; YodaoBot/1.0; url; )
4 www.youdao.com/help/webmaster/spider/text/..Nokia3650/1.0 SymbianOS/6.1 Series60/1.2 Profile/MIDP-1.0 Configuration/CLDC-1.0/ (compatible; YodaoBot-Mobile/1.0; url; )
3 www.youdao.com/help/webmaster/spider/application/xmlMozilla/5.0 (compatible; YoudaoBot/1.0; url; )
304exabot
161 www.exabot.com/go/robottext/..Mozilla/5.0 (compatible; Exabot/3.0; url)
132 www.exabot.com/go/robottext/..Mozilla/5.0 (compatible; Exabot/3.0 (BiggerBetter); url)
7 www.exabot.com/go/robot-Mozilla/5.0 (compatible; Exabot/3.0; url)
268girovolando
268 news.girovolando.ittext/..WordPress/2.9.2; url
241entireweb
229 www.entireweb.com/about/search_tech/speedy_spider/text/..Mozilla/5.0 (Windows; Windows NT 5.1; en-US) Speedy Spider (url)
9 www.entireweb.com/about/search_tech/speedy_spider/text/..Speedy Spider (url)
3 www.entireweb.com/about/search_tech/speedy_spider/-Mozilla/5.0 (Windows; Windows NT 5.1; en-US) Speedy Spider (url)
207scoutjet
207 www.scoutjet.com/text/..Mozilla/5.0 (compatible; ScoutJet; url)
199yacy
35 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.31-20-server; java 1.6.0_0; Europe/de) url
27 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.31-19-server; java 1.6.0_0; Europe/en) url
21 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.31-21-generic; java 1.6.0_0; Europe/en) url
15 yacy.net/bot.htmltext/..yacybot (amd64 Windows 7 6.1; java 1.6.0_18; Europe/fr) url
6 yacy.net/bot.htmltext/..yacybot (amd64 Windows 7 6.1; java 1.6.0_19; Europe/fr) url
5 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.18-164.15.1.el5; java 1.6.0_14; Europe/de) url
5 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.24-10-pve; java 1.6.0_12; Etc/en) url
4 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.31-gentoo-r6; java 1.6.0_17; Etc/en) url
4 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.31-20-server; java 1.6.0_15; US/en) url
4 yacy.net/bot.htmltext/..yacybot (amd64 Windows 7 6.1; java 1.6.0_20; Europe/de) url
4 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.31-20-server; java 1.6.0_0; Europe/en) url
4 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.32-trunk-amd64; java 1.6.0_18; Europe/de) url
4 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.28-18-generic; java 1.6.0_0; Europe/en) url
4 yacy.net/bot.htmltext/..yacybot (x86 Windows Vista 6.1; java 1.5.0_13; Europe/sv) url
3 yacy.net/bot.htmltext/..yacybot (x86 Windows Vista 6.0; java 1.6.0_07; Europe/de) url
3 yacy.net/bot.htmltext/..yacybot (x86 Windows XP 5.1; java 1.6.0_17; Europe/de) url
3 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.26-2-xen-amd64; java 1.6.0_0; Europe/de) url
3 yacy.net/bot.htmltext/..yacybot (amd64 FreeBSD 7.1-PRERELEASE; java 1.6.0_07; Europe/en) url
185traslated
185 mymemory.traslated.net/doc/text/..Mozilla/5.0 (MyMemory Bot url)
184sogou
175 www.sogou.com/docs/help/webmasters.htm#07text/..Sogou web spider/4.0(url)
5 www.sogou.com/docs/help/webmasters.htm#07application/xmlSogou web spider/4.0(url)
4 www.sogou.com/docs/help/webmasters.htm#07image/..Sogou Pic Spider/3.0(url)
175mnemoo
175 www.mnemoo.com/about/spidertext/..Mnemoo Intelligent Spider/0.1alpha (compatible; See url)
163toolserver
107 wiki.toolserver.org/view/GeoHacktext/..Geohack (url)
29 toolserver.org/~bayo/text/..LudoThecaire/1.0 (url)
22 toolserver.org/~guandalug/application/vnd.php.serializedGuandalugs PHPWikiBot/1.1 (url;de:User:Guandalug)
3 toolserver.org/~dcoetzee/contributionsurveyor/text/..Contribution Surveyor (url)
161sblog
105 fulltext.sblog.cz/screenshot/image/..Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
32 fulltext.sblog.cz/screenshot/text/..Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
21 fulltext.sblog.cz/robot/text/..SeznamBot/2.0 (url)
136 80legs
115 www.80legs.com/spider.htmltext/..Mozilla/5.0 (compatible; 008/0.83; url;) Gecko/2008032620
21 www.80legs.com/spider.htmlimage/..Mozilla/5.0 (compatible; 008/0.83; url;) Gecko/2008032620
125wikimedia
124 tools.wikimedia.de/~daniel/text/..WikiSense (url)
111daum
111 ws.daum.net/aboutWebSearch.htmltext/..Mozilla/5.0 (compatible; MSIE or Firefox mutant; not on Windows server; url) Daumoa/2.0
105textdigger
105 textdigger.comtext/..Mozilla/5.0 (url) Gecko/20061208 Firefox/2.0.0.1
101www.
47 www.text/..GoogleBot-Image/1.0 ( urlGoogleBot.com/bot.html)
47 www.text/..GoogleBot/2.1 ( urlGoogleBot.com/bot.html)
4 www.text/..GoogleBot/2.1 (urlGoogleBot.com/bot.html)
3 www.image/..GoogleBot/2.1 (urlGoogleBot.com/bot.html)
100majestic12
94 www.majestic12.co.uk/bot.php?text/..Mozilla/5.0 (compatible; MJ12bot/v1.3.2; url)
4 www.majestic12.co.uk/bot.php?text/..Mozilla/5.0 (compatible; MJ12bot/v1.3.3; url)
86discoveryengine
62 discoveryengine.com/discobot.htmltext/..Mozilla/5.0 (compatible; discobot/1.1; url
24 discoveryengine.com/discobot.htmlimage/..Mozilla/5.0 (compatible; discobot/1.1; url
78oneriot
51 www.oneriot.comimage/..Mozilla/5.0 (Windows; Windows NT 5.1; en-US; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 OneRiot/1.0 (url)
27 www.oneriot.comtext/..Mozilla/5.0 (Windows; Windows NT 5.1; en-US; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 OneRiot/1.0 (url)
77emining
75 emining.jp/text/..emBot-GalaBuzz/Nutch-1.0 (url; mail address )
70kosmix
60 www.kosmix.com/html/kosmos.htmlapplication/xmlMozilla/5.0(compatible;Kosmos/1.0;url)
10 www.kosmix.com/html/kosmos.htmltext/..Mozilla/5.0(compatible;Kosmos/1.0;url)
66dotnetdotcom
66 www.dotnetdotcom.org/text/..Mozilla/5.0 (compatible; DotBot/1.1; url, mail address )
66semager
66 www.semager.de/blog/semager-bots/text/..Mozilla/5.0 (compatible; Semager/1.4; url)
63activepeople
63 www.activepeople.nettext/..WordPress/2.8.4; url
62teesoft
20 www.teesoft.info/image/..Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
14 www.teesoft.info/text/..Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
11 www.teesoft.info/image/..Mozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
5 www.teesoft.info/text/..Mozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
3 www.teesoft.info/image/..Mozilla/5.0 (X11; Linux i686; [lang code]; rv:[..]) Gecko/.. etc (url)
57wordpress
17 josefboberg.wordpress.comtext/..WordPress/MU; url
10 support.wordpress.com/contact/text/..WordPress.com mShots; url
9 benabb.wordpress.comtext/..WordPress/MU; url
54github
52 github.com/pauldix/typhoeus/tree/mastertext/..Typhoeus - url
41sf
14 liferea.sf.net/text/..Liferea/0.x.x (Linux; en_US.UTF-8; url)
13 magpierss.sf.nettext/..MagpieRSS/0.7x (url)
13 liferea.sf.net/text/..Liferea/1.x.x (Linux; es_ES.UTF-8; url)
39spinn3r
36 spinn3r.com/robottext/..Mozilla/5.0 (X11; Linux i686; en-US; rv:1.2.1; aggregator:Spinn3r (Spinn3r 3.1); url) Gecko/20021130
39FeedBurner
38 www.FeedBurner.comtext/..FeedBurner/1.0 (url)
38fairshare
35 fairshare.cctext/..Mozilla/5.0 url (X11; FreeBSD i386; en-US; rv:1.2a) Gecko/20021021
36newsgator
13 www.newsgator.com/text/..FeedDemon/2.7 (url; Microsoft Windows XP)
13 www.newsgator.comtext/..NewsGatorOnline/2.0 (url; 1 subscribers)
8 www.newsgator.com/Individuals/NetNewsWire/-NetNewsWire/3.2.7 (Mac OS X; url; gzip-happy)
34hatena
29 a.hatena.ne.jp/helptext/..Hatena Antenna/0.5 (url)
5 mgw.hatena.ne.jp/helptext/..DoCoMo/2.0 D903i(c100;TB;W28H20) (compatible; Hatena-Mobile-Gateway/1.2; url)
29yanga
29 www.yanga.co.uk/text/..Yanga WorldSearch Bot v1.1/beta (url)
29z-add
26 w3.z-add.co.uk/linkcheck/text/..Z-Add Link Checker (url)
29conceptlinkage
29 www.conceptlinkage.orgtext/..c-link wikipedia miner (url) mail address
28puritysearch
28 www.puritysearch.net/text/..Mozilla/5.0 (compatible; Purebot/1.1; url)
27qdos
27 qdos.com/text/..qdos/1.1 (url)
27jetbrains
14 www.jetbrains.com/omea_reader/text/..JetBrains Omea Reader 1.0.x (url)
13 www.jetbrains.com/omea_reader/text/..JetBrains Omea Reader 2.0 Release Candidate 1 (url)
26yioop
25 www.yioop.com/bot.htmltext/..Mozilla/5.0 (compatible; YioopBot url)
26avantbrowser
13 www.avantbrowser.comtext/..Avant Browser (url)
12 www.avantbrowser.comtext/..Advanced Browser (url)
25rcdtokyo
19 www.rcdtokyo.com/pc2m/text/..Mozilla/5.0 (compatible; PEAR HTTP_Request class; url)
6 www.rcdtokyo.com/pc2m/image/..Mozilla/5.0 (compatible; PEAR HTTP_Request class; url)
25Anonymouse
12 Anonymouse.org/image/..url (Unix)
11 Anonymouse.org/text/..url (Unix)
25heartrails
8 capture.heartrails.com/text/..Mozilla/5.0 (X11; Linux i686 (x86_64); en-US; rv:1.8.1.20) Gecko/20090429 HeartRails_Capture/0.6 (url) BonEcho/2.0.0.20
7 capture.heartrails.com/image/..Mozilla/5.0 (X11; Linux i686 (x86_64); en-US; rv:1.8.1.20) Gecko/20090429 HeartRails_Capture/0.6 (url) BonEcho/2.0.0.20
5 capture.heartrails.com/text/..Mozilla/5.0 (X11; Linux i686; en-US; rv:1.9.2.3) Gecko/20100403 HeartRails_Capture/1.0.1 (url) Namoroka/3.6.3
5 capture.heartrails.com/image/..Mozilla/5.0 (X11; Linux i686; en-US; rv:1.9.2.3) Gecko/20100403 HeartRails_Capture/1.0.1 (url) Namoroka/3.6.3
25feedshow
13 www.feedshow.comtext/..FeedshowOnline (url)
12 www.feedshow.comtext/..Feedshow/x.0 (url; 1 subscriber)
25snap
25 www.snap.comtext/..Snapbot/1.0 (Snap Shots, url)
24moose
24 www.moose.at/text/..Mozilla/5.0 (compatible; MooseBot/1.1; Linux i686; de; url)
23alexa
23 www.alexa.com/site/help/webmasterstext/..ia_archiver (url; mail address )
23vbseo
22 www.vbseo.comtext/..Mozilla/4.0 (vBSEO; url)
23freebase
23 www.freebase.comtext/..metaweb/Nutch-1.0-dev (url; help_at_metaweb.com)
20seoprofiler
20 www.seoprofiler.com/bot/text/..Mozilla/5.0 (compatible; spbot/2.0.2; url )
19phonifier
11 www.phonifier.comtext/..aNti_miSa$puAsa/5.0 (compatible; Phonifier; url)
8 www.phonifier.comtext/..Mozilla/5.0 (compatible; Phonifier; url)
18chainn
16 www.chainn.com/mxbot.htmltext/..Mozilla/5.0 (compatible; mxbot/1.0; url)
18princeton
18 www.cs.princeton.edu/cass/text/..nu_tch-princeton/Nu_tch-1.0-dev (princeton crawler for cass project; url; zhewang a_t cs ddot princeton dot edu)
16froute
13 labs.froute.jp/pc2m/help.htmltext/..Froute Mobile Gateway/1.0 (url)
3 labs.froute.jp/pc2m/help.htmlimage/..Froute Mobile Gateway/1.0 (url)
16turnitin
16 www.turnitin.com/robot/crawlerinfo.htmltext/..TurnitinBot/2.1 (url)
15tinyurl
15 tinyurl.com/64t5ntext/..Rome Client (url) Ver: 0.9
15rssbandit
15 www.rssbandit.orgtext/..RssBandit/1.5.0.10 (WinNT 5.1.2600.0; url) (WinNT 5.1.2600.0; )
15archive-it
10 archive-it.org/files/site-owners.htmlimage/..Mozilla/5.0 (compatible;archive.org_bot/1.7.0; Archive-It; url) Firefox
5 archive-it.org/files/site-owners.htmltext/..Mozilla/5.0 (compatible;archive.org_bot/1.7.0; Archive-It; url) Firefox
14timewe
14 timewe.nettext/..CDR/1.7.1 Simulator/0.7(url) Profile/MIDP-1.0 Configuration/CLDC-1.0
14winpodder
14 winpodder.comtext/..WinPodder (url)
14orcabrowser
14 www.orcabrowser.comtext/..Orca Browser (url)
14plagger
13 plagger.org/text/..Plagger/0.x.xx (url)
14kula
14 kula.jp/endotext/..endo/1.0 (Mac OS X; ppc i386; url)
14mixi
7 mixi.jp/text/..mixi-mobile-converter/1.0 (url)
7 mixi.jp/image/..mixi-mobile-converter/1.0 (url)
14it-influentials
14 search.it-influentials.com/bot.htmtext/..Mozilla/5.0 (compatible;FindITAnswersbot/1.0;url)
14seebot
14 seebot.orgtext/..Lynx/2.8 (;url)
14meta
14 meta.ua/spidertext/..Mozilla/5.0 (compatible; METASpider; url)
13blogbridge
13 www.blogbridge.com/text/..BlogBridge 2.13 (url)
13setooz
9 www.setooz.com/oozbot.htmltext/..OOZBOT/0.20 ( url ; mail address )
4 www.setooz.com/bot.htmltext/..Mozilla/5.0/Nutch-1.0-dev ( compatible; SETOOZBOT/0.30 ; url ; mail address )
13rssreader
13 www.rssreader.comtext/..RssReader/1.0.xx.x (url) Microsoft Windows NT 5.1.2600.0
13zipcommander
13 www.zipcommander.com/text/..1st ZipCommander (Net) - url
13globalspec
13 www.globalspec.com/Ocellitext/..Ocelli/1.4 (url)
13memidex
13 www.memidex.com/_bottext/..Mozilla/5.0 (compatible; Memibot/1.0; url )
13topsy
13 labs.topsy.com/butterfly.htmltext/..Mozilla/5.0 (compatible; Butterfly/1.0; url) Gecko/2009032608 Firefox/3.0.8
13snarfware
13 www.snarfware.com/text/..Snarfer/0.x.x (url)
13ranchero
13 ranchero.com/netnewswire/text/..NetNewsWire/2.x (Mac OS X; url)
13bin-co
13 www.bin-co.com/php/scripts/load/text/..BinGet/1.00.A (url)
13weblio
12 www.weblio.jp/text/..Mozilla/5.0 (compatible; WeblioBot; url)
13archive
4 crawler.archive.orgtext/..Mozilla/5.0 (compatible; heritrix/3.1.1-SNAPSHOT-20100424.022413 url)
3 crawler.archive.orgtext/..Mozilla/5.0 (compatible; heritrix/3.1.1-SNAPSHOT-20100427.215201 url)
13ponderer
13 ponderer.org/download/annotate_google.user.jstext/..annotate_google; url
13graemef
13 graemef.comtext/..NewsGator FetchLinks extension/0.2.0 (url)
13nemui
13 mozshot.nemui.org/text/..Mozilla/5.0 (Gecko/20070310 Mozshot/0.0.20070628; url)
12zootycoon
12 www.zootycoon.comtext/..Zoo Tycoon 2 Client -- url
12simplepie
4 simplepie.orgapplication/xmlSimplePie/1.1.1 (Feed Parser; url; Allow like Gecko) Build/20080315205903
3 simplepie.orgapplication/xmlSimplePie/1.2 (Feed Parser; url; Allow like Gecko) Build/20090627192103
3 simplepie.orgtext/..SimplePie/1.1.1 (Feed Parser; url; Allow like Gecko) Build/20080315205903
12feeds4all
12 www.feeds4all.com/feedzcollectortext/..FeedZcollector v1.x (Platinum) url
11holmes
11 holmes.getext/..HolmesBot (url)
10superfeedr
9 superfeedr.comapplication/xmlSuperfeedr: Superparser/1.0 url - mail address - Please get in touch if we're polling too hard
44,270total
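
The grouping by secondary domain used in the table above could be approximated as in the sketch below. It is a heuristic written for illustration, not the report's own logic: the function name `site_name` and the set `SECOND_LEVEL` (labels such as `co` or `ne` that sit in front of a two-letter country code) are assumptions, and unusual entries such as the `www.` group are not handled.

```python
# Assumption: labels that act as part of the suffix under a country code,
# e.g. "co.jp", "com.cn", "ne.jp", "co.uk".
SECOND_LEVEL = {"co", "com", "ne", "net", "or", "org", "ac"}


def site_name(url: str) -> str:
    """Return the secondary domain (~site) name for a scheme-less URL."""
    host = url.split("/", 1)[0]  # drop the path, keep the host name
    labels = host.split(".")
    tld = labels.pop()           # remove the top-level domain
    # For country-code domains, also drop a generic second-level label.
    if len(tld) == 2 and len(labels) > 1 and labels[-1].lower() in SECOND_LEVEL:
        labels.pop()
    return labels[-1] if labels else host


if __name__ == "__main__":
    for url in ("www.google.com/bot.html",
                "help.yahoo.co.jp/help/jp/search/indexing/indexing-15.html",
                "misc.yahoo.com.cn/help.html",
                "www.majestic12.co.uk/bot.php?",
                "liferea.sf.net/"):
        print(site_name(url), "<-", url)
```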

Page requests for probable crawlers, recognized by keyword
Agent rows: count (x 1000), agent string
Detail rows: count (x 1000), mime type (count ≥ 3)
2,264PythonWikipediaBot/1.0
1,484 application/json
579 application/xml
200 text/..
1 image/..
1 -
1,009GoogleBot-Image/1.0
430 text/..
329 image/..
250 -
1 application/pdf
878ClueBot/1.1
669 application/vnd.php.serialized
209 text/..
441Mozilla/5.0 (Windows; Windows NT 5.1; fr; rv:1.8.1) VoilaBot BETA 1.2 ( mail address )
441 text/..
1 -
1 application/pdf
351LinkParser/2.0
351 text/..
262Answersbot
262 text/..
167DotNetWikiBot/2.9 (Microsoft Windows NT 5.1.2600 Service Pack 2; )
161 text/..
6 application/xml
159gsa-crawler (Enterprise; S5-MS8QQPJ5BGWAA; mail address )
159 text/..
153GoogleBot-Image/1.0
149 text/..
4 image/..
1 -
1 application/x-javascript
149wikiwix-bot-3.0
147 text/..
2 image/..
1 -
143Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (Exabot-Thumbnails)
74 text/..
50 image/..
19 application/x-javascript
109php wikibot classes
91 application/vnd.php.serialized
18 text/..
102MLBot (www.metadatalabs.com/mlbot)
102 text/..
1 -
1 image/..
100DotNetWikiBot/2.72 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
100 text/..
1 application/xml
97ListasBot 3
97 text/..
70gsa-crawler (Enterprise; S5-BCMTBUXTXSNJA; mail address )
70 text/..
56DotNetWikiBot/2.81 (Microsoft Windows NT 6.1.7600.0; )
43 text/..
8 application/xml
5 image/..
1 -
1 application/ogg
51Mozilla/5.0 (SnapPreviewBot) Gecko/20061206 Firefox/1.5.0.9
30 image/..
21 text/..
1 application/x-javascript
39DotNetWikiBot/2.72 (Microsoft Windows NT 5.2.3790 Service Pack 2; )
39 text/..
38DotNetWikiBot/2.72 (Microsoft Windows NT 5.1.2600 Service Pack 2; )
38 text/..
1 -
37plantspedia data crawler
37 text/..
30dictionary-bot
21 application/xml
9 text/..
30Test Webbot
30 text/..
28crawler mail address
28 text/..
27SineBot/1.5.15(User:SineBot)
26 application/vnd.php.serialized
1 text/..
25SoxBot IRC Bot. PHP
21 application/vnd.php.serialized
4 text/..
24DOCODE Spider/Nutch-1.0
24 text/..
1 application/pdf
1 image/..
24AnomieBOT 1.0 (OrphanReferenceFixer)
24 application/json
23MPUploadBot; PHP 5.2.6-3ubuntu4.5
23 application/vnd.php.serialized
22COIBot/1.00
22 text/..
21DotNetWikiBot/2.91 (Microsoft Windows NT 5.1.2600 Service Pack 2; )
21 text/..
1 -
1 application/xml
20MSRBOT
20 text/..
20www.rootza.com crawler mail address
20 application/xml
1 text/..
20Perfect Search Crawler
20 text/..
19FAST Enterprise Crawler 6 used by CTS ( mail address )
19 text/..
1 -
19HTMLParser/2.0
19 text/..
1 -
18LinkParser/1.00
18 text/..
18SiocWikiBot/1.0
18 application/vnd.php.serialized
1 text/..
17CorenSearchBot/1.5 en libwww-perl/5.808
17 text/..
16Twitterbot/0.1
16 text/..
1 image/..
1 application/ogg
15QuickFinder Crawler
15 text/..
14SurakWare MediaWiki Bot/1.0
14 text/..
14dicbot 1.0
14 text/..
13Mozilla/5.0 (compatible; SnapPreviewBot; en-US; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9
13 text/..
13GoogleBot
13 text/..
1 image/..
13UCMore Crawler App
13 text/..
13HTMLParser/1.6
13 text/..
1 application/json
13betaBot
13 text/..
1 application/ogg
13spider
12 text/..
1 application/json
1 image/..
12Mozilla/5.0 (X11; Linux i686; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7 SnapPreviewBot
12 text/..
12DotNetWikiBot/2.72 (Microsoft Windows NT 5.1.2600.0; )
12 text/..
12TheKeens bot
12 text/..
12YaDirectBot/1.0
12 text/..
12SoxBot PHP
11 application/vnd.php.serialized
1 text/..
11Tawbot (public svn release; plwiki)
11 text/..
11SophiaOneBot
11 text/..
10('python-wikitools/1.2 (User:BernsteinBot)',)
10 application/json
10~Bot ([[:fr:w:User:TildeBot]] by [[:fr:w:User:Alphos]] mail address )
10 text/..
10IsraBot/Nutch-1.1 (MyTest; 10.10.4.231; mail address )
10 text/..
1 application/opensearchdescription+xml
1 image/..
9Bot/WP/EN/Daniel/MediationBot1/1.2
9 text/..
9DotNetWikiBot/2.9 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
8 text/..
1 application/xml
8Pywikipediabot/2.0
8 application/json
1 text/..
8XLinkBot/1.00
8 text/..
7zschobot/Nutch-0.9-semantic_patch (zschobot indexing; Zscho.de/de/bot.html)
7 text/..
1 image/..
7FAST Enterprise Crawler 6 used by fast ( mail address )
7 text/..
1 -
6AdultsVisitUs/Nutch-1.1 (www.AdultsVisitUs.com; mail address )
6 text/..
1 application/ogg
6Lycos_Spider_(modspider)
3 image/..
3 text/..
6Jbot
6 text/..
1 image/..
6GNAA-bot
6 text/..
6Citation_bot; mail address
6 text/..
6Freebase Deathbot
6 text/..
6('python-wikitools/1.2 (User:Mr.Z-bot)',)
6 application/json
6Mozilla/5.0 (compatible; Windows NT 6.0) Gecko/20090624 Firefox/3.5 NjuiceBot
6 text/..
1 image/..
6Keybot Translation-Search-Machine
6 text/..
5zomba-bot/0.1
5 text/..
5QBot
5 text/..
5Bot/WP/EN/Alex_Bakharev/AlexNewArtBot
5 text/..
5CorenSearchBot/1.4 en libwww-perl/5.808
5 text/..
5WukongSpider
5 text/..
5 mail address (Mozilla compatible)
5 text/..
1 image/..
5Jyxobot/1
5 text/..
5Mozilla/5.0 (Bgbot 0.5)
5 text/..
4bitlybot
4 text/..
1 image/..
4Mozilla/5.0 (compatible; Feedtrace-bot/0.2; mail address )
3 text/..
1 image/..
1 application/ogg
4Mozilla 5.0 (Apibot 0.20)
4 application/vnd.php.serialized
4MyCuteBot / 0.1.
4 text/..
4CaBot Script (running on nightshade.toolserver.org)
4 application/vnd.php.serialized
4Erel Bot
4 text/..
4TestBot 1.0
4 text/..
4Geni ircpybot 1.0
2 application/json
2 text/..
1 application/xml
4AnomieBOT 1.0 (SourceUploader)
4 application/json
4CheMoBot/1.00
4 text/..
3Mozilla/5.0 (compatible; Nigma.ru/3.0; mail address )
3 text/..
3DotNetWikiBot/2.8 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
3 text/..
1 application/xml
3Jabse.com Crawler v.2.0 www.jabse.com/crawler.php
3 text/..
3DotNetWikiBot/2.7 (Microsoft Windows NT 6.1.7600.0; )
3 text/..
1 application/xml
3Netvibes Wasabi-bot v1.0
2 application/xml
1 text/..
1 -
3AnomieBOT 1.0 (AFDMergeFromCleaner)
3 application/json
3DotNetWikiBot/2.9 (Unix 2.6.26.2; )
3 text/..
3TVersity Media Robot
3 text/..
1 application/xml
3unblockbot/1.00
3 text/..
3('python-wikitools/1.2 (User:LaraBot)',)
3 application/json
3IssueCrawler
3 text/..
7,546total

IP ranges: known IP ranges for Google are 64.233.[160.0-191.255], 66.249.[64.0-95.255], 66.102.[0.0-15.255], 72.14.[192.0-255.255],
74.125.[0.0-255.255], 209.85.[128.0-255.255], 216.239.[32.0-63.255] and a few other minor subranges.
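
A minimal sketch of how the ranges above could be used to separate genuine from possibly spoofed GoogleBot requests, using Python's standard `ipaddress` module. The CIDR blocks are a direct translation of the seven listed ranges (the "few other minor subranges" are not included), and the function names as well as the mapping to the labels "google" and "google?" are assumptions made for this example.

```python
import ipaddress
from typing import Optional

# The seven ranges listed above, written as CIDR blocks.
GOOGLE_NETWORKS = [ipaddress.ip_network(cidr) for cidr in (
    "64.233.160.0/19",  # 64.233.[160.0-191.255]
    "66.249.64.0/19",   # 66.249.[64.0-95.255]
    "66.102.0.0/20",    # 66.102.[0.0-15.255]
    "72.14.192.0/18",   # 72.14.[192.0-255.255]
    "74.125.0.0/16",    # 74.125.[0.0-255.255]
    "209.85.128.0/17",  # 209.85.[128.0-255.255]
    "216.239.32.0/19",  # 216.239.[32.0-63.255]
)]


def is_google_ip(ip: str) -> bool:
    """True if the address falls inside one of the listed Google ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in GOOGLE_NETWORKS)


def googlebot_group(user_agent: str, ip: str) -> Optional[str]:
    """Label a request that claims to be GoogleBot (labels are assumptions)."""
    if "googlebot" not in user_agent.lower():
        return None
    return "google" if is_google_ip(ip) else "google?"


if __name__ == "__main__":
    ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    print(googlebot_group(ua, "66.249.65.1"))  # inside 66.249.64.0/19
    print(googlebot_group(ua, "10.0.0.1"))     # outside all listed ranges
```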

Generated on Sat, Jun 26, 2010 9:26
Author: Erik Zachte
Mail: ezachte@### (no spam: ### = wikimedia.org)
All data and images on this page are in the public domain.