Wikimedia Visitor Log Analysis Report - Crawler requests

Daily averages, based on sample period: 1 May 2010 - 31 May 2010

 This analysis is based on a 1:1000 sampled server log (squids) ⇒ all counts x 1000.

The following overview of crawler (aka bot) page requests is based on the user agent information that accompanies most server requests. Unfortunately this user agent information follows rather loosely defined guidelines.
Also please bear in mind that the most popular crawler names may be somewhat overrepresented. This is the result of so-called user agent spoofing (where a requester supplies false credentials, e.g. to bypass web server filters).
GoogleBot seems to be a favorite for spoofing. Therefore GoogleBot requests from an IP address registered by Google (see below) are color coded differently from GoogleBot requests from other addresses.

For this report page requests are considered to be issued by a crawler in two cases:
1 The user agent string contains a web address. Only crawlers should have that, but there are some false positives, where a browser sends a user agent string with a web address (ill-behaved plug-in; the main offenders have been eliminated).
2 The user agent string contains the term bot, spider or crawl[er].
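
The two cases above can be sketched as a small classifier. This is only an illustration, not the script used for this report: the regular expressions, the function name classify_user_agent and the returned labels are assumptions, and the sketch makes no attempt to filter the false positives mentioned in case 1.

```python
import re

# Assumption: a "web address" is approximated by an http(s):// or www. prefix.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
# Case 2: the terms bot, spider or crawl[er], matched case-insensitively.
KEYWORD_PATTERN = re.compile(r"bot|spider|crawl", re.IGNORECASE)


def classify_user_agent(user_agent: str) -> str:
    """Return a hypothetical label for a user agent string."""
    if URL_PATTERN.search(user_agent):
        return "crawler (url)"
    if KEYWORD_PATTERN.search(user_agent):
        return "crawler (keyword)"
    return "other"
```

Checking the url case first mirrors the order of the two tables in this report: agents with a web address go to the first table, and only the remaining agents are tested for keywords.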

In total 35,871,000 page requests per day (mime type text/html only!) are considered crawler requests, out of 290,390,000 external requests, which is 12.4%.
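
The arithmetic behind these figures can be reproduced as follows. This is a minimal sketch: the function names are hypothetical, and the sampled daily averages 35,871 and 290,390 are assumed here by dividing the reported totals by the 1:1000 sampling factor.

```python
SAMPLING_FACTOR = 1000  # 1:1000 sampled log: each sampled line stands for 1000 requests


def scale_count(sampled_count: int, factor: int = SAMPLING_FACTOR) -> int:
    """Extrapolate a count from the sampled log to the full request volume."""
    return sampled_count * factor


def share_percent(part: int, whole: int) -> float:
    """Share of part in whole, as a percentage rounded to one decimal."""
    return round(100 * part / whole, 1)


crawler_requests = scale_count(35_871)     # assumed sampled daily average
external_requests = scale_count(290_390)   # assumed sampled daily average
print(crawler_requests, external_requests,
      share_percent(crawler_requests, external_requests))
```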

Page requests for crawlers that specify a url in the agent string
Group rows: Count (x 1000) | Secondary domain (~site) name
Detail rows (indented): Count (x 1000) | URL | Mime type | User agent
11,236 | google
  9,062 | www.google.com/bot.html | text/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  550 | desktop.google.com/ | image/.. | Mozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
  371 | www.google.com/bot.html | - | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  240 | desktop.google.com/ | application/xml | Mozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
  217 | www.google.com/bot.html | image/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  99 | www.google.com/feedfetcher.html | - | FeedFetcher-Google; (url)
  92 | desktop.google.com/ | image/.. | Mozilla/5.0 (compatible; Google Desktop/5.9.909.30391; url)
  80 | www.google.com/bot.html | text/.. | SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
  77 | www.google.com/feedfetcher.html | text/.. | FeedFetcher-Google; (url)
  67 | www.google.com/bot.html | text/.. | DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
  52 | www.google.com/feedfetcher.html | text/.. | Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
  38 | code.google.com/appengine | application/xml | AppEngine-Google; (url; appid: wikipedia-raw)
  37 | www.google.com/feedfetcher.html | application/xml | FeedFetcher-Google; (url)
  31 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: finchproxy)
  30 | desktop.google.com/ | text/.. | Mozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
  28 | desktop.google.com/ | application/xml | Mozilla/5.0 (compatible; Google Desktop/5.9.909.30391; url)
  26 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: npiv82)
  24 | www.google.com/feedfetcher.html | application/json | Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
  16 | www.google.com/feedfetcher.html | application/xml | Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
  15 | www.google.com/bot.html | image/.. | DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
  13 | www.google.com/feedfetcher.html | image/.. | Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
  9 | www.google.com/bot.html | image/.. | SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
  7 | docs.google.com | image/.. | Mozilla/5.0 (compatible; GoogleDocs; url)
  7 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: boxapp)
  7 | desktop.google.com/ | - | Mozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
  5 | desktop.google.com/ | text/.. | Mozilla/5.0 (compatible; Google Desktop/5.9.909.30391; url)
  5 | code.google.com/appengine | application/json | MWBOT GAE Edition AppEngine-Google; (url; appid: philip-bot)
  4 | code.google.com/appengine | text/.. | oohEmbed.com AppEngine-Google; (url; appid: oohembed)
  3 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: simple-tools4)
  3 | code.google.com/p/crawler4j/ | text/.. | crawler4j (url)
  3 | www.google.com/bot.html | text/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url),gzip(gfe) AppEngine-Google; (http://code.google.com/appengine; appid: swu)
  3 | code.google.com/appengine | application/json | Python-urllib/2.5 AppEngine-Google; (url; appid: loeschmonitor)
10,806 | yahoo
  10,230 | help.yahoo.com/help/us/ysearch/slurp | text/.. | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  119 | misc.yahoo.com.cn/help.html | text/.. | Mozilla/5.0 (compatible; Yahoo! Slurp China; url)
  118 | help.yahoo.com/help/us/ysearch/slurp | - | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  83 | help.yahoo.com/help/us/ysearch/slurp | text/.. | Mozilla/5.0 (compatible; Yahoo! Slurp; url)
  61 | help.yahoo.co.jp/help/jp/search/indexing/indexing-15.html | image/.. | 'Mozilla/5.0 (compatible; Y!J SearchMonkey/1.0 (Y!J-AGENT; url))'
  33 | help.yahoo.com/help/us/ysearch/slurp | text/.. | Mozilla/5.0 (compatible; Yahoo! DE Slurp; url)
  33 | developer.yahoo.com/searchmonkey/useragent | text/.. | Mozilla/5.0 (compatible; Yahoo! SearchMonkey 1.0; url)
  31 | help.yahoo.com/help/us/ysearch/slurp | image/.. | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  22 | developer.yahoo.com/searchmonkey/useragent | image/.. | Mozilla/5.0 (compatible; Yahoo! SearchMonkey 1.0; url)
  15 | help.yahoo.com/help/us/ysearch/slurp | application/xml | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  15 | help.yahoo.com | text/.. | Mozilla/5.0 (YahooYSMcm/3.0.0; url)
  11 | help.yahoo.com/help/us/ysearch/slurp | application/ogg | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  11 | help.yahoo.co.jp/help/jp/search/indexing/indexing-15.html | text/.. | Y!J-BRI/0.0.1 crawler ( url )
  10 | help.yahoo.com/help/us/ysearch/crawling/crawling-01.html | text/.. | Nokia6682/2.0 (3.01.1) SymbianOS/8.0 Series60/2.6 Profile/MIDP-2.0 configuration/CLDC-1.1 UP.Link/6.3.0.0.0 (compatible;YahooSeeker/M1A1-R2D2; url)
  7 | www.yahoo.com | text/.. | Mozilla/5.0 (compatible; heritrix/1.14.3 url)
  3 | misc.yahoo.com.cn/help.html | - | Mozilla/5.0 (compatible; Yahoo! Slurp China; url)
7,312 | msn
  4,022 | search.msn.com/msnbot.htm | - | msnbot/2.0b (url)
  2,890 | search.msn.com/msnbot.htm | text/.. | msnbot/2.0b (url)
  156 | search.msn.com/msnbot.htm | text/.. | msnbot-media/1.1 (url)
  73 | search.msn.com/msnbot.htm | text/.. | msnbot/2.0b (url).
  57 | search.msn.com/msnbot.htm | image/.. | msnbot-media/1.1 (url)
  29 | search.msn.com/msnbot.htm | text/.. | msnbot/1.1 (url)
  28 | search.msn.com/msnbot.htm | image/.. | msnbot/2.0b (url)
  18 | search.msn.com/msnbot.htm | text/.. | msnbot-Products/1.0 (url)
  12 | search.msn.com/msnbot.htm | text/.. | msnbot/2.0b (url)._
  9 | search.msn.com/msnbot.htm | application/ogg | msnbot/2.0b (url)
  5 | search.msn.com/msnbot.htm | text/.. | msnbot-NewsBlogs/1.1 (url)
  4 | search.msn.com/msnbot.htm | text/.. | msnbot-UDiscovery/2.0b (url)
  3 | search.msn.com/msnbot.htm | - | msnbot-media/1.1 (url)
2,591 | facebook
  2,354 | www.facebook.com/externalhit_uatext.php | image/.. | facebookexternalhit/1.0 (url)
  227 | www.facebook.com/externalhit_uatext.php | text/.. | facebookexternalhit/1.0 (url)
  6 | developers.facebook.com | image/.. | facebookplatform/1.0 (url)
  4 | developers.facebook.com | text/.. | facebookplatform/1.0 (url)
1,633 | google?
  1,402 | www.google.com/bot.html | text/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  59 | www.google.com/bot.html | text/.. | SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
  46 | www.google.com/bot.html | text/.. | DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
  30 | www.google.com/bot.html | text/.. | GoogleBot/2.1 (url)
  24 | www.google.com/bot.html | text/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  16 | www.google.com/bot.html | application/xml | DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
  12 | www.google.com/bot.html | - | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  9 | www.google.com/bot.html | image/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  9 | www.google.com/bot.html | application/vnd.php.serialized | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  8 | www.google.com/bot.html | application/xml | SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
  8 | www.google.com/bot.html | application/xml | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  5 | www.google.com/bot.html | text/.. | Mozilla/5.0(compatible;GoogleBot/2.1;url)
1,235 | toolserver
  1,116 | toolserver.org/~guandalug/ | application/vnd.php.serialized | Guandalugs PHPWikiBot/1.1 (url;de:User:Guandalug)
  88 | wiki.toolserver.org/view/GeoHack | text/.. | Geohack (url)
  26 | toolserver.org/~bayo/ | text/.. | LudoThecaire/1.0 (url)
1,207 | naver
  1,146 | help.naver.com/robots/ | text/.. | Yeti/1.0 (NHN Corp.; url)
  28 | help.naver.com/robots/ | image/.. | Yeti/1.0 (NHN Corp.; url)
  25 | help.naver.com/delete_main.asp | text/.. | Mozilla/4.0 (compatible; NaverBot/1.0; url)
  6 | help.naver.com/customer_webtxt_02.jsp | text/.. | Mozilla/4.0 (compatible; NaverBot/1.0; url)
697 | baidu
  444 | www.baidu.com/search/spider.htm | text/.. | Baiduspider(url)
  208 | www.baidu.jp/spider/ | text/.. | Baiduspider(url)
  19 | www.baidu.jp/spider/ | text/.. | DoCoMo/2.0 P05A(c100;TB;W24H15) (compatible; BaiduMobaider/1.0;url)
  8 | www.baidu.jp/spider/ | application/xml | DoCoMo/2.0 P05A(c100;TB;W24H15) (compatible; BaiduMobaider/1.0;url)
  5 | www.baidu.jp/spider/ | - | Baiduspider(url)
  5 | www.baidu.jp/spider/ | text/.. | BaiduImagespider(url)
  3 | www.baidu.jp/spider/ | application/xml | Baiduspider(url)
638 | ask
  489 | about.ask.com/en/docs/about/webmasters.shtml | text/.. | Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
  144 | about.ask.com/en/docs/about/webmasters.shtml | - | Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
515 | php
  363 | pear.php.net/ | text/.. | PEAR HTTP_Request class ( url )
  89 | pear.php.net/ | application/vnd.php.serialized | PEAR HTTP_Request class ( url )
  30 | pear.php.net/package/http_request2 | text/.. | HTTP_Request2/0.5.1 (url) PHP/5.2.13
  17 | pear.php.net/ | application/json | PEAR HTTP_Request class ( url )
  15 | pear.php.net/ | application/xml | PEAR HTTP_Request class ( url )
466 | cuil
  462 | www.cuil.com/twiceler/robot.html | text/.. | Mozilla/5.0 (Twiceler-0.9 url)
  3 | www.cuil.com/twiceler/robot.html | application/xml | Mozilla/5.0 (Twiceler-0.9 url)
428 | pipl
  428 | www.pipl.com/bot/ | text/.. | Mozilla/5.0(compatible;PiplBot;url)
406 | soso
  398 | help.soso.com/webspider.htm | text/.. | Sosospider(url)
313 | wikipedia
  154 | en.wikipedia.org/wiki/Web_crawler | text/.. | GoogleBot/Nutch-1.0 (Prototype; url; mail address )
  69 | en.wikipedia.org/wiki/Wikipedia:Huggle | text/.. | Huggle/0.9.5 url
  38 | en.wikipedia.org/wiki/Wikipedia:Huggle | text/.. | Huggle/0.9.6 url
  20 | en.wikipedia.org/wiki/User:NicoV/Wikipedia_Cleaner/Documentation | text/.. | WikiCleaner (url)
  16 | en.wikipedia.org | text/.. | url
  5 | en.wikipedia.org/wiki/User:Sidonuke | text/.. | Huggle-Sidonuke Build/0.9.4 url
  3 | ko.wikipedia.org | text/.. | url
  3 | fr.wikipedia.org/wiki/Utilisateur:Salebot | application/json | Salebot, see url (uses Perl MediaWiki::API)
252 | entireweb
  241 | www.entireweb.com/about/search_tech/speedy_spider/ | text/.. | Mozilla/5.0 (Windows; Windows NT 5.1; en-US) Speedy Spider (url)
  10 | www.entireweb.com/about/search_tech/speedy_spider/ | - | Mozilla/5.0 (Windows; Windows NT 5.1; en-US) Speedy Spider (url)
247 | exabot
  148 | www.exabot.com/go/robot | text/.. | Mozilla/5.0 (compatible; Exabot/3.0; url)
  91 | www.exabot.com/go/robot | text/.. | Mozilla/5.0 (compatible; Exabot/3.0 (BiggerBetter); url)
  6 | www.exabot.com/go/robot | - | Mozilla/5.0 (compatible; Exabot/3.0; url)
238 | yacy
  28 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.31-21-generic; java 1.6.0_0; Europe/en) url
  20 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.28-18-generic; java 1.6.0_0; Europe/en) url
  19 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.31-21-server; java 1.6.0_0; Asia/ja) url
  17 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.31-21-generic; java 1.6.0_0; Europe/en) url
  13 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.28-18-generic; java 1.6.0_0; Europe/en) url
  6 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.31-14-generic-pae; java 1.6.0_20; Europe/en) url
  6 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.28-18-generic; java 1.6.0_20; Europe/en) url
  5 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.32.9-rscloud; java 1.6.0_20; Etc/en) url
  5 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.31-21-generic; java 1.6.0_20; Europe/en) url
  5 | yacy.net/bot.html | text/.. | yacybot (amd64 FreeBSD 7.1-PRERELEASE; java 1.6.0_07; Europe/en) url
  4 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.31-gentoo-r6; java 1.6.0_17; Etc/en) url
  4 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.31-21-server; java 1.6.0_20; Europe/en) url
  4 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.18-194.3.1.el5; java 1.6.0; Europe/en) url
  4 | yacy.net/bot.html | text/.. | yacybot (x86 Windows XP 5.1; java 1.6.0_20; Europe/de) url
  4 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.28-18-generic; java 1.6.0_20; GMT/en) url
  4 | yacy.net/bot.html | text/.. | yacybot (sparc Linux 2.6.26-2-sparc64-smp; java 1.6.0_0; SystemV/en) url
  4 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.28-18-server; java 1.6.0_0; Europe/en) url
  3 | yacy.net/bot.html | text/.. | yacybot (x86 Windows XP 5.1; java 1.6.0_17; Europe/en) url
  3 | yacy.net/bot.html | text/.. | yacybot (x86 Windows XP 5.1; java 1.6.0_18; Europe/de) url
  3 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.31-20-generic; java 1.6.0_20-ea; Europe/en) url
  3 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.32-gentoo-r7; java 1.6.0_20; Europe/en) url
  3 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.26-2-amd64; java 1.6.0_0; Europe/en) url
  3 | yacy.net/bot.html | text/.. | yacybot (x86 Windows Vista 6.1; java 1.5.0_13; Europe/sv) url
237 | youdao
  223 | www.youdao.com/help/webmaster/spider/ | text/.. | Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
  9 | www.youdao.com/help/webmaster/spider/ | text/.. | Mozilla/5.0 (compatible; YodaoBot/1.0; url; )
202 | traslated
  202 | mymemory.traslated.net/doc/ | text/.. | Mozilla/5.0 (MyMemory Bot url)
191 | sogou
  182 | www.sogou.com/docs/help/webmasters.htm#07 | text/.. | Sogou web spider/4.0(url)
  5 | www.sogou.com/docs/help/webmasters.htm#07 | image/.. | Sogou Pic Spider/3.0(url)
  4 | www.sogou.com/docs/help/webmasters.htm#07 | application/xml | Sogou web spider/4.0(url)
184 | scoutjet
  184 | www.scoutjet.com/ | text/.. | Mozilla/5.0 (compatible; ScoutJet; url)
171 | z-add
  162 | w3.z-add.co.uk/linkcheck/ | text/.. | Z-Add Link Checker (url)
  6 | w3.z-add.co.uk/linkcheck/ | image/.. | Z-Add Link Checker (url)
  3 | w3.z-add.co.uk/linkcheck/ | application/x-javascript | Z-Add Link Checker (url)
165 | waw
  102 | dubi.itinfo.waw.pl | image/.. | WordPress/2.8.6; url
  63 | gienia.itinfo.waw.pl | image/.. | WordPress/2.8.6; url
163 | sblog
  98 | fulltext.sblog.cz/screenshot/ | image/.. | Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
  31 | fulltext.sblog.cz/screenshot/ | text/.. | Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
  30 | fulltext.sblog.cz/robot/ | text/.. | SeznamBot/2.0 (url)
  3 | fulltext.sblog.cz/screenshot/ | application/x-javascript | Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
145 | mnemoo
  145 | www.mnemoo.com/about/spider | text/.. | Mnemoo Intelligent Spider/0.1alpha (compatible; See url)
139 | github
  129 | github.com/pauldix/typhoeus/tree/master | text/.. | Typhoeus - url
  6 | github.com/edsu/linkypedia | application/json | linkpyediabot v0.1: url
  3 | github.com/pauldix/typhoeus/tree/master | application/json | Typhoeus - url
120 | textdigger
  118 | textdigger.com | text/.. | Mozilla/5.0 (url) Gecko/20061208 Firefox/2.0.0.1
118 | majestic12
  87 | www.majestic12.co.uk/bot.php? | text/.. | Mozilla/5.0 (compatible; MJ12bot/v1.3.3; url)
  29 | www.majestic12.co.uk/bot.php? | text/.. | Mozilla/5.0 (compatible; MJ12bot/v1.3.2; url)
114 | wikimedia
  111 | tools.wikimedia.de/~daniel/ | text/.. | WikiSense (url)
101 | gigablast
  101 | www.gigablast.com/spider.html | text/.. | Gigabot/3.0 (url)
91 | daum
  91 | ws.daum.net/aboutWebSearch.html | text/.. | Mozilla/5.0 (compatible; MSIE or Firefox mutant; not on Windows server; url) Daumoa/2.0
87 | 80legs
  77 | www.80legs.com/spider.html | text/.. | Mozilla/5.0 (compatible; 008/0.83; url;) Gecko/2008032620
  10 | www.80legs.com/spider.html | image/.. | Mozilla/5.0 (compatible; 008/0.83; url;) Gecko/2008032620
84 | archive
  72 | crawler.archive.org | text/.. | Mozilla/5.0 (compatible; heritrix/3.1.1-SNAPSHOT-20100429.232622 url)
  5 | crawler.archive.org | image/.. | Mozilla/5.0 (compatible; heritrix/3.1.1-SNAPSHOT-20100429.232622 url)
  4 | www.archive.org | text/.. | Mozilla/5.0 (compatible; archive.org_bot/heritrix-1.15.4 url)
74 | emining
  72 | emining.jp/ | text/.. | emBot-GalaBuzz/Nutch-1.0 (url; mail address )
72 | wordpress
  21 | josefboberg.wordpress.com | text/.. | WordPress/MU; url
  10 | cricketdiane.wordpress.com | text/.. | WordPress/MU; url
  9 | support.wordpress.com/contact/ | text/.. | WordPress.com mShots; url
  6 | benabb.wordpress.com | text/.. | WordPress/MU; url
  4 | kinandana.wordpress.com | text/.. | WordPress/MU; url
  3 | korananakindonesia.wordpress.com | text/.. | WordPress/MU; url
69 | kosmix
  56 | www.kosmix.com/html/kosmos.html | application/xml | Mozilla/5.0(compatible;Kosmos/1.0;url)
  13 | www.kosmix.com/html/kosmos.html | text/.. | Mozilla/5.0(compatible;Kosmos/1.0;url)
69 | FeedBurner
  68 | www.FeedBurner.com | text/.. | FeedBurner/1.0 (url)
62 | discoveryengine
  61 | discoveryengine.com/discobot.html | text/.. | Mozilla/5.0 (compatible; discobot/1.1; url
57 | teesoft
  19 | www.teesoft.info/ | image/.. | Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
  12 | www.teesoft.info/ | text/.. | Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
  9 | www.teesoft.info/ | image/.. | Mozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
  5 | www.teesoft.info/ | text/.. | Mozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
  3 | www.teesoft.info/ | image/.. | Mozilla/5.0 (X11; Linux i686; [lang code]; rv:[..]) Gecko/.. etc (url)
56 | www.
  18 | www. | text/.. | GoogleBot-Image/1.0 ( urlGoogleBot.com/bot.html)
  15 | www. | text/.. | GoogleBot/2.1 ( urlGoogleBot.com/bot.html)
  10 | www. | text/.. | Google - GoogleBot/2.1 ( urlGoogleBot.com/bot.html)
  6 | www. | text/.. | GoogleBot/2.1 (urlGoogleBot.com/bot.html)/2.1 (http://www.GoogleBot.com/bot.html; http://www.GoogleBot.com/bot.html; mail address )
  4 | www. | text/.. | GoogleBot/2.1 (urlGoogleBot.com/bot.html)
  3 | www. | image/.. | GoogleBot/2.1 (urlGoogleBot.com/bot.html)
54 | semager
  54 | www.semager.de/blog/semager-bots/ | text/.. | Mozilla/5.0 (compatible; Semager/1.4; url)
50 | dotnetdotcom
  50 | www.dotnetdotcom.org/ | text/.. | Mozilla/5.0 (compatible; DotBot/1.1; url, mail address )
49 | freebase
  49 | www.freebase.com | text/.. | metaweb/Nutch-1.0-dev (url; help_at_metaweb.com)
48 | oneriot
  29 | www.oneriot.com | text/.. | Mozilla/5.0 (Windows; Windows NT 5.1; en-US; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 OneRiot/1.0 (url)
  19 | www.oneriot.com | image/.. | Mozilla/5.0 (Windows; Windows NT 5.1; en-US; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 OneRiot/1.0 (url)
46 | conceptlinkage
  46 | www.conceptlinkage.org | text/.. | c-link wikipedia miner (url) mail address
44 | goo
  38 | help.goo.ne.jp/contact/ | text/.. | goo wikipedia (url)
  3 | help.goo.ne.jp/help/article/1142/ | text/.. | DoCoMo/2.0 P900i(c100;TB;W24H11) (compatible; ichiro/mobile goo; url)
41 | heartrails
  20 | capture.heartrails.com/ | text/.. | Mozilla/5.0 (X11; Linux i686; en-US; rv:1.9.2.3) Gecko/20100403 HeartRails_Capture/1.0.1 (url) Namoroka/3.6.3
  17 | capture.heartrails.com/ | image/.. | Mozilla/5.0 (X11; Linux i686; en-US; rv:1.9.2.3) Gecko/20100403 HeartRails_Capture/1.0.1 (url) Namoroka/3.6.3
38 | sf
  13 | liferea.sf.net/ | text/.. | Liferea/0.x.x (Linux; en_US.UTF-8; url)
  12 | magpierss.sf.net | text/.. | MagpieRSS/0.7x (url)
  12 | liferea.sf.net/ | text/.. | Liferea/1.x.x (Linux; es_ES.UTF-8; url)
35 | mindbreeze
  35 | www.mindbreeze.com | text/.. | Mozilla/5.0 (compatible; heritrix/3.0.0 url)
35 | newsgator
  12 | www.newsgator.com/ | text/.. | FeedDemon/2.7 (url; Microsoft Windows XP)
  12 | www.newsgator.com | text/.. | NewsGatorOnline/2.0 (url; 1 subscribers)
  8 | www.newsgator.com/Individuals/NetNewsWire/ | - | NetNewsWire/3.2.7 (Mac OS X; url; gzip-happy)
35 | spinn3r
  32 | spinn3r.com/robot | text/.. | Mozilla/5.0 (X11; Linux i686; en-US; rv:1.2.1; aggregator:Spinn3r (Spinn3r 3.1); url) Gecko/20021130
31 | Anonymouse
  15 | Anonymouse.org/ | image/.. | url (Unix)
  13 | Anonymouse.org/ | text/.. | url (Unix)
  3 | Anonymouse.org/ | application/x-javascript | url (Unix)
31 | dummy
  31 | www.dummy.com | text/.. | TEST/hoqBot-1.0 (dummy; url; mail address )
31 | hatena
  26 | a.hatena.ne.jp/help | text/.. | Hatena Antenna/0.5 (url)
  5 | mgw.hatena.ne.jp/help | text/.. | DoCoMo/2.0 D903i(c100;TB;W28H20) (compatible; Hatena-Mobile-Gateway/1.2; url)
28 | fairshare
  27 | fairshare.cc | text/.. | Mozilla/5.0 url (X11; FreeBSD i386; en-US; rv:1.2a) Gecko/20021021
28 | puritysearch
  28 | www.puritysearch.net/ | text/.. | Mozilla/5.0 (compatible; Purebot/1.1; url)
26 | avantbrowser
  13 | www.avantbrowser.com | text/.. | Avant Browser (url)
  12 | www.avantbrowser.com | text/.. | Advanced Browser (url)
25 | xe
  25 | www.wevonzo.xe.cx/bot.php | text/.. | wevonzo (url)
25 | feedshow
  13 | www.feedshow.com | text/.. | FeedshowOnline (url)
  12 | www.feedshow.com | text/.. | Feedshow/x.0 (url; 1 subscriber)
25 | jetbrains
  13 | www.jetbrains.com/omea_reader/ | text/.. | JetBrains Omea Reader 2.0 Release Candidate 1 (url)
  12 | www.jetbrains.com/omea_reader/ | text/.. | JetBrains Omea Reader 1.0.x (url)
24 | paxle
  22 | www.paxle.net/en/bot | text/.. | Mozilla/5.0 (compatible; PaxleFramework/0.1.0; url)
24 | chainn
  21 | www.chainn.com/mxbot.html | text/.. | Mozilla/5.0 (compatible; mxbot/1.0; url)
  3 | www.chainn.com/mxbot.html | image/.. | Mozilla/5.0 (compatible; mxbot/1.0; url)
24 | snap
  23 | www.snap.com | text/.. | Snapbot/1.0 (Snap Shots, url)
23 | rcdtokyo
  16 | www.rcdtokyo.com/pc2m/ | text/.. | Mozilla/5.0 (compatible; PEAR HTTP_Request class; url)
  7 | www.rcdtokyo.com/pc2m/ | image/.. | Mozilla/5.0 (compatible; PEAR HTTP_Request class; url)
22 | superfeedr
  20 | superfeedr.com | application/xml | Superfeedr: Superparser/1.0 url - Please read this http://blog.superfeedr.com/publishers.html or get in touch if we're polling too hard
21 | ronzoo
  21 | www.ronzoo.com/about/ | text/.. | Ronzoobot/1.4 (url)
21 | weblio
  19 | www.weblio.jp/ | text/.. | Mozilla/5.0 (compatible; WeblioBot; url)
21 | qdos
  21 | qdos.com/ | text/.. | qdos/1.1 (url)
21 | alexa
  21 | www.alexa.com/site/help/webmasters | text/.. | ia_archiver (url; mail address )
17 | meta
  17 | meta.ua/spider | text/.. | Mozilla/5.0 (compatible; METASpider; url)
16 | princeton
  16 | www.cs.princeton.edu/cass/ | text/.. | nu_tch-princeton/Nu_tch-1.0-dev (princeton crawler for cass project; url; zhewang a_t cs ddot princeton dot edu)
16 | yioop
  15 | www.yioop.com/bot.html | text/.. | Mozilla/5.0 (compatible; YioopBot url)
15 | froute
  12 | labs.froute.jp/pc2m/help.html | text/.. | Froute Mobile Gateway/1.0 (url)
  3 | labs.froute.jp/pc2m/help.html | image/.. | Froute Mobile Gateway/1.0 (url)
15 | yanga
  15 | www.yanga.co.uk/ | text/.. | Yanga WorldSearch Bot v1.1/beta (url)
15 | mixi
  9 | mixi.jp/ | text/.. | mixi-mobile-converter/1.0 (url)
  6 | mixi.jp/ | image/.. | mixi-mobile-converter/1.0 (url)
14 | tinyurl
  14 | tinyurl.com/64t5n | text/.. | Rome Client (url) Ver: 0.9
13 | blogbridge
  13 | www.blogbridge.com/ | text/.. | BlogBridge 2.13 (url)
13 | rssreader
  13 | www.rssreader.com | text/.. | RssReader/1.0.xx.x (url) Microsoft Windows NT 5.1.2600.0
13 | topsy
  13 | labs.topsy.com/butterfly.html | text/.. | Mozilla/5.0 (compatible; Butterfly/1.0; url) Gecko/2009032608 Firefox/3.0.8
13 | orcabrowser
  13 | www.orcabrowser.com | text/.. | Orca Browser (url)
13 | plagger
  13 | plagger.org/ | text/.. | Plagger/0.x.xx (url)
13 | simplepie
  5 | simplepie.org | application/xml | SimplePie/1.2.1-dev (Feed Parser; url; Allow like Gecko) Build/20100325064732
  3 | simplepie.org | application/xml | SimplePie/1.1.1 (Feed Parser; url; Allow like Gecko) Build/20080315205903
13 | holmes
  13 | holmes.ge | text/.. | HolmesBot (url)
12 | seoprofiler
  9 | www.seoprofiler.com/bot/ | text/.. | Mozilla/5.0 (compatible; spbot/2.0.2; url )
  3 | www.seoprofiler.com/bot/ | text/.. | Mozilla/5.0 (compatible; spbot/2.0.3; url )
12 | zipcommander
  12 | www.zipcommander.com/ | text/.. | 1st ZipCommander (Net) - url
12 | winpodder
  12 | winpodder.com | text/.. | WinPodder (url)
12 | rssbandit
  12 | www.rssbandit.org | text/.. | RssBandit/1.5.0.10 (WinNT 5.1.2600.0; url) (WinNT 5.1.2600.0; )
12 | kula
  12 | kula.jp/endo | text/.. | endo/1.0 (Mac OS X; ppc i386; url)
12 | yo-hu
  12 | www.yo-hu.com | text/.. | Mozilla/4.0(compatible;MSIE8.0;WindowsNT6.1;WOW64;Trident/4.0;SLCC2;.NETCLR2.0.50727;.NETCLR3.5.30729;.NETCLR3.0.30729;MediaCenterPC6.0;Yo-Hu Crawler 1.00;url;)
12 | ponderer
  12 | ponderer.org/download/annotate_google.user.js | text/.. | annotate_google; url
12 | graemef
  12 | graemef.com | text/.. | NewsGator FetchLinks extension/0.2.0 (url)
12 | it-influentials
  12 | search.it-influentials.com/bot.htm | text/.. | Mozilla/5.0 (compatible;FindITAnswersbot/1.0;url)
12 | feeds4all
  12 | www.feeds4all.com/feedzcollector | text/.. | FeedZcollector v1.x (Platinum) url
12 | seebot
  12 | seebot.org | text/.. | Lynx/2.8 (;url)
11 | zootycoon
  11 | www.zootycoon.com | text/.. | Zoo Tycoon 2 Client -- url
11 | timewe
  11 | timewe.net | text/.. | CDR/1.7.1 Simulator/0.7(url) Profile/MIDP-1.0 Configuration/CLDC-1.0
11 | snarfware
  11 | www.snarfware.com/ | text/.. | Snarfer/0.x.x (url)
11 | bloglines
  7 | www.bloglines.com | - | Bloglines/3.1 (url; 1 subscriber)
11 | quus
  11 | fx.quus.net/ | text/.. | url
11 | ranchero
  11 | ranchero.com/netnewswire/ | text/.. | NetNewsWire/2.x (Mac OS X; url)
11 | archive-it
  8 | archive-it.org/files/site-owners.html | image/.. | Mozilla/5.0 (compatible;archive.org_bot/1.7.0; Archive-It; url) Firefox
  3 | archive-it.org/files/site-owners.html | text/.. | Mozilla/5.0 (compatible;archive.org_bot/1.7.0; Archive-It; url) Firefox
11 | nemui
  11 | mozshot.nemui.org/ | text/.. | Mozilla/5.0 (Gecko/20070310 Mozshot/0.0.20070628; url)
11 | chenli
  11 | chenli.com.cn | text/.. | Chen Li/Nutch-1.0 (Nutch spiderman; url; mail address )
10 | plagiarismtest
  10 | plagiarismtest.com | application/json | WikiCrawl 1.0b (url contact-mail: mail address )
10 | tineye
  8 | tineye.com/crawler.html | image/.. | TinEye/1.1 (url)
10 | bin-co
  10 | www.bin-co.com/php/scripts/load/ | text/.. | BinGet/1.00.A (url)
10 | uk:8080
  10 | www.wishfind.co.uk:8080 | text/.. | Wish_Find_Search_Engine_Crawler/Nutch-0.9 (www.wishfind.co.uk Wish Find Search Engine Crawler; url; mail address )
10 | edu
  5 | ws.nju.edu.cn/falcons/ | text/.. | Mozilla/5.0 (compatible; Falconsbot; url)
  3 | www.users.pjwstk.edu.pl/~msyd/webmining.html | text/.. | PJCrawler/Nutch-1.0 (PJWSTK Web Mining Lab Project; url; mail address (dot) pl)
10 | rockpeaks
  10 | www.rockpeaks.com/contact | text/.. | RockPeaks/0.1 (url)
44,416 | total

Page requests for probable crawlers, recognized by keyword
Agent rows: Count (x 1000) | Agent string
Mime type rows (indented): Count (x 1000) | Mime type (count ≥ 3)
2,259 | PythonWikipediaBot/1.0
  1,488 | application/json
  622 | application/xml
  149 | text/..
  1 | -
  1 | image/..
1,388 | GoogleBot-Image/1.0
  613 | text/..
  400 | -
  375 | image/..
  1 | application/pdf
979 | ClueBot/1.1
  791 | application/vnd.php.serialized
  188 | text/..
401 | Mozilla/5.0 (Windows; Windows NT 5.1; fr; rv:1.8.1) VoilaBot BETA 1.2 ( mail address )
  401 | text/..
  1 | -
  1 | application/vnd.php.serialized
272 | LinkParser/2.0
  272 | text/..
260 | Answersbot
  260 | text/..
187 | Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (Exabot-Thumbnails)
  87 | text/..
  74 | image/..
  26 | application/x-javascript
  1 | application/json
151 | MLBot (www.metadatalabs.com/mlbot)
  151 | text/..
  1 | image/..
145 | gsa-crawler (Enterprise; S5-MS8QQPJ5BGWAA; mail address )
  145 | text/..
144 | php wikibot classes
  93 | application/vnd.php.serialized
  51 | text/..
142 | wikiwix-bot-3.0
  140 | text/..
  2 | image/..
  1 | -
136 | www.rootza.com crawler mail address
  136 | application/xml
  1 | text/..
136 | DotNetWikiBot/2.9 (Microsoft Windows NT 5.1.2600 Service Pack 2; )
  131 | text/..
  5 | application/xml
122 | DotNetWikiBot/2.72 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
  122 | text/..
98 | GoogleBot-Image/1.0
  97 | text/..
  1 | image/..
  1 | -
77 | GoogleBot-News
  76 | text/..
  1 | -
68 | DotNetWikiBot/2.72 (Microsoft Windows NT 5.2.3790 Service Pack 2; )
  68 | text/..
54 | crawler mail address
  54 | text/..
49 | Mozilla/5.0 (SnapPreviewBot) Gecko/20061206 Firefox/1.5.0.9
  30 | image/..
  19 | text/..
  1 | application/json
  1 | application/x-javascript
47 | DotNetWikiBot/2.81 (Microsoft Windows NT 6.1.7600.0; )
  35 | text/..
  8 | application/xml
  4 | image/..
  1 | application/ogg
42 | zschobot/Nutch-0.9-semantic_patch (zschobot indexing; Zscho.de/de/bot.html)
  42 | text/..
  1 | image/..
  1 | application/ogg
40 | betaBot
  40 | text/..
  1 | image/..
34 | DotNetWikiBot/2.72 (Microsoft Windows NT 5.1.2600 Service Pack 2; )
  34 | text/..
33 | SoxBot IRC Bot. PHP
  28 | application/vnd.php.serialized
  5 | text/..
32 | Pywikipediabot/2.0
  32 | application/json
  1 | text/..
32 | plantspedia data crawler
  32 | text/..
30 | Test Webbot
  30 | text/..
28 | SineBot/1.5.16(User:SineBot)
  27 | application/vnd.php.serialized
  1 | text/..
  1 | -
23 | CorenSearchBot/1.5 en libwww-perl/5.808
  23 | text/..
22 | COIBot/1.00
  22 | text/..
21 | Twitterbot/0.1
  20 | text/..
  1 | image/..
  1 | -
20 | spider
  19 | text/..
  1 | application/json
  1 | image/..
18 | AnomieBOT 1.0 (OrphanReferenceFixer)
  18 | application/json
16 | HLTC-HKUST Research Bot 0.1 - E. Prochasson
  11 | application/json
  5 | text/..
14 | GoogleBot
  14 | text/..
  1 | image/..
14 | COMODOspider/Nutch-1.0
  14 | text/..
  1 | application/x-javascript
  1 | image/..
13 | Jyxobot/1
  13 | text/..
12 | Mozilla/5.0 (X11; Linux i686; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7 SnapPreviewBot
  12 | text/..
12 | Mozilla/5.0 (compatible; SnapPreviewBot; en-US; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9
  12 | text/..
12 | UCMore Crawler App
  12 | text/..
12 | SurakWare MediaWiki Bot/1.0
  12 | text/..
12 | DotNetWikiBot/2.9 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
  10 | text/..
  2 | application/xml
12 | YaDirectBot/1.0
  12 | text/..
11 | ('python-wikitools/1.2 (User:BernsteinBot)',)
  11 | application/json
11 | MSRBOT
  11 | text/..
11 | dictionary-bot
  7 | application/xml
  4 | text/..
11 | HTMLParser/1.6
  11 | text/..
11 | LinkParser/1.00
  11 | text/..
10 | Mozilla 5.0 (Apibot 0.20)
  10 | application/vnd.php.serialized
10 | ~Bot ([[:fr:w:User:TildeBot]] by [[:fr:w:User:Alphos]] mail address )
  10 | text/..
10 | Jbot
  10 | text/..
10 | SoxBot PHP
  10 | application/vnd.php.serialized
  1 | text/..
9 | Lycos_Spider_(modspider)
  8 | text/..
  1 | image/..
9 | TheKeens bot
  9 | text/..
9 | Tawbot (public svn release; plwiki)
  9 | text/..
9 | SiocWikiBot/1.0
  9 | application/vnd.php.serialized
  1 | -
  1 | text/..
9 | FAST Enterprise Crawler 6 used by admo ( mail address )
  9 | text/..
  1 | -
  1 | image/..
8 | CaBot Script (running on nightshade.toolserver.org)
  8 | application/vnd.php.serialized
  1 | text/..
7 | Bot/WP/EN/Daniel/MediationBot1/1.2
  7 | text/..
7 | HTMLParser/2.0
  7 | text/..
  1 | -
  1 | image/..
6 | XLinkBot/1.00
  6 | text/..
6 | Mozilla/5.0 (compatible; Windows NT 6.0) Gecko/20090624 Firefox/3.5 NjuiceBot
  6 | text/..
  1 | image/..
6 | IsraBot/Nutch-1.1 (MyTest; 10.10.4.231; mail address )
  6 | text/..
6 | minicrawler
  5 | text/..
  1 | image/..
5 | bitlybot
  5 | text/..
  1 | image/..
5 | Onespot Crawler
  5 | application/json
  1 | text/..
5 | Netvibes Wasabi-bot v1.0
  3 | application/xml
  1 | -
  1 | text/..
  1 | application/opensearchdescription+xml
5 | ListasBot 3
  5 | text/..
5 | Geni ircpybot 1.0
  3 | text/..
  2 | application/json
  1 | application/xml
5 | SophiaOneBot
  5 | text/..
5 | FAST Enterprise Crawler 6 used by fast ( mail address )
  5 | text/..
5 | Mozilla/5.0 (Bgbot 0.5)
  5 | text/..
5 | ('python-wikitools/1.2 (User:LaraBot)',)
  5 | application/json
4 | SuperBot/4.7.0.70 (Windows XP)
  4 | text/..
  1 | image/..
  1 | application/xml
4 | GNAA-bot
  4 | text/..
4 | Bot/WP/EN/Alex_Bakharev/AlexNewArtBot
  4 | text/..
4 | Freebase Deathbot
  4 | text/..
4 | AnomieBOT 1.0 (AFDMergeFromCleaner)
  4 | application/json
4 | ('python-wikitools/1.2 (User:Mr.Z-bot)',)
  4 | application/json
4 | AnomieBOT 1.0 (SourceUploader)
  4 | application/json
4 | DotNetWikiBot/2.9 (Unix 2.6.26.2; )
  4 | text/..
4 | FAST Search Web Crawler 14.0.0291.0000
  4 | text/..
  1 | -
4 | CheMoBot/1.00
  4 | text/..
3 | Mozilla/5.0 (compatible; Nigma.ru/3.0; mail address )
  3 | text/..
  1 | -
  1 | image/..
3 | SuperBot/4.7.0.74 (Windows 7)
  3 | text/..
  1 | image/..
3 | 'citeseerxbot'
  3 | text/..
  1 | image/..
3 | IMARB-Bot/1.0
  3 | text/..
3 | QBot
  3 | text/..
3 | AnomieBOT 1.0 (RandomPagePicker)
  3 | application/json
3 | DotNetWikiBot/2.91 (Microsoft Windows NT 6.1.7600.0; )
  3 | text/..
  1 | application/xml
3 | IssueCrawler
  3 | text/..
7,916 | total

IP ranges: known IP ranges for Google are 64.233.[160.0-191.255], 66.249.[64.0-95.255], 66.102.[0.0-15.255], 72.14.[192.0-255.255],
74.125.[0.0-255.255], 209.85.[128.0-255.255], 216.239.[32.0-63.255] and a few minor other subranges.
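
A check against these ranges might look like the sketch below, which rewrites each listed range as the equivalent CIDR block. It is an illustration under stated assumptions, not the code behind this report: the function names and the labels "verified" and "possibly spoofed" are hypothetical, and the "few minor other subranges" are not covered.

```python
import ipaddress

# The seven ranges listed above as CIDR blocks,
# e.g. 64.233.[160.0-191.255] is 64.233.160.0/19.
GOOGLE_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "64.233.160.0/19",
        "66.249.64.0/19",
        "66.102.0.0/20",
        "72.14.192.0/18",
        "74.125.0.0/16",
        "209.85.128.0/17",
        "216.239.32.0/19",
    )
]


def is_google_ip(ip: str) -> bool:
    """True if the address falls inside one of the listed Google ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in GOOGLE_NETWORKS)


def label_googlebot(user_agent: str, ip: str):
    """Hypothetical labels for a request whose user agent claims to be GoogleBot."""
    if "googlebot" not in user_agent.lower():
        return None  # not a GoogleBot claim, nothing to verify
    return "verified" if is_google_ip(ip) else "possibly spoofed"
```

For example, label_googlebot("GoogleBot/2.1 (url)", "66.249.64.1") gives "verified", while the same agent string from an address outside the listed ranges gives "possibly spoofed".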

Generated on Sat, Jul 3, 2010 18:26
Author: Erik Zachte
Mail: ezachte@### (no spam: ### = wikimedia.org)
All data and images on this page are in the public domain.