Wikimedia Visitor Log Analysis Report - Crawler requests

Daily averages, based on sample period: 1 Apr 2009 - 30 Apr 2009

 This analysis is based on a 1:1000 sampled server log (squids) ⇒ all counts x 1000.
 
The following overview of crawler (aka bot) page requests is based on the user agent information that accompanies most server requests. Unfortunately, this user agent information follows rather loosely defined guidelines.
Also please bear in mind that the most popular crawler names may be somewhat overrepresented. This is the result of so-called user agent spoofing (where a requester supplies false credentials, e.g. to bypass web server filters).
GoogleBot seems to be a favorite for spoofing. Therefore GoogleBot requests from an IP address registered by Google (see below) are color coded differently from GoogleBot requests from other addresses.

For this report, page requests are considered to be issued by a crawler in two cases:
1 The user agent string contains a web address. Only crawlers should have that, but there are some false positives, where a browser sends a user agent string with a web address (ill-behaved plug-in; the main offenders have been eliminated).
2 The user agent string contains the term bot, spider or crawl[er].
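The two cases above can be sketched in a few lines of Python. This is only an illustration, not the script that produced this report: the function name `classify_user_agent`, both regular expressions, and the choice of case-insensitive substring matching are assumptions.

```python
import re
from typing import Optional

# Assumption: a "web address" is anything containing http://, https:// or www.
WEB_ADDRESS = re.compile(r"https?://|www\.", re.IGNORECASE)
# Assumption: the keywords are matched as case-insensitive substrings.
CRAWLER_KEYWORD = re.compile(r"bot|spider|crawl", re.IGNORECASE)


def classify_user_agent(user_agent: str) -> Optional[str]:
    """Return 'url', 'keyword', or None for a user agent string."""
    if WEB_ADDRESS.search(user_agent):
        return "url"      # case 1: the agent string carries a web address
    if CRAWLER_KEYWORD.search(user_agent):
        return "keyword"  # case 2: the agent string contains bot, spider or crawl
    return None           # neither case applies: not counted as a crawler
```

Plain substring matching is simple, but it will also flag any agent string that merely happens to contain one of the keywords, so a real implementation would likely need an exception list, much like the false positives mentioned for case 1.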

In total, 37,056,000 page requests (mime type text/html only!) per day are considered crawler requests, out of 256,428,000 external requests, which is 14.5%.
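The arithmetic behind these figures can be checked as follows. The sampled counts (37,056 and 256,428) are not stated in the report; they are derived here by dividing the published totals by the 1:1000 sampling rate, and all variable names are illustrative.

```python
SAMPLE_RATE = 1000  # the squid log is sampled 1:1000

# Assumed sampled daily averages, derived from the published totals.
sampled_crawler_requests = 37_056
sampled_external_requests = 256_428

# Scale sampled counts back up to estimated real request counts.
crawler_requests = sampled_crawler_requests * SAMPLE_RATE
external_requests = sampled_external_requests * SAMPLE_RATE

# Share of external requests attributed to crawlers, in percent.
crawler_share = 100 * crawler_requests / external_requests

print(f"{crawler_requests:,} of {external_requests:,} = {crawler_share:.1f}%")
```

Because both counts are scaled by the same factor, the percentage is the same whether it is computed on sampled or scaled counts.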

Page requests for crawlers that specify a url in the agent string
Columns: Count (x 1000), Secondary domain (~site) name, URL, Mime type, User agent
16,010google
13,571 www.google.com/bot.htmltext/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
1,957 www.google.com/bot.html-Mozilla/5.0 (compatible; GoogleBot/2.1; url)
201 www.google.com/feedfetcher.htmltext/..FeedFetcher-Google; (url)
108 www.google.com/feedfetcher.html-FeedFetcher-Google; (url)
54 www.google.com/bot.htmlimage/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
32 www.google.com/feedfetcher.htmlapplication/xmlFeedFetcher-Google; (url)
29 code.google.com/appenginetext/..AppEngine-Google; (url)
11 www.google.com/feedfetcher.htmlapplication/jsonFeedFetcher-Google; (url)
7 code.google.com/appengineapplication/xmlAppEngine-Google; (url)
6 www.google.com/ietext/..Mozilla/5.0 (Windows; Windows NT 6.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7 GTB5Referer: url
6 www.google.com/bot.htmltext/..DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
6 www.google.com/feedfetcher.htmltext/..Google OpenSocial agent (url)
5 www.google.com/bot.htmltext/..SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
4 www.google.com/feedfetcher.htmlapplication/xmlGoogle OpenSocial agent (url)
3 www.google.com/bot.htmlapplication/vnd.wap.xhtml+xmlSAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
8,044yahoo
4,690 help.yahoo.com/help/us/ysearch/slurptext/..Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
2,545 help.yahoo.com/help/us/ysearch/slurptext/..Mozilla/5.0 (compatible; Yahoo! DE Slurp; url)
281 help.yahoo.com/help/us/ysearch/slurp-Mozilla/5.0 (compatible; Yahoo! DE Slurp; url)
147 help.yahoo.com/help/us/ysearch/slurp-Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
115 developer.yahoo.com/searchmonkey/useragenttext/..Mozilla/5.0 (compatible; Yahoo! SearchMonkey 1.0; url)
113 misc.yahoo.com.cn/help.htmltext/..Mozilla/5.0 (compatible; Yahoo! Slurp China; url)
69 help.yahoo.com/help/us/ysearch/slurptext/..Mozilla/5.0 (compatible; Yahoo! Slurp; url)
28 help.yahoo.com/help/us/ysearch/slurpimage/..Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
15 misc.yahoo.com.cn/help.html-Mozilla/5.0 (compatible; Yahoo! Slurp China; url)
12 help.yahoo.co.jp/help/jp/search/indexing/indexing-15.htmltext/..Y!J-BRI/0.0.1 crawler ( url )
10 help.yahoo.com/help/us/ysearch/slurpapplication/oggMozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
7 help.yahoo.com/help/us/ysearch/slurpapplication/x-javascriptMozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
4 help.yahoo.com/yahoo_adcrawlertext/..Mozilla/5.0 (compatible; Yahoo!-AdCrawler; url)
4 help.yahoo.com/help/us/ysearch/crawling/crawling-01.htmltext/..Nokia6682/2.0 (3.01.1) SymbianOS/8.0 Series60/2.6 Profile/MIDP-2.0 configuration/CLDC-1.1 UP.Link/6.3.0.0.0 (compatible;YahooSeeker/M1A1-R2D2; url)
2,481 80legs
1,885 www.80legs.com/spider.htmltext/..Mozilla/5.0 (compatible; 80bot/0.71; url;) Gecko/2008032620
306 www.80legs.com/spider.htmltext/..Mozilla/5.0 (compatible; 80bot/0.71; url;)
233 www.80legs.com/spider.htmlimage/..Mozilla/5.0 (compatible; 80bot/0.71; url;) Gecko/2008032620
30 www.80legs.com/spider.htmlimage/..Mozilla/5.0 (compatible; 80bot/0.71; url;)
21 www.80legs.com/spider.html-Mozilla/5.0 (compatible; 80bot/0.71; url;) Gecko/2008032620
3 www.80legs.com/spider.html-Mozilla/5.0 (compatible; 80bot/0.71; url;)
2,415google?
1,471 www.google.com/bot.htmltext/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
414 www.google.com/bot.htmltext/..SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
413 www.google.com/bot.htmltext/..DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
50 www.google.com/bot.htmltext/..GoogleBot/2.1 (url)
43 www.google.com/bot.htmltext/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
12 www.google.com/bot.html-Mozilla/5.0 (compatible; GoogleBot/2.1; url)
6 www.google.com/bot.htmlimage/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
1,969msn
1,545 search.msn.com/msnbot.htmtext/..msnbot/1.1 (url)
159 search.msn.com/msnbot.htmtext/..msnbot-media/1.1 (url)
68 search.msn.com/msnbot.htmtext/..msnbot/2.0b (url)
66 search.msn.com/msnbot.htmtext/..librabot/1.0 (url)
46 search.msn.com/msnbot.htmtext/..renlifangbot/1.0 (url)
44 search.msn.com/msnbot.htmimage/..msnbot-media/1.1 (url)
14 search.msn.com/msnbot.htmtext/..MSMOBOT/1.1 (url)
13 search.msn.com/msnbot.htmimage/..msnbot/1.1 (url)
6 search.msn.com/msnbot.htmapplication/oggmsnbot/1.1 (url)
1,074yanga
1,030 www.yanga.co.uk/text/..Yanga WorldSearch Bot v1.1/beta (url)
44 www.yanga.co.uk/image/..Yanga WorldSearch Bot v1.1/beta (url)
592searchme
277 www.searchme.com/support/text/..Mozilla/5.0 (compatible; Charlotte/1.0t; url)
144 www.searchme.com/support/image/..Mozilla/5.0 (compatible; Charlotte/1.0t; url)
105 www.searchme.com/support/text/..Mozilla/5.0 (compatible; Charlotte/1.1; url)
62 www.searchme.com/support/application/x-javascriptMozilla/5.0 (compatible; Charlotte/1.0t; url)
3 www.searchme.com/support/-Mozilla/5.0 (compatible; Charlotte/1.1; url)
509pipl
509 www.pipl.com/bot/text/..Mozilla/5.0(compatible;PiplBot;url)
436exabot
417 www.exabot.com/go/robottext/..Mozilla/5.0 (compatible; Exabot/3.0; url)
11 www.exabot.com/go/robottext/..Mozilla/5.0 (compatible; Exabot-Images/3.0; url)
7 www.exabot.com/go/robot-Mozilla/5.0 (compatible; Exabot/3.0; url)
304cuil
302 www.cuil.com/twiceler/robot.htmltext/..Mozilla/5.0 (Twiceler-0.9 url)
304naver
228 help.naver.com/robots/text/..Yeti/1.0 (NHN Corp.; url)
28 help.naver.com/delete_main.asptext/..Mozilla/4.0 (compatible; NaverBot/1.0; url)
21 help.naver.com/robots/image/..Yeti/1.0 (NHN Corp.; url)
16 help.naver.com/customer_webtxt_02.jsptext/..Mozilla/4.0 (compatible; NaverBot/1.0; url)
6 help.naver.com/robots/text/..Mozilla/5.0 (compatible; Yeti/1.0; NHN Corp.; url)
3 help.naver.com/robots/image/..Mozilla/5.0 (compatible; Yeti/1.0; NHN Corp.; url)
273majestic12
259 www.majestic12.co.uk/bot.php?text/..Mozilla/5.0 (compatible; MJ12bot/v1.2.4; url)
11 www.majestic12.co.uk/bot.php?text/..Mozilla/5.0 (compatible; MJ12bot/v1.2.3; url)
247ask
233 about.ask.com/en/docs/about/webmasters.shtmltext/..Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
9 about.ask.com/en/docs/about/webmasters.shtmlimage/..Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
4 about.ask.com/en/docs/about/webmasters.shtmltext/..Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url) (via Web-Blaster/2.21 (http://www.assoziations-blaster.de/web-blast.html))
219dotnetdotcom
219 www.dotnetdotcom.org/text/..Mozilla/5.0 (compatible; DotBot/1.1; url, mail address )
214baidu
112 www.baidu.com/search/spider.htmtext/..Baiduspider(url)
80 www.baidu.jp/spider/text/..Baiduspider(url)
8 www.baidu.jp/spider/-Baiduspider(url)
7 www.baidu.com/search/spider.htm-Baiduspider(url)
3 www.baidu.jp/spider/text/..DoCoMo/2.0 P05A(c100;TB;W24H15) (compatible; BaiduMobaider/1.0; url)
195wikimedia
194 tools.wikimedia.de/~daniel/text/..WikiSense (url)
168yacy
15 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.24-23-generic; java 1.6.0_07; Europe/en) url
15 yacy.net/bot.htmltext/..yacybot (x86 Windows XP 5.1; java 1.6.0_05; Europe/de) url
15 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.24-23-generic; java 1.5.0_16; Europe/en) url
15 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.29-ARCH; java 1.6.0_0; Europe/de) url
9 yacy.net/bot.htmltext/..yacybot (i386 Mac OS X 10.5.6; java 1.5.0_16; Europe/de) url
8 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.27-11-generic; java 1.6.0_0; Europe/de) url
7 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.9-023stab048.4-smp; java 1.6.0; GMT/en) url
6 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.27-11-generic; java 1.5.0_16; GMT01:00/en) url
5 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.24-21-generic; java 1.6.0_07; Europe/de) url
4 yacy.net/bot.htmltext/..yacybot (x86 Windows XP 5.1; java 1.6.0_11; Europe/de) url
4 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.27.19-3.2-pae; java 1.6.0_0; Europe/de) url
4 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.24-23-generic; java 1.6.0_07; GMT/en) url
4 yacy.net/bot.htmltext/..yacybot (x86 Windows XP 5.1; java 1.6.0_13; Europe/de) url
4 yacy.net/bot.htmltext/..yacybot (amd64 FreeBSD 7.1-STABLE; java 1.6.0_07; Europe/en) url
4 yacy.net/bot.htmltext/..yacybot (amd64 FreeBSD 7.1-PRERELEASE; java 1.6.0_07; Europe/en) url
4 yacy.net/bot.htmltext/..yacybot (x86 Windows Vista 6.0; java 1.6.0_13; Europe/de) url
3 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.24-24-generic; java 1.6.0_07; Europe/en) url
3 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.27-11-server; java 1.6.0_0; Europe/de) url
3 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.27-14-generic; java 1.6.0_0; Europe/en) url
3 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.24-23-generic; java 1.6.0_07; Europe/de) url
3 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.29-ARCH; java 1.6.0_0; Europe/en) url
167soso
158 help.soso.com/webspider.htmtext/..Sosospider(url)
9 help.soso.com/soso-image-spider.htmimage/..Sosoimagespider(url)
166wikipedia
84 en.wikipedia.org/wiki/Wikipedia:Huggletext/..Huggle/0.9.0 url
30 en.wikipedia.org/wiki/Wikipedia:Huggletext/..Huggle/0.9.0.0 url
15 en.wikipedia.orgtext/..url
11 en.wikipedia.org/wiki/Wapediaapplication/vnd.php.serializedwapedia.mobi liveupdate (url)
8 zh.wikipedia.org/w/index.php?title=204&variant=zh-cntext/..url
5 zh.wikipedia.org/w/index.php?title=P/2006_W4&variant=zh-cntext/..url
3 ko.wikipedia.orgtext/..url
3 ms.wikipedia.orgtext/..url
3 zh.wikipedia.org/w/index.php?title=Special:qianzhuisuoyin/Jeremie_Miller&variant=zh-cntext/..url
162daum
160 ws.daum.net/aboutWebSearch.htmltext/..Mozilla/5.0 (compatible; MSIE or Firefox mutant; not on Windows server; url) Daumoa/2.0
148kosmix
78 www.kosmix.com/crawler.htmltext/..voyager/2.0 (url)
44 www.kosmix.com/crawler.htmlapplication/xmlvoyager/1.0 url
26 www.kosmix.com/html/kosmos.htmltext/..Mozilla/5.0(compatible;Kosmos/1.0;url)
137youdao
95 www.youdao.com/help/webmaster/spider/text/..Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
11 www.youdao.com/help/webmaster/spider/image/..Mozilla/5.0 (compatible;YodaoBot-Image/1.0;url;)
10 www.youdao.com/help/webmaster/spider/text/..Mozilla/5.0 (compatible; YodaoBot/1.0; url; )
8 www.youdao.com/help/webmaster/spider/-Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
6 www.youdao.com/help/webmaster/spider/text/..Mozilla/5.0 (compatible;YodaoBot-Image/1.0;url;)
131php
53 pear.php.net/text/..PEAR HTTP_Request class ( url )
32 pear.php.net/package/http_request2text/..HTTP_Request2/0.3.0 (url) PHP/5.2.8
30 pear.php.net/application/vnd.php.serializedPEAR HTTP_Request class ( url )
13 pear.php.net/application/jsonPEAR HTTP_Request class ( url )
131sogou
124 www.sogou.com/docs/help/webmasters.htm#07text/..Sogou web spider/4.0(url)
3 www.sogou.com/docs/help/webmasters.htm#07text/..Sogou web spider/4.0(url) (via Web-Blaster/2.21 (http://www.a-blast.org/web-blast.html))
129ellerdale
129 www.ellerdale.com/crawler.htmltext/..Mozilla/5.0 (compatible; EllerdaleBot/ 1.0; url)
100sblog
47 fulltext.sblog.cz/screenshot/image/..Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
28 fulltext.sblog.cz/screenshot/text/..Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
21 fulltext.sblog.cz/robot/text/..SeznamBot/2.0 (url)
3 fulltext.sblog.cz/screenshot/application/x-javascriptMozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
98googlepages
98 peterpuwang.googlepages.comtext/..Peter Wang/Nutch-1.0-dev (Nutch spiderman; url ; MyEmail)
95cydral
64 www.cydral.comtext/..CydralSpider/3.0 (Cydral Image Search; url)
30 www.cydral.comimage/..CydralSpider/3.0 (Cydral Image Search; url)
90dealgates
70 spider.dealgates.com/bot.htmltext/..DealGates Bot/1.1 by Luc Michalski (url)
20 www.dealgates.net/bot.htmltext/..DealGates Bot/1.2 by Luc Michalski (url)
86scoutjet
86 www.scoutjet.com/text/..Mozilla/5.0 (compatible; ScoutJet; url)
68estsoft
68 www.estsoft.com/text/..Mozilla/5.0 (compatible; Estbot/1.0; url)
67freebase
67 www.freebase.comtext/..metaweb/Nutch-1.0-dev (url; help_at_metaweb.com)
54tineye
42 tineye.com/crawler.htmltext/..Mozilla/5.0 (compatible; heritrix/1.12.1 url)
11 tineye.com/crawler.htmlimage/..Mozilla/5.0 (compatible; heritrix/1.12.1 url)
54loc
50 www.loc.gov/minerva/crawl.htmltext/..Mozilla/5.0 (compatible; archive.org_bot/1.5.0 url)
54flatlandindustries
46 www.flatlandindustries.com/flatlandbottext/..flatlandbot/battle-angel (Flatland Industries Web Spider; url; mail address )
6 www.flatlandindustries.com/flatlandbottext/..flatlandbot/baypup (Flatland Industries Web Spider; url; mail address )
47archive-it
32 www.archive-it.orgimage/..Mozilla/5.0 (compatible;archive.org_bot/1.7.0; Archive-It; url)
15 www.archive-it.orgtext/..Mozilla/5.0 (compatible;archive.org_bot/1.7.0; Archive-It; url)
41facebook
27 www.facebook.com/externalhit_uatext.phpimage/..facebookexternalhit/1.0 (url)
8 www.facebook.com/externalhit_uatext.phptext/..facebookexternalhit/1.0 (url)
5 developers.facebook.comtext/..facebookplatform/1.0 (url)
40matuschek
40 www.matuschek.net/jobo.htmltext/..JoBo/1.4 (url)
39moose
39 www.moose.at/text/..Mozilla/5.0 (compatible; MooseBot/1.1; Linux i686; de; url)
32globalspec
17 www.globalspec.com/Ocellitext/..Ocelli/1.3 (url)
15 www.globalspec.com/Ocellitext/..Ocelli/1.4 (url)
30qdos
30 qdos.com/text/..qdos/1.1 (url)
28entireweb
28 www.entireweb.com/about/search_tech/speedy_spider/text/..Speedy Spider (url)
27emusic
18 www.emusic.com/application/jsonMozilla/5.0 (Windows; Windows NT 5.1; en-US; rv:1.8.1.8pre) Gecko/20071017 eMusic DLM/4.0/1.0.0.2 (url) (like Firefox/2.0.0.8)
6 www.emusic.com/application/jsonMozilla/5.0 (Windows; Windows NT 6.0; en-US; rv:1.8.1.8pre) Gecko/20071017 eMusic DLM/4.0/1.0.0.2 (url) (like Firefox/2.0.0.8)
27alexa
27 www.alexa.com/site/help/webmasterstext/..ia_archiver (url; mail address )
26
22 image/..GoogleBot/1.0 (mail address GoogleBot.com urlGoogleBot.com/)
4 text/..GoogleBot/1.0 (mail address GoogleBot.com urlGoogleBot.com/)
26rcdtokyo
18 www.rcdtokyo.com/pc2m/text/..Mozilla/5.0 (compatible; PEAR HTTP_Request class; url)
8 www.rcdtokyo.com/pc2m/image/..Mozilla/5.0 (compatible; PEAR HTTP_Request class; url)
25paxle
15 www.paxle.net/en/bottext/..Mozilla/5.0 (compatible; PaxleFramework/0.1.21.SNAPSHOT; url)
10 www.paxle.net/en/bottext/..Mozilla/5.0 (compatible; PaxleFramework/0.1.1; url)
24Anonymouse
13 Anonymouse.org/image/..url (Unix)
9 Anonymouse.org/text/..url (Unix)
24froute
18 labs.froute.jp/pc2m/help.htmltext/..Froute Mobile Gateway/1.0 (url)
6 labs.froute.jp/pc2m/help.htmlimage/..Froute Mobile Gateway/1.0 (url)
23jumptap
21 www.jumptap.com/jumpbottext/..Nokia6820/2.0 (5.88) Profile/MIDP-1.0 Configuration/CLDC-1.0/1.0 (Jumpbot; url; mail address )
22mixi
11 mixi.jp/text/..mixi-mobile-converter/1.0 (url)
11 mixi.jp/image/..mixi-mobile-converter/1.0 (url)
21snap
21 www.snap.comtext/..Snapbot/1.0 (Snap Shots, url)
21edu
17 iws.seu.edu.cn/services/falcons/contactus.jsptext/..Mozilla/5.0 (compatible; Falconsbot; url)
4 iws.seu.edu.cn/services/falcons/contactus.jspimage/..Mozilla/5.0 (compatible; Falconsbot; url)
21kalooga
11 www.kalooga.com/info.html?page=crawlertext/..kalooga/KaloogaBot (Kalooga; url)
10 www.kalooga.com/info.html?page=crawlerimage/..kalooga/KaloogaBot (Kalooga; url)
19spinn3r
19 spinn3r.com/robottext/..Mozilla/5.0 (X11; Linux i686; en-US; rv:1.2.1; aggregator:Spinn3r (Spinn3r 3.0); url) Gecko/20021130
18virtual-presence
16 lms.virtual-presence.orgtext/..Firebat 2.9.1 (url)
18dium
11 me.dium.comtext/..Mozilla/5.0 (Windows; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0 Me.dium/1.0 (url)
7 me.dium.comimage/..Mozilla/5.0 (Windows; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0 Me.dium/1.0 (url)
17hatena
12 a.hatena.ne.jp/helptext/..Hatena Antenna/0.5 (url)
5 mgw.hatena.ne.jp/helptext/..DoCoMo/2.0 D903i(c100;TB;W28H20) (compatible; Hatena-Mobile-Gateway/1.2; url)
17goroam
17 goroam.net/text/..goroam/1.0-SNAPSHOT (goraom geo crawler; url; mail address )
16similarpages
16 www.similarpages.comtext/..SimilarPages/Nutch-1.0-dev (SimilarPages Nutch Crawler; url; mail address )
16shisoft
16 ench.shisoft.net/text/..Shisoft KowSeeker Beta 3: url
14Zscho
14 Zscho.de/text/..Zscho.de/Nutch-0.9-semantic_patch (url Suchmaschine, mit semantischen Erweiterungen, dient dem Aufbau von Wissensbasen zur Fragebeantwortung; mail address )
14feedparser
14 feedparser.org/application/xmlUniversalFeedParser/4.1 url
13newsgator
5 www.newsgator.comtext/..NewsGator/2.0 Bot (url)
4 www.newsgator.comtext/..NewsGatorOnline/2.0 (url) bot
12emining
12 emining.jp/text/..emBot-GalaBuzz/Nutch-1.0 (url; mail address )
12phonifier
12 www.phonifier.comtext/..Mozilla/5.0 (compatible; Phonifier; url)
11discoveryengine
11 discoveryengine.com/discobot.htmltext/..Mozilla/5.0 (compatible; discobot/1.0; url)
11factolex
7 www.factolex.com/image/..Factolex (url)
4 www.factolex.com/application/vnd.php.serializedFactolex (url)
11weblio
10 www.weblio.jp/text/..Mozilla/5.0 (compatible; WeblioBot; url)
10abonti
10 www.abonti.comtext/..Mozilla/5.0 (compatible; Abonti/0.8 - url)
10mashget
10 www.mashget.comtext/..Mashgetbot/2.1 (url)
10z-add
10 w3.z-add.co.uk/linkcheck/text/..Z-Add Link Checker (url)
10opengroove
10 search.opengroove.com/robots.htmltext/..Mozilla/5.0 (compatible; zds-crawler/0.1; url)
38,203total

Page requests for probable crawlers, recognized by keyword
Columns: Count (x 1000), Agent string, Mime type (count ≥ 3)
2,050PythonWikipediaBot/1.0
1,559 text/..
410 application/xml
81 application/json
1 -
1 image/..
787Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (Exabot-Thumbnails)
492 text/..
182 image/..
113 application/x-javascript
1 -
708GoogleBot-Image/1.0
303 text/..
257 image/..
148 -
419Mozilla/5.0 (Windows; Windows NT 5.1; fr; rv:1.8.1) VoilaBot BETA 1.2 ( mail address )
419 text/..
1 -
1 application/ogg
227Mozilla/5.0 (SnapPreviewBot) Gecko/20061206 Firefox/1.5.0.9
111 image/..
94 text/..
22 application/x-javascript
1 -
1 application/json
1 application/ogg
210Wikirage.Com Statistics Bot
210 text/..
198php wikibot classes
198 application/vnd.php.serialized
1 -
177AISearchBot (Email: mail address ; If your web site doesn't want to be crawled, please send us a email.)
177 text/..
1 -
159wikiwix-bot-3.0
158 text/..
1 image/..
1 -
150Answersbot
150 text/..
131GinioSpider
131 text/..
121UniFind Site Spider; email mail address
121 text/..
1 -
88Tawbot (public svn release; plwiki)
88 text/..
66COIBot/1.00
66 text/..
55MSR-ISRCCrawler
43 text/..
11 application/x-javascript
1 image/..
52PRCrawler/Nutch-0.9 (data mining development project; mail address )
52 text/..
1 application/vnd.wap.xhtml+xml
51Pywikipediabot/2.0
51 application/json
1 text/..
50atigeobot
50 text/..
45GoogleBot-Image/1.0
44 text/..
1 image/..
1 -
43SineBot/1.5.13(User:SineBot)
42 application/vnd.php.serialized
1 text/..
41Test Webbot
41 text/..
35Spider/5.0
35 text/..
1 -
1 image/..
1 application/ogg
34GoogleBot
34 text/..
1 application/x-javascript
1 image/..
33Jyxobot/1
33 text/..
28AnomieBOT 1.0 (OrphanReferenceFixer)
28 application/json
27WebCrawler
27 text/..
27rdfbot/1.0 (Indian Language Web Search Engine; Rediff.com; rdfbot mail address )
27 text/..
25crawler mail address
25 text/..
23Bot/WP/EN/E/EBot
23 text/..
22msnbot/2.0b
21 text/..
1 image/..
1 -
1 application/ogg
21CorenSearchBot/1.0 libwww-perl/5.808
21 text/..
21Plagiat Web Spider WSIiZ wsiz.rzeszow.pl
20 text/..
1 image/..
1 application/pdf
1 application/ogg
18dictionary-bot
14 application/xml
4 text/..
18plantspedia data crawler
18 text/..
16Szukacz/1.5 (robot; www.szukacz.pl/jakdzialarobot.html; mail address )
16 text/..
1 -
1 image/..
15IlseBot/1.1
14 text/..
1 -
15FAST Enterprise Crawler 6 used by a (a)
15 text/..
1 -
15nutch.us/Nutch-1.0 (www.nutch.us; mail address )
15 text/..
1 application/ogg
12dicbot 1.0
12 text/..
11SurakWare MediaWiki Bot/1.0
11 text/..
1 application/xml
10FAST Enterprise Crawler 6 used by Logica ( mail address )
10 text/..
1 -
1 application/opensearchdescription+xml
1 application/xml
10Bot/WP/EN/Daniel/MediationBot1/1.2
10 text/..
10Mozilla/4.0 (compatible; focuseekbot)
10 text/..
1 image/..
9FAST Enterprise Crawler 6 used by Microsoft ( mail address )
9 text/..
1 -
9FAST Enterprise Crawler 6 used by Lenovo ( mail address )
9 text/..
1 -
9Mozilla/5.0 (Apibot 0.01)
9 application/vnd.php.serialized
9Mozilla/5.0 (Bgbot 0.5)
9 text/..
1 application/xml
8lmspider/Nutch-0.9-dev (For research purposes.; www.nuance.com; mail address )
8 text/..
1 image/..
1 application/ogg
8Bot/WP/EN/Alex_Bakharev/AlexNewArtBot
8 text/..
8YaDirectBot/1.0
8 text/..
7MLBot (www.metadatalabs.com/mlbot)
7 text/..
7XLinkBot/1.00
7 text/..
7Pybot 1.0 mail address
6 text/..
1 application/xml
7DotNetWikiBot/2.61 (Microsoft Windows NT 5.2.3790 Service Pack 2; )
7 text/..
1 application/xml
7AnomieBOT 1.0 (WikiProjectTagger)
7 application/json
6websitethumbnail.de snapshot spider
6 text/..
6FAST Enterprise Crawler 6 used by fast ( mail address )
6 text/..
1 -
5Jabse.com Crawler v.2.0 www.jabse.com/crawler.php
5 text/..
5gsa-crawler (Enterprise; S5-JJMWS6DTD8JJT; mail address )
5 text/..
5topyx-crawler
5 text/..
1 -
5FAST Enterprise Crawler 6 used by Merck ( mail address )
5 text/..
4Blogspider
4 text/..
1 application/opensearchdescription+xml
4Agent.Kotbot
4 text/..
4Mozilla/5.0 (Windows; Windows NT 5.1; en-GB; rv:1.9.0.6) Gecko/2009011913 Firefox/3.0.6 TweetmemeBot
4 text/..
1 image/..
4QuickFinder Crawler
4 text/..
4RacaiCrawler/RacaiCrawler-0.2
4 text/..
4Draicone's bot
4 text/..
4DotNetWikiBot/2.3 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
4 text/..
1 application/xml
4Freebase Deathbot
4 text/..
4FAST Enterprise Crawler 6 used by Admo ( mail address )
4 text/..
1 -
4spider
4 text/..
1 application/xml
4gigabot
2 image/..
2 text/..
4Bot/WP/EN/Quadell/polbot
4 text/..
4Xaldon WebSpider 2.7.b6
4 text/..
1 image/..
3 4am-spider/1.0
3 text/..
3AnomieBOT 1.0 (WikiProjectWorker)
3 application/json
3israbot
3 text/..
3rdfbot/1.0 (rdfbot mail address )
3 text/..
1 -
3GNAA-bot
3 text/..
3Szukacz/1.5 (robot; www.szukacz.pl/html/jak_dziala_robot.html; mail address )
3 text/..
3AnomieBOT 1.0 (SourceUploader)
3 application/json
3Mozilla/5.0 (Yahoo-MMCrawler/4.0; mail address )
3 image/..
1 text/..
3WikiNEWSticsBOT (by user Melancholie)
3 text/..
3unblockbot/1.00
3 text/..
6,485total

IP ranges: known IP ranges for Google are 64.233.[160.0-191.255], 66.249.[64.0-95.255], 66.102.[0.0-15.255], 72.14.[192.0-255.255],
74.125.[0.0-255.255], 209.85.[128.0-255.255], 216.239.[32.0-63.255] and a few other minor subranges.
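
As an illustration of how a request claiming to be GoogleBot could be checked against these ranges, here is a minimal Python sketch using the standard `ipaddress` module. The CIDR blocks are a translation of the bracketed ranges listed above, the "few minor other subranges" are not covered, and the name `is_google_ip` is hypothetical; this is not the code used to produce the report.

```python
import ipaddress

# CIDR equivalents of the ranges listed above (a translation, not from the report).
GOOGLE_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "64.233.160.0/19",  # 64.233.[160.0-191.255]
        "66.249.64.0/19",   # 66.249.[64.0-95.255]
        "66.102.0.0/20",    # 66.102.[0.0-15.255]
        "72.14.192.0/18",   # 72.14.[192.0-255.255]
        "74.125.0.0/16",    # 74.125.[0.0-255.255]
        "209.85.128.0/17",  # 209.85.[128.0-255.255]
        "216.239.32.0/19",  # 216.239.[32.0-63.255]
    )
]


def is_google_ip(address: str) -> bool:
    """Return True if the address falls inside one of the listed ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in GOOGLE_RANGES)
```

A request whose user agent says GoogleBot but whose address fails this check would be a candidate for the spoofed group.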

Generated on Friday August 21, 2009
Author: Erik Zachte
Mail: ezachte@### (no spam: ### = wikimedia.org)
All data and images on this page are in the public domain.