Wikimedia Visitor Log Analysis Report - Crawler requests

Daily averages, based on sample period: 1 May 2009 - 31 May 2009

 This analysis is based on a 1:1000 sampled server log (squids) ⇒ all counts x 1000.
 
The following overview of crawler (aka bot) page requests is based on the user agent information that accompanies most server requests. Unfortunately, this user agent information follows only loosely defined guidelines.
Also bear in mind that the most popular crawler names may be somewhat overrepresented. This is the result of so-called user agent spoofing, where a requester supplies false credentials, e.g. to bypass web server filters.
GoogleBot seems to be a favorite for spoofing. Therefore GoogleBot requests from an IP address registered by Google (see below) are grouped under google, and GoogleBot requests from other addresses under google?.

For this report, page requests are considered to be issued by a crawler in two cases:
1 The user agent string contains a web address. Only crawlers should have that, but there are some false positives where a browser sends a user agent string with a web address (ill-behaved plug-ins; the main offenders have been eliminated).
2 The user agent string contains the term bot, spider or crawl[er].
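The two criteria above can be sketched as a simple classifier. This is a minimal Python sketch, not the report's own code. It assumes that a "web address" means an http:// or https:// prefix (consistent with agents such as MLBot (www.metadatalabs.com/mlbot) being listed in the keyword table rather than the URL table), that the URL test takes precedence over the keyword test (GoogleBot agents with a URL only appear in the URL table), and it does not model the eliminated false-positive plug-ins. The name classify_agent is hypothetical.

```python
import re
from typing import Optional

# Assumption: a "web address" is an http:// or https:// URL; a bare "www." host
# does not count (MLBot (www.metadatalabs.com/mlbot) is in the keyword table).
WEB_ADDRESS = re.compile(r"https?://", re.IGNORECASE)
# Case 2: the terms bot, spider or crawl[er]; "crawl" also matches "crawler".
KEYWORD = re.compile(r"bot|spider|crawl", re.IGNORECASE)


def classify_agent(user_agent: str) -> Optional[str]:
    """Return 'url', 'keyword' or None for one user agent string (hypothetical helper)."""
    if WEB_ADDRESS.search(user_agent):
        return "url"        # case 1: would be counted in the URL table
    if KEYWORD.search(user_agent):
        return "keyword"    # case 2: would be counted in the keyword table
    return None             # not treated as a crawler request
```

Checking the URL first means an agent such as "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" is classified as "url" even though it also contains "bot".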

In total, 34,834,000 page requests per day (mime type text/html only!) are considered crawler requests, out of 256,087,000 external requests, i.e. 13.6%.
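The scaling from the 1:1000 sample and the resulting share work out as follows. A minimal Python sketch, assuming the daily averages correspond to 34,834 and 256,087 sampled log lines (the report only states the scaled figures); the function names are hypothetical.

```python
SAMPLE_RATE = 1000  # 1:1000 sampled log: each sampled line stands for ~1000 requests


def scale(sampled: int) -> int:
    """Convert a count of sampled log lines into an estimated request count."""
    return sampled * SAMPLE_RATE


def share_percent(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to one decimal."""
    return round(100 * part / whole, 1)


crawler = scale(34_834)     # daily crawler page requests (text/html only)
external = scale(256_087)   # daily external requests
print(f"{crawler:,} of {external:,} requests = {share_percent(crawler, external)}%")
# prints: 34,834,000 of 256,087,000 requests = 13.6%
```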

Page requests for crawlers that specify a URL in the agent string
Count (x 1000)  Secondary domain (~site name)
  Count (x 1000) | URL | Mime type | User agent
14,626 google
  12,126 | www.google.com/bot.html | text/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  1,970 | www.google.com/bot.html | - | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  157 | www.google.com/feedfetcher.html | text/.. | FeedFetcher-Google; (url)
  115 | www.google.com/feedfetcher.html | - | FeedFetcher-Google; (url)
  62 | code.google.com/appengine | text/.. | AppEngine-Google; (url)
  62 | www.google.com/bot.html | text/.. | SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
  44 | www.google.com/bot.html | image/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  42 | www.google.com/feedfetcher.html | application/xml | FeedFetcher-Google; (url)
  14 | code.google.com/appengine | application/xml | AppEngine-Google; (url)
  9 | www.google.com/feedfetcher.html | application/json | FeedFetcher-Google; (url)
  8 | www.google.com/feedfetcher.html | text/.. | Google OpenSocial agent (url)
  4 | www.google.com/feedfetcher.html | application/xml | Google OpenSocial agent (url)
  3 | www.google.com/feedfetcher.html | application/json | Google OpenSocial agent (url)
  3 | www.google.com/bot.html | application/xml | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  3 | code.google.com/appengine | image/.. | AppEngine-Google; (url)
8,336 yahoo
  4,942 | help.yahoo.com/help/us/ysearch/slurp | text/.. | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  2,446 | help.yahoo.com/help/us/ysearch/slurp | text/.. | Mozilla/5.0 (compatible; Yahoo! DE Slurp; url)
  415 | help.yahoo.com/help/us/ysearch/slurp | - | Mozilla/5.0 (compatible; Yahoo! DE Slurp; url)
  125 | help.yahoo.com/help/us/ysearch/slurp | - | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  117 | developer.yahoo.com/searchmonkey/useragent | text/.. | Mozilla/5.0 (compatible; Yahoo! SearchMonkey 1.0; url)
  108 | misc.yahoo.com.cn/help.html | text/.. | Mozilla/5.0 (compatible; Yahoo! Slurp China; url)
  88 | help.yahoo.com/help/us/ysearch/slurp | text/.. | Mozilla/5.0 (compatible; Yahoo! Slurp; url)
  35 | help.yahoo.com/help/us/ysearch/slurp | image/.. | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  16 | misc.yahoo.com.cn/help.html | - | Mozilla/5.0 (compatible; Yahoo! Slurp China; url)
  12 | help.yahoo.co.jp/help/jp/search/indexing/indexing-15.html | text/.. | Y!J-BRI/0.0.1 crawler ( url )
  11 | help.yahoo.com/help/us/ysearch/slurp | application/x-javascript | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  10 | help.yahoo.com/help/us/ysearch/slurp | application/ogg | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  7 | help.yahoo.com/help/us/ysearch/crawling/crawling-01.html | text/.. | Nokia6682/2.0 (3.01.1) SymbianOS/8.0 Series60/2.6 Profile/MIDP-2.0 configuration/CLDC-1.1 UP.Link/6.3.0.0.0 (compatible;YahooSeeker/M1A1-R2D2; url)
2,274 msn
  1,526 | search.msn.com/msnbot.htm | text/.. | msnbot/1.1 (url)
  237 | search.msn.com/msnbot.htm | text/.. | msnbot/2.0b (url)
  158 | search.msn.com/msnbot.htm | text/.. | msnbot-media/1.1 (url)
  114 | search.msn.com/msnbot.htm | text/.. | renlifangbot/1.0 (url)
  72 | search.msn.com/msnbot.htm | - | msnbot/1.1 (url)
  51 | search.msn.com/msnbot.htm | - | msnbot/2.0b (url)
  48 | search.msn.com/msnbot.htm | image/.. | msnbot-media/1.1 (url)
  23 | search.msn.com/msnbot.htm | image/.. | msnbot/1.1 (url)
  18 | search.msn.com/msnbot.htm | text/.. | librabot/1.0 (url)
  10 | search.msn.com/msnbot.htm | text/.. | MSMOBOT/1.1 (url)
  8 | search.msn.com/msnbot.htm | application/ogg | msnbot/1.1 (url)
1,818 google?
  1,505 | www.google.com/bot.html | text/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  98 | www.google.com/bot.html | text/.. | DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
  83 | www.google.com/bot.html | text/.. | SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
  66 | www.google.com/bot.html | text/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  44 | www.google.com/bot.html | text/.. | GoogleBot/2.1 (url)
  10 | www.google.com/bot.html | - | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  5 | www.google.com/bot.html | image/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  3 | www.google.com/bot.html | text/.. | Mozilla/5.0(compatible;GoogleBot/2.1;url)
1,056 yanga
  1,041 | www.yanga.co.uk/ | text/.. | Yanga WorldSearch Bot v1.1/beta (url)
  15 | www.yanga.co.uk/ | image/.. | Yanga WorldSearch Bot v1.1/beta (url)
558 pipl
  558 | www.pipl.com/bot/ | text/.. | Mozilla/5.0(compatible;PiplBot;url)
542 exabot
  468 | www.exabot.com/go/robot | text/.. | Mozilla/5.0 (compatible; Exabot/3.0; url)
  59 | www.exabot.com/go/robot | text/.. | Mozilla/5.0 (compatible; Exabot-Images/3.0; url)
  8 | www.exabot.com/go/robot | - | Mozilla/5.0 (compatible; Exabot/3.0; url)
  7 | www.exabot.com/go/robot | image/.. | Mozilla/5.0 (compatible; Exabot-Images/3.0; url)
482 naver
  412 | help.naver.com/robots/ | text/.. | Yeti/1.0 (NHN Corp.; url)
  23 | help.naver.com/robots/ | image/.. | Yeti/1.0 (NHN Corp.; url)
  22 | help.naver.com/delete_main.asp | text/.. | Mozilla/4.0 (compatible; NaverBot/1.0; url)
  13 | help.naver.com/customer_webtxt_02.jsp | text/.. | Mozilla/4.0 (compatible; NaverBot/1.0; url)
  6 | help.naver.com/robots/ | text/.. | Mozilla/5.0 (compatible; Yeti/1.0; NHN Corp.; url)
  4 | help.naver.com/robots/ | image/.. | Mozilla/5.0 (compatible; Yeti/1.0; NHN Corp.; url)
387 searchme
  163 | www.searchme.com/support/ | text/.. | Mozilla/5.0 (compatible; Charlotte/1.0t; url)
  106 | www.searchme.com/support/ | text/.. | Mozilla/5.0 (compatible; Charlotte/1.1; url)
  82 | www.searchme.com/support/ | image/.. | Mozilla/5.0 (compatible; Charlotte/1.0t; url)
  34 | www.searchme.com/support/ | application/x-javascript | Mozilla/5.0 (compatible; Charlotte/1.0t; url)
339 cuil
  339 | www.cuil.com/twiceler/robot.html | text/.. | Mozilla/5.0 (Twiceler-0.9 url)
335 soso
  325 | help.soso.com/webspider.htm | text/.. | Sosospider(url)
  9 | help.soso.com/soso-image-spider.htm | image/.. | Sosoimagespider(url)
326 loc
  321 | www.loc.gov/minerva/crawl.html | text/.. | Mozilla/5.0 (compatible; archive.org_bot/1.5.0 url)
  5 | www.loc.gov/minerva/crawl.html | image/.. | Mozilla/5.0 (compatible; archive.org_bot/1.5.0 url)
300 baidu
  169 | www.baidu.com/search/spider.htm | text/.. | Baiduspider(url)
  81 | www.baidu.jp/spider/ | text/.. | Baiduspider(url)
  32 | www.baidu.jp/spider/ | text/.. | DoCoMo/2.0 P05A(c100;TB;W24H15) (compatible; BaiduMobaider/1.0; url)
  9 | www.baidu.jp/spider/ | - | Baiduspider(url)
  8 | www.baidu.com/search/spider.htm | - | Baiduspider(url)
271 majestic12
  262 | www.majestic12.co.uk/bot.php? | text/.. | Mozilla/5.0 (compatible; MJ12bot/v1.2.4; url)
  6 | www.majestic12.co.uk/bot.php? | text/.. | Mozilla/5.0 (compatible; MJ12bot/v1.2.3; url)
240 yacy
  19 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.27-11-server; java 1.6.0_0; Europe/en) url
  18 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.26-2-amd64; java 1.6.0_12; Europe/de) url
  18 | yacy.net/bot.html | text/.. | yacybot (x86 Windows XP 5.1; java 1.6.0_05; Europe/de) url
  16 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.24-24-generic; java 1.6.0_07; Europe/en) url
  13 | yacy.net/bot.html | text/.. | yacybot (x86 Windows XP 5.1; java 1.6.0_13; Europe/de) url
  13 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.28-11-generic; java 1.6.0_13; Europe/de) url
  10 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.28-11-generic; java 1.6.0_0; Europe/de) url
  10 | yacy.net/bot.html | text/.. | yacybot (x86 SunOS 5.11; java 1.6.0_13; Europe/en) url
  9 | yacy.net/bot.html | text/.. | yacybot (amd64 Windows Vista 6.1; java 1.6.0_13; Europe/de) url
  9 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.24-23-generic; java 1.6.0_07; Europe/de) url
  8 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.24-23-generic; java 1.6.0_07; Europe/en) url
  8 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.24-16-server; java 1.6.0_07; Zulu/en) url
  5 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.28-11-server; java 1.6.0_13; Europe/de) url
  4 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.9-023stab048.6-enterprise; java 1.6.0_07; Europe/en) url
  4 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.24-24-generic; java 1.6.0_07; GMT/en) url
  4 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.28-gentoo-r5; java 1.5.0_18; Europe/en) url
  4 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.26-2-amd64; java 1.6.0_12; Europe/en) url
  4 | yacy.net/bot.html | text/.. | yacybot (amd64 FreeBSD 7.1-STABLE; java 1.6.0_07; Europe/en) url
  4 | yacy.net/bot.html | text/.. | yacybot (amd64 FreeBSD 7.1-PRERELEASE; java 1.6.0_07; Europe/en) url
  4 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.29-ARCH; java 1.6.0_0; Europe/en) url
  4 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.28-11-generic; java 1.6.0_13; Europe/en) url
  4 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.28-12-generic; java 1.6.0_13; Europe/en) url
  3 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.28-11-generic; java 1.6.0_0; Europe/en) url
  3 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.22-14-server; java 1.6.0_06; Europe/en) url
  3 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.24-etchnhalf.1-686; java 1.5.0_14; GMT01:00/de) url
  3 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.27-11-generic; java 1.6.0_0; Europe/de) url
239 ask
  227 | about.ask.com/en/docs/about/webmasters.shtml | text/.. | Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
  6 | about.ask.com/en/docs/about/webmasters.shtml | image/.. | Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
  6 | about.ask.com/en/docs/about/webmasters.shtml | text/.. | Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url) (via Web-Blaster/2.21 (http://www.assoziations-blaster.de/web-blast.html))
230 wikimedia
  228 | tools.wikimedia.de/~daniel/ | text/.. | WikiSense (url)
175 paxle
  141 | www.paxle.net/en/bot | text/.. | Mozilla/5.0 (compatible; PaxleFramework/0.1.1; url)
  28 | www.paxle.net/en/bot | text/.. | Mozilla/5.0 (compatible; PaxleFramework/0.1.0; url)
  6 | www.paxle.net/en/bot | text/.. | Mozilla/5.0 (compatible; PaxleFramework/0.1.21.SNAPSHOT; url)
165 sblog
  108 | fulltext.sblog.cz/screenshot/ | image/.. | Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
  26 | fulltext.sblog.cz/screenshot/ | text/.. | Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
  23 | fulltext.sblog.cz/robot/ | text/.. | SeznamBot/2.0 (url)
  4 | fulltext.sblog.cz/screenshot/ | application/x-javascript | Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
  3 | fulltext.sblog.cz/robot/ | - | SeznamBot/2.0 (url)
161 youdao
  77 | www.youdao.com/help/webmaster/spider/ | text/.. | Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
  38 | www.youdao.com/help/webmaster/spider/ | text/.. | Mozilla/5.0 (compatible; YodaoBot/1.0; url; )
  19 | www.youdao.com/help/webmaster/spider/ | image/.. | Mozilla/5.0 (compatible;YodaoBot-Image/1.0;url;)
  7 | www.youdao.com/help/webmaster/spider/ | text/.. | Mozilla/5.0 (compatible;YodaoBot-Image/1.0;url;)
  5 | www.youdao.com/help/webmaster/spider/ | - | Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
  3 | www.youdao.com/help/webmaster/spider/ | audio/midi | Mozilla/5.0 (compatible; YodaoBot/1.0; url; )
  3 | www.youdao.com/help/webmaster/spider/ | text/.. | WAP_Browser/5.0 (compatible; YodaoBot/1.0; url; )
  3 | www.youdao.com/help/webmaster/spider/ | - | Mozilla/5.0 (compatible;YodaoBot-Image/1.0;url;)
159 wikipedia
  75 | en.wikipedia.org/wiki/Wikipedia:Huggle | text/.. | Huggle/0.9.0 url
  32 | en.wikipedia.org/wiki/Wikipedia:Huggle | text/.. | Huggle/0.9.0.0 url
  16 | en.wikipedia.org | text/.. | url
  10 | en.wikipedia.org/wiki/Wapedia | application/vnd.php.serialized | wapedia.mobi liveupdate (url)
  9 | zh.wikipedia.org/w/index.php?title=阿古姆二世&variant=zh-cn | text/.. | url
  8 | zh.wikipedia.org/w/index.php?title=jiegouzhuyijianzhu&variant=zh-cn | text/.. | url
  3 | ko.wikipedia.org | text/.. | url
156 php
  84 | pear.php.net/ | text/.. | PEAR HTTP_Request class ( url )
  31 | pear.php.net/ | application/vnd.php.serialized | PEAR HTTP_Request class ( url )
  18 | pear.php.net/package/http_request2 | text/.. | HTTP_Request2/0.4.0 (url) PHP/5.2.8
  10 | pear.php.net/ | application/json | PEAR HTTP_Request class ( url )
  8 | pear.php.net/package/http_request2 | text/.. | HTTP_Request2/0.3.0 (url) PHP/5.2.8
  4 | pear.php.net/ | application/xml | PEAR HTTP_Request class ( url )
155 (none)
  97 | | image/.. | GoogleBot/1.0 (mail address GoogleBot.com urlGoogleBot.com/)
  58 | | text/.. | GoogleBot/1.0 (mail address GoogleBot.com urlGoogleBot.com/)
151 meta-spinner
  151 | www.meta-spinner.de/ | text/.. | Metaspinner/0.01 (Metaspinner; url; mail address )
135 daum
  133 | ws.daum.net/aboutWebSearch.html | text/.. | Mozilla/5.0 (compatible; MSIE or Firefox mutant; not on Windows server; url) Daumoa/2.0
125 dotnetdotcom
  125 | www.dotnetdotcom.org/ | text/.. | Mozilla/5.0 (compatible; DotBot/1.1; url, mail address )
123 80legs
  121 | www.80legs.com/spider.html | text/.. | Mozilla/5.0 (compatible; 80bot/0.71; url;) Gecko/2008032620
121 setooz
  121 | www.setooz.com/oozbot.html | text/.. | OOZBOT/0.20 ( -- ; url ; mail address )
94 scoutjet
  94 | www.scoutjet.com/ | text/.. | Mozilla/5.0 (compatible; ScoutJet; url)
88 freebase
  88 | www.freebase.com | text/.. | metaweb/Nutch-1.0-dev (url; help_at_metaweb.com)
88 yodao
  60 | www.yodao.com/help/webmaster/spider/ | text/.. | MozillaTest/5.0 (compatible; YodaoBot/1.0; url; )
  28 | www.yodao.com/help/webmaster/spider/ | image/.. | MozillaTest/5.0 (compatible; YodaoBot/1.0; url; )
82 sogou
  72 | www.sogou.com/docs/help/webmasters.htm#07 | text/.. | Sogou web spider/4.0(url)
  4 | www.sogou.com/docs/help/webmasters.htm#07 | text/.. | Sogou web spider/4.0(url) (via Web-Blaster/2.21 (http://www.a-blast.org/web-blast.html))
  4 | www.sogou.com/docs/help/webmasters.htm#07 | image/.. | Sogou Pic Spider/3.0(url)
69 kosmix
  48 | www.kosmix.com/crawler.html | application/xml | voyager/1.0 url
  16 | www.kosmix.com/html/kosmos.html | text/.. | Mozilla/5.0(compatible;Kosmos/1.0;url)
  5 | www.kosmix.com/crawler.html | text/.. | voyager/2.0 (url)
65 estsoft
  65 | www.estsoft.com/ | text/.. | Mozilla/5.0 (compatible; Estbot/1.0; url)
55 facebook
  35 | www.facebook.com/externalhit_uatext.php | image/.. | facebookexternalhit/1.0 (url)
  10 | www.facebook.com/externalhit_uatext.php | text/.. | facebookexternalhit/1.0 (url)
  5 | developers.facebook.com | text/.. | facebookplatform/1.0 (url)
  3 | www.facebook.com/externalhit_uatext.php | - | facebookexternalhit/1.0 (url)
54 ellerdale
  54 | www.ellerdale.com/crawler.html | text/.. | Mozilla/5.0 (compatible; EllerdaleBot/ 1.0; url)
52 topsy
  52 | labs.topsy.com/butterfly.html | text/.. | Mozilla/5.0 (compatible; Butterfly/1.0; url) Gecko/2009032608 Firefox/3.0.8
51 garlik
  46 | www.garlik.com/crawler | text/.. | dpdev/Nutch-1.0 (datapatrol from garlik.com; url; mail address )
  5 | www.garlik.com/crawler | image/.. | dpdev/Nutch-1.0 (datapatrol from garlik.com; url; mail address )
45 gigablast
  45 | www.gigablast.com/spider.html | text/.. | Gigabot/3.0 (url)
43 dealgates
  39 | spider.dealgates.com/bot.html | text/.. | DealGates Bot/1.1 by Luc Michalski (url)
  4 | www.dealgates.net/bot.html | text/.. | DealGates Bot/1.2 by Luc Michalski (url)
39 flatlandindustries
  37 | www.flatlandindustries.com/flatlandbot | text/.. | flatlandbot/wikibot (Flatland Industries Web Spider; url; mail address )
34 mnemoo
  34 | www.mnemoo.com/about/spider | text/.. | Mnemoo Intelligent Spider/0.1alpha (compatible; See url)
31 guruji
  18 | www.guruji.com/en/WebmasterFAQ.html | text/.. | Mozilla/5.0 (compatible; GurujiBot/1.0; url)
  9 | www.guruji.com/en/WebmasterFAQ.html | image/.. | GurujiImageBot/1.0 (url)
  4 | www.guruji.com/en/WebmasterFAQ.html | text/.. | GurujiImageBot/1.0 (url)
29 qdos
  29 | qdos.com/ | text/.. | qdos/1.1 (url)
29 entireweb
  27 | www.entireweb.com/about/search_tech/speedy_spider/ | text/.. | Speedy Spider (url)
29 goo
  28 | help.goo.ne.jp/contact/ | text/.. | goo wikipedia (url)
28 emusic
  18 | www.emusic.com/ | application/json | Mozilla/5.0 (Windows; Windows NT 5.1; en-US; rv:1.8.1.8pre) Gecko/20071017 eMusic DLM/4.0/1.0.0.2 (url) (like Firefox/2.0.0.8)
  6 | www.emusic.com/ | application/json | Mozilla/5.0 (Windows; Windows NT 6.0; en-US; rv:1.8.1.8pre) Gecko/20071017 eMusic DLM/4.0/1.0.0.2 (url) (like Firefox/2.0.0.8)
  3 | www.emusic.com/ | application/json | Mozilla/5.0 (Macintosh; Intel Mac OS X; en-US; rv:1.8.1.8pre) Gecko/20071009 eMusic DLM/4.0/1.0.0.2 (url) (like Firefox/2.0.0.8)
28 froute
  22 | labs.froute.jp/pc2m/help.html | text/.. | Froute Mobile Gateway/1.0 (url)
  6 | labs.froute.jp/pc2m/help.html | image/.. | Froute Mobile Gateway/1.0 (url)
28 rcdtokyo
  20 | www.rcdtokyo.com/pc2m/ | text/.. | Mozilla/5.0 (compatible; PEAR HTTP_Request class; url)
  8 | www.rcdtokyo.com/pc2m/ | image/.. | Mozilla/5.0 (compatible; PEAR HTTP_Request class; url)
26 Anonymouse
  14 | Anonymouse.org/ | image/.. | url (Unix)
  10 | Anonymouse.org/ | text/.. | url (Unix)
25 princexml
  25 | www.princexml.com | image/.. | Prince/6.0 (url)
24 spinn3r
  23 | spinn3r.com/robot | text/.. | Mozilla/5.0 (X11; Linux i686; en-US; rv:1.2.1; aggregator:Spinn3r (Spinn3r 3.0); url) Gecko/20021130
23 newsgator
  8 | www.newsgator.com | text/.. | NewsGator/2.0 Bot (url)
  8 | www.newsgator.com | text/.. | NewsGatorOnline/2.0 (url) bot
22 weblio
  21 | www.weblio.jp/ | text/.. | Mozilla/5.0 (compatible; WeblioBot; url)
21 jumptap
  20 | www.jumptap.com/jumpbot | text/.. | Nokia6820/2.0 (5.88) Profile/MIDP-1.0 Configuration/CLDC-1.0/1.0 (Jumpbot; url; mail address )
21 alexa
  21 | www.alexa.com/site/help/webmasters | text/.. | ia_archiver (url; mail address )
20 mixi
  11 | mixi.jp/ | text/.. | mixi-mobile-converter/1.0 (url)
  9 | mixi.jp/ | image/.. | mixi-mobile-converter/1.0 (url)
19 hatena
  11 | a.hatena.ne.jp/help | text/.. | Hatena Antenna/0.5 (url)
  8 | mgw.hatena.ne.jp/help | text/.. | DoCoMo/2.0 D903i(c100;TB;W28H20) (compatible; Hatena-Mobile-Gateway/1.2; url)
19 snap
  19 | www.snap.com | text/.. | Snapbot/1.0 (Snap Shots, url)
18 FeedBurner
  18 | www.FeedBurner.com | text/.. | FeedBurner/1.0 (url)
18 bimeon
  11 | bimeon.com/spider.php | text/.. | url vcrawler_BootV10.1.3_Link
  7 | bimeon.com/spider.php | text/.. | url vcrawler_BootV10.1.3_Link (http://www.openwebspider.org/)
16 wordpress
  11 | support.wordpress.com/contact/ | text/.. | WordPress.com mShots; url
16 apkc
  11 | devel.apkc.net | text/.. | MBS/0.1 (url)
  5 | devel.apkc.net | image/.. | MBS/0.1 (url)
15 shisoft
  13 | ench.shisoft.net/ | text/.. | Shisoft KowSeeker Beta 3: url
15 linkaider
  15 | linkaider.com/crawler/ | text/.. | LinkAider (url)
14 dium
  11 | me.dium.com | text/.. | Mozilla/5.0 (Windows; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0 Me.dium/1.0 (url)
  3 | me.dium.com | image/.. | Mozilla/5.0 (Windows; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0 Me.dium/1.0 (url)
14 feedparser
  14 | feedparser.org/ | application/xml | UniversalFeedParser/4.1 url
13 virtual-presence
  12 | lms.virtual-presence.org | text/.. | Firebat 2.9.1 (url)
13 greenivory
  13 | greenivory.fr | text/.. | GreenIvory/Nutch-0.9 (GreenIvory-BlueCrane; url; mail address )
12 go
  10 | www2.kokken.go.jp/masaya/public/wiki/ | text/.. | Mozilla/5.0 (compatible; heritrix/1.14.3 url)
12 acont
  12 | hilfe.acont.de/bot.html | text/.. | url ACONTBOT
11 mashget
  11 | www.mashget.com | text/.. | Mashgetbot/2.1 (url)
11 picsearch
  6 | www.picsearch.com/bot.html | image/.. | psbot/0.1 (url)
  5 | www.picsearch.com/bot.html | text/.. | psbot/0.1 (url)
11 ac
  6 | www.onb.ac.at/about/webarchivierung.htm | image/.. | webcrawler (compatible; heritrix/1.12.1 url)
  5 | www.doc.ic.ac.uk/~bwm05/blacklist.php | text/.. | If you would like to blacklist your hostname, please visit url otherwise contact mail address for any other queries.
10 centrum
  8 | morfeo.centrum.cz/bot | text/.. | holmes/3.12.4 (url)
10 aafter
  8 | aafter.com/crawler.htm | text/.. | AAfter.com Crawler/Nutch-1.0 (This bot is very focused, well-behaved, and wants to do good to internet community. For any questions, please call collect 1 214-714-2224. Team AAfter, Dallas, TX, USA; url; crawler at aafter.com)
35,504 total
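The grouping by secondary domain can be approximated as follows. This is a minimal Python sketch under stated assumptions, not the report's own code: the site name is taken as the host label just left of the top-level domain, skipping a hypothetical set of second-level labels (co, com, ne) inferred from rows such as help.yahoo.co.jp, misc.yahoo.com.cn and a.hatena.ne.jp. Because ac and go are not in that set, it also yields the ac (www.onb.ac.at) and go (www2.kokken.go.jp) groups seen in the table. It does not reproduce the google / google? split, which depends on the requester's IP address. The names site_name, totals_by_site and SECOND_LEVEL are hypothetical.

```python
from collections import Counter

# Hypothetical set of second-level labels that sit between a site name and a
# country TLD; inferred from rows in the table, not from the report's code.
SECOND_LEVEL = {"co", "com", "ne"}


def site_name(url: str) -> str:
    """Reduce a URL like 'help.yahoo.co.jp/help/...' to a site name ('yahoo')."""
    host = url.split("/", 1)[0]
    labels = host.split(".")[:-1]            # drop the top-level domain
    if len(labels) > 1 and labels[-1] in SECOND_LEVEL:
        labels = labels[:-1]                 # drop co/com/ne before a country TLD
    return labels[-1] if labels else host


def totals_by_site(rows):
    """Sum (count, url) rows per site name, largest total first."""
    totals = Counter()
    for count, url in rows:
        totals[site_name(url)] += count
    return totals.most_common()
```

Case is preserved on purpose, since the table lists names such as FeedBurner and Anonymouse as they appear in the URL.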

Page requests for probable crawlers, recognized by keyword
Count (x 1000)  Agent string
  Count (x 1000)  Mime type (count ≥ 3)
3,298 PythonWikipediaBot/1.0
  2,581 text/..
  636 application/xml
  81 application/json
  1 -
  1 image/..
1,015 GoogleBot-Image/1.0
  443 text/..
  360 image/..
  212 -
953 Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (Exabot-Thumbnails)
  607 text/..
  213 image/..
  133 application/x-javascript
905 Answersbot
  905 text/..
291 Mozilla/5.0 (Windows; Windows NT 5.1; fr; rv:1.8.1) VoilaBot BETA 1.2 ( mail address )
  291 text/..
  1 -
  1 application/ogg
207 php wikibot classes
  207 application/vnd.php.serialized
  1 -
  1 text/..
178 Wikirage.Com Statistics Bot
  178 text/..
167 wikiwix-bot-3.0
  166 text/..
  1 image/..
  1 -
160 AISearchBot (Email: mail address ; If your web site doesn't want to be crawled, please send us a email.)
  160 text/..
  1 -
  1 application/xml
140 Mozilla/5.0 (SnapPreviewBot) Gecko/20061206 Firefox/1.5.0.9
  71 image/..
  56 text/..
  13 application/x-javascript
92 Tawbot (public svn release; plwiki)
  92 text/..
81 GinioSpider
  81 text/..
66 UniFind Site Spider; email mail address
  66 text/..
  1 -
66 COIBot/1.00
  66 text/..
  1 -
52 GoogleBot-Image/1.0
  52 text/..
  1 -
  1 image/..
45 crawler mail address
  45 text/..
42 GoogleBot
  42 text/..
  1 image/..
41 Test Webbot
  41 text/..
41 SineBot/1.5.13(User:SineBot)
  40 application/vnd.php.serialized
  1 text/..
39 AnomieBOT 1.0 (OrphanReferenceFixer)
  39 application/json
37 dictionary-bot
  33 application/xml
  4 text/..
30 MSR-ISRCCrawler
  22 text/..
  8 application/x-javascript
  1 image/..
29 WebCrawler
  29 text/..
27 ListasBot 3
  27 text/..
25 gigabot
  22 image/..
  2 text/..
  1 -
24 Pywikipediabot/2.0
  24 application/json
  1 text/..
23 CorenSearchBot/1.0 libwww-perl/5.808
  23 text/..
22 spider
  21 text/..
  1 application/vnd.php.serialized
  1 -
  1 application/xml
22 Bot/WP/EN/E/EBot
  22 text/..
22 Jyxobot/1
  22 text/..
20 Plagiat Web Spider WSIiZ wsiz.rzeszow.pl
  18 text/..
  1 image/..
  1 application/ogg
  1 application/pdf
19 plantspedia data crawler
  19 text/..
14 Denodo IECrawler/4.5
  12 text/..
  2 application/x-javascript
13 FAST Enterprise Crawler 6 used by Lenovo ( mail address )
  13 text/..
  1 -
13 DotNetWikiBot/2.64 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
  11 text/..
  2 application/xml
12 PRCrawler/Nutch-0.9 (data mining development project; mail address )
  12 text/..
12 Legobot
  12 application/json
12 beast/Nutch-0.9 (agentspider; mail address )
  12 text/..
  1 image/..
12 Freebase Deathbot
  12 text/..
11 Mozilla/5.0 (Apibot 0.01)
  11 application/vnd.php.serialized
10 SurakWare MediaWiki Bot/1.0
  10 text/..
9 IlseBot/1.1
  8 text/..
  1 -
9 DotNetWikiBot/2.61 (Microsoft Windows NT 5.2.3790 Service Pack 2; )
  9 text/..
  1 application/xml
9 Bot/WP/EN/Daniel/MediationBot1/1.2
  9 text/..
9 GingerCrawler/1.0 (Language Assistant for Dyslexics; www.gingersoftware.com/crawler_agent.htm; support at ginger software dot com)
  9 text/..
  1 -
  1 application/x-javascript
  1 image/..
9 TrudoBot 0.1
  9 text/..
8 Pybot 1.0 mail address
  6 text/..
  2 application/xml
8 DotNetWikiBot/2.3 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
  8 text/..
  1 application/xml
8 web18bot
  8 text/..
8 Mozilla/5.0 (Bgbot 0.5)
  8 text/..
7 XLinkBot/1.00
  7 text/..
7 rdfbot/1.0 (Indian Language Web Search Engine; Rediff.com; rdfbot mail address )
  7 text/..
  1 image/..
7 rdfbot/1.0 (rdfbot mail address )
  7 text/..
  1 -
  1 application/ogg
7 ACC Crawler Alpha - mail address
  7 text/..
7 DefaultsortBot
  7 text/..
6 This is part of a research project (www.doc.ic.ac.uk/~bwm05), please contact mail address if this is causing problems
  6 text/..
6 FAST Enterprise Crawler 6 used by Wanadoo ( mail address )
  4 application/x-javascript
  2 text/..
6 Banana Crawler/0.0.2, contact mail address if this is causing problems
  6 text/..
6 Mozilla/5.0 (Yahoo-MMCrawler/4.0; mail address )
  6 image/..
  1 text/..
5 MLBot (www.metadatalabs.com/mlbot)
  5 text/..
5 DotNetWikiBot/2.53 (Unix 2.6.26.2; )
  5 text/..
5 Draicone's bot
  5 text/..
5 nutch.us/Nutch-1.0 (www.nutch.us; mail address )
  5 text/..
  1 application/ogg
5 topyx-crawler
  5 text/..
  1 -
5 websitethumbnail.de snapshot spider
  5 text/..
4 kindsight/Nutch-1.0 (kscrawler; www.projectrialto.com; mail address )
  4 text/..
4 Mozilla/4.74 [en] (Windows NT 5.0; maxamine.com--robot)
  4 text/..
4 FAST Enterprise Crawler 6 used by Microsoft ( mail address )
  4 text/..
  1 -
4 NUTCHCRAWLER/Nutch-0.9
  4 text/..
4 Szukacz/1.5 (robot; www.szukacz.pl/jakdzialarobot.html; mail address )
  4 text/..
  1 -
  1 image/..
4 Bot/WP/EN/Alex_Bakharev/AlexNewArtBot
  4 text/..
4 YaDirectBot/1.0
  4 text/..
4 AnomieBOT 1.0 (SourceUploader)
  4 application/json
4 KyfossBot
  4 text/..
4 unblockbot/1.00
  4 text/..
3 Jabse.com Crawler v.2.0 www.jabse.com/crawler.php
  3 text/..
  1 application/xml
3 Mozilla/5.0 (Windows; Windows NT 5.1; en-GB; rv:1.9.0.6) Gecko/2009011913 Firefox/3.0.6 TweetmemeBot
  3 text/..
  1 image/..
3 uberbot 1.0
  3 text/..
  1 image/..
3 Spider/5.0
  3 text/..
  1 -
  1 image/..
  1 application/ogg
3 FAST Enterprise Crawler 6 used by a (a)
  3 text/..
3 SengSpider/0.1
  3 text/..
3 TKBot 1.0 ( mail address )
  3 application/xml
3 Mozilla/4.0 (compatible; focuseekbot)
  3 text/..
  1 image/..
3 WikiBot
  3 text/..
3 IssueCrawler
  3 text/..
8,526 total
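The per-agent breakdown by mime type can be sketched as follows. A minimal Python sketch, not the report's own code, assuming: each input pair is one line of the 1:1000 sampled log, so the resulting counts are already in units of 1000 requests; text/* and image/* are collapsed to text/.. and image/.. as in the table, while other types (and "-") are kept as-is; and agents with a total below 3 are dropped, since the list ends at 3. The names short_mime, agent_breakdown and MIN_COUNT are hypothetical.

```python
from collections import Counter, defaultdict

MIN_COUNT = 3  # assumption: agents seen fewer than 3 times are not listed


def short_mime(mime: str) -> str:
    """Collapse text/* and image/* to 'text/..' and 'image/..' as in the table."""
    major = mime.split("/", 1)[0]
    return f"{major}/.." if major in ("text", "image") else mime


def agent_breakdown(requests, min_count=MIN_COUNT):
    """Build (agent, total, [(mime, count), ...]) rows, largest total first.

    `requests` is an iterable of (agent, mime) pairs, one per sampled log line.
    """
    per_agent = defaultdict(Counter)
    for agent, mime in requests:
        per_agent[agent][short_mime(mime)] += 1
    rows = [
        (agent, sum(mimes.values()), mimes.most_common())
        for agent, mimes in per_agent.items()
    ]
    rows = [row for row in rows if row[1] >= min_count]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows
```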

IP ranges: known IP ranges for Google are 64.233.[160.0-191.255], 66.249.[64.0-95.255], 66.102.[0.0-15.255], 72.14.[192.0-255.255], 74.125.[0.0-255.255], 209.85.[128.0-255.255], 216.239.[32.0-63.255] and a few other minor subranges.
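A request can be checked against these ranges with Python's standard ipaddress module. A minimal sketch, assuming IPv4 addresses in dotted notation and covering only the seven ranges listed (the unspecified minor subranges are left out); the labels google and google? follow the group names in the URL table, and the function names are hypothetical.

```python
import ipaddress

# Google ranges as listed above, written as (first, last) address pairs.
# The "few other minor subranges" are not known here and are left out.
GOOGLE_RANGES = [
    ("64.233.160.0", "64.233.191.255"),
    ("66.249.64.0", "66.249.95.255"),
    ("66.102.0.0", "66.102.15.255"),
    ("72.14.192.0", "72.14.255.255"),
    ("74.125.0.0", "74.125.255.255"),
    ("209.85.128.0", "209.85.255.255"),  # no leading zero: recent ipaddress versions reject "085"
    ("216.239.32.0", "216.239.63.255"),
]
_PARSED = [(ipaddress.IPv4Address(a), ipaddress.IPv4Address(b)) for a, b in GOOGLE_RANGES]


def from_google(ip: str) -> bool:
    """True if the IPv4 address falls inside one of the listed ranges."""
    addr = ipaddress.IPv4Address(ip)
    return any(first <= addr <= last for first, last in _PARSED)


def google_group(ip: str) -> str:
    """Group label for a request claiming to be GoogleBot (hypothetical helper)."""
    return "google" if from_google(ip) else "google?"
```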

Generated on Friday, August 21, 2009
Author: Erik Zachte
Mail: ezachte@### (no spam: ### = wikimedia.org)
All data and images on this page are in the public domain.