Wikimedia Visitor Log Analysis Report - Crawler requests

Daily averages, based on sample period: 1 Jun 2010 - 29 Jun 2010

 This analysis is based on a 1:1000 sampled server log (squids) ⇒ all counts x 1000.

The following overview of crawler (aka bot) page requests is based on the user agent information that accompanies most server requests. Unfortunately this user agent information follows rather loosely defined guidelines.
Please also bear in mind that the most popular crawler names may be somewhat overrepresented. This is the result of so-called user agent spoofing (where a requester supplies false credentials, e.g. to bypass web server filters).
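GoogleBot seems to be a favorite for spoofing. Therefore GoogleBot requests from an IP address registered by Google (see below) are color coded differently from other GoogleBot requests. In the first table, requests that claim to be GoogleBot but come from other addresses are grouped under 'google?'.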

For this report page requests are considered to be issued by a crawler in two cases:
1 The user agent string contains a web address. Only crawlers should have that, but there are some false positives, where a browser sends a user agent string with a web address (ill-behaved plug-in; the main offenders have been eliminated).
2 The user agent string contains the term bot, spider or crawl[er].
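
The two rules above can be sketched in a few lines of Python. This is only an illustration, not the script that produced this report: the function name, the regular expressions, the case-insensitive matching and the reading of "web address" as an http(s):// or www. prefix are all assumptions, and the exclusion of known ill-behaved plug-ins is left out.

```python
import re

# Rule 1: the agent string contains a web address.
# Assumption: a web address is approximated as "http://", "https://" or "www.".
URL_PATTERN = re.compile(r"https?://|www\.", re.IGNORECASE)

# Rule 2: the agent string contains the term bot, spider or crawl[er].
# Assumption: matching is case-insensitive and not limited to whole words.
KEYWORD_PATTERN = re.compile(r"bot|spider|crawl", re.IGNORECASE)


def classify_user_agent(user_agent: str) -> str:
    """Return 'url', 'keyword' or 'none' for a user agent string."""
    if URL_PATTERN.search(user_agent):
        return "url"      # would be listed in the first table
    if KEYWORD_PATTERN.search(user_agent):
        return "keyword"  # would be listed in the second table
    return "none"         # not considered a crawler
```

Rule 1 is tested first, so an agent string that contains both a web address and a keyword is counted only once, under the web address.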

In total, 36,352,000 page requests per day (mime type text/html only) are considered crawler requests, out of 280,019,000 external requests, which is 13.0%.
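
The arithmetic behind these totals can be checked with a short sketch. It assumes the figures are first held in thousands (36,352 and 280,019), as in the tables, and then multiplied by the 1:1000 sampling factor; the function names are illustrative only.

```python
SAMPLING_FACTOR = 1000  # the squid log is sampled 1:1000


def scale(count_in_thousands: int) -> int:
    """Convert a count from the sampled log into an estimated full count."""
    return count_in_thousands * SAMPLING_FACTOR


def share_percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage rounded to one decimal."""
    return round(100.0 * part / whole, 1)


crawler_requests = scale(36_352)     # 36,352,000 per day
external_requests = scale(280_019)   # 280,019,000 per day
print(share_percent(crawler_requests, external_requests))  # 13.0
```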

Page requests for crawlers that specify a url in the agent string

Count x 1000 | Secondary domain (~site) name
  Count x 1000 | URL | Mime type | User agent

12,833 | yahoo
  12,202 | help.yahoo.com/help/us/ysearch/slurp | text/.. | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  195 | help.yahoo.com/help/us/ysearch/slurp | - | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  111 | misc.yahoo.com.cn/help.html | text/.. | Mozilla/5.0 (compatible; Yahoo! Slurp China; url)
  72 | help.yahoo.com/help/us/ysearch/slurp | text/.. | Mozilla/5.0 (compatible; Yahoo! Slurp; url)
  50 | help.yahoo.com/help/us/ysearch/slurp | image/.. | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  48 | help.yahoo.co.jp/help/jp/search/indexing/indexing-15.html | image/.. | 'Mozilla/5.0 (compatible; Y!J SearchMonkey/1.0 (Y!J-AGENT; url))'
  33 | help.yahoo.com/help/us/ysearch/slurp | text/.. | Mozilla/5.0 (compatible; Yahoo! DE Slurp; url)
  31 | developer.yahoo.com/searchmonkey/useragent | text/.. | Mozilla/5.0 (compatible; Yahoo! SearchMonkey 1.0; url)
  24 | developer.yahoo.com/searchmonkey/useragent | image/.. | Mozilla/5.0 (compatible; Yahoo! SearchMonkey 1.0; url)
  18 | help.yahoo.co.jp/help/jp/search/indexing/indexing-15.html | text/.. | Y!J-BRI/0.0.1 crawler ( url )
  17 | help.yahoo.com/help/us/ysearch/slurp | application/xml | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  12 | help.yahoo.com/help/us/ysearch/slurp | application/ogg | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  12 | help.yahoo.com/help/us/ysearch/crawling/crawling-01.html | text/.. | Nokia6682/2.0 (3.01.1) SymbianOS/8.0 Series60/2.6 Profile/MIDP-2.0 configuration/CLDC-1.1 UP.Link/6.3.0.0.0 (compatible;YahooSeeker/M1A1-R2D2; url)
9,522 | google
  7,562 | www.google.com/bot.html | text/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  469 | desktop.google.com/ | image/.. | Mozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
  380 | www.google.com/bot.html | - | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  199 | www.google.com/bot.html | image/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  198 | desktop.google.com/ | application/xml | Mozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
  96 | www.google.com/feedfetcher.html | - | FeedFetcher-Google; (url)
  76 | desktop.google.com/ | image/.. | Mozilla/5.0 (compatible; Google Desktop/5.9.909.30391; url)
  62 | www.google.com/feedfetcher.html | text/.. | FeedFetcher-Google; (url)
  49 | www.google.com/feedfetcher.html | text/.. | Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
  45 | www.google.com/bot.html | text/.. | DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
  39 | www.google.com/feedfetcher.html | application/xml | FeedFetcher-Google; (url)
  29 | desktop.google.com/ | image/.. | Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; url)
  27 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: finchproxy)
  26 | www.google.com/bot.html | text/.. | SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
  25 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: npiv82)
  25 | code.google.com/appengine | application/xml | AppEngine-Google; (url; appid: wikipedia-raw)
  23 | www.google.com/feedfetcher.html | application/json | Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
  23 | desktop.google.com/ | text/.. | Mozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
  19 | desktop.google.com/ | application/xml | Mozilla/5.0 (compatible; Google Desktop/5.9.909.30391; url)
  18 | www.google.com/bot.html | image/.. | DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
  17 | www.google.com/feedfetcher.html | application/xml | Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
  17 | www.google.com/feedfetcher.html | image/.. | Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
  14 | desktop.google.com/ | application/xml | Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; url)
  10 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: boxapp)
  9 | www.google.com/bot.html | image/.. | SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
  8 | code.google.com/p/crawler4j/ | text/.. | crawler4j (url)
  6 | desktop.google.com/ | - | Mozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
  6 | code.google.com/appengine | application/json | Python-urllib/2.5 AppEngine-Google; (url; appid: loeschmonitor)
  5 | docs.google.com | image/.. | Mozilla/5.0 (compatible; GoogleDocs; url)
  5 | code.google.com/appengine | application/json | MWBOT GAE Edition AppEngine-Google; (url; appid: philip-bot)
  4 | desktop.google.com/ | text/.. | Mozilla/5.0 (compatible; Google Desktop/5.9.909.30391; url)
  3 | www.google.org | text/.. | Naveen/Nutch-1.0 (Naveen; url; mail address )
  3 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: linksalpha)
  3 | code.google.com/appengine | text/.. | oohEmbed.com AppEngine-Google; (url; appid: oohembed)
5,986 | msn
  3,045 | search.msn.com/msnbot.htm | text/.. | msnbot/2.0b (url)
  2,728 | search.msn.com/msnbot.htm | - | msnbot/2.0b (url)
  56 | search.msn.com/msnbot.htm | text/.. | msnbot-media/1.1 (url)
  41 | search.msn.com/msnbot.htm | image/.. | msnbot-media/1.1 (url)
  33 | search.msn.com/msnbot.htm | text/.. | msnbot/1.1 (url)
  29 | search.msn.com/msnbot.htm | application/json | msnbot/1.0 (url)
  20 | search.msn.com/msnbot.htm | text/.. | msnbot-Products/1.0 (url)
  15 | search.msn.com/msnbot.htm | image/.. | msnbot/2.0b (url)
  6 | search.msn.com/msnbot.htm | application/ogg | msnbot/2.0b (url)
  5 | search.msn.com/msnbot.htm | text/.. | msnbot-UDiscovery/2.0b (url)
2,764 | facebook
  2,167 | www.facebook.com/externalhit_uatext.php | image/.. | facebookexternalhit/1.0 (url)
  564 | www.facebook.com/externalhit_uatext.php | text/.. | facebookexternalhit/1.0 (url)
  12 | facebook.com/sharer.php | text/.. | facebook share (url)
  9 | www.facebook.com/externalhit_uatext.php | text/.. | facebookexternalhit/1.1 (url)
  8 | developers.facebook.com | image/.. | facebookplatform/1.0 (url)
  4 | developers.facebook.com | text/.. | facebookplatform/1.0 (url)
1,710 | google?
  1,388 | www.google.com/bot.html | text/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  92 | www.google.com/bot.html | text/.. | DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
  57 | www.google.com/bot.html | text/.. | SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
  34 | www.google.com/bot.html | application/vnd.php.serialized | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  33 | www.google.com/bot.html | text/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  31 | www.google.com/bot.html | application/xml | DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
  27 | www.google.com/bot.html | text/.. | GoogleBot/2.1 (url)
  11 | www.google.com/bot.html | application/xml | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  9 | www.google.com/bot.html | image/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  6 | www.google.com/bot.html | text/.. | User-Agent :Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  5 | www.google.com/bot.html | text/.. | Mozilla/5.0(compatible;GoogleBot/2.1;url)
  4 | www.google.com/bot.html | application/xml | SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
  4 | www.google.com/bot.html | text/.. | User-Agent :SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
  3 | www.google.com/bot.html | - | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  3 | www.google.com/bot.html | text/.. | User-Agent :DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
1,185 | naver
  1,107 | help.naver.com/robots/ | text/.. | Yeti/1.0 (NHN Corp.; url)
  42 | help.naver.com/robots/ | image/.. | Yeti/1.0 (NHN Corp.; url)
  26 | help.naver.com/delete_main.asp | text/.. | Mozilla/4.0 (compatible; NaverBot/1.0; url)
  9 | help.naver.com/customer_webtxt_02.jsp | text/.. | Mozilla/4.0 (compatible; NaverBot/1.0; url)
748 | baidu
  374 | www.baidu.com/search/spider.htm | text/.. | Baiduspider(url)
  316 | www.baidu.jp/spider/ | text/.. | Baiduspider(url)
  22 | www.baidu.jp/spider/ | text/.. | DoCoMo/2.0 P05A(c100;TB;W24H15) (compatible; BaiduMobaider/1.0;url)
  11 | www.baidu.jp/spider/ | application/xml | DoCoMo/2.0 P05A(c100;TB;W24H15) (compatible; BaiduMobaider/1.0;url)
  8 | www.baidu.jp/spider/ | - | Baiduspider(url)
  7 | www.baidu.jp/spider/ | text/.. | BaiduImagespider(url)
  4 | www.baidu.jp/spider/ | application/xml | Baiduspider(url)
  3 | www.baidu.com/search/spider.html | text/.. | Nokia6681/1.0 (2.30.0) Series60/2.6 Profile/MIDP-2.0 Configuration/CLDC-1.1 (compatible; baiduspider; url)
711 | pipl
  711 | www.pipl.com/bot/ | text/.. | Mozilla/5.0(compatible;PiplBot;url)
704 | wikipedia
  558 | en.wikipedia.org/wiki/Web_crawler | text/.. | GoogleBot/Nutch-1.0 (Prototype; url; mail address )
  85 | en.wikipedia.org/wiki/Wikipedia:Huggle | text/.. | Huggle/0.9.6 url
  22 | en.wikipedia.org | text/.. | url
  21 | en.wikipedia.org/wiki/User:NicoV/Wikipedia_Cleaner/Documentation | text/.. | WikiCleaner (url)
  5 | ko.wikipedia.org | text/.. | url
  5 | en.wikipedia.org/wiki/User:Sidonuke | text/.. | Huggle-Sidonuke Build/0.9.4 url
596 | ask
  464 | about.ask.com/en/docs/about/webmasters.shtml | text/.. | Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
  130 | about.ask.com/en/docs/about/webmasters.shtml | - | Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
504 | php
  356 | pear.php.net/ | text/.. | PEAR HTTP_Request class ( url )
  68 | pear.php.net/ | application/vnd.php.serialized | PEAR HTTP_Request class ( url )
  34 | pear.php.net/package/http_request2 | text/.. | HTTP_Request2/0.5.1 (url) PHP/5.2.13
  26 | pear.php.net/ | application/json | PEAR HTTP_Request class ( url )
  19 | pear.php.net/ | application/xml | PEAR HTTP_Request class ( url )
380 | soso
  373 | help.soso.com/webspider.htm | text/.. | Sosospider(url)
  3 | help.soso.com/soso-blog-spider.htm | text/.. | Sosoblogspider(url)
329 | yacy
  26 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.31-22-generic; java 1.6.0_0; Europe/en) url
  15 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.32-gentoo-r7; java 1.6.0_20; Europe/en) url
  13 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.26-2-amd64; java 1.6.0_12; Europe/en) url
  13 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.31-22-generic-pae; java 1.6.0_20; Europe/en) url
  13 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.28-19-generic; java 1.6.0_0; Europe/en) url
  12 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.32-23-generic; java 1.6.0_18; Europe/de) url
  11 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.28-19-generic; java 1.6.0_20; Europe/en) url
  10 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.31-22-generic; java 1.6.0_0; Europe/en) url
  10 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.26-2-amd64; java 1.6.0_0; Europe/en) url
  10 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.31-22-generic; java 1.6.0_20; Europe/en) url
  9 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.31.12-0.2-default; java 1.5.0_16; GMT01:00/de) url
  9 | yacy.net/bot.html | text/.. | yacybot (x86 Windows XP 5.1; java 1.6.0_20; Europe/de) url
  8 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.28-19-generic; java 1.6.0_0; Europe/en) url
  7 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.31-21-server; java 1.6.0_0; Asia/ja) url
  7 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.32-23-generic; java 1.6.0_18; Europe/en) url
  7 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.18-028stab064.7; java 1.6.0_15; Europe/de) url
  6 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.31-21-server; java 1.6.0_0; Asia/en) url
  6 | yacy.net/bot.html | text/.. | yacybot (x86 Windows Vista 6.1; java 1.5.0_13; Europe/sv) url
  5 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.31-gentoo-r6; java 1.6.0_17; Etc/en) url
  5 | yacy.net/bot.html | text/.. | yacybot (amd64 FreeBSD 8.0-RELEASE-p3; java 1.6.0_07; Europe/en) url
  5 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.23.17-dbserv; java 1.6.0_04; Europe/en) url
  5 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.18-164.15.1.el5xen; java 1.6.0; Europe/en) url
  5 | yacy.net/bot.html | text/.. | yacybot (i386 FreeBSD 8.0-RELEASE; java 1.6.0_07; GMT/en) url
  4 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.32-3-amd64; java 1.6.0_18; Europe/en) url
  4 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.30-gentoo-r6-090907; java 1.6.0_17; GMT/de) url
  4 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.32.9-rscloud; java 1.6.0_20; Etc/en) url
  4 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.18-194.3.1.el5; java 1.6.0; Europe/en) url
  4 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.32-22-generic; java 1.6.0_18; Europe/en) url
  4 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.26-2-686; java 1.6.0_0; Europe/de) url
  4 | yacy.net/bot.html | text/.. | yacybot (i386 Mac OS X 10.5.8; java 1.5.0_24; Europe/de) url
  4 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.28-19-generic; java 1.6.0_20; GMT/en) url
  4 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.32-22-generic; java 1.6.0_20; Europe/en) url
  3 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.33-ARCH; java 1.6.0_18; Europe/de) url
  3 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.32-22-generic; java 1.6.0_18; Europe/de) url
  3 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.31-21-generic; java 1.6.0_20; Europe/de) url
  3 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.26-2-xen-amd64; java 1.6.0_12; Europe/de) url
  3 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.24-19-generic; java 1.6.0_20; Europe/en) url
  3 | yacy.net/bot.html | text/.. | yacybot (amd64 Linux 2.6.32-22-server; java 1.6.0_18; America/en) url
  3 | yacy.net/bot.html | text/.. | yacybot (amd64 Windows 7 6.1; java 1.6.0_20; Europe/fr) url
  3 | yacy.net/bot.html | text/.. | yacybot (x86 Windows XP 5.1; java 1.6.0_18; Europe/fr) url
  3 | yacy.net/bot.html | text/.. | yacybot (i386 Linux 2.6.30-gentoo-r4; java 1.6.0_20; Canada/en) url
310 | exabot
  184 | www.exabot.com/go/robot | text/.. | Mozilla/5.0 (compatible; Exabot/3.0; url)
  119 | www.exabot.com/go/robot | text/.. | Mozilla/5.0 (compatible; Exabot/3.0 (BiggerBetter); url)
  7 | www.exabot.com/go/robot | - | Mozilla/5.0 (compatible; Exabot/3.0; url)
306 | cuil
  304 | www.cuil.com/twiceler/robot.html | text/.. | Mozilla/5.0 (Twiceler-0.9 url)
302 | toolserver
  208 | toolserver.org/~guandalug/ | application/vnd.php.serialized | Guandalugs PHPWikiBot/1.1 (url;de:User:Guandalug)
  60 | wiki.toolserver.org/view/GeoHack | text/.. | Geohack (url)
  30 | toolserver.org/~bayo/ | text/.. | LudoThecaire/1.0 (url)
259 | waw
  189 | dubi.itinfo.waw.pl | image/.. | WordPress/2.8.6; url
  70 | gienia.itinfo.waw.pl | image/.. | WordPress/2.8.6; url
217 | fairshare
  212 | fairshare.cc | text/.. | Mozilla/5.0 url (X11; FreeBSD i386; en-US; rv:1.2a) Gecko/20021021
202 | scoutjet
  202 | www.scoutjet.com/ | text/.. | Mozilla/5.0 (compatible; ScoutJet; url)
190 | kosmix
  135 | www.kosmix.com/html/kosmos.html | text/.. | Mozilla/5.0(compatible;Kosmos/1.0;url)
  53 | www.kosmix.com/html/kosmos.html | application/xml | Mozilla/5.0(compatible;Kosmos/1.0;url)
179 | sogou
  171 | www.sogou.com/docs/help/webmasters.htm#07 | text/.. | Sogou web spider/4.0(url)
  4 | www.sogou.com/docs/help/webmasters.htm#07 | application/xml | Sogou web spider/4.0(url)
  4 | www.sogou.com/docs/help/webmasters.htm#07 | image/.. | Sogou Pic Spider/3.0(url)
175 | sblog
  116 | fulltext.sblog.cz/screenshot/ | image/.. | Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
  35 | fulltext.sblog.cz/screenshot/ | text/.. | Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
  18 | fulltext.sblog.cz/robot/ | text/.. | SeznamBot/2.0 (url)
  5 | fulltext.sblog.cz/screenshot/ | application/x-javascript | Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
163 | traslated
  163 | mymemory.traslated.net/doc/ | text/.. | Mozilla/5.0 (MyMemory Bot url)
157 | majestic12
  143 | www.majestic12.co.uk/bot.php? | text/.. | Mozilla/5.0 (compatible; MJ12bot/v1.3.3; url)
  12 | www.majestic12.co.uk/bot.php? | text/.. | Mozilla/5.0 (compatible; MJ12bot/v1.3.2; url)
138 | youdao
  124 | www.youdao.com/help/webmaster/spider/ | text/.. | Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
  5 | www.youdao.com/help/webmaster/spider/ | text/.. | Mozilla/5.0 (compatible; YodaoBot/1.0; url; )
  5 | www.youdao.com/help/webmaster/spider/ | - | Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
  3 | www.youdao.com/help/webmaster/spider/ | application/xml | Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
130 | semager
  130 | www.semager.de/blog/semager-bots/ | text/.. | Mozilla/5.0 (compatible; Semager/1.4; url)
128 | textdigger
  128 | textdigger.com | text/.. | Mozilla/5.0 (url) Gecko/20061208 Firefox/2.0.0.1
118 | entireweb
  113 | www.entireweb.com/about/search_tech/speedy_spider/ | text/.. | Mozilla/5.0 (Windows; Windows NT 5.1; en-US) Speedy Spider (url)
  5 | www.entireweb.com/about/search_tech/speedy_spider/ | - | Mozilla/5.0 (Windows; Windows NT 5.1; en-US) Speedy Spider (url)
108 | wikimedia
  106 | tools.wikimedia.de/~daniel/ | text/.. | WikiSense (url)
108 | goo
  105 | help.goo.ne.jp/contact/ | text/.. | goo wikipedia (url)
90 | daum
  89 | ws.daum.net/aboutWebSearch.html | text/.. | Mozilla/5.0 (compatible; MSIE or Firefox mutant; not on Windows server; url) Daumoa/2.0
77 | freebase
  76 | www.freebase.com | text/.. | metaweb/Nutch-1.0-dev (url; help_at_metaweb.com)
74 | wordpress
  12 | josefboberg.wordpress.com | text/.. | WordPress/MU; url
  10 | support.wordpress.com/contact/ | text/.. | WordPress.com mShots; url
  6 | benabb.wordpress.com | text/.. | WordPress/MU; url
  4 | antiuaar.wordpress.com | text/.. | WordPress/MU; url
  3 | arikurniantopurworaharjo.wordpress.com | text/.. | WordPress/MU; url
  3 | spacebarshift.wordpress.com | text/.. | WordPress/MU; url
  3 | musiquefreak.wordpress.com | text/.. | WordPress/MU; url
73 | emining
  71 | emining.jp/ | text/.. | emBot-GalaBuzz/Nutch-1.0 (url; mail address )
70 | github
  70 | github.com/pauldix/typhoeus/tree/master | text/.. | Typhoeus - url
67 | mnemoo
  67 | www.mnemoo.com/about/spider | text/.. | Mnemoo Intelligent Spider/0.1alpha (compatible; See url)
58 | z-add
  53 | w3.z-add.co.uk/linkcheck/ | text/.. | Z-Add Link Checker (url)
  4 | w3.z-add.co.uk/linkcheck/ | image/.. | Z-Add Link Checker (url)
52 | conceptlinkage
  52 | www.conceptlinkage.org | text/.. | c-link wikipedia miner (url) mail address
52 | www.
  23 | www. | text/.. | GoogleBot/2.1 (urlGoogleBot.com/bot.html)
  13 | www. | text/.. | GoogleBot/2.1 ( urlGoogleBot.com/bot.html)
  10 | www. | text/.. | GoogleBot-Image/1.0 ( urlGoogleBot.com/bot.html)
  4 | www. | text/.. | GoogleBot/2.1 (urlGoogleBot.com/bot.html)/2.1 (http://www.GoogleBot.com/bot.html; http://www.GoogleBot.com/bot.html; mail address )
46 | teesoft
  15 | www.teesoft.info/ | image/.. | Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
  10 | www.teesoft.info/ | text/.. | Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
  8 | www.teesoft.info/ | image/.. | Mozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
  4 | www.teesoft.info/ | image/.. | Mozilla/5.0 (X11; Linux i686; [lang code]; rv:[..]) Gecko/.. etc (url)
  4 | www.teesoft.info/ | text/.. | Mozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
46 | sf
  15 | liferea.sf.net/ | text/.. | Liferea/0.x.x (Linux; en_US.UTF-8; url)
  15 | magpierss.sf.net | text/.. | MagpieRSS/0.7x (url)
  14 | liferea.sf.net/ | text/.. | Liferea/1.x.x (Linux; es_ES.UTF-8; url)
43 | moose
  43 | www.moose.at/about.php | text/.. | Mozilla/5.0 (compatible; Moose/1.2; Linux i686; de; url)
41 | newsgator
  14 | www.newsgator.com/ | text/.. | FeedDemon/2.7 (url; Microsoft Windows XP)
  14 | www.newsgator.com | text/.. | NewsGatorOnline/2.0 (url; 1 subscribers)
  10 | www.newsgator.com/Individuals/NetNewsWire/ | - | NetNewsWire/3.2.7 (Mac OS X; url; gzip-happy)
39 | heartrails
  21 | capture.heartrails.com/ | text/.. | Mozilla/5.0 (X11; Linux i686; en-US; rv:1.9.2.3) Gecko/20100403 HeartRails_Capture/1.0.2 (url) Namoroka/3.6.3
  18 | capture.heartrails.com/ | image/.. | Mozilla/5.0 (X11; Linux i686; en-US; rv:1.9.2.3) Gecko/20100403 HeartRails_Capture/1.0.2 (url) Namoroka/3.6.3
37 | spinn3r
  34 | spinn3r.com/robot | text/.. | Mozilla/5.0 (X11; Linux i686; en-US; rv:1.2.1; aggregator:Spinn3r (Spinn3r 3.1); url) Gecko/20021130
36 | oneriot
  28 | www.oneriot.com | text/.. | Mozilla/5.0 (Windows; Windows NT 5.1; en-US; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 OneRiot/1.0 (url)
  8 | www.oneriot.com | image/.. | Mozilla/5.0 (Windows; Windows NT 5.1; en-US; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 OneRiot/1.0 (url)
33 | FeedBurner
  32 | www.FeedBurner.com | text/.. | FeedBurner/1.0 (url)
32 | avantbrowser
  16 | www.avantbrowser.com | text/.. | Avant Browser (url)
  15 | www.avantbrowser.com | text/.. | Advanced Browser (url)
32 | hatena
  29 | a.hatena.ne.jp/help | text/.. | Hatena Antenna/0.5 (url)
  3 | mgw.hatena.ne.jp/help | text/.. | DoCoMo/2.0 D903i(c100;TB;W28H20) (compatible; Hatena-Mobile-Gateway/1.2; url)
31 | dotnetdotcom
  31 | www.dotnetdotcom.org/ | text/.. | Mozilla/5.0 (compatible; DotBot/1.1; url, mail address )
31 | jetbrains
  17 | www.jetbrains.com/omea_reader/ | text/.. | JetBrains Omea Reader 1.0.x (url)
  14 | www.jetbrains.com/omea_reader/ | text/.. | JetBrains Omea Reader 2.0 Release Candidate 1 (url)
30 | feedshow
  16 | www.feedshow.com | text/.. | FeedshowOnline (url)
  14 | www.feedshow.com | text/.. | Feedshow/x.0 (url; 1 subscriber)
30 | Anonymouse
  14 | Anonymouse.org/ | image/.. | url (Unix)
  13 | Anonymouse.org/ | text/.. | url (Unix)
  3 | Anonymouse.org/ | application/x-javascript | url (Unix)
27 | yandex
  26 | yandex.com/bots | text/.. | Mozilla/5.0 (compatible; YandexBot/3.0; url)
26 | setooz
  24 | www.setooz.com/bot.html | text/.. | Mozilla/5.0 ( compatible; SETOOZBOT/0.30 ; url ; mail address )
25 | meta
  25 | meta.ua/spider | text/.. | Mozilla/5.0 (compatible; METASpider; url)
24 | chainn
  21 | www.chainn.com/mxbot.html | text/.. | Mozilla/5.0 (compatible; mxbot/1.0; url)
  3 | www.chainn.com/mxbot.html | image/.. | Mozilla/5.0 (compatible; mxbot/1.0; url)
23 | yioop
  22 | www.yioop.com/bot.html | text/.. | Mozilla/5.0 (compatible; YioopBot url)
23 | puritysearch
  23 | www.puritysearch.net/ | text/.. | Mozilla/5.0 (compatible; Purebot/1.1; url)
23 | 80legs
  17 | www.80legs.com/spider.html | text/.. | Mozilla/5.0 (compatible; 008/0.83; url;) Gecko/2008032620
  6 | www.80legs.com/spider.html | image/.. | Mozilla/5.0 (compatible; 008/0.83; url;) Gecko/2008032620
22 | abonti
  22 | www.abonti.com | text/.. | Mozilla/5.0 (compatible; Abonti/0.9 - url)
20 | alexa
  20 | www.alexa.com/site/help/webmasters | text/.. | ia_archiver (url; mail address )
20 | discoveryengine
  19 | discoveryengine.com/discobot.html | text/.. | Mozilla/5.0 (compatible; discobot/1.1; url
20 | weblio
  18 | www.weblio.jp/ | text/.. | Mozilla/5.0 (compatible; WeblioBot; url)
18 | spenki
  18 | www.spenki.it | text/.. | SpenkiBot v1.0 (url)
18 | rcdtokyo
  13 | www.rcdtokyo.com/pc2m/ | text/.. | Mozilla/5.0 (compatible; PEAR HTTP_Request class; url)
  5 | www.rcdtokyo.com/pc2m/ | image/.. | Mozilla/5.0 (compatible; PEAR HTTP_Request class; url)
17 | winpodder
  17 | winpodder.com | text/.. | WinPodder (url)
17 | it-influentials
  17 | search.it-influentials.com/bot.htm | text/.. | Mozilla/5.0 (compatible;FindITAnswersbot/1.0;url)
17 | nemui
  17 | mozshot.nemui.org/ | text/.. | Mozilla/5.0 (Gecko/20070310 Mozshot/0.0.20070628; url)
16 | tinyurl
  16 | tinyurl.com/64t5n | text/.. | Rome Client (url) Ver: 0.9
16 | froute
  12 | labs.froute.jp/pc2m/help.html | text/.. | Froute Mobile Gateway/1.0 (url)
  4 | labs.froute.jp/pc2m/help.html | image/.. | Froute Mobile Gateway/1.0 (url)
16 | archive
  9 | crawler.archive.org | text/.. | Mozilla/5.0 (compatible; heritrix/3.1.1-SNAPSHOT-20100429.232622 url)
  5 | www.archive.org | text/.. | Mozilla/5.0 (compatible; archive.org_bot/heritrix-1.15.4 url)
16 | rssreader
  16 | www.rssreader.com | text/.. | RssReader/1.0.xx.x (url) Microsoft Windows NT 5.1.2600.0
16 | ranchero
  16 | ranchero.com/netnewswire/ | text/.. | NetNewsWire/2.x (Mac OS X; url)
16 | feeds4all
  16 | www.feeds4all.com/feedzcollector | text/.. | FeedZcollector v1.x (Platinum) url
16 | seebot
  16 | seebot.org | text/.. | Lynx/2.8 (;url)
15 | zipcommander
  15 | www.zipcommander.com/ | text/.. | 1st ZipCommander (Net) - url
15 | graemef
  15 | graemef.com | text/.. | NewsGator FetchLinks extension/0.2.0 (url)
15 | holmes
  14 | holmes.ge | text/.. | HolmesBot (url)
15 | snarfware
  15 | www.snarfware.com/ | text/.. | Snarfer/0.x.x (url)
15 | orcabrowser
  15 | www.orcabrowser.com | text/.. | Orca Browser (url)
15 | rssbandit
  15 | www.rssbandit.org | text/.. | RssBandit/1.5.0.10 (WinNT 5.1.2600.0; url) (WinNT 5.1.2600.0; )
15 | snap
  15 | www.snap.com | text/.. | Snapbot/1.0 (Snap Shots, url)
15 | kula
  15 | kula.jp/endo | text/.. | endo/1.0 (Mac OS X; ppc i386; url)
14 | blogbridge
  14 | www.blogbridge.com/ | text/.. | BlogBridge 2.13 (url)
14 | zootycoon
  14 | www.zootycoon.com | text/.. | Zoo Tycoon 2 Client -- url
14 | mixi
  7 | mixi.jp/ | text/.. | mixi-mobile-converter/1.0 (url)
  7 | mixi.jp/ | image/.. | mixi-mobile-converter/1.0 (url)
14 | ponderer
  14 | ponderer.org/download/annotate_google.user.js | text/.. | annotate_google; url
14 | timewe
  14 | timewe.net | text/.. | CDR/1.7.1 Simulator/0.7(url) Profile/MIDP-1.0 Configuration/CLDC-1.0
14 | bnf
  12 | www.bnf.fr/fr/outils/a.dl_web_capture_robot.html | image/.. | Mozilla/5.0 (compatible; bnf.fr_bot; url)
14 | plagger
  14 | plagger.org/ | text/.. | Plagger/0.x.xx (url)
13 | topsy
  13 | labs.topsy.com/butterfly.html | text/.. | Mozilla/5.0 (compatible; Butterfly/1.0; url) Gecko/2009032608 Firefox/3.0.8
12 | chug
  12 | crawler.chug.net | text/.. | Mozilla/5.0 (compatible; heritrix/3.0.0 url)
12 | gigablast
  12 | www.gigablast.com/spider.html | text/.. | Gigabot/3.0 (url)
11 | seoprofiler
  11 | www.seoprofiler.com/bot | text/.. | Mozilla/5.0 (compatible; spbot/2.0.4; url )
11 | wise-guys
  7 | www.wise-guys.nl/ | text/.. | Mozilla/4.0 (compatible; Vagabondo/4.0/CGM; url)
11 | chenli
  11 | chenli.com.cn | text/.. | Chen Li/Nutch-1.0 (Nutch spiderman; url; mail address )
10 | superfeedr
  9 | superfeedr.com | application/xml | Superfeedr: Superparser/1.0 url - Please read this http://blog.superfeedr.com/publishers.html or get in touch if we're polling too hard
10 | tineye
  8 | tineye.com/crawler.html | image/.. | TinEye/1.1 (url)
43,051 | total

Page requests for probable crawlers, recognized by keyword

Count x 1000 | Agent string
  Count x 1000 | Mime type (count ≥ 3)

2,468 | PythonWikipediaBot/1.0
  1,662 | application/json
  678 | application/xml
  128 | text/..
  1 | image/..
1,304 | GoogleBot-Image/1.0
  498 | text/..
  459 | -
  347 | image/..
540 | ClueBot/1.1
  439 | application/vnd.php.serialized
  101 | text/..
322 | Answersbot
  322 | text/..
308 | Mozilla/5.0 (Windows; Windows NT 5.1; fr; rv:1.8.1) VoilaBot BETA 1.2 ( mail address )
  308 | text/..
  1 | -
  1 | application/ogg
297 | LinkParser/2.0
  297 | text/..
264 | Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (Exabot-Thumbnails)
  120 | image/..
  102 | text/..
  42 | application/x-javascript
  1 | application/json
227 | php wikibot classes
  189 | application/vnd.php.serialized
  38 | text/..
147 | MLBot (www.metadatalabs.com/mlbot)
  147 | text/..
  1 | application/xml
  1 | image/..
146 | Onespot Crawler
  108 | application/json
  38 | text/..
146 | gsa-crawler (Enterprise; S5-MS8QQPJ5BGWAA; mail address )
  146 | text/..
145 | wikiwix-bot-3.0
  130 | text/..
  15 | image/..
  1 | -
100 | GoogleBot-Image/1.0
  98 | text/..
  2 | image/..
  1 | -
98 | GoogleBot-News
  97 | text/..
  1 | -
92 | SoxBot IRC Bot. PHP
  87 | application/vnd.php.serialized
  5 | text/..
85 | DotNetWikiBot/2.72 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
  82 | text/..
  3 | application/xml
66 | Casper Bot Search
  66 | text/..
  1 | -
59 | crawler mail address
  59 | text/..
  1 | image/..
55 | spider
  54 | text/..
  1 | application/json
  1 | application/xml
  1 | image/..
53 | Mozilla/5.0 (SnapPreviewBot) Gecko/20061206 Firefox/1.5.0.9
  33 | image/..
  20 | text/..
  1 | application/x-javascript
49 | Mozilla 5.0 (Apibot 0.20)
  49 | application/vnd.php.serialized
46 | DotNetWikiBot/2.81 (Microsoft Windows NT 6.1.7600.0; )
  37 | text/..
  6 | application/xml
  3 | image/..
40 | Mozilla/5.0 (compatible; sgbot v0.01a, mail address )
  40 | text/..
39 | msramlbot
  39 | text/..
32 | Test Webbot
  32 | text/..
30 | MyCuteBot / 0.1.
  30 | text/..
29 | SineBot/1.5.16(User:SineBot)
  28 | application/vnd.php.serialized
  1 | text/..
28 | zschobot/Nutch-0.9-semantic_patch (zschobot indexing; Zscho.de/de/bot.html)
  28 | text/..
28 | COMODOspider/Nutch-1.0
  28 | text/..
  1 | image/..
  1 | application/ogg
27 | Jbot
  27 | text/..
26 | CorenSearchBot/1.5 en libwww-perl/5.834
  26 | text/..
24 | Pywikipediabot/2.0
  24 | application/json
21 | TheKeens bot
  21 | text/..
21 | www.rootza.com crawler mail address
  21 | application/xml
  1 | text/..
21 | HTMLParser/2.0
  21 | text/..
20 | GoogleBot
  20 | text/..
  1 | image/..
  1 | application/opensearchdescription+xml
20 | plantspedia data crawler
  20 | text/..
18 | TrueKnowledgeBot bot mail address >
  9 | application/xml
  9 | application/vnd.php.serialized
18 | MSR-ISRCCrawler
  17 | text/..
  1 | image/..
17 | COIBot/1.00
  17 | text/..
16 | Mozilla/5.0 (X11; Linux i686; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7 SnapPreviewBot
  16 | text/..
15 | HTMLParser/1.6
  15 | text/..
  1 | application/json
15 | AnomieBOT 1.0 (OrphanReferenceFixer)
  15 | application/json
14 | Mozilla/5.0 (compatible; SnapPreviewBot; en-US; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9
  14 | text/..
14 | UCMore Crawler App
  14 | text/..
14 | ZanranCrawler/0.2 ( mail address )
  14 | text/..
13 | ('python-wikitools/1.2 (User:BernsteinBot)',)
  13 | application/json
13 | Twitterbot/0.1
  13 | text/..
  1 | application/pdf
  1 | image/..
12 | dictionary-bot
  8 | application/xml
  4 | text/..
12 | YaDirectBot/1.0
  12 | text/..
12 | gsa-crawler (Enterprise; S5-D9DMGG3QLGJJB; mail address )
  12 | text/..
11 | HRoestBot, de-wikipedia using pywikipedia framework
  10 | application/xml
  1 | application/json
  1 | text/..
11 | SurakWare MediaWiki Bot/1.0
  11 | text/..
11 | DotNetWikiBot/2.9 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
  9 | text/..
  2 | application/xml
11 | SoxBot PHP
  10 | application/vnd.php.serialized
  1 | text/..
9 | Bot/WP/EN/Daniel/MediationBot1/1.2
  9 | text/..
8 | ~Bot ([[:fr:w:User:TildeBot]] by [[:fr:w:User:Alphos]] mail address )
  8 | text/..
8 | MystBot/1.5 fr libwww-perl/5.836
  8 | text/..
8 | Tawbot (public svn release; plwiki)
  8 | text/..
8 | SiocWikiBot/1.0
  8 | application/vnd.php.serialized
  1 | text/..
7 | FAST Search Web Crawler 14.0.0291.0000
  6 | text/..
  1 | -
6 | bitlybot
  6 | text/..
  1 | image/..
6 | Mozilla/5.0 (compatible; Nigma.ru/3.0; mail address )
  6 | text/..
  1 | image/..
6 | XLinkBot/1.00
  6 | text/..
6 | betaBot
  6 | text/..
6 | QBikSpider/2.0
  6 | text/..
6 | Mozilla/5.0 (compatible; Windows NT 6.0) Gecko/20090624 Firefox/3.5 NjuiceBot
  6 | text/..
  1 | image/..
5 | GNAA-bot
  5 | text/..
5 | ess-crawler/Nutch-1.0 (web crawler)
  5 | text/..
5 | Jyxobot/1
  5 | text/..
4 | CorenSearchBot/1.5 en libwww-perl/5.808
  4 | text/..
4 | Moholibot
  4 | text/..
4 | Jabse.com Crawler v.2.0 www.jabse.com/crawler.php
  4 | text/..
4 | Bot/WP/EN/Alex_Bakharev/AlexNewArtBot
  4 | text/..
4 | Freebase Deathbot
  4 | text/..
4 | IScraperBot/0.1 Mozilla/5.0
  4 | application/xml
4 | Mozilla/5.0 (Bgbot 0.5)
  4 | text/..
4 | ('python-wikitools/1.2 (User:LaraBot)',)
  4 | application/json
3 | FAST Enterprise Crawler/6.7.8 ( mail address )
  3 | text/..
3 | FAST Enterprise Crawler 6 used by Wanadoo ( mail address )
  2 | application/x-javascript
  1 | text/..
3 | Peter Wang/Nutch-1.0 (Nutch spiderman; local host ; mail address )
  3 | text/..
3 | Citation_bot; mail address
  3 | text/..
3 | Geni ircpybot 1.0
  2 | text/..
  1 | application/json
3 | AnomieBOT 1.0 (AFDMergeFromCleaner)
  3 | application/json
3 | AnomieBOT 1.0 (SourceUploader)
  3 | application/json
3 | TVersity Media Robot
  3 | text/..
  1 | -
3 | IScraperBot/0.1 Mozilla/5.0
  3 | application/xml
7,830 | total

IP ranges: known IP ranges for Google are 64.233.[160.0-191.255], 66.249.[64.0-95.255], 66.102.[0.0-15.255], 72.14.[192.0-255.255],
74.125.[0.0-255.255], 209.85.[128.0-255.255], 216.239.[32.0-63.255] and a few minor other subranges.
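
A check against these ranges could look like the sketch below. It is an illustration under stated assumptions, not the code behind this report: the names are made up, the range written as 209.085 is read as 209.85, the "few minor other subranges" are not included, and the ranges date from 2010 and may no longer be accurate.

```python
from ipaddress import IPv4Address

# First and last address of each Google range listed in this report (2010).
GOOGLE_RANGES = [
    ("64.233.160.0", "64.233.191.255"),
    ("66.249.64.0", "66.249.95.255"),
    ("66.102.0.0", "66.102.15.255"),
    ("72.14.192.0", "72.14.255.255"),
    ("74.125.0.0", "74.125.255.255"),
    ("209.85.128.0", "209.85.255.255"),   # written as 209.085 in the report
    ("216.239.32.0", "216.239.63.255"),
]

# Parse the bounds once so that each lookup is a plain comparison.
_PARSED_RANGES = [(IPv4Address(first), IPv4Address(last))
                  for first, last in GOOGLE_RANGES]


def is_google_ip(ip: str) -> bool:
    """Return True if the IPv4 address falls inside one of the listed ranges."""
    address = IPv4Address(ip)
    return any(first <= address <= last for first, last in _PARSED_RANGES)
```

A request whose agent string claims to be GoogleBot but for which `is_google_ip` returns False would be a candidate for the 'google?' group.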

Generated on Sat, Jul 3, 2010 19:20
Author: Erik Zachte
Mail: ezachte@### (no spam: ### = wikimedia.org)
All data and images on this page are in the public domain.