Wikimedia Traffic Analysis Report - Crawler requests

Daily averages, based on sample period: 1 Jul 2011 - 31 Jul 2011

This analysis is based on a 1:1000 sampled server log (squids), so sampled counts are multiplied by 1000 to estimate actual traffic.
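As a rough illustration of this scaling, the sketch below converts a raw hit count from the sampled log into an estimated daily average. The function name and the example figure are hypothetical, and it assumes the sample period covers 31 full days.

```python
SAMPLING_FACTOR = 1000  # the log keeps 1 in every 1000 requests
DAYS_IN_PERIOD = 31     # 1 Jul 2011 - 31 Jul 2011


def estimated_daily_average(sampled_hits: int,
                            sampling_factor: int = SAMPLING_FACTOR,
                            days: int = DAYS_IN_PERIOD) -> float:
    """Scale a sampled hit count up to an estimated number of requests per day."""
    if days <= 0 or sampling_factor <= 0:
        raise ValueError("days and sampling_factor must be positive")
    return sampled_hits * sampling_factor / days


# Hypothetical example: 62 sampled hits over the whole month.
print(estimated_daily_average(62))  # 2000.0
```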

The following overview of crawler (aka bot) page requests is based on the user agent information that accompanies most server requests. Unfortunately, this user agent information follows only loosely defined guidelines.
Please also bear in mind that the most popular crawler names may be somewhat overrepresented. This is the result of so-called user agent spoofing (where a requester supplies false credentials, e.g. to bypass web server filters).
GoogleBot seems to be a favorite for spoofing. Therefore requests from an IP address registered to Google (see below) are color coded separately from other requests that claim to be GoogleBot.
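A minimal sketch of how such a check could look, assuming a list of networks registered to Google is available. The networks below are placeholder documentation ranges, not Google's real allocations, and the function name and return labels are hypothetical; the report does not say how its own check is implemented.

```python
import ipaddress

# Placeholder networks (RFC 5737 documentation ranges), NOT Google's real ranges.
ASSUMED_GOOGLE_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def classify_googlebot_claim(user_agent: str, client_ip: str) -> str:
    """Return 'registered', 'possibly spoofed' or 'other' for one request."""
    if "googlebot" not in user_agent.lower():
        return "other"  # the request does not claim to be GoogleBot at all
    address = ipaddress.ip_address(client_ip)
    if any(address in network for network in ASSUMED_GOOGLE_NETWORKS):
        return "registered"
    return "possibly spoofed"


agent = "Mozilla/5.0 (compatible; GoogleBot/2.1; url)"
print(classify_googlebot_claim(agent, "192.0.2.10"))   # registered
print(classify_googlebot_claim(agent, "203.0.113.5"))  # possibly spoofed
```

A request that fails the check is only a candidate for spoofing: an incomplete network list would produce the same result.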

For this report, page requests are considered to be issued by a crawler in two cases:
1 The user agent string contains a web address. Only crawlers should have that, but there are some false positives where a browser sends a user agent string with a web address (ill-behaved plug-ins; the main offenders have been eliminated).
2 The user agent string contains the term bot, spider or crawl[er].
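The two rules above can be sketched as follows. This is an illustration, not the report's actual code: the regular expressions are assumptions about what counts as a "web address" and a keyword match, the function name is hypothetical, and the example agent strings are only illustrative.

```python
import re
from typing import Optional

# Assumption: a web address looks like http(s)://... or www.host.tld
WEB_ADDRESS = re.compile(r"https?://|\bwww\.[a-z0-9-]+\.[a-z]{2,}", re.IGNORECASE)
# Rule 2: the terms bot, spider or crawl[er] anywhere in the string.
CRAWLER_KEYWORD = re.compile(r"bot|spider|crawl", re.IGNORECASE)


def crawler_rule(user_agent: str) -> Optional[str]:
    """Return which rule flags the agent: 'url', 'keyword', or None."""
    if WEB_ADDRESS.search(user_agent):
        return "url"      # rule 1: agent string contains a web address
    if CRAWLER_KEYWORD.search(user_agent):
        return "keyword"  # rule 2: agent string contains bot/spider/crawl
    return None           # treated as a regular (non-crawler) request


print(crawler_rule("Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"))  # url
print(crawler_rule("PythonWikipediaBot/1.0"))  # keyword
print(crawler_rule("Mozilla/5.0 (Windows NT 6.1; rv:5.0) Gecko/20100101 Firefox/5.0"))  # None
```

Checking the web address rule first mirrors the split into the two tables of this report; a plain substring match on "bot" will also flag unrelated words that happen to contain it, which is one source of false positives.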

In total, 62,520,000 page requests (mime type text/html only!) per day are considered crawler requests, out of 396,361,000 external requests, which is 15.8%.
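The percentage can be checked directly from the two daily figures quoted above; the variable names in this short sketch are illustrative.

```python
crawler_requests = 62_520_000    # per day, mime type text/html only
external_requests = 396_361_000  # per day

share = crawler_requests / external_requests * 100
print(f"{share:.1f}%")  # 15.8%
```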

Page requests for crawlers that specify a url in the agent string
Count (x 1000)  Secondary domain (~site) name
  Count (x 1000)  URL  Mime type  User agent
21,809google
18,438 www.google.com/bot.htmltext/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
984 www.google.com/bot.html-Mozilla/5.0 (compatible; GoogleBot/2.1; url)
482 desktop.google.com/image/..Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; url)
338 www.google.com/bot.htmlimage/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
184 code.google.com/appenginetext/..AppEngine-Google; (url; appid: ortografia4)
180 desktop.google.com/application/xmlMozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; url)
105 code.google.com/appengineapplication/jsonAppEngine-Google; (url; appid: s~redconceptual)
100 www.google.com/feedfetcher.htmlimage/..Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
99 www.google.com/bot.htmltext/..DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
98 www.google.com/feedfetcher.html-FeedFetcher-Google; (url)
85 code.google.com/appenginetext/..AppEngine-Google; (url; appid: rarplayer)
75 code.google.com/appenginetext/..AppEngine-Google; (url; appid: wikien4)
65 www.google.com/feedfetcher.htmlapplication/xmlFeedFetcher-Google; (url)
58 www.google.com/feedfetcher.htmltext/..Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
41 www.google.com/feedfetcher.htmlapplication/jsonMozilla/5.0 (compatible) FeedFetcher-Google; (url)
35 www.google.com/bot.htmltext/..SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
34 code.google.com/p/crawler4j/text/..crawler4j (url)
33 code.google.com/appenginetext/..AppEngine-Google; (url; appid: wikien3)
32 code.google.com/appenginetext/..AppEngine-Google; (url; appid: ortopedianew)
24 code.google.com/appengineapplication/xmlAppEngine-Google; (url; appid: wikipedia-raw)
23 code.google.com/appenginetext/..AppEngine-Google; (url; appid: proxworx)
22 code.google.com/appenginetext/..WikiBot/0.1 AppEngine-Google; (url; appid: newikipedia)
21 www.google.com/feedfetcher.htmltext/..FeedFetcher-Google; (url)
19 desktop.google.com/text/..Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; url)
15 www.google.com/feedfetcher.htmlapplication/xmlMozilla/5.0 (compatible) FeedFetcher-Google; (url)
14 www.google.com/coop/cse/creftext/..FeedFetcher-Google-CoOp; (url)
13 code.google.com/appenginetext/..AppEngine-Google; (url; appid: tortelliniman)
11 www.google.com/bot.htmltext/..GoogleBot/2.1 (url)
11 code.google.com/appenginetext/..www.productontology.org/1.0 (Contact: mail address ) AppEngine-Google; (url; appid: gr4bing)
10 code.google.com/appenginetext/..AppEngine-Google; (url; appid: mygpxy)
10 code.google.com/appenginetext/..AppEngine-Google; (url; appid: boxapp)
10 www.google.com/bot.htmlimage/..DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
8 code.google.com/appengineimage/..AppEngine-Google; (url; appid: d24-img)
7 code.google.com/appenginetext/..AppEngine-Google; (url; appid: gcdnmirror)
6 code.google.com/appenginetext/..AppEngine-Google; (url; appid: s~sony-hack)
6 code.google.com/appenginetext/..AppEngine-Google; (url; appid: finchproxy)
6 code.google.com/appengineapplication/jsonMWBOT GAE Edition AppEngine-Google; (url; appid: philip-bot)
6 code.google.com/appenginetext/..AppEngine-Google; (url; appid: d24-img)
5 code.google.com/appenginetext/..AppEngine-Google; (url; appid: retimeme)
5 code.google.com/appenginetext/..AppEngine-Google; (url; appid: lullar-data),gzip(gfe) (via translate.google.com)
4 code.google.com/appenginetext/..AppEngine-Google; (url; appid: wagagate)
4 code.google.com/appenginetext/..AppEngine-Google; (url; appid: kbworld24)
4 code.google.com/appenginetext/..AppEngine-Google; (url; appid: okmyfinder)
4 code.google.com/appenginetext/..AppEngine-Google; (url; appid: wikidashboard)
3 desktop.google.com/-Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; url)
3 code.google.com/appenginetext/..oohEmbed.com AppEngine-Google; (url; appid: s~ooohembed)
3 desktop.google.com/image/..Mozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
3 code.google.com/appenginetext/..AppEngine-Google; (url; appid: dustbunnytycoonmonitor)
15,501yahoo
9,625 help.yahoo.com/help/us/ysearch/slurptext/..Mozilla/5.0 (compatible; Yahoo! Slurp; url)
3,507 help.yahoo.com/help/us/ysearch/slurptext/..Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
2,000 help.yahoo.com/help/us/ysearch/slurpimage/..Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
112 misc.yahoo.com.cn/help.htmltext/..Mozilla/5.0 (compatible; Yahoo! Slurp China; url)
58 listing.yahoo.co.jp/support/faq/int/other/other_001.htmltext/..Y!J-BRJ/YATS crawler (url)
42 help.yahoo.com/help/us/ysearch/slurptext/..Mozilla/5.0 (compatible; Yahoo! DE Slurp; url)
28 help.yahoo.co.jp/help/jp/search/indexing/indexing-15.htmltext/..Y!J-BRI/0.0.1 crawler ( url )
21 help.yahoo.co.jp/help/jp/search/indexing/indexing-15.htmlimage/..'Mozilla/5.0 (compatible; Y!J SearchMonkey/1.0 (Y!J-AGENT; url))'
18 help.yahoo.com/help/us/ysearch/slurpapplication/oggMozilla/5.0 (compatible; Yahoo! Slurp; url)
18 help.yahoo.com/help/us/ysearch/slurpapplication/vnd.php.serializedMozilla/5.0 (compatible Yahoo! Slurp/3.0 url)
13 help.yahoo.com/help/us/ysearch/slurp-Mozilla/5.0 (compatible; Yahoo! Slurp; url)
12 help.yahoo.com/help/us/ysearch/slurpimage/..Mozilla/5.0 (compatible; Yahoo! Slurp; url)
9 help.yahoo.com/help/us/ysearch/slurp-Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
9 developer.yahoo.com/yql/providertext/..Mozilla/5.0 (compatible; Yahoo Pipes 2.0; url) Gecko/20090729 Firefox/3.5.2
7 misc.yahoo.com.cn/help.html-Mozilla/5.0 (compatible; Yahoo! Slurp China; url)
6 help.yahoo.co.jp/help/jp/search/indexing/indexing-15.htmltext/..Y!J-BRT/1.0 crawler (url)
4 help.yahoo.co.jp/help/jp/search/indexing/indexing-15.htmltext/..'Mozilla/5.0 (compatible; Y!J SearchMonkey/1.0 (Y!J-AGENT; url))'
4 help.yahoo.com/help/us/ysearch/slurpapplication/x-javascriptMozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
13,029facebook
8,293 www.facebook.com/externalhit_uatext.phpimage/..facebookexternalhit/1.0 (url)
4,553 www.facebook.com/externalhit_uatext.phptext/..facebookexternalhit/1.0 (url)
110 www.facebook.com/externalhit_uatext.phptext/..facebookexternalhit/1.1 (url)
55 developers.facebook.comimage/..facebookplatform/1.0 (url)
13 www.facebook.com/externalhit_uatext.php-facebookexternalhit/1.0 (url)
4 www.facebook.com/externalhit_uatext.phpimage/..facebookexternalhit/1.1 (url)
6,617bing
4,784 www.bing.com/bingbot.htmtext/..Mozilla/5.0 (compatible; bingbot/2.0; url)
1,817 www.bing.com/bingbot.htm-Mozilla/5.0 (compatible; bingbot/2.0; url)
6 www.bing.com/bingbot.htmimage/..Mozilla/5.0 (compatible; bingbot/2.0; url)
3 www.bing.com/bingbot.htmtext/..Mozilla/5.0 (compatible; bingbot/2.0; url) ASProxy/5.5b3
3 www.bing.com/bingbot.htmapplication/vnd.php.serializedMozilla/5.0 (compatible; bingbot/2.0; url)
6,440google?
5,788 www.google.com/bot.htmltext/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
218 www.google.com/bot.htmlapplication/vnd.php.serializedMozilla/5.0 (compatible; GoogleBot/2.1; url)
180 www.google.com/bot.htmltext/..GoogleBot/2.1 (url)
132 www.google.com/bot.html-Mozilla/5.0 (compatible; GoogleBot/2.1; url)
47 www.google.com/bot.htmltext/..DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
32 www.google.com/bot.htmltext/..SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
12 www.google.com/bot.htmlimage/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
10 www.google.com/bot.htmltext/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
6 www.google.com/bot.htmltext/..Mozilla/5.0(compatible;GoogleBot/2.1;url)
5 www.google.com/bot.htmltext/..Mozilla/5.0 (compatible; GoogleBot/2.1; url) ASProxy/5.5b3
4 www.google.com/bot.htmlapplication/xmlMozilla/5.0 (compatible; GoogleBot/2.1; url)
2,139yandex
1,588 yandex.com/botstext/..Mozilla/5.0 (compatible; YandexBot/3.0; url)
386 yandex.com/botstext/..Mozilla/5.0 (compatible; YandexImages/3.0; url)
67 yandex.com/botsimage/..Mozilla/5.0 (compatible; YandexImages/3.0; url)
57 yandex.com/bots-Mozilla/5.0 (compatible; YandexBot/3.0; url)
14 yandex.com/botsimage/..Mozilla/5.0 (compatible; YandexImageResizer/2.0; url)
8 yandex.com/botstext/..Mozilla/5.0 (compatible; YandexDirect/3.0; url)
4 yandex.com/botsapplication/vnd.php.serializedMozilla/5.0 (compatible; YandexBot/3.0; url)
3 yandex.com/botsimage/..Mozilla/5.0 (compatible; YandexBot/3.0; url)
3 yandex.com/botsapplication/vnd.php.serializedMozilla/5.0 (compatible; YandexImages/3.0; url)
2,127naver
1,904 help.naver.com/robots/text/..Yeti/1.0 (NHN Corp.; url)
202 help.naver.com/robots/image/..Yeti/1.0 (NHN Corp.; url)
11 help.naver.com/customer_webtxt_02.jsptext/..Mozilla/4.0 (compatible; NaverBot/1.0; url)
7 help.naver.com/robots/text/..Yeti/1.0 (NHN Corp.; url) ASProxy/5.5b5
1,029baidu
936 www.baidu.com/search/spider.htmltext/..Mozilla/5.0 (compatible; Baiduspider/2.0; url)
41 www.baidu.com/search/spider.html-Mozilla/5.0 (compatible; Baiduspider/2.0; url)
37 www.baidu.com/search/spider.htmtext/..Baiduspider-image(url)
4 www.baidu.com/search/spider.htmtext/..Baiduspider(url)
3 www.baidu.com/search/spider.htmlapplication/vnd.php.serializedMozilla/5.0 (compatible; Baiduspider/2.0; url)
989msn
469 search.msn.com/msnbot.htmtext/..msnbot/2.0b (url)._
236 search.msn.com/msnbot.htmtext/..msnbot/2.0b (url)
106 search.msn.com/msnbot.htmtext/..msnbot-NewsBlogs/2.0b (url)
66 search.msn.com/msnbot.htmtext/..msnbot-media/1.1 (url)
65 search.msn.com/msnbot.htmimage/..msnbot-media/1.1 (url)
34 search.msn.com/msnbot.htmtext/..msnbot-Products/1.0 (url)
6 search.msn.com/msnbot.htmtext/..msnbot-UDiscovery/2.0b (url)
447youdao
420 www.youdao.com/help/webmaster/spider/text/..Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
17 www.youdao.com/help/webmaster/spider/-Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
4 www.youdao.com/help/webmaster/spider/application/vnd.php.serializedMozilla/5.0 (compatible; YoudaoBot/1.0; url; )
4 toolbar.youdao.com/image/..Youdao Toolbar (url)
425traslated
425 mymemory.traslated.net/doc/text/..Mozilla/5.0 (MyMemory Bot url)
395exabot
325 www.exabot.com/go/robottext/..Mozilla/5.0 (compatible; Exabot/3.0; url)
63 www.exabot.com/go/robottext/..Mozilla/5.0 (compatible; Exabot/3.0 (BiggerBetter); url)
7 www.exabot.com/go/robot-Mozilla/5.0 (compatible; Exabot/3.0; url)
363sblog
218 fulltext.sblog.cz/screenshot/image/..Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
66 fulltext.sblog.cz/screenshot/text/..Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
43 fulltext.sblog.cz/text/..SeznamBot/3.0 (url)
31 fulltext.sblog.cz/text/..SeznamBot/3.0-test (url)
3 fulltext.sblog.cz/-SeznamBot/3.0 (url)
324entireweb
316 www.entireweb.com/about/search_tech/speedy_spider/text/..Mozilla/5.0 (Windows; Windows NT 5.1; en-US) Speedy Spider (url)
4 www.entireweb.com/about/search_tech/speedy_spider/-Mozilla/5.0 (Windows; Windows NT 5.1; en-US) Speedy Spider (url)
295scoutjet
295 www.scoutjet.com/text/..Mozilla/5.0 (compatible; ScoutJet; url)
277mediawiki
277 www.mediawiki.org/text/..MediaWiki OAI Harvester 0.2 (url)
265php
139 pear.php.net/application/vnd.php.serializedPEAR HTTP_Request class ( url )
42 pear.php.net/package/http_request2text/..HTTP_Request2/0.5.2 (url) PHP/5.2.17
40 pear.php.net/application/xmlPEAR HTTP_Request class ( url )
39 pear.php.net/text/..PEAR HTTP_Request class ( url )
3 pear.php.net/package/http_request2text/..HTTP_Request2/2.0.0RC1 (url) PHP/5.3.2-1ubuntu4.9
261 80legs
220 www.80legs.com/webcrawler.htmltext/..Mozilla/5.0 (compatible; 008/0.83; url) Gecko/2008032620
34 www.80legs.com/webcrawler.htmlimage/..Mozilla/5.0 (compatible; 008/0.83; url) Gecko/2008032620
4 www.80legs.com/webcrawler.htmltext/..Mozilla/5.0 (compatible; 008/0.83; url;) Gecko/2008032620
204bsurprised
187 bsurprised.com/text/..BSurprised WikiBox 0.1.3 (url)
17 bsurprised.com/text/..BSurprised WikiBox 0.1 (url)
199majestic12
172 www.majestic12.co.uk/bot.php?text/..Mozilla/5.0 (compatible; MJ12bot/v1.4.0; url)
26 www.majestic12.co.uk/bot.php?text/..Mozilla/5.0 (compatible; MJ12bot/v1.3.3; url)
199www.
135 www.text/..GoogleBot/2.1 (urlGoogleBot.com/bot.html)
32 www.text/..GoogleBot/2.1 ( urlGoogleBot.com/bot.html)
28 www.text/..GoogleBot-Image/1.0 ( urlGoogleBot.com/bot.html)
4 www.image/..GoogleBot/2.1 (urlGoogleBot.com/bot.html)
176sogou
161 www.sogou.com/docs/help/webmasters.htm#07text/..Sogou web spider/4.0(url)
6 www.sogou.com/docs/help/webmasters.htm#07application/vnd.php.serializedSogou web spider/4.0(url)
4 www.sogou.com/docs/help/webmasters.htm#07text/..Sogou Pic Spider/3.0(url)
3 www.sogou.com/docs/help/webmasters.htm#07-Sogou web spider/4.0(url)
168wikipedia
59 en.wikipedia.org/wiki/Wikipedia:Huggletext/..Huggle/2.1.13.0 url
54 en.wikipedia.org/wiki/User:NicoV/Wikipedia_Cleaner/Documentationtext/..WikiCleaner (url)
36 en.wikipedia.org/wiki/Wikipedia:Huggletext/..Huggle/2.1.14.0 url
6 en.wikipedia.org/wiki/Wikipedia:Huggletext/..Huggle/2.1.1.0 url
3 en.wikipedia.orgtext/..url
166enwp
146 enwp.org/User:SDPatrolBottext/..SDPatrolBot (url)
16 enwp.org/User:KingpinBottext/..KingpinBot (url)
3 enwp.org/User:H3llkn0wz/WikiSharpAPItext/..WikiSharpAPI/0.3 url (C# .NET)
163toolserver
108 wiki.toolserver.org/view/GeoHacktext/..Geohack (url)
43 toolserver.org/~bayo/text/..LudoThecaire/1.0 (url)
4 toolserver.org/~dispenser/text/..WebWikipedia Python (url)
3 toolserver.org/~para/cgi-bin/kmlexporttext/..url libwww-perl/6.02
3 toolserver.org/~guandalug/application/vnd.php.serializedGuandalugs PHPWikiBot/1.1 (url;de:User:Guandalug)
161soso
151 help.soso.com/webspider.htmtext/..Sosospider(url)
4 help.soso.com/webspider.htm-Sosospider(url)
3 help.soso.com/soso-image-spider.htmtext/..Sosoimagespider(url)
132tumblr
129 benderthewebrobot.tumblr.comtext/..Mozilla/5.0 (compatible; Bender; url)
128wordpress
11 arthur2rcasc.wordpress.comtext/..WordPress/MU; url
7 kterrl.wordpress.comtext/..WordPress/MU; url
4 driwancybermuseum.wordpress.comtext/..WordPress/MU; url
4 curtisnarimatsu.wordpress.comtext/..WordPress/MU; url
4 iwansuwandy.wordpress.comtext/..WordPress/MU; url
3 theancientweb.wordpress.comtext/..WordPress/MU; url
3 vindicatemj.wordpress.comtext/..WordPress/MU; url
3 eof737.wordpress.comtext/..WordPress/MU; url
3 cheltjules.wordpress.comtext/..WordPress/MU; url
3 vinoconvistablog.wordpress.comtext/..WordPress/MU; url
3 pnx2011.wordpress.comtext/..WordPress/MU; url
123goo
74 help.goo.ne.jp/contact/text/..goo wikipedia (url)
46 help.goo.ne.jp/help/article/1142/text/..DoCoMo/2.0 P900i(c100;TB;W24H11) (compatible; ichiro/mobile goo; url)
123semager
114 www.semager.de/blog/semager-bots/text/..Mozilla/5.0 (compatible; Semager/1.4; url)
6 www.semager.de/blog/semager-bots/application/jsonMozilla/5.0 (compatible; Semager/1.4; url)
3 www.semager.de/blog/semager-bots/-Mozilla/5.0 (compatible; Semager/1.4; url)
121yacy
23 yacy.net/bot.htmltext/..yacybot (sciencenet/any; amd64 Linux 2.6.35-30-generic; java 1.6.0_20; Europe/en) url
18 yacy.net/bot.htmltext/..yacybot (sciencenet/any; amd64 Linux 2.6.32-24-generic; java 1.6.0_18; Europe/en) url
15 yacy.net/bot.htmltext/..yacybot (freeworld/global; amd64 Linux 2.6.31-gentoo-r6; java 1.6.0_17; Etc/en) url
11 yacy.net/bot.htmltext/..yacybot (freeworld/global; i386 Linux 2.6.35-gentoo-r4; java 1.6.0_20; Europe/el) url
10 yacy.net/bot.htmltext/..yacybot (freeworld/global; i386 Linux 2.6.33.7-server-2mnb; java 1.6.0_18; Europe/fr) url
6 yacy.net/bot.htmltext/..yacybot (webportal-global; amd64 Linux 2.6.35-30-generic; java 1.6.0_20; Europe/de) url
5 yacy.net/bot.htmltext/..yacybot (freeworld/global; i386 Linux 2.6.38-8-generic; java 1.6.0_22; Europe/de) url
3 yacy.net/bot.htmltext/..yacybot (freeworld/global; amd64 Linux 2.6.31-23-server; java 1.6.0_24; Europe/en) url
97echonest
81 the.echonest.com/reader/application/xmlnestReader/0.3 (discovery; url; reader at echonest.com)
16 the.echonest.com/reader/text/..nestReader/0.3 (discovery; url; reader at echonest.com)
96sf
32 liferea.sf.net/text/..Liferea/0.x.x (Linux; en_US.UTF-8; url)
32 magpierss.sf.nettext/..MagpieRSS/0.7x (url)
31 liferea.sf.net/text/..Liferea/1.x.x (Linux; es_ES.UTF-8; url)
93sitebot
93 www.sitebot.org/robot/text/..Mozilla/5.0 (compatible; SiteBot/0.1; url)
91wikimedia
87 tools.wikimedia.de/~daniel/text/..WikiSense (url)
80daum
71 ws.daum.net/aboutWebSearch.htmltext/..Mozilla/5.0 (compatible; MSIE or Firefox mutant; not on Windows server; url) Daumoa/2.0
8 ws.daum.net/aboutWebSearch.htmltext/..Mozilla/5.0 (compatible; MSIE or Firefox mutant; not on Windows server; url) Daumoa/3.0
74FeedBurner
74 www.FeedBurner.comtext/..FeedBurner/1.0 (url)
72kosmix
69 www.kosmix.com/html/kosmos.htmlapplication/xmlMozilla/5.0(compatible;Kosmos/1.0;url)
3 www.kosmix.com/html/kosmos.htmltext/..Mozilla/5.0(compatible;Kosmos/1.0;url)
71jetbrains
36 www.jetbrains.com/omea_reader/text/..JetBrains Omea Reader 2.0 Release Candidate 1 (url)
35 www.jetbrains.com/omea_reader/text/..JetBrains Omea Reader 1.0.x (url)
68avantbrowser
35 www.avantbrowser.comtext/..Avant Browser (url)
33 www.avantbrowser.comtext/..Advanced Browser (url)
68emining
66 emining.jp/text/..emBot-GalaBuzz/Nutch-1.0 (url; mail address )
64newsgator
32 www.newsgator.com/text/..FeedDemon/2.7 (url; Microsoft Windows XP)
32 www.newsgator.comtext/..NewsGatorOnline/2.0 (url; 1 subscribers)
63feedshow
32 www.feedshow.comtext/..FeedshowOnline (url)
31 www.feedshow.comtext/..Feedshow/x.0 (url; 1 subscriber)
63archive-it
40 archive-it.org/files/site-owners.htmlimage/..Mozilla/5.0 (compatible;archive.org_bot; Archive-It; url) Firefox/0.0
22 archive-it.org/files/site-owners.htmltext/..Mozilla/5.0 (compatible;archive.org_bot; Archive-It; url) Firefox/0.0
62sentymetr
33 sentymetr.pl/bot.htmlapplication/jsonMozilla/5.0 (compatible; SentymetrBot 1.0; url)
29 sentymetr.pl/bot.htmltext/..Mozilla/5.0 (compatible; SentymetrBot 1.0; url)
59frontpagesearch
37 frontpagesearch.nettext/..WordPress/3.1.3; url
11 frontpagesearch.netimage/..WordPress/3.1.3; url
9 frontpagesearch.nettext/..WordPress/3.1.4; url
55flipboard
20 flipboard.com/browserproxyimage/..Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6 (FlipboardProxy/0.0.5; url)
12 flipboard.com/browserproxyapplication/jsonMozilla/5.0 (Macintosh; Intel Mac OS X 10.6; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6 (FlipboardProxy/0.0.1; url)
11 flipboard.com/browserproxytext/..Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6 (FlipboardProxy/0.0.5; url)
11 flipboard.com/browserproxytext/..Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6 (FlipboardProxy/1.1; url)
50z-add
46 w3.z-add.co.uk/linkcheck/text/..Z-Add Link Checker (url)
4 w3.z-add.co.uk/linkcheck/image/..Z-Add Link Checker (url)
48ayna
48 www.ayna.comtext/..Mozilla/5.0 (compatible; Ayna url)
45freebase
44 www.freebase.comtext/..metaweb/Nutch-1.0-dev (url; help_at_metaweb.com)
42textdigger
41 textdigger.comtext/..Mozilla/5.0 (url) Gecko/20061208 Firefox/2.0.0.1
39dataparksearch
38 dataparksearch.org/bottext/..DataparkSearch/4.54-26052011 (url)
38apache
38 lucene.apache.org/nutch/bot.htmltext/..NutchCVS/0.7.2 (Nutch; url; mail address )
37archive
36 www.archive.org/details/archive.org_bottext/..Mozilla/5.0 (compatible; archive.org_bot url)
37weblio
36 www.weblio.jp/text/..Mozilla/5.0 (compatible; WeblioBot; url)
36tinyurl
35 tinyurl.com/64t5ntext/..Rome Client (url) Ver: 0.9
36fairshare
31 fairshare.cctext/..Mozilla/5.0 url (X11; FreeBSD i386; en-US; rv:1.2a) Gecko/20021021
4 fairshare.cctext/..Mozilla crawl/5.0 (compatible; fairshare.cc url)
36hatena
33 a.hatena.ne.jp/helptext/..Hatena Antenna/0.5 (url)
3 mgw.hatena.ne.jp/helptext/..DoCoMo/2.0 D903i(c100;TB;W28H20) (compatible; Hatena-Mobile-Gateway/1.2; url)
36seebot
36 seebot.orgtext/..Lynx/2.8 (;url)
35discoveryengine
29 discoveryengine.com/discobot.htmltext/..Mozilla/5.0 (compatible; discobot/1.1; url)
6 discoveryengine.com/discobot.htmlimage/..Mozilla/5.0 (compatible; discobot/1.1; url)
34ponderer
34 ponderer.org/download/annotate_google.user.jstext/..annotate_google; url
34graemef
34 graemef.comtext/..NewsGator FetchLinks extension/0.2.0 (url)
34it-influentials
34 search.it-influentials.com/bot.htmtext/..Mozilla/5.0 (compatible;FindITAnswersbot/1.0;url)
34nemui
34 mozshot.nemui.org/text/..Mozilla/5.0 (Gecko/20070310 Mozshot/0.0.20070628; url)
33zipcommander
33 www.zipcommander.com/text/..1st ZipCommander (Net) - url
33rssbandit
33 www.rssbandit.orgtext/..RssBandit/1.5.0.10 (WinNT 5.1.2600.0; url) (WinNT 5.1.2600.0; )
33kula
33 kula.jp/endotext/..endo/1.0 (Mac OS X; ppc i386; url)
33zootycoon
33 www.zootycoon.comtext/..Zoo Tycoon 2 Client -- url
32timewe
32 timewe.nettext/..CDR/1.7.1 Simulator/0.7(url) Profile/MIDP-1.0 Configuration/CLDC-1.0
32ranchero
32 ranchero.com/netnewswire/text/..NetNewsWire/2.x (Mac OS X; url)
32rssreader
32 www.rssreader.comtext/..RssReader/1.0.xx.x (url) Microsoft Windows NT 5.1.2600.0
32orcabrowser
32 www.orcabrowser.comtext/..Orca Browser (url)
31plagger
31 plagger.org/text/..Plagger/0.x.xx (url)
31Anonymouse
21 Anonymouse.org/text/..url (Unix)
10 Anonymouse.org/image/..url (Unix)
31blogbridge
31 www.blogbridge.com/text/..BlogBridge 2.13 (url)
31winpodder
31 winpodder.comtext/..WinPodder (url)
30whatrhymeswith
30 www.whatrhymeswith.com/site/rhyme-bottext/..RhymeBot/0.1 (url)
30snarfware
30 www.snarfware.com/text/..Snarfer/0.x.x (url)
29feeds4all
29 www.feeds4all.com/feedzcollectortext/..FeedZcollector v1.x (Platinum) url
29federatedmedia
27 federatedmedia.nettext/..Mozilla/5.0 (url) Gecko/20061208 Firefox/2.0.0.1
26topsy
26 labs.topsy.com/butterfly/text/..Mozilla/5.0 (compatible; Butterfly/1.0; url) Gecko/2009032608 Firefox/3.0.8
25garlik
25 garlik.com/text/..GarlikCrawler/1.1 (url, mail address )
25spinn3r
22 spinn3r.com/robottext/..Mozilla/5.0 (X11; Linux x86_64; en-US; rv:1.9.0.19; aggregator:Spinn3r (Spinn3r 3.1); url) Gecko/2010040121 Firefox/3.0.19
24turnitin
24 www.turnitin.com/robot/crawlerinfo.htmltext/..TurnitinBot/2.1 (url)
24github
10 github.com/pauldix/typhoeus/tree/mastertext/..Typhoeus - url
9 github.com/NeilCrosby/wikislurpapplication/vnd.php.serializedWikiSlurp (url)
3 github.com/dbalatero/typhoeus/tree/mastertext/..Typhoeus - url
24bibalex
15 archive.bibalex.org/bot/image/..Mozilla/5.0 (compatible; archive.bibalex.org_bot; url)
9 archive.bibalex.org/bot/text/..Mozilla/5.0 (compatible; archive.bibalex.org_bot; url)
23puritysearch
23 www.puritysearch.net/text/..Mozilla/5.0 (compatible; Purebot/1.1; url)
23picsearch
21 www.picsearch.com/bot.htmltext/..psbot/0.1 (url)
21alexa
21 www.alexa.com/site/help/webmasterstext/..ia_archiver (url; mail address )
21 4chat
21 www.4chat.tvtext/..url
21netnewswireapp
21 netnewswireapp.com/mac/-NetNewsWire/3.2.15 (Mac OS X; url; gzip-happy)
20backlinktest
20 www.backlinktest.com/crawler.htmltext/..BacklinkCrawler (url)
19froute
15 labs.froute.jp/pc2m/help.htmltext/..Froute Mobile Gateway/1.0 (url)
4 labs.froute.jp/pc2m/help.htmlimage/..Froute Mobile Gateway/1.0 (url)
19dotnetdotcom
19 www.dotnetdotcom.org/text/..Mozilla/5.0 (compatible; DotBot/1.1; url, mail address )
19phonifier
19 www.phonifier.comtext/..Mozilla/5.0 (compatible; Phonifier; url)
18rcdtokyo
16 www.rcdtokyo.com/pc2m/text/..Mozilla/5.0 (compatible; PEAR HTTP_Request class; url)
16drupal
9 drupal.org/text/..User-Agent: Drupal (url)
4 drupal.org/text/..Drupal (url)
16ibis
10 ibis.ne.jp/browser/about.htmlimage/..Mozilla/4.0 (compatible; ibisBrowser; url)
4 ibis.ne.jp/browser/about.htmltext/..Mozilla/4.0 (compatible; ibisBrowser; url)
16rockpeaks
16 www.rockpeaks.com/contacttext/..RockPeaks/0.1 (url)
15idrc
12 web.idrc.ca/challenge/ev-136691-201-1-DO_TOPIC.htmltext/..Mozilla/5.0 (compatible; http; url; mail address )
14findthatfile
13 www.findthatfile.comtext/..Mozilla/5.0 (compatible; heritrix/1.14.4 url)
14searchtechnologies
14 www.searchtechnologies.comtext/..Mozilla/5.0 (compatible; heritrix/1.14.3 url)
14search
14 www.search.ch/rim.htmltext/..UltraSpider3000/1.0 (url)
14moviecus
13 www.moviecus.com/botcontactinfo.phpapplication/yamlmoviecus bot (url)
13rankur
13 rankur.comtext/..RankurBot/Rankur2.1 (url; mail address )
13wattsupwiththat
13 wattsupwiththat.comtext/..WordPress/MU; url
12yoursite
11 yoursite.com/botinfoapplication/vnd.php.serializedMozilla/5.0 (compatible; YourCoolBot/1.0; url)
12kalooga
7 kalooga.com/crawlerimage/..Mozilla/5.0 (compatible; KaloogaBot; url)
5 kalooga.com/crawlertext/..Mozilla/5.0 (compatible; KaloogaBot; url)
12netarkivet
10 netarkivet.dk/website/info.htmltext/..Mozilla/5.0 (compatible; heritrix/1.12.1b url)
12ac
7 ce.yazduni.ac.irtext/..Mozilla/5.0 (compatible; heritrix/1.14.4 url)
4 www.tkl.iis.u-tokyo.ac.jp/~crawler/text/..Mozilla/5.0 (compatible; Steeler/3.5; url)
11js-kit
11 js-kit.com/text/..JS-Kit URL Resolver, url
11carclassed
11 carclassed.com/text/..WikiBox 0.1 (url)
11goso
7 www.goso.cn/spider.htmltext/..gosospider Mozilla/5.0 (compatible; GOSOSpider; url)
11creativecommons
11 wiki.creativecommons.org/Metadata_Scrapertext/..CC Metadata Scaper url
10bin-co
5 www.bin-co.com/php/scripts/load/application/vnd.php.serializedBinGet/1.00.A (url)
5 www.bin-co.com/php/scripts/load/text/..BinGet/1.00.A (url)
10potaru
10 potaru.com/HatoPoppo.htmltext/..Mozilla/5.0 (compatible; HatoPoppo/1.0b; url)/Nutch-1.2
10suggy
10 blog.suggy.com/was-ist-suggy/suggy-webcrawler/text/..Mozilla/5.0 (compatible; suggybot v0.01a, url)
77,752 total

Page requests for probable crawlers, recognized by keyword
Count (x 1000)  Agent string
  Count (x 1000)  Mime type (count ≥ 3)
4,729PythonWikipediaBot/1.0
3,492 application/json
1,191 application/xml
46 text/..
1 -
1 image/..
1,247GoogleBot-Image/1.0
645 text/..
496 image/..
106 -
955MediaWikiCrawler-Google/2.0 ( mail address )
953 text/..
2 -
813spider
813 text/..
1 -
1 application/json
1 image/..
1 application/ogg
545php wikibot classes
527 application/vnd.php.serialized
18 text/..
1 -
497Mozilla/5.0 (Windows; Windows NT 5.1; fr; rv:1.8.1) VoilaBot BETA 1.2 ( mail address )
497 text/..
1 -
1 application/pdf
1 application/vnd.php.serialized
454GoogleBot-Image/1.0
420 text/..
21 image/..
13 application/vnd.php.serialized
1 -
412LinkParser/2.0
412 text/..
343Peachy MediaWiki Bot API Version 1.0
343 application/vnd.php.serialized
321wikiwix-bot-3.0
317 text/..
4 image/..
1 -
247Answersbot
247 text/..
193Onespot Crawler
143 application/json
47 text/..
3 -
182ClueBot/2.0
182 application/vnd.php.serialized
1 -
1 text/..
160GoogleBot-News
160 text/..
1 -
1 application/xml
125ClueBot/1.1
125 application/vnd.php.serialized
1 text/..
125MoovidaBot/0.1
125 text/..
118 mail address
117 application/vnd.php.serialized
1 text/..
1 -
118SiocWikiBot/1.0
108 application/vnd.php.serialized
10 text/..
103Test Webbot
103 text/..
102DotNetWikiBot/2.81 (Microsoft Windows NT 6.1.7601 Service Pack 1; )
83 text/..
15 application/xml
4 image/..
1 application/ogg
95gosospider "Mozilla/5.0
94 text/..
1 -
1 application/xml
85Opera/8.01 (J2ME/MIDP; MXit WebBot/1.3.1.0) Opera Mini/3.1
73 application/vnd.wap.xhtml+xml
6 image/..
6 text/..
1 -
75Mozilla/5.0 (compatible; Ezooms/1.0; mail address )
74 text/..
1 application/ogg
1 image/..
72TVersity Media Robot
72 text/..
70Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (Exabot-Thumbnails)
46 image/..
24 text/..
1 application/json
1 application/x-javascript
46ROCKMELT-BOT
46 application/xml
1 text/..
40GoogleBot/2.1
40 text/..
1 image/..
40Pywikipediabot/2.0
40 application/json
1 text/..
39CorenSearchBot/1.5 en libwww-perl/5.834
39 text/..
36ReadonlyBot
36 text/..
33Mozilla/5.0 (compatible; SnapPreviewBot; en-US; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9
33 text/..
32Mozilla/5.0 (X11; Linux i686; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7 SnapPreviewBot
32 text/..
32UCMore Crawler App
32 text/..
1 -
32SineBot/1.5.17(User:SineBot)
31 application/vnd.php.serialized
1 text/..
31MediaWiki::Bot/3.2.6
31 application/json
30wikbot/1.1 CFNetwork/485.13.9 Darwin/11.0.0
18 image/..
12 application/json
1 -
1 text/..
29GoogleBot
28 text/..
1 -
1 image/..
29Mozilla/5.0 (SnapPreviewBot) Gecko/20061206 Firefox/1.5.0.9
23 image/..
6 text/..
1 application/x-javascript
28DotNetWikiBot/2.97 (Unix 5.10.0.0; )
28 application/xml
1 text/..
27MLBot (www.metadatalabs.com/mlbot)
17 text/..
10 application/vnd.php.serialized
25DNSTallyKwBot/0.2
25 text/..
25AnomieBOT 1.0 (TagDater)
25 application/json
1 text/..
25Opera/8.01 (J2ME/MIDP; MXit WebBot/1.4.0.0) Opera Mini/3.1
21 application/vnd.wap.xhtml+xml
2 image/..
2 text/..
1 -
24Mozilla/5.0 (compatible; Web CEO Online robot)
24 text/..
24Tawbot (public svn release; plwiki)
24 text/..
24COIBot/2.0
24 text/..
23HRoestBot, de-wikipedia using pywikipedia framework
11 application/json
7 application/xml
5 text/..
23DotNetWikiBot/2.97 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
23 text/..
1 -
1 application/xml
22Mozilla/5.0 MaboMwFramework/1.1 (w:de:MerlIwBot)
22 text/..
21DotNetWikiBot/2.97 (Microsoft Windows NT 6.1.7601 Service Pack 1; )
14 text/..
7 application/xml
21.NET Client Parser
21 application/xml
1 text/..
20AnomieBOT 1.0 (ReplaceExternalLinks2)
20 application/json
1 text/..
19HTMLParser/2.0
19 text/..
1 -
1 image/..
19Mozilla/5.0 (compatible; en) Crawler from G51.
19 text/..
19SuperBot/4.7.0.74 (Windows XP)
19 text/..
1 image/..
18Obliviousness Crawler
12 application/vnd.wap.xhtml+xml
6 text/..
1 application/opensearchdescription+xml
18COIBot/1.00
18 text/..
18Twitterbot/0.1
17 text/..
1 image/..
1 -
17FAST Enterprise Crawler 6 used by ESP ( mail address )
17 text/..
16Peachy MediaWiki Bot API Version 0.1beta
16 application/vnd.php.serialized
15SiteSeekerCrawler/1.0
13 text/..
2 -
14Mozilla/5.0 (compatible; PaperLiBot/2.1)
14 text/..
1 image/..
1 application/vnd.php.serialized
13OhmsLawBot
13 text/..
13QuickFinder Crawler
13 text/..
1 -
12TrueKnowledgeBot bot mail address >
8 application/vnd.php.serialized
4 application/xml
1 text/..
12~Bot ([[:fr:w:User:TildeBot]] by [[:fr:w:User:Alphos]] mail address )
12 text/..
12FAST Enterprise Crawler 6 used by viaapia (viaapia)
11 text/..
1 -
12Mozilla/4.0 (compatible; EmberSpider 0.8; Scout (a); bgft)
12 text/..
12MR Crawler/Nutch-1.3
12 text/..
1 image/..
11Twitterbot/1.0
11 text/..
1 -
1 image/..
11AnomieBOT 1.0 (BAGBot)
7 application/json
4 text/..
11YBot/0.1
11 application/vnd.php.serialized
11AnomieBOT 1.0 (TemplateSubster)
11 application/json
1 text/..
10SurakWare MediaWiki Bot/1.0
10 text/..
10ibo2bot
10 text/..
10SiocWikiBot
10 text/..
10GNAA-bot
10 text/..
9DotNetWikiBot/2.96 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
9 text/..
1 application/xml
9TwengaBot-Discover
8 image/..
1 text/..
1 -
9Mozilla/5.0 QunarBot/1.0
9 text/..
1 -
1 image/..
9AnomieBOT 1.0 (OrphanReferenceFixer)
9 application/json
1 text/..
9CheMoBot/1.00
9 text/..
8XLinkBot/1.00
8 text/..
8Mozilla/5.0 (compatible; Birubot/1.0) Gecko/2009032608 Firefox/3.0.8
6 text/..
2 image/..
1 -
7('python-wikitools/1.2 (User:BernsteinBot)',)
7 application/json
7phpAPIbot 0.1
6 application/vnd.php.serialized
1 text/..
7infraEnterprise v8 Web Crawler
7 -
1 text/..
7wikbot/1.2 CFNetwork/485.13.9 Darwin/11.0.0
4 image/..
3 application/json
7WebCrawler/Nutch-1.2 (WebCrawler; WebCrawler)
7 text/..
1 image/..
7('python-wikitools/1.2 (User:LaraBot)',)
7 application/json
6HTMLParser/1.6
6 text/..
6TheKeens bot
6 text/..
6FAST Enterprise Crawler/5.3.4 ( mail address )
6 text/..
6 mail address (Mozilla compatible)
6 text/..
1 image/..
6bitlybot
6 text/..
1 -
1 image/..
6AniBot/0.9 php/curl
6 application/vnd.php.serialized
1 -
6DotNetWikiBot/2.96 (Microsoft Windows NT 6.1.7601 Service Pack 1; )
6 text/..
1 image/..
1 application/xml
6DotNetWikiBot/2.96 (Unix 5.10.0.0; )
3 application/xml
3 text/..
5Soundkiosk Relation-Crawler (Version 1.0; soundkiosk.de)
5 application/xml
5DotNetWikiBot/2.97 (Microsoft Windows NT 6.1.7600.0; )
5 text/..
1 application/xml
5AnomieBOT 1.0 (DeletionSortingCleaner)
5 application/json
5 4am-spider/1.0
5 text/..
5Wiktionary spider. mail address
5 text/..
5Mozilla/5.0 (compatible; Linux; Socialradarbot/2.0; en-US; mail address )
5 text/..
1 image/..
5DotNetWikiBot/2.9 (Unix 5.10.0.0; )
5 text/..
4MystBot/1.5 fr libwww-perl/6.02
4 text/..
4Freebase Deathbot
4 text/..
4AnomieBOT 1.0 (AFDMergeFromCleaner)
4 application/json
4wikbot/1.1 CFNetwork/485.12.30 Darwin/10.4.0
2 application/json
2 image/..
4tellit_rest_bot, contact mail address
2 application/x-wiki
2 text/..
4DotNetWikiBot/2.97 (Microsoft Windows NT 6.0.6000.0; )
4 text/..
1 -
4wikbot/1.1 CFNetwork/485.12.7 Darwin/10.4.0
2 application/json
2 image/..
1 text/..
4feedbot/0.0.1
4 text/..
4unblockbot/1.00
4 text/..
4SavingsDaily.com Deals Bot
4 text/..
4Handelabra WikiBot
3 application/vnd.php.serialized
1 text/..
4YourFilmsBot/0.1
4 application/json
4HTMLParser/1.4
4 text/..
4Geni ircpybot 1.0
2 application/json
2 text/..
4NFCCheckBot/1.0
4 text/..
3Mozilla/5.0 (compatible; FriendFeedBot/0.1; Http://friendfeed.com/about/bot; 370 subscribers; feed-id=3852576738117026533)
2 application/xml
1 -
3Friendly Spider 1.0 contact mail address
3 text/..
3Jabse.com Crawler v.2.0 www.jabse.com/crawler.php
3 text/..
1 application/xml
3Mozilla 5.0 (Apibot 0.30)
3 application/vnd.php.serialized
3LexxeBot/1.0 ( mail address )
3 text/..
1 application/vnd.php.serialized
3Mozilla 5.0 (Apibot 0.30b5)
3 application/vnd.php.serialized
3User-Agent: MyWikiBot/0.3
3 image/..
3Magus Bot 1.0
3 text/..
3User-Agent: MyWikiBot/0.2
3 image/..
3python-wikitools/1.2 (User:Mr.Z-bot)
3 application/json
1 text/..
3Silverbot/1.0 (https://github.com/thesilvervestgroup/silverbot)
2 text/..
1 application/json
3wikbot/1.21 CFNetwork/485.13.9 Darwin/11.0.0
2 image/..
1 application/json
1 text/..
3 123peoplebot/1.0
3 text/..
3wikiparser/1 CFNetwork/454.12.4 Darwin/10.8.0 (x86_64) (MacPro5,1)
2 image/..
1 text/..
3Wikibot/1.1 CFNetwork/454.12.4 Darwin/10.8.0 (x86_64) (MacBookPro8,1)
2 image/..
1 application/json
1 text/..
3AnomieBOT 1.0 (RandomPagePicker)
3 application/json
3GoogleBot-Image/1.0 ASProxy/5.5b3
3 image/..
3SetLinks bot 1.0
3 text/..
3DotNetWikiBot/2.9 (Microsoft Windows NT 6.0.6000.0; )
3 text/..
3Mozilla/5.0 (Bgbot 0.5)
3 text/..
3IssueCrawler
3 text/..
13,621 total

IP ranges: known IP ranges for Google are 64.233.[160.0-191.255], 66.249.[64.0-95.255], 66.102.[0.0-15.255], 72.14.[192.0-255.255], 74.125.[0.0-255.255], 209.85.[128.0-255.255], 216.239.[32.0-63.255], and a few other minor subranges.
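
As an illustration of how a requester's address could be checked against the ranges listed above, here is a minimal Python sketch. It is not the script used to produce this report. The names GOOGLE_NETWORKS and is_listed_google_ip are hypothetical, the ranges have been rewritten by hand in CIDR notation, and the unlisted minor subranges are not covered.

```python
import ipaddress

# The Google ranges listed in this report, rewritten in CIDR notation.
# Assumption: the "minor other subranges" the report mentions are unknown
# and therefore left out, so this check can miss genuine Google addresses.
GOOGLE_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "64.233.160.0/19",   # 64.233.[160.0-191.255]
        "66.249.64.0/19",    # 66.249.[64.0-95.255]
        "66.102.0.0/20",     # 66.102.[0.0-15.255]
        "72.14.192.0/18",    # 72.14.[192.0-255.255]
        "74.125.0.0/16",     # 74.125.[0.0-255.255]
        "209.85.128.0/17",   # 209.85.[128.0-255.255]
        "216.239.32.0/19",   # 216.239.[32.0-63.255]
    )
]


def is_listed_google_ip(address: str) -> bool:
    """Return True if the IPv4 address lies in one of the listed ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in GOOGLE_NETWORKS)
```

A request whose user agent claims to be GoogleBot but whose address fails this check would fall in the "others" group, for example is_listed_google_ip("66.249.96.0"), which lies just past the end of the 66.249 range.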

Generated on Wed, Aug 31, 2011 13:55
Author: Erik Zachte
Mail: ezachte@### (no spam: ### = wikimedia.org)
All data and images on this page are in the public domain.