Wikimedia Traffic Analysis Report - Crawler requests

Daily averages, based on sample period: 1 Jun 2011 - 30 Jun 2011

 This analysis is based on a 1:1000 sampled server log (squids) ⇒ all counts x 1000.
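The scaling described above can be written as a small helper. This is a minimal sketch that assumes a daily average equals the sampled count, multiplied by the sampling rate of 1000 and divided by the number of days in the sample period (30 for June 2011). The name `daily_average` is hypothetical.

```python
SAMPLE_RATE = 1000  # 1:1000 sampling: each logged request stands for about 1000


def daily_average(sampled_count: int, days: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Estimated requests per day from a count in the sampled log.

    Assumes: total = sampled_count * sample_rate, spread evenly over `days`.
    """
    if days <= 0:
        raise ValueError("days must be positive")
    return sampled_count * sample_rate / days


# 90 sampled lines over the 30 days of June 2011 -> 3000.0 requests per day,
# which a table expressed in units of 1000 would show as 3.
print(daily_average(90, 30))
```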

The following overview of crawler (aka bot) page requests is based on the user agent information that accompanies most server requests. Unfortunately, this user agent information follows only loosely defined guidelines.
Also bear in mind that the most popular crawler names may be somewhat overrepresented. This is the result of so-called user agent spoofing, where a requester supplies a false user agent string, e.g. to bypass web server filters.
GoogleBot seems to be a favorite for spoofing. Therefore GoogleBot requests from an IP address registered to Google are listed under 'google' in the table below, and other requests claiming to be GoogleBot under 'google?'.
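One way to separate genuine from spoofed GoogleBot requests is to check the client IP address against a list of networks known to belong to Google. The sketch below assumes such a list; the single network shown (66.249.64.0/19) is illustrative only, and the function name `googlebot_group` is hypothetical. Google's own documentation recommends a different check, a reverse DNS lookup followed by a forward DNS lookup, which avoids maintaining an IP list.

```python
import ipaddress

# Illustrative only: a real list of Google crawler networks must come from an
# up-to-date source, as the ranges change over time.
GOOGLE_NETWORKS = [ipaddress.ip_network("66.249.64.0/19")]


def googlebot_group(ip: str, user_agent: str) -> str:
    """Return 'google' for GoogleBot claims from a listed Google network,
    'google?' for GoogleBot claims from anywhere else, and '' otherwise."""
    if "googlebot" not in user_agent.lower():
        return ""
    addr = ipaddress.ip_address(ip)
    # Membership tests between IPv4 and IPv6 objects simply return False.
    if any(addr in net for net in GOOGLE_NETWORKS):
        return "google"
    return "google?"


ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(googlebot_group("66.249.66.1", ua))  # google
print(googlebot_group("192.0.2.10", ua))   # google?
```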

For this report, page requests are considered to be issued by a crawler in two cases:
1. The user agent string contains a web address. Only crawlers should include one, but there are some false positives, where a browser sends a user agent string with a web address (e.g. because of an ill-behaved plug-in; the main offenders have been eliminated).
2. The user agent string contains the term 'bot', 'spider' or 'crawl[er]'.
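The two rules can be sketched as a small classifier. The patterns are assumptions, not the report's actual code: rule 1 is approximated by looking for `http://`, `https://` or `www.` in the agent string, and rule 2 by a case-insensitive substring match on 'bot', 'spider' or 'crawl' (which also covers 'crawler'). Rule 1 is checked first, which matches the split between the two tables below; `crawler_rule` is a hypothetical name.

```python
import re

# Rule 1 (approximation): a web address, taken here to mean http://, https:// or www.
URL_PATTERN = re.compile(r"https?://|www\.", re.IGNORECASE)
# Rule 2: the keywords 'bot', 'spider' or 'crawl' ('crawl' also matches 'crawler')
KEYWORD_PATTERN = re.compile(r"bot|spider|crawl", re.IGNORECASE)


def crawler_rule(user_agent: str) -> int:
    """Return 1 or 2 for the first rule that matches, or 0 if neither matches."""
    if URL_PATTERN.search(user_agent):
        return 1
    if KEYWORD_PATTERN.search(user_agent):
        return 2
    return 0


for ua in [
    "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
    "PythonWikipediaBot/1.0",
    "Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
]:
    print(crawler_rule(ua), ua)
```

A plain substring test on 'bot' is a heuristic: like rule 1, it can produce false positives when the letters occur inside an unrelated word.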

In total, 58,424,000 page requests per day (mime type text/html only!) are considered crawler requests, out of 393,221,000 external requests, which is 14.9%.
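The percentage follows directly from the two daily totals:

```python
crawler_requests = 58_424_000    # daily crawler page requests (text/html)
external_requests = 393_221_000  # daily external requests

share = crawler_requests / external_requests * 100
print(f"{share:.1f}%")  # 14.9%
```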

Page requests for crawlers that specify a URL in the agent string
Count (x 1000) | Secondary domain (~site) name
  Count (x 1000) | URL | Mime type | User agent
17,403 | google
  14,231 | www.google.com/bot.html | text/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  837 | www.google.com/bot.html | - | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  529 | desktop.google.com/ | image/.. | Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; url)
  371 | www.google.com/bot.html | image/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  193 | desktop.google.com/ | application/xml | Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; url)
  149 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: ortografia4)
  104 | www.google.com/feedfetcher.html | - | FeedFetcher-Google; (url)
  99 | www.google.com/bot.html | text/.. | DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
  77 | code.google.com/appengine | application/json | AppEngine-Google; (url; appid: s~redconceptual)
  76 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: rarplayer)
  74 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: wikien4)
  70 | www.google.com/feedfetcher.html | application/xml | FeedFetcher-Google; (url)
  65 | code.google.com/appengine | application/json | Mozilla 3.5 AppEngine-Google; (url; appid: prfleme)
  58 | www.google.com/feedfetcher.html | text/.. | Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
  51 | www.google.com/feedfetcher.html | application/json | Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
  33 | www.google.com/bot.html | text/.. | SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
  26 | www.google.com/feedfetcher.html | image/.. | Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
  24 | code.google.com/appengine | text/.. | WikiBot/0.1 AppEngine-Google; (url; appid: newikipedia)
  24 | www.google.com/feedfetcher.html | text/.. | FeedFetcher-Google; (url)
  21 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: ortopedianew)
  21 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: wikien3)
  19 | code.google.com/appengine | application/xml | AppEngine-Google; (url; appid: wikipedia-raw)
  18 | desktop.google.com/ | text/.. | Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; url)
  15 | www.google.com/feedfetcher.html | application/xml | Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
  10 | www.google.com/bot.html | text/.. | GoogleBot/2.1 (url)
  10 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: kbworld24)
  10 | www.google.com/bot.html | image/.. | DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
  9 | www.google.com/coop/cse/cref | text/.. | FeedFetcher-Google-CoOp; (url)
  7 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: boxapp)
  6 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: findadvise)
  6 | code.google.com/appengine | text/.. | www.productontology.org/1.0 (Contact: mail address ) AppEngine-Google; (url; appid: gr4bing)
  6 | code.google.com/appengine | application/json | MWBOT GAE Edition AppEngine-Google; (url; appid: philip-bot)
  5 | code.google.com/appengine | image/.. | AppEngine-Google; (url; appid: d24-img)
  5 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: wiki-crawler00)
  5 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: wiki-crawler04)
  5 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: mygpxy)
  5 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: retimeme)
  5 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: wiki-crawler03)
  5 | code.google.com/appengine | application/json | Python-urllib/2.5 AppEngine-Google; (url; appid: loeschmonitor)
  4 | desktop.google.com/ | - | Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; url)
  4 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: wiki-crawler01)
  4 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: proxworx)
  4 | sites.google.com/site/bendercrawler | text/.. | Mozilla/5.0 (compatible; Bender; url)
  4 | code.google.com/appengine | text/.. | Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1 AppEngine-Google; (url; appid: hinamaturior)
  4 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: proxypy41)
  4 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: wiki-crawler02)
  4 | code.google.com/p/crawler4j/ | text/.. | crawler4j (url)
  4 | code.google.com/appengine | text/.. | oohEmbed.com AppEngine-Google; (url; appid: oohembed)
  4 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: tortelliniman)
  4 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: d24-img)
  3 | code.google.com/p/ldspider/wiki/Robots | text/.. | ldspider (BTC 2011 crawl, mail address , url)
  3 | docs.google.com | image/.. | Mozilla/5.0 (compatible; GoogleDocs; url)
  3 | desktop.google.com/ | image/.. | Mozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
  3 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: dustbunnytycoonmonitor)
  3 | code.google.com/appengine | text/.. | AppEngine-Google; (url; appid: lullar-data),gzip(gfe) (via translate.google.com)
  3 | docs.google.com | image/.. | Mozilla/5.0 (compatible; GoogleDocs; kix; url)
13,683 | yahoo
  9,253 | help.yahoo.com/help/us/ysearch/slurp | text/.. | Mozilla/5.0 (compatible; Yahoo! Slurp; url)
  3,048 | help.yahoo.com/help/us/ysearch/slurp | text/.. | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  996 | help.yahoo.com/help/us/ysearch/slurp | image/.. | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  145 | misc.yahoo.com.cn/help.html | text/.. | Mozilla/5.0 (compatible; Yahoo! Slurp China; url)
  51 | listing.yahoo.co.jp/support/faq/int/other/other_001.html | text/.. | Y!J-BRJ/YATS crawler (url)
  43 | help.yahoo.com/help/us/ysearch/slurp | text/.. | Mozilla/5.0 (compatible; Yahoo! DE Slurp; url)
  19 | help.yahoo.com/help/us/ysearch/slurp | application/vnd.php.serialized | Mozilla/5.0 (compatible Yahoo! Slurp/3.0 url)
  17 | help.yahoo.com/help/us/ysearch/slurp | image/.. | Mozilla/5.0 (compatible; Yahoo! Slurp; url)
  17 | help.yahoo.com/help/us/ysearch/slurp | application/ogg | Mozilla/5.0 (compatible; Yahoo! Slurp; url)
  14 | help.yahoo.co.jp/help/jp/search/indexing/indexing-15.html | text/.. | Y!J-BRI/0.0.1 crawler ( url )
  13 | help.yahoo.com/help/us/ysearch/crawling/crawling-01.html | text/.. | Nokia6682/2.0 (3.01.1) SymbianOS/8.0 Series60/2.6 Profile/MIDP-2.0 configuration/CLDC-1.1 UP.Link/6.3.0.0.0 (compatible;YahooSeeker/M1A1-R2D2; url)
  12 | help.yahoo.co.jp/help/jp/search/indexing/indexing-15.html | image/.. | 'Mozilla/5.0 (compatible; Y!J SearchMonkey/1.0 (Y!J-AGENT; url))'
  12 | misc.yahoo.com.cn/help.html | - | Mozilla/5.0 (compatible; Yahoo! Slurp China; url)
  10 | help.yahoo.com/help/us/ysearch/slurp | - | Mozilla/5.0 (compatible; Yahoo! Slurp; url)
  7 | developer.yahoo.com/yql/provider | text/.. | Mozilla/5.0 (compatible; Yahoo Pipes 2.0; url) Gecko/20090729 Firefox/3.5.2
  6 | help.yahoo.co.jp/help/jp/search/indexing/indexing-15.html | text/.. | Y!J-BRT/1.0 crawler (url)
  6 | help.yahoo.com/help/us/ysearch/slurp | - | Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
  3 | help.yahoo.co.jp/help/jp/search/indexing/indexing-15.html | text/.. | 'Mozilla/5.0 (compatible; Y!J SearchMonkey/1.0 (Y!J-AGENT; url))'
12,587 | facebook
  7,619 | www.facebook.com/externalhit_uatext.php | image/.. | facebookexternalhit/1.0 (url)
  4,658 | www.facebook.com/externalhit_uatext.php | text/.. | facebookexternalhit/1.0 (url)
  257 | www.facebook.com/externalhit_uatext.php | text/.. | facebookexternalhit/1.1 (url)
  40 | developers.facebook.com | image/.. | facebookplatform/1.0 (url)
  9 | www.facebook.com/externalhit_uatext.php | image/.. | facebookexternalhit/1.1 (url)
  3 | www.facebook.com/externalhit_uatext.php | - | facebookexternalhit/1.0 (url)
6,650 | bing
  4,831 | www.bing.com/bingbot.htm | text/.. | Mozilla/5.0 (compatible; bingbot/2.0; url)
  1,802 | www.bing.com/bingbot.htm | - | Mozilla/5.0 (compatible; bingbot/2.0; url)
  7 | www.bing.com/bingbot.htm | image/.. | Mozilla/5.0 (compatible; bingbot/2.0; url)
  4 | www.bing.com/bingbot.htm | application/xml | Mozilla/5.0 (compatible; bingbot/2.0; url)
6,583 | google?
  5,928 | www.google.com/bot.html | text/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  228 | www.google.com/bot.html | text/.. | GoogleBot/2.1 (url)
  183 | www.google.com/bot.html | application/vnd.php.serialized | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  72 | www.google.com/bot.html | text/.. | Mozilla/5.0(compatible;GoogleBot/2.1;url)
  48 | www.google.com/bot.html | - | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  42 | www.google.com/bot.html | text/.. | DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
  38 | www.google.com/bot.html | text/.. | SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
  15 | www.google.com/bot.html | image/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  11 | www.google.com/bot.html | text/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
  5 | www.google.com/bot.html | text/.. | Mozilla/5.0 (compatible; GoogleBot/2.1; url) ASProxy/5.5b3
  5 | www.google.com/bot.html | application/xml | Mozilla/5.0 (compatible; GoogleBot/2.1; url)
2,049 | yandex
  1,612 | yandex.com/bots | text/.. | Mozilla/5.0 (compatible; YandexBot/3.0; url)
  244 | yandex.com/bots | text/.. | Mozilla/5.0 (compatible; YandexImages/3.0; url)
  85 | yandex.com/bots | - | Mozilla/5.0 (compatible; YandexBot/3.0; url)
  64 | yandex.com/bots | image/.. | Mozilla/5.0 (compatible; YandexImages/3.0; url)
  12 | yandex.com/bots | image/.. | Mozilla/5.0 (compatible; YandexImageResizer/2.0; url)
  11 | yandex.com/bots | text/.. | Mozilla/5.0 (compatible; YandexDirect/3.0; url)
  9 | yandex.com/bots | text/.. | Mozilla/5.0 (compatible; YandexBot/3.0; MirrorDetector; url)
  4 | yandex.com/bots | - | Mozilla/5.0 (compatible; YandexImages/3.0; url)
1,851 | naver
  1,790 | help.naver.com/robots/ | text/.. | Yeti/1.0 (NHN Corp.; url)
  38 | help.naver.com/robots/ | image/.. | Yeti/1.0 (NHN Corp.; url)
  10 | help.naver.com/customer_webtxt_02.jsp | text/.. | Mozilla/4.0 (compatible; NaverBot/1.0; url)
  9 | help.naver.com/robots/ | text/.. | Yeti/1.0 (NHN Corp.; url) ASProxy/5.5b5
1,248 | msn
  457 | search.msn.com/msnbot.htm | text/.. | msnbot/2.0b (url)._
  285 | search.msn.com/msnbot.htm | text/.. | msnbot/2.0b (url)
  151 | search.msn.com/msnbot.htm | text/.. | msnbot-NewsBlogs/2.0b (url)
  125 | search.msn.com/msnbot.htm | text/.. | msnbot-Products/1.0 (url)
  119 | search.msn.com/msnbot.htm | text/.. | msnbot-media/1.1 (url)
  102 | search.msn.com/msnbot.htm | image/.. | msnbot-media/1.1 (url)
  4 | search.msn.com/msnbot.htm | text/.. | User-Agent :msnbot/2.0b (url)._
  4 | search.msn.com/msnbot.htm | text/.. | msnbot-UDiscovery/2.0b (url)
1,094 | baidu
  983 | www.baidu.com/search/spider.html | text/.. | Mozilla/5.0 (compatible; Baiduspider/2.0; url)
  47 | www.baidu.com/search/spider.htm | text/.. | Baiduspider-image(url)
  38 | www.baidu.com/search/spider.html | - | Mozilla/5.0 (compatible; Baiduspider/2.0; url)
  13 | www.baidu.jp/spider/ | text/.. | DoCoMo/2.0 P05A(c100;TB;W24H15) (compatible; BaiduMobaider/1.0;url)
  5 | www.baidu.com/search/spider.htm | text/.. | Baiduspider(url)
478 | youdao
  451 | www.youdao.com/help/webmaster/spider/ | text/.. | Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
  18 | www.youdao.com/help/webmaster/spider/ | - | Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
  5 | toolbar.youdao.com/ | image/.. | Youdao Toolbar (url)
  3 | www.youdao.com/help/webmaster/spider/ | application/vnd.php.serialized | Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
405 | frontpagesearch
  269 | frontpagesearch.net | text/.. | WordPress/3.1.3; url
  136 | frontpagesearch.net | image/.. | WordPress/3.1.3; url
385 | entireweb
  376 | www.entireweb.com/about/search_tech/speedy_spider/ | text/.. | Mozilla/5.0 (Windows; Windows NT 5.1; en-US) Speedy Spider (url)
  5 | www.entireweb.com/about/search_tech/speedy_spider/ | - | Mozilla/5.0 (Windows; Windows NT 5.1; en-US) Speedy Spider (url)
351 | traslated
  351 | mymemory.traslated.net/doc/ | text/.. | Mozilla/5.0 (MyMemory Bot url)
310 | www.
  139 | www. | text/.. | GoogleBot-Image/1.0 ( urlGoogleBot.com/bot.html)
  112 | www. | text/.. | GoogleBot/2.1 ( urlGoogleBot.com/bot.html)
  57 | www. | text/.. | GoogleBot/2.1 (urlGoogleBot.com/bot.html)
274 | sblog
  166 | fulltext.sblog.cz/screenshot/ | image/.. | Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
  52 | fulltext.sblog.cz/ | text/.. | SeznamBot/3.0 (url)
  43 | fulltext.sblog.cz/screenshot/ | text/.. | Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
  8 | fulltext.sblog.cz/ | text/.. | SeznamBot/3.0-test (url)
  3 | fulltext.sblog.cz/ | - | SeznamBot/3.0 (url)
267 | php
  135 | pear.php.net/ | application/vnd.php.serialized | PEAR HTTP_Request class ( url )
  52 | pear.php.net/ | application/xml | PEAR HTTP_Request class ( url )
  49 | pear.php.net/package/http_request2 | text/.. | HTTP_Request2/0.5.2 (url) PHP/5.2.17
  26 | pear.php.net/ | text/.. | PEAR HTTP_Request class ( url )
  3 | pear.php.net/package/http_request2 | text/.. | HTTP_Request2/2.0.0RC1 (url) PHP/5.3.2
226 | exabot
  159 | www.exabot.com/go/robot | text/.. | Mozilla/5.0 (compatible; Exabot/3.0; url)
  62 | www.exabot.com/go/robot | text/.. | Mozilla/5.0 (compatible; Exabot/3.0 (BiggerBetter); url)
  5 | www.exabot.com/go/robot | - | Mozilla/5.0 (compatible; Exabot/3.0; url)
224 | bsurprised
  189 | bsurprised.com/ | text/.. | BSurprised WikiBox 0.1.3 (url)
  35 | bsurprised.com/ | text/.. | BSurprised WikiBox 0.1 (url)
217 | 80legs
  178 | www.80legs.com/webcrawler.html | text/.. | Mozilla/5.0 (compatible; 008/0.83; url) Gecko/2008032620
  38 | www.80legs.com/webcrawler.html | image/.. | Mozilla/5.0 (compatible; 008/0.83; url) Gecko/2008032620
198 | enwp
  148 | enwp.org/User:SDPatrolBot | text/.. | SDPatrolBot (url)
  34 | enwp.org/User:H3llkn0wz/WikiSharpAPI | text/.. | WikiSharpAPI/0.3 url (C# .NET)
  15 | enwp.org/User:KingpinBot | text/.. | KingpinBot (url)
194 | majestic12
  193 | www.majestic12.co.uk/bot.php? | text/.. | Mozilla/5.0 (compatible; MJ12bot/v1.3.3; url)
180 | wikipedia
  102 | en.wikipedia.org/wiki/Wikipedia:Huggle | text/.. | Huggle/2.1.13.0 url
  60 | en.wikipedia.org/wiki/User:NicoV/Wikipedia_Cleaner/Documentation | text/.. | WikiCleaner (url)
  4 | en.wikipedia.org | text/.. | url
  3 | en.wikipedia.org/wiki/Wikipedia:Huggle | text/.. | Huggle/2.1.12.0 url
  3 | en.wikipedia.org/wiki/Wikipedia:Huggle | text/.. | Huggle/2.1.1.0 url
180 | embed
  94 | support.embed.ly/ | image/.. | Mozilla/5.0 (compatible; Embedly/0.2; url)
  86 | support.embed.ly/ | text/.. | Mozilla/5.0 (compatible; Embedly/0.2; url)
174 | wordpress
  22 | kterrl.wordpress.com | text/.. | WordPress/MU; url
  20 | quantenheilungen.wordpress.com | text/.. | WordPress/MU; url
  14 | arthur2rcasc.wordpress.com | text/.. | WordPress/MU; url
  11 | driwancybermuseum.wordpress.com | text/.. | WordPress/MU; url
  4 | tgbp.wordpress.com | text/.. | WordPress/MU; url
  4 | wannareadenglish.wordpress.com | text/.. | WordPress/MU; url
  4 | bibi3736.wordpress.com | text/.. | WordPress/MU; url
  4 | christopherboe.wordpress.com | text/.. | WordPress/MU; url
  3 | systemischenergetischescoaching.wordpress.com | text/.. | WordPress/MU; url
  3 | goodjohnjr.wordpress.com | text/.. | WordPress/MU; url
  3 | aremarweb.wordpress.com | text/.. | WordPress/MU; url
166 | FeedBurner
  164 | www.FeedBurner.com | text/.. | FeedBurner/1.0 (url)
150 | rootza
  148 | www.rootza.com | application/xml | RootzaCrawler 2.0 (url)
149 | yacy
  45 | yacy.net/bot.html | text/.. | yacybot (sciencenet-any; amd64 Linux 2.6.32-24-generic; java 1.6.0_18; Europe/en) url
  19 | yacy.net/bot.html | text/.. | yacybot (freeworld/global; amd64 Linux 2.6.32-32-generic; java 1.6.0_20; Europe/en) url
  14 | yacy.net/bot.html | text/.. | yacybot (freeworld/global; amd64 Linux 2.6.31-gentoo-r6; java 1.6.0_17; Etc/en) url
  12 | yacy.net/bot.html | text/.. | yacybot (freeworld/global; i386 Linux 2.6.35-gentoo-r4; java 1.6.0_20; Europe/el) url
  8 | yacy.net/bot.html | text/.. | yacybot (sciencenet/any; amd64 Linux 2.6.32-24-generic; java 1.6.0_18; Europe/en) url
  7 | yacy.net/bot.html | text/.. | yacybot (freeworld/global; amd64 Linux 2.6.32-32-server; java 1.6.0_24; Europe/de) url
  6 | yacy.net/bot.html | - | yacybot (freeworld/global; amd64 Linux 2.6.32-32-generic; java 1.6.0_20; Europe/en) url
  4 | yacy.net/bot.html | text/.. | yacybot (freeworld/global; amd64 Windows 7 6.1; java 1.6.0_25; Europe/de) url
  4 | yacy.net/bot.html | text/.. | yacybot (freeworld/global; x86 Windows 7 6.1; java 1.6.0_26; Europe/de) url
  4 | yacy.net/bot.html | text/.. | yacybot (freeworld/global; amd64 Linux 2.6.31-23-server; java 1.6.0_24; Europe/en) url
  3 | yacy.net/bot.html | text/.. | yacybot (freeworld/global; amd64 Linux 2.6.32-33-generic; java 1.6.0_24; Europe/de) url
  3 | yacy.net/bot.html | text/.. | yacybot (freeworld/global; i386 Linux 2.6.38-8-generic-pae; java 1.6.0_22; Europe/de) url
140 | sogou
  121 | www.sogou.com/docs/help/webmasters.htm#07 | text/.. | Sogou web spider/4.0(url)
  8 | www.sogou.com/docs/help/webmasters.htm#07 | image/.. | Sogou Pic Spider/3.0(url)
  7 | www.sogou.com/docs/help/webmasters.htm#07 | application/vnd.php.serialized | Sogou web spider/4.0(url)
135 | toolserver
  88 | wiki.toolserver.org/view/GeoHack | text/.. | Geohack (url)
  33 | toolserver.org/~bayo/ | text/.. | LudoThecaire/1.0 (url)
  8 | toolserver.org/~dispenser/ | text/.. | WebWikipedia Python/2.6 (url)
  3 | toolserver.org/~guandalug/ | application/vnd.php.serialized | Guandalugs PHPWikiBot/1.1 (url;de:User:Guandalug)
134 | scoutjet
  134 | www.scoutjet.com/ | text/.. | Mozilla/5.0 (compatible; ScoutJet; url)
115 | sitebot
  115 | www.sitebot.org/robot/ | text/.. | Mozilla/5.0 (compatible; SiteBot/0.1; url)
114 | garlik
  105 | garlik.com/ | text/.. | GarlikCrawler/1.1 (url, mail address )
  9 | garlik.com/ | text/.. | GarlikCrawler/1.1 (url)
114 | sf
  38 | liferea.sf.net/ | text/.. | Liferea/0.x.x (Linux; en_US.UTF-8; url)
  38 | magpierss.sf.net | text/.. | MagpieRSS/0.7x (url)
  37 | liferea.sf.net/ | text/.. | Liferea/1.x.x (Linux; es_ES.UTF-8; url)
96 | ayna
  96 | www.ayna.com | text/.. | Mozilla/5.0 (compatible; Ayna url)
94 | goo
  71 | help.goo.ne.jp/contact/ | text/.. | goo wikipedia (url)
  19 | help.goo.ne.jp/help/article/1142/ | text/.. | DoCoMo/2.0 P900i(c100;TB;W24H11) (compatible; ichiro/mobile goo; url)
90 | echonest
  74 | the.echonest.com/reader/ | application/xml | nestReader/0.3 (discovery; url; reader at echonest.com)
  10 | the.echonest.com/reader/ | text/.. | nestReader/0.3 (discovery; url; reader at echonest.com)
  6 | the.echonest.com/reader/ | image/.. | nestReader/0.3 (discovery; url; reader at echonest.com)
89 | soso
  84 | help.soso.com/webspider.htm | text/.. | Sosospider(url)
  3 | help.soso.com/webspider.htm | - | Sosospider(url)
88 | gulliway
  81 | gulliway.org | application/xml | Mozzila/5.0 (Windows NT 5.1; GulliwayBot/01 url)
  7 | gulliway.org | text/.. | Mozzila/5.0 (Windows NT 5.1; GulliwayBot/01 url)
88 | wikimedia
  83 | tools.wikimedia.de/~daniel/ | text/.. | WikiSense (url)
86 | ac
  46 | www.tkl.iis.u-tokyo.ac.jp/~crawler/ | text/.. | Mozilla/5.0 (compatible; Steeler/3.5; url)
  35 | ce.yazduni.ac.ir | text/.. | Mozilla/5.0 (compatible; heritrix/1.14.4 url)
80 | semager
  64 | www.semager.de/blog/semager-bots/ | text/.. | Mozilla/5.0 (compatible; Semager/1.4; url)
  15 | www.semager.de/blog/semager-bots/ | application/json | Mozilla/5.0 (compatible; Semager/1.4; url)
78 | jetbrains
  40 | www.jetbrains.com/omea_reader/ | text/.. | JetBrains Omea Reader 2.0 Release Candidate 1 (url)
  38 | www.jetbrains.com/omea_reader/ | text/.. | JetBrains Omea Reader 1.0.x (url)
76 | avantbrowser
  39 | www.avantbrowser.com | text/.. | Avant Browser (url)
  37 | www.avantbrowser.com | text/.. | Advanced Browser (url)
76 | feedshow
  39 | www.feedshow.com | text/.. | FeedshowOnline (url)
  37 | www.feedshow.com | text/.. | Feedshow/x.0 (url; 1 subscriber)
74 | kosmix
  70 | www.kosmix.com/html/kosmos.html | application/xml | Mozilla/5.0(compatible;Kosmos/1.0;url)
  4 | www.kosmix.com/html/kosmos.html | text/.. | Mozilla/5.0(compatible;Kosmos/1.0;url)
73 | newsgator
  37 | www.newsgator.com | text/.. | NewsGatorOnline/2.0 (url; 1 subscribers)
  36 | www.newsgator.com/ | text/.. | FeedDemon/2.7 (url; Microsoft Windows XP)
69 | mediawiki
  69 | www.mediawiki.org/ | text/.. | MediaWiki OAI Harvester 0.2 (url)
67 | daum
  66 | ws.daum.net/aboutWebSearch.html | text/.. | Mozilla/5.0 (compatible; MSIE or Firefox mutant; not on Windows server; url) Daumoa/2.0
64 | discoveryengine
  54 | discoveryengine.com/discobot.html | text/.. | Mozilla/5.0 (compatible; discobot/1.1; url)
  10 | discoveryengine.com/discobot.html | image/.. | Mozilla/5.0 (compatible; discobot/1.1; url)
63 | emining
  60 | emining.jp/ | text/.. | emBot-GalaBuzz/Nutch-1.0 (url; mail address )
  3 | emining.jp/ | - | emBot-GalaBuzz/Nutch-1.0 (url; mail address )
44 | sentymetr
  23 | sentymetr.pl/bot.html | application/json | Mozilla/5.0 (compatible; SentymetrBot 1.0; url)
  21 | sentymetr.pl/bot.html | text/.. | Mozilla/5.0 (compatible; SentymetrBot 1.0; url)
44 | freebase
  43 | www.freebase.com | text/.. | metaweb/Nutch-1.0-dev (url; help_at_metaweb.com)
40 | zootycoon
  40 | www.zootycoon.com | text/.. | Zoo Tycoon 2 Client -- url
39 | rssbandit
  39 | www.rssbandit.org | text/.. | RssBandit/1.5.0.10 (WinNT 5.1.2600.0; url) (WinNT 5.1.2600.0; )
39 | ponderer
  39 | ponderer.org/download/annotate_google.user.js | text/.. | annotate_google; url
39 | graemef
  39 | graemef.com | text/.. | NewsGator FetchLinks extension/0.2.0 (url)
39 | tinyurl
  38 | tinyurl.com/64t5n | text/.. | Rome Client (url) Ver: 0.9
39 | rssreader
  39 | www.rssreader.com | text/.. | RssReader/1.0.xx.x (url) Microsoft Windows NT 5.1.2600.0
39 | it-influentials
  39 | search.it-influentials.com/bot.htm | text/.. | Mozilla/5.0 (compatible;FindITAnswersbot/1.0;url)
38 | zipcommander
  38 | www.zipcommander.com/ | text/.. | 1st ZipCommander (Net) - url
38 | timewe
  38 | timewe.net | text/.. | CDR/1.7.1 Simulator/0.7(url) Profile/MIDP-1.0 Configuration/CLDC-1.0
38 | snarfware
  38 | www.snarfware.com/ | text/.. | Snarfer/0.x.x (url)
38 | plagger
  38 | plagger.org/ | text/.. | Plagger/0.x.xx (url)
38 | ranchero
  38 | ranchero.com/netnewswire/ | text/.. | NetNewsWire/2.x (Mac OS X; url)
38 | feeds4all
  38 | www.feeds4all.com/feedzcollector | text/.. | FeedZcollector v1.x (Platinum) url
38 | blogbridge
  38 | www.blogbridge.com/ | text/.. | BlogBridge 2.13 (url)
38 | nemui
  38 | mozshot.nemui.org/ | text/.. | Mozilla/5.0 (Gecko/20070310 Mozshot/0.0.20070628; url)
38 | seebot
  38 | seebot.org | text/.. | Lynx/2.8 (;url)
37 | kula
  37 | kula.jp/endo | text/.. | endo/1.0 (Mac OS X; ppc i386; url)
37 | winpodder
  37 | winpodder.com | text/.. | WinPodder (url)
37 | orcabrowser
  37 | www.orcabrowser.com | text/.. | Orca Browser (url)
36 | weblio
  33 | www.weblio.jp/ | text/.. | Mozilla/5.0 (compatible; WeblioBot; url)
35 | hatena
  32 | a.hatena.ne.jp/help | text/.. | Hatena Antenna/0.5 (url)
  3 | mgw.hatena.ne.jp/help | text/.. | DoCoMo/2.0 D903i(c100;TB;W28H20) (compatible; Hatena-Mobile-Gateway/1.2; url)
33 | textdigger
  33 | textdigger.com | text/.. | Mozilla/5.0 (url) Gecko/20061208 Firefox/2.0.0.1
32 | whatrhymeswith
  32 | www.whatrhymeswith.com/site/rhyme-bot | text/.. | RhymeBot/0.1 (url)
31 | weitz
  20 | weitz.de/drakma/ | image/.. | Drakma/1.0.0 (LispWorks 5.1.2; FreeBSD; 5.4-PRERELEASE; url)
  10 | weitz.de/drakma/ | text/.. | Drakma/1.0.0 (LispWorks 5.1.2; FreeBSD; 5.4-PRERELEASE; url)
31 | picsearch
  28 | www.picsearch.com/bot.html | text/.. | psbot/0.1 (url)
29 | simplepie
  17 | simplepie.org | application/xml | SimplePie/1.2 (Feed Parser; url; Allow like Gecko) Build/20090627192103
  10 | simplepie.org | text/.. | SimplePie/1.2 (Feed Parser; url; Allow like Gecko) Build/20090627192103
29 | flipboard
  14 | flipboard.com/browserproxy | image/.. | Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6 (FlipboardProxy/0.0.5; url)
  8 | flipboard.com/browserproxy | text/.. | Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6 (FlipboardProxy/0.0.5; url)
  7 | flipboard.com/browserproxy | text/.. | Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6 (FlipboardProxy/1.1; url)
28 | Anonymouse
  19 | Anonymouse.org/ | text/.. | url (Unix)
  9 | Anonymouse.org/ | image/.. | url (Unix)
28 | archive
  27 | www.archive.org/details/archive.org_bot | text/.. | Mozilla/5.0 (compatible; archive.org_bot url)
27 | bibalex
  17 | archive.bibalex.org/bot/ | image/.. | Mozilla/5.0 (compatible; archive.bibalex.org_bot; url)
  10 | archive.bibalex.org/bot/ | text/.. | Mozilla/5.0 (compatible; archive.bibalex.org_bot; url)
26 | turnitin
  26 | www.turnitin.com/robot/crawlerinfo.html | text/.. | TurnitinBot/2.1 (url)
26 | z-add
  24 | w3.z-add.co.uk/linkcheck/ | text/.. | Z-Add Link Checker (url)
23 | puritysearch
  23 | www.puritysearch.net/ | text/.. | Mozilla/5.0 (compatible; Purebot/1.1; url)
23 | github
  10 | github.com/pauldix/typhoeus/tree/master | text/.. | Typhoeus - url
  10 | github.com/NeilCrosby/wikislurp | application/vnd.php.serialized | WikiSlurp (url)
23 | rcdtokyo
  19 | www.rcdtokyo.com/pc2m/ | text/.. | Mozilla/5.0 (compatible; PEAR HTTP_Request class; url)
  4 | www.rcdtokyo.com/pc2m/ | image/.. | Mozilla/5.0 (compatible; PEAR HTTP_Request class; url)
23 | archive-it
  16 | archive-it.org/files/site-owners.html | image/.. | Mozilla/5.0 (compatible;archive.org_bot; Archive-It; url) Firefox/0.0
  7 | archive-it.org/files/site-owners.html | text/.. | Mozilla/5.0 (compatible;archive.org_bot; Archive-It; url) Firefox/0.0
23 | netnewswireapp
  23 | netnewswireapp.com/mac/ | - | NetNewsWire/3.2.15 (Mac OS X; url; gzip-happy)
22 | sourceforge
  20 | fess.sourceforge.jp/bot.html | text/.. | Mozilla/5.0 (compatible; Fess/4.0; url)
22 | spinn3r
  19 | spinn3r.com/robot | text/.. | Mozilla/5.0 (X11; Linux x86_64; en-US; rv:1.9.0.19; aggregator:Spinn3r (Spinn3r 3.1); url) Gecko/2010040121 Firefox/3.0.19
21 | alexa
  21 | www.alexa.com/site/help/webmasters | text/.. | ia_archiver (url; mail address )
21 | dataparksearch
  19 | dataparksearch.org/bot | text/.. | DataparkSearch/4.54-26052011 (url)
19 | fairshare
  13 | fairshare.cc | text/.. | Mozilla/5.0 url (X11; FreeBSD i386; en-US; rv:1.2a) Gecko/20021021
  4 | fairshare.cc | text/.. | Mozilla crawl/5.0 (compatible; fairshare.cc url)
18 | topsy
  18 | labs.topsy.com/butterfly/ | text/.. | Mozilla/5.0 (compatible; Butterfly/1.0; url) Gecko/2009032608 Firefox/3.0.8
18 | dotnetdotcom
  18 | www.dotnetdotcom.org/ | text/.. | Mozilla/5.0 (compatible; DotBot/1.1; url, mail address )
17 | froute
  14 | labs.froute.jp/pc2m/help.html | text/.. | Froute Mobile Gateway/1.0 (url)
  3 | labs.froute.jp/pc2m/help.html | image/.. | Froute Mobile Gateway/1.0 (url)
16 | netarkivet
  10 | netarkivet.dk/website/info.html | text/.. | Mozilla/5.0 (compatible; heritrix/1.5.0-200506132127 url)
  3 | netarkivet.dk/website/info.html | image/.. | Mozilla/5.0 (compatible; heritrix/1.12.1b url)
15 | drupal
  9 | drupal.org/ | text/.. | User-Agent: Drupal (url)
  4 | drupal.org/ | text/.. | Drupal (url)
15 | suggy
  15 | blog.suggy.com/was-ist-suggy/suggy-webcrawler/ | text/.. | Mozilla/5.0 (compatible; suggybot v0.01a, url)
15 | ibis
  10 | ibis.ne.jp/browser/about.html | image/.. | Mozilla/4.0 (compatible; ibisBrowser; url)
  3 | ibis.ne.jp/browser/about.html | text/.. | Mozilla/4.0 (compatible; ibisBrowser; url)
15 | apache
  15 | lucene.apache.org/nutch/bot.html | text/.. | NutchCVS/0.7.2 (Nutch; url; mail address )
14 | advertising
  14 | sl.advertising.com | text/.. | Mozilla/5.0 (compatible; AOL Sponsored Listing Contextual Crawler/0.8; url)
14 | 4chat
  14 | www.4chat.tv | text/.. | url
14 | searchtechnologies
  14 | www.searchtechnologies.com | text/.. | Mozilla/5.0 (compatible; heritrix/1.14.3 url)
14 | edu
  12 | ws.nju.edu.cn/falcons/ | text/.. | Mozilla/5.0 (compatible; Falconsbot; url)
14 | rockpeaks
  14 | www.rockpeaks.com/contact | text/.. | RockPeaks/0.1 (url)
13 | superfeedr
  13 | superfeedr.com | application/xml | Superfeedr: Superparser bot/1.1 url - Please read this http://blog.superfeedr.com/publishers.html or get in touch if we are polling too hard
12 | wise-guys
  10 | www.wise-guys.nl/ | text/.. | Mozilla/4.0 (compatible; Vagabondo/4.0/CGM; url)
12 | creativecommons
  12 | wiki.creativecommons.org/Metadata_Scraper | text/.. | CC Metadata Scaper url
11 | sygol
  11 | www.sygol.com | text/.. | SygolBot url
11 | arquivo
  6 | arquivo.pt/faq-crawling | image/.. | Arquivo-web-crawler (compatible; heritrix/1.14.3 url)
  5 | arquivo.pt/faq-crawling | text/.. | Arquivo-web-crawler (compatible; heritrix/1.14.3 url)
10 | linkedin
  6 | www.linkedin.com | image/.. | LinkedInBot/1.0 (compatible; Mozilla/5.0; Jakarta Commons-HttpClient/3.1 url)
  4 | www.linkedin.com | text/.. | LinkedInBot/1.0 (compatible; Mozilla/5.0; Jakarta Commons-HttpClient/3.1 url)
10 | tumblr
  9 | benderthewebrobot.tumblr.com | text/.. | Mozilla/5.0 (compatible; Bender; url)
71,492 | total
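The "Secondary domain (~site) name" groups rows by the site part of the URL; for example, help.yahoo.co.jp and misc.yahoo.com.cn both fall under yahoo. The report does not say how this name is derived. The sketch below is one heuristic that reproduces most of the groupings seen in the table, assuming a hypothetical set of generic second-level labels (co, com, ne) that are skipped under a country-code domain. It then totals the counts per site name, as in the group rows. Both function names are hypothetical.

```python
from collections import Counter

# Hypothetical list of generic second-level labels under a country-code
# domain (yahoo.co.jp, yahoo.com.cn, goo.ne.jp); the report's rule is not documented.
GENERIC_SECOND_LEVEL = {"co", "com", "ne"}


def secondary_domain(url: str) -> str:
    """Derive a site name such as 'yahoo' from a scheme-less URL."""
    host = url.split("/", 1)[0]
    labels = [label for label in host.split(".") if label]
    if len(labels) > 1:
        labels = labels[:-1]  # drop the top-level domain (.com, .jp, ...)
    if len(labels) > 1 and labels[-1] in GENERIC_SECOND_LEVEL:
        labels = labels[:-1]  # drop a generic level: yahoo.co(.jp) -> yahoo
    return labels[-1] if labels else ""


def totals_by_domain(rows):
    """Sum (count, url) rows per site name, largest total first."""
    totals = Counter()
    for count, url in rows:
        totals[secondary_domain(url)] += count
    return totals.most_common()


rows = [
    (9253, "help.yahoo.com/help/us/ysearch/slurp"),
    (145, "misc.yahoo.com.cn/help.html"),
    (4831, "www.bing.com/bingbot.htm"),
]
print(totals_by_domain(rows))  # [('yahoo', 9398), ('bing', 4831)]
```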

Page requests for probable crawlers, recognized by keyword
Count (x 1000) | Agent string
  Count (x 1000) | Mime type (count ≥ 3)
5,095 | PythonWikipediaBot/1.0
  3,711 | application/json
  1,332 | application/xml
  52 | text/..
  1 | -
  1 | image/..
1,768 | GoogleBot-Image/1.0
  1,079 | text/..
  630 | image/..
  59 | -
  1 | application/pdf
1,167 | Peachy MediaWiki Bot API Version 1.0
  1,167 | application/vnd.php.serialized
  1 | text/..
921 | MediaWikiCrawler-Google/2.0 ( mail address )
  919 | text/..
  2 | -
506 | Mozilla/5.0 (Windows; Windows NT 5.1; fr; rv:1.8.1) VoilaBot BETA 1.2 ( mail address )
  506 | text/..
  1 | -
  1 | application/ogg
  1 | application/vnd.php.serialized
433 | php wikibot classes
  423 | application/vnd.php.serialized
  10 | text/..
413 | GoogleBot-Image/1.0
  374 | text/..
  28 | image/..
  11 | application/vnd.php.serialized
  1 | -
359 | LinkParser/2.0
  359 | text/..
294 | spider
  294 | text/..
  1 | image/..
292 | GoogleBot/2.1
  292 | text/..
  1 | image/..
255 | Answersbot
  255 | text/..
251 | Onespot Crawler
  189 | application/json
  59 | text/..
  3 | -
215 | mail address
  212 | application/vnd.php.serialized
  3 | text/..
  1 | application/json
179 | GoogleBot-News
  178 | text/..
  1 | -
  1 | image/..
  1 | application/xml
154 | ClueBot/2.0
  154 | application/vnd.php.serialized
151 | ClueBot/1.1
  151 | application/vnd.php.serialized
  1 | text/..
143 | HTMLParser/2.0
  143 | text/..
  1 | -
135 | wikiwix-bot-3.0
  130 | text/..
  5 | image/..
  1 | -
112 | Opera/8.01 (J2ME/MIDP; MXit WebBot/1.3.1.0) Opera Mini/3.1
  95 | application/vnd.wap.xhtml+xml
  9 | image/..
  8 | text/..
  1 | -
109 | DotNetWikiBot/2.81 (Microsoft Windows NT 6.1.7601 Service Pack 1; )
  89 | text/..
  18 | application/xml
  2 | image/..
103 | MoovidaBot/0.1
  103 | text/..
98 | Mozilla/5.0 (compatible; Ezooms/1.0; mail address )
  96 | text/..
  1 | image/..
  1 | application/ogg
  1 | application/xml
  1 | application/vnd.php.serialized
  1 | audio/midi
91 | HTMLParser/1.6
  74 | text/..
  17 | application/json
78 | Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (Exabot-Thumbnails)
  54 | image/..
  24 | text/..
  1 | -
  1 | application/json
  1 | application/x-javascript
67 | Test Webbot
  67 | text/..
66 | gosospider "Mozilla/5.0
  66 | text/..
  1 | -
  1 | application/xml
  1 | application/ogg
66 | YBot/0.1
  66 | application/vnd.php.serialized
59 | TVersity Media Robot
  59 | text/..
57 | SiocWikiBot/1.0
  53 | application/vnd.php.serialized
  4 | text/..
53 | Pywikipediabot/2.0
  53 | application/json
49 | ROCKMELT-BOT
  49 | application/xml
  1 | text/..
45 | CorenSearchBot/1.5 en libwww-perl/5.834
  45 | text/..
43 | COMODOspider/Nutch-1.0
  43 | text/..
  1 | image/..
  1 | video/ogg
41 | Mozilla/5.0 (SnapPreviewBot) Gecko/20061206 Firefox/1.5.0.9
  33 | image/..
  8 | text/..
  1 | application/json
  1 | application/x-javascript
39 | PhiloBot/0.1
  39 | text/..
38 | Mozilla/5.0 (X11; Linux i686; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7 SnapPreviewBot
  38 | text/..
  1 | -
38 | UCMore Crawler App
  38 | text/..
37 | Mozilla/5.0 (compatible; SnapPreviewBot; en-US; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9
  37 | text/..
  1 | -
35 | AnomieBOT 1.0 (TagDater)
  35 | application/json
  1 | text/..
35 | MediaWiki::Bot/3.2.6
  35 | application/json
  1 | -
  1 | text/..
33 | MLBot (www.metadatalabs.com/mlbot)
  23 | text/..
  10 | application/vnd.php.serialized
33 | .NET Client Parser
  33 | application/xml
  1 | text/..
33 | SineBot/1.5.17(User:SineBot)
  32 | application/vnd.php.serialized
  1 | text/..
31 | PyCrawler
  31 | text/..
28 | DotNetWikiBot/2.97 (Unix 5.10.0.0; )
  28 | application/xml
  1 | text/..
28 | crawler4j
  28 | text/..
  1 | image/..
  1 | application/xml
26 | VWBot - CorenSearchBot/1.5 en derivative
  26 | text/..
25 | OhmsLawBot
  25 | text/..
23 | AnomieBOT 1.0 (ReplaceExternalLinks2)
  23 | application/json
  1 | text/..
23 | php WalkingSoulBot
  23 | application/vnd.php.serialized
  1 | text/..
23 | FAST Enterprise Crawler 6 used by Microsoft ( mail address )
  23 | text/..
22 | GoogleBot
  22 | text/..
  1 | image/..
22 | HRoestBot, de-wikipedia using pywikipedia framework
  10 | application/json
  8 | application/xml
  4 | text/..
21 | DotNetWikiBot/2.97 (Microsoft Windows NT 6.1.7600.0; )
  21 | text/..
  1 | application/xml
20 | COIBot/1.00
  20 | text/..
19 | Mozilla/5.0 MaboMwFramework/1.1 (w:de:MerlIwBot)
  19 | text/..
19 | DotNetWikiBot/2.7 (Microsoft Windows NT 6.1.7600.0; )
  18 | text/..
  1 | image/..
18 | python-wikitools/1.2 (User:Mr.Z-bot)
  18 | application/json
18 | COIBot/2.0
  18 | text/..
17 | Peachy MediaWiki Bot API Version 0.1beta
  17 | application/vnd.php.serialized
17 | Tawbot (public svn release; plwiki)
  17 | text/..
16 | Twitterbot/0.1
  16 | text/..
  1 | -
  1 | image/..
15 | wikbot/1.0 CFNetwork/485.13.9 Darwin/11.0.0
  10 | image/..
  5 | application/json
  1 | text/..
15 | ibo2bot
  15 | text/..
15 | AnomieBOT 1.0 (BAGBot)
  9 | application/json
  6 | text/..
15 | MediaWiki::Bot/3.1.6 (User:SporkBot)
  15 | application/json
14 | Friendly Spider 1.0 contact mail address
  14 | text/..
14 | Mozilla/5.0 QunarBot/1.0
  14 | text/..
  1 | -
14 | AnomieBOT 1.0 (OrphanReferenceFixer)
  14 | application/json
14 | AnomieBOT 1.0 (TemplateSubster)
  14 | application/json
13 | Mozilla/5.0 (compatible; PaperLiBot/2.1)
  13 | text/..
  1 | image/..
12 | DotNetWikiBot/2.96 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
  12 | text/..
  1 | application/xml
12 | SiteSeekerCrawler/1.0
  11 | text/..
  1 | -
12 | FAST Enterprise Crawler 6 used by viaapia (viaapia)
  12 | text/..
  1 | -
11 | ReadonlyBot
  11 | text/..
11 | TrueKnowledgeBot bot mail address >
  6 | application/vnd.php.serialized
  5 | application/xml
11 | ~Bot ([[:fr:w:User:TildeBot]] by [[:fr:w:User:Alphos]] mail address )
  11 | text/..
11 | DotNetWikiBot/2.97 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
  11 | text/..
  1 | application/xml
10 | Twitterbot/1.0
  10 | text/..
  1 | -
  1 | image/..
10 | SurakWare MediaWiki Bot/1.0
  10 | text/..
9 | ('python-wikitools/1.2 (User:BernsteinBot)',)
  9 | application/json
9 | MystBot/1.5 fr libwww-perl/6.02
  9 | text/..
8 | TheKeens bot
  8 | text/..
8 | phpAPIbot 0.1
  7 | application/vnd.php.serialized
  1 | text/..
  1 | -
8 | feedbot/0.0.1
  8 | text/..
8 | infraEnterprise v8 Web Crawler
  8 | -
  1 | text/..
8 | DotNetWikiBot/2.96 (Microsoft Windows NT 6.1.7601 Service Pack 1; )
  7 | text/..
  1 | application/xml
  1 | image/..
8 | CheMoBot/1.00
  8 | text/..
7 | AnomieBOT 1.0 (AFDMergeFromCleaner)
  7 | application/json
7 | SiocWikiBot
  7 | text/..
7 | ('python-wikitools/1.2 (User:LaraBot)',)
  7 | application/json
6 | User-Agent: MyWikiBot/0.2
  6 | image/..
6 | Oxyme.Search - Web crawler
  6 | text/..
  1 | application/xml
  1 | application/x-external-editor
6 | XLinkBot/1.00
  6 | text/..
6 | MR Crawler/Nutch-1.3
  6 | text/..
5 | Citation_bot; mail address
  5 | text/..
5 | DotNetWikiBot/2.97 (Microsoft Windows NT 6.1.7601 Service Pack 1; )
  5 | text/..
  1 | application/xml
5 | KAZ.KZ Spider
5 text/..
5AnomieBOT 1.0 (DeletionSortingCleaner)
5 application/json
5bitlybot
5 text/..
1 -
1 image/..
1 application/ogg
5Handelabra WikiBot
4 application/vnd.php.serialized
1 text/..
5DotNetWikiBot/2.96 (Microsoft Windows NT 6.1.7600.0; )
5 text/..
1 application/xml
1 image/..
5vspider
5 text/..
5Mozilla/5.0 (compatible; Birubot/1.0) Gecko/2009032608 Firefox/3.0.8
5 text/..
1 -
1 image/..
5DotNetWikiBot/2.9 (Unix 5.10.0.0; )
5 text/..
4Soundkiosk Relation-Crawler (Version 1.0; soundkiosk.de)
4 application/xml
4TwengaBot-Discover
3 image/..
1 text/..
1 -
4Mozilla 5.0 (Apibot 0.30b5)
4 application/vnd.php.serialized
4JavaCrawler/1.1
4 text/..
4Freebase Deathbot
4 text/..
4MediaWiki::Bot/3.1.6 (User:Plasticspork)
4 application/json
4Silverbot/1.0 (https://github.com/thesilvervestgroup/silverbot)
2 application/json
2 text/..
4 mail address (Mozilla compatible)
4 text/..
1 image/..
4unblockbot/1.00
4 text/..
4Geni ircpybot 1.0
2 application/json
2 text/..
1 application/xml
4Mozilla/5.0 (Bgbot 0.5)
4 text/..
3Jabse.com Crawler v.2.0 www.jabse.com/crawler.php
3 text/..
3CheckBot/1.0
3 text/..
3TextBot 0.3
3 text/..
1 -
3Perfect Search Crawler
3 text/..
1 application/rsd+xml
3Inlibris.com XMLBot/1.0
3 text/..
3Mozilla/5.0 (compatible; Unknown; ; crawler at example dot com)
3 text/..
3AniBot/0.9 php/curl
3 application/vnd.php.serialized
1 -
3ReapETbot/1.0.0 (incompatible-notwebbrowser:robot:exclusion-noncompliant) bot>
3 text/..
3robert bot
3 text/..
3DotNetWikiBot/2.96 (Unix 5.10.0.0; )
2 application/xml
1 text/..
3AnomieBOT 1.0 (RandomPagePicker)
3 application/json
3wikbot/1.1 CFNetwork/485.13.9 Darwin/11.0.0
2 image/..
1 application/json
1 text/..
3Jyxobot/1
3 text/..
15,080 total

IP ranges: known IP ranges for Google are 64.233.[160.0-191.255], 66.249.[64.0-95.255], 66.102.[0.0-15.255], 72.14.[192.0-255.255], 74.125.[0.0-255.255], 209.85.[128.0-255.255], 216.239.[32.0-63.255] and a few other minor subranges.
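These ranges are what separate genuine GoogleBot requests from spoofed ones in the color coding above. The check could be sketched roughly as below. This is a minimal illustration, not the script that produced this report.

Assumptions:
- Only the seven listed ranges are included; the "few other minor subranges" are unknown and left out.
- A request counts as GoogleBot when its user agent contains "googlebot" (case-insensitive).
- The function names `is_google_ip` and `classify_googlebot` are hypothetical.

```python
import ipaddress

# Google ranges as listed in this report (June 2011). The "few other minor
# subranges" mentioned in the text are NOT included (assumption: unknown).
GOOGLE_RANGES = [
    ("64.233.160.0", "64.233.191.255"),
    ("66.249.64.0", "66.249.95.255"),
    ("66.102.0.0", "66.102.15.255"),
    ("72.14.192.0", "72.14.255.255"),
    ("74.125.0.0", "74.125.255.255"),
    ("209.85.128.0", "209.85.255.255"),
    ("216.239.32.0", "216.239.63.255"),
]

# Turn each first-last pair into CIDR networks once, up front.
GOOGLE_NETWORKS = [
    net
    for first, last in GOOGLE_RANGES
    for net in ipaddress.summarize_address_range(
        ipaddress.IPv4Address(first), ipaddress.IPv4Address(last)
    )
]


def is_google_ip(ip: str) -> bool:
    """Return True if ip lies inside one of the listed Google ranges."""
    addr = ipaddress.ip_address(ip)
    # IPv6 addresses never match an IPv4 network, so they return False.
    return any(addr in net for net in GOOGLE_NETWORKS)


def classify_googlebot(user_agent: str, ip: str) -> str:
    """Hypothetical labelling: a GoogleBot user agent is 'genuine' only when
    the request comes from a listed Google range, otherwise 'spoofed'."""
    if "googlebot" not in user_agent.lower():
        return "other"
    return "genuine" if is_google_ip(ip) else "spoofed"


if __name__ == "__main__":
    agent = "Mozilla/5.0 (compatible; GoogleBot/2.1; url)"
    print(classify_googlebot(agent, "66.249.66.1"))  # genuine
    print(classify_googlebot(agent, "192.0.2.10"))   # spoofed
```

Using `ipaddress.summarize_address_range` keeps the table of ranges in the same first-last form as the note above, rather than hand-converting each range to CIDR notation.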

Generated on Fri, Jul 15, 2011 22:17
Author: Erik Zachte
Mail: ezachte@### (no spam: ### = wikimedia.org)
All data and images on this page are in the public domain.