Wikimedia Traffic Analysis Report - Crawler requests

Daily averages, based on sample period: 1 Sep 2011 - 30 Sep 2011

 This analysis is based on a 1:1000 sampled server log (squids) ⇒ all counts x 1000.
The WMF traffic logging service suffered from server capacity problems in August and September 2011.
Absolute traffic counts for September 2011 are approximately 19% too low.
Data loss occurred only during peak hours. It may therefore have affected traffic from different parts of the world differently,
and may also have skewed relative figures such as the share of traffic per browser or operating system.
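
The arithmetic behind these two notes can be sketched as follows. This is a minimal illustration, not the report's own tooling: the function names are hypothetical, and it assumes that "approximately 19% too low" means the logged total is about 81% of the true total. Because data loss was concentrated in peak hours, a uniform correction like this is only a rough estimate.

```python
SAMPLING_FACTOR = 1000   # the log keeps 1 request in 1000
ASSUMED_LOSS = 0.19      # assumption: logged total = (1 - 0.19) x true total


def scale_sampled(sampled_lines: int) -> int:
    """Estimate the real request count from a sampled line count."""
    return sampled_lines * SAMPLING_FACTOR


def correct_for_loss(observed: float, loss: float = ASSUMED_LOSS) -> float:
    """Undo a uniform undercount; crude, since loss hit peak hours only."""
    return observed / (1.0 - loss)


if __name__ == "__main__":
    observed = scale_sampled(330_454)   # hypothetical sampled line count
    print(observed)
    print(round(correct_for_loss(observed)))
```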

The following overview of crawler (aka bot) page requests is based on the user agent information that accompanies most server requests. Unfortunately this user agent information follows rather loosely defined guidelines.
Also please bear in mind that the most popular crawler names may be somewhat overrepresented. This is the result of so-called user agent spoofing (where a requester supplies false credentials, e.g. to bypass web server filters).
GoogleBot seems to be a favorite for spoofing. Therefore requests from an IP address registered by Google (see below) are color coded GoogleBot, others GoogleBot.
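
A check of this kind could look roughly like the sketch below. It is an illustration under stated assumptions, not the report's actual method. The network list holds placeholder documentation ranges, not real Google address blocks. The function name is hypothetical. The labels mirror the `google` and `google?` group names in the table, on the assumption that `google?` marks requests that claim to be GoogleBot but do not come from a Google address.

```python
import ipaddress

# Placeholder ranges (reserved documentation networks), NOT real Google blocks.
# A real check would load the address blocks actually registered by Google.
ASSUMED_GOOGLE_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def classify_googlebot(user_agent: str, client_ip: str) -> str:
    """Return 'google', 'google?' or 'not-googlebot' for one request."""
    if "googlebot" not in user_agent.lower():
        return "not-googlebot"
    addr = ipaddress.ip_address(client_ip)
    # Membership is False when the address and network versions differ.
    if any(addr in net for net in ASSUMED_GOOGLE_NETWORKS):
        return "google"      # claimed GoogleBot from a listed address
    return "google?"         # claimed GoogleBot from elsewhere, possibly spoofed


if __name__ == "__main__":
    ua = "Mozilla/5.0 (compatible; GoogleBot/2.1; url)"
    print(classify_googlebot(ua, "192.0.2.10"))
    print(classify_googlebot(ua, "198.51.100.7"))
```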

For this report page requests are considered to be issued by a crawler in two cases:
1 The user agent string contains a web address. Only crawlers should have that, but there are some false positives where a browser sends a user agent string with a web address (ill-behaved plug-ins; the main offenders have been eliminated).
2 The user agent string contains the term bot, spider or crawl[er].
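
The two rules above can be sketched as a small classifier. This is a minimal illustration with hypothetical names, not the report's actual script. It assumes a "web address" means a string with an `http://` or `https://` scheme. That is an inference from the keyword table, where agents such as `Webwiki Search Engine Bot - www.webwiki.de` appear, suggesting that a bare `www.` did not count as a web address.

```python
import re
from typing import Optional

# Assumption: a web address is recognized by its scheme.
URL_PATTERN = re.compile(r"https?://", re.IGNORECASE)
# "crawl" also matches "crawler", covering the crawl[er] rule.
KEYWORD_PATTERN = re.compile(r"bot|spider|crawl", re.IGNORECASE)


def crawler_reason(user_agent: str) -> Optional[str]:
    """Return why an agent counts as a crawler, or None if it does not."""
    if URL_PATTERN.search(user_agent):
        return "url"        # rule 1: agent string contains a web address
    if KEYWORD_PATTERN.search(user_agent):
        return "keyword"    # rule 2: agent string contains bot/spider/crawl
    return None


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
        "PythonWikipediaBot/1.0",
        "Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20100101 Firefox/6.0",
    ]
    for ua in samples:
        print(crawler_reason(ua), "<-", ua)
```

Rule 1 is tested first, so an agent that matches both rules is reported under the URL table, as in this report.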

In total 54,611,000 page requests (mime type text/html only!) per day are considered crawler requests, out of 330,454,000 external requests, which is 16.5%
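
The share quoted above can be rechecked with a few lines. This is only a worked example of the arithmetic; the function name is hypothetical.

```python
def crawler_share(crawler_requests: int, external_requests: int) -> float:
    """Fraction of external requests that are attributed to crawlers."""
    return crawler_requests / external_requests


share = crawler_share(54_611_000, 330_454_000)
print(f"{share:.1%}")  # prints 16.5%
```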

Page requests for crawlers that specify a url in the agent string
Count (x 1000) | Secondary domain (~site) name | URL | Mime type | User agent
19,965google
16,033 www.google.com/bot.htmltext/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
1,025 www.google.com/bot.html-Mozilla/5.0 (compatible; GoogleBot/2.1; url)
636 www.google.com/bot.htmltext/..DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
589 www.google.com/bot.htmlimage/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
418 desktop.google.com/image/..Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; url)
178 code.google.com/appenginetext/..AppEngine-Google; (url; appid: ortografia4)
90 www.google.com/feedfetcher.html-FeedFetcher-Google; (url)
84 desktop.google.com/application/xmlMozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; url)
77 www.google.com/bot.htmltext/..SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
74 www.google.com/feedfetcher.htmlimage/..Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
74 code.google.com/appenginetext/..AppEngine-Google; (url; appid: rarplayer)
70 code.google.com/appengineapplication/jsonAppEngine-Google; (url; appid: s~redconceptual)
59 www.google.com/feedfetcher.htmlapplication/xmlFeedFetcher-Google; (url)
58 code.google.com/appenginetext/..AppEngine-Google; (url; appid: wikien4)
50 www.google.com/feedfetcher.htmltext/..Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
47 code.google.com/appenginetext/..AppEngine-Google; (url; appid: ortopedianew)
39 www.google.com/feedfetcher.htmlapplication/jsonMozilla/5.0 (compatible) FeedFetcher-Google; (url)
37 code.google.com/appenginetext/..AppEngine-Google; (url; appid: wikien3)
31 code.google.com/appengineimage/..AppEngine-Google; (url; appid: tinysrc)
19 code.google.com/appengineapplication/xmlAppEngine-Google; (url; appid: wikipedia-raw)
18 www.google.com/feedfetcher.htmltext/..FeedFetcher-Google; (url)
17 desktop.google.com/text/..Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; url)
14 www.google.com/coop/cse/creftext/..FeedFetcher-Google-CoOp; (url)
14 code.google.com/p/crawler4j/text/..crawler4j (url)
14 www.google.com/feedfetcher.htmlapplication/xmlMozilla/5.0 (compatible) FeedFetcher-Google; (url)
13 code.google.com/appenginetext/..AppEngine-Google; (url; appid: mygpxy)
12 code.google.com/appenginetext/..AppEngine-Google; (url; appid: boxapp)
11 code.google.com/appenginetext/..AppEngine-Google; (url; appid: s~expinia-wiki)
10 www.google.com/bot.htmltext/..GoogleBot/2.1 (url)
8 code.google.com/appenginetext/..WikiBot/0.1 AppEngine-Google; (url; appid: newikipedia)
7 code.google.com/appenginetext/..AppEngine-Google; (url; appid: s~harunakaze)
7 www.google.com/bot.htmlimage/..DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
6 code.google.com/appengineapplication/jsonMWBOT GAE Edition AppEngine-Google; (url; appid: philip-bot)
5 code.google.com/appenginetext/..AppEngine-Google; (url; appid: wiki-crawler01)
5 code.google.com/appengineimage/..AppEngine-Google; (url; appid: d24-img)
5 code.google.com/appenginetext/..AppEngine-Google; (url; appid: my-reg)
5 code.google.com/appenginetext/..AppEngine-Google; (url; appid: retimeme)
5 code.google.com/appenginetext/..AppEngine-Google; (url; appid: wiki-crawler03)
5 code.google.com/appenginetext/..AppEngine-Google; (url; appid: d24-img)
4 code.google.com/appenginetext/..AppEngine-Google; (url; appid: wiki-crawler00)
4 code.google.com/appenginetext/..AppEngine-Google; (url; appid: wiki-crawler04)
4 code.google.com/appenginetext/..AppEngine-Google; (url; appid: wiki-crawler02)
4 code.google.com/appenginetext/..www.productontology.org/1.0 (Contact: mail address ) AppEngine-Google; (url; appid: gr4bing)
4 code.google.com/appengineapplication/jsonAppEngine-Google; (url; appid: s~sp-echo)
3 code.google.com/appenginetext/..AppEngine-Google; (url; appid: finchproxy)
3 code.google.com/appenginetext/..AppEngine-Google; (url; appid: kbworld24)
3 www.google.com/bot.htmlapplication/xmlMozilla/5.0 (compatible; GoogleBot/2.1; url)
3 docs.google.comimage/..Mozilla/5.0 (compatible; GoogleDocs; documents; url)
13,770yahoo
8,735 help.yahoo.com/help/us/ysearch/slurptext/..Mozilla/5.0 (compatible; Yahoo! Slurp; url)
3,047 help.yahoo.com/help/us/ysearch/slurptext/..Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
1,577 help.yahoo.com/help/us/ysearch/slurpimage/..Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
118 misc.yahoo.com.cn/help.htmltext/..Mozilla/5.0 (compatible; Yahoo! Slurp China; url)
66 listing.yahoo.co.jp/support/faq/int/other/other_001.htmltext/..Y!J-BRJ/YATS crawler (url)
41 developer.yahoo.com/yql/providertext/..Mozilla/5.0 (compatible; Yahoo Pipes 2.0; url) Gecko/20090729 Firefox/3.5.2
36 help.yahoo.com/help/us/ysearch/slurptext/..Mozilla/5.0 (compatible; Yahoo! DE Slurp; url)
21 help.yahoo.co.jp/help/jp/search/indexing/indexing-15.htmlimage/..'Mozilla/5.0 (compatible; Y!J SearchMonkey/1.0 (Y!J-AGENT; url))'
17 help.yahoo.com/help/us/ysearch/slurpapplication/oggMozilla/5.0 (compatible; Yahoo! Slurp; url)
17 help.yahoo.com/help/us/ysearch/slurpapplication/vnd.php.serializedMozilla/5.0 (compatible Yahoo! Slurp/3.0 url)
15 help.yahoo.com/help/us/ysearch/slurp-Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
14 help.yahoo.co.jp/help/jp/search/indexing/indexing-15.htmltext/..Y!J-BRI/0.0.1 crawler ( url )
13 help.yahoo.com/help/us/ysearch/slurp-Mozilla/5.0 (compatible; Yahoo! Slurp; url)
10 help.yahoo.co.jp/help/jp/search/indexing/indexing-15.htmltext/..Y!J-BRW/1.0 crawler (url)
10 help.yahoo.com/help/us/ysearch/slurpimage/..Mozilla/5.0 (compatible; Yahoo! Slurp; url)
8 misc.yahoo.com.cn/help.html-Mozilla/5.0 (compatible; Yahoo! Slurp China; url)
7 help.yahoo.co.jp/help/jp/search/indexing/indexing-15.htmltext/..'Mozilla/5.0 (compatible; Y!J SearchMonkey/1.0 (Y!J-AGENT; url))'
5 help.yahoo.co.jp/help/jp/search/indexing/indexing-15.htmltext/..Y!J-BRT/1.0 crawler (url)
4 help.yahoo.com/help/us/ysearch/slurpapplication/jsonMozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
3 help.yahoo.com/help/us/ysearch/slurpapplication/x-javascriptMozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
11,249facebook
9,182 www.facebook.com/externalhit_uatext.phpimage/..facebookexternalhit/1.0 (url)
1,836 www.facebook.com/externalhit_uatext.phptext/..facebookexternalhit/1.0 (url)
170 www.facebook.com/externalhit_uatext.phptext/..facebookexternalhit/1.1 (url)
53 developers.facebook.comimage/..facebookplatform/1.0 (url)
7 www.facebook.com/externalhit_uatext.phpimage/..facebookexternalhit/1.1 (url)
5,470bing
3,551 www.bing.com/bingbot.htmtext/..Mozilla/5.0 (compatible; bingbot/2.0; url)
1,896 www.bing.com/bingbot.htm-Mozilla/5.0 (compatible; bingbot/2.0; url)
10 www.bing.com/bingbot.htmapplication/vnd.php.serializedMozilla/5.0 (compatible; bingbot/2.0; url)
4 www.bing.com/bingbot.htmapplication/xmlMozilla/5.0 (compatible; bingbot/2.0; url)
3 www.bing.com/bingbot.htmimage/..Mozilla/5.0 (compatible; bingbot/2.0; url)
3 www.bing.com/bingbot.htmtext/..Mozilla/5.0 (compatible; bingbot/2.0; url) ASProxy/5.5b3
4,742google?
4,154 www.google.com/bot.htmltext/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
360 www.google.com/bot.htmlapplication/vnd.php.serializedMozilla/5.0 (compatible; GoogleBot/2.1; url)
135 www.google.com/bot.htmltext/..GoogleBot/2.1 (url)
42 www.google.com/bot.html-Mozilla/5.0 (compatible; GoogleBot/2.1; url)
22 www.google.com/bot.htmlimage/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
7 www.google.com/bot.htmltext/..Mozilla/5.0(compatible;GoogleBot/2.1;url)
6 www.google.com/bot.htmltext/..DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
5 www.google.com/bot.htmltext/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
3 www.google.com/bot.htmltext/..Mozilla/5.0 (compatible; GoogleBot/2.1; url) ASProxy/5.5b5
1,803naver
1,758 help.naver.com/robots/text/..Yeti/1.0 (NHN Corp.; url)
21 help.naver.com/robots/image/..Yeti/1.0 (NHN Corp.; url)
12 help.naver.com/customer_webtxt_02.jsptext/..Mozilla/4.0 (compatible; NaverBot/1.0; url)
7 help.naver.com/robots/text/..Yeti/1.0 (NHN Corp.; url) ASProxy/5.5b5
3 help.naver.com/robots/text/..Yeti/1.0 (NHN Corp.; url),gzip(gfe) (via translate.google.com)
1,745baidu
1,530 www.baidu.com/search/spider.htmltext/..Mozilla/5.0 (compatible; Baiduspider/2.0; url)
83 www.baidu.com/search/spider.htmtext/..Baiduspider-image(url)
51 www.baidu.com/search/spider.html-Mozilla/5.0 (compatible; Baiduspider/2.0; url)
48 www.baidu.com/search/spider.htmlapplication/vnd.php.serializedMozilla/5.0 (compatible; Baiduspider/2.0; url)
12 www.baidu.com/search/spider.htmtext/..Baiduspider(url)
9 www.baidu.com/search/spider.htmlimage/..Mozilla/5.0 (compatible; Baiduspider/2.0; url)
3 www.baidu.com/search/spider.htmltext/..Mozilla/5.0 (compatible; Baiduspider/2.0; url) ASProxy/5.5b5
1,526yandex
1,236 yandex.com/botstext/..Mozilla/5.0 (compatible; YandexBot/3.0; url)
169 yandex.com/botstext/..Mozilla/5.0 (compatible; YandexImages/3.0; url)
60 yandex.com/botsimage/..Mozilla/5.0 (compatible; YandexImages/3.0; url)
40 yandex.com/bots-Mozilla/5.0 (compatible; YandexBot/3.0; url)
11 yandex.com/botsimage/..Mozilla/5.0 (compatible; YandexImageResizer/2.0; url)
4 yandex.com/botstext/..Mozilla/5.0 (compatible; YandexDirect/3.0; url)
1,514msn
600 search.msn.com/msnbot.htmtext/..msnbot/2.0b (url)._
255 search.msn.com/msnbot.htmtext/..msnbot/2.0b (url)
255 search.msn.com/msnbot.htmtext/..msnbot-Products/1.0 (url)
177 search.msn.com/msnbot.htmtext/..msnbot-NewsBlogs/2.0b (url)
110 search.msn.com/msnbot.htmtext/..msnbot-media/1.1 (url)
97 search.msn.com/msnbot.htmimage/..msnbot-media/1.1 (url)
6 search.msn.com/msnbot.htmtext/..msnbot-UDiscovery/2.0b (url)
5 search.msn.com/msnbot.htmtext/..msnbot/2.0b (url)._ (via Web-Blaster/2.21 (http://www.assoziations-blaster.de/web-blast.html))
4 search.msn.com/msnbot.htmtext/..msnbot/0.01 (url)
413sentymetr
209 sentymetr.pl/bot.htmlapplication/jsonMozilla/5.0 (compatible; SentymetrBot 1.0; url)
204 sentymetr.pl/bot.htmltext/..Mozilla/5.0 (compatible; SentymetrBot 1.0; url)
392www.
247 www.text/..GoogleBot-Image/1.0 ( urlGoogleBot.com/bot.html)
88 www.text/..GoogleBot/2.1 (urlGoogleBot.com/bot.html)
54 www.text/..GoogleBot/2.1 ( urlGoogleBot.com/bot.html)
3 www.image/..GoogleBot/2.1 (urlGoogleBot.com/bot.html)
365 80legs
314 www.80legs.com/webcrawler.htmltext/..Mozilla/5.0 (compatible; 008/0.83; url) Gecko/2008032620
42 www.80legs.com/webcrawler.htmlimage/..Mozilla/5.0 (compatible; 008/0.83; url) Gecko/2008032620
9 www.80legs.com/webcrawler.htmltext/..Mozilla/5.0 (compatible; 008/0.83; url;) Gecko/2008032620
363traslated
363 mymemory.traslated.net/doc/text/..Mozilla/5.0 (MyMemory Bot url)
318youdao
305 www.youdao.com/help/webmaster/spider/text/..Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
5 toolbar.youdao.com/image/..Youdao Toolbar (url)
4 www.youdao.com/help/webmaster/spider/text/..Mozilla/5.0 (compatible;YodaoBot-Image/1.0;url;)
288php
182 pear.php.net/application/vnd.php.serializedPEAR HTTP_Request class ( url )
39 pear.php.net/application/xmlPEAR HTTP_Request class ( url )
34 pear.php.net/package/http_request2text/..HTTP_Request2/0.5.2 (url) PHP/5.2.17
25 pear.php.net/text/..PEAR HTTP_Request class ( url )
7 pear.php.net/image/..PEAR HTTP_Request class ( url )
287sblog
155 fulltext.sblog.cz/screenshot/image/..Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
65 fulltext.sblog.cz/text/..SeznamBot/3.0 (url)
34 fulltext.sblog.cz/text/..SeznamBot/3.0-test (url)
28 fulltext.sblog.cz/screenshot/text/..Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
3 fulltext.sblog.cz/-SeznamBot/3.0 (url)
277exabot
272 www.exabot.com/go/robottext/..Mozilla/5.0 (compatible; Exabot/3.0; url)
5 www.exabot.com/go/robot-Mozilla/5.0 (compatible; Exabot/3.0; url)
221fox
219 shoulu.fox.com/spider.htmltext/..FoxSpider Mozilla/5.0 (compatible; FoxSpider; url)
211entireweb
206 www.entireweb.com/about/search_tech/speedy_spider/text/..Mozilla/5.0 (Windows; Windows NT 5.1; en-US) Speedy Spider (url)
191toolserver
86 wiki.toolserver.org/view/GeoHacktext/..Geohack (url)
60 toolserver.org/~dispenser/text/..WebWikipedia Python (url)
37 toolserver.org/~bayo/text/..LudoThecaire/1.0 (url)
3 toolserver.org/~para/cgi-bin/kmlexporttext/..url libwww-perl/6.02
3 toolserver.org/~guandalug/application/vnd.php.serializedGuandalugs PHPWikiBot/1.1 (url;de:User:Guandalug)
181wikipedia
51 en.wikipedia.org/wiki/User:NicoV/Wikipedia_Cleaner/Documentationtext/..WikiCleaner (url)
44 en.wikipedia.org/wiki/Wikipedia:Huggletext/..Huggle/2.1.16.0 url
40 en.wikipedia.org/wiki/Wikipedia:Huggletext/..Huggle/2.1.14.0 url
22 en.wikipedia.org/wiki/Wikipedia:Huggletext/..Huggle/2.1.15.0 url
8 en.wikipedia.orgtext/..url
7 en.wikipedia.org/wiki/Wikipedia:Huggletext/..Huggle/2.1.1.0 url
3 en.wikipedia.org/wiki/Wikipedia:Huggletext/..Huggle/2.1.13.0 url
176soso
165 help.soso.com/webspider.htmtext/..Sosospider(url)
5 help.soso.com/webspider.htm-Sosospider(url)
4 help.soso.com/soso-image-spider.htmimage/..Sosoimagespider(url)
174sogou
159 www.sogou.com/docs/help/webmasters.htm#07text/..Sogou web spider/4.0(url)
5 www.sogou.com/docs/help/webmasters.htm#07-Sogou web spider/4.0(url)
5 www.sogou.com/docs/help/webmasters.htm#07application/vnd.php.serializedSogou web spider/4.0(url)
160majestic12
159 www.majestic12.co.uk/bot.php?text/..Mozilla/5.0 (compatible; MJ12bot/v1.4.0; url)
137enwp
118 enwp.org/User:SDPatrolBottext/..SDPatrolBot (url)
13 enwp.org/User:KingpinBottext/..KingpinBot (url)
5 enwp.org/User:H3llkn0wz/WikiSharpAPItext/..WikiSharpAPI/0.3 url (C# .NET)
136archive
95 www.archive.org/details/archive.org_bottext/..Mozilla/5.0 (compatible; archive.org_bot url)
37 www.archive.org/details/archive.org_bottext/..Mozilla/5.0 (compatible; heritrix/3.1.0-SNAPSHOT-20110927.173523 url)
132mediawiki
129 www.mediawiki.org/text/..MediaWiki OAI Harvester 0.2 (url)
3 www.mediawiki.org/text/..MediaWiki OAI Harvester 0.2 (url) (client id: nttr.co.jp; experimental)
127wwwgogetpapers
105 wwwgogetpapers.com/application/jsonUser-Agent: GoGetPapersBot (url)
22 wwwgogetpapers.com/text/..User-Agent: GoGetPapersBot (url)
114archive-it
75 archive-it.org/files/site-owners.htmlimage/..Mozilla/5.0 (compatible;archive.org_bot; Archive-It; url) Firefox/0.0
38 archive-it.org/files/site-owners.htmltext/..Mozilla/5.0 (compatible;archive.org_bot; Archive-It; url) Firefox/0.0
113yacy
47 yacy.net/bot.htmltext/..yacybot (sciencenet-any; amd64 Linux 2.6.35-30-generic; java 1.6.0_20; Europe/en) url
13 yacy.net/bot.htmltext/..yacybot (freeworld/global; amd64 Linux 2.6.38-11-generic; java 1.6.0_22; Europe/en) url
12 yacy.net/bot.htmltext/..yacybot (freeworld/global; amd64 Linux 2.6.31-gentoo-r6; java 1.6.0_17; Etc/en) url
8 yacy.net/bot.htmltext/..yacybot (freeworld/global; amd64 Linux 2.6.38-11-server; java 1.6.0_22; America/en) url
5 yacy.net/bot.htmltext/..yacybot (sciencenet-any; amd64 Linux 2.6.32-33-generic; java 1.6.0_20; Europe/en) url
4 yacy.net/bot.htmltext/..yacybot (freeworld/global; amd64 Windows 7 6.1; java 1.7.0; Europe/de) url
4 yacy.net/bot.htmltext/..yacybot (freeworld/global; amd64 Linux 2.6.32-33-server; java 1.6.0_20; Europe/de) url
108admincheats
108 admincheats.comtext/..WordPress/3.2.1; url
96echonest
76 the.echonest.com/reader/application/xmlnestReader/0.3 (discovery; url; reader at echonest.com)
20 the.echonest.com/reader/text/..nestReader/0.3 (discovery; url; reader at echonest.com)
92sf
32 liferea.sf.net/text/..Liferea/1.x.x (Linux; es_ES.UTF-8; url)
30 liferea.sf.net/text/..Liferea/0.x.x (Linux; en_US.UTF-8; url)
29 magpierss.sf.nettext/..MagpieRSS/0.7x (url)
88wordpress
10 arthur2rcasc.wordpress.comtext/..WordPress/MU; url
5 stradivariusconcerti.wordpress.comtext/..WordPress/MU; url
4 eof737.wordpress.comtext/..WordPress/MU; url
3 curtisnarimatsu.wordpress.comtext/..WordPress/MU; url
3 klima47.wordpress.comtext/..WordPress/MU; url
86FeedBurner
85 www.FeedBurner.comtext/..FeedBurner/1.0 (url)
85bsurprised
85 bsurprised.com/text/..BSurprised WikiBox 0.1.3 (url)
82federatedmedia
80 federatedmedia.nettext/..Mozilla/5.0 (url) Gecko/20061208 Firefox/2.0.0.1
77covario
77 www.covario.com/idstext/..Covario-IDS/1.0 (Covario; url; mail address )
74flipboard
38 flipboard.com/browserproxyimage/..Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6 (FlipboardProxy/0.0.5; url)
18 flipboard.com/browserproxytext/..Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6 (FlipboardProxy/0.0.5; url)
18 flipboard.com/browserproxytext/..Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6 (FlipboardProxy/1.1; url)
68semager
64 www.semager.de/blog/semager-bots/text/..Mozilla/5.0 (compatible; Semager/1.4; url)
3 www.semager.de/blog/semager-bots/application/jsonMozilla/5.0 (compatible; Semager/1.4; url)
64jetbrains
33 www.jetbrains.com/omea_reader/text/..JetBrains Omea Reader 1.0.x (url)
31 www.jetbrains.com/omea_reader/text/..JetBrains Omea Reader 2.0 Release Candidate 1 (url)
63wikimedia
60 tools.wikimedia.de/~daniel/text/..WikiSense (url)
62z-add
59 w3.z-add.co.uk/linkcheck/text/..Z-Add Link Checker (url)
3 w3.z-add.co.uk/linkcheck/image/..Z-Add Link Checker (url)
62avantbrowser
31 www.avantbrowser.comtext/..Advanced Browser (url)
31 www.avantbrowser.comtext/..Avant Browser (url)
62feedshow
31 www.feedshow.comtext/..FeedshowOnline (url)
31 www.feedshow.comtext/..Feedshow/x.0 (url; 1 subscriber)
59newsgator
31 www.newsgator.com/text/..FeedDemon/2.7 (url; Microsoft Windows XP)
28 www.newsgator.comtext/..NewsGatorOnline/2.0 (url; 1 subscribers)
58goo
42 help.goo.ne.jp/contact/text/..goo wikipedia (url)
9 help.goo.ne.jp/help/article/1142/-DoCoMo/2.0 P900i(c100;TB;W24H11) (compatible; ichiro/mobile goo; url)
5 help.goo.ne.jp/help/article/1142/text/..DoCoMo/2.0 P900i(c100;TB;W24H11) (compatible; ichiro/mobile goo; url)
57qwiki
57 qwiki.comtext/..Qwiki Fetcher (url)
56kosmix
54 www.kosmix.com/html/kosmos.htmlapplication/xmlMozilla/5.0(compatible;Kosmos/1.0;url)
53freebase
53 www.freebase.comtext/..metaweb/Nutch-1.0-dev (url; help_at_metaweb.com)
50emining
48 emining.jp/text/..emBot-GalaBuzz/Nutch-1.0 (url; mail address )
49github
32 github.com/pauldix/typhoeus/tree/mastertext/..Typhoeus - url
11 github.com/NeilCrosby/wikislurpapplication/vnd.php.serializedWikiSlurp (url)
3 github.com/dbalatero/typhoeus/tree/mastertext/..Typhoeus - url
44garlik
44 garlik.com/text/..GarlikCrawler/1.1 (url, mail address )
43daum
38 ws.daum.net/aboutWebSearch.htmltext/..Mozilla/5.0 (compatible; MSIE or Firefox mutant; not on Windows server; url) Daumoa/2.0
5 ws.daum.net/aboutWebSearch.htmltext/..Mozilla/5.0 (compatible; MSIE or Firefox mutant; not on Windows server; url) Daumoa/3.0
40scoutjet
40 www.scoutjet.com/text/..Mozilla/5.0 (compatible; ScoutJet; url)
38diveintopython22
38 diveintopython22.org/text/..OpenAnything/1.0 url
38zapbot
13 www.zapbot.nettext/..Mozilla/5.0 (compatible; ZapBot/0.2n; url)
13 www.zapbot.comtext/..Mozilla/5.0 (compatible; ZapBot/0.2c; url)
12 www.zapbot.orgtext/..Mozilla/5.0 (compatible; ZapBot/0.2o; url)
35whatrhymeswith
35 www.whatrhymeswith.com/site/rhyme-bottext/..RhymeBot/0.1 (url)
33tinyurl
30 tinyurl.com/64t5ntext/..Rome Client (url) Ver: 0.9
3 tinyurl.com/64t5napplication/xmlRome Client (url) Ver: UNKNOWN
33ahrefs
33 ahrefs.com/robot/text/..Mozilla/5.0 (compatible; AhrefsBot/1.0; url)
33graemef
33 graemef.comtext/..NewsGator FetchLinks extension/0.2.0 (url)
33tumblr
30 benderthewebrobot.tumblr.comtext/..Mozilla/5.0 (compatible; Bender; url)
33seebot
33 seebot.orgtext/..Lynx/2.8 (;url)
32blogbridge
32 www.blogbridge.com/text/..BlogBridge 2.13 (url)
32rssreader
32 www.rssreader.comtext/..RssReader/1.0.xx.x (url) Microsoft Windows NT 5.1.2600.0
32zipcommander
32 www.zipcommander.com/text/..1st ZipCommander (Net) - url
32zootycoon
32 www.zootycoon.comtext/..Zoo Tycoon 2 Client -- url
32winpodder
32 winpodder.comtext/..WinPodder (url)
32plagger
32 plagger.org/text/..Plagger/0.x.xx (url)
32rssbandit
32 www.rssbandit.orgtext/..RssBandit/1.5.0.10 (WinNT 5.1.2600.0; url) (WinNT 5.1.2600.0; )
32kula
32 kula.jp/endotext/..endo/1.0 (Mac OS X; ppc i386; url)
32ponderer
32 ponderer.org/download/annotate_google.user.jstext/..annotate_google; url
32it-influentials
32 search.it-influentials.com/bot.htmtext/..Mozilla/5.0 (compatible;FindITAnswersbot/1.0;url)
31timewe
31 timewe.nettext/..CDR/1.7.1 Simulator/0.7(url) Profile/MIDP-1.0 Configuration/CLDC-1.0
31snarfware
31 www.snarfware.com/text/..Snarfer/0.x.x (url)
31orcabrowser
31 www.orcabrowser.comtext/..Orca Browser (url)
31Anonymouse
22 Anonymouse.org/text/..url (Unix)
9 Anonymouse.org/image/..url (Unix)
31nemui
31 mozshot.nemui.org/text/..Mozilla/5.0 (Gecko/20070310 Mozshot/0.0.20070628; url)
31hatena
28 a.hatena.ne.jp/helptext/..Hatena Antenna/0.5 (url)
31feeds4all
31 www.feeds4all.com/feedzcollectortext/..FeedZcollector v1.x (Platinum) url
30biible
30 www.biible.infotext/..Biible/Nutch-1.2 (Biible; url ; mail address )
29 4chat
29 www.4chat.tvtext/..url
28ranchero
28 ranchero.com/netnewswire/text/..NetNewsWire/2.x (Mac OS X; url)
28yioop
23 www.yioop.com/bot.phptext/..Mozilla/5.0 (compatible; YioopBot url)
3 yioop.com/bot.phptext/..Mozilla/5.0 (compatible; YioopBot url)
26microsoft
26 academic.research.microsoft.com/text/..librabot/2.0 (url)
26gnip
25 www.gnip.com/text/..UnwindFetchor/1.0 (url)
24textdigger
24 textdigger.comtext/..Mozilla/5.0 (url) Gecko/20061208 Firefox/2.0.0.1
24spinn3r
21 spinn3r.com/robottext/..Mozilla/5.0 (X11; Linux x86_64; en-US; rv:1.9.0.19; aggregator:Spinn3r (Spinn3r 3.1); url) Gecko/2010040121 Firefox/3.0.19
24apache
24 lucene.apache.org/nutch/bot.htmltext/..NutchCVS/0.7.2 (Nutch; url; mail address )
23bin-co
12 www.bin-co.com/php/scripts/load/text/..BinGet/1.00.A (url)
11 www.bin-co.com/php/scripts/load/application/vnd.php.serializedBinGet/1.00.A (url)
23bibalex
14 archive.bibalex.org/bot/image/..Mozilla/5.0 (compatible; archive.bibalex.org_bot; url)
9 archive.bibalex.org/bot/text/..Mozilla/5.0 (compatible; archive.bibalex.org_bot; url)
23ac
14 www.cse.iitb.ac.in/~vishaal_h4text/..DrRajendra/Nutch-0.9 (IIT Kharagpur; url; mail address )
6 www.clips.ua.ac.be/pages/patternapplication/jsonPattern/1.0 url
22accelobot
22 www.accelobot.comtext/..Mozilla/5.0 (compatible; heritrix/1.14.3 url)
22idrc
22 web.idrc.ca/challenge/ev-136691-201-1-DO_TOPIC.htmltext/..Mozilla/5.0 (compatible; http; url; mail address )
21thearchangelmichael
21 thearchangelmichael.nettext/..WordPress/3.0.3; url
21alexa
21 www.alexa.com/site/help/webmasterstext/..ia_archiver (url; mail address )
20puritysearch
20 www.puritysearch.net/text/..Mozilla/5.0 (compatible; Purebot/1.1; url)
20suggy
20 blog.suggy.com/was-ist-suggy/suggy-webcrawler/text/..Mozilla/5.0 (compatible; suggybot v0.01a, url)
18rcdtokyo
16 www.rcdtokyo.com/pc2m/text/..Mozilla/5.0 (compatible; PEAR HTTP_Request class; url)
18moviecus
17 www.moviecus.com/botcontactinfo.phpapplication/yamlmoviecus bot (url)
18netnewswireapp
18 netnewswireapp.com/mac/-NetNewsWire/3.2.15 (Mac OS X; url; gzip-happy)
18vik
16 vik.comtext/..vik-robot/Nutch-1.0 (vikspider; url; mail address )
17test
14 www.test.testtext/..Mozilla/5.0 (compatible; heritrix/1.6.0-OFFIS url)
3 shoulu.test.com/spider.htmltext/..TestSpider Mozilla/5.0 (compatible; TestSpider; url)
16tweetmeme
15 tweetmeme.com/text/..Mozilla/5.0 (compatible; TweetmemeBot/2.11; url)
16turnitin
16 www.turnitin.com/robot/crawlerinfo.htmltext/..TurnitinBot/2.1 (url)
16netarkivet
10 netarkivet.dk/website/info.htmltext/..Mozilla/5.0 (compatible; heritrix/1.12.1b url)
6 netarkivet.dk/website/info.htmlimage/..Mozilla/5.0 (compatible; heritrix/1.12.1b url)
16trendiction
16 www.trendiction.de/bottext/..Mozilla/5.0 (Windows; Windows NT 6.0; en-GB; rv:1.0; trendictionbot0.4.5; trendiction search; url; please let us know of any problems; web at trendiction.com) Gecko/20071127 Firefox/3.0.0.11
16 123
16 www.123.fr/abus.htmltext/..PHP mutualise sur 123.fr - signalez les abus sur url
15searchtechnologies
15 www.searchtechnologies.comtext/..Mozilla/5.0 (compatible; heritrix/1.14.3 url)
15picsearch
14 www.picsearch.com/bot.htmltext/..psbot/0.1 (url)
14froute
11 labs.froute.jp/pc2m/help.htmltext/..Froute Mobile Gateway/1.0 (url)
3 labs.froute.jp/pc2m/help.htmlimage/..Froute Mobile Gateway/1.0 (url)
14fairshare
6 fairshare.cctext/..Mozilla/5.0 url (X11; FreeBSD i386; en-US; rv:1.2a) Gecko/20021021
5 fairshare.cctext/..Mozilla crawl/5.0 (compatible; fairshare.cc url)
14drupal
8 drupal.org/text/..User-Agent: Drupal (url)
4 drupal.org/text/..Drupal (url)
13SearchNearMe
7 SearchNearMe.com/contact.phpapplication/vnd.php.serializedSearchNearMe (url)
6 SearchNearMe.com/contact.phptext/..SearchNearMe (url)
13search
13 www.search.ch/rim.htmltext/..UltraSpider3000/1.0 (url)
13weblio
12 www.weblio.jp/text/..Mozilla/5.0 (compatible; WeblioBot; url)
13rockpeaks
13 www.rockpeaks.com/contacttext/..RockPeaks/0.1 (url)
11blogscope
11 www.blogscope.net/text/..Mozilla/5.0 (compatible; BlogScope/1.0; url; U of Toronto)
11arquivo
9 arquivo.pt/faq-crawlingtext/..Arquivo-web-crawler (compatible; heritrix/1.14.3 url)
11wikiglass
11 wikiglass.comtext/..url : mail address
11ibis
8 ibis.ne.jp/browser/about.htmlimage/..Mozilla/4.0 (compatible; ibisBrowser; url)
11mytvmoments
11 www.mytvmoments.comtext/..My TV Moments (url)
10topsy
10 labs.topsy.com/butterfly/text/..Mozilla/5.0 (compatible; Butterfly/1.0; url) Gecko/2009032608 Firefox/3.0.8
10paper
10 support.paper.li/entries/20023257-what-is-paper-litext/..Mozilla/5.0 (compatible; PaperLiBot/2.1; url)
10js-kit
10 js-kit.com/text/..JS-Kit URL Resolver, url
69,872total

Page requests for probable crawlers, recognized by keyword
Count (x 1000) | Agent string | Mime type (count ≥ 3)
4,566PythonWikipediaBot/1.0
3,534 application/json
1,008 application/xml
24 text/..
1 -
1 image/..
1,471Kavande Crawler 1.0/Nutch-1.4-dev (Iranian National Web Crawler)
1,470 text/..
1 image/..
1 -
944GoogleBot-Image/1.0
440 image/..
408 text/..
96 -
909MediaWikiCrawler-Google/2.0 ( mail address )
908 text/..
1 -
683php wikibot classes
682 application/vnd.php.serialized
1 text/..
1 -
617ClueBot/1.1
617 application/vnd.php.serialized
1 text/..
385LinkParser/2.0
385 text/..
382Mozilla/5.0 (Windows; Windows NT 5.1; fr; rv:1.8.1) VoilaBot BETA 1.2 ( mail address )
382 text/..
1 -
1 application/ogg
376DotNetWikiBot/2.96 (Unix 5.10.0.0; )
374 text/..
2 application/xml
271wikiwix-bot-3.0
265 text/..
6 image/..
1 -
261jikespider "Mozilla/5.0
259 text/..
1 -
1 image/..
1 application/xml
1 application/ogg
230Answersbot
230 text/..
226Peachy MediaWiki Bot API Version 1.0
226 application/vnd.php.serialized
1 text/..
225Pywikipediabot/2.0
225 application/json
202Onespot Crawler
146 application/json
55 text/..
1 -
193ClueBot/2.0
193 application/vnd.php.serialized
1 text/..
192spider
191 text/..
1 image/..
1 application/ogg
1 audio/midi
159WebCrawler/Nutch-1.2 (WebCrawler; WebCrawler)
151 text/..
8 image/..
1 application/ogg
1 video/ogg
158GoogleBot-Image/1.0
116 text/..
33 image/..
9 application/vnd.php.serialized
1 -
1 application/json
106Opera/8.01 (J2ME/MIDP; MXit WebBot/1.4.0.0) Opera Mini/3.1
77 application/vnd.wap.xhtml+xml
23 image/..
6 text/..
1 -
99 mail address
99 application/vnd.php.serialized
1 text/..
87DotNetWikiBot/2.81 (Microsoft Windows NT 6.1.7601 Service Pack 1; )
72 text/..
11 application/xml
4 image/..
80Webwiki Search Engine Bot - www.webwiki.de
80 text/..
76DNSTallyKwBot/0.2
76 text/..
75Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (Exabot-Thumbnails)
55 image/..
20 text/..
1 application/json
1 application/x-javascript
74Wikibot 1.21 (Macintosh; Mac OS X 10.7.0; en_US)
74 text/..
1 image/..
71DotNetWikiBot/2.96 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
71 text/..
1 application/xml
61DotNetWikiBot/2.97 (Unix 5.10.0.0; )
36 text/..
25 application/xml
61Mozilla/4.0 (compatible; EmberSpider 0.8; Scout (a); bgft)
61 text/..
60GoogleBot
60 text/..
1 image/..
59Test Webbot
59 text/..
1 -
59python-wikitools/1.2 (User:BernsteinBot)
59 application/json
58Mozilla/5.0 (compatible; Ezooms/1.0; mail address )
57 text/..
1 application/ogg
1 image/..
1 application/vnd.php.serialized
1 audio/midi
49AnomieBOT 1.0 (TagDater)
49 application/json
48MediaWiki::Bot/3.2.6
48 application/json
47jikespider ("Mozilla/5.0)
47 text/..
1 -
1 application/xml
1 application/ogg
1 application/vnd.php.serialized
45Mozilla 5.0 (Apibot 0.31)
45 application/vnd.php.serialized
40HTMLParser/2.0
40 text/..
1 -
39buzzbox bot
39 text/..
37GoogleBot-News
37 text/..
1 -
36FAST Enterprise Crawler 6 used by ESP ( mail address )
36 text/..
34Nutch Crawler/Nutch-1.2
34 text/..
32ROCKMELT-BOT
32 application/xml
1 text/..
30Mozilla/5.0 (X11; Linux i686; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7 SnapPreviewBot
30 text/..
30UCMore Crawler App
30 text/..
1 -
29Mozilla/5.0 (compatible; SnapPreviewBot; en-US; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9
29 text/..
29YBot/0.1
29 application/vnd.php.serialized
29SineBot/1.5.17(User:SineBot)
28 application/vnd.php.serialized
1 text/..
26Mozilla/5.0 MaboMwFramework/1.1 (w:de:MerlIwBot)
26 text/..
25SiocWikiBot/1.0
23 application/vnd.php.serialized
2 text/..
24wikbot/1.21 CFNetwork/485.13.9 Darwin/11.0.0
13 image/..
11 application/json
1 -
1 text/..
23DotNetWikiBot/2.97 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
22 text/..
1 application/xml
22Mozilla/5.0 (compatible; Nigma.ru/3.0; mail address )
22 text/..
1 application/rsd+xml
1 application/xml
22MLBot (www.metadatalabs.com/mlbot)
15 text/..
7 application/vnd.php.serialized
1 image/..
22 Mozilla/5.0 (compatible; Birubot/1.0) Gecko/2009032608 Firefox/3.0.8
    18 image/..
    4 text/..
20 HRoestBot, de-wikipedia using pywikipedia framework
    8 application/json
    7 application/xml
    5 text/..
20 AnomieBOT 1.0 (ReplaceExternalLinks2)
    20 application/json
20 Mozilla/5.0 (compatible; LucidWorks/; ; crawler at example dot com)
    20 text/..
19 MyCuteBot/0.1
    19 text/..
    1 application/json
    1 application/vnd.php.serialized
19 HTMLParser/1.6
    17 text/..
    2 application/json
    1 -
18 Mozilla/5.0 (SnapPreviewBot) Gecko/20061206 Firefox/1.5.0.9
    15 image/..
    3 text/..
    1 application/x-javascript
16 .NET Client Parser
    16 application/xml
    1 text/..
16 Peachy MediaWiki Bot API Version 0.1beta
    16 application/vnd.php.serialized
16 TVersity Media Robot
    16 text/..
16 Twitterbot/0.1
    16 text/..
    1 -
    1 image/..
14 AniBot/0.9 php/curl
    14 application/vnd.php.serialized
14 TrueKnowledgeBot bot mail address >
    11 application/vnd.php.serialized
    3 application/xml
14 AnomieBOT 1.0 (BAGBot)
    11 application/json
    3 text/..
14 COIBot/2.0
    14 text/..
14 AnomieBOT 1.0 (OrphanReferenceFixer)
    14 application/json
13 Twitterbot/1.0
    13 text/..
    1 image/..
13 TheKeens bot
    13 text/..
13 COIBot/1.00
    13 text/..
13 TwynCatBot/0.1 (Contact: www.twyn.com)
    13 application/json
12 AnomieBOT 1.0 (TemplateSubster)
    12 application/json
12 TravelRecordBot/1.0
    12 text/..
11 DotNetWikiBot/2.96 (Microsoft Windows NT 6.1.7601 Service Pack 1; )
    10 text/..
    1 application/xml
    1 image/..
11 SurakWare MediaWiki Bot/1.0
    11 text/..
    1 application/xml
10 AnomieBOT 1.0 (FlagIconRemover)
    10 application/json
10 ibo2bot
    10 text/..
9 ~Bot ([[:fr:w:User:TildeBot]] by [[:fr:w:User:Alphos]] mail address )
    9 text/..
9 FAST Enterprise Crawler 6 used by viaapia (viaapia)
    8 text/..
    1 -
8 XLinkBot/1.00
    8 text/..
8 Mozilla/5.0 QunarBot/1.0
    8 text/..
    1 image/..
8 UnisterBot (Mozilla/5.0 compatible; mail address )
    8 text/..
    1 -
8 Tawbot (public svn release; plwiki)
    8 text/..
8 HTMLParser/1.4
    8 text/..
8 DigitalsmithsBot
    8 text/..
7 Pastec bot
    7 text/..
    1 image/..
    1 application/ogg
7 python-wikitools/1.2 (User:LaraBot)
    7 application/json
7 CheMoBot/1.00
    7 text/..
6 SkimWordsBot/1.0
    6 text/..
5 Soundkiosk Relation-Crawler (Version 1.0; soundkiosk.de)
    5 application/xml
5 Handelabra WikiBot
    4 application/vnd.php.serialized
    1 text/..
5 Slevnicka.cz CURL bot
    5 text/..
5 GNAA-bot
    5 text/..
5 DotNetWikiBot/2.9 (Unix 5.10.0.0; )
    5 text/..
5 OrlodrimBot/1.0
    5 text/..
4 bitlybot
    4 text/..
    1 -
    1 image/..
4 MediaWiki::Bot/3.4.0
    4 application/json
4 Friendly Spider 1.0 contact mail address
    4 text/..
4 AdMedia bot
    4 text/..
4 Jabse.com Crawler v.2.0 www.jabse.com/crawler.php
    4 text/..
4 setoozbot/0.30 ( compatible; SETOOZBOT/0.30 ; setooz.com ; bot AT setooz DOT com )
    4 text/..
    1 image/..
4 DotNetWikiBot/2.97 (Microsoft Windows NT 6.1.7601 Service Pack 1; )
    4 text/..
    1 application/xml
4 AnomieBOT 1.0 (RandomPagePicker)
    4 application/json
4 Taboo Card Spider
    4 text/..
4 python-wikitools/1.2 (User:Mr.Z-bot)
    4 application/json
4 Geni ircpybot 1.0
    2 application/json
    2 text/..
4 Freebase Deathbot
    4 text/..
4 AnomieBOT 1.0 (AFDMergeFromCleaner)
    4 application/json
4 OpenSearchServer_Bot
    4 text/..
4 OpenText Semantic Navigation Crawler 1.1/Nutch-1.1
    3 -
    1 text/..
4 tellit_rest_bot, contact mail address
    2 application/x-wiki
    2 text/..
4 mail address (Mozilla compatible)
    4 text/..
    1 image/..
3 Alex Blokha bot/2.9 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
    3 text/..
3 NoNSTOPRobot/Nutch-1.3
    3 text/..
3 Mozilla 5.0 (Apibot 0.30b5)
    3 application/vnd.php.serialized
3 SiocWikiBot
    3 text/..
3 Citation_bot; mail address
    3 text/..
3 BotMapDev/1.3.624 CFNetwork/485.13.9 Darwin/11.0.0
    3 image/..
3 MediaWiki::Bot/3.1.6 (User:Plasticspork)
    3 application/json
3 wikbot/1.21 CFNetwork/485.12.7 Darwin/10.4.0
    2 image/..
    1 application/json
    1 text/..
3 DotNetWikiBot/2.9 (Microsoft Windows NT 6.0.6000.0; )
    3 text/..
3 Mozilla/5.0 (compatible; FriendFeedBot/0.1; Http://friendfeed.com/about/bot; 371 subscribers; feed-id=3852576738117026533)
    2 application/xml
    1 -
3 wikbotlite/1.0 CFNetwork/485.13.9 Darwin/11.0.0
    2 image/..
    1 application/json
3 DotNetWikiBot/2.92 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
    2 text/..
    1 application/xml
3 HBC Archive Indexerbot 0.9a
    3 text/..
3 trailsbot/Nutch-1.2
    3 text/..
3 unblockbot/1.00
    3 text/..
3 Mozilla/5.0 (Bgbot 0.5)
    3 text/..
3 Baiduspider
    3 text/..
3 AnomieBOT 1.0 (DeletionSortingCleaner)
    3 application/json
14,809 total

IP ranges: known IP ranges for Google are 64.233.[160.0-191.255], 66.249.[64.0-95.255], 66.102.[0.0-15.255], 72.14.[192.0-255.255],
74.125.[0.0-255.255], 209.85.[128.0-255.255], 216.239.[32.0-63.255] and a few minor other subranges.
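
The range check can be sketched in a few lines of Python. This is only an illustration and not the script that produced this report: the CIDR blocks are a direct translation of the seven ranges listed above (the unnamed "minor other subranges" are left out), and the function names and result labels are hypothetical.

```python
import ipaddress

# CIDR equivalents of the seven ranges listed above.
# Assumption: the "few minor other subranges" are unknown and therefore omitted.
GOOGLE_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "64.233.160.0/19",   # 64.233.[160.0-191.255]
        "66.249.64.0/19",    # 66.249.[64.0-95.255]
        "66.102.0.0/20",     # 66.102.[0.0-15.255]
        "72.14.192.0/18",    # 72.14.[192.0-255.255]
        "74.125.0.0/16",     # 74.125.[0.0-255.255]
        "209.85.128.0/17",   # 209.85.[128.0-255.255]
        "216.239.32.0/19",   # 216.239.[32.0-63.255]
    )
]


def is_google_ip(address: str) -> bool:
    """Return True if the address falls inside one of the listed ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in GOOGLE_NETWORKS)


def classify_googlebot(user_agent: str, address: str) -> str:
    """Label a request by user agent and origin (labels are hypothetical)."""
    if "googlebot" not in user_agent.lower():
        return "not GoogleBot"
    if is_google_ip(address):
        return "GoogleBot (Google IP)"
    return "GoogleBot (other IP, possibly spoofed)"
```

A request whose user agent claims to be GoogleBot but whose address lies outside these blocks would get the second label, which corresponds to the two color codes used for GoogleBot in this report.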

Generated on Thu, Oct 6, 2011 15:00
Author: Erik Zachte (Web site)
Mail: ezachte@### (no spam: ### = wikimedia.org)
All data and images on this page are in the public domain.