Wikimedia Traffic Analysis Report - Crawler requests

Daily averages, based on sample period: 1 Jul 2010 - 31 Jul 2010

 This analysis is based on a 1:1000 sampled server log (squids) ⇒ all counts x 1000.

The following overview of crawler (aka bot) page requests is based on the user agent information that accompanies most server requests. Unfortunately this user agent information follows rather loosely defined guidelines.
Also please bear in mind that the most popular crawler names may be somewhat overrepresented. This is the result of so-called user agent spoofing (where a requester supplies false credentials, e.g. to bypass web server filters).
GoogleBot seems to be a favorite for spoofing. Therefore requests that claim to be GoogleBot are color coded according to whether or not they come from an IP address registered by Google (see below); in the first table the requests from other addresses are grouped under 'google?'.

For this report page requests are considered to be issued by a crawler in two cases:
1 The user agent string contains a web address. Only crawlers should have that, but there are some false positives where a browser sends a user agent string with a web address (caused by an ill-behaved plug-in; the main offenders have been eliminated).
2 The user agent string contains the term bot, spider or crawl[er].
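
The two rules above can be sketched in a few lines of Python. This is only an illustration, not the script that produced this report: the function name classify_user_agent, the regular expressions, the case-insensitive matching and the choice to treat only http:// or https:// prefixes as a "web address" are all assumptions.

```python
import re

# Assumption: a "web address" is approximated by an http:// or https:// prefix.
URL_PATTERN = re.compile(r"https?://[\w.-]+", re.IGNORECASE)
# Assumption: keywords are matched case-insensitively anywhere in the string;
# "crawl" also covers "crawler".
KEYWORD_PATTERN = re.compile(r"bot|spider|crawl", re.IGNORECASE)


def classify_user_agent(user_agent):
    """Return 'url', 'keyword' or None for a raw user agent string.

    'url'     -> rule 1: the string contains a web address
    'keyword' -> rule 2: the string contains bot, spider or crawl[er]
    None      -> the request is not counted as a crawler request
    """
    if URL_PATTERN.search(user_agent):
        return "url"
    if KEYWORD_PATTERN.search(user_agent):
        return "keyword"
    return None


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "ClueBot/1.1",
        "Mozilla/5.0 (Windows NT 6.1) Gecko/20100101 Firefox/3.6",
    ]
    for ua in samples:
        print(classify_user_agent(ua), "<-", ua)
```

Rule 1 is checked first, so an agent string that contains both a web address and a keyword lands in the first table, which matches how the two tables below are split.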

In total 35,385,000 page requests (mime type text/html only!) per day are considered crawler requests, out of 262,814,000 external requests, which is 13.5%
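
As a quick check of the figures above, the following sketch scales the sampled daily averages by the 1:1000 sampling factor and recomputes the crawler share. The variable names are hypothetical; the numbers are taken from this report.

```python
SAMPLING_FACTOR = 1000  # the squid log is sampled 1:1000

# Daily averages as counted in the sampled log (values from this report).
sampled_crawler_requests = 35_385
sampled_external_requests = 262_814

# Scale the sampled counts up to estimated real daily counts.
crawler_requests = sampled_crawler_requests * SAMPLING_FACTOR
external_requests = sampled_external_requests * SAMPLING_FACTOR

# Share of crawler page requests among all external requests, in percent.
crawler_share = 100 * crawler_requests / external_requests

print(f"{crawler_requests:,} of {external_requests:,} requests "
      f"= {crawler_share:.1f}%")
```

Because both counts are scaled by the same factor, the percentage is the same whether it is computed on sampled or on scaled counts.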

Page requests for crawlers that specify a url in the agent string
Columns: Count (x 1000), Secondary domain (~site) name, URL, Mime type, User agent
13,325yahoo
12,682 help.yahoo.com/help/us/ysearch/slurptext/..Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
218 help.yahoo.com/help/us/ysearch/slurp-Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
127 misc.yahoo.com.cn/help.htmltext/..Mozilla/5.0 (compatible; Yahoo! Slurp China; url)
50 help.yahoo.co.jp/help/jp/search/indexing/indexing-15.htmlimage/..'Mozilla/5.0 (compatible; Y!J SearchMonkey/1.0 (Y!J-AGENT; url))'
46 help.yahoo.com/help/us/ysearch/slurptext/..Mozilla/5.0 (compatible; Yahoo! Slurp; url)
41 help.yahoo.com/help/us/ysearch/slurpimage/..Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
37 developer.yahoo.com/searchmonkey/useragentimage/..Mozilla/5.0 (compatible; Yahoo! SearchMonkey 1.0; url)
32 help.yahoo.com/help/us/ysearch/slurptext/..Mozilla/5.0 (compatible; Yahoo! DE Slurp; url)
27 developer.yahoo.com/searchmonkey/useragenttext/..Mozilla/5.0 (compatible; Yahoo! SearchMonkey 1.0; url)
15 help.yahoo.com/help/us/ysearch/slurpapplication/xmlMozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
12 help.yahoo.com/help/us/ysearch/slurpapplication/oggMozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
11 help.yahoo.com/help/us/ysearch/crawling/crawling-01.htmltext/..Nokia6682/2.0 (3.01.1) SymbianOS/8.0 Series60/2.6 Profile/MIDP-2.0 configuration/CLDC-1.1 UP.Link/6.3.0.0.0 (compatible;YahooSeeker/M1A1-R2D2; url)
10 help.yahoo.co.jp/help/jp/search/indexing/indexing-15.htmltext/..Y!J-BRI/0.0.1 crawler ( url )
5 developer.yahoo.com/searchmonkey/useragentapplication/x-javascriptMozilla/5.0 (compatible; Yahoo! SearchMonkey 1.0; url)
3 help.yahoo.com/help/us/ysearch/slurpapplication/x-javascriptMozilla/5.0 (compatible; Yahoo! Slurp/3.0; url)
8,393google
6,473 www.google.com/bot.htmltext/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
391 www.google.com/bot.html-Mozilla/5.0 (compatible; GoogleBot/2.1; url)
267 desktop.google.com/image/..Mozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
185 desktop.google.com/image/..Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; url)
152 www.google.com/bot.htmltext/..DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
146 www.google.com/bot.htmlimage/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
111 desktop.google.com/application/xmlMozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
98 www.google.com/feedfetcher.html-FeedFetcher-Google; (url)
87 desktop.google.com/application/xmlMozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; url)
46 www.google.com/feedfetcher.htmltext/..Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
44 www.google.com/feedfetcher.htmltext/..FeedFetcher-Google; (url)
44 desktop.google.com/image/..Mozilla/5.0 (compatible; Google Desktop/5.9.909.30391; url)
40 www.google.com/feedfetcher.htmlapplication/xmlFeedFetcher-Google; (url)
30 www.google.com/bot.htmlimage/..DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
27 code.google.com/appengineapplication/xmlAppEngine-Google; (url; appid: wikipedia-raw)
26 code.google.com/appenginetext/..AppEngine-Google; (url; appid: npiv82)
22 www.google.com/feedfetcher.htmlimage/..Mozilla/5.0 (compatible) FeedFetcher-Google; (url)
22 www.google.com/bot.htmltext/..SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
19 desktop.google.com/text/..Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; url)
19 code.google.com/appenginetext/..AppEngine-Google; (url; appid: finchproxy)
16 www.google.com/feedfetcher.htmlapplication/jsonMozilla/5.0 (compatible) FeedFetcher-Google; (url)
14 www.google.com/feedfetcher.htmlapplication/xmlMozilla/5.0 (compatible) FeedFetcher-Google; (url)
13 desktop.google.com/text/..Mozilla/5.0 (compatible; Google Desktop/5.9.911.3589; url)
12 desktop.google.com/application/xmlMozilla/5.0 (compatible; Google Desktop/5.9.909.30391; url)
12 www.google.com/bot.htmlimage/..SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
6 code.google.com/appengineapplication/jsonPython-urllib/2.5 AppEngine-Google; (url; appid: loeschmonitor)
6 code.google.com/appenginetext/..oohEmbed.com AppEngine-Google; (url; appid: oohembed)
5 www.google.com/feedfetcher.htmlimage/..FeedFetcher-Google; (url)
4 docs.google.comimage/..Mozilla/5.0 (compatible; GoogleDocs; url)
4 code.google.com/appengineapplication/jsonMWBOT GAE Edition AppEngine-Google; (url; appid: philip-bot)
3 desktop.google.com/-Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; url)
3 code.google.com/appenginetext/..AppEngine-Google; (url; appid: job-info)
3 code.google.com/appenginetext/..AppEngine-Google; (url; appid: simple-tools4)
3 www.google.com/bot.htmlapplication/xmlMozilla/5.0 (compatible; GoogleBot/2.1; url)
3 code.google.com/appenginetext/..AppEngine-Google; (url; appid: boxapp)
3 code.google.com/appenginetext/..AppEngine-Google; (url; appid: linksalpha)
3,574facebook
2,501 www.facebook.com/externalhit_uatext.phpimage/..facebookexternalhit/1.0 (url)
997 www.facebook.com/externalhit_uatext.phptext/..facebookexternalhit/1.0 (url)
59 www.facebook.com/externalhit_uatext.phptext/..facebookexternalhit/1.1 (url)
9 developers.facebook.comimage/..facebookplatform/1.0 (url)
5 developers.facebook.comtext/..facebookplatform/1.0 (url)
3 www.facebook.com/externalhit_uatext.phpimage/..facebookexternalhit/1.1 (url)
2,416msn
1,569 search.msn.com/msnbot.htmtext/..msnbot/2.0b (url)
510 search.msn.com/msnbot.htm-msnbot/2.0b (url)
111 search.msn.com/msnbot.htmtext/..msnbot-media/1.1 (url)
99 search.msn.com/msnbot.htmimage/..msnbot-media/1.1 (url)
48 search.msn.com/msnbot.htmtext/..msnbot/2.0b (url)._
27 search.msn.com/msnbot.htmtext/..msnbot-Products/1.0 (url)
13 search.msn.com/msnbot.htmimage/..msnbot/2.0b (url)
10 search.msn.com/msnbot.htmtext/..msnbot/2.0b (url) Test
8 search.msn.com/msnbot.htmtext/..msnbot/1.1 (url)
5 search.msn.com/msnbot.htmtext/..msnbot/1.0 (url)
4 search.msn.com/msnbot.htmapplication/jsonmsnbot/1.0 (url)
4 search.msn.com/msnbot.htmtext/..msnbot-UDiscovery/2.0b (url)
2,251google?
2,079 www.google.com/bot.htmltext/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
31 www.google.com/bot.htmltext/..GoogleBot/2.1 (url)
30 www.google.com/bot.htmltext/..DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
23 www.google.com/bot.htmltext/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
23 www.google.com/bot.htmlapplication/xmlMozilla/5.0 (compatible; GoogleBot/2.1; url)
12 www.google.com/bot.htmlimage/..Mozilla/5.0 (compatible; GoogleBot/2.1; url)
12 www.google.com/bot.htmlapplication/vnd.php.serializedMozilla/5.0 (compatible; GoogleBot/2.1; url)
11 www.google.com/bot.htmltext/..SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; GoogleBot-Mobile/2.1; url)
8 www.google.com/bot.htmlapplication/xmlDoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; GoogleBot-Mobile/2.1; url)
5 www.google.com/bot.htmltext/..Mozilla/5.0(compatible;GoogleBot/2.1;url)
1,358naver
1,291 help.naver.com/robots/text/..Yeti/1.0 (NHN Corp.; url)
38 help.naver.com/robots/image/..Yeti/1.0 (NHN Corp.; url)
21 help.naver.com/delete_main.asptext/..Mozilla/4.0 (compatible; NaverBot/1.0; url)
6 help.naver.com/customer_webtxt_02.jsptext/..Mozilla/4.0 (compatible; NaverBot/1.0; url)
895baidu
447 www.baidu.com/search/spider.htmtext/..Baiduspider(url)
385 www.baidu.jp/spider/text/..Baiduspider(url)
34 www.baidu.jp/spider/text/..DoCoMo/2.0 P05A(c100;TB;W24H15) (compatible; BaiduMobaider/1.0;url)
9 www.baidu.jp/spider/application/xmlDoCoMo/2.0 P05A(c100;TB;W24H15) (compatible; BaiduMobaider/1.0;url)
6 www.baidu.jp/spider/-Baiduspider(url)
6 www.baidu.jp/spider/text/..BaiduImagespider(url)
4 www.baidu.jp/spider/application/xmlBaiduspider(url)
731yandex
692 yandex.com/botstext/..Mozilla/5.0 (compatible; YandexBot/3.0; url)
9 yandex.com/botstext/..Mozilla/5.0 (compatible; YandexImages/3.0; url)
8 yandex.com/bots-Mozilla/5.0 (compatible; YandexBot/3.0; url)
6 yandex.com/botsimage/..Mozilla/5.0 (compatible; YandexImageResizer/2.0; url)
4 yandex.com/botsimage/..Mozilla/5.0 (compatible; YandexImages/3.0; url)
4 yandex.com/botstext/..Mozilla/5.0 (compatible; YandexBot/3.0; MirrorDetector; url)
706pipl
706 www.pipl.com/bot/text/..Mozilla/5.0(compatible;PiplBot;url)
602ask
469 about.ask.com/en/docs/about/webmasters.shtmltext/..Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
130 about.ask.com/en/docs/about/webmasters.shtml-Mozilla/5.0 (compatible; Ask Jeeves/Teoma; url)
520php
378 pear.php.net/text/..PEAR HTTP_Request class ( url )
72 pear.php.net/application/vnd.php.serializedPEAR HTTP_Request class ( url )
32 pear.php.net/package/http_request2text/..HTTP_Request2/0.5.1 (url) PHP/5.2.13
23 pear.php.net/application/jsonPEAR HTTP_Request class ( url )
13 pear.php.net/application/xmlPEAR HTTP_Request class ( url )
502soso
492 help.soso.com/webspider.htmtext/..Sosospider(url)
4 help.soso.com/soso-blog-spider.htmtext/..Sosoblogspider(url)
4 help.soso.com/webspider.htm-Sosospider(url)
386youdao
342 www.youdao.com/help/webmaster/spider/text/..Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
18 www.youdao.com/help/webmaster/spider/text/..Mozilla/5.0 (compatible;YodaoBot-Image/1.0;url;)
16 www.youdao.com/help/webmaster/spider/text/..Mozilla/5.0 (compatible; YodaoBot/1.0; url; )
5 www.youdao.com/help/webmaster/spider/-Mozilla/5.0 (compatible; YoudaoBot/1.0; url; )
268traslated
268 mymemory.traslated.net/doc/text/..Mozilla/5.0 (MyMemory Bot url)
239scoutjet
239 www.scoutjet.com/text/..Mozilla/5.0 (compatible; ScoutJet; url)
235exabot
144 www.exabot.com/go/robottext/..Mozilla/5.0 (compatible; Exabot/3.0; url)
84 www.exabot.com/go/robottext/..Mozilla/5.0 (compatible; Exabot/3.0 (BiggerBetter); url)
6 www.exabot.com/go/robot-Mozilla/5.0 (compatible; Exabot/3.0; url)
220kosmix
164 www.kosmix.com/html/kosmos.htmltext/..Mozilla/5.0(compatible;Kosmos/1.0;url)
54 www.kosmix.com/html/kosmos.htmlapplication/xmlMozilla/5.0(compatible;Kosmos/1.0;url)
212cuil
209 www.cuil.com/twiceler/robot.htmltext/..Mozilla/5.0 (Twiceler-0.9 url)
208yacy
15 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.32-23-generic; java 1.6.0_18; Europe/en) url
14 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.31.12-0.2-default; java 1.5.0_16; GMT01:00/de) url
11 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.31-22-generic-pae; java 1.6.0_20; Europe/en) url
11 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.31-22-generic; java 1.6.0_0; Europe/en) url
10 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.32-24-generic; java 1.6.0_18; Europe/en) url
9 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.28-19-generic; java 1.6.0_0; Europe/en) url
8 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.31-21-server; java 1.6.0_0; Asia/en) url
6 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.32-21-generic-pae; java 1.6.0_18; America/en) url
6 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.26-2-amd64; java 1.6.0_0; Europe/en) url
6 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.31-22-generic; java 1.6.0_20; Europe/en) url
5 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.31-gentoo-r6; java 1.6.0_17; Etc/en) url
5 yacy.net/bot.htmltext/..yacybot (x86_64 Mac OS X 10.6.4; java 1.6.0_20; Europe/en) url
5 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.26-2-openvz-686; java 1.6.0_0; Europe/en) url
5 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.26-2-amd64; java 1.6.0_20; Europe/en) url
5 yacy.net/bot.htmltext/..yacybot (x86 Windows XP 5.1; java 1.6.0_20; Europe/de) url
5 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.31-22-generic; java 1.6.0_18; Europe/en) url
5 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.31-22-generic; java 1.6.0_18; Europe/en) url
4 yacy.net/bot.htmltext/..yacybot (x86_64 Mac OS X 10.6.4; java 1.6.0_20; Europe/de) url
4 yacy.net/bot.htmltext/..yacybot (amd64 Linux 2.6.28-19-generic; java 1.6.0_0; Europe/en) url
4 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.24-10-pve; java 1.6.0_12; Etc/en) url
3 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.28-11-server; java 1.6.0_0; Europe/el) url
3 yacy.net/bot.htmltext/..yacybot (PowerPC OS/400 V7R1M0; java 1.6.0; UTC/de) url
3 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.28-19-generic; java 1.6.0_20; Europe/en) url
3 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.30-2.slh.2-sidux-686; java 1.6.0_20; Europe/de) url
3 yacy.net/bot.htmltext/..yacybot (x86 Windows XP 5.1; java 1.6.0_21; Europe/de) url
3 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.31-22-generic; java 1.6.0_0; Europe/en) url
3 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.32-23-generic-pae; java 1.6.0_18; Europe/de) url
3 yacy.net/bot.htmltext/..yacybot (i386 Linux 2.6.26-2-openvz-686; java 1.6.0_0; Europe/de) url
189waw
139 dubi.itinfo.waw.plimage/..WordPress/2.8.6; url
50 gienia.itinfo.waw.plimage/..WordPress/2.8.6; url
186sogou
169 www.sogou.com/docs/help/webmasters.htm#07text/..Sogou web spider/4.0(url)
12 www.sogou.com/docs/help/webmasters.htm#07image/..Sogou Pic Spider/3.0(url)
5 www.sogou.com/docs/help/webmasters.htm#07application/xmlSogou web spider/4.0(url)
185majestic12
164 www.majestic12.co.uk/bot.php?text/..Mozilla/5.0 (compatible; MJ12bot/v1.3.3; url)
16 www.majestic12.co.uk/bot.php?text/..Mozilla/5.0 (compatible; MJ12bot/v1.3.2; url)
154semager
154 www.semager.de/blog/semager-bots/text/..Mozilla/5.0 (compatible; Semager/1.4; url)
145entireweb
142 www.entireweb.com/about/search_tech/speedy_spider/text/..Mozilla/5.0 (Windows; Windows NT 5.1; en-US) Speedy Spider (url)
3 www.entireweb.com/about/search_tech/speedy_spider/-Mozilla/5.0 (Windows; Windows NT 5.1; en-US) Speedy Spider (url)
142toolserver
72 wiki.toolserver.org/view/GeoHacktext/..Geohack (url)
35 toolserver.org/~guandalug/application/vnd.php.serializedGuandalugs PHPWikiBot/1.1 (url;de:User:Guandalug)
31 toolserver.org/~bayo/text/..LudoThecaire/1.0 (url)
136sblog
84 fulltext.sblog.cz/screenshot/image/..Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
27 fulltext.sblog.cz/screenshot/text/..Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
20 fulltext.sblog.cz/robot/text/..SeznamBot/2.0 (url)
4 fulltext.sblog.cz/screenshot/application/x-javascriptMozilla/5.0 (compatible; Seznam screenshot-generator 2.0; url)
115wikipedia
85 en.wikipedia.org/wiki/Wikipedia:Huggletext/..Huggle/0.9.6 url
18 en.wikipedia.org/wiki/User:NicoV/Wikipedia_Cleaner/Documentationtext/..WikiCleaner (url)
6 en.wikipedia.orgtext/..url
106goo
96 help.goo.ne.jp/contact/text/..goo wikipedia (url)
8 help.goo.ne.jp/help/article/1142/text/..DoCoMo/2.0 P900i(c100;TB;W24H11) (compatible; ichiro/mobile goo; url)
105mnemoo
105 www.mnemoo.com/about/spidertext/..Mnemoo Intelligent Spider/0.1alpha (compatible; See url)
100textdigger
100 textdigger.comtext/..Mozilla/5.0 (url) Gecko/20061208 Firefox/2.0.0.1
96daum
96 ws.daum.net/aboutWebSearch.htmltext/..Mozilla/5.0 (compatible; MSIE or Firefox mutant; not on Windows server; url) Daumoa/2.0
81heartrails
44 capture.heartrails.com/image/..Mozilla/5.0 (X11; Linux i686; en-US; rv:1.9.2.3) Gecko/20100403 HeartRails_Capture/1.0.2 (url) Namoroka/3.6.3
27 capture.heartrails.com/text/..Mozilla/5.0 (X11; Linux i686; en-US; rv:1.9.2.3) Gecko/20100403 HeartRails_Capture/1.0.2 (url) Namoroka/3.6.3
10 capture.heartrails.com/application/x-javascriptMozilla/5.0 (X11; Linux i686; en-US; rv:1.9.2.3) Gecko/20100403 HeartRails_Capture/1.0.2 (url) Namoroka/3.6.3
77wikimedia
74 tools.wikimedia.de/~daniel/text/..WikiSense (url)
74wordpress
11 josefboberg.wordpress.comtext/..WordPress/MU; url
7 navanavonmilita.wordpress.comtext/..WordPress/MU; url
7 support.wordpress.com/contact/text/..WordPress.com mShots; url
5 musiquefreak.wordpress.comtext/..WordPress/MU; url
4 benabb.wordpress.comtext/..WordPress/MU; url
3 palashscape.wordpress.comtext/..WordPress/MU; url
3 peacersvp.wordpress.comtext/..WordPress/MU; url
3 sutrawidanta.wordpress.comtext/..WordPress/MU; url
60emining
58 emining.jp/text/..emBot-GalaBuzz/Nutch-1.0 (url; mail address )
59FeedBurner
58 www.FeedBurner.comtext/..FeedBurner/1.0 (url)
57sf
20 liferea.sf.net/text/..Liferea/0.x.x (Linux; en_US.UTF-8; url)
18 magpierss.sf.nettext/..MagpieRSS/0.7x (url)
18 liferea.sf.net/text/..Liferea/1.x.x (Linux; es_ES.UTF-8; url)
57dotnetdotcom
57 www.dotnetdotcom.org/text/..Mozilla/5.0 (compatible; DotBot/1.1; url, mail address )
56discoveryengine
55 discoveryengine.com/discobot.htmltext/..Mozilla/5.0 (compatible; discobot/1.1; url
56www.
35 www.text/..GoogleBot/2.1 ( urlGoogleBot.com/bot.html)
8 www.text/..GoogleBot-Image/1.0 ( urlGoogleBot.com/bot.html)
6 www.text/..GoogleBot/2.1 (urlGoogleBot.com/bot.html)
5 www.text/..Google - GoogleBot/2.1 ( urlGoogleBot.com/bot.html)
55z-add
50 w3.z-add.co.uk/linkcheck/text/..Z-Add Link Checker (url)
3 w3.z-add.co.uk/linkcheck/image/..Z-Add Link Checker (url)
47newsgator
19 www.newsgator.com/text/..FeedDemon/2.7 (url; Microsoft Windows XP)
19 www.newsgator.comtext/..NewsGatorOnline/2.0 (url; 1 subscribers)
9 www.newsgator.com/Individuals/NetNewsWire/-NetNewsWire/3.2.7 (Mac OS X; url; gzip-happy)
43teesoft
17 www.teesoft.info/image/..Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
8 www.teesoft.info/text/..Mozilla/5.0 (Windows; Windows NT 5.1; [lang code]; rv:[..]) Gecko/.. etc (url)
6 www.teesoft.info/image/..Mozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
4 www.teesoft.info/text/..Mozilla/5.0 (Windows; Windows NT 6.0; [lang code]; rv:[..]) Gecko/.. etc (url)
38jetbrains
19 www.jetbrains.com/omea_reader/text/..JetBrains Omea Reader 2.0 Release Candidate 1 (url)
19 www.jetbrains.com/omea_reader/text/..JetBrains Omea Reader 1.0.x (url)
37avantbrowser
19 www.avantbrowser.comtext/..Avant Browser (url)
18 www.avantbrowser.comtext/..Advanced Browser (url)
36feedshow
18 www.feedshow.comtext/..FeedshowOnline (url)
18 www.feedshow.comtext/..Feedshow/x.0 (url; 1 subscriber)
35freebase
35 www.freebase.comtext/..metaweb/Nutch-1.0-dev (url; help_at_metaweb.com)
33Anonymouse
15 Anonymouse.org/image/..url (Unix)
14 Anonymouse.org/text/..url (Unix)
4 Anonymouse.org/application/x-javascripturl (Unix)
32oneriot
27 www.oneriot.comtext/..Mozilla/5.0 (Windows; Windows NT 5.1; en-US; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 OneRiot/1.0 (url)
5 www.oneriot.comimage/..Mozilla/5.0 (Windows; Windows NT 5.1; en-US; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 OneRiot/1.0 (url)
32spinn3r
23 spinn3r.com/robottext/..Mozilla/5.0 (X11; Linux x86_64; en-US; rv:1.9.0.19; aggregator:Spinn3r (Spinn3r 3.1); url) Gecko/2010040121 Firefox/3.0.19
6 spinn3r.com/robottext/..Mozilla/5.0 (X11; Linux i686; en-US; rv:1.2.1; aggregator:Spinn3r (Spinn3r 3.1); url) Gecko/20021130
28chug
28 crawler.chug.nettext/..Mozilla/5.0 (compatible; heritrix/3.0.0 url)
25rcdtokyo
19 www.rcdtokyo.com/pc2m/text/..Mozilla/5.0 (compatible; PEAR HTTP_Request class; url)
5 www.rcdtokyo.com/pc2m/image/..Mozilla/5.0 (compatible; PEAR HTTP_Request class; url)
25weblio
24 www.weblio.jp/text/..Mozilla/5.0 (compatible; WeblioBot; url)
25alexa
25 www.alexa.com/site/help/webmasterstext/..ia_archiver (url; mail address )
25hatena
21 a.hatena.ne.jp/helptext/..Hatena Antenna/0.5 (url)
4 mgw.hatena.ne.jp/helptext/..DoCoMo/2.0 D903i(c100;TB;W28H20) (compatible; Hatena-Mobile-Gateway/1.2; url)
23conceptlinkage
23 www.conceptlinkage.orgtext/..c-link wikipedia miner (url) mail address
23github
15 github.com/pauldix/typhoeus/tree/mastertext/..Typhoeus - url
7 github.com/edsu/linkypediaapplication/jsonlinkpyediabot v0.1: url
21puritysearch
21 www.puritysearch.net/text/..Mozilla/5.0 (compatible; Purebot/1.1; url)
20blogbridge
20 www.blogbridge.com/text/..BlogBridge 2.13 (url)
20abonti
20 www.abonti.comtext/..Mozilla/5.0 (compatible; Abonti/0.9 - url)
20chainn
17 www.chainn.com/mxbot.htmltext/..Mozilla/5.0 (compatible; mxbot/1.0; url)
3 www.chainn.com/mxbot.htmlimage/..Mozilla/5.0 (compatible; mxbot/1.0; url)
20winpodder
20 winpodder.comtext/..WinPodder (url)
20kula
20 kula.jp/endotext/..endo/1.0 (Mac OS X; ppc i386; url)
20graemef
20 graemef.comtext/..NewsGator FetchLinks extension/0.2.0 (url)
20meta
20 meta.ua/spidertext/..Mozilla/5.0 (compatible; METASpider; url)
19setooz
12 www.setooz.com/bot.htmltext/..Mozilla/5.0 ( compatible; SETOOZBOT/0.30 ; url ; mail address )
5 www.setooz.com/bot.htmltext/..Mozilla/5.0 ( compatible; SETOOZBOT/0.30 ; url )
19rssreader
19 www.rssreader.comtext/..RssReader/1.0.xx.x (url) Microsoft Windows NT 5.1.2600.0
19zipcommander
19 www.zipcommander.com/text/..1st ZipCommander (Net) - url
19timewe
19 timewe.nettext/..CDR/1.7.1 Simulator/0.7(url) Profile/MIDP-1.0 Configuration/CLDC-1.0
19accelobot
19 www.accelobot.comtext/..Mozilla/5.0 (compatible; heritrix/1.14.3 url)
19ranchero
19 ranchero.com/netnewswire/text/..NetNewsWire/2.x (Mac OS X; url)
19rssbandit
19 www.rssbandit.orgtext/..RssBandit/1.5.0.10 (WinNT 5.1.2600.0; url) (WinNT 5.1.2600.0; )
19edu
15 gais.cs.ccu.edu.tw/robot.phptext/..Gaisbot/3.0( mail address ;url)
3 ws.nju.edu.cn/falcons/text/..Mozilla/5.0 (compatible; Falconsbot; url)
19justsystems
19 www.justsystems.com/jp/tech/crawler/text/..JUST-CRAWLER(url)
19ponderer
19 ponderer.org/download/annotate_google.user.jstext/..annotate_google; url
19it-influentials
19 search.it-influentials.com/bot.htmtext/..Mozilla/5.0 (compatible;FindITAnswersbot/1.0;url)
19nemui
19 mozshot.nemui.org/text/..Mozilla/5.0 (Gecko/20070310 Mozshot/0.0.20070628; url)
19snap
19 www.snap.comtext/..Snapbot/1.0 (Snap Shots, url)
19seebot
19 seebot.orgtext/..Lynx/2.8 (;url)
18tinyurl
18 tinyurl.com/64t5ntext/..Rome Client (url) Ver: 0.9
18zootycoon
18 www.zootycoon.comtext/..Zoo Tycoon 2 Client -- url
18memidex
18 www.memidex.com/_bottext/..Mozilla/5.0 (compatible; Memibot/1.0; url )
18snarfware
18 www.snarfware.com/text/..Snarfer/0.x.x (url)
18orcabrowser
18 www.orcabrowser.comtext/..Orca Browser (url)
18plagger
18 plagger.org/text/..Plagger/0.x.xx (url)
18feeds4all
18 www.feeds4all.com/feedzcollectortext/..FeedZcollector v1.x (Platinum) url
17moose
17 www.moose.at/about.phptext/..Mozilla/5.0 (compatible; Moose/1.2; Linux i686; de; url)
16mixi
8 mixi.jp/text/..mixi-mobile-converter/1.0 (url)
8 mixi.jp/image/..mixi-mobile-converter/1.0 (url)
14seoprofiler
14 www.seoprofiler.com/bottext/..Mozilla/5.0 (compatible; spbot/2.0.4; url )
14froute
11 labs.froute.jp/pc2m/help.htmltext/..Froute Mobile Gateway/1.0 (url)
3 labs.froute.jp/pc2m/help.htmlimage/..Froute Mobile Gateway/1.0 (url)
13flaptor
13 www.flaptor.com/text/..HounderCrawl/Nutch-0.9 (Hounder Search Bot; url)
13holmes
13 holmes.getext/..HolmesBot (url)
11bloglines
7 www.bloglines.com-Bloglines/3.1 (url; 1 subscriber)
11gramtrans
11 gramtrans.com/text/..GramTrans (url)
11js-kit
11 js-kit.com/text/..JS-Kit URL Resolver, url
10topsy
6 labs.topsy.com/butterfly.htmltext/..Mozilla/5.0 (compatible; Butterfly/1.0; url) Gecko/2009032608 Firefox/3.0.8
4 labs.topsy.com/butterfly/text/..Mozilla/5.0 (compatible; Butterfly/1.0; url) Gecko/2009032608 Firefox/3.0.8
10yioop
9 www.yioop.com/bot.phptext/..Mozilla/5.0 (compatible; YioopBot url)
40,541 total

Page requests for probable crawlers, recognized by keyword
Columns: Count (x 1000), Agent string, Mime type (count ≥ 3)
2,418PythonWikipediaBot/1.0
1,767 application/json
534 application/xml
116 text/..
1 image/..
1 -
1,199ClueBot/1.1
1,002 application/vnd.php.serialized
197 text/..
974GoogleBot-Image/1.0
454 text/..
272 -
248 image/..
1 application/pdf
638Answersbot
638 text/..
359Onespot Crawler
270 application/json
89 text/..
335LinkParser/2.0
335 text/..
273Mozilla/5.0 (Windows; Windows NT 5.1; fr; rv:1.8.1) VoilaBot BETA 1.2 ( mail address )
273 text/..
1 -
1 application/pdf
1 application/ogg
1 application/vnd.php.serialized
201php wikibot classes
160 application/vnd.php.serialized
41 text/..
1 -
192Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (Exabot-Thumbnails)
97 image/..
66 text/..
29 application/x-javascript
1 application/json
150Jbot
150 text/..
1 image/..
144MLBot (www.metadatalabs.com/mlbot)
144 text/..
1 -
1 image/..
140wikiwix-bot-3.0
140 text/..
1 -
1 image/..
130spider
129 text/..
1 application/json
1 image/..
118GoogleBot-Image/1.0
116 text/..
2 image/..
101Peachy MediaWiki Bot API Version 0.1beta
97 application/vnd.php.serialized
4 text/..
1 image/..
98GoogleBot-News
98 text/..
1 -
1 image/..
98Casper Bot Search
98 text/..
1 -
84Mozilla/5.0 (SnapPreviewBot) Gecko/20061206 Firefox/1.5.0.9
50 image/..
27 text/..
7 application/x-javascript
1 application/json
69AarghBot Linux
69 text/..
1 -
62crawler mail address
62 text/..
57DotNetWikiBot/2.81 (Microsoft Windows NT 6.1.7600.0; )
47 text/..
6 application/xml
4 image/..
1 application/ogg
52Mozilla/5.0 (compatible; sgbot v0.01a, mail address )
52 text/..
1 -
50HTMLParser/1.6
47 text/..
3 application/json
48HTMLParser/2.0
38 text/..
10 -
1 image/..
47TheKeens bot
47 text/..
40DotNetWikiBot/2.72 (Microsoft Windows NT 5.1.2600 Service Pack 3; )
40 text/..
1 application/xml
38Test Webbot
38 text/..
38MSR-ISRCCrawler
24 text/..
10 image/..
4 application/x-javascript
1 application/json
36Mozilla 5.0 (Apibot 0.20)
36 application/vnd.php.serialized
36ZanranCrawler/0.3 ( mail address )
36 text/..
33dicbot 1.0
33 text/..
31MyCuteBot / 0.1.
31 text/..
28SineBot/1.5.16(User:SineBot)
27 application/vnd.php.serialized
1 text/..
27gsa-crawler (Enterprise; S5-MS8QQPJ5BGWAA; mail address )
27 text/..
24Pywikipediabot/2.0
24 application/json
24GoogleBot
24 text/..
1 image/..
22Mozilla/5.0 wtvbot/0.6-snapshot
22 text/..
1 -
1 application/xml
21HRoestBot, de-wikipedia using pywikipedia framework
18 application/xml
2 text/..
1 application/json
21UCMore Crawler App
21 text/..
1 -
20Mozilla/5.0 (X11; Linux i686; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7 SnapPreviewBot
20 text/..
20CorenSearchBot/1.5 en libwww-perl/5.834
20 text/..
20plaNETWORK Bot Search
20 text/..
19Mozilla/5.0 (compatible; SnapPreviewBot; en-US; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9
19 text/..
19plantspedia data crawler
19 text/..
19SiocWikiBot/1.0
19 application/vnd.php.serialized
1 text/..
18infraEnterprise v8 Web Crawler
13 text/..
5 -
18msramlbot
18 text/..
17COMODOspider/Nutch-1.0
17 text/..
1 image/..
1 application/ogg
15govbot/Nutch-1.1 (Gov bot; none)
14 text/..
1 image/..
15AnomieBOT 1.0 (OrphanReferenceFixer)
15 application/json
15COIBot/1.00
15 text/..
14('python-wikitools/1.2 (User:BernsteinBot)',)
14 application/json
14Opera/8.01 (J2ME/MIDP; MXit WebBot/1.1.2.0) Opera Mini/3.1
7 image/..
6 application/vnd.wap.xhtml+xml
1 text/..
13dictionary-bot
10 application/xml
3 text/..
13MystBot/1.5 fr libwww-perl/5.836
13 text/..
12Mozilla/5.0 (X11; Linux x86_64; en-US; rv:1.9.2) Gecko Firefox/3.5 (ITABot; mail address )
11 text/..
1 image/..
1 application/x-javascript
12SurakWare MediaWiki Bot/1.0
12 text/..
1 application/xml
12SoxBot IRC Bot. PHP
12 application/vnd.php.serialized
1 text/..
11kmccrew Bot Search
11 text/..
10~Bot ([[:fr:w:User:TildeBot]] by [[:fr:w:User:Alphos]] mail address )
10 text/..
10Mozilla/5.0 QunarBot/1.0
10 text/..
9QuickFinder Crawler
9 text/..
1 -
1 application/x-external-editor
9CS572Spider/Nutch-1.1 (CS572; NotAvailable; mail address dot edu)
9 text/..
8Bot/WP/EN/Daniel/MediationBot1/1.2
8 text/..
8Tawbot (public svn release; plwiki)
8 text/..
8Twitterbot/0.1
8 text/..
1 image/..
7Bub's wikibot (Wikibot/2010040100; JWBF/1.2; Java/1.6)
7 text/..
7XLinkBot/1.00
7 text/..
7PywikiBot 1.0 mail address
7 text/..
7Geni ircpybot 1.0
5 text/..
2 application/json
1 application/xml
6TrueKnowledgeBot bot mail address >
4 application/vnd.php.serialized
2 application/xml
6Codeton Software RSS Bot/1.0
6 text/..
6A .NET Web Crawler
6 text/..
6Wikibot 0.24
6 application/vnd.php.serialized
6betaBot
6 text/..
1 image/..
6Freebase Deathbot
6 text/..
6DotNetWikiBot/2.92 (Microsoft Windows NT 6.1.7600.0; )
6 text/..
6Mozilla/5.0 (compatible; Windows NT 6.0) Gecko/20090624 Firefox/3.5 NjuiceBot
6 text/..
1 image/..
6Jyxobot/1
6 text/..
5Jabse.com Crawler v.2.0 www.jabse.com/crawler.php
5 text/..
1 application/xml
5Dex Bot Search
5 text/..
5DownloadSpider/6.0
4 image/..
1 text/..
4Twib::Crawler/0.02
3 text/..
1 image/..
4GNAA-bot
4 text/..
4Citation_bot; mail address
4 text/..
4ZanranCrawler/0.2 ( mail address )
4 text/..
4YaDirectBot/1.0
4 text/..
4('python-wikitools/1.2 (User:Mr.Z-bot)',)
4 application/json
4DotNetWikiBot/2.9 (Unix 2.6.26.2; )
4 text/..
4TVersity Media Robot
4 text/..
4Mozilla/5.0 (Bgbot 0.5)
4 text/..
3'citeseerxbot'
3 text/..
1 image/..
3Erel Bot
3 text/..
3dex Bot Search
3 text/..
3Bot/WP/EN/Alex_Bakharev/AlexNewArtBot
3 text/..
3DotNetWikiBot/2.94 (Microsoft Windows NT 6.1.7600.0; )
3 text/..
3AnomieBOT 1.0 (SourceUploader)
3 application/json
3sledink Bot Search
3 text/..
3FAST Search Web Crawler 14.0.0291.0000
3 text/..
3IScraperBot/0.1 Mozilla/5.0
3 application/xml
1 text/..
3('python-wikitools/1.2 (User:LaraBot)',)
3 application/json
8,999 total

IP ranges: known IP ranges for Google are 64.233.[160.0-191.255], 66.249.[64.0-95.255], 66.102.[0.0-15.255], 72.14.[192.0-255.255], 74.125.[0.0-255.255], 209.85.[128.0-255.255], 216.239.[32.0-63.255] and a few minor other subranges.
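
A minimal sketch of how a GoogleBot claim could be checked against these ranges, using Python's standard ipaddress module. The ranges are rewritten in CIDR notation, the "few minor other subranges" are not covered, and the function names is_google_ip and label_googlebot as well as the labels 'google' and 'google?' are assumptions made for illustration, not the report's actual code.

```python
import ipaddress

# The ranges listed above, rewritten in CIDR notation
# (e.g. 64.233.160.0 - 64.233.191.255 is 64.233.160.0/19).
GOOGLE_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "64.233.160.0/19",
        "66.249.64.0/19",
        "66.102.0.0/20",
        "72.14.192.0/18",
        "74.125.0.0/16",
        "209.85.128.0/17",
        "216.239.32.0/19",
    )
]


def is_google_ip(address):
    """Return True if the address falls inside one of the listed ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in GOOGLE_NETWORKS)


def label_googlebot(user_agent, address):
    """Label a request claiming to be GoogleBot (hypothetical labels).

    Returns 'google' for a listed Google address, 'google?' for a possibly
    spoofed agent string, and None if the agent does not mention GoogleBot.
    """
    if "googlebot" not in user_agent.lower():
        return None
    return "google" if is_google_ip(address) else "google?"


if __name__ == "__main__":
    agent = "Mozilla/5.0 (compatible; GoogleBot/2.1; url)"
    print(label_googlebot(agent, "66.249.64.1"))   # inside 66.249.64.0/19
    print(label_googlebot(agent, "192.0.2.10"))    # documentation address
```

An address outside these ranges is not proof of spoofing, since the list is incomplete; hence the question mark in the label.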

Generated on Tue, Oct 19, 2010 3:24
Author: Erik Zachte
Mail: ezachte@### (no spam: ### = wikimedia.org)
All data and images on this page are in the public domain.