Wikimedia Visitor Log Analysis Report - Google requests

Daily averages, based on sample period: 1 Nov 2009 - 30 Nov 2009

 This analysis is based on a 1:1000 sampled server log (squids) ⇒ all counts x 1000.
This report shows all requests to Wikimedia servers in which a Google server or service was involved in any way: the GoogleBot crawler or FeedFetcher collector scripts that run on Google servers, a user who follows a link from a Google Web or Google Desktop search results page, or a request that comes from Google Maps, Google Earth, and so on.

Technically speaking, three fields in the squid log records are checked for this: client ip address, referer header and user agent header. A request can originate from an ip address that has been registered by Google, and/or it can carry a referer header showing that a user clicked a link on a Google results page, and/or it can carry an agent string that mentions a Google application and can reasonably be assumed to be genuinely Google's. See the bottom of the page for further details.


In total Google was somehow involved in 50.5% of daily external page* requests

Through its services, including search, Maps, and Google Earth, Google referred 112,183,000 page views per day to our sites, representing 42.3% of our external page requests.

Including all of its different search crawlers and services hosted on its servers, Google itself requested another 21,628,000 pages per day, representing 8.2% of our external page requests.

* = mime type text/html only
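
The three headline figures above can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted in this report; the implied total of external page requests is derived here and is not stated anywhere in the report, so treat it as an approximation limited by the rounding of the percentages.

```python
# Figures quoted in the report (requests per day).
referred_pages = 112_183_000   # page views referred by Google services
crawled_pages = 21_628_000     # pages requested by Google's own crawlers and services
referred_share = 0.423         # 42.3% of external page requests
crawled_share = 0.082          # 8.2% of external page requests

# The two shares together give the headline "Google was involved" figure.
combined_share = referred_share + crawled_share

# Implied total of external page requests per day (derived, not from the report).
implied_total = referred_pages / referred_share

print(f"combined share: {combined_share:.1%}")
print(f"implied external page requests per day: {implied_total:,.0f}")
print(f"crawler share recomputed: {crawled_pages / implied_total:.1%}")
```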

In order of request volume

Requests originating from a Google ip address
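Service | Total | Pages | Images | Other
GoogleBot | 21,857 | 18,732 | 496 | 2,629
Wireless | 3,015 | 2,043 | 960 | 12
Translate | 504 | 504 | - | -
Other | 331 | 235 | 53 | 43
FeedFetcher | 282 | 104 | 1 | 178
Desktop | 71 | 1 | - | 71
Web search | 10 | 10 | 1 | 1
Image search | 1 | 1 | 1 | -
Toolbar | 1 | 1 | 1 | -
Maps | 1 | - | 1 | -
Earth | 1 | - | 1 | -
KeywordTool | 1 | 1 | - | -
Total | 26,070 | 21,628 | 1,510 | 2,933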
 
Requests originating from elsewhere
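Service | Total | Pages | Images | Other
Web search | 109,604 | 107,784 | 1,263 | 558
Other | 4,681 | 212 | 2,358 | 2,111
Desktop | 3,920 | 148 | 1 | 3,772
Image search | 3,842 | 2,206 | 1,591 | 44
GoogleBot? | 1,436 | 1,391 | 4 | 41
Toolbar | 1,048 | 196 | 59 | 793
Earth | 796 | 136 | 657 | 3
Maps | 389 | 63 | 322 | 4
Mail | 61 | 12 | 41 | 8
Translate | 36 | 16 | 11 | 9
Wireless | 20 | 19 | 1 | 1
FeedFetcher | 1 | 1 | - | 1
Total | 125,833 | 112,183 | 6,306 | 7,343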
 
Top level domains
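Domain | Total | Pages | Images | Other
.com | 36,907 | 35,281 | 1,371 | 255
.de | 8,857 | 8,618 | 213 | 27
.uk | 6,038 | 5,851 | 168 | 19
.fr | 5,370 | 5,183 | 153 | 33
.jp | 4,753 | 4,654 | 80 | 19
.br | 4,156 | 4,066 | 66 | 24
.pl | 4,100 | 4,026 | 62 | 13
.it | 3,980 | 3,883 | 84 | 13
.mx | 3,770 | 3,696 | 63 | 11
.ca | 3,692 | 3,570 | 109 | 12
.es | 3,257 | 3,173 | 70 | 13
.au | 1,820 | 1,747 | 55 | 17
.in | 1,809 | 1,745 | 53 | 11
.nl | 1,393 | 1,341 | 47 | 5
.co | 1,360 | 1,284 | 18 | 58
.ar | 1,190 | 1,159 | 24 | 7
.tr | 1,084 | 1,048 | 32 | 4
.ve | 1,063 | 1,048 | 11 | 4
.se | 1,017 | 988 | 26 | 3
.ru | 1,017 | 935 | 77 | 4
.cl | 855 | 831 | 19 | 5
.pe | 847 | 829 | 15 | 4
.be | 842 | 819 | 21 | 2
.at | 819 | 791 | 24 | 4
.fi | 766 | 750 | 13 | 2
.ph | 763 | 746 | 11 | 7
.ch | 757 | 737 | 18 | 2
.id | 635 | 607 | 21 | 6
.pt | 630 | 612 | 15 | 3
.th | 611 | 595 | 12 | 4
.hu | 543 | 517 | 20 | 6
.ro | 527 | 506 | 19 | 2
.il | 434 | 424 | 8 | 2
.no | 412 | 398 | 13 | 1
.dk | 383 | 371 | 11 | 1
.sg | 374 | 367 | 5 | 2
.my | 366 | 355 | 9 | 2
.vn | 357 | 348 | 8 | 2
.ua | 351 | 343 | 5 | 2
.gr | 343 | 330 | 11 | 1
.cn | 338 | 288 | 49 | 2
.ie | 338 | 330 | 7 | 1
.nz | 312 | 299 | 9 | 4
.bg | 310 | 301 | 8 | 1
.cz | 307 | 290 | 14 | 2
.ec | 271 | 267 | 3 | 1
.hr | 270 | 261 | 8 | 1
.tw | 233 | 225 | 6 | 2
.lt | 212 | 206 | 5 | 1
.do | 205 | 201 | 3 | 1
.sk | 200 | 190 | 8 | 2
.hk | 187 | 182 | 5 | 1
.za | 177 | 172 | 5 | 1
.eg | 171 | 164 | 6 | 1
.pk | 163 | 158 | 4 | 1
.si | 160 | 155 | 4 | 1
.ma | 135 | 133 | 2 | 1
.pr | 127 | 124 | 3 | 1
.cr | 120 | 116 | 3 | 1
.kr | 115 | 87 | 27 | 1
.ae | 108 | 105 | 3 | 1
.uy | 105 | 103 | 2 | 1
.lv | 103 | 101 | 2 | 1
.sa | 103 | 98 | 4 | 1
.ee | 99 | 96 | 2 | 1
.bo | 96 | 94 | 2 | 1
.gt | 84 | 82 | 2 | 1
.ba | 80 | 77 | 2 | 1
.sv | 76 | 74 | 2 | 1
.py | 46 | 44 | 1 | 1
.lk | 45 | 43 | 2 | 1
.hn | 43 | 42 | 1 | 1
.ke | 42 | 42 | 1 | 1
.jo | 42 | 40 | 2 | 1
.jm | 41 | 40 | 1 | 1
.tt | 37 | 37 | 1 | 1
.rs | 35 | 34 | 1 | 1
.is | 34 | 33 | 1 | 1
.by | 34 | 34 | 1 | 1
.bd | 33 | 32 | 1 | 1
.ng | 33 | 32 | 1 | 1
.kz | 32 | 31 | 1 | 1
.kw | 31 | 30 | 1 | 1
.ni | 31 | 30 | 1 | 1
.md | 31 | 30 | 1 | 1
.lu | 29 | 28 | 1 | 1
.qa | 26 | 25 | 1 | 1
.mt | 25 | 24 | 1 | 1
.bh | 24 | 23 | 1 | 1
.ge | 23 | 22 | 1 | 1
.om | 17 | 17 | 1 | 1
.gh | 17 | 17 | 1 | 1
.lb | 17 | 17 | 1 | -
.mu | 17 | 17 | 1 | 1
.pa | 16 | 15 | 1 | 1
.ps | 15 | 14 | 1 | 1
.cu | 15 | 15 | 1 | 1
.np | 13 | 13 | 1 | 1
.az | 12 | 11 | 1 | 1
.ly | 10 | 10 | 1 | 1
.mn | 10 | 9 | 1 | 1
.ci | 9 | 8 | 1 | 1
.am | 9 | 9 | 1 | 1
.cat | 9 | 9 | 1 | 1
.sn | 8 | 8 | - | -
.bs | 8 | 8 | 1 | 1
.et | 8 | 8 | 1 | 1
.mv | 7 | 7 | 1 | 1
.tz | 7 | 7 | 1 | -
.dz | 7 | 6 | 1 | 1
.bn | 7 | 7 | 1 | 1
.gp | 6 | 6 | 1 | 1
.fj | 5 | 5 | 1 | 1
.bz | 5 | 5 | 1 | -
.net | 5 | 1 | 4 | 1
.bw | 4 | 4 | 1 | 1
.mg | 4 | 4 | 1 | -
.ag | 4 | 4 | 1 | 1
.mz | 4 | 4 | 1 | 1
.zw | 4 | 3 | 1 | -
.dm | 3 | 3 | 1 | 1
.na | 3 | 3 | 1 | 1
.kg | 3 | 2 | 1 | -
.vi | 3 | 3 | 1 | 1
.kh | 3 | 3 | 1 | -
.uz | 3 | 3 | 1 | -
.ht | 3 | 3 | 1 | -
.gy | 3 | 3 | - | -
.org | 3 | 1 | 3 | 1
.ug | 2 | 2 | 1 | -
.rw | 2 | 2 | 1 | 1
.gi | 2 | 2 | - | -
.li | 2 | 1 | 1 | -
.cd | 2 | 2 | 1 | -
.zm | 2 | 2 | - | -
.vc | 2 | 2 | - | -
.sm | 2 | 2 | 1 | -
.biz | 1 | 1 | 1 | -
.gl | 1 | 1 | - | -
.af | 1 | 1 | 1 | -
.ls | 1 | 1 | - | 1
.la | 1 | 1 | 1 | -
.ai | 1 | 1 | - | -
.me | 1 | 1 | - | 1
.vg | 1 | 1 | 1 | -
.cg | 1 | 1 | - | -
.as | 1 | 1 | - | -
.sc | 1 | 1 | - | 1
.bj | 1 | 1 | 1 | 1
.gm | 1 | 1 | - | 1
.to | 1 | 1 | - | -
.mk | 1 | 1 | 1 | -
.ws | 1 | 1 | 1 | -
.ad | 1 | 1 | - | -
.dj | 1 | 1 | 1 | -
.gg | 1 | 1 | 1 | -
.ki | 1 | 1 | 1 | -
.nu | 1 | 1 | - | -
.st | 1 | 1 | - | -
.ck | 1 | 1 | - | -
.ao | 1 | 1 | - | -
.tm | 1 | 1 | - | -
.ms | 1 | 1 | - | -
.pn | 1 | 1 | 1 | -
.sb | 1 | 1 | - | -
.ir | 1 | - | 1 | -
.sl | 1 | 1 | - | -
.us | 1 | - | 1 | -
.tv | 1 | 1 | 1 | -
.bi | 1 | 1 | - | 1
.gd | 1 | 1 | - | -
.tk | 1 | 1 | 1 | -
.nr | 1 | 1 | - | -
.vu | 1 | 1 | - | -
.edu | 1 | 1 | - | -
.mw | 1 | 1 | - | -
.yu | 1 | 1 | - | -
.eu | 1 | - | 1 | -
.ac | 1 | 1 | - | -
.gov | 1 | - | 1 | -
.je | 1 | 1 | - | -
.cc | 1 | - | 1 | 1
.sh | 1 | - | 1 | -
.tl | 1 | 1 | - | -
.tj | 1 | 1 | - | -
.cf | 1 | 1 | - | -
.im | 1 | 1 | 1 | 1
.fm | 1 | 1 | 1 | -
undefined | 37,393 | 23,239 | 4,525 | 9,628
Total | 151,903 | 133,810 | 7,814 | 10,271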

In alphabetical order

Requests originating from a Google ip address
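Service | Total | Pages | Images | Other
Desktop | 71 | 1 | - | 71
Earth | 1 | - | 1 | -
FeedFetcher | 282 | 104 | 1 | 178
GoogleBot | 21,857 | 18,732 | 496 | 2,629
Image search | 1 | 1 | 1 | -
KeywordTool | 1 | 1 | - | -
Maps | 1 | - | 1 | -
Other | 331 | 235 | 53 | 43
Toolbar | 1 | 1 | 1 | -
Translate | 504 | 504 | - | -
Web search | 10 | 10 | 1 | 1
Wireless | 3,015 | 2,043 | 960 | 12
Total | 26,070 | 21,628 | 1,510 | 2,933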
 
Requests originating from elsewhere
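Service | Total | Pages | Images | Other
Desktop | 3,920 | 148 | 1 | 3,772
Earth | 796 | 136 | 657 | 3
FeedFetcher | 1 | 1 | - | 1
GoogleBot? | 1,436 | 1,391 | 4 | 41
Image search | 3,842 | 2,206 | 1,591 | 44
Mail | 61 | 12 | 41 | 8
Maps | 389 | 63 | 322 | 4
Other | 4,681 | 212 | 2,358 | 2,111
Toolbar | 1,048 | 196 | 59 | 793
Translate | 36 | 16 | 11 | 9
Web search | 109,604 | 107,784 | 1,263 | 558
Wireless | 20 | 19 | 1 | 1
Total | 125,833 | 112,183 | 6,306 | 7,343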
 
Top level domains
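Domain | Total | Pages | Images | Other
.ac | 1 | 1 | - | -
.ad | 1 | 1 | - | -
.ae | 108 | 105 | 3 | 1
.af | 1 | 1 | 1 | -
.ag | 4 | 4 | 1 | 1
.ai | 1 | 1 | - | -
.am | 9 | 9 | 1 | 1
.ao | 1 | 1 | - | -
.ar | 1,190 | 1,159 | 24 | 7
.as | 1 | 1 | - | -
.at | 819 | 791 | 24 | 4
.au | 1,820 | 1,747 | 55 | 17
.az | 12 | 11 | 1 | 1
.ba | 80 | 77 | 2 | 1
.bd | 33 | 32 | 1 | 1
.be | 842 | 819 | 21 | 2
.bg | 310 | 301 | 8 | 1
.bh | 24 | 23 | 1 | 1
.bi | 1 | 1 | - | 1
.biz | 1 | 1 | 1 | -
.bj | 1 | 1 | 1 | 1
.bn | 7 | 7 | 1 | 1
.bo | 96 | 94 | 2 | 1
.br | 4,156 | 4,066 | 66 | 24
.bs | 8 | 8 | 1 | 1
.bw | 4 | 4 | 1 | 1
.by | 34 | 34 | 1 | 1
.bz | 5 | 5 | 1 | -
.ca | 3,692 | 3,570 | 109 | 12
.cat | 9 | 9 | 1 | 1
.cc | 1 | - | 1 | 1
.cd | 2 | 2 | 1 | -
.cf | 1 | 1 | - | -
.cg | 1 | 1 | - | -
.ch | 757 | 737 | 18 | 2
.ci | 9 | 8 | 1 | 1
.ck | 1 | 1 | - | -
.cl | 855 | 831 | 19 | 5
.cn | 338 | 288 | 49 | 2
.co | 1,360 | 1,284 | 18 | 58
.com | 36,907 | 35,281 | 1,371 | 255
.cr | 120 | 116 | 3 | 1
.cu | 15 | 15 | 1 | 1
.cz | 307 | 290 | 14 | 2
.de | 8,857 | 8,618 | 213 | 27
.dj | 1 | 1 | 1 | -
.dk | 383 | 371 | 11 | 1
.dm | 3 | 3 | 1 | 1
.do | 205 | 201 | 3 | 1
.dz | 7 | 6 | 1 | 1
.ec | 271 | 267 | 3 | 1
.edu | 1 | 1 | - | -
.ee | 99 | 96 | 2 | 1
.eg | 171 | 164 | 6 | 1
.es | 3,257 | 3,173 | 70 | 13
.et | 8 | 8 | 1 | 1
.eu | 1 | - | 1 | -
.fi | 766 | 750 | 13 | 2
.fj | 5 | 5 | 1 | 1
.fm | 1 | 1 | 1 | -
.fr | 5,370 | 5,183 | 153 | 33
.gd | 1 | 1 | - | -
.ge | 23 | 22 | 1 | 1
.gg | 1 | 1 | 1 | -
.gh | 17 | 17 | 1 | 1
.gi | 2 | 2 | - | -
.gl | 1 | 1 | - | -
.gm | 1 | 1 | - | 1
.gov | 1 | - | 1 | -
.gp | 6 | 6 | 1 | 1
.gr | 343 | 330 | 11 | 1
.gt | 84 | 82 | 2 | 1
.gy | 3 | 3 | - | -
.hk | 187 | 182 | 5 | 1
.hn | 43 | 42 | 1 | 1
.hr | 270 | 261 | 8 | 1
.ht | 3 | 3 | 1 | -
.hu | 543 | 517 | 20 | 6
.id | 635 | 607 | 21 | 6
.ie | 338 | 330 | 7 | 1
.il | 434 | 424 | 8 | 2
.im | 1 | 1 | 1 | 1
.in | 1,809 | 1,745 | 53 | 11
.ir | 1 | - | 1 | -
.is | 34 | 33 | 1 | 1
.it | 3,980 | 3,883 | 84 | 13
.je | 1 | 1 | - | -
.jm | 41 | 40 | 1 | 1
.jo | 42 | 40 | 2 | 1
.jp | 4,753 | 4,654 | 80 | 19
.ke | 42 | 42 | 1 | 1
.kg | 3 | 2 | 1 | -
.kh | 3 | 3 | 1 | -
.ki | 1 | 1 | 1 | -
.kr | 115 | 87 | 27 | 1
.kw | 31 | 30 | 1 | 1
.kz | 32 | 31 | 1 | 1
.la | 1 | 1 | 1 | -
.lb | 17 | 17 | 1 | -
.li | 2 | 1 | 1 | -
.lk | 45 | 43 | 2 | 1
.ls | 1 | 1 | - | 1
.lt | 212 | 206 | 5 | 1
.lu | 29 | 28 | 1 | 1
.lv | 103 | 101 | 2 | 1
.ly | 10 | 10 | 1 | 1
.ma | 135 | 133 | 2 | 1
.md | 31 | 30 | 1 | 1
.me | 1 | 1 | - | 1
.mg | 4 | 4 | 1 | -
.mk | 1 | 1 | 1 | -
.mn | 10 | 9 | 1 | 1
.ms | 1 | 1 | - | -
.mt | 25 | 24 | 1 | 1
.mu | 17 | 17 | 1 | 1
.mv | 7 | 7 | 1 | 1
.mw | 1 | 1 | - | -
.mx | 3,770 | 3,696 | 63 | 11
.my | 366 | 355 | 9 | 2
.mz | 4 | 4 | 1 | 1
.na | 3 | 3 | 1 | 1
.net | 5 | 1 | 4 | 1
.ng | 33 | 32 | 1 | 1
.ni | 31 | 30 | 1 | 1
.nl | 1,393 | 1,341 | 47 | 5
.no | 412 | 398 | 13 | 1
.np | 13 | 13 | 1 | 1
.nr | 1 | 1 | - | -
.nu | 1 | 1 | - | -
.nz | 312 | 299 | 9 | 4
.om | 17 | 17 | 1 | 1
.org | 3 | 1 | 3 | 1
.pa | 16 | 15 | 1 | 1
.pe | 847 | 829 | 15 | 4
.ph | 763 | 746 | 11 | 7
.pk | 163 | 158 | 4 | 1
.pl | 4,100 | 4,026 | 62 | 13
.pn | 1 | 1 | 1 | -
.pr | 127 | 124 | 3 | 1
.ps | 15 | 14 | 1 | 1
.pt | 630 | 612 | 15 | 3
.py | 46 | 44 | 1 | 1
.qa | 26 | 25 | 1 | 1
.ro | 527 | 506 | 19 | 2
.rs | 35 | 34 | 1 | 1
.ru | 1,017 | 935 | 77 | 4
.rw | 2 | 2 | 1 | 1
.sa | 103 | 98 | 4 | 1
.sb | 1 | 1 | - | -
.sc | 1 | 1 | - | 1
.se | 1,017 | 988 | 26 | 3
.sg | 374 | 367 | 5 | 2
.sh | 1 | - | 1 | -
.si | 160 | 155 | 4 | 1
.sk | 200 | 190 | 8 | 2
.sl | 1 | 1 | - | -
.sm | 2 | 2 | 1 | -
.sn | 8 | 8 | - | -
.st | 1 | 1 | - | -
.sv | 76 | 74 | 2 | 1
.th | 611 | 595 | 12 | 4
.tj | 1 | 1 | - | -
.tk | 1 | 1 | 1 | -
.tl | 1 | 1 | - | -
.tm | 1 | 1 | - | -
.to | 1 | 1 | - | -
.tr | 1,084 | 1,048 | 32 | 4
.tt | 37 | 37 | 1 | 1
.tv | 1 | 1 | 1 | -
.tw | 233 | 225 | 6 | 2
.tz | 7 | 7 | 1 | -
.ua | 351 | 343 | 5 | 2
.ug | 2 | 2 | 1 | -
.uk | 6,038 | 5,851 | 168 | 19
.us | 1 | - | 1 | -
.uy | 105 | 103 | 2 | 1
.uz | 3 | 3 | 1 | -
.vc | 2 | 2 | - | -
.ve | 1,063 | 1,048 | 11 | 4
.vg | 1 | 1 | 1 | -
.vi | 3 | 3 | 1 | 1
.vn | 357 | 348 | 8 | 2
.vu | 1 | 1 | - | -
.ws | 1 | 1 | 1 | -
.yu | 1 | 1 | - | -
.za | 177 | 172 | 5 | 1
.zm | 2 | 2 | - | -
.zw | 4 | 3 | 1 | -
undefined | 37,393 | 23,239 | 4,525 | 9,628
Total | 151,903 | 133,810 | 7,814 | 10,271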
 

IP ranges: known ip ranges for Google are 64.233.[160.0-191.255], 66.249.[64.0-95.255], 66.102.[0.0-15.255], 72.14.[192.0-255.255], 74.125.[0.0-255.255], 209.85.[128.0-255.255], 216.239.[32.0-63.255] and a few minor other subranges.
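
As an illustration, the ranges listed above can be tested with Python's standard `ipaddress` module. This is a minimal sketch and not the script that produced this report: the CIDR notation is our own conversion of the listed ranges, the names `GOOGLE_NETWORKS` and `is_google_ip` are hypothetical, and the "few minor other subranges" are not included because the report does not name them.

```python
import ipaddress

# The seven ranges listed in the report, converted here to CIDR notation.
GOOGLE_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "64.233.160.0/19",   # 64.233.[160.0-191.255]
        "66.249.64.0/19",    # 66.249.[64.0-95.255]
        "66.102.0.0/20",     # 66.102.[0.0-15.255]
        "72.14.192.0/18",    # 72.14.[192.0-255.255]
        "74.125.0.0/16",     # 74.125.[0.0-255.255]
        "209.85.128.0/17",   # 209.85.[128.0-255.255]
        "216.239.32.0/19",   # 216.239.[32.0-63.255]
    )
]


def is_google_ip(address: str) -> bool:
    """Return True if the client address falls in one of the listed ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in GOOGLE_NETWORKS)


print(is_google_ip("66.249.64.1"))   # inside 66.249.[64.0-95.255]
print(is_google_ip("66.249.96.1"))   # just above that range
```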

Agents: too many crawlers identify themselves as 'GoogleBot' to take agent strings at face value. Requests are accepted as genuine Google crawler requests only when the ip address matches a known range (see above). Other records that mention GoogleBot are counted as GoogleBot? (with a question mark, as these may include partners, like DoCoMo). However, when the agent string mentions Google Desktop or Google Earth, it is always accepted.

Service: the service name is based on the agent string (plus, for GoogleBot, a check of the ip address, see above). If the agent string is inconclusive, the service name is based on the referer string.
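
The decision order described above (agent string first, with an ip check for GoogleBot, then the referer) might be sketched as follows. This is an illustrative reading of the rules, not the report's actual code: the function name `classify`, the substring patterns in `AGENT_SERVICES` and `REFERER_SERVICES`, and the fallbacks to 'Web search' and 'Other' are all assumptions, since the report does not list the exact strings it matches.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical patterns, chosen for illustration only.
AGENT_SERVICES = [("google desktop", "Desktop"), ("google earth", "Earth"),
                  ("feedfetcher", "FeedFetcher"), ("googletoolbar", "Toolbar")]
REFERER_SERVICES = [("images.google.", "Image search"), ("maps.google.", "Maps"),
                    ("mail.google.", "Mail"), ("translate.google.", "Translate")]


def classify(agent: str, referer: str, from_google_ip: bool) -> Optional[str]:
    """Return a service name, or None if nothing points to Google."""
    agent_lc = agent.lower()
    # GoogleBot is only taken at face value when the ip address matches.
    if "googlebot" in agent_lc:
        return "GoogleBot" if from_google_ip else "GoogleBot?"
    # Other agent strings (e.g. Desktop, Earth) are accepted as they are.
    for needle, service in AGENT_SERVICES:
        if needle in agent_lc:
            return service
    # Agent string inconclusive: fall back to the referer host name.
    host = (urlparse(referer).hostname or "").lower()
    if "google" in host.split("."):
        for needle, service in REFERER_SERVICES:
            if host.startswith(needle):
                return service
        return "Web search"  # assumed default for any other Google referer
    return "Other" if from_google_ip else None


bot_agent = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(classify(bot_agent, "-", from_google_ip=False))
print(classify("Mozilla/5.0", "http://www.google.de/search?q=wiki", False))
```

The GoogleBot branch comes first so that an unverified crawler is never counted under another service name on the strength of its referer.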

Here is a detailed breakdown, per service, of the indicators that pointed to Google (total ≥ 3)
 
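Service | Total | Originating from Google ip address | Referer mentions Google url | Agent mentions Google service
Desktop | 3,919 | - | - | Y
Desktop | 71 | Y | - | Y
Earth | 796 | - | - | Y
FeedFetcher | 282 | Y | - | Y
GoogleBot | 21,856 | Y | - | Y
GoogleBot? | 1,434 | - | - | Y
Image search | 3,842 | - | Y | -
Mail | 61 | - | Y | -
Maps | 389 | - | Y | -
Other | 46 | - | - | Y
Other | 4,630 | - | Y | -
Other | 4 | - | Y | Y
Other | 74 | Y | - | -
Other | 257 | Y | - | Y
Toolbar | 883 | - | - | Y
Toolbar | 116 | - | Y | -
Toolbar | 49 | - | Y | Y
Translate | 36 | - | Y | -
Translate | 345 | Y | - | Y
Translate | 158 | Y | Y | Y
Web search | 109,604 | - | Y | -
Web search | 10 | Y | Y | -
Wireless | 20 | - | - | Y
Wireless | 3,013 | Y | - | Y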

Top Level Domain 'undefined': requests with top level domain 'undefined' are nearly all requests from anonymous ip addresses (crawler and other services)

Note: averages below 1 are always rounded up to 1
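
To make the sampling and rounding rules concrete, here is a small sketch of how a table value could be derived from sampled log lines. It assumes that each table value is the number of sampled log lines divided by the 30 days of the sample period, which is consistent with the 112,183 in the tables corresponding to 112,183,000 page views per day in the text. The function names, the round-half-up rule, and the example counts are hypothetical.

```python
import math
from typing import Optional

SAMPLING_FACTOR = 1000   # 1:1000 sampled log, per the report
SAMPLE_DAYS = 30         # 1 Nov 2009 - 30 Nov 2009


def daily_average(sampled_hits: int, days: int = SAMPLE_DAYS) -> Optional[int]:
    """Average sampled log lines per day; None stands for an empty table cell."""
    if sampled_hits == 0:
        return None
    average = sampled_hits / days
    # Round half up (an assumption), but never show a nonzero average as 0.
    return max(1, math.floor(average + 0.5))


def estimated_requests_per_day(table_value: int) -> int:
    """Scale a table value back up to an estimate of real requests per day."""
    return table_value * SAMPLING_FACTOR


print(daily_average(3))                      # 0.1 per day is shown as 1
print(estimated_requests_per_day(112_183))   # matches the figure in the text
```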

Generated on Tuesday December 15, 2009
Author:Erik Zachte (Web site)
Mail: ezachte@### (no spam: ### = wikimedia.org)
All data and images on this page are in the public domain.