Wikimedia Traffic Analysis Report - Google requests

Daily averages, based on sample period: 1 Sep 2011 - 30 Sep 2011

 This analysis is based on a 1:1000 sampled server log (squids) ⇒ all counts x 1000.
The WMF traffic logging service suffered from server capacity problems in August and September 2011.
Absolute traffic counts for September 2011 are approximately 19% too low.
Data loss occurred only during peak hours, so it may have affected traffic from different parts of the world differently,
and it may also have skewed relative figures such as the share of traffic per browser or operating system.
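
As a rough illustration of what the 19% figure implies for absolute counts, here is a minimal sketch. It assumes that "approximately 19% too low" means the logged count is about 81% of the true count, and that the loss applies uniformly, which the note above warns may not hold for every region or browser. The function name `correct_for_log_loss` is hypothetical and not part of the report's scripts.

```python
LOSS_FRACTION = 0.19  # "approximately 19% too low" for September 2011


def correct_for_log_loss(observed: float, loss_fraction: float = LOSS_FRACTION) -> float:
    """Estimate the true count from an observed count that is loss_fraction too low.

    Assumption: observed = true * (1 - loss_fraction), applied uniformly.
    """
    if not 0 <= loss_fraction < 1:
        raise ValueError("loss_fraction must be in [0, 1)")
    return observed / (1 - loss_fraction)


# Example: the reported grand total of 164,989 (thousand requests per day).
estimate = correct_for_log_loss(164_989)
print(f"corrected estimate: {estimate:,.0f} thousand requests per day")
```

Relative figures (shares per browser, per country) cannot be repaired this way, because the loss was concentrated in peak hours.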

 
This report shows all requests to Wikimedia servers in which a Google server or service was involved in any way. This includes the GoogleBot crawler and FeedFetcher collector scripts that run on Google servers, as well as users who follow a link from a Google Web or Google Desktop search results page, or from Google Maps, Google Earth, and so on.

Technically speaking, three fields in the squid log records are checked for this: client IP address, referer header, and user agent header. A request can originate from an IP address registered to Google, and/or carry a referer header showing that a user clicked a link on a Google results page, and/or carry an agent string that mentions a Google application that can reasonably be assumed to be genuinely Google's. See the bottom of the page for further details.
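
The three checks can be sketched as follows, producing the same Y/- notation used in the indicator breakdown near the bottom of this page. This is an illustration only: the matching rules (a host label equal to `google`, the substring `google` in the agent string) are simplifying assumptions, the IP check is passed in as a precomputed boolean, and all function names are hypothetical rather than taken from the report's scripts.

```python
from urllib.parse import urlparse


def referer_mentions_google(referer: str) -> bool:
    """True if the referer's host has a label equal to 'google' (assumed rule)."""
    try:
        host = urlparse(referer).hostname or ""
    except ValueError:  # malformed URL
        return False
    return "google" in host.split(".")


def agent_mentions_google(user_agent: str) -> bool:
    """True if the agent string mentions Google at all (assumed, deliberately loose)."""
    return "google" in user_agent.lower()


def indicator_flags(ip_is_google: bool, referer: str, user_agent: str) -> str:
    """Return three flags in the order: Google IP, Google referer, Google agent."""
    checks = (
        ip_is_google,
        referer_mentions_google(referer),
        agent_mentions_google(user_agent),
    )
    return "".join("Y" if hit else "-" for hit in checks)


print(indicator_flags(False, "http://www.google.co.uk/search?q=wikipedia", "Mozilla/5.0"))
```

A real implementation would need a stricter agent test, since the report only accepts agent strings that can reasonably be assumed to be genuine.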


In total, Google was involved in some way in 43.4% of daily external page* requests.

Through its services, including search, Maps, and Google Earth, Google referred 121,079,000 page views per day to our sites, representing 36.6% of our external page requests.

Including all of its search crawlers and the services hosted on its servers, Google itself requested another 22,265,000 pages per day, representing 6.7% of our external page requests.

* = mime type text/html only
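
The percentages above can be cross-checked with a little arithmetic. The sketch below is not how the report computed them: it backs out the total number of external page requests from the rounded 36.6% figure, which is an assumption, so results can differ from the published numbers in the last digit. It gives an implied total of roughly 331 million page requests per day and a combined share within about 0.1 percentage point of the reported 43.4%.

```python
REFERRED_PAGES = 121_079_000  # page views per day referred by Google services
CRAWLED_PAGES = 22_265_000    # pages per day requested by Google itself
REFERRED_SHARE = 0.366        # 36.6%, as reported (already rounded)

# Assumption: derive the total from the rounded share, since the report
# does not state the total number of external page requests here.
implied_total = REFERRED_PAGES / REFERRED_SHARE

crawled_share = CRAWLED_PAGES / implied_total
combined_share = (REFERRED_PAGES + CRAWLED_PAGES) / implied_total

print(f"implied total external page requests per day: {implied_total:,.0f}")
print(f"share requested by Google itself: {crawled_share:.1%}")
print(f"combined share: {combined_share:.1%}")
```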

In order of request volume

Requests originating from a Google IP address (counts in thousands of requests per day)
Service           Total    Pages   Images    Other
GoogleBot        19,089   17,184      746    1,159
Wireless          3,314    3,249       62        3
Other             2,340      942      416      983
Translate           799      796                 3
FeedFetcher         358       81       75      203
Web search           14       13        1        1
Image search          1        1
Toolbar               1        1
Desktop               1                          1
Maps                  1                 1
Total            25,914   22,265    1,300    2,351
 
Requests originating from elsewhere (counts in thousands of requests per day)
Service           Total    Pages   Images    Other
Web search      132,658  116,399    2,089   14,170
GoogleBot?        5,251    4,516       40      696
Desktop             538       19               519
Other               329       45       73      211
Maps                112       22       89        1
Toolbar              98       32       12       53
Image search         50       28       20        1
Earth                28       14       14        1
Translate            13        3        8        2
Mail                  3        1        2        1
Wireless              1        1        1        1
FeedFetcher           1        1        1        1
Total           139,082  121,079    2,348   15,653
 
Top level domains (counts in thousands of requests per day)
Domain        Total    Pages   Images    Other
.com         41,775   32,204      984    8,588
.de           7,286    6,775       66      444
.uk           6,991    6,064      121      806
.jp           6,425    5,326       23    1,076
.mx           5,944    5,659       82      203
.fr           5,242    4,773       83      385
.br           5,046    4,886       68       92
.it           4,748    4,425       73      250
.ca           3,981    3,565       62      354
.in           3,500    3,264       69      166
.es           3,277    3,040       56      182
.pl           2,907    2,816       51       41
.au           2,775    2,439       45      291
.co           2,335    2,273       28       34
.ru           1,984    1,877       34       73
.ar           1,808    1,751       30       28
.nl           1,558    1,417       36      105
.ph           1,363    1,310       29       23
.pe           1,183    1,140       24       19
.th           1,064    1,007       23       34
.tr           1,057      993       24       40
.se           1,040      921       18      102
.cl           1,017      971       20       26
.id             939      885       28       26
.be             847      806       19       22
.ua             841      817       11       13
.ch             834      745       19       70
.at             706      636       15       55
.fi             654      621       11       23
.my             638      607       11       20
.ve             624      601        8       15
.hk             623      575        9       39
.vn             575      542        9       24
.il             531      481        8       41
.sg             524      454        5       64
.nz             523      490        8       25
.hu             488      465       15        8
.ec             480      465        7        8
.dk             472      426        9       37
.ro             460      433       12       15
.no             459      419       10       29
.pt             450      433        7       10
.gr             421      402        9       10
.do             417      406        3        7
.ie             385      337        5       44
.tw             369      356        5        8
.cz             349      330       12        7
.pk             336      315        8       14
.bg             299      288        6        5
.za             298      268        7       23
.hr             265      253        6        6
.gt             260      251        4        5
.org            243        5      237        1
.sa             230      207        6       18
.eg             222      206        7        9
.sk             195      185        6        4
.lt             189      182        4        2
.ma             186      176        4        7
.ae             181      168        4       10
.cr             164      152        4        8
.uy             155      150        3        2
.sv             154      148        3        4
.kr             154      123        5       26
.pr             152      142        3        7
.bo             150      145        3        3
.si             131      126        4        2
.by              99       96        2        1
.ba              96       92        3        1
.ee              96       93        2        1
.lv              93       90        2        1
.kz              91       88        1        2
.hn              88       84        2        2
.py              81       77        1        3
.lk              80       76        3        2
.tn              73       70        2        1
.ng              65       60        1        4
.jo              63       60        2        1
.rs              60       57        1        1
.md              58       56        1        1
.jm              56       53        1        1
.tt              54       52        1        1
.ni              50       48        1        2
.ke              46       43        1        3
.qa              43       40        1        1
.ge              39       38        1        1
.kw              38       34        1        3
.is              35       34        1        1
.ws              32        1       31        1
.lb              31       29        1        1
.np              31       29        1        1
.lu              30       26        1        4
.az              30       28        1        1
.mu              29       28        1        1
.ps              28       26        1        1
.gh              28       25        1        2
.mt              27       26        1        1
.am              27       26        1        1
.iq              25       23        1        1
.om              24       23        1        1
.bh              23       21        1        1
.net             18        3       15        1
.ug              16       15        1        1
.pa              16       15        1        1
.cm              13       12        1        1
.mn              13       12        1        1
.ly              12       10        1        1
.bn              12       11        1        1
.mv              11       10        1        1
.sn              11       10        1        1
.tz              11       10        1        1
.ci              10       10        1        1
.mz              10        9        1        1
.bs              10       10        1        1
.cu              10       10        1        1
.zw              10       10        1        1
.et               9        8        1        1
.bw               8        7        1        1
.kh               8        7        1        1
.cat              8        8        1        1
.dm               7        7        1        1
.kg               7        7        1        1
.mg               7        6        1        1
.bz               7        7        1        1
.na               6        6        1        1
.dz               6        6        1        1
.gy               6        6        1        1
.zm               6        5        1        1
.uz               5        5        1        1
.vc               5        4        1        1
.gp               5        5        1
.bd               4        3        1        1
.rw               4        4        1        1
.cd               4        3        1        1
.ht               4        3                 1
.af               3        2        1        1
.li               3        2        1        1
.ag               3        3        1        1
.vi               3        3        1
.mw               3        2        1        1
.mk               3        3        1        1
.bf               2        2        1        1
.ao               2        2        1
.la               2        2        1        1
.me               2        1        1        1
.fj               2        2        1        1
.ml               2        2        1        1
.dj               2        2                 1
.ne               1        1        1        1
.biz              1                 1        1
.ls               1        1                 1
.tm               1        1                 1
.ga               1        1        1        1
.gi               1        1        1        1
.sb               1        1                 1
.ai               1        1
.vg               1        1
.ir               1        1        1
.sl               1        1        1
.us               1        1        1
.as               1        1        1        1
.sc               1        1        1
.bj               1        1        1
.gm               1        1        1
.to               1        1        1        1
.so               1        1        1        1
.tl               1        1        1
.fm               1        1        1
.ad               1        1        1        1
.tg               1        1                 1
.gg               1        1        1
.ki               1        1
.gl               1        1
.nu               1        1        1
.cy               1        1                 1
.st               1        1                 1
.ck               1        1
.tc               1        1
.ms               1        1
.pn               1        1
.td               1        1        1
.cg               1        1                 1
.tv               1        1        1
.bi               1        1                 1
.gd               1                 1
.tk               1        1        1
.su               1        1        1
.cn               1        1        1        1
.nr               1        1
.vu               1        1
.edu              1        1        1
.eu               1        1        1
.ac               1        1
.gov              1                 1
.je               1        1
.cc               1        1        1
.sh               1        1
.tj               1        1
.cf               1        1
.im               1        1        1
.sm               1        1
undefined    30,988   26,265      919    3,804
Total       164,989  143,344    3,639   18,000

In alphabetical order

Requests originating from a Google IP address (counts in thousands of requests per day)
Service           Total    Pages   Images    Other
Wireless          3,314    3,249       62        3
Web search           14       13        1        1
Translate           799      796                 3
Toolbar               1        1
Other             2,340      942      416      983
Maps                  1                 1
Image search          1        1
GoogleBot        19,089   17,184      746    1,159
FeedFetcher         358       81       75      203
Desktop               1                          1
Total            25,914   22,265    1,300    2,351
 
Requests originating from elsewhere (counts in thousands of requests per day)
Service           Total    Pages   Images    Other
Wireless              1        1        1        1
Web search      132,658  116,399    2,089   14,170
Translate            13        3        8        2
Toolbar              98       32       12       53
Other               329       45       73      211
Maps                112       22       89        1
Mail                  3        1        2        1
Image search         50       28       20        1
GoogleBot?        5,251    4,516       40      696
FeedFetcher           1        1        1        1
Earth                28       14       14        1
Desktop             538       19               519
Total           139,082  121,079    2,348   15,653
 
Top level domains (counts in thousands of requests per day)
Domain        Total    Pages   Images    Other
.ac               1        1
.ad               1        1        1        1
.ae             181      168        4       10
.af               3        2        1        1
.ag               3        3        1        1
.ai               1        1
.am              27       26        1        1
.ao               2        2        1
.ar           1,808    1,751       30       28
.as               1        1        1        1
.at             706      636       15       55
.au           2,775    2,439       45      291
.az              30       28        1        1
.ba              96       92        3        1
.bd               4        3        1        1
.be             847      806       19       22
.bf               2        2        1        1
.bg             299      288        6        5
.bh              23       21        1        1
.bi               1        1                 1
.biz              1                 1        1
.bj               1        1        1
.bn              12       11        1        1
.bo             150      145        3        3
.br           5,046    4,886       68       92
.bs              10       10        1        1
.bw               8        7        1        1
.by              99       96        2        1
.bz               7        7        1        1
.ca           3,981    3,565       62      354
.cat              8        8        1        1
.cc               1        1        1
.cd               4        3        1        1
.cf               1        1
.cg               1        1                 1
.ch             834      745       19       70
.ci              10       10        1        1
.ck               1        1
.cl           1,017      971       20       26
.cm              13       12        1        1
.cn               1        1        1        1
.co           2,335    2,273       28       34
.com         41,775   32,204      984    8,588
.cr             164      152        4        8
.cu              10       10        1        1
.cy               1        1                 1
.cz             349      330       12        7
.de           7,286    6,775       66      444
.dj               2        2                 1
.dk             472      426        9       37
.dm               7        7        1        1
.do             417      406        3        7
.dz               6        6        1        1
.ec             480      465        7        8
.edu              1        1        1
.ee              96       93        2        1
.eg             222      206        7        9
.es           3,277    3,040       56      182
.et               9        8        1        1
.eu               1        1        1
.fi             654      621       11       23
.fj               2        2        1        1
.fm               1        1        1
.fr           5,242    4,773       83      385
.ga               1        1        1        1
.gd               1                 1
.ge              39       38        1        1
.gg               1        1        1
.gh              28       25        1        2
.gi               1        1        1        1
.gl               1        1
.gm               1        1        1
.gov              1                 1
.gp               5        5        1
.gr             421      402        9       10
.gt             260      251        4        5
.gy               6        6        1        1
.hk             623      575        9       39
.hn              88       84        2        2
.hr             265      253        6        6
.ht               4        3                 1
.hu             488      465       15        8
.id             939      885       28       26
.ie             385      337        5       44
.il             531      481        8       41
.im               1        1        1
.in           3,500    3,264       69      166
.iq              25       23        1        1
.ir               1        1        1
.is              35       34        1        1
.it           4,748    4,425       73      250
.je               1        1
.jm              56       53        1        1
.jo              63       60        2        1
.jp           6,425    5,326       23    1,076
.ke              46       43        1        3
.kg               7        7        1        1
.kh               8        7        1        1
.ki               1        1
.kr             154      123        5       26
.kw              38       34        1        3
.kz              91       88        1        2
.la               2        2        1        1
.lb              31       29        1        1
.li               3        2        1        1
.lk              80       76        3        2
.ls               1        1                 1
.lt             189      182        4        2
.lu              30       26        1        4
.lv              93       90        2        1
.ly              12       10        1        1
.ma             186      176        4        7
.md              58       56        1        1
.me               2        1        1        1
.mg               7        6        1        1
.mk               3        3        1        1
.ml               2        2        1        1
.mn              13       12        1        1
.ms               1        1
.mt              27       26        1        1
.mu              29       28        1        1
.mv              11       10        1        1
.mw               3        2        1        1
.mx           5,944    5,659       82      203
.my             638      607       11       20
.mz              10        9        1        1
.na               6        6        1        1
.ne               1        1        1        1
.net             18        3       15        1
.ng              65       60        1        4
.ni              50       48        1        2
.nl           1,558    1,417       36      105
.no             459      419       10       29
.np              31       29        1        1
.nr               1        1
.nu               1        1        1
.nz             523      490        8       25
.om              24       23        1        1
.org            243        5      237        1
.pa              16       15        1        1
.pe           1,183    1,140       24       19
.ph           1,363    1,310       29       23
.pk             336      315        8       14
.pl           2,907    2,816       51       41
.pn               1        1
.pr             152      142        3        7
.ps              28       26        1        1
.pt             450      433        7       10
.py              81       77        1        3
.qa              43       40        1        1
.ro             460      433       12       15
.rs              60       57        1        1
.ru           1,984    1,877       34       73
.rw               4        4        1        1
.sa             230      207        6       18
.sb               1        1                 1
.sc               1        1        1
.se           1,040      921       18      102
.sg             524      454        5       64
.sh               1        1
.si             131      126        4        2
.sk             195      185        6        4
.sl               1        1        1
.sm               1        1
.sn              11       10        1        1
.so               1        1        1        1
.st               1        1                 1
.su               1        1        1
.sv             154      148        3        4
.tc               1        1
.td               1        1        1
.tg               1        1                 1
.th           1,064    1,007       23       34
.tj               1        1
.tk               1        1        1
.tl               1        1        1
.tm               1        1                 1
.tn              73       70        2        1
.to               1        1        1        1
.tr           1,057      993       24       40
.tt              54       52        1        1
.tv               1        1        1
.tw             369      356        5        8
.tz              11       10        1        1
.ua             841      817       11       13
.ug              16       15        1        1
.uk           6,991    6,064      121      806
.us               1        1        1
.uy             155      150        3        2
.uz               5        5        1        1
.vc               5        4        1        1
.ve             624      601        8       15
.vg               1        1
.vi               3        3        1
.vn             575      542        9       24
.vu               1        1
.ws              32        1       31        1
.za             298      268        7       23
.zm               6        5        1        1
.zw              10       10        1        1
undefined    30,988   26,265      919    3,804
Total       164,989  143,344    3,639   18,000
 

IP ranges: known IP ranges for Google are 64.233.[160.0-191.255], 66.249.[64.0-95.255], 66.102.[0.0-15.255], 72.14.[192.0-255.255],
74.125.[0.0-255.255], 209.85.[128.0-255.255], 216.239.[32.0-63.255], and a few other minor subranges.
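
The ranges listed above map cleanly onto CIDR blocks, so a membership test can be sketched with Python's standard `ipaddress` module. This is only an illustration: the list covers just the seven ranges named here (not the "minor other subranges"), it reflects the 2011 snapshot in this report, and `is_google_ip` is a hypothetical name.

```python
import ipaddress

# The seven ranges named in this report, rewritten as CIDR blocks.
GOOGLE_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "64.233.160.0/19",  # 64.233.[160.0-191.255]
        "66.249.64.0/19",   # 66.249.[64.0-95.255]
        "66.102.0.0/20",    # 66.102.[0.0-15.255]
        "72.14.192.0/18",   # 72.14.[192.0-255.255]
        "74.125.0.0/16",    # 74.125.[0.0-255.255]
        "209.85.128.0/17",  # 209.85.[128.0-255.255]
        "216.239.32.0/19",  # 216.239.[32.0-63.255]
    )
]


def is_google_ip(address: str) -> bool:
    """True if the address falls inside one of the ranges listed in the report."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in GOOGLE_NETWORKS)


print(is_google_ip("66.249.64.1"))
print(is_google_ip("66.249.96.0"))
```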

Agents: too many crawlers identify themselves as 'GoogleBot' to take the agent string at face value. Requests are accepted as genuine Google crawler requests only when the IP address matches a known range (see above). Other records that mention GoogleBot are counted as GoogleBot? (with a question mark, as this may include partners such as DoCoMo). However, when the agent string mentions Google Desktop or Google Earth, it is always accepted.

Service: the service name is based on the agent string (plus, for GoogleBot, a check of the IP address; see above). If this is inconclusive, it is based on the referer string.
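
The precedence described in the Agents and Service notes can be sketched as a small function. It assumes the agent string and referer have already been reduced to a service name (or `None` when inconclusive), and it assumes that a request with no usable agent or referer falls into "Other", which the notes do not state explicitly. The name `resolve_service` is hypothetical.

```python
from typing import Optional


def resolve_service(agent_service: Optional[str],
                    referer_service: Optional[str],
                    ip_is_google: bool) -> str:
    """Pick the service label for one request.

    agent_service / referer_service: name derived from the agent string or
    referer, or None if that field was inconclusive.
    """
    # GoogleBot is only taken at face value from a known Google IP range.
    if agent_service == "GoogleBot":
        return "GoogleBot" if ip_is_google else "GoogleBot?"
    # Any other agent-derived name (e.g. Desktop, Earth) is accepted as-is.
    if agent_service is not None:
        return agent_service
    # Agent inconclusive: fall back to the referer.
    if referer_service is not None:
        return referer_service
    # Assumption: nothing conclusive left, so file the request under "Other".
    return "Other"


print(resolve_service("GoogleBot", None, ip_is_google=False))
```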

Here is a detailed breakdown, per service, of the indicators that pointed to Google (total ≥ 3)
 
Service           Total  Originating from   Referer mentions  Agent mentions
                         Google IP address  Google url        Google service
Desktop             538  -                  -                 Y
Earth                28  -                  -                 Y
FeedFetcher         358  Y                  -                 Y
GoogleBot        19,089  Y                  -                 Y
GoogleBot?        5,218  -                  -                 Y
GoogleBot?           34  -                  Y                 Y
Image search         50  -                  Y                 -
Mail                  3  -                  Y                 -
Maps                112  -                  Y                 -
Other               216  -                  -                 Y
Other                46  -                  Y                 -
Other                67  -                  Y                 Y
Other               148  Y                  -                 -
Other             2,190  Y                  -                 Y
Toolbar              66  -                  -                 Y
Toolbar              20  -                  Y                 -
Toolbar              11  -                  Y                 Y
Translate            12  -                  Y                 -
Translate           644  Y                  -                 Y
Translate           155  Y                  Y                 Y
Web search      132,658  -                  Y                 -
Web search           14  Y                  Y                 -
Wireless          3,314  Y                  -                 Y

Top Level Domain 'undefined': requests with top level domain 'undefined' are nearly all requests from anonymous IP addresses (crawlers and other services).

Note: averages below 1 are always rounded up to 1
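
To show how a table cell relates to the underlying log, here is a minimal sketch. It assumes that a cell is the number of sampled log lines in the 30-day period divided by 30, that values of 1 or more are rounded to the nearest integer, and that a blank cell means no sampled requests at all; only the round-up rule and the 1:1000 sampling come from this report. Both function names are hypothetical.

```python
from typing import Optional

SAMPLING_FACTOR = 1000  # 1:1000 sampled squid log
DAYS_IN_PERIOD = 30     # 1 Sep 2011 - 30 Sep 2011


def table_cell(sampled_hits_in_period: int) -> Optional[int]:
    """Daily average as shown in the tables (thousands of requests per day)."""
    if sampled_hits_in_period == 0:
        return None  # assumption: shown as a blank cell
    average = sampled_hits_in_period / DAYS_IN_PERIOD
    if average < 1:
        return 1  # averages below 1 are always rounded up to 1
    return round(average)  # assumption: nearest integer otherwise


def estimated_daily_requests(cell: int) -> int:
    """Scale a table cell back up to an estimated number of real requests."""
    return cell * SAMPLING_FACTOR


print(table_cell(12))
print(estimated_daily_requests(22_265))
```

Because of the round-up rule, rows made up mostly of 1s can add up to more than their stated total.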

Generated on Thu, Oct 6, 2011 15:00
Author: Erik Zachte (Web site)
Mail: ezachte@### (no spam: ### = wikimedia.org)
All data and images on this page are in the public domain.