Wikipedias, participation per language

About the visualization

Data

Editor counts (used to compute participation) were collected from the Wikipedia dumps by Wikistats 1 scripts. As usual, only active editors (registered contributors with 5 or more edits in a given month) have been included.
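The "active editor" rule above can be illustrated with a minimal sketch. It is not the Wikistats 1 code: the function name, the input shape (pairs of user name and month) and the sample data are assumptions made for illustration only.

```python
from collections import Counter

ACTIVE_THRESHOLD = 5  # edits per month, as stated in the text


def count_active_editors(edits):
    """Count active editors per month.

    `edits` is an iterable of (username, "YYYY-MM") pairs, one pair per edit
    by a registered user (hypothetical input format).
    Returns {month: number of users with at least ACTIVE_THRESHOLD edits}.
    """
    edits_per_user_month = Counter(edits)
    active = Counter()
    for (_user, month), n_edits in edits_per_user_month.items():
        if n_edits >= ACTIVE_THRESHOLD:
            active[month] += 1
    return dict(active)


# Made-up sample: alice has 6 edits, bob 4, carol 5, all in the same month.
sample = (
    [("alice", "2018-08")] * 6
    + [("bob", "2018-08")] * 4
    + [("carol", "2018-08")] * 5
)
print(count_active_editors(sample))
```

In this sample only alice and carol reach the threshold, so bob is not counted.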

Population counts were taken from the English Wikipedia (Aug 2018). For most languages the number of speakers was taken either from 'List of languages by total number of speakers' 1, or from the infobox of the article about the language. Counts include secondary language speakers, if available (see also the caveats below).

For global languages (those with a considerable presence on more than one continent), speaker counts per continent were taken from a list of speakers per country (only available for a small set of global languages) and then totalled per continent. See this spreadsheet for intermediate results. These lists were consulted for English, Russian, French, Spanish, Portuguese, and Arabic.
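Totalling per continent amounts to a simple group-and-sum. The sketch below assumes a per-country speaker table and a country-to-continent lookup. Both mappings, their names and all the numbers are placeholders, not the values from the spreadsheet.

```python
from collections import defaultdict


def speakers_per_continent(speakers_by_country, continent_of):
    """Sum speaker counts (in millions) per continent.

    speakers_by_country: {country: speakers in millions}  (hypothetical input)
    continent_of:        {country: continent name}        (hypothetical lookup)
    """
    totals = defaultdict(float)
    for country, speakers in speakers_by_country.items():
        totals[continent_of[country]] += speakers
    return dict(totals)


# Placeholder data for one imaginary global language; not real figures.
speakers = {"Country A": 40.0, "Country B": 10.0, "Country C": 25.0}
continents = {"Country A": "Europe", "Country B": "Europe", "Country C": "Africa"}
print(speakers_per_continent(speakers, continents))
```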

For these global languages the participation rate is always the global participation rate; in other words, the same number is presented in every continent breakdown. Counting editors per wiki per continent is theoretically possible: WMF collects traffic data that could be used to generate such geo-aware editor stats. However, this is somewhat privacy sensitive (especially for small wikis and/or small regions below the continent level). Also, collecting such stats only for this visualization would be too costly.

Rankings

Languages with 100+ million speakers:

Code  Language    Speakers (millions)
EN    English     1281
ZH    Chinese     1107
ES    Spanish     499
HI    Hindi       442
AR    Arabic      303
FR    French      285
MS    Malay       281
PT    Portuguese  279
BN    Bengali     262
ID    Indonesian  199
RU    Russian     187
PA    Punjabi     148
DE    German      132
JA    Japanese    128
FA    Persian     110

Languages with 100+ thousand speakers and highest rank for participation:

Code  Language       Participation (active editors per million speakers)
EU    Basque         139
ET    Estonian       111
HE    Hebrew         107
FI    Finnish        74
NO    Norwegian      68
IS    Icelandic      67
SV    Swedish        58
CY    Welsh          58
BR    Breton         58
AST   Asturian       56
LB    Luxembourgish  49
CS    Czech          46
CSB   Cassubian      46
HY    Armenian       45
LV    Latvian        43
EO    Esperanto      42

Caveats

Secondary speakers

As stated on English Wikipedia: a number of sources have compiled lists of languages by their number of speakers. However, all such lists should be used with caution.

First, it is difficult to define exactly what constitutes a language as opposed to a dialect. For example, some languages including Chinese and Arabic are sometimes considered single languages and sometimes language families. Similarly, Hindi is sometimes considered to be a language, but together with Urdu it also is often considered a single language, Hindustani.

Second, there is no single criterion for how much knowledge is sufficient to be counted as a second-language speaker. For example, English has about 400 million native speakers but, depending on the criterion chosen, can be said to have as many as 2 billion speakers.

Summing up speakers per language does not yield population counts, neither per continent nor globally!

As stated on each page of this visualization: "(includes secondary speakers; caveat: bilinguals will be counted twice)". Here are the totals for all languages as summed up, compared to actual population counts:

Region         Population  Sum of speakers  Overcount
Africa         1216 M      1104 M           -9%
Asia           4436 M      5069 M           14%
Europe         739 M       1204 M           63%
North America  579 M       599 M            3%
South America  423 M       476 M            13%
Oceania        40 M        25 M             -38%
World          7633 M      8479 M           11%
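The overcount column appears to be the relative difference between the summed speakers and the population, rounded to a whole percent. The formula below is inferred from the numbers, not taken from the author's scripts. The figures are copied from the table above; the function and variable names are illustrative.

```python
# (population, sum of speakers), both in millions, copied from the table above.
TOTALS = {
    "Africa": (1216, 1104),
    "Asia": (4436, 5069),
    "Europe": (739, 1204),
    "North America": (579, 599),
    "South America": (423, 476),
    "Oceania": (40, 25),
    "World": (7633, 8479),
}


def overcount_pct(population, speakers):
    """Relative overcount in whole percent (assumed formula).

    Python's round() rounds exact halves to the nearest even integer,
    so -37.5 (Oceania) becomes -38.
    """
    return round(100 * (speakers - population) / population)


for region, (population, speakers) in TOTALS.items():
    print(f"{region}: {overcount_pct(population, speakers)}%")
```

A negative value, as for Africa and Oceania, means the summed speaker counts fall short of the population.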

Note: the size of the large circles in the per-continent page breakdown reflects neither the overall population count nor the summed speaker counts listed above. Instead, these large circles have been tweaked manually so that small circles for a similar number of speakers are drawn at roughly equal size in each large circle.

Distribution by continent

All foreign language speakers in Russia have been counted towards the European continent.

Conflicting or ambiguous numbers on English Wikipedia for Hindustani/Hindi/Urdu (in millions of speakers):

Hindustani = Hindi + Urdu: 697 L1+L2 on 1
Hindi: 442 L1+L2 on 2
Urdu: 67 L1 on 3
(Does this imply Urdu L2 = 697 - 442 - 67 = 188?!)

Filter

202 Wikipedias are shown in the visualization, with a threshold participation level of 0.01 editor per million speakers. At least half of the Wikipedias not included exist only as nearly empty wikis (think of a dozen stubs, which also existed 10 years ago). See also this diagram.
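The filter can be sketched as follows, taking participation to be active editors divided by speakers in millions. Whether the threshold is inclusive is an assumption, and the function names and sample values are hypothetical.

```python
THRESHOLD = 0.01  # editors per million speakers, as stated in the text


def participation(active_editors, speakers_millions):
    """Active editors per million speakers."""
    return active_editors / speakers_millions


def passes_filter(active_editors, speakers_millions):
    """True if a wiki would be shown (assumes an inclusive threshold)."""
    return participation(active_editors, speakers_millions) >= THRESHOLD


# Made-up examples, not real wikis:
print(passes_filter(10, 0.7))  # small language, 10 active editors
print(passes_filter(1, 500))   # very large language, 1 active editor
```

The first example has about 14 editors per million speakers and passes. The second has 0.002 and is filtered out.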

Trends

This spreadsheet is an attempt to quantify changes in participation over a 5 year period.
row 62+: It uses active editor counts per month from Wikistats 1 for the 150 most viewed Wikipedias.
row 48-60: From these it calculates yearly averages.
row 7-19: It uses 2018 numbers for speakers and extrapolates to earlier years, using a global population growth rate of 1.18% per year. Of course population growth differs widely per region, so this is a gross generalization.
row 22-34: It then calculates yearly participation numbers per language from 2006 onwards.
row 37-38: Then it compares recent and earlier participation figures for each language (again using averages, now for 3 consecutive years).
row 41-45: Finally it highlights remarkable changes for the most recent trend figures in participation.
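The back-extrapolation and yearly participation steps can be sketched as below. The sketch assumes compound growth of 1.18% per year applied backwards from 2018; the spreadsheet may compute this differently. All names and the example editor count are illustrative.

```python
GROWTH_RATE = 0.0118  # global population growth per year, from the text
BASE_YEAR = 2018      # year of the speaker counts


def speakers_in_year(speakers_2018, year):
    """Estimate speakers (millions) in an earlier year.

    Assumes compound growth: each step back in time divides by (1 + rate).
    """
    return speakers_2018 / (1 + GROWTH_RATE) ** (BASE_YEAR - year)


def yearly_participation(avg_active_editors, speakers_2018, year):
    """Yearly average of active editors per million estimated speakers."""
    return avg_active_editors / speakers_in_year(speakers_2018, year)


# English has 1281 million speakers in the ranking above.
# The editor count of 30000 is a placeholder, not a measured value.
for year in (2013, 2018):
    print(year, round(speakers_in_year(1281, year)), yearly_participation(30000, 1281, year))
```

Because the same growth rate is applied to every language, differences in participation trends between languages come entirely from the editor counts.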

© Wikimedia Foundation, CC-BY-SA 4.0 (author Erik Zachte)