Wikipedias, participation per language
About the visualization
Pages
Data

Editor counts (used in participation counts) were collected from the Wikipedia dumps by Wikistats 1 scripts. As usual, only active editors (registered contributors with 5 or more edits in a given month) have been included.

Population counts were taken from the English Wikipedia (Aug 2018). For most languages the number of speakers was taken either from 'List of languages by total number of speakers' 1, or from the infobox of the article about the language. Counts include secondary language speakers where available (see also the caveats below).

For global languages (those with a considerable presence on more than one continent), speaker counts per continent were derived from a list of speakers per country, which is only available for a small set of global languages, and then totalled per continent. See this spreadsheet for intermediate results. These lists were consulted for English, Russian, French, Spanish, Portuguese and Arabic.

For these global languages the participation rate shown is always the global participation rate; in other words, the same number is presented in every breakdown. Counting editors per wiki per continent is theoretically possible: WMF collects traffic data that could be used to generate such geo-aware editor stats. However, this is somewhat privacy sensitive (especially for small wikis and/or small regions below the continent level), and collecting such stats only for this visualization would be too costly.

Rankings
Caveats

Secondary speakers

As stated on English Wikipedia, a number of sources have compiled lists of languages by their number of speakers. However, all such lists should be used with caution.

First, it is difficult to define exactly what constitutes a language as opposed to a dialect. For example, some languages, including Chinese and Arabic, are sometimes considered single languages and sometimes language families. Similarly, Hindi is sometimes considered to be a language, but together with Urdu it is also often considered a single language, Hindustani.

Second, there is no single criterion for how much knowledge is sufficient to be counted as a second-language speaker. For example, English has about 400 million native speakers but, depending on the criterion chosen, can be said to have as many as 2 billion speakers.

Summing up speakers per language does not equal population counts, neither per continent, nor global!

As stated on each page of this visualization: "(includes secondary speakers; caveat: bilinguals will be counted twice)". Here are the totals for all languages as summed up, compared to actual population counts:
Note: The size of the large circles in the page with the breakdown per continent does not reflect the overall population count, nor does it reflect the summed-up speaker counts listed above. Instead, these large circles have been tweaked manually, so that small circles representing similar numbers of speakers are drawn at roughly equal size in each large circle.

Distribution by continent

All foreign language speakers in Russia have been counted for the European continent.

Conflicting or ambiguous numbers on English Wikipedia for Hindustani/Hindi/Urdu:
Hindustani = Hindi + Urdu: 697 L1+L2 on 1
Hindi: 442 L1+L2 on 2
Urdu: 67 L1 on 3
(does this imply Urdu L2 = 697 - 442 - 67 = 188 ?!)

Filter

202 Wikipedias are shown in the visualization, with a threshold participation level of 0.01 editors per million speakers. At least half of the Wikipedias not included exist only as nearly empty wikis (think of a dozen stubs, which also existed 10 years ago). See also this diagram.

Trends

This spreadsheet is an attempt to quantify changes in participation over a 5 year period.
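As an illustration of how the participation rate and the filter threshold described above fit together, here is a minimal Python sketch. It is not the Wikistats code: the names `WikiStats`, `participation_rate` and `shown_in_visualization` are hypothetical, the sample figures are invented, and treating the 0.01 threshold as inclusive is an assumption.

```python
from dataclasses import dataclass

# Threshold stated in the text: 0.01 active editors per million speakers.
THRESHOLD_PER_MILLION = 0.01


@dataclass
class WikiStats:
    """Hypothetical record for one Wikipedia language edition."""
    language: str
    active_editors: int        # registered contributors with 5+ edits in the month
    speakers_millions: float   # primary plus secondary speakers, in millions


def participation_rate(w: WikiStats) -> float:
    """Return active editors per million speakers."""
    if w.speakers_millions <= 0:
        raise ValueError("speaker count must be positive")
    return w.active_editors / w.speakers_millions


def shown_in_visualization(w: WikiStats,
                           threshold: float = THRESHOLD_PER_MILLION) -> bool:
    """Assumption: a wiki is shown when its rate is at or above the threshold."""
    return participation_rate(w) >= threshold


# Invented figures, for illustration only.
sample = [
    WikiStats("lang_a", 120, 40.0),   # 3.0 editors per million speakers
    WikiStats("lang_b", 0, 5.0),      # no active editors
    WikiStats("lang_c", 1, 250.0),    # 0.004 editors per million speakers
]
shown = [w.language for w in sample if shown_in_visualization(w)]
```

Because speaker counts include secondary speakers, the denominator is an overestimate for bilingual populations, so rates computed this way are best read as rough comparisons rather than exact figures.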
@ Wikimedia Foundation, CC-BY-SA 4.0 (author Erik Zachte)