Representation of Non-Western Cultural Knowledge on Wikipedia: The Case of the Visual Arts

We explore gaps in Wikipedia’s coverage of the visual arts by comparing the representation of 100 artists and 100 artworks from the Western canon against corresponding sets of notable artists and artworks from non-Western cultures. We measure the coverage of these two sets of topics across Wikipedia as a whole and for its individual language versions. We also compare the coverage for Wikimedia Commons and Wikidata, sister projects of Wikipedia that host digital media and structured data. We show that all these platforms strongly favour the Western canon, giving many times more coverage to Western art. We highlight specific examples of differing coverage of visual art inside and outside the Western canon. We find that European language versions of Wikipedia are generally more “Western” in their coverage and Asian languages more “global,” with interesting exceptions, including that English is one of the most “global.” We suggest how both Wikipedia and the wider cultural sector can address this gap in content and thus give Wikipedia a truly global perspective on the visual arts.


The impact of Wikipedia
Wikipedia is the world's leading website through which people learn about history and culture. It is the number one informational site on the web and gets many times more use than museum websites. For example, the Metropolitan Museum of Art's (the Met's) images on Wikipedia get roughly 10 million hits per month, versus 2 million per month on the Met's online catalogue (Maher and Tallon 2018). Each day, English Wikipedia receives 260 million views from about 70 million users. While it is difficult to know what proportion of these views are of "cultural" articles, a share of only about 3.5% would already amount to nine million views a day, the attendance of an Exposition Universelle; it is fair to say that English Wikipedia draws the equivalent of at least one Exposition Universelle every single day. English Wikipedia is just one of nearly three hundred language versions maintained by volunteer communities of differing sizes. Influence of this magnitude brings a responsibility of equal measure: to ensure that Wikipedia's content is representative of the great diversity of communities and cultures that it engages and informs.
Wikipedia is part of the Wikimedia movement, which includes online platforms, volunteer communities, and charitable organizations sharing the goal of open knowledge for all. In its current strategy (Meta …), the Wikimedia movement has explicitly committed to knowledge equity as one of its two core principles: "As a social movement, we will focus our efforts on the knowledge and communities that have been left out by structures of power and privilege." This strategy shapes the organizations' grant-making activities and the partnerships they seek. For example, Wikimedia's GLAM-Wiki Initiative works with cultural institutions to share their resources openly ("GLAM" is an umbrella term for the cultural heritage sector, encompassing Galleries, Libraries, Archives, and Museums) (Outreach Wiki Contributors 2021b). This includes Wikimedian-in-Residence programmes, in which experienced Wikipedia editors are commissioned by a cultural institution to support an open access culture within it (Meta Contributors 2021c). Although such work is already under way, knowledge equity is so large a task that much more remains to be done. In this paper we explore how Wikipedia could advance towards knowledge equity in the domain of the visual arts.

Cultural bias
Research has already described various forms of bias on Wikipedia, and addressing these biases is a focus of the Wikimedia organizations' activity.
Wikipedia's geographic bias and gender bias have their own literatures, so they are outside the scope of the present research. Here we focus specifically on cultural bias, that is, underrepresentation or misrepresentation of aspects of the cultures of the non-Western world. It has long been observed in the literature that the different language versions of Wikipedia reflect the cultural biases of, and celebrate the "local heroes" of, their respective language communities (Callahan and Herring 2011; Maurer and Kolbitsch 2006). For example, the biographies in European-language Wikipedias do not follow the distribution of world population but greatly emphasize the culture of Western Europe and the United States (Graham, Hale, and Stephens 2011).
Cultural biases on Wikipedia can generally be considered a reflection (both a cause and a consequence) of biases in the literature and more widely in society. These societal biases have a long and well-documented history, rooted in systems of hegemony and oppression such as imperialism. Seminal works such as Edward Said's Culture and Imperialism have shown how many of these biases persist in the postcolonial era. Globalization has facilitated not so much a balanced cross-cultural exchange as the spread of the predominant (that is, Western) culture.
The term "art" has a complex history (Steiner 1996; Dean 2006), and we recognize that defining it in general is difficult, especially given that different cultures qualify it differently. Universalizing the term, which to some extent must be done for the purpose of comparative analysis, carries the risk of "employing a Western bias to explore a Western bias," thus replicating the bias. Our attempt to minimize this risk is outlined in the next section of this paper, where our respect for differing definitions and hierarchies is reflected in the inclusion of various works that are considered "visual art" according to non-Western cultural traditions.
The internet initially promised to make geography irrelevant, but algorithms have created new kinds of inequality in the amount of data about physical locations and its availability to different language communities (Graham and Zook 2013). Recent activism, such as Black Lives Matter and the debate over the holdings of European museums, has underlined the urgency of unearthing overlooked or oppressed histories and cultures.
These questions are being raised in the most traditional cultural institutions, as well as by online platforms such as Wikipedia.

The visual arts
Whereas bias clearly exists in relation to many aspects of a specific culture, such as its music, language, literature, performing arts, history, fashion, food, and philosophical ideas, this paper pertains specifically to the visual arts. Within the scope of this paper, the culture under examination is the entire "non-Western" world (a concept defined later).
A pro-Western cultural bias relating to the visual arts can be demonstrated with a superficial survey of visual-art-related lists on English Wikipedia, the largest language version. For example, its "list of sculptors" is 99% Western, its "list of painters by nationality" is around 75% European, and its "list of contemporary visual artists" is 80% European. Moreover, many countries, even those with especially rich artistic traditions such as Libya and Mali, do not have dedicated articles about their art comparable to exhaustive articles such as "Art of France" or "Art of Greece." This bias is further evidenced by the "list of national museums," where non-Western national museums (even those among the most visited in the world, such as that of Brazil) have relatively short, insufficient articles, often without galleries of their collections (something almost taken for granted for major Western museums). It is also indicated by the fact that, although many museums in the non-Western world are dedicated to a single artist, the entries in the articles "list of single artist museums" and "museums devoted to one artist" are 90% Western.
One could imagine that Persian Wikipedia similarly emphasizes Middle Eastern art, and so on: in other words, that these imbalances in coverage are all due to the "local hero" effect. Instead, we think a larger bias is at play. Our hypothesis is that Wikipedia (taking all its language versions as a whole) has significant and systemic imbalances in the representation of non-Western visual arts, and that these can be identified and addressed. Accordingly, the main objectives of this research are to identify those areas in Wikipedia's coverage of the (visual) arts where there are significant imbalances according to culture, language, and geography; to ascertain the scale and nature of these imbalances; to describe what a more equitable representation of visual arts on Wikipedia would look like; and finally, to suggest strategic and practical ways towards greater balance, building on the work already being done by the Wikipedia communities and organizations.

Paper structure
To test the hypothesis concerning the representation of non-Western cultural content on Wikipedia, this paper will take both a quantitative and qualitative approach. A research methodology based on making comparisons of the coverage of Western artists and artworks vis-à-vis their non-Western counterparts will be employed.

Definitions and scope
What exactly are we classifying as "visual art"? In theory, visual art can refer to a range of artistic expressions, including conceptual art, installation art, and contemporary art, but this paper focuses on the traditional art forms that have been practiced over the centuries and across the world and have often been referred to as "fine art." Yet what is considered "fine art" also differs between cultures: the Western hierarchy has placed epic easel painting at the top, whereas in the Islamic world calligraphy ranks among the highest forms, as do textiles and miniatures in Persia and calligraphic landscapes in China, while in Japan there is a special reverence for the decorative and applied arts.
This study balances the need to respect each of these hierarchies with the need to standardize to some degree so as to allow reasonable comparison. After careful consideration of these cultural sensitivities, we decided to focus largely on painting and sculpture but also to include other media such as illuminated manuscripts, textiles, and calligraphy. The study does not include architecture or architectural features, although it must be noted that much artistry and craftsmanship, for example the stained-glass windows of European cathedrals or the geometric tilework and calligraphic inscriptions of Samarkand, Bukhara, and the Alhambra, was enlisted in the service of aesthetic creativity in architecture. Nor does the study include ancient artifacts, manuscripts (unless their calligraphy and illumination are of considerable merit), jewellery, furniture, or fashion.
Many of the artists involved in these projects, particularly outside the West, remain anonymous.
The "West" is a problematic term and concept, as it promotes the notion of a bipolar, dichotomous world. What we classify as non-Western culture is all culture originating and prevailing outside Europe, Scandinavia, Russia, Eastern Europe, North America, and Australasia, together with those cultures (now in the minority) indigenous to those lands, such as Aboriginal and Inuit cultures. This is an extremely large group.
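Is it fair to set Europe, with one sixth of the world's population, against the rest?
In theory, it would be more apt to compare Europe with another continent such as Latin America or Africa. One would expect a comparison of Europe with the rest of the world to be absurd because the non-Western side is so much larger; in fact, the results show that it is absurd for exactly the opposite reason, since the much smaller Western side receives far more coverage.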
The time scope of art in this study is roughly 1000 years, for several reasons. Firstly, this period covers the emergence of the conventional East-West dichotomy, and therefore of the "West and the rest" narrative that continues to this day. Secondly, it comprises major civilizations from across the world and therefore various artistic golden ages, which celebrated, commissioned, recorded, and preserved the works of leading artists. Thirdly, it covers the era of the great European empires, which collectively governed the majority of the non-European world. This matters because, especially during the last 500 years of European colonialism, many indigenous works from the colonies were suppressed or looted, and that legacy underlies much of the knowledge imbalance this paper seeks to highlight. Fourthly, before this period, artworks were often considered artifacts (or, in the Western case, sometimes antiquities) rather than masterpieces produced by an individual artist, or even by a guild or atelier. A typical indication of this is that such a piece is exhibited in a history museum rather than in a dedicated fine art gallery.

Identifying Western artists
English Wikipedia has a system of "Vital Article" lists that define topics judged to have different levels of encyclopedic importance (Wikipedia Contributors 2020). Level 1 contains ten articles (including "The Arts"), Level 2 contains one hundred articles (including "Visual Arts"), and so on. These lists are compiled irrespective of the quality of the existing articles. It is fair to the Wikipedia community to use a standard they have set themselves, so we took the Vital Article lists as a starting point.
The 10,000 topics at Vital Article Level 4, as of November 2020, included 78 Western artists; our shortlist began with these. The additional 22 artists were selected after consultation with the wide range of lists available in media articles and published books.
"Top 100 Artists" lists are common with regards to Western artists. In our choices we aimed to diversify a list dominated by painters from a few European countries, introducing women, decorative artists, and Scandinavian artists.

Identifying non-Western artists
Using the same methodology to establish a set of leading non-Western artists was simply impossible. For instance, only three leading non-Western artists have Vital Articles (Hokusai, Rivera, and Kahlo). No single definitive list exists as a counterpart to the abundance of sources defining the Western canon. We therefore developed a mixed methodology to compile a list of 100 artists that could credibly serve as a counterpart to our Western list.
One of the starting points was to consult the lists already available on Wikipedia.
The "list of African artists" and "list of Chinese artists," for example, provided a sound basis for further investigation, as it is these lists, however inadequate, that we intend to amend and enrich as a result of the research. This initial compilation of non-Western artists was then cross-referenced against the corresponding lists returned by Google searches such as "African artists" or "Chinese artists." As Wikipedia and Google lists of this sort are usually considered indicators of popularity, artists appearing on both were shortlisted for further investigation.
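A minimal sketch of the cross-referencing step, written in Python under our own assumptions: names are compared after a hypothetical normalization (accents stripped, case-folded, whitespace collapsed), and `normalize` and `cross_reference` are illustrative helpers, not the code used in this study. Real matching across sources would also have to handle transliteration variants and alternative names, which this sketch ignores.

```python
import unicodedata


def normalize(name: str) -> str:
    """Hypothetical normalization: strip accents, case-fold, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.casefold().split())


def cross_reference(wikipedia_list, google_list):
    """Names on both lists, spelled and ordered as in the Wikipedia list."""
    google_keys = {normalize(name) for name in google_list}
    shortlist, seen = [], set()
    for name in wikipedia_list:
        key = normalize(name)
        if key in google_keys and key not in seen:
            seen.add(key)
            shortlist.append(name)
    return shortlist
```

For example, with placeholder names, `cross_reference(["Artist Á", "Artist B"], ["artist  a"])` keeps only `"Artist Á"`, because both spellings normalize to `"artist a"`.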
Separately, we conducted a digital media search, consulting magazine articles such as "Top African Artists" or "The Greatest Japanese Sculptors" and other such rankings. Names that appeared frequently across different articles were shortlisted and again cross-referenced with existing lists. A high-level (though limited) literature review of books and articles was conducted to identify the canon in each major region according to academic experts. These names were again cross-referenced against existing lists, with a view to shortlisting artists who were both popular and critically acclaimed.
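The "appeared frequently" criterion can be illustrated in the same way. In this sketch each source is counted at most once per name, and the threshold `min_sources=2` is a hypothetical stand-in for "frequently"; the helper names, source labels, and names are placeholders, not the study's data or code.

```python
from collections import Counter


def frequent_names(sources, min_sources=2):
    """Names mentioned in at least `min_sources` distinct sources.

    `sources` maps a source label to the names it mentions; a source that
    repeats a name still counts only once for that name.
    """
    counts = Counter()
    for names in sources.values():
        counts.update(set(names))
    return sorted(name for name, n in counts.items() if n >= min_sources)


def shortlist(sources, existing, min_sources=2):
    """Frequent names that also appear on an existing list."""
    existing_set = set(existing)
    return [n for n in frequent_names(sources, min_sources) if n in existing_set]
```

Counting distinct sources rather than raw mentions keeps one enthusiastic article from pushing a name over the threshold on its own.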
Another marker of an artist's place on this shortlist was official recognition: national and international awards, the highest national honours for contributions to the visual arts, or designation as a "national artist" or appointment as an "imperial court artist." Some of these names overlapped with those already identified, whereas others required further validation. Much of this validation came from interviews with experts in the respective fields of art. These experts are listed in the Acknowledgements.
Finally, we cut down the lists of Western and non-Western artists to make lists that were similar in time-period coverage and diverse in multiple respects. It is important to note that the resulting list (in Table 8 and Table 9 [Appendix A and Appendix B]) is a representative and indicative sample, sufficient for this particular study to test the hypothesis and provide indicative results. It is not exhaustive and certainly not aimed at establishing a definitive "Top 100." That would be outside the scope of this paper and would require extensive research and consultation, warranting a paper in its own right.
English Wikipedia presumes a topic to be notable when it has received significant coverage in multiple reliable sources that are independent of the subject. Language versions of Wikipedia differ somewhat in their notability standards. All the artists identified through the various forms of research can be considered notable, and therefore deserving of Wikipedia articles. For the purposes of this study, where the objective was a representative sample of counterparts to the Western artists, shortlisting through this process of verification suffices. Some artists who created more than one masterpiece were also included.

Identifying Western masterpieces
As with the Western artists, we used English Wikipedia's Vital Article lists as a starting point for our target list of masterpieces. Taking the relevant articles from Vital Articles Level 5 and filtering out those that were ancient or too recent left 170 works.
Wikidata allowed us to identify that 78 of these works had articles in Encyclopedia Britannica, which was an additional cue to notability. The longlist included many cases of multiple works by the same artist, so we cut this list down to 100 while preserving diversity by removing works by artists who were already included (see Table 10 [Appendix C]).
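The longlist-to-100 step can be sketched as follows. This assumes (our assumption, not a documented step) that works with a Britannica link are considered first; on Wikidata that link is recorded with property P1417 (Encyclopædia Britannica Online ID). Otherwise works keep their original list order, and only the first work by each artist survives. The helper names and the tuples in the usage are hypothetical.

```python
def prioritise(works):
    """Stable sort putting works with a Britannica link (Wikidata P1417) first.

    Each work is a (title, artist, has_britannica) tuple; False sorts before
    True, so `not has_britannica` moves linked works to the front.
    """
    return sorted(works, key=lambda work: not work[2])


def one_work_per_artist(works, limit=100):
    """Keep the first work by each artist, stopping once `limit` are kept."""
    kept, seen_artists = [], set()
    for title, artist, _has_britannica in works:
        if artist in seen_artists:
            continue  # drop further works by an artist already included
        seen_artists.add(artist)
        kept.append(title)
        if len(kept) == limit:
            break
    return kept
```

With `works = [("W1", "X", False), ("W2", "X", True), ("W3", "Y", False)]`, `one_work_per_artist(prioritise(works))` keeps `W2` for artist X (it has a Britannica link) and `W3` for artist Y.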

Identifying non-Western masterpieces
The process of shortlisting a representative set for leading non-Western masterpieces was different from all of the above, though there are some similarities with the process of researching non-Western artists.
This list was the most challenging to compile: firstly, because no such list currently exists, and secondly, because substantial research into non-Western masterpieces would simply unveil too many options to shortlist from. Though Wikipedia and Google searches unearthed some notable examples of non-Western masterpieces, this method was not as helpful as it was for researching non-Western artists. So we began by including the most celebrated works listed as "national treasures" by various non-Western countries, namely those that fell within our remit of visual art. In addition, highlights from national museum and gallery collections across Asia, Africa, and Latin America were longlisted, as were works identified in a media review as having symbolic significance or representing an important cultural movement. We added a select number of works from the non-West that broke sales records at major auction houses, as well as works named repeatedly in our literature review. The list was finalized after cross-referencing with scholarly experts and cut down to 100 at the expert discretion of the authors of this paper (see Table 11 [Appendix D]).

Quantitative comparison
The finalized lists of Western and non-Western artists and masterpieces defined four content areas whose coverage we could explore both quantitatively and qualitatively.

Wikipedia articles
Wikidata queries return all the Wikipedia articles about a given topic, in this case the articles about the artists and artworks in our lists. Our code then requested the byte length of each article from the relevant language version of Wikipedia. Byte length is a fairer measure of an article's content than character count, because a single character carries more text in some scripts than in others, and in the UTF-8 encoding the characters of those denser scripts occupy more bytes. For example, characters in English take one byte each, in Hebrew two bytes each, and in Chinese three or four bytes each.
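To make the byte-length argument concrete, the sketch below measures text the way UTF-8 stores it and builds a MediaWiki API request whose `prop=info` module reports each page's `length` in bytes. The helper names are ours and hypothetical; this is not the study's code, and the API's `length` counts the article's wikitext (including markup) rather than the rendered prose.

```python
import urllib.parse


def utf8_bytes(text: str) -> int:
    """Size of `text` in bytes when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def page_length_url(lang: str, title: str) -> str:
    """MediaWiki API request for a page's stored length (prop=info)."""
    params = {"action": "query", "prop": "info", "titles": title, "format": "json"}
    return f"https://{lang}.wikipedia.org/w/api.php?" + urllib.parse.urlencode(params)


def lengths_from_response(data: dict) -> dict:
    """Map title -> byte length from a prop=info JSON response, skipping missing pages."""
    return {
        page["title"]: page["length"]
        for page in data["query"]["pages"].values()
        if "missing" not in page
    }
```

The words for "art" show the effect: `utf8_bytes("Art")` is 3, the Hebrew `utf8_bytes("אמנות")` is 10 (five letters at two bytes each), and the Chinese `utf8_bytes("艺术")` is 6 (two characters at three bytes each).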
We found five times as many articles about our Western artists (7,808 in total) as about the non-Western artists (1,621), and sixteen times as many about the Western masterpieces (2,570) as about the non-Western masterpieces (165).
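The rounded multiples quoted above can be checked with a simple ratio. The helper name `coverage_ratio` is hypothetical; the inputs are the article counts reported in this section.

```python
def coverage_ratio(western: float, non_western: float) -> float:
    """How many times more coverage the Western set receives."""
    if non_western == 0:
        return float("inf")
    return western / non_western


# Article counts reported for the matched lists.
articles_artists = coverage_ratio(7_808, 1_621)
articles_works = coverage_ratio(2_570, 165)
print(f"artists: {articles_artists:.2f}x, masterpieces: {articles_works:.2f}x")
# prints: artists: 4.82x, masterpieces: 15.58x
```

Rounded to whole numbers, these give the "five times" and "sixteen times" stated in the text.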

Digital media files
Files on Wikimedia Commons can be tagged with an artist's name for many reasons.
They may be a depiction of that artist, a photograph of an artwork, or a document relating to them. The connection can be more tenuous: photographs of places where the artist lived, or of places named after them. A Wikidata query provided us with the categories relating to our chosen artists. Categories can contain sub-categories, and so on iteratively, so to get total numbers of files we used the Commons API and, for a few especially large categories, the PetScan tool created by Magnus Manske (https://petscan.wmflabs.org/). There might be files related to a topic that exist on Commons but are not categorized appropriately, or where the category link exists but is not known to Wikidata, so our measure might underestimate the coverage of obscure topics, although we mitigated this by searching directly on Commons and adding a few links that were missing in Wikidata.
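The category traversal can be sketched as a breadth-first search over Commons categories. The sketch assumes the MediaWiki `list=categorymembers` module (restricted to files and subcategories, following continuation) as the data source; the PetScan route used for the largest categories is not reproduced. `count_files`, `commons_members`, the placeholder User-Agent string, and the optional `max_depth` cap are our own illustrative choices, not the study's code. The fetcher is passed in as a parameter so the traversal can be run offline.

```python
import json
import urllib.parse
import urllib.request
from collections import deque

COMMONS_API = "https://commons.wikimedia.org/w/api.php"
# Placeholder User-Agent: Wikimedia asks API clients to identify themselves.
USER_AGENT = "art-coverage-sketch/0.1 (replace with your contact details)"


def commons_members(category):
    """Return (files, subcategories) directly inside a Commons category."""
    files, subcats = set(), set()
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmtype": "file|subcat",
        "cmlimit": "max",
        "format": "json",
    }
    while True:
        url = COMMONS_API + "?" + urllib.parse.urlencode(params)
        request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        with urllib.request.urlopen(request) as response:
            data = json.load(response)
        for member in data["query"]["categorymembers"]:
            # Namespace 14 is Category:, everything else requested here is a File:.
            (subcats if member["ns"] == 14 else files).add(member["title"])
        if "continue" not in data:
            return files, subcats
        params.update(data["continue"])  # request the next page of results


def count_files(root, fetch_members=commons_members, max_depth=None):
    """Count distinct files in `root` and all of its subcategories.

    A visited set guards against category cycles, and collecting files in a
    set avoids double-counting a file filed in several subcategories.
    """
    files = set()
    visited = {root}
    queue = deque([(root, 0)])
    while queue:
        category, depth = queue.popleft()
        direct_files, subcats = fetch_members(category)
        files |= direct_files
        if max_depth is not None and depth >= max_depth:
            continue  # do not descend below the optional depth cap
        for sub in subcats:
            if sub not in visited:
                visited.add(sub)
                queue.append((sub, depth + 1))
    return len(files)
```

A depth cap is one way to limit the tenuous connections described above, at the cost of possibly missing relevant files deep in the tree.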
We found twenty-one times as many files for Western artists (total 185,509) as for non-Western (8,980 files). All of the Western artists had a category on Commons compared to 84 of the non-Western (see Figure 3).

Database statements
On Wikidata, all of our Western artists and masterpieces were already represented.
Of the 100 non-Western artists, 99 already existed in Wikidata, along with 34 of the 100 non-Western masterpieces. Wikidata's query service allowed us to count the statements for each. We found just under four times as many statements about Western artists as non-Western artists, and nine times as many statements about Western as non-Western masterpieces (see Figures 4 and 5).
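Statement counts like these can be read from the Wikidata Query Service, which exposes the number of statements on an item through the `wikibase:statements` predicate. The sketch builds such a query for a list of item IDs and sums the counts from a SPARQL JSON result; the helper names are hypothetical, and the sample result in the test is invented for illustration.

```python
import json
import urllib.parse

WDQS = "https://query.wikidata.org/sparql"


def statements_query(qids):
    """SPARQL asking for the number of statements on each listed item."""
    values = " ".join(f"wd:{q}" for q in qids)
    return (
        "SELECT ?item ?statements WHERE {\n"
        f"  VALUES ?item {{ {values} }}\n"
        "  ?item wikibase:statements ?statements .\n"
        "}"
    )


def query_url(sparql):
    """GET URL for the query service, asking for JSON results."""
    return WDQS + "?" + urllib.parse.urlencode({"query": sparql, "format": "json"})


def total_statements(result_json):
    """Sum the ?statements column of a SPARQL JSON result."""
    bindings = json.loads(result_json)["results"]["bindings"]
    return sum(int(b["statements"]["value"]) for b in bindings)
```

For example, `statements_query(["Q762"])` asks for the statement count of Leonardo da Vinci's item. Items missing from Wikidata (one non-Western artist and 66 non-Western masterpieces) return no row, so they contribute zero to a total.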

Differences across language versions
The language versions of Wikipedia have contributor communities that vary greatly in size and location, so they vary in how much text they have produced and on which topics. For each pair of an artist and a language version of Wikipedia, our data contain a byte count giving the size of the artist's article in that language. By summing over the artists within each language, we can compare our matched lists, measuring the degree to which different Wikipedias prioritize the Western canon in the visual arts. Because we compare the coverage given to matched lists, our measure is not directly affected by the size of the Wikipedia itself. Our measure is each Wikipedia's coverage of our Western artists divided by its coverage of the non-Western artists. Thus, higher numbers indicate a more Western focus and lower numbers a more global one. Table 1 shows this ratio for 86 of the larger Wikipedias.
Six of them give more coverage to our non-Western than to Western artists.
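The per-language measure can be written as: for a language version, ratio = (total bytes of its articles about the Western artists) / (total bytes of its articles about the non-Western artists). Below is a minimal sketch with hypothetical helper names and made-up language codes `xx` and `yy`. A ratio below 1 marks a Wikipedia that gives more coverage to the non-Western list, as the six above do. Languages with no coverage of either list are skipped, and a language that covers only the Western list gets an infinite ratio.

```python
from collections import defaultdict


def sum_by_language(rows):
    """Total article bytes per language from (artist, language, bytes) rows."""
    totals = defaultdict(int)
    for _artist, lang, size in rows:
        totals[lang] += size
    return dict(totals)


def language_ratios(western, non_western):
    """Western/non-Western byte ratio per language, most Western first."""
    ratios = []
    for lang in set(western) | set(non_western):
        w, n = western.get(lang, 0), non_western.get(lang, 0)
        if w == 0 and n == 0:
            continue  # no coverage of either list: ratio undefined
        ratios.append((lang, float("inf") if n == 0 else w / n))
    # Descending ratio; ties broken alphabetically by language code.
    return sorted(ratios, key=lambda pair: (-pair[1], pair[0]))


def more_global(ratios):
    """Languages whose coverage favours the non-Western list (ratio < 1)."""
    return [lang for lang, ratio in ratios if ratio < 1]
```

Dividing two totals from the same Wikipedia is what makes the measure largely independent of that Wikipedia's overall size.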

[Table 1. Columns: Language; Language code; Western artists (bytes); …]
… (see Table 2).
Su Shi, the 11th-century Chinese artist whose painting broke the record for the highest-selling Asian artwork, was a polymath also celebrated as a poet, engineer, litterateur, scientist, and political figure. He is covered in 35 language versions of Wikipedia, whereas the comparably versatile Western polymath Leonardo da Vinci is one of the most covered artists on Wikipedia, with articles in 222 languages totalling nearly five million bytes (see Table 3).
… Wikipedia coverage levels (see Table 4).
Beyond artists and artworks, another way of seeing the disproportionate representation of the visual arts is to compare Western artistic movements with their counterparts outside the West. For example, the Pre-Raphaelite Brotherhood in 19th-century England was a major movement that sought a return to traditional forms of Western art and involved a number of notable artists, critics, and patrons (such as Millais, Rossetti, Burne-Jones, Ruskin, and Morris). It is extensively covered on Wikipedia, Commons, and Wikidata. The Bengal School of Art likewise rejected modernism and sought a return to traditional forms, and it also included major artists, critics, and patrons such as Bose, Tagore, and Khastgir. Its coverage on Wikipedia is minimal compared with that of the Pre-Raphaelite Brotherhood (see Table 5). Another suitable comparison is between the European Post-Impressionists and the Japanese Nihonga movement (see Table 6).

Discussion
We have replicated the common finding of a "local hero" effect, with European artists given higher priority in European-language Wikipedias, but that is not the most salient result. Looking at Wikipedia as a whole, and at the multilingual sites Wikidata and Wikimedia Commons, we found large differences in their relative coverage of our Western and non-Western artists: ratios of 7, 4, and 21, respectively. We noted earlier that English Wikipedia shows a strong emphasis on Western rather than non-Western art; it turns out that English is nonetheless one of the least biased major Wikipedias in this respect.
Wikipedia's volunteer contributors summarize published sources, including books, research papers, and institutional catalogues. Factors that might contribute to an imbalance of coverage include the extent to which different kinds of art are described in published sources, the availability of those sources to Wikipedia contributors (in their language and in forms that they can access), and the interests and priorities of contributors to a given language version.
By our quantitative measure, Wikidata has much less Western bias than Wikipedia. If, instead of ratios, we consider the absolute size of coverage of non-Western art, we see that this coverage is most extensive on European-language Wikipedias. The languages with more than 500,000 bytes of content about non-Western artists are German, Spanish, French, Russian, Ukrainian, and Armenian. This is unsurprising given that these are among the largest versions of Wikipedia, with relatively large volunteer communities. It suggests that one interim way to address the imbalance and make other language Wikipedias more global may be to translate articles from these languages into others. Ironically, this would help reduce the pro-European emphasis of Wikipedia as a whole, although the articles would then be drawn primarily from sources in European languages. This would be a step in the right direction, but not a solution to the problem of knowledge inequity rooted in systems of power and privilege, for which we suggest bolder action later on.
We did a follow-up analysis focusing on coverage of the Arabic and Persian artists and masterpieces. Summing the coverage of these topics, excluding those languages whose total coverage is less than 100,000 bytes, gives us the results shown in Table 7.
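One reading of this follow-up step as code, assuming per-article rows of (topic, language, bytes): restrict the rows to the Arabic and Persian topics, sum them per language, and drop languages whose total falls below the 100,000-byte cut-off. The function name and the sample rows in the test are hypothetical.

```python
from collections import defaultdict


def subset_coverage(rows, topics, min_total=100_000):
    """Per-language byte totals for the chosen topics, small languages dropped.

    rows: iterable of (topic, language, bytes). Languages whose total for
    these topics is below `min_total` are excluded. The result is sorted from
    most to least coverage, ties broken by language code.
    """
    wanted = set(topics)
    totals = defaultdict(int)
    for topic, lang, size in rows:
        if topic in wanted:
            totals[lang] += size
    kept = {lang: total for lang, total in totals.items() if total >= min_total}
    return sorted(kept.items(), key=lambda item: (-item[1], item[0]))
```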

Recommendations for the cultural sector
The representation of a topic on the Wikimedia sites depends on multiple factors.

Recommendations for the Wikimedia contributor communities
Wikipedia and Wikimedia volunteer contributors can take action straight away to reduce the content gaps described in this paper.
• An outstanding example of work to reduce a content gap on Wikipedia is the Women in Red project (Wikipedia Contributors 2021b). It addresses the gender content gap by using Wikidata and other sources to build "redlists": lists of notable women who do not yet have a Wikipedia article and whose links are therefore red. In addition to these lists of target articles, the Women in Red project pages include bibliographic sources, guidance, and supporting materials for "editathon" events. Volunteers can choose an article to create, turning the link from red to blue. Since the project began, tens of thousands of new English articles relevant to women have been created.
It is hard to know how much new content to attribute to a specific effort, but research has found a rise in article quality for the broad topic of women scientists compared to articles in general (Halfaker 2017). We propose that there should be similar projects for the gaps in representation of the visual arts. The Wikidata identifiers and other information in our appendices can be used to make redlists.
• The community should consider adding artists and masterpieces from our non-Western lists to the Vital Article lists on English Wikipedia and any counterparts on other language versions.
• Since 2015, Wikipedia has had a Content Translation tool, which prepares a machine-translated version of an article that a human user can correct and publish (Dolmaya 2017). We have seen that English, French, and Russian Wikipedias have a relatively large volume of coverage of non-Western art, so translation of those articles into more languages would improve the balance.
• A crucial supply of Commons images comes from photographs of out-of-copyright works that museum visitors have taken and then uploaded. For museums that do not have a formal programme of digitization, this informal digitization is an option for creating digital content. It requires the institution to allow, even encourage, visitors to take photographs as part of their engagement during the visit.
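The redlist proposal above can be sketched with a Wikidata Query Service query that keeps only items lacking an article on a given Wikipedia, using the standard `schema:about` / `schema:isPartOf` sitelink pattern. The QIDs would come from our appendices. The helper names, the simplified item-to-language mapping used by `redlist`, and the placeholder IDs in the test are hypothetical.

```python
def redlist_query(qids, lang):
    """SPARQL returning the listed items that have no article on `lang` Wikipedia."""
    values = " ".join(f"wd:{q}" for q in qids)
    return (
        "SELECT ?item WHERE {\n"
        f"  VALUES ?item {{ {values} }}\n"
        "  FILTER NOT EXISTS {\n"
        "    ?article schema:about ?item ;\n"
        f"             schema:isPartOf <https://{lang}.wikipedia.org/> .\n"
        "  }\n"
        "}"
    )


def redlist(sitelinks, lang):
    """Offline equivalent: items whose set of language codes lacks `lang`."""
    return sorted(qid for qid, langs in sitelinks.items() if lang not in langs)
```

Run once per language version, such a query would give each Wikipedia community its own list of red links to work through, in the manner of Women in Red.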

Recommendations for Wikimedia organizations
Addressing knowledge gaps is already a main focus of the activity of the Wikimedia organizations.

Limitations and further research
Further subdivisions of the categories of Western and non-Western art and artists raise additional research questions. For example, examining gender parity in the history of Western art vis-à-vis the history of non-Western art on Wikipedia was outside the scope of this study, but it clearly emerged as an important and necessary area for further research. Also related specifically to representation on Wikipedia, it would be important to investigate the extent to which disproportionality in such content relates to racial, ethnic, geographical, cultural, and religious disproportionality among editors and readers.
Less directly related to representation on Wikimedia, research into people's general knowledge of non-Western art history, exposing bias or ignorance even among those considered "cultured" or reasonably knowledgeable about art history (such as students and scholars), would help explain how this is reflected on Wikipedia.