Notes from conversations and meetings and to self.
Gems found along the way waiting to be further sorted or put to use.
Call with the SABIO stakeholders:
- Example tool by Sudox, apparently marks potentially problematic words and computes an overall sentence score based on the ratio of such words.
- Words Matter
- Mahendra Mahey, manager of the British Library Labs
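Note to self: the scoring idea described for the Sudox tool (mark flagged words, score a sentence by their ratio) can be sketched in a few lines. This is only a guess at the general approach, not the tool's actual implementation; the word list `FLAGGED` and the function name `score_sentence` are made up for illustration.

```python
import re

# Hypothetical word list, purely for illustration; a real tool would load
# a curated vocabulary (e.g. terms discussed in the Words Matter publication).
FLAGGED = {"primitive", "exotic"}

def score_sentence(sentence, flagged=FLAGGED):
    """Return (flagged tokens, ratio of flagged tokens to all tokens)."""
    tokens = re.findall(r"\w+", sentence.lower())
    hits = [t for t in tokens if t in flagged]
    # Guard against empty input to avoid a division by zero.
    ratio = len(hits) / len(tokens) if tokens else 0.0
    return hits, ratio

hits, ratio = score_sentence("An exotic mask from a primitive culture")
print(hits, round(ratio, 2))
```

A plain ratio ignores context (quotation, reclaimed usage, historical titles), so it can only serve as a first filter.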
Forwarded by Marieke:
- Google Article on addressing gender bias in Google Translate
- the CRAPL License for software & related resources written in academic contexts
Mrinalini has mentioned:
- Humane AI is a meta-project between UvA's Humanities and Science faculties
- she's a part of CREATE, supervised by prof Noordegraaf
- she TAed a course of prof Jeurgens
Julia has mentioned:
- Society for Social Studies of Science's Conference called "Good Relations: Practices and Methods in Unequal and Uncertain Worlds" (Deadline 8th March)
Chat with Corey:
- Lukas Koster: Group leader of library & information systems at UvA -> Corey can establish a connection
- AI4LAM (e.g. their GitHub or Google Sites), a group on AI for Libraries, Archives and Museums -> Corey is a member and will keep his eyes open for us; relatedly, Marieke mentioned LODLAM, a group on Linked Open Data for LAM
  Conference organised by AI4LAM
- there was a workshop at NeurIPS 2020 named Navigating the Broader Impacts of AI Research -> found after searching for the ethics debate at NeurIPS 2020 (see e.g. this Nature article)
- the Algorithmic Justice League garnered a lot of attention following their exposure of racial bias in facial recognition systems
- Article by ProPublica on racial bias in criminal prediction software used in the US
- FAccT Conference (ACM Conference on Fairness, Accountability, and Transparency) -> Corey mentioned this as a potential place for networking
- DCode Network, project for 'design as a way to open pathways toward inclusive and sustainable futures' -> probably interesting for the user-interface side of SABIO
Meeting with Richard van Alphen from Tropenmuseum:
- item descriptions are curated when digitised; provenance trail is however available
- item titles are digitised without change, so their time of creation likely matches the item's indicated year
- will talk to his people about getting access to their DB to be able to query it directly and get back to us
SABIO/AI:CULT Kick-off meeting (meeting notes):
- Jesse de Vos and Johan from B&G are part of CaptureBias; he had the following notes:
- what about the bias that manifests in the gaps of the museums' collections? bias as gaps?
- they found it useful to speak of framing rather than of bias and to distinguish thematic and episodic framing
- Markus Bakenhol, postdoc in ethnology at Meertens, is a domain expert
- Johan mentioned the Inward Outward symposium as an outlet and place to get in touch with others
- Johan asked if we could find any published benchmarks or competitions (perhaps at Kaggle?), or even publish one ourselves? => benchmarking is a good point!
- the Europeana Challenge which we'll potentially go for (shared document?)
- abstract to be submitted at the LIBER Conference; this shared doc contains the abstract
Mattia's (cultural analyst) intuition, depending on definitions of course, is that bias is opposed to polyvocality: bias is nurtured by the lack of voices and, conversely, a diversity of voices makes bias unlikely to prevail
=> Marieke agrees with the gist of this position
this article from an initiative could be interesting
The Portrait of a Lady: Close and distant reading of media gender bias: abstract for a paper by Laura Hollink on quantifying (binary) gender bias in Dutch newspapers; the approach takes advantage of the lexicalised marking of gender in Dutch (hij/zij) and measures bias as the degree to which an algorithm can predict gender; prediction is not done on the pure text but on features extracted based on previous conceptual work
Marieke mentioned ARIAS, a 'Platform for Research through the Arts and Sciences', that has grants for work sitting at the intersection of art and science
there is the Stanford Encyclopedia of Philosophy, it has entries e.g. on Feminist Epistemology, race, implicit bias => really good reference, commonly used by philosophers and very accessible
Found the term "Marrons" in the collection RDF (URI https://hdl.handle.net/20.500.11840/206868) -> interesting first task: track the term and its usage across the graph, find adjacent terms, etc
Andrei has mentioned From Cartography To Cookbooks, "a web of Dutch colonialism", an online exhibition which shines a light on issues such as race and gender in the context of colonialism with maps and cookbooks as media (Allard Pierson page for the exhibition)
Victor has mentioned this blog post by MIT - yet another example of racist and sexist image processing AI
Cindy mentioned:
- 'facets' (of which there are 5) categorise terms in the thesaurus into semantic function -> could be seen as 'prescribed' associations
- the term 'Bosneger' (an old, derogatory label for 'Marron') is in the Words Matter publication -> compare associations from there with those from DB
- the AAT (Art & Architecture Thesaurus) by Getty
- thesauruses made entirely from Western museum perspectives -> bias is deep
- could be a valuable source of shared knowledge (Wereldculturen thesaurus is rather specific)
- histories of museums' objects after acquisition could provide useful 'grounding facts' (represent some of the materialised, as opposed to linguistic, aspects of bias) -> e.g. royal objects end up in Rijksmuseum, others in the heritage museums
- religious bias as interesting case?
- Rembrandt labelled as painting, buddhist work labelled as decorative art -> clustering based on properties (i.e. 'semantics') should reveal
- 'Rapanui' (a natively used term) not in DB, Rapa Nui and Paaseiland are -> labels are difficult even if not obviously discriminatory
- Europeana has a Python API for its database
Mattia has mentioned Johannes Fabian (UvA Emeritus), specifically his 'Time and the Other'
-> to quote Mattia: 'Different situated notions of time could constitute another vector of bias'; idea: use the fact that Dutch marks time, i.e. look at verb tenses
- CollectionsAsData, led by Thomas Padilla, a project on computational methods in libraries
-> could contain useful resources or links, e.g. Statement on treating collections as data which we're actually doing as well
-> could be an interesting group to get in touch with
- Research Center for Material Culture, part of Museum van Wereldculturen
- the KB:
- Example data sets for download
- Delpher: search interface to the KB's collection (books, newspapers, magazines)
- Demo on Delpher: interactive word analysis and visualisation tools
-> the KB's collections are massive and entirely text
-> can provide useful background knowledge/context for MvW's collection items (perhaps matched by time and/or topic)
-> could be leveraged as training/test data
-> Delpher itself or the Delpher demo could: give hints for how to query databases and/or be an object of study in the sense that they might reveal bias -> in any case, looking at methodologies/implementations there could be useful
- The Real Face of White Australia, a data story that uses historical government records about non-European Australians; this page/project is the outcome of this chapter in the book Seeing the Past with Computers: Experiments with Augmented Reality and Computer Vision for History
- SemEval2020 Task 12 on the identification of offensive language could be an interesting test case
~~Andrei mentioned:
- Zotero, a library management tool; we could all share our libraries through that~~
through an INDELAB connection found this blog post about decolonising AI
-> could contain useful pointers
books on how current AI reinforces biases and inequalities and how to do it differently:
- Data Feminism -> could have an interesting (inherent) connection to Feminist Theory
- Race After Technology: Abolitionist Tools for the New Jim Code -> author has many publications about racial issues in the world of modern technology
- Algorithms of Oppression
Johan put us in touch with someone who's working on a similar project which deals with biases in meta-data of heritage collections
Niels ten Oever mentioned:
- Sorting Things Out, a book about classification as a social practice
- Internet Daemons, book about social practices and phenomena around internet technologies
- The Digital Sublime, book on the myths revolving around the digital world and how power structures are reinforced through digital technologies (rather than these fulfilling their promise to break them); Wikipedia article about the phenomenon
- Fully Automated Luxury Communism, a manifesto for how digital technologies could lead society into a better world
- Frictie - Ethiek in Tijden van Dataisme, book about ethics in the time of big data -> interesting because the author is Dutch -> could invite her at some point
scholar search on social identity & categorisation returned interesting-looking results
- Wereldculturen data:
- Jacco mentioned that Wereldculturen should eventually have their own data exposure process for researchers (& others) -> make sure that Cindy is aware that SABIO is building essentially that, so that they could perhaps use some of it (+ the process we went through to get the data)
SABIO meeting:
- Cindy mentioned:
- she's working on the Pressing Matter project
- metaphor of a funnel for the program -> related to my own thoughts on bias detection as search, but a nicer metaphor
- bias as absence: the systematic absence of people in the data or the absence of fine-grained attributes for people is an instance of bias -> choice of words can indicate that, too (e.g. the choice of identifier for a person, cf. 'a man has a name') => this is closely related to, if not the same as, silencing
- MVP: limit use cases to professionals
- Marieke mentioned a nice idea: heat maps on the collection/subgraphs/etc to direct users' attention in a non-binary way, to visualise/uncover patterns -> talk to Werner about this
- user should be able to input cues -> not only: detect bias in a given text/object/collection, but also: find everything in a given collection that is similar to a given cue
- a stamp to brand slaves with - interesting because the description is "neutral"
- Jesse mentioned Philo van Kemenade, who works for Beeld & Geluid (and who is in an AI4GLAM task force at Europeana)
- Jesse mentioned Tobias Blanke, a recent professor at UvA and ILLC, who works on the epistemological implications of AI (and generally the interface of philosophy and computer science)
- Mrinalini mentioned Michel-Rolph Trouillot who conceived the term 'silencing' (most notably in his book Silencing the Past: Power and the Production of History (1995))
- Jelle mentioned that he is part of a project that has something to do with bias detection?
- Marieke mentioned this tweet for the website (where the Q&A is on)
- Jacco mentioned the FAccT Conference which has interesting papers
meeting with Richard:
- CollectionConnection is the tool the NMVW used to convert their databases to RDF
-> Richard will share the schema they used for the conversion -> can we maybe use the schema to do the conversion ourselves?
- the Objects table has a field title, but the table ObjectTitles was created since objects can have multiple titles (either replacing each other or living side-by-side) -> the table TitleType contains information that can/could allow to reconstruct a version order of the titles
- Richard thinks that the procedures from the database to ML-ready input could be interesting for future and general use -> potentially make processing scripts and procedures reusable for publication
meeting with the Goethe Institute (of Finland and of NL):
- website of Artificially Correct, where they address bias in (machine) translation
- poco.lit, a Berlin-based platform for postcolonial literature -> collaborators, have written articles for them
- Workshop for translation practitioners on 23 & 24 April
- Hackathon planned to develop tools to reduce/detect biases in MT at some point in autumn
Marieke mentioned Nexus Linguarum, a platform to promote synergies between European linguistic data science practitioners
meeting with Vendela (university homepage & personal website):
- is part of the project The Politics of Metadata
- shared her slides on her investigation into the representation of Sámi heritage in a (which?) Swedish museum (attached in an email)
- the Politics of Metadata project is a part of the Metadata Culture research group
- Cindy mentioned: Decolonize the Museum Conference by FramerFramed (there's also a document on Decolonizing Museums which could be a valuable resource)
- Jelle Zuidema is part of the Bias Barometer, to quote: "We explore the relationship between what we read on (social) media and the effects on our (stereotypical) beliefs and actions."
- Julia mentioned a talk on The Logic of Decoloniality by Jonathan Chimakonam, who does philosophical research on decolonising research (see links for papers); the point of such research is that in the tripartition of content, method and foundation, the foundation needs to be decolonised alongside content and method (which is what is usually focussed on)
=> this line provides a good guideline for how the field (cultural AI) as a whole should evolve
Meeting with Andrei & Ryan:
- idea: correlate sentiment of words with their contentiousness (as labelled in ConConCor) -> can answer the question: 'is sentiment a good predictor of whether a word is contentious?' -> perhaps use BERTje's word embeddings for sentiment (or sentiment analysis)
- idea: do the analyses of semantic change from Jurafsky's paper on semantic change in the context of the ConConCor -> does contentiousness correlate with factors of semantic change? can we predict contentiousness from semantic change?
- idea: phrase annotation task for ConConCor in terms of the participants themselves: "how comfortable would you feel saying this word/sentence in public/private/in your head?" / "would you feel hurt if someone said this word/sentence to you?" -> contentiousness is an emotional matter -> getting people's embodied perspective is necessary
- this video talks about re-designing Bayes' theorem into: O(D|+) = O(D) * P(+|D)/P(+| not D), where D is the RV of whether or not a disease is present and + is the RV of whether a given test was positive =>
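Worked example of the odds form above, as a minimal sketch: O(D) is the prior odds of the disease, P(+|D) the test's sensitivity and P(+| not D) its false positive rate. The function names and the numbers (1% prevalence, 90% sensitivity, 9% false positives) are made up for illustration and are not taken from the video.

```python
def posterior_odds(prior_prob, sensitivity, false_positive_rate):
    """O(D|+) = O(D) * P(+|D) / P(+|not D)."""
    prior_odds = prior_prob / (1 - prior_prob)            # O(D)
    likelihood_ratio = sensitivity / false_positive_rate  # P(+|D) / P(+|not D)
    return prior_odds * likelihood_ratio

def odds_to_prob(odds):
    """Convert odds back to a probability."""
    return odds / (1 + odds)

# Illustrative numbers only: prevalence 1%, sensitivity 90%, false positive rate 9%.
odds = posterior_odds(0.01, 0.90, 0.09)
print(odds, odds_to_prob(odds))
```

With these numbers the likelihood ratio is 10, so a positive test multiplies the prior odds of 1:99 by ten, giving 10:99, i.e. a posterior probability of 10/109 (roughly 9%).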
Meeting with Marieke:
- her PhD student (Philipp) tried stereotype detection on KGs
- possible publication at the CLARIN Conference (3-4 page abstract due on April 14):
- the infrastructure/procedure from the Wereldculturen database to data set for cultural AI/AI4GLAM (use-case: bias detection system in colonial contexts)
- procedure and analysis of questionnaires for heritage professionals: defining the tasks and approach of the professionals in order to automate and enhance with ML
- idea: concordance: get concordances of words (word pairs): for a given word (based on PMI, or other measures), find and expose the other contexts it occurs in; there is also a statistical measure of concordance
- paper by Jacco and others: model transparency through interface and presentation and empirical study of its impact when historians work with ML
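Sketch for the concordance idea above: keyword-in-context lines for a given word, plus a crude co-occurrence score. This assumes the association measure meant in the note is pointwise mutual information (PMI); the function names, the window size and the toy sentence are made up for illustration, and the probability estimates are deliberately simplistic.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens; no language-specific handling."""
    return re.findall(r"\w+", text.lower())

def concordance(tokens, keyword, window=3):
    """Keyword-in-context: `window` tokens left and right of each hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines

def pmi(tokens, w1, w2, window=3):
    """Crude PMI: log2 of p(w1, w2) / (p(w1) * p(w2)).

    A co-occurrence is counted for each occurrence of w1 that has w2
    within `window` tokens on either side.
    """
    n = len(tokens)
    counts = Counter(tokens)
    co = 0
    for i, tok in enumerate(tokens):
        if tok == w1:
            context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            if w2 in context:
                co += 1
    if co == 0:
        return float("-inf")  # never seen together
    return math.log2(co * n / (counts[w1] * counts[w2]))

tokens = tokenize("the mask was carved the mask was painted")
for left, kw, right in concordance(tokens, "mask", window=1):
    print(f"{left} [{kw}] {right}")
print(pmi(tokens, "mask", "was", window=1))
```

On real collection descriptions the counts would need smoothing and a minimum-frequency cut-off, since PMI overrates rare word pairs.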
Meeting with the bias B.Sc. project:
- idea: identify extra-linguistic variables about object (region/culture/etc), then correlate them with for instance sentiment analysis of the description
e.g.: group objects by culture, then do sentiment analysis on their titles/descriptions and correlate; typicality could help as a concept, examples could be extracted
- BERTje (paper, code) & RobBERT (paper, code) are Dutch transformer LMs, also available on Huggingface
Seminar by TU Wien Digital Humanities, recording:
Hi Ryan, Valentin and Andrea,
The Meertens Institute will have a staff meeting next Monday between 10 am and 11 am. This is a quarterly (casual) meeting in which we catch up with each other. We always introduce new employees in this meeting, and Antal and I would very much like to invite you to come this Monday. It would be great if you could introduce yourself briefly during this gathering and tell something about your work in the HuC. Would you be willing to do this?
Best wishes,
Simone
- Sally Wyatt, at KNAW on feminist history & stances
- prof Hinda Haned, professor of data science, on defining bias
Johan is organising a EuropeanaTech X CulturalAI lab event; the agenda contains many interesting resources on decolonial approaches, practices and problems in the museum, e.g.:
- a list of decolonial resources by the Museums Association
- this talk about colonial meta-data, which linked to this group's homepage (where the talk's speaker is the leader)
Chat with Senka:
- this person (instagram) in the non-binary community has a strong social media presence
- same for this person (instagram)
- Senka also has a friend (instagram) who does workshops and art around language and gender and might be interested in collaborating
SABIO meeting:
- Werner shared a project of his
Andrei shared this Medium post about 'visualising whose stories are missing'
Jesse pointed to:
a tool for inspecting word pairs, very basic
someone who participated in the questionnaire is part of the LGBTI Heritage Organisation (IHLIA)
Julia keeps mentioning:
- standpoint theory (proposed a.o. by Sandra Harding, who has been affiliated with the UvA) as a formal philosophical basis for definitions of bias
Marieke forwarded (Jelle retweeted):
- this tweet about Information Gain and this primer on information theory; this paper was also mentioned in the thread -> these concepts could be useful for dealing with bias as absence
- a blog post about AI experts comparing current ML to alchemy
meeting with Marieke:
- Black Archives
- Nijmegen Afrika Museum?
- Imagine IC
- IHLIA
- perhaps a focussed workshop for non-binary people/on hetero-normative biases in collections around gay pride in Aug?
Oskar mentioned:
Julia Noordegraaf sent an email about a conference with a speaker from MIT's Data + Feminism Lab
Martijn mentioned:
the National Archives have historical language in their collections and are aware that these might contain undesirable language (explanation page)
Marieke mentioned:
The Cultural Life of Machine Learning, An Incursion into Critical AI Studies
Saskia pointed to Rijksmuseum's new exhibition on slavery, co-curated by Valika Smeulders
WORKSHOP:
- Wayne:
- why 'bias' instead of e.g. 'racism'?
- even the word 'human' is biased (indicated anthropocentric bias) -> probably means: bias is everywhere
- Hodan:
- bias navigation should be disruptive: disrupt the ways we search and find information in collections
Cindy forwarded Fantastic Futures 21 Call for Abstracts, due June 15th
RCMC's webinar's speaker Wendy Hui Kyong Chun has written interesting books
ARK - MU presentation:
- Angelique (Director of MU Eindhoven) mentioned Documenting complexity (funded by NWO, carried out at RUG, in collab with B&G)
- Roosje has the Mnemosyne Bilderatlas by Aby Warburg => created a system for visually remembering things together (=association) AND also a system for organising archives
Paul has mentioned crowdtruth.org, a source of papers on how to source ground truth and deal with inter-annotator disagreement
Saskia has mentioned a presentation about trust and utility in heritage LOD
Marieke has mentioned Google People + AI Research
Cindy shared:
Eyob (neighbour at 32B) has created a website to connect facial recognition to criminal records (uses facial recognition to categorise mugshots, then displays Dutch criminal stats and similar faces in the DB)