Logs

Notes from conversations and meetings and to self.
Gems found along the way waiting to be further sorted or put to use.

Jan 15

Call with the SABIO stakeholders:

An example tool by Sudox, which apparently marks potentially problematic words and computes an overall sentence score based on the ratio of such words.
Words Matter
Mahendra Mahey, manager of the British Library Labs
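A minimal sketch of the ratio-based sentence score as described for the Sudox tool, assuming simple lower-cased word tokenisation and a flat list of flagged terms. The term list, `tokenise` and `sentence_score` are hypothetical illustrations, not Sudox's actual implementation.

```python
import re

# Hypothetical list of flagged terms; a real tool would use a curated lexicon
# (e.g. one derived from the Words Matter publication).
FLAGGED_TERMS = {"bosneger", "primitive", "exotic"}


def tokenise(sentence: str) -> list[str]:
    """Lower-cased word tokenisation (a simplifying assumption)."""
    return re.findall(r"\w+", sentence.lower())


def sentence_score(sentence: str, flagged: set[str] = FLAGGED_TERMS) -> tuple[list[str], float]:
    """Return the flagged tokens and the ratio of flagged tokens to all tokens."""
    tokens = tokenise(sentence)
    if not tokens:
        return [], 0.0
    marked = [t for t in tokens if t in flagged]
    return marked, len(marked) / len(tokens)
```

E.g. `sentence_score("An exotic mask from a primitive culture")` marks two of seven tokens, giving a score of 2/7. A plain ratio treats every flagged word as equally problematic; weighting terms would be an obvious refinement.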

Forwarded by Marieke:

Google Article on addressing gender bias in Google Translate
the CRAPL License for software & related resources written in academic contexts

Mrinalini has mentioned:

Humane AI is a meta-project between UvA's Humanities and Science faculties
she's a part of CREATE, supervised by prof Noordegraaf
she was a TA for a course by prof Jeurgens

Julia has mentioned:

Society for Social Studies of Science's Conference called "Good Relations: Practices and Methods in Unequal and Uncertain Worlds" (Deadline 8th March)

Jan 20

Chat with Corey:

Lukas Koster: Group leader of library & information systems at UvA -> Corey can establish a connection
AI4LAM (e.g. their GitHub or Google Sites), a group on AI for Libraries, Archives and Museums -> Corey is a member and will keep his eyes open for us; relatedly, Marieke mentioned LODLAM, a group on Linked Open Data for LAM
Conference organised by AI4LAM
there was a workshop at NeurIPS 2020 named Navigating the Broader Impacts of AI Research -> found after searching for the ethics debate at NeurIPS 2020 (see e.g. this Nature article)
the Algorithmic Justice League, which garnered a lot of attention after exposing racial bias in facial recognition systems
Article by ProPublica on racial bias in criminal prediction software used in the US
FAccT (ACM Conference on Fairness, Accountability, and Transparency) -> Corey mentioned this as a potential place for networking
DCode Network, project for 'design as a way to open pathways toward inclusive and sustainable futures' -> probably interesting for the user-interface side of SABIO

Jan 27

Meeting with Richard van Alphen from Tropenmuseum:

item descriptions are curated when digitised; a provenance trail is, however, available
item titles are digitised without change, so their time of creation likely matches the item's indicated year
will talk to his people about getting access to their DB to be able to query it directly and get back to us

SABIO/AI:CULT Kick-off meeting (meeting notes):

Jesse de Vos and Johan from B&G are part of CaptureBias; he had the following notes:
- what about the bias that manifests in the gaps of the museums' collections? bias as gaps?
- they found it useful to speak of framing rather than of bias and to distinguish thematic and episodic framing
Markus Bakenhol, postdoc in ethnology at Meertens, is a domain expert
Johan mentioned the Inward Outward symposium as an outlet and place to get in touch with others
Johan asked if we could find any published benchmarks or competitions (perhaps at Kaggle?), or even publish one ourselves? => benchmarking is a good point!
the Europeana Challenge which we'll potentially go for (shared document?)
abstract to be submitted at the LIBER Conference; this shared doc contains the abstract

Jan 28

Mattia's (cultural analyst) intuition, depending on definitions of course, is that bias is opposed to polyvocality: bias is nurtured by the lack of voices and, conversely, a diversity of voices makes bias unlikely to prevail
=> Marieke agrees with the gist of this position; this article from an initiative could be interesting

The Portrait of a Lady: Close and distant reading of media gender bias: abstract for a paper by Laura Hollink on quantifying (binary) gender bias in Dutch newspapers; the approach takes advantage of the lexicalised marking of gender in Dutch (hij/zij) and measures bias as the degree to which an algorithm can predict gender; prediction is done not on the raw text but on features extracted on the basis of previous conceptual work

Jan 29

Marieke mentioned ARIAS, a 'Platform for Research through the Arts and Sciences', which has grants for work sitting at the intersection of art and science

there is the Stanford Encyclopedia of Philosophy, which has entries on e.g. Feminist Epistemology, race and implicit bias => a really good reference, commonly used by philosophers and very accessible

Feb 01

Found the term "Marrons" in the collection RDF (URI https://hdl.handle.net/20.500.11840/206868) -> interesting first task: track the term and its usage across the graph, find adjacent terms, etc
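A minimal sketch of that first task, assuming the collection graph has already been loaded as (subject, predicate, object) string triples (with rdflib, iterating over a `Graph` yields such triples). The functions `triples_mentioning` and `adjacent_terms` are hypothetical helpers; "adjacent" is taken here to mean the other values attached to the same subjects, which is only one possible reading.

```python
from collections import Counter


def triples_mentioning(triples: list[tuple[str, str, str]], term: str) -> list[tuple[str, str, str]]:
    """Triples whose object contains `term` (case-insensitive substring match)."""
    needle = term.lower()
    return [t for t in triples if needle in t[2].lower()]


def adjacent_terms(triples: list[tuple[str, str, str]], term: str) -> Counter:
    """Count the other objects attached to the subjects that mention `term`."""
    needle = term.lower()
    subjects = {s for s, _, _ in triples_mentioning(triples, term)}
    counts: Counter = Counter()
    for s, _, o in triples:
        # Skip the mentioning triples themselves; keep everything else on those subjects.
        if s in subjects and needle not in o.lower():
            counts[o] += 1
    return counts
```

Following the adjacent subjects one or more hops further would extend this from direct neighbours to a wider neighbourhood of the term in the graph.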

Feb 02

Andrei has mentioned From Cartography To Cookbooks, "a web of Dutch colonialism", an online exhibition which shines a light on issues such as race and gender in the context of colonialism, with maps and cookbooks as media (Allard Pierson page for the exhibition)

Feb 03

Victor has mentioned this blog post by MIT - yet another example of racist and sexist image processing AI

Feb 04

Cindy mentioned:

'facets' (of which there are 5) categorise terms in the thesaurus by semantic function -> could be seen as 'prescribed' associations
the term 'Bosneger' (an old, derogatory label for 'Marron') is in the Words Matter publication -> compare associations from there with those from DB
the AAT (Art and Architecture Thesaurus) by Getty
- thesauri made entirely from Western museum perspectives -> bias is deep
- could be a valuable source of shared knowledge (Wereldculturen thesaurus is rather specific)
histories of museums' objects after acquisition could provide useful 'grounding facts' (representing some of the materialised, as opposed to linguistic, aspects of bias) -> e.g. royal objects end up in the Rijksmuseum, others in the heritage museums
religious bias as an interesting case?
Rembrandt labelled as painting, Buddhist work labelled as decorative art -> clustering based on properties (i.e. 'semantics') should reveal this
'Rapanui' (a natively used term) not in DB, Rapa Nui and Paaseiland are -> labels are difficult even if not obviously discriminatory

Feb 11

Europeana has a Python API for its database

Mattia has mentioned Johannes Fabian (UvA Emeritus), specifically his 'Time and the Other'
-> to quote Mattia: 'Different situated notions of time could constitute another vector of bias'; idea: use the fact that Dutch marks time, i.e. look at verb tenses

Feb 12

CollectionsAsData, led by Thomas Padilla, a project on computational methods in libraries
-> could contain useful resources or links, e.g. Statement on treating collections as data which we're actually doing as well
-> could be an interesting group to get in touch with
Research Center for Material Culture, part of Museum van Wereldculturen
the KB:
- Example data sets for download
- Delpher: search interface to the KB's collection (books, newspapers, magazines)
- Demo on Delpher: interactive word analysis and visualisation tools
  -> the KB's collections are massive and entirely text
  -> can provide useful background knowledge/context for MvW's collection items (perhaps matched by time and/or topic)
  -> could be leveraged as training/test data
  -> Delpher itself or the Delpher demo could: give hints for how to query databases and/or be an object of study in the sense that they might reveal bias -> in any case, looking at methodologies/implementations there could be useful
The Real Face of White Australia, a data story that uses historical government records about non-European Australians; this page/project is the outcome of this chapter in the book Seeing the Past with Computers: Experiments with Augmented Reality and Computer Vision for History

Feb 15

SemEval2020 Task 12 on the identification of offensive language could be an interesting test case

Feb 18

~~Andrei mentioned:~~

~~Zotero, a library management tool; we could all share our libraries through that~~

Feb 20

through an INDELAB connection found this blog post about decolonising AI
-> could contain useful pointers

Feb 23

books on how current AI reinforces biases and inequalities and how to do it differently:

Data Feminism -> could have an interesting (inherent) connection to Feminist Theory
Race After Technology: Abolitionist Tools for the New Jim Code -> author has many publications about racial issues in the world of modern technology
Algorithms of Oppression

Feb 24

Johan put us in touch with someone who's working on a similar project which deals with biases in meta-data of heritage collections

Feb 27

Niels ten Oever mentioned:

Sorting Things Out, a book about classification as a social practice
Internet Daemons, book about social practices and phenomena around internet technologies
The Digital Sublime, book on the myths revolving around the digital world and how power structures are reinforced through digital technologies (rather than broken, as they promise); Wikipedia article about the phenomenon
Fully Automated Luxury Communism, a manifesto for how digital technologies could lead society into a better world
Frictie - Ethiek in Tijden van Dataisme, book about ethics in the time of big data -> interesting because the author is Dutch -> could invite her at some point

scholar search on social identity & categorisation returned interesting-looking results

March 02

CulturalAI Meeting:

Wereldculturen data:
- Jacco mentioned that Wereldculturen should eventually have their own data exposure process for researchers (& others) -> make sure that Cindy is aware that SABIO is building essentially that, so that they could perhaps use some of it (+ the process we went through to get the data)

March 04

SABIO meeting:

Cindy mentioned:
- she's working on the Pressing Matter project
- metaphor of a funnel for the program -> related to my own thoughts on bias detection as search, but a nicer metaphor
- bias as absence: the systematic absence of people in the data or the absence of fine-grained attributes for people is an instance of bias -> choice of words can indicate that, too (e.g. the choice of identifier for a person, cf. 'a man has a name') => this is closely related to, if not the same as, silencing
MVP: limit use cases to professionals
Marieke mentioned a nice idea: heat maps on the collection/subgraphs/etc to direct users' attention in a non-binary way, to visualise/uncover patterns -> talk to Werner about this
user should be able to input cues -> not only: detect bias in a given text/object/collection, but also: find everything in a given collection that is similar to a given cue

March 05

a stamp to brand slaves with - interesting because the description is "neutral"

March 08

Jesse mentioned Philo van Kemenade, who works for Beeld & Geluid (and who is in an AI4GLAM task force at Europeana)
Jesse mentioned Tobias Blanke, a recently appointed professor at UvA and ILLC, who works on the epistemological implications of AI (and generally on the interface of philosophy and computer science)
Mrinalini mentioned Michel-Rolph Trouillot who conceived the term 'silencing' (most notably in his book Silencing the Past: Power and the Production of History (1995))

March 09

Jelle mentioned that he is part of a project that has something to do with bias detection?
Marieke mentioned this tweet for the website (where the Q&A is on)
Jacco mentioned the FAccT Conference which has interesting papers

March 10

meeting with Richard:

CollectionConnection is the tool the NMVW used to convert their databases to RDF
-> Richard will share the schema they used for the conversion -> can we maybe use the schema to do the conversion ourselves?
the Objects table has a field title, but the table ObjectTitles was created since objects can have multiple titles (either replacing each other or living side by side) -> the table TitleType contains information that could allow reconstructing a version order of the titles
Richard thinks that the procedures from the database to ML-ready input could be interesting for future and general use -> potentially make processing scripts and procedures reusable for publication

March 11

meeting with the Goethe Institute (of Finland and of NL):

website of Artificially Correct, where they address bias in (machine) translation
poco.lit, a Berlin-based platform for postcolonial literature -> collaborators, have written articles for them
Workshop for translation practitioners on 23 & 24 April
Hackathon planned to develop tools to reduce/detect biases in MT at some point in autumn

Marieke mentioned Nexus Linguarum, a platform to promote synergies between European linguistic data science practitioners

meeting with Vendela (university homepage & personal website):

is part of the project The Politics of Metadata
shared her slides on her investigation into the representation of Sámi heritage in a (which?) Swedish museum (attached in an email)
the Politics of Metadata project is a part of the Metadata Culture research group

March 18

Cindy mentioned: Decolonize the Museum Conference by FramerFramed (there's also a document on Decolonizing Museums which could be a valuable resource)
Jelle Zuidema is part of the Bias Barometer, to quote: "We explore the relationship between what we read on (social) media and the effects on our (stereotypical) beliefs and actions."

March 26

Julia mentioned a talk on The Logic of Decoloniality by Jonathan Chimakonam, who does philosophical research on decolonising research (see links for papers); the point of such research is that in the tripartition of content, method and foundation, the foundation needs to be decolonised alongside content and method (which are what is usually focussed on)
=> this line provides a good guideline for how the field (cultural AI) as a whole should evolve

March 29

Meeting with Andrei & Ryan:

idea: correlate sentiment of words with their contentiousness (as labelled in ConConCor) -> can answer the question: 'is sentiment a good predictor of whether a word is contentious?' -> perhaps use BERTje's word embeddings for sentiment (or sentiment analysis)
idea: do the analyses of semantic change from Jurafsky's paper on semantic change in the context of the ConConCor -> does contentiousness correlate with factors of semantic change? can we predict contentiousness from semantic change?
idea: phrase annotation task for ConConCor in terms of the participants themselves: "how comfortable would you feel saying this word/sentence in public/private/in your head?" / "would you feel hurt if someone said this word/sentence to you?" -> contentiousness is an emotional matter -> getting people's embodied perspective is necessary
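this video talks about reformulating Bayes' theorem in terms of odds: $O(D \mid +) = O(D) \cdot \frac{P(+ \mid D)}{P(+ \mid \neg D)}$, where $D$ is the RV of whether or not a disease is present and $+$ is the RV of whether a given test was positive => the prior odds are updated by multiplying them with the test's likelihood ratio (Bayes factor)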
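The odds form of Bayes' theorem from the video note can be checked numerically with a small sketch; the parameter names (`prior`, `sensitivity`, `false_positive_rate`) and the example numbers are illustrative assumptions, not values from the video.

```python
def posterior_probability(prior: float, sensitivity: float, false_positive_rate: float) -> float:
    """P(D | +) via the odds form of Bayes' theorem.

    prior:               P(D), the base rate of the disease
    sensitivity:         P(+ | D)
    false_positive_rate: P(+ | not D)
    """
    prior_odds = prior / (1 - prior)                      # O(D)
    likelihood_ratio = sensitivity / false_positive_rate  # P(+|D) / P(+|not D)
    posterior_odds = prior_odds * likelihood_ratio        # O(D | +)
    return posterior_odds / (1 + posterior_odds)          # convert odds back to a probability
```

With a 1% base rate, 90% sensitivity and a 9% false positive rate, the likelihood ratio is 10, so the odds go from 1:99 to 10:99, i.e. a posterior probability of 10/109 (about 9%), the same result as the standard form of Bayes' theorem.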

April 01

Meeting with Marieke:

her PhD student (Philipp) tried stereotype detection on KGs
possible publication at the CLARIN Conference (3-4 page abstract due on April 14):
- the infrastructure/procedure from the Wereldculturen database to data set for cultural AI/AI4GLAM (use-case: bias detection system in colonial contexts)
- procedure and analysis of questionnaires for heritage professionals: defining the tasks and approach of the professionals in order to automate and enhance with ML
idea: concordance: get concordances of words (word pairs): for a given word (based on PMI, or other measures), find and expose the other contexts it occurs in; there is also a statistical measure of concordance
paper by Jacco and others: model transparency through interface and presentation and empirical study of its impact when historians work with ML
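A minimal sketch of the concordance idea, assuming pre-tokenised sentences: PMI over word pairs that co-occur within a fixed window (window co-occurrence is one common choice, assumed here), plus simple keyword-in-context lines for a chosen word. `pmi_scores` and `concordance` are hypothetical helper names.

```python
import math
from collections import Counter


def pmi_scores(sentences: list[list[str]], window: int = 2) -> dict[tuple[str, str], float]:
    """PMI (in bits) of unordered word pairs co-occurring within `window` tokens."""
    word_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for tokens in sentences:
        word_counts.update(tokens)
        for i, w in enumerate(tokens):
            for v in tokens[i + 1 : i + 1 + window]:
                pair_counts[tuple(sorted((w, v)))] += 1
    n_words = sum(word_counts.values())
    n_pairs = sum(pair_counts.values())
    scores = {}
    for (w, v), c in pair_counts.items():
        p_pair = c / n_pairs
        p_w, p_v = word_counts[w] / n_words, word_counts[v] / n_words
        scores[(w, v)] = math.log2(p_pair / (p_w * p_v))
    return scores


def concordance(sentences: list[list[str]], word: str, width: int = 3) -> list[str]:
    """Keyword-in-context lines: `width` tokens on each side of every occurrence."""
    lines = []
    for tokens in sentences:
        for i, t in enumerate(tokens):
            if t == word:
                left = " ".join(tokens[max(0, i - width) : i])
                right = " ".join(tokens[i + 1 : i + 1 + width])
                lines.append(f"{left} [{t}] {right}".strip())
    return lines
```

High-PMI partners of a word could then be fed into `concordance` to expose the other contexts in which the pair occurs.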

April 02

Meeting with the bias B.Sc. project:

idea: identify extra-linguistic variables about objects (region/culture/etc), then correlate them with, for instance, sentiment analysis of the description
e.g.: group objects by culture, then do sentiment analysis on their titles/descriptions and correlate; typicality could help as a concept, examples could be extracted
BERTje (paper, code) & RobBERT (paper, code) are Dutch transformer LMs, also available on Huggingface
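A minimal sketch of the group-then-correlate idea, assuming each object already has a culture label and a sentiment score (e.g. in [-1, 1]) produced by some model; how those scores are obtained (BERTje, RobBERT or otherwise) is left open. The correlation ratio (eta squared) is swapped in here as one standard measure of association between a categorical and a numeric variable; it is not a method the meeting settled on.

```python
from collections import defaultdict
from statistics import mean


def _by_group(records: list[tuple[str, float]]) -> dict[str, list[float]]:
    """Collect sentiment scores per group from (group, score) records."""
    groups: dict[str, list[float]] = defaultdict(list)
    for group, score in records:
        groups[group].append(score)
    return groups


def group_means(records: list[tuple[str, float]]) -> dict[str, float]:
    """Mean sentiment per group (e.g. per culture)."""
    return {g: mean(scores) for g, scores in _by_group(records).items()}


def correlation_ratio(records: list[tuple[str, float]]) -> float:
    """Eta squared: share of sentiment variance explained by group membership (0..1)."""
    scores = [s for _, s in records]
    grand_mean = mean(scores)
    ss_total = sum((s - grand_mean) ** 2 for s in scores)
    if ss_total == 0:
        return 0.0
    ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in _by_group(records).values())
    return ss_between / ss_total
```

A value near 1 would mean that sentiment is largely determined by the culture label; objects far from their group mean could serve as the 'typical' or 'atypical' examples mentioned above.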

April 13

Seminar by TU Wien Digital Humanities, recording:

Sally Wyatt, at KNAW on feminist history & stances
prof Hinda Haned, professor of data science, on defining bias

April 20

Johan is organising a EuropeanaTech X CulturalAI lab event; the agenda contains many interesting resources on decolonial approaches, practices and problems in the museum, e.g.:

a list of decolonial resources by the Museums Association
this talk about colonial meta-data, which linked to this group's homepage (where the talk's speaker is the leader)

April 21

Chat with Senka:

this person (instagram) in the non-binary community has a strong social media presence
same for this person (instagram)
Senka also has a friend (instagram) who does workshops and art around language and gender and might be interested in collaborating

April 29

SABIO meeting:

Werner shared a project of his

April 30

Andrei shared this Medium post about 'visualising whose stories are missing'

May 06

Jesse pointed to:

a tool for inspecting word pairs, very basic

someone who participated in the questionnaire is part of the LGBTI Heritage Organisation (IHLIA)

May 09

Julia keeps mentioning:

standpoint theory (proposed, among others, by Sandra Harding, who has been affiliated with the UvA) as a formal philosophical basis for definitions of bias

May 11

Marieke forwarded (Jelle retweeted):

this tweet about Information Gain and this primer on information theory; this paper was also mentioned in the thread -> these concepts could be useful for dealing with bias as absence
a blog post about AI experts comparing current ML to alchemy
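To make the information-theoretic concepts concrete, a minimal sketch of entropy and information gain over discrete labels. Applying it to bias as absence (e.g. asking how much the presence or absence of a metadata field tells us about an object's collection category) is an assumption for illustration; the labels and features would be hypothetical.

```python
import math
from collections import Counter


def entropy(labels: list) -> float:
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(labels: list, feature: list) -> float:
    """H(labels) - H(labels | feature): how much knowing `feature` reduces uncertainty."""
    n = len(labels)
    by_value: dict = {}
    for y, x in zip(labels, feature):
        by_value.setdefault(x, []).append(y)
    conditional = sum(len(ys) / n * entropy(ys) for ys in by_value.values())
    return entropy(labels) - conditional
```

A feature that perfectly separates two equally frequent labels yields 1 bit of information gain; a feature independent of the labels yields 0.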

meeting with Marieke:

Black Archives
Nijmegen Afrika Museum?
Imagine IC
IHLIA
perhaps a focussed workshop for non-binary people/on hetero-normative biases in collections around gay pride in Aug?

Oskar mentioned:

this work-in-progress book

Julia Noordegraaf sent an email about a conference with a speaker from MIT's Data + Feminism Lab

May 14

Martijn mentioned:

the National Archives have historical language in their collections and are aware that it might contain undesirable language (explanation page)

May 15

Marieke mentioned:

The Cultural Life of Machine Learning, An Incursion into Critical AI Studies

May 19

Saskia pointed to Rijksmuseum's new exhibition on slavery, co-curated by Valika Smeulders

WORKSHOP:

Wayne:
- why 'bias' instead of e.g. 'racism'?
- even the word 'human' is biased (indicates anthropocentric bias) -> probably means: bias is everywhere
Hodan:
- bias navigation should be disruptive: disrupt the ways we search and find information in collections

May 20

Cindy forwarded Fantastic Futures 21 Call for Abstracts, due June 15th

RCMC's webinar's speaker Wendy Hui Kyong Chun has written interesting books

Nishant Shah

May 24

ARK - MU presentation:

Angelique (Director of MU Eindhoven) mentioned Documenting complexity (funded by NWO, carried out at RUG, in collab with B&G)
Roosje has the Mnemosyne Bilderatlas by Aby Warburg => he created a system for visually remembering things together (=association) AND also a system for organising archives

May 26

Paul has mentioned crowdtruth.org, a source of papers on how to source ground truth and deal with inter-annotator disagreement

Saskia has mentioned a presentation about trust and utility in heritage LOD

Marieke has mentioned Google People + AI Research

May 31

Cindy shared:

Africa Stereotype Scanner

June 12

Eyob (neighbour at 32B) has created a website that connects facial recognition to criminal records (it uses facial recognition to categorise mugshots, then displays Dutch crime statistics and similar faces in the DB)

GitHub; Database of criminal records
