Presentation and source code for Jeremy Nelson's workshop, Visualizing and Training Sinopia Linked Data with Pandas, spaCy, and PyTorch, at the 02021 and 02022 LD4 Conferences.
This workshop provides participants the opportunity to explore Sinopia's Linked Data resources through a machine-learning lens.
- Creating pandas DataFrames of RDF resources in a Jupyter notebook, along with different ways to analyze and visualize the triples and graphs accessed through Sinopia's API.
- Using the Sinopia DataFrames with a custom spaCy Named Entity Recognition (NER) pipeline to automatically tag descriptions with FAST subject headings.
- Using Hugging Face Transformers for NER and summarization pipelines on Linked Data.
- Applying fastai and PyTorch to the DataFrames for further exploration by participants.
- Introducing Model Cards and Data Statements to begin addressing problematic bias in both machine learning models and the underlying data.
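As a minimal sketch of the DataFrame step, the snippet below loads a few RDF triples into a pandas DataFrame with one row per triple, then counts how often each predicate appears and how many distinct resources are described. The triples are hard-coded, hypothetical examples (example.org IRIs with BIBFRAME predicates); in the workshop they would come from resources retrieved through Sinopia's API, whose response format is not shown here.

```python
import pandas as pd

# Hypothetical triples standing in for RDF retrieved from Sinopia's API.
BF = "http://id.loc.gov/ontologies/bibframe/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

triples = [
    ("https://example.org/work/1", BF + "title", "https://example.org/title/1"),
    ("https://example.org/work/1", BF + "subject", "https://example.org/subject/a"),
    ("https://example.org/work/2", BF + "subject", "https://example.org/subject/b"),
    ("https://example.org/work/2", RDF_TYPE, BF + "Work"),
]

# One row per (subject, predicate, object) triple.
df = pd.DataFrame(triples, columns=["subject", "predicate", "object"])

# Shorten each predicate IRI to its local name (text after the last "/" or "#").
df["predicate_name"] = df["predicate"].str.extract(r"([^/#]+)$", expand=False)

# How often each predicate is used across the graph.
predicate_counts = df["predicate_name"].value_counts()

# Number of distinct resources appearing as subjects.
n_subjects = df["subject"].nunique()

print(predicate_counts.to_dict())
print(n_subjects)
```

Keeping triples in a flat three-column table makes filtering and grouping straightforward; a count like `predicate_counts` can then be charted with pandas plotting methods such as `Series.plot.bar()`, which require matplotlib.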
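The FAST tagging step could be approximated as follows. This is a sketch that swaps in spaCy's rule-based `entity_ruler` component rather than a trained NER model, so it runs on a blank English pipeline with nothing to download. The two headings and their `fast-placeholder-*` identifiers are hypothetical stand-ins, not real FAST records, and the `FAST` label is an arbitrary choice.

```python
import spacy

# Blank English pipeline: tokenizer only, no trained components.
nlp = spacy.blank("en")

# Rule-based component that labels spans matching the given patterns.
ruler = nlp.add_pipe("entity_ruler")

# Hypothetical FAST headings mapped to placeholder identifiers.
fast_headings = {
    "Linked data": "fast-placeholder-1",
    "Machine learning": "fast-placeholder-2",
}

# Case-insensitive token patterns; "id" is copied onto each matched entity.
patterns = [
    {
        "label": "FAST",
        "pattern": [{"LOWER": token.lower()} for token in heading.split()],
        "id": fast_id,
    }
    for heading, fast_id in fast_headings.items()
]
ruler.add_patterns(patterns)

doc = nlp("An introduction to machine learning for linked data in libraries.")

# (matched text, label, heading identifier) in document order.
tags = [(ent.text, ent.label_, ent.ent_id_) for ent in doc.ents]
print(tags)
```

A rule-based ruler only finds headings whose wording appears verbatim in a description; a trained NER component, as the workshop's custom pipeline suggests, could generalize beyond exact matches at the cost of needing labeled training data.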