From a744fc2f3f37d8e0bc6ce0faae76d297346ce0e9 Mon Sep 17 00:00:00 2001
From: Sarah Oberbichler <66369271+soberbichler@users.noreply.github.com>
Date: Mon, 2 Dec 2024 02:34:54 +0100
Subject: [PATCH] Update module_5.html
---
modules/module_5.html | 114 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 114 insertions(+)
diff --git a/modules/module_5.html b/modules/module_5.html
index 8b13789..deba542 100644
--- a/modules/module_5.html
+++ b/modules/module_5.html
@@ -1 +1,115 @@
+
+
+
+
+
Module 4: Large Language Models for Article Extraction and Post-OCR Correction
+
+
This module will be all about Large Language Models, prompting techniques, and two specific NLP tasks: article extraction and OCR post-correction.
+
+ Large Language Models (LLMs) are artificial intelligence systems trained on massive text datasets that process and generate human language based on statistical patterns learned during training. Built on the Transformer architecture introduced by Vaswani et al. in 2017, these models have demonstrated measurable success in tasks such as text completion, translation, and question answering by predicting likely next tokens in a sequence. Recent research has shown that increasing model size and training data generally improves performance on standard benchmarks, with models like GPT-4 achieving over 90% accuracy on many academic and professional tests (though these scores require careful interpretation). While LLMs have proven effective for many language tasks, controlled studies have documented significant limitations, including factual inaccuracies, reflection of training-data bias, and an inability to truly reason: they fundamentally operate through pattern matching rather than genuine understanding.
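The idea of "predicting likely next tokens" can be illustrated with a deliberately tiny sketch. A real LLM uses a neural network over subword tokens; the word-level bigram counts and the toy corpus below are made up purely to show the principle:

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; real LLMs train on billions of subword tokens.
corpus = "the model predicts the next token and the next token follows".split()

# Count which token follows which (a simple bigram model).
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(token):
    """Return the continuation most frequently seen after `token`."""
    return following[token].most_common(1)[0][0]

print(predict_next("the"))  # "next" follows "the" twice, "model" only once
```

The same principle, scaled up from counting bigrams to learning billions of parameters, is what lets an LLM continue a prompt fluently.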
+
+
+
Preparation for Module 5:
+
+ -
+
Read the article listed under literature below and prepare for class discussion:
+
+ - Why are machine learning methods called "Black Boxes"?
+ - What does XAI stand for?
+ - What is a self-attention mechanism?
+ - Name a few methods to look into the "Black Box"
+ - Create at least one more entry in the Glossary
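For the self-attention question above, a minimal numeric sketch may help: each token scores every other token with a scaled dot product, and a softmax turns those scores into weights. The 2-dimensional token vectors here are invented for illustration, and the learned query/key/value projections of a real Transformer are omitted:

```python
import math

# Hypothetical 2-dimensional "embeddings" for three tokens; real models
# use hundreds of dimensions and learned query/key/value projections.
tokens = {"cat": [1.0, 0.0], "sat": [0.7, 0.7], "mat": [0.0, 1.0]}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query_token):
    """How strongly `query_token` attends to each token (itself included)."""
    q = tokens[query_token]
    # Scaled dot-product scores, as in "Attention Is All You Need".
    scores = [dot(q, k) / math.sqrt(len(q)) for k in tokens.values()]
    return dict(zip(tokens, softmax(scores)))

# The weights form a probability distribution over all tokens.
print(attention_weights("sat"))
```

Because the weights sum to one, each token's output becomes a weighted mixture of all token representations — which is also why attention maps are a popular window into the "Black Box".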
+
+
+
+
+
Literature:
+
+ Dobson, J.E. On reading and interpreting black box deep neural networks. Int J Digit Humanities 5, 431–449 (2023). https://doi.org/10.1007/s42803-023-00075-w
+
+
+
+
Notebooks we will use in class:
+
+
Download via the DDB API
+
+
+
+
+
+
+
Introduction to Transformers: What Can They Do?
+
+
+
+
+
+
Transformers and Semantic Search
+
+
+
+
+
+
+
Workload (after class):
+
+ -
+
Try the semantic search for your own research question:
+
+ - Can you find new relevant keywords/articles?
+
+
+
+
+
+
Date and Time:
+
December 6, 2024 (10:00 AM to 11:30 AM)
+
+
+
+
+