Commit
1 parent
0c999b1
commit a744fc2
Showing
1 changed file
with
114 additions
and
0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,115 @@ | ||
|
||
<!DOCTYPE html> | ||
<html lang="en"> | ||
<head> | ||
<meta charset="UTF-8"> | ||
<meta name="viewport" content="width=device-width, initial-scale=1.0"> | ||
<title>Module 4 - NLP Course DMGK</title> | ||
<link href="https://cdn.jsdelivr.net/npm/[email protected]/dist/css/bootstrap.min.css" rel="stylesheet"> | ||
<style> | ||
body { | ||
font-family: Arial, sans-serif; | ||
line-height: 1.6; | ||
color: #333; | ||
} | ||
.container { | ||
max-width: 800px; | ||
margin: 0 auto; | ||
padding: 20px; | ||
} | ||
h1, h2, h3 { | ||
color: #8B0000; | ||
} | ||
.notebook-link { | ||
display: inline-flex; | ||
align-items: center; | ||
gap: 10px; | ||
margin-top: 10px; | ||
} | ||
.back-button { | ||
position: fixed; | ||
top: 20px; | ||
left: 20px; | ||
z-index: 1000; | ||
} | ||
.citation { | ||
margin-top: 10px; | ||
padding-left: 20px; | ||
border-left: 3px solid #8B0000; | ||
} | ||
.task { | ||
margin-bottom: 15px; | ||
} | ||
</style> | ||
</head> | ||
<body> | ||
<a href="../index.html" class="btn btn-secondary back-button">← Go Back</a> | ||
|
||
<div class="container"> | ||
<h1>Module 4: Large Language Models for Article Extraction and Post-OCR Correction</h1> | ||
|
||
<p>Module 4 is all about large language models, prompting techniques, and two specific NLP tasks: article extraction and OCR post-correction.</p> | ||
<ul> | ||
<li>Large language models (LLMs) are artificial intelligence systems trained on massive text datasets that process and generate human language based on statistical patterns learned during training. They build on the Transformer architecture introduced by Vaswani et al. in 2017 and have demonstrated measurable success in tasks such as text completion, translation, and question answering by predicting likely next tokens in a sequence. Research has shown that increasing model size and training data generally improves performance on standard benchmarks; GPT-4, for example, was reported to score around the 90th percentile on some academic and professional exams (though such scores require careful interpretation). While LLMs have proven effective for many language tasks, studies have documented significant limitations, including factual inaccuracies, reproduction of biases present in the training data, and an inability to truly reason: at their core they operate through pattern matching rather than genuine understanding.</li> | ||
</ul> | ||
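<p>To make "predicting likely next tokens" concrete, here is a minimal sketch in Python. It is a toy bigram counter, not a Transformer: it only looks at the single preceding word, and the tiny corpus and all function names are invented for illustration. Real LLMs learn far richer patterns over long contexts, but the core idea of choosing a continuation from learned statistics is the same.</p>

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for each token, which tokens follow it in the training text."""
    tokens = text.lower().split()
    follower_counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        follower_counts[current][following] += 1
    return follower_counts


def next_token_probabilities(model, token):
    """Turn the raw follower counts of one token into probabilities."""
    counts = model.get(token.lower())
    if not counts:
        return {}  # token never seen (or never followed by anything)
    total = sum(counts.values())
    return {candidate: n / total for candidate, n in counts.items()}


def predict_next(model, token):
    """Return the most probable next token, or None for unseen tokens."""
    probabilities = next_token_probabilities(model, token)
    if not probabilities:
        return None
    return max(probabilities, key=probabilities.get)


# Invented toy corpus; a real model is trained on billions of tokens.
corpus = (
    "the newspaper reports the news . "
    "the newspaper prints the article . "
    "the newspaper reports the weather ."
)
model = train_bigram_model(corpus)

print(next_token_probabilities(model, "newspaper"))
print(predict_next(model, "newspaper"))
```

<p>Because "newspaper" is followed by "reports" twice and by "prints" once in the toy corpus, the model prefers "reports". It has no notion of what a newspaper is, which is a small-scale illustration of the pattern-matching limitation described above.</p>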
|
||
<h3>Preparation for Module 5:</h3> | ||
<ol> | ||
<li> | ||
<p>Read the article listed under literature below and prepare for class discussion:</p> | ||
<ul> | ||
<li>Why are machine learning methods called "Black Boxes"?</li> | ||
<li>What does XAI stand for?</li> | ||
<li>What is a self-attention mechanism?</li> | ||
<li>Name a few methods to look into the "Black Box"</li> | ||
<li>Create at least one more entry in the Glossary</li> | ||
</ul> | ||
</li> | ||
</ol> | ||
|
||
<h3>Literature:</h3> | ||
<p class="citation"> | ||
Dobson, J.E. On reading and interpreting black box deep neural networks. Int J Digit Humanities 5, 431–449 (2023). <a href="https://doi.org/10.1007/s42803-023-00075-w" target="_blank">https://doi.org/10.1007/s42803-023-00075-w</a> | ||
</p> | ||
|
||
|
||
<h3>Notebooks we will use in class:</h3> | ||
<div class="notebook-link"> | ||
<p>Download via the DDB API</p> | ||
<a href="https://colab.research.google.com/github/ieg-dhr/NLP-Course4Humanities_2024/blob/main/Download_über_API_der_DDB.ipynb" target="_blank"> | ||
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Download_über_API_der_DDB.ipynb In Colab"/> | ||
|
||
</a> | ||
</div> | ||
<div class="notebook-link"> | ||
<p>Introduction to Transformers: What Can They Do?</p> | ||
<a href="https://colab.research.google.com/github/ieg-dhr/NLP-Course4Humanities_2024/blob/main/Transformers_what_can_they_do.ipynb" target="_blank"> | ||
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open Introduction to Transformers in Colab"/> | ||
</a> | ||
</div> | ||
<div class="notebook-link"> | ||
<p>Transformers and Semantic Search</p> | ||
<a href="https://colab.research.google.com/github/ieg-dhr/NLP-Course4Humanities_2024/blob/main/Transformers_SemantischSearch.ipynb" target="_blank"> | ||
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Transformers_SemantischSearch.ipynb In Colab"/> | ||
|
||
</a> | ||
</div> | ||
|
||
<h3>Workload (after class):</h3> | ||
<ol> | ||
<li> | ||
<p>Try the semantic search for your own research question:</p> | ||
<ul> | ||
<li>Can you find new relevant keywords/articles?</li> | ||
</ul> | ||
</li> | ||
</ol> | ||
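<p>As a rough mental model for the search task, the sketch below ranks texts by cosine similarity to a query. It is an assumption-laden simplification, not the code from the course notebook: the hypothetical <code>embed</code> function uses plain word counts as a stand-in for the dense embeddings a transformer encoder would produce, so it only matches shared words rather than meaning. The example headlines are invented.</p>

```python
import math
from collections import Counter


def embed(text):
    """Hypothetical stand-in for a transformer encoder: a word-count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as Counters."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, documents, top_k=2):
    """Return the top_k (score, document) pairs, most similar first."""
    query_vector = embed(query)
    scored = [(cosine_similarity(query_vector, embed(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Invented example headlines, standing in for your own article collection.
articles = [
    "cholera outbreak reported in hamburg harbour",
    "new railway line opens between mainz and frankfurt",
    "doctors warn of cholera spreading along the river",
]

for score, doc in search("cholera outbreak", articles):
    print(f"{score:.2f}  {doc}")
```

<p>With a real embedding model in place of <code>embed</code>, the ranking step stays the same, but texts can score highly even when they share no words with the query. That is what makes it possible to discover new keywords and articles.</p>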
|
||
|
||
<h3>Date and Time:</h3> | ||
<p>December 6, 2024 (10:00 AM to 11:30 AM)</p> | ||
</div> | ||
|
||
<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/js/bootstrap.bundle.min.js"></script> | ||
</body> | ||
</html> |