Skip to content

ChristophScn/TAT3

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

TAT3

This repository contains the source code for my master's thesis Large Language Model-Driven Data Enrichment.

Abstract

This thesis introduces the Table Augmentation via Text-to-Text Transfer (TAT3) framework that leverages foundational pretrained LLMs to perform subject suggestion. TAT3 suggests missing entities of an incomplete table that consists of rows representing entities. The system is inspired by recent advances in AI Assistants, specifically GitHub Copilot. With TAT3, we address the limitations of previous works that focus on subject suggestion having strict assumptions about the source of knowledge as well as the input and output format. TAT3 achieves similar results to established baselines on a standard subject suggestion benchmark. On the benchmark, a correct first suggestion is found for 73.0% of the queries, which is comparable to our best employed baseline from previous work that has stricter assumptions. For 89.1% of the queries, a correct suggestion is found within the first five suggestions. On average, 63.1% of the first five suggestions are correct. An offline optimization strategy increased TAT3's probability of finding a correct entity within the first suggestion by 6.05% relative to the unoptimized approach. Ensembling, used to address possible hallucinations of LLMs, increased the probability of finding a correct entity within the first suggestion by 3.95% relative to choosing the best LLM. Additionally, we demonstrate that TAT3 can adapt its output format of the results to the rows already present in a table.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published