This repository is the official implementation of TILFA: A Unified Framework for Text, Image, and Layout Fusion in Argument Mining.
The paper was accepted to the Proceedings of the 10th Workshop on Argument Mining (2023).
A main goal of Argument Mining (AM) is to analyze an author's stance. Unlike previous AM datasets, which focus only on text, the shared task at the 10th Workshop on Argument Mining introduces a dataset that includes both text and images. Importantly, these images contain both visual elements and optical characters. Our new framework, TILFA (A Unified Framework for Text, Image, and Layout Fusion in Argument Mining), is designed to handle this mixed data. It excels not only at understanding text but also at detecting optical characters and recognizing layout details in images. Our model significantly outperforms existing baselines, earning our team, KnowComp, 1st place on the leaderboard of the Argumentative Stance Classification subtask of this shared task.
Python version is 3.7
requirements:
apex==0.9.10dev
boto3==1.28.10
botocore==1.31.10
datasets==2.3.2
detectron2==0.6+cu111
imbalanced_learn==0.10.1
imblearn==0.0
inflect==7.0.0
lxml==4.9.2
matplotlib==3.5.3
nltk==3.8.1
numpy==1.21.6
opencv_python==4.8.0.74
pandas==1.1.5
Pillow==9.5.0
preprocessor==1.1.3
ptvsd==4.3.2
pytesseract==0.3.10
Requests==2.31.0
scikit_learn==1.0.2
spacy==2.2.1
stweet==2.1.1
tensorflow==2.14.0
textblob==0.17.1
timm==0.4.12
torch==1.10.0+cu111
torchvision==0.11.1+cu111
tqdm==4.65.0
transformers==4.12.5
tweet_preprocessor==0.6.0
websocket_client==1.6.3
You can install all requirements with the following command:
pip install -r requirements.txt
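Before running the install, it can help to check that the pinned list does not name the same package twice, since pip typically rejects a requirements file that pins one package to two different versions. The following is a minimal sketch, not part of this repository; the function name `find_duplicate_pins` and the sample lines are illustrative assumptions.

```python
import re
from collections import defaultdict


def find_duplicate_pins(lines):
    """Return {normalized_name: [versions]} for packages pinned more than once."""
    pins = defaultdict(list)
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or "==" not in line:
            continue
        name, version = line.split("==", 1)
        # Package names compare case-insensitively; "-", "_" and "." are equivalent
        key = re.sub(r"[-_.]+", "-", name.strip()).lower()
        pins[key].append(version.strip())
    return {name: versions for name, versions in pins.items() if len(versions) > 1}


# Illustrative input only; in practice, read the lines from requirements.txt.
sample = ["numpy==1.21.6", "Pillow==9.5.0", "pillow==10.0.1", "tqdm==4.65.0"]
duplicates = find_duplicate_pins(sample)
print(duplicates)
```

An empty result means every package is pinned at most once.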
- training examples can be found in ./run.sh
python3 main_text_alltrain.py \
    --exp-dir=YOUR_EXPERIMENT_PATH \
    --num-epochs=25 \
    --batch-size=16 \
    --exp-mode=0 \
    --data-mode=0 \
    --lr=5e-6 \
    --img-model=0 \
    --text-model-name=microsoft/deberta-v3-large
python3 main_image_alltrain.py \
    --exp-dir=YOUR_EXPERIMENT_PATH \
    --num-epochs=25 \
    --batch-size=16 \
    --exp-mode=0 \
    --data-mode=1 \
    --lr=1e-6 \
    --img-model=0 \
    --text-model-name=microsoft/deberta-v3-large
python3 main_multimodality_alltrain.py \
    --exp-dir=YOUR_EXPERIMENT_PATH \
    --num-epochs=25 \
    --batch-size=16 \
    --exp-mode=0 \
    --data-mode=2 \
    --lr=1e-5 \
    --img-model=1 \
    --text-model-name=microsoft/deberta-v3-large \
    --use-pooler=0 \
    --use-wordnet=1
python3 main_layoutlmv3_alltrain.py \
    --data_dir=./data \
    --output_dir=YOUR_EXPERIMENT_PATH \
    --do_train \
    --do_eval \
    --do_predict \
    --model_name_or_path=microsoft/layoutlmv3-base \
    --visual_embed \
    --num_train_epochs=25 \
    --input_size=224 \
    --learning_rate=1e-5 \
    --per_gpu_train_batch_size=8 \
    --per_gpu_eval_batch_size=8 \
    --seed=22 \
    --gradient_accumulation_steps=1 \
    --text_model_name_or_path=microsoft/deberta-v3-large
python3 main_multimodality_layoutlmv3_alltrain.py \
    --data_dir=./data \
    --output_dir=YOUR_EXPERIMENT_PATH \
    --do_train \
    --do_eval \
    --model_name_or_path=microsoft/layoutlmv3-base \
    --visual_embed \
    --num_train_epochs=25 \
    --input_size=224 \
    --learning_rate=1e-5 \
    --per_gpu_train_batch_size=4 \
    --per_gpu_eval_batch_size=4 \
    --seed=22 \
    --gradient_accumulation_steps=1 \
    --text_model_name_or_path=microsoft/deberta-v3-large \
    --exp_mode=0 \
    --use_wordnet=1 \
    --use_pooler=0 \
    --cross_attn_type=-1
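The training commands above all follow the same pattern: a script name followed by `--flag=value` pairs, plus a few bare switches such as `--do_train`. A small launcher like the sketch below can assemble such a command from a dictionary, which may make it easier to sweep over learning rates or seeds. The helper `build_command` is hypothetical and not part of this repository; the script and flag names are copied from the examples above, and the second command is deliberately abbreviated.

```python
import shlex


def build_command(script, options, switches=()):
    """Assemble an argument list: script, --key=value pairs, then bare switches."""
    cmd = ["python3", script]
    cmd += [f"--{key}={value}" for key, value in options.items()]
    cmd += [f"--{name}" for name in switches]
    return cmd


# Pure-text training, with the flag names written exactly as the script expects them
text_cmd = build_command(
    "main_text_alltrain.py",
    {
        "exp-dir": "YOUR_EXPERIMENT_PATH",
        "num-epochs": 25,
        "batch-size": 16,
        "exp-mode": 0,
        "data-mode": 0,
        "lr": "5e-6",
        "img-model": 0,
        "text-model-name": "microsoft/deberta-v3-large",
    },
)

# The LayoutLMv3 scripts use underscores and bare switches (remaining flags omitted here)
layout_cmd = build_command(
    "main_layoutlmv3_alltrain.py",
    {"data_dir": "./data", "output_dir": "YOUR_EXPERIMENT_PATH", "seed": 22},
    switches=("do_train", "do_eval", "do_predict", "visual_embed"),
)

# Print a shell-safe version of each command for inspection
for cmd in (text_cmd, layout_cmd):
    print(" ".join(shlex.quote(part) for part in cmd))
```

To launch a command rather than print it, the list can be passed to `subprocess.run(cmd, check=True)`.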
- predict_test_origin_text.py is for pure text
- predict_test_origin_image.py is for pure image
- predict_test_origin_multi.py is for original multimodality
- predict_test_layout.py is for pure layout
- predict_test_layout_multi.py is for layout multimodality
Before predicting, change the model name in the code to the one you want to predict with. The other parameters are the same as in training.
Change the file name in the code to the one you want to process, then run:
python3 final_submission.py
To get the score across topics:
python3 get_evaluation.py \
    -f=YOUR_FILE_PATH
To get the score within a topic (set --topic to either gun_control or abortion):
python3 get_evaluation_within_topic.py \
    -f=YOUR_FILE_PATH \
    --topic=gun_control
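The two evaluation modes differ only in which rows are scored: all rows together, or only the rows of one topic. The sketch below illustrates that distinction on toy data. It is not the repository's evaluation code; the record layout, the label names, and the choice of F1 on the positive class as the metric are all assumptions made for illustration.

```python
def f1_positive(gold, pred, positive="support"):
    """F1 for one class, from true positives, false positives and false negatives."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0  # also avoids division by zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def score(rows, topic=None):
    """Score all rows, or only those of one topic when `topic` is given."""
    selected = [r for r in rows if topic is None or r["topic"] == topic]
    gold = [r["gold"] for r in selected]
    pred = [r["pred"] for r in selected]
    return f1_positive(gold, pred)


# Toy records in a hypothetical format
rows = [
    {"topic": "gun_control", "gold": "support", "pred": "support"},
    {"topic": "gun_control", "gold": "oppose", "pred": "oppose"},
    {"topic": "abortion", "gold": "support", "pred": "support"},
    {"topic": "abortion", "gold": "support", "pred": "oppose"},
]

print("across topics:", round(score(rows), 3))
for topic in ("gun_control", "abortion"):
    print("within", topic + ":", round(score(rows, topic), 3))
```

On this toy data the per-topic scores differ from the pooled score, which is why the two numbers are worth reporting separately.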
- code used to address data imbalance is in ./data/TranslateDemo
- in the script names, a stands for abortion and g stands for gun_control
- s stands for stance and p stands for persuasiveness
cd data/TranslateDemo
python3 TranslateDemo_a_s.py
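Given the naming convention above, the script for any topic and task pair can be derived mechanically. The sketch below does so. Only `TranslateDemo_a_s.py` is confirmed by the example command, so the other three file names are an assumption based on the stated convention, and `script_name` is a hypothetical helper.

```python
# Abbreviations as described in the naming convention
TOPICS = {"abortion": "a", "gun_control": "g"}
TASKS = {"stance": "s", "persuasiveness": "p"}


def script_name(topic, task):
    """Map a (topic, task) pair to the script name implied by the convention."""
    return f"TranslateDemo_{TOPICS[topic]}_{TASKS[task]}.py"


for topic in TOPICS:
    for task in TASKS:
        print(topic, task, "->", script_name(topic, task))
```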
- code used for data augmentation is in ./data/wordnet_augmentation
cd data/wordnet_augmentation
python3 preprocess_glossbert_input.py
python3 build_gloss_bert_input.py
cd GlossBERT
./run_WSD.sh
cd ..
python3 incorporate_score.py
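The augmentation steps above change directory partway through, which is easy to get wrong when scripting them. One way to make the order explicit is to record each step together with its working directory, as in the sketch below. The `STEPS` table mirrors the commands above; the `run_pipeline` helper and its `dry_run` flag are hypothetical and not part of this repository.

```python
import subprocess
from pathlib import Path

BASE = Path("data/wordnet_augmentation")

# (working directory, command) pairs in execution order
STEPS = [
    (BASE, ["python3", "preprocess_glossbert_input.py"]),
    (BASE, ["python3", "build_gloss_bert_input.py"]),
    (BASE / "GlossBERT", ["./run_WSD.sh"]),
    (BASE, ["python3", "incorporate_score.py"]),
]


def run_pipeline(steps, dry_run=True):
    """Run each step in its directory; with dry_run, only return the plan."""
    plan = []
    for cwd, cmd in steps:
        plan.append(f"(cd {cwd.as_posix()} && {' '.join(cmd)})")
        if not dry_run:
            # check=True stops the pipeline at the first failing step
            subprocess.run(cmd, cwd=cwd, check=True)
    return plan


# Dry run by default: print the plan without executing anything
for line in run_pipeline(STEPS):
    print(line)
```

Calling `run_pipeline(STEPS, dry_run=False)` from the repository root would execute the steps in order.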
@inproceedings{zong-etal-2023-tilfa,
title = "{TILFA}: A Unified Framework for Text, Image, and Layout Fusion in Argument Mining",
author = "Zong, Qing and
Wang, Zhaowei and
Xu, Baixuan and
Zheng, Tianshi and
Shi, Haochen and
Wang, Weiqi and
Song, Yangqiu and
Wong, Ginny and
See, Simon",
booktitle = "Proceedings of the 10th Workshop on Argument Mining",
month = dec,
year = "2023",
address = "Singapore",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.argmining-1.14",
doi = "10.18653/v1/2023.argmining-1.14",
pages = "139--147",
}