This project implements an ELMo (Embeddings from Language Models) architecture from scratch using PyTorch. ELMo generates deep contextualized word embeddings through stacked Bi-LSTMs, capturing both the syntactic and semantic properties of words based on their context in a sentence. The model is pre-trained using a bidirectional language modeling objective and is further evaluated on a downstream text classification task.
- Language: Python
- Framework: PyTorch
- Dataset: AG News Classification Dataset
  - Use the Description column for training word embeddings (a loading sketch follows this list).
  - Use the Label/Index column for the downstream classification task.
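As a concrete starting point, a minimal loading sketch; the train.csv filename and the Class Index / Description column names are assumptions based on the common Kaggle release of AG News, so adjust them to your copy:

```python
import pandas as pd

# Filename and column names assume the Kaggle CSV layout of AG News.
df = pd.read_csv("train.csv")

# Description column -> raw text used to pre-train the embeddings.
texts = df["Description"].tolist()

# Class Index column -> labels 1..4, shifted to 0..3 for the classifier.
labels = (df["Class Index"] - 1).tolist()
```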
- The core ELMo model consists of (sketched below):
  - Input embedding layer (optionally initialized with pretrained Word2Vec vectors).
  - Stacked Bi-LSTM layers (2 layers).
  - Trainable or fixed λs for combining the word representations from the different layers.
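A minimal sketch of these three components in PyTorch; the class name, the layer widths, and the softmax-normalized λ sum are illustrative choices, not the required implementation:

```python
import torch
import torch.nn as nn

class ELMo(nn.Module):
    def __init__(self, vocab_size, hidden_dim=128):
        super().__init__()
        emb_dim = 2 * hidden_dim  # match the Bi-LSTM output width so the layers can be mixed
        # Input embedding layer; its weight can be initialized from pretrained Word2Vec vectors.
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        # Two stacked Bi-LSTM layers, kept separate so each layer's output stays accessible.
        self.bilstm1 = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.bilstm2 = nn.LSTM(2 * hidden_dim, hidden_dim, batch_first=True, bidirectional=True)
        # One λ per representation layer: embedding, Bi-LSTM 1, Bi-LSTM 2.
        self.lambdas = nn.Parameter(torch.ones(3) / 3)

    def forward(self, token_ids):
        e0 = self.embedding(token_ids)  # (batch, seq, 2*hidden_dim)
        h1, _ = self.bilstm1(e0)        # (batch, seq, 2*hidden_dim)
        h2, _ = self.bilstm2(h1)        # (batch, seq, 2*hidden_dim)
        # λ-weighted sum of the three layers, softmax-normalized (one possible scheme).
        w = torch.softmax(self.lambdas, dim=0)
        return w[0] * e0 + w[1] * h1 + w[2] * h2, (e0, h1, h2)
```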
- Pre-train the ELMo model using a bidirectional language modeling objective: the forward and backward directions of the Bi-LSTM predict the next and the previous word, respectively, to capture contextual word representations (see the loss sketch after this list).
- Evaluate the pretrained ELMo model on a 4-way text classification task using the AG News Dataset.
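A sketch of that objective, reusing the h2 output and placeholder sizes from the model sketch above; fwd_head and bwd_head are hypothetical projection layers. Note that fused bidirectional layers mix the two directions when stacked, so a strictly leak-free language model keeps separate forward and backward LSTM stacks; the sketch glosses over this for brevity:

```python
import torch.nn as nn
import torch.nn.functional as F

hidden_dim, vocab_size = 128, 20000           # placeholder sizes
fwd_head = nn.Linear(hidden_dim, vocab_size)  # predicts the next token
bwd_head = nn.Linear(hidden_dim, vocab_size)  # predicts the previous token

def bilm_loss(h2, token_ids):
    # h2: (batch, seq, 2*hidden_dim) from the top Bi-LSTM; split into the two directions.
    h_fwd, h_bwd = h2.chunk(2, dim=-1)
    # Forward direction: states at positions 0..T-2 predict tokens 1..T-1.
    fwd_logits = fwd_head(h_fwd[:, :-1])
    fwd_loss = F.cross_entropy(fwd_logits.reshape(-1, vocab_size),
                               token_ids[:, 1:].reshape(-1))
    # Backward direction: states at positions 1..T-1 predict tokens 0..T-2.
    bwd_logits = bwd_head(h_bwd[:, 1:])
    bwd_loss = F.cross_entropy(bwd_logits.reshape(-1, vocab_size),
                               token_ids[:, :-1].reshape(-1))
    return fwd_loss + bwd_loss
```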
Three settings are explored for combining the layer representations (see the ScalarMix sketch below):
- Trainable λs: train the model with the λs as learnable parameters.
- Frozen λs: randomly initialize the λs and freeze them during training.
- Learned function: learn a custom function for combining the embeddings across the Bi-LSTM layers.
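One way to express all three settings in a single module; ScalarMix is a hypothetical name, and the linear layer standing in for the custom combination function is just one possible choice:

```python
import torch
import torch.nn as nn

class ScalarMix(nn.Module):
    """Combines the three layer outputs (e0, h1, h2) into one representation."""

    def __init__(self, dim, mode="trainable"):
        super().__init__()
        self.mode = mode
        lambdas = torch.rand(3)
        if mode == "trainable":
            self.lambdas = nn.Parameter(lambdas)      # updated by the optimizer
        elif mode == "frozen":
            self.register_buffer("lambdas", lambdas)  # random init, never updated
        elif mode == "function":
            # Learned combination function in place of scalar weights.
            self.combine = nn.Linear(3 * dim, dim)

    def forward(self, e0, h1, h2):
        if self.mode == "function":
            return self.combine(torch.cat([e0, h1, h2], dim=-1))
        w = torch.softmax(self.lambdas, dim=0)
        return w[0] * e0 + w[1] * h1 + w[2] * h2
```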
- Accuracy
- Precision, Recall, F1 Score
- Confusion Matrix
These metrics are reported for both the pretraining and downstream classification tasks; a computation sketch follows.
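All of them can be computed with scikit-learn, assuming y_true and y_pred are the label and prediction arrays from either task:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)

def report(y_true, y_pred):
    # Macro-average over the four AG News classes.
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0)
    print(f"accuracy:  {accuracy_score(y_true, y_pred):.4f}")
    print(f"precision: {precision:.4f}  recall: {recall:.4f}  f1: {f1:.4f}")
    print(confusion_matrix(y_true, y_pred))
```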
To pre-train the ELMo model on bidirectional language modeling:

    python ELMO.py

To train the classifier using the word representations obtained from ELMo:

    python classification.py
- Bi-LSTM model: bilstm.pt
- Classifier model: classifier.pt
You can either load these models directly from the directory (a loading sketch follows) or download them from external storage (e.g., OneDrive).
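A loading sketch, assuming the checkpoints were written with torch.save(model, path) as full pickled modules; if only state dicts were saved, instantiate the model classes first and call load_state_dict instead:

```python
import torch

# map_location lets CPU-only machines load GPU-trained checkpoints.
# On PyTorch >= 2.6, pass weights_only=False to unpickle full modules.
bilstm = torch.load("bilstm.pt", map_location="cpu")
classifier = torch.load("classifier.pt", map_location="cpu")

bilstm.eval()
classifier.eval()
```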
- Source code:
  - ELMO.py: Train the Bi-LSTM on the language modeling task.
  - classification.py: Train the classifier on the downstream task using the Bi-LSTM embeddings.
- Pretrained models:
  - bilstm.pt: Pretrained Bi-LSTM model.
  - classifier.pt: Pretrained classifier model.
- Report (PDF):
  - Hyperparameters used for pretraining and the downstream task.
  - Evaluation metrics (accuracy, F1, precision, recall).
  - Analysis of results comparing ELMo with Word2Vec and SVD (from previous assignments).
- README:
  - Instructions on how to execute the code, load the pretrained models, and assumptions made during implementation.
Please upload the pretrained models to external storage (e.g., OneDrive) and include the download link here.
- Analyze the performance of the ELMo embeddings compared to Word2Vec and SVD on the downstream classification task.
- Use evaluation metrics such as accuracy, F1, precision, recall, and confusion matrix to compare these models.
- Discuss the impact of hyperparameter tuning on performance, especially the role of the λs.
- Deep Contextualized Word Representations (ELMo)
- Bidirectional Language Modeling using LSTMs
- Text Generation with Bi-LSTM in PyTorch