Authors: Sahil, Thomas, Ren
Training a deep learning classifier for a sentiment classification task. For the pre-processing library, see here.
- Modified the pre-processing library so it can load the dictionary word list directly from a zip file, without unzipping it first.
- Added `<pad>` and `<unknown>` to the top of our word list `index_arry.txt` in the pre-processing library.
- Shuffled and split the dataset: 'train' is 85% of the data, 'dev' is 10% and 'eval' is 5%.
- Ran crawlers on each of them to create 3 tables in the catalog.
- Created a Glue ETL job to map features. For the code, see `glue_my_job_2.py`.
- Ran the Glue ETL job on each of them to create the 3 feature sets. For the output JSON files, see `eval_data/eval.json`, `validation_data/dev.json` and `training_data/train.json`.
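
The word-list and split steps above can be sketched roughly as follows. This is an illustration only, not the code in the pre-processing library: the function names (`load_word_list`, `build_index`, `encode`, `split_dataset`) are hypothetical, and it assumes that `index_arry.txt` sits at the top level of the archive with one word per line, and that the split is a plain seeded shuffle.

```python
import io
import random
import zipfile


def load_word_list(zip_path, member="index_arry.txt"):
    """Read the word list straight from the archive, without extracting it.

    Assumes one word per line and that `member` is at the top level of the zip.
    """
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8")
            return [line.strip() for line in text if line.strip()]


def build_index(words):
    """Map each word to its line position, so <pad> is 0 and <unknown> is 1
    when those two tokens sit at the top of the file."""
    return {word: position for position, word in enumerate(words)}


def encode(tokens, index):
    """Turn tokens into ids, falling back to the <unknown> id for unseen words."""
    unknown_id = index["<unknown>"]
    return [index.get(token, unknown_id) for token in tokens]


def split_dataset(rows, seed=0):
    """Shuffle the rows and cut them into 85% train, 10% dev and 5% eval."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    # Integer arithmetic avoids floating-point rounding at the boundaries.
    n_train = len(rows) * 85 // 100
    n_dev = len(rows) * 10 // 100
    return {
        "train": rows[:n_train],
        "dev": rows[n_train:n_train + n_dev],
        "eval": rows[n_train + n_dev:],
    }
```

Keeping `<pad>` and `<unknown>` in the first two lines of the file means their ids are fixed by position, so padding and out-of-vocabulary handling do not depend on the rest of the vocabulary.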
Forked from https://github.com/pharnoux/columbia-aiops-training
- Changed the embeddings to match our word list mentioned above. The embedding file is too large for GitHub; see the S3 bucket.
- Built our model; see the `model_training` folder.
- Ran the model locally. For the successful result, see the highlighted line below.