AI training

Authors: Sahil, Thomas, Ren

Training a deep learning classifier for a sentiment classification task. For the pre-processing library, see here.

1. Pre-processing ETL

Modified the pre-processing library so it can load the dictionary word list directly from a zip file without unzipping it first.
Added <pad> and <unknown> to the top of our word list "index_arry.txt" in the pre-processing library.
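
A minimal sketch of reading the word list straight from a zip archive, using only the Python standard library. The function names are hypothetical and not taken from the pre-processing library; the sketch also assumes the file holds one word per line in UTF-8, with <pad> and <unknown> already on the first two lines.

```python
import io
import zipfile


def load_word_list(zip_source, member="index_arry.txt"):
    """Read the word list out of a zip archive without extracting it to disk.

    zip_source may be a path or a binary file-like object.
    Assumes one word per line, UTF-8 encoded.
    """
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8")
            return [line.strip() for line in text if line.strip()]


def build_word_index(words):
    """Map each word to its line position, so <pad> is 0 and <unknown> is 1."""
    return {word: position for position, word in enumerate(words)}
```

Because the special tokens sit at the top of the file, they get the lowest indices, which keeps index 0 free for padding.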

Shuffled the dataset and split it into 'train' (85% of the data), 'dev' (10%) and 'eval' (5%).
Ran crawlers on each split to create 3 tables in the catalog.
Created a Glue ETL job to map features. For the code, see glue_my_job_2.py.
Ran the Glue ETL job on each split to create the 3 feature sets. For the output JSON files, see eval_data/eval.json, validation_data/dev.json and training_data/train.json.
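
The 85/10/5 split above could be sketched as follows. This is an illustration only, not the code we ran: the function name, the fixed seed and the use of rounding for the split sizes are all assumptions.

```python
import random


def shuffle_and_split(records, train_frac=0.85, dev_frac=0.10, seed=42):
    """Shuffle the records and cut them into train/dev/eval parts.

    Whatever is left after train and dev goes to eval (5% by default).
    The seed is an arbitrary choice that makes the shuffle repeatable.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)

    n_train = round(len(shuffled) * train_frac)
    n_dev = round(len(shuffled) * dev_frac)

    return {
        "train": shuffled[:n_train],
        "dev": shuffled[n_train:n_train + n_dev],
        "eval": shuffled[n_train + n_dev:],
    }
```

Shuffling before slicing matters when the raw data is ordered by label or by source, since otherwise one split could end up with only one class.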

Changed the embeddings to match the word list mentioned above. The embedding file is too large for GitHub; see the S3 bucket.
Built our model; see the model_training folder.
Ran the model locally. For the successful result, see the highlighted line below.
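
One way to line up an embedding table with the word list is sketched here. It is not the code in model_training. The function name is hypothetical, and so are the choices to use a zero vector for <pad> and small random vectors for <unknown> and for any word missing from the pre-trained vectors.

```python
import random


def build_embedding_matrix(words, pretrained, dim, seed=0):
    """Return one embedding row per word, in word-list order.

    words      -- the word list, with <pad> and <unknown> first
    pretrained -- dict mapping a word to a sequence of `dim` floats
    Assumptions: <pad> gets all zeros; words without a pre-trained
    vector (including <unknown>) get small random values.
    """
    rng = random.Random(seed)
    matrix = []
    for word in words:
        if word == "<pad>":
            row = [0.0] * dim
        elif word in pretrained:
            row = list(pretrained[word])
        else:
            row = [rng.uniform(-0.05, 0.05) for _ in range(dim)]
        matrix.append(row)
    return matrix
```

Row i of the result belongs to word i of the word list, so the indices produced during pre-processing can be used directly as lookups into the embedding layer.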

Created a notebook on SageMaker. Modified the code so it can load the data and the dictionary from S3.
Ran the code from Step #3 there successfully.
Wrote the output directly to S3.
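
Reading from and writing to S3 in the notebook could look roughly like the sketch below. The helper names are hypothetical and this is not the notebook's actual code. It assumes the feature files contain one JSON object per line, and it takes the S3 client as an argument; in a notebook this would typically be `boto3.client("s3")`, whose `get_object` and `put_object` calls are used here.

```python
import json


def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri}")
    bucket, _, key = uri[len(prefix):].partition("/")
    return bucket, key


def read_json_lines(s3_client, uri):
    """Download an object and parse it as one JSON record per line (assumed layout)."""
    bucket, key = split_s3_uri(uri)
    body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
    return [json.loads(line) for line in body.decode("utf-8").splitlines() if line.strip()]


def write_json(s3_client, uri, payload):
    """Serialize payload as JSON and upload it to the given S3 location."""
    bucket, key = split_s3_uri(uri)
    s3_client.put_object(Bucket=bucket, Key=key, Body=json.dumps(payload).encode("utf-8"))
```

Passing the client in keeps the helpers independent of credentials and region setup, which the notebook's execution role normally provides.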
