From 6b6c41839dfdd5c076d4e77d35989c616151281d Mon Sep 17 00:00:00 2001
From: pedrojlazevedo <up201306026@fe.up.pt>
Date: Wed, 11 Mar 2020 00:19:59 +0000
Subject: [PATCH] Update README. Grammar needs to be fixed

---
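Step 2 of the README (split_wiki_indv_docs.py) writes one file per Wikipedia article. The sketch below shows that idea. It is a minimal illustration, not the script itself, and it assumes each line of the dump is a JSON object with "id" and "text" keys, as in the FEVER wiki-pages files. The function name `split_wiki_pages` and the file-naming scheme are hypothetical.

```python
import json
from pathlib import Path


def split_wiki_pages(jsonl_path, out_dir):
    """Write every article of a JSON-lines wiki dump to its own file.

    Assumption: each line is a JSON object with "id" and "text" keys
    (FEVER wiki-pages layout). The naming scheme is hypothetical and
    not necessarily what split_wiki_indv_docs.py does.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    stats = {"files": 0, "empty": 0, "no_id": 0}
    with open(jsonl_path, encoding="utf-8") as src:
        for raw in src:
            if not raw.strip():
                continue  # tolerate blank lines in the dump
            article = json.loads(raw)
            doc_id = article.get("id", "")
            if not doc_id:
                stats["no_id"] += 1  # cannot name a file without an id
                continue
            if not article.get("text"):
                stats["empty"] += 1  # still written, only counted
            # "/" cannot appear in a file name, so replace it (hypothetical scheme).
            target = out / (doc_id.replace("/", "_SLASH_") + ".json")
            with open(target, "w", encoding="utf-8") as dst:
                json.dump(article, dst, ensure_ascii=False)
            stats["files"] += 1
    return stats
```

With one file per article, fetching a document becomes a single open() on a known path instead of a scan through the whole dump. The returned counts mirror the kind of numbers listed under "Some numbers".
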
 README.md | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)
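
Step 4 of the README mentions a Levenshtein step followed by a concatenation. The sketch below shows one plausible reading: fuzzy-match claim entities against article titles by edit distance, then merge those candidates with the TF-IDF documents. This is an assumption about what predict.py does, not a description of it. The names `match_titles` and `concatenate`, the title normalization, and the `max_distance=2` threshold are all hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (0 if equal)
            ))
        previous = current
    return previous[-1]


def match_titles(entities, titles, max_distance=2):
    """Hypothetical rule: keep titles within max_distance edits of any entity."""
    matches = []
    for title in titles:
        normalized = title.replace("_", " ").lower()
        if any(levenshtein(e.lower(), normalized) <= max_distance for e in entities):
            matches.append(title)
    return matches


def concatenate(levenshtein_docs, tfidf_docs):
    """Merge both candidate lists, keeping order and dropping duplicates."""
    return list(dict.fromkeys(list(levenshtein_docs) + list(tfidf_docs)))
```

For example, `match_titles(["Lisbon"], ["Lisbon", "Lisboa", "Porto"])` keeps "Lisbon" and "Lisboa" (one edit apart). Concatenating puts the fuzzy matches first and keeps TF-IDF documents that were not already found.
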

diff --git a/README.md b/README.md
index 4267da7b..fc2fdb84 100644
--- a/README.md
+++ b/README.md
@@ -12,6 +12,9 @@ author={Reddy, Aniketh Janardhan and Rocha, Gil and Esteves, Diego},
 year={2018}
 }
 ```
+## WARNINGS:
+* Always check the *PATHS* in every script before running it.
+* Don't run the RTE model before training the Random Forest model.
 
 ##RUN:
 1. Download Fever Data in fever-baselines/scripts/  
@@ -21,8 +24,39 @@ year={2018}
 2. Run split_wiki_indv_docs.py in order to increase search time. 
 This will read the raw that and create a file for each article.
 
+3. You need to set up at least one environment:
+    * The RTE folder (rte/) contains its own README and a requirements file.
+    * The TF-IDF part of the article comes from fever-baselines, which has its own requirements
+    file. That environment is needed to set up the FEVER database and run the required scripts.
+    * The TF-IDF output files are already available in data/ under the name "relevant_docs".
+    * To reproduce the TF-IDF files, run the evidence retrieval script described
+    [here](https://github.com/DeFacto/DeFactoNLP/tree/master/fever-baselines#evidence-retrieval-evaluation)
+    in the fever-baselines README.
+   
+4. Run predict.py to perform the Levenshtein step and the concatenation. It generates
+a file containing all the documents and sentences, plus the concatenation output.
+The RTE model also creates a file for every claim. Each file contains the RTE prediction
+probabilities for that claim against each piece of possible evidence.
+ 
+5. To label the claims, train the Random Forest model by running train_label_classifier.py.
+It writes its output to the predictions/ folder.
+
+6. Run metrics.py to evaluate the entire pipeline.
+
+## Train:
+
+This work used a subsample of the training data, created with subsample_training_data.py.
+  
+Instructions for training the RTE model are in the README inside the rte/ folder.
+  
+To train the Random Forest, run train_label_classifier.py; it also generates the predictions.
+Comment out the parts of the script you don't need.
+
+### Some numbers:
+
 number of empty articles    = 20431  
 number of files             = 5396106  
 number of lines             = 42041604  
 number of entities          = 167143495  
 number of articles w/out id = 11  
+