Updating READ.ME. Grammar needs to be fixed
pedrojlazevedo committed Mar 11, 2020
1 parent b0a32f7 commit 6b6c418
Showing 1 changed file with 34 additions and 0 deletions.
34 changes: 34 additions & 0 deletions README.md
@@ -12,6 +12,9 @@ author={Reddy, Aniketh Janardhan and Rocha, Gil and Esteves, Diego},
year={2018}
}
```
## WARNINGS:
* Don't forget to check all *PATHS* in every script each time you run it.
* Don't run the RTE model before training the Random Forest model.
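
The first warning can be turned into a quick pre-flight check. The sketch below is only an illustration: `missing_paths` is a hypothetical helper, not part of this repository, and the two example locations are guesses based on folder names mentioned in this README, so replace them with the paths your scripts actually use.

```python
from pathlib import Path


def missing_paths(paths):
    """Return the entries of `paths` that do not exist on disk."""
    return [str(p) for p in map(Path, paths) if not p.exists()]


# Assumed locations (hypothetical); adjust them to your own checkout.
required = ["data/relevant_docs", "predictions"]
missing = missing_paths(required)
print("Missing paths:", missing or "none")
```

Running this before each script makes a wrong path visible right away instead of halfway through a long job.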

## RUN:
1. Download Fever Data in fever-baselines/scripts/
@@ -21,8 +24,39 @@ year={2018}
2. Run split_wiki_indv_docs.py in order to speed up search.
This will read the raw data and create a file for each article.

3. At least one environment needs to be set up:
* Inside the RTE folder there are a READ.ME and a requirements file.
* The TF-IDF part of the article comes with fever-baselines. In that folder there
is a requirements file. It is needed to set up the FEVER database and run the required scripts.
* TF-IDF files are already available inside the folder data/ under the name "relevant_docs".
* To reproduce the TF-IDF files, run the script described
[here](https://github.com/DeFacto/DeFactoNLP/tree/master/fever-baselines#evidence-retrieval-evaluation)
in the READ.ME of fever-baselines.

4. The Levenshtein part and the concatenation are done by running the script predict.py. It will generate
a file with all the documents and sentences and the concatenation part. The RTE model will also
create a file for every claim. Each file contains the probabilities of the RTE prediction
for that claim against each possible piece of evidence.

5. To label the claims, the Random Forest model is created by running the script train_label_classifier.py. It will generate a file
in the folder predictions/.

6. You can run metrics.py to generate an evaluation of the entire pipeline.
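
Step 2 above (one file per article) can be pictured with a short sketch. It assumes the raw data is in JSON Lines form, one article per line with an `"id"` field. The function name `split_articles`, the output layout and the handling of articles without an id are illustrative choices and are not taken from split_wiki_indv_docs.py.

```python
import json
from pathlib import Path


def split_articles(dump_path, out_dir):
    """Write one JSON file per article; return (written, skipped_without_id)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = skipped = 0
    with open(dump_path, encoding="utf-8") as dump:
        for line in dump:
            if not line.strip():
                continue  # ignore blank lines in the dump
            article = json.loads(line)
            article_id = article.get("id", "")
            if not article_id:
                skipped += 1  # article without an id: nothing to name the file after
                continue
            # "/" cannot appear in a file name, so replace it (assumed convention).
            safe_name = article_id.replace("/", "_")
            target = out / f"{safe_name}.json"
            target.write_text(json.dumps(article), encoding="utf-8")
            written += 1
    return written, skipped
```

With one file per article, later steps can open a document by its id directly instead of scanning the whole dump, which is where the search speed-up comes from.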

## Train:

This work used a subsample of the training data, created with the script subsample_training_data.py.
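
As a rough idea of what subsampling involves, here is a minimal sketch. It assumes uniform random sampling with a fixed seed. This README does not say how subsample_training_data.py picks its examples, so the function `subsample` and its parameters are hypothetical.

```python
import random


def subsample(records, fraction, seed=0):
    """Return a reproducible random subset holding about `fraction` of `records`."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if not records:
        return []
    rng = random.Random(seed)  # a private generator keeps the result repeatable
    size = max(1, round(len(records) * fraction))
    return rng.sample(records, size)
```

Fixing the seed means the same subset is drawn on every run, which keeps later training runs comparable.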

To train the RTE model, follow the instructions in the READ.ME inside the rte/ folder.

To train the Random Forest, just run the script train_label_classifier.py. It will also generate the predictions.
Comment out whatever isn't needed.
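
To illustrate how a Random Forest can turn RTE probabilities into a claim label, here is a sketch using scikit-learn. The feature design (maximum and mean of each class probability over a claim's candidate evidence) is an assumption made for this example and is not necessarily what train_label_classifier.py does. Both function names are hypothetical.

```python
from statistics import mean

from sklearn.ensemble import RandomForestClassifier


def claim_features(evidence_probs):
    """Collapse per-evidence probability triples into one fixed-length vector.

    evidence_probs: list of (p_supports, p_refutes, p_not_enough_info) tuples,
    one per candidate evidence sentence (assumed layout).
    Returns the max of each column followed by the mean of each column.
    """
    if not evidence_probs:
        return [0.0] * 6  # claim with no candidate evidence
    columns = list(zip(*evidence_probs))
    return [max(c) for c in columns] + [mean(c) for c in columns]


def fit_label_classifier(per_claim_probs, labels, seed=0):
    """Fit a Random Forest on one feature vector per claim."""
    features = [claim_features(probs) for probs in per_claim_probs]
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(features, labels)
    return model
```

A fitted model is then queried with `model.predict([claim_features(probs)])`, where `probs` holds the RTE probabilities for a new claim. The aggregation step is needed because claims have different numbers of candidate evidence sentences, while the classifier expects vectors of equal length.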

### Some numbers:

* number of empty articles = 20431
* number of files = 5396106
* number of lines = 42041604
* number of entities = 167143495
* number of articles without id = 11
