
LFTM and empty lines #9

Open
pasqLisena opened this issue Oct 5, 2020 · 1 comment

pasqLisena commented Oct 5, 2020

Basically, LFTM uses GloVe embeddings when available, stripping out the words that are not included in the pre-processed embeddings.

When a line does not contain any word from the GloVe dictionary, it appears empty in the LFLDA.glove file.

During training, such a line is simply ignored (rather than treated as an empty document):
https://github.com/datquocnguyen/LFTM/blob/master/src/models/LFLDA.java#L173

The result is that the corpus contains more lines than there are predictions, which breaks the alignment needed for ground-truth evaluation metrics.
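
A minimal sketch of how the mismatch can be checked before running any evaluation (the corpus path and output directory below are placeholder names, not something LFTM defines; only LFLDA.glove comes from LFTM):

    import os

    model_path = 'output'          # placeholder: directory containing the LFTM output files
    corpus_path = 'corpus.txt'     # placeholder: the pre-processed input corpus

    with open(corpus_path) as f:
        n_docs = sum(1 for _ in f)

    # Lines left empty after the GloVe filtering are skipped during training,
    # so the model produces n_docs - n_empty predictions instead of n_docs.
    with open(os.path.join(model_path, 'LFLDA.glove')) as f:
        n_empty = sum(1 for line in f if not line.strip())

    print(n_docs, n_empty, n_docs - n_empty)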

pasqLisena self-assigned this Oct 5, 2020
pasqLisena added the bug label Oct 5, 2020

pasqLisena (Member Author) commented

Note about a possible workaround:

    import os

    # Re-read the GloVe-filtered corpus written by LFTM
    with open(os.path.join(model_path, 'LFLDA.glove'), 'r') as f:
        glove_corpus = [x.strip() for x in f]
    # Indices of the documents left empty by the GloVe filtering
    empty_docs = [i for i, x in enumerate(glove_corpus) if len(x) < 1]
    # Re-insert a dummy prediction at each skipped position to realign preds
    for i in empty_docs:
        preds.insert(i, [(0, 0)])
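
After the insertions, the predictions line up one-to-one with the GloVe-filtered corpus again, which can be checked with:

    assert len(preds) == len(glove_corpus)

Note that [(0, 0)] is only a placeholder topic/score pair, so the re-inserted documents will presumably be scored as misses by accuracy-style metrics; an alternative would be to drop the corresponding ground-truth entries instead.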
