
LFTM and empty lines #9

Open
pasqLisena opened this issue Oct 5, 2020 · 1 comment

pasqLisena commented Oct 5, 2020

Basically, LFTM uses GloVe embeddings when available, stripping out the words that are not included in the pre-processed embeddings.

When a line does not contain any word from the GloVe dictionary, it appears empty in the LFLDA.glove file.

During training, such a line is simply ignored (rather than treated as an empty document):
https://github.com/datquocnguyen/LFTM/blob/master/src/models/LFLDA.java#L173

The result is that the corpus contains more lines than there are predictions, which breaks the alignment needed for ground-truth evaluation metrics.
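
A minimal sketch of how the mismatch can be checked before running any evaluation (the corpus path and output directory below are placeholder names, not something LFTM defines; only LFLDA.glove comes from LFTM):

    import os

    model_path = 'output'          # placeholder: directory containing the LFTM output files
    corpus_path = 'corpus.txt'     # placeholder: the pre-processed input corpus

    with open(corpus_path) as f:
        n_docs = sum(1 for _ in f)

    # Lines left empty after the GloVe filtering are skipped during training,
    # so the model produces n_docs - n_empty predictions instead of n_docs.
    with open(os.path.join(model_path, 'LFLDA.glove')) as f:
        n_empty = sum(1 for line in f if not line.strip())

    print(n_docs, n_empty, n_docs - n_empty)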

pasqLisena self-assigned this Oct 5, 2020
pasqLisena added the bug label Oct 5, 2020

pasqLisena (Member Author) commented

Note about a possible workaround:

    import os

    # Re-read the GloVe-filtered corpus written by LFTM
    with open(os.path.join(model_path, 'LFLDA.glove'), 'r') as f:
        glove_corpus = [x.strip() for x in f]
    # Indices of the documents left empty by the GloVe filtering
    empty_docs = [i for i, x in enumerate(glove_corpus) if len(x) < 1]
    # Re-insert a dummy prediction at each skipped position to realign preds
    for i in empty_docs:
        preds.insert(i, [(0, 0)])
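
After the insertions, the predictions line up one-to-one with the GloVe-filtered corpus again, which can be checked with:

    assert len(preds) == len(glove_corpus)

Note that [(0, 0)] is only a placeholder topic/score pair, so the re-inserted documents will presumably be scored as misses by accuracy-style metrics; an alternative would be to drop the corresponding ground-truth entries instead.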
