LFTM uses GloVe embeddings, when available, stripping out the words that are not included in the pre-trained embedding vocabulary.
When a document contains no word from the GloVe dictionary, its line appears empty in the `LFLDA.glove` file.
During training, such a line is simply skipped (rather than treated as "empty"): https://github.com/datquocnguyen/LFTM/blob/master/src/models/LFLDA.java#L173
The result is that the corpus has more lines than there are predictions, which breaks ground-truth evaluation metrics.
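To illustrate the mismatch, here is a minimal sketch (function name and sample data are hypothetical, not part of LFTM) that finds which documents became empty after GloVe filtering and would therefore be silently dropped:

```python
def find_empty_docs(filtered_lines):
    """Return indices of documents that are empty after GloVe filtering,
    i.e. documents whose words were all absent from the pre-trained vocabulary."""
    return [i for i, line in enumerate(filtered_lines) if not line.strip()]

# Example: the second document lost all of its words during filtering,
# so LFTM would train on 2 documents while the corpus has 3 lines.
filtered = ["model topic word", "", "glove embedding"]
print(find_empty_docs(filtered))  # → [1]
```

Running this against the generated `LFLDA.glove` file shows exactly which ground-truth labels end up misaligned with the predictions.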
Note about a possible workaround:
```python
import os

# Re-align predictions with the original corpus: insert a dummy
# prediction for every document that came out empty in LFLDA.glove.
# (model_path and preds come from the surrounding evaluation code.)
with open(os.path.join(model_path, 'LFLDA.glove'), 'r') as f:
    glove_corpus = [x.strip() for x in f.readlines()]
empty_docs = [i for i, x in enumerate(glove_corpus) if len(x) < 1]
for i in empty_docs:
    preds.insert(i, [(0, 0)])
```
pasqLisena