Hi Hwan. Great job; I was very impressed with your code. I do have some concerns about the outcome, though, and would like you to debug your work using the following process. As always, let me know if this helps.
Read through this Towards Data Science (Medium) article. I think it gives you good background on the TF-IDF method.
Verify that the lemmatization does what you expect. (Why are "infect" and "infection" lemmatized to different words?)
I would suggest you verify that the tokenize & lemmatize functions return the expected results for some common variants you see in the data (see the sketch below).
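A minimal sketch of that sanity check, assuming spaCy with the `en_core_web_sm` model and a hypothetical list of variants (swap in real examples from your data). It also tends to answer the "infect" vs. "infection" question above: spaCy lemmatizes by part of speech, so the noun "infection" keeps its own lemma rather than reducing to the verb "infect".

```python
import spacy

# Assumes the small English model is installed:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

# Hypothetical variants; replace with common forms you actually see in the corpus.
variants = ["infect", "infects", "infected", "infection", "infections"]

for text in variants:
    doc = nlp(text)
    print(text, "->", [tok.lemma_ for tok in doc])
# Expect verb forms ("infects", "infected") to lemmatize to "infect",
# while noun forms ("infection", "infections") lemmatize to "infection".
```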
Apply spellcheck with spaCy prior to lemmatization.
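One note here: spaCy itself does not ship a spell checker, so this sketch assumes a separate library (pyspellchecker) run on the raw tokens before spaCy's lemmatizer sees them. If you settle on a different checker, the same before/after shape should apply.

```python
import spacy
from spellchecker import SpellChecker  # pip install pyspellchecker

nlp = spacy.load("en_core_web_sm")
spell = SpellChecker()

def correct_then_lemmatize(text):
    # Correct obvious misspellings token by token before running spaCy,
    # so the lemmatizer sees real words ("infeciton" -> "infection").
    tokens = text.split()
    corrected = [spell.correction(tok) or tok for tok in tokens]
    doc = nlp(" ".join(corrected))
    return [tok.lemma_ for tok in doc]

# Hypothetical misspelled sentence for illustration.
print(correct_then_lemmatize("the pateint has an infeciton"))
```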
Re-implement TF-IDF, testing iteratively on smaller corpora (i.e., run your process on just 1 sentence, then 10 sentences, then 40 sentences, etc.). The TF-IDF formula can be calculated by hand just by counting words. Can you replicate the small-corpus outcomes?
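A minimal sketch of the hand-countable version on a tiny hypothetical corpus, using the classic formula tf(t, d) * log(N / df(t)). If you compare against scikit-learn's TfidfVectorizer, keep in mind its defaults (smoothed idf plus l2 row normalization) will shift the raw numbers, so either compare rankings or disable those options when checking against hand counts.

```python
import math
from collections import Counter

# Tiny hypothetical corpus; start with 1 sentence, then grow it.
corpus = [
    "the patient shows signs of infection",
    "the infection spread to the lungs",
    "no signs of infection were found",
]

tokenized = [doc.split() for doc in corpus]
n_docs = len(tokenized)

# Document frequency: number of documents containing each term.
df = Counter()
for tokens in tokenized:
    df.update(set(tokens))

def tfidf(tokens):
    # Raw term count times log(N / df). A term in every document
    # (here, "infection") gets idf = log(1) = 0, which you can verify by hand.
    tf = Counter(tokens)
    return {t: tf[t] * math.log(n_docs / df[t]) for t in tf}

for tokens in tokenized:
    print(tfidf(tokens))
```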
After this, we are going to find a way to create a "word cloud". Please review the wordcloud Python package by next week so that you can be ready to produce a word cloud of your TF-IDF outcomes!
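For a head start, a minimal sketch of feeding TF-IDF scores into the wordcloud package; the scores below are made up for illustration and should be replaced with your real {term: score} dictionary.

```python
from wordcloud import WordCloud  # pip install wordcloud

# Hypothetical TF-IDF scores; replace with your computed values.
tfidf_scores = {"infection": 0.42, "fever": 0.31, "cough": 0.27, "patient": 0.12}

wc = WordCloud(width=800, height=400, background_color="white")
wc.generate_from_frequencies(tfidf_scores)  # sizes words by their TF-IDF weight
wc.to_file("tfidf_wordcloud.png")
```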