Hello,

I managed to get the code in this repository running and produced a copy of CCOHA. I had to sort out a few technical details along the way, which would be useful to include in the README for future users:

- Clarify that the code runs on Python 2.7.
- Set up a virtual environment for the packages (`docopt`, `HTMLParser`, `nltk`), and specifically mention that `nltk==3.4.5` is needed to run this code.
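
For anyone reproducing this, a small check along the following lines could confirm that the active environment matches the versions mentioned above before starting a long run. This is only a sketch: the function names are hypothetical and not part of the repository, and the only facts taken from this issue are Python 2.7 and `nltk==3.4.5`.

```python
import sys

REQUIRED_PYTHON = (2, 7)   # version reported in this issue
REQUIRED_NLTK = "3.4.5"    # version reported in this issue


def environment_problems(python_version, nltk_version):
    """Return a list of human-readable mismatches (empty if none)."""
    problems = []
    if tuple(python_version[:2]) != REQUIRED_PYTHON:
        problems.append("Python %d.%d found, expected %d.%d" % (
            python_version[0], python_version[1],
            REQUIRED_PYTHON[0], REQUIRED_PYTHON[1]))
    if nltk_version != REQUIRED_NLTK:
        problems.append("nltk %s found, expected %s"
                        % (nltk_version, REQUIRED_NLTK))
    return problems


def installed_nltk_version():
    """Return nltk.__version__, or None if nltk cannot be imported."""
    try:
        import nltk
    except ImportError:
        return None
    return nltk.__version__


if __name__ == "__main__":
    for problem in environment_problems(sys.version_info,
                                        installed_nltk_version()):
        print(problem)
```

Run inside the virtual environment, the script prints nothing when both versions match.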
There were also some basic code issues I had to fix, which might be helpful to note or to address in another commit:

- Setting all tabs to spaces.
- Renaming `body` to `results` in the function `write_to_file(...)`.
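
The tab fix can be scripted rather than done by hand. Below is a minimal sketch that expands only leading whitespace, so tabs inside string literals are left alone. The helper name and the indent width of 4 are assumptions; the repository's intended width is not stated in this issue.

```python
import re


def leading_tabs_to_spaces(source, width=4):
    """Expand tabs in each line's leading whitespace to spaces.

    `width` is an assumed tab stop; adjust it to match the code base.
    """
    def expand(match):
        # expandtabs pads each tab up to the next multiple of `width`.
        return match.group(0).expandtabs(width)

    # (?m) makes ^ match at the start of every line.
    return re.sub(r"(?m)^[ \t]+", expand, source)
```

Applied to the text of each `.py` file, this should remove the mixed-indentation problem without touching anything past the indentation.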
Finally, a tiny number of files could not be handled by the given processor. The current code logs each failure and then executes a `raise`. Given how few files were affected (about 5 out of 100K), it didn't seem worth fixing the underlying problem, and these files could simply be skipped. Even so, the `raise` statements had to be commented out, and it would have been helpful to have this noted.
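
As an alternative to commenting out the `raise` statements, the loop over files could log the failure and move on. The following is a rough sketch of that pattern, not the repository's actual code: `process_all`, `process_file`, and the logger name are hypothetical stand-ins, since the real function names are not shown in this issue.

```python
import logging

logger = logging.getLogger("ccoha_sketch")  # hypothetical logger name


def process_all(paths, process_file):
    """Apply process_file to each path; log and skip any failure.

    Returns a (results, failed_paths) tuple so skipped files
    can be reviewed afterwards.
    """
    results, failed = [], []
    for path in paths:
        try:
            results.append(process_file(path))
        except Exception:
            # Record the traceback and continue instead of re-raising.
            logger.exception("Skipping %s: processing failed", path)
            failed.append(path)
    return results, failed
```

Returning the list of failed paths keeps the handful of skipped files visible at the end of the run instead of buried in the log.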