NLTK and AI on Swiss Fachinfos (FIs) with Python: parses the important words from all FIs in Switzerland.
- List of stopwords in folder input (filename: stopwords.txt)
- Amiko sqlite DB in folder dbs (filename: amiko_db_full_idx_de.db)
- Create a dbs dir and put the files amiko_db_full_idx_de.db and amiko_db_full_idx_fr.db, generated with cpp2sqlite, there.
- From $SRC_DIR run with /usr/local/bin/python3 smartinfo.py --lang=de
- Frequency csv file in folder output (filename: frequency.csv)
- Auto-generated stopwords file in folder output (filename: auto_stopwords.csv)
- pip install nltk bs4 lxml
- import nltk
- nltk.download('stopwords') and nltk.download('punkt')
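
As a rough illustration of how the inputs and outputs listed above relate, here is a minimal sketch that filters stopwords and counts word frequencies. It is not the code in smartinfo.py: the helper names (load_stopwords, word_frequencies, write_frequency_csv) are hypothetical, a plain regular expression stands in for the NLTK tokenizer so the example needs only the standard library, and the two-column (word, count) layout of frequency.csv is an assumption.

```python
import csv
import re
from collections import Counter

# Hypothetical helper names; smartinfo.py may be organized differently.
# Matches runs of letters only (including umlauts and accents), no digits.
WORD_RE = re.compile(r"[^\W\d_]+")


def load_stopwords(path):
    """Read one stopword per line, lowercased, ignoring blank lines."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}


def word_frequencies(texts, stopwords):
    """Count lowercase words across all texts, skipping stopwords."""
    counts = Counter()
    for text in texts:
        for word in WORD_RE.findall(text.lower()):
            if word not in stopwords:
                counts[word] += 1
    return counts


def write_frequency_csv(counts, path):
    """Write (word, count) rows, most frequent first (assumed layout)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for word, count in counts.most_common():
            writer.writerow([word, count])
```

A call such as write_frequency_csv(word_frequencies(texts, load_stopwords("input/stopwords.txt")), "output/frequency.csv") would tie the pieces together, where texts is whatever plain text has been extracted from the FIs.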
brew tap sashkab/python
brew install python35
cd $HOME/software
wget https://bootstrap.pypa.io/get-pip.py
sudo /usr/local/opt/python35/bin/python3.5 $HOME/software/get-pip.py
sudo /usr/local/Cellar/python35/3.5.6_2/Frameworks/Python.framework/Versions/3.5/bin/pip3.5 install nltk
sudo /usr/local/Cellar/python35/3.5.6_2/Frameworks/Python.framework/Versions/3.5/bin/pip3.5 install bs4
sudo /usr/local/Cellar/python35/3.5.6_2/Frameworks/Python.framework/Versions/3.5/bin/pip3.5 install lxml
/usr/local/opt/python35/bin/python3.5
cd $SRC
mkdir dbs
In the Python interactive shell, run import nltk,
then nltk.download('stopwords')
and nltk.download('punkt').
Then run /usr/local/opt/python35/bin/python3.5 smartinfo.py --lang=fr
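
Before running smartinfo.py it can help to confirm that the database files are where the script expects them. The following is a minimal sketch using only the standard library; db_path_for and list_tables are hypothetical helper names, and the code makes no assumption about the table layout that cpp2sqlite produces: it only lists whatever tables are present.

```python
import os
import sqlite3


def db_path_for(lang, db_dir="dbs"):
    """Build the expected database path for a language code such as 'de' or 'fr'."""
    return os.path.join(db_dir, "amiko_db_full_idx_{}.db".format(lang))


def list_tables(db_path):
    """Return the table names in the SQLite file, raising if the file is missing."""
    if not os.path.isfile(db_path):
        raise FileNotFoundError("database not found: " + db_path)
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


if __name__ == "__main__":
    # Report on both language databases without stopping at the first missing one.
    for lang in ("de", "fr"):
        path = db_path_for(lang)
        try:
            print(path, list_tables(path))
        except FileNotFoundError as err:
            print(err)
```

The explicit file check is there because sqlite3.connect silently creates an empty database when the path does not exist, which would hide a misplaced file.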