This repository makes a word cloud from a Japanese company's CSR report.
The application consists of two steps:
- Convert the PDF to text
- Make a word cloud from the text file
Python 3 is required.
Install the packages listed in requirements.txt:
pip install -r requirements.txt
Make a text file from the CSR report with PDFMiner.
First, place the PDF file in the raw directory (./data/raw/).
Next, set the following parameters to the values you want.
input_file = './data/raw/XXX.pdf'
interim_dir = './data/interim/XXX/'
processed_file = './data/processed/XXX.txt'
Running the command below generates the text file.
python pdf_to_text.py
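The three parameters of this step share one report name (XXX), so they can be derived from it rather than edited by hand. The following is only a sketch under that assumption: build_paths and ensure_dirs are hypothetical helpers, not functions from pdf_to_text.py, and the data directory layout is taken from the parameter values shown in this README.

```python
from pathlib import Path


def build_paths(name, data_dir="./data"):
    """Return the three paths for one report name (hypothetical helper)."""
    base = Path(data_dir)
    return {
        "input_file": base / "raw" / f"{name}.pdf",
        "interim_dir": base / "interim" / name,
        "processed_file": base / "processed" / f"{name}.txt",
    }


def ensure_dirs(paths):
    """Create the output directories if they do not exist yet."""
    paths["interim_dir"].mkdir(parents=True, exist_ok=True)
    paths["processed_file"].parent.mkdir(parents=True, exist_ok=True)


if __name__ == "__main__":
    # "XXX" is the placeholder report name used throughout this README.
    for key, value in build_paths("XXX").items():
        print(key, "=", value.as_posix())
```

Keeping the name in one place avoids a mismatch between the PDF that is read and the text file that is written.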
Make the word cloud in an IPython notebook.
ipython notebook
Open wordcloud.ipynb and set the parameter in the first cell to the file produced in the previous step.
file_name = './data/processed/XXX.txt'
After that, run all cells.
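A word cloud sizes each word by how often it appears, so a quick frequency count is a useful check on the generated text file before running the notebook. The sketch below is not the notebook's code: word_frequencies is a hypothetical helper, and it assumes the words are already separated by spaces or punctuation. Japanese text is written without spaces, so a real report would first need a morphological analyzer such as MeCab to split it into words.

```python
import re
from collections import Counter


def word_frequencies(text, top_n=10):
    """Return the top_n most frequent words as (word, count) pairs."""
    # Assumption: words are delimited by non-word characters.
    # Unsegmented Japanese text must be tokenized before this step.
    words = re.findall(r"\w+", text.lower())
    return Counter(words).most_common(top_n)


def frequencies_from_file(file_name, top_n=10):
    """Read a UTF-8 text file and count its words."""
    with open(file_name, encoding="utf-8") as f:
        return word_frequencies(f.read(), top_n)
```

For example, frequencies_from_file('./data/processed/XXX.txt') lists the words that would appear largest in the cloud.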