From 64c9e30389f476c0ce9b881f2d4aafac18d7943f Mon Sep 17 00:00:00 2001
From: daniel
Date: Fri, 29 Apr 2016 18:22:18 +0200
Subject: [PATCH] tell the reader that corenlp can be auto downloaded

---
 README.md | 2 +-
 index.rst | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 3bb96a93..6d40c260 100755
--- a/README.md
+++ b/README.md
@@ -413,7 +413,7 @@ Again, common sense dictates what is possible. When searching trees, only trees,
 
 ## Building corpora
 
-*corpkit*'s `Corpus()` class contains `parse()` and `tokenise()`, modest methods for created parsed and/or tokenised corpora. The main thing you need is **a folder, containing either text files, or subfolders that contain text files**. If you want to parse the corpus, you'll also need to have downloaded and unzipped [Stanford CoreNLP](http://nlp.stanford.edu/software/corenlp.shtml). If you're tokenising, you'll need to make sure you have NLTK's tokeniser data. You can then run:
+*corpkit*'s `Corpus()` class contains `parse()` and `tokenise()`, methods for created parsed and/or tokenised corpora. The main thing you need is **a folder, containing either text files, or subfolders that contain text files**. [Stanford CoreNLP](http://nlp.stanford.edu/software/corenlp.shtml) is required to parse corpora. If you don't have it, *corpkit* can download and install it for you. If you're tokenising, you'll need to make sure you have NLTK's tokeniser data. You can then run:
 
 ```python
 >>> unparsed = Corpus('path/to/unparsed/files')
diff --git a/index.rst b/index.rst
index 1c3ecdd4..d850a0f9 100644
--- a/index.rst
+++ b/index.rst
@@ -95,7 +95,7 @@ via Git:
 cd corpkit
 python setup.py install
 
-Parsing and interrogation of parse trees will also require *Stanford CoreNLP*.
+Parsing and interrogation of parse trees will also require *Stanford CoreNLP*. *corpkit* can download and install it for you automatically.
 
 .. rubric:: Graphical interface