add to docs
daniel committed May 9, 2016
1 parent 826a2e9 commit bd3eecb
Showing 1 changed file with 25 additions and 11 deletions.
36 changes: 25 additions & 11 deletions rst_docs/corpkit.building.rst
@@ -9,17 +9,22 @@ Doing corpus linguistics involves building and interrogating corpora, and explor
Creating a new project
-----------------------

The simplest way to begin using corpkit is to import it and to create a new project. Projects are simply folders containing subfolders where corpora, saved results, images and dictionaries will be stored.
The simplest way to begin using corpkit is to import it and to create a new project. Projects are simply folders containing subfolders where corpora, saved results, images and dictionaries will be stored. The simplest way is to do it from `bash`, passing in the name you'd like for the project:

.. code-block:: python
.. code-block:: bash
>>> import corpkit
>>> corpkit.new_project('psyc')
$ new_project psyc
# move there:
$ cd psyc
# now, enter python and begin ...
This creates a new folder in the current directory called `psyc`. We can then move there:
Or, from Python:

.. code-block:: python
>>> import corpkit
>>> corpkit.new_project('psyc')
### move there:
>>> import os
>>> os.chdir('psyc')
>>> os.listdir('.')
@@ -35,26 +40,35 @@ This creates a new folder in the current directory called `psyc`. We can then mo
Adding a corpus
----------------

Now that we have a project, we need to add some plain-text data to the `data` folder. At the very least, this is simply a text file. Better than this is a folder containing a number of text files. Best, however, is a folder containing subfolders, with each subfolder containing one or more text files. These subfolders represent subcorpora.
Now that we have a project, we need to add some plain-text data to the `data` folder. At the very least, this is simply a text file. Better than this is a folder containing a number of text files. Best, however, is a folder containing subfolders, with each subfolder containing one or more text files. These subfolders represent subcorpora.
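
To make the recommended layout concrete, here is a minimal sketch that builds a throwaway corpus folder and lists its subcorpora. It uses only the standard library; the helper name `list_subcorpora` and the folder names `2015` and `2016` are hypothetical and are not part of corpkit.

```python
import os
import tempfile


def list_subcorpora(corpus_dir):
    """Map each subfolder (subcorpus) name to its sorted list of .txt files."""
    layout = {}
    for name in sorted(os.listdir(corpus_dir)):
        path = os.path.join(corpus_dir, name)
        if os.path.isdir(path):
            layout[name] = sorted(
                f for f in os.listdir(path) if f.endswith('.txt')
            )
    return layout


# Build an example corpus: data/transcripts/<subcorpus>/<file>.txt
# (hypothetical names, created in a temporary directory)
root = tempfile.mkdtemp()
corpus_dir = os.path.join(root, 'data', 'transcripts')
for sub, files in {'2015': ['a.txt', 'b.txt'], '2016': ['c.txt']}.items():
    os.makedirs(os.path.join(corpus_dir, sub))
    for fname in files:
        with open(os.path.join(corpus_dir, sub, fname), 'w') as fh:
            fh.write('Some plain text.\n')

layout = list_subcorpora(corpus_dir)
print(layout)
```

Each key in the resulting dictionary corresponds to one subcorpus, and each value lists the text files that belong to it.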

You can add your corpus to the `data` folder from the command line, or using Finder/Explorer if you prefer. Using `shutil`:
You can add your corpus to the `data` folder from the command line, or using Finder/Explorer if you prefer.

.. code-block:: bash
$ cp -R /Users/me/Documents/transcripts ./data
Or, in `Python`, using `shutil`:

.. code-block:: python
>>> import shutil
>>> shutil.copytree('/Users/me/Documents/transcripts', '.')
>>> shutil.copytree('/Users/me/Documents/transcripts', './data')
If you've been using `bash` so far, this is the moment when you'd enter `Python` and `import corpkit`.
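
One detail worth knowing about `shutil.copytree`: by default it raises an error if the destination directory already exists, so it is safest to name the new subfolder explicitly. The sketch below copies a source folder to `data/<folder name>` inside a project, which matches what `cp -R` does when `./data` already exists. The helper `add_corpus` and all paths are hypothetical, and the example runs in a temporary directory rather than a real corpkit project.

```python
import os
import shutil
import tempfile


def add_corpus(source_dir, project_dir):
    """Copy source_dir into <project_dir>/data/<name of source_dir>."""
    name = os.path.basename(os.path.normpath(source_dir))
    target = os.path.join(project_dir, 'data', name)
    # copytree creates `target` itself; it must not exist beforehand
    shutil.copytree(source_dir, target)
    return target


# Set up a stand-in source folder and a stand-in project with a data folder
workspace = tempfile.mkdtemp()
source = os.path.join(workspace, 'transcripts')
os.makedirs(os.path.join(source, '2016'))
with open(os.path.join(source, '2016', 'a.txt'), 'w') as fh:
    fh.write('hello\n')

project = os.path.join(workspace, 'psyc')
os.makedirs(os.path.join(project, 'data'))

target = add_corpus(source, project)
print(target)
```

With this layout the corpus ends up at `data/transcripts`, which is the path passed to `Corpus` in the next section.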

Creating a Corpus object
-------------------------

Once we have a corpus of text files, we need to turn it into a Corpus object.
Once we have a corpus of text files, we need to turn it into a `Corpus` object.

.. code-block:: python
>>> from corpkit import Corpus
>>> unparsed = Corpus('data/psyc')
### you can leave out the 'data' if it's in there
>>> unparsed = Corpus('data/transcripts')
>>> unparsed
<corpkit.corpus.Corpus instance: psyc; 13 subcorpora>
<corpkit.corpus.Corpus instance: transcripts; 13 subcorpora>
This object can now be interrogated using the :func:`~corpkit.corpus.Corpus.interrogate` method:
