Skip to content

Commit

Permalink
docs: More readme
Browse files Browse the repository at this point in the history
  • Loading branch information
thorwhalen committed Jan 2, 2024
1 parent 220fcda commit c3d1f17
Showing 1 changed file with 23 additions and 3 deletions.
26 changes: 23 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,9 +13,29 @@ Get a dict-like object to list and read the pdfs of a folder, as text:
>>> from pdfdol import PdfFilesReader
>>> from pdfdol.tests import get_test_pdf_folder
>>> folder_path = get_test_pdf_folder()
>>> s = PdfFilesReader(folder_path)
>>> sorted(s)
>>> pdfs = PdfFilesReader(folder_path)
>>> sorted(pdfs)
['sample_pdf_1', 'sample_pdf_2']
>>> assert s['sample_pdf_2'] == [
>>> assert pdfs['sample_pdf_2'] == [
... 'Page 1\nThis is a sample text for testing Python PDF tools.'
... ]

See that the values of a `PdfFilesReader` are lists of pages.
If you need strings (i.e. all the pages together) you can add a decoder like so:

```python
from dol import add_decoder
page_separator = '---------------------'
pdfs = add_decoder(pdfs, decoder=page_separator.join)
```

If you need this at the level of the class, just do this:

```python
from dol import add_decoder
page_separator = '---------------------'
FilesReader = add_decoder(PdfFilesReader, decoder=page_separator.join)
# and then
pdfs = FilesReader(folder_path)
# ...
```

0 comments on commit c3d1f17

Please sign in to comment.