Merge pull request scrapinghub#122 from scrapinghub/doc-jdatetime-dependencies

Documentation - calendars
asadurski committed Oct 27, 2015
2 parents 70da9a7 + 4476241 commit 69baab7
Showing 9 changed files with 175 additions and 15 deletions.
20 changes: 16 additions & 4 deletions CONTRIBUTING.rst
@@ -100,9 +100,21 @@ Before you submit a pull request, check that it meets these guidelines:
1. The pull request should include tests.
2. If the pull request adds functionality, the docs should be updated. Put
your new functionality into a function with a docstring, and add the
feature to the list in README.rst.
feature to the list in *README.rst*.
3. Check https://travis-ci.org/scrapinghub/dateparser/pull_requests
and make sure that the tests pass for all supported Python versions.
4. Follow the core developers' advices which aim to ensure code's consistency regardless of variety approaches used by many contributors.
5. In case, you are unable to continue working on a PR, please leave a short comment to notify us. We will be pleased to make any changes required to get it done.

4. Follow the core developers' advice, which aims to keep the code consistent regardless of the variety of approaches used by many contributors.
5. In case you are unable to continue working on a PR, please leave a short comment to notify us. We will be pleased to make any changes required to get it done.

Guidelines for Adding New Languages
-----------------------------------
English is the primary language of the dateparser. Dates in all other languages are translated into English equivalents before they are parsed.
The language data required for parsing dates is contained in the *data/languages.yml* file. It lists, language by language, the variable parts that can appear in dates: month and weekday names and their abbreviations, prepositions, conjunctions, and frequently used descriptive words and phrases (like "today").
YAML was chosen as the data format because it is readable and simple to edit.
Language data is extracted per language from YAML with :class:`LanguageDataLoader` and validated before being put into the :class:`Language` class.
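
The translate-then-parse idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: the ``SPANISH_TO_ENGLISH`` mapping and both helper functions are hypothetical, the mapping is a hand-written stand-in for a tiny part of *data/languages.yml*, and a fixed "day month year" layout replaces the real parsing step. It is written for Python 3 and is not dateparser's implementation.

```python
from datetime import datetime

# Hypothetical, hand-written subset of language data. The real data lives in
# data/languages.yml and covers far more words.
SPANISH_TO_ENGLISH = {
    "enero": "january",
    "septiembre": "september",
    "de": "",  # preposition that is simply dropped in this sketch
}

ENGLISH_MONTHS = [
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
]


def translate_to_english(date_string, mapping):
    """Replace every known token with its English equivalent."""
    tokens = date_string.lower().replace(",", " ").split()
    translated = [mapping.get(token, token) for token in tokens]
    # Tokens mapped to an empty string are removed entirely.
    return " ".join(token for token in translated if token)


def parse_translated(date_string, mapping):
    """Translate first, then parse a fixed 'day month year' layout."""
    day, month_name, year = translate_to_english(date_string, mapping).split()
    return datetime(int(year), ENGLISH_MONTHS.index(month_name) + 1, int(day))


print(parse_translated("13 Septiembre, 2014", SPANISH_TO_ENGLISH))
```

Keeping the per-language words in plain data means that adding a language mostly amounts to extending the mapping rather than touching the parsing code.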

Refer to :ref:`language-data-template` for details about its structure and take a look at already implemented languages for examples.
As we deal with the delicate fabric of interwoven languages, tests are essential to keep the functionality across them.
Therefore, any addition or change should be reflected in tests.
However, there is nothing to be afraid of: our tests are highly parameterized and in most cases a test fits in one declarative line of data.
Alternatively, you can provide required information and ask the maintainers to create the tests for you.
49 changes: 42 additions & 7 deletions README.rst
@@ -31,6 +31,7 @@ Features
* Generic parsing of dates in English, Spanish, Dutch, Russian and several other languages and formats.
* Generic parsing of relative dates like: ``'1 min ago'``, ``'2 weeks ago'``, ``'3 months, 1 week and 1 day ago'``.
* Generic parsing of dates with time zone abbreviations or UTC offsets like: ``'August 14, 2015 EST'``, ``'July 4, 2013 PST'``, ``'21 July 2013 10:15 pm +0500'``.
* Support for non-Gregorian calendar systems with the first addition of :class:`JalaliParser <dateparser.calendars.jalali.JalaliParser>`. See `Persian Jalali Calendar <https://en.wikipedia.org/wiki/Iranian_calendars#Zoroastrian_calendar>`_ for more information.
* Extensive test coverage.


@@ -100,13 +101,47 @@ Dependencies

`dateparser` translates non-English dates to English and uses dateutil_ module ``parser`` to parse the translated date.

Also, it requires PyYAML_ for its language detection module to work.
Also, it requires PyYAML_ for its language detection module to work. The module jdatetime_ is used for handling the Jalali calendar.

.. _dateutil: https://pypi.python.org/pypi/python-dateutil
.. _PyYAML: https://pypi.python.org/pypi/PyYAML


Limitations
===========

* Limited language support.
.. _jdatetime: https://pypi.python.org/pypi/jdatetime


Supported languages
===================

* Arabic
* Belarusian
* Chinese
* Czech
* Dutch
* English
* Filipino
* French
* German
* Indonesian
* Italian
* Persian
* Polish
* Portuguese
* Romanian
* Russian
* Spanish
* Thai
* Turkish
* Ukrainian
* Vietnamese

Supported Calendars
===================
* Gregorian calendar

* Persian Jalali calendar

Example of Use for Jalali Calendar
==================================

>>> from dateparser.calendars.jalali import JalaliParser
>>> JalaliParser(u'جمعه سی ام اسفند ۱۳۸۷').get_date()
datetime.datetime(2009, 3, 20, 0, 0)
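
The year in the example above is written with Persian digits. As a small illustration of one preprocessing step such input needs, here is a sketch that maps those digits to ASCII. The helper name ``normalize_digits`` is hypothetical, the code targets Python 3, and it says nothing about how :class:`JalaliParser` handles digits internally.

```python
# Persian (Extended Arabic-Indic) digits occupy U+06F0..U+06F9.
PERSIAN_DIGITS = "".join(chr(0x06F0 + i) for i in range(10))
TO_ASCII = str.maketrans(PERSIAN_DIGITS, "0123456789")


def normalize_digits(text):
    """Return text with Persian digits replaced by ASCII digits."""
    return text.translate(TO_ASCII)


# The Jalali year from the example above, built from code points so the
# snippet does not depend on how the source file is encoded.
year_text = "".join(chr(0x06F0 + d) for d in (1, 3, 8, 7))
print(normalize_digits(year_text))
```

Python's built-in ``int`` also accepts these digits directly, so the explicit mapping matters mainly when the text is later matched against ASCII-only patterns.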
6 changes: 6 additions & 0 deletions dateparser/calendars/__init__.py
@@ -1,4 +1,10 @@
class CalendarBase(object):
    """Base setup class for non-Gregorian calendar system.
    :param source:
        Date string passed to calendar parser.
    :type source: str|unicode
    """

    def __init__(self, source):
        self.source = source
2 changes: 2 additions & 0 deletions dateparser/calendars/jalali.py
@@ -45,6 +45,7 @@ def validate_time(string):


class JalaliParser(CalendarBase):
    """Calendar parser class for Jalali calendar."""

    def __init__(self, source):
        super(JalaliParser, self).__init__(source)
@@ -202,6 +203,7 @@ def search_time(self):
        return time(0, 0)

    def get_date(self):
        """Output method for Jalali calendar parser."""
        jdate = self.search_persian_date(self.source)
        gtime = self.search_time()
        try:
21 changes: 21 additions & 0 deletions docs/dateparser.calendars.rst
@@ -0,0 +1,21 @@
dateparser.calendars package
============================

Submodules
----------


dateparser.calendars.jalali module
----------------------------------

.. automodule:: dateparser.calendars.jalali
    :members: JalaliParser
    :show-inheritance:


Module contents
---------------
.. automodule:: dateparser.calendars
    :members:
    :show-inheritance:

1 change: 1 addition & 0 deletions docs/dateparser.rst
@@ -7,6 +7,7 @@ Subpackages
.. toctree::

   dateparser.languages
   dateparser.calendars

Submodules
----------
6 changes: 4 additions & 2 deletions docs/installation.rst
@@ -45,9 +45,10 @@ Deploying a stable dateparser release:
**************************************


1) Then, use ``shub`` to install `python-dateutil`_ (we require at least 2.3 version) and `PyYAML`_ dependencies from `PyPI`_::
1) Then, use ``shub`` to install the `python-dateutil`_ (version 2.3 or later is required), `jdatetime`_ and `PyYAML`_ dependencies from `PyPI`_::

    shub deploy-egg --from-pypi python-dateutil YOUR_PROJECT_ID
    shub deploy-egg --from-pypi jdatetime YOUR_PROJECT_ID
    shub deploy-egg --from-pypi PyYAML YOUR_PROJECT_ID


@@ -57,6 +58,7 @@ Deploying a stable dateparser release:

.. _python-dateutil: https://pypi.python.org/pypi/python-dateutil
.. _PyYAML: https://pypi.python.org/pypi/PyYAML
.. _jdatetime: https://pypi.python.org/pypi/jdatetime
.. _PyPI: https://pypi.python.org/pypi


@@ -96,5 +98,5 @@ After that, you can upload the egg using `Scrapy Cloud's Dashboard interface
Dependencies
************

Similarly, you can download source and package `PyYAML <https://pypi.python.org/pypi/PyYAML>`_ and `dateutil <https://pypi.python.org/pypi/python-dateutil>`_ (version >= 2.3) as `eggs` and deploy them like above.
Similarly, you can download the sources of `PyYAML <https://pypi.python.org/pypi/PyYAML>`_, `jdatetime <https://pypi.python.org/pypi/jdatetime>`_ and `dateutil <https://pypi.python.org/pypi/python-dateutil>`_ (version >= 2.3), package them as `eggs`, and deploy them as described above.

81 changes: 81 additions & 0 deletions docs/template.rst
@@ -0,0 +1,81 @@
.. _language-data-template:

Language Data Template
----------------------

.. sourcecode:: none

    two-letter language code as defined in ISO-639-1 (https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes). e.g. for English - en:
        name: language name (e.g. English)
        no_word_spacing: False (set to True for languages that do not use spaces between words)

        skip: ["words", "to", "skip", "such", "as", "and", "or", "at"]

        pertain: []

        monday:
            - name for Monday
            - abbreviation for Monday
        tuesday:
            - as above
        wednesday:
            - as above
        thursday:
            - as above
        friday:
            - as above
        saturday:
            - as above
        sunday:
            - as above

        january:
            - name for January
            - abbreviation for January
        february:
            - as above
        march:
            - as above
        april:
            - as above
        may:
            - as above
        june:
            - as above
        july:
            - as above
        august:
            - as above
        september:
            - as above
        october:
            - as above
        november:
            - as above
        december:
            - as above

        year:
            - name for year
            - abbreviation for year
        month:
            - as above
        week:
            - as above
        day:
            - as above
        hour:
            - as above
        minute:
            - as above
        second:
            - as above

        ago:
            - words that stand
            - for "ago"

        simplifications:
            - word: replacement
            - regex: replacement
            - day before yesterday: 2 days ago
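
When filling in the template for a new language, a quick completeness check can catch missing entries early. The sketch below is an assumption-laden illustration: ``find_problems`` is a hypothetical helper, the rule that every weekday, month, unit and ``ago`` entry must be a non-empty list is inferred from the template rather than taken from the project's loader, and the data is passed in as an already loaded dictionary.

```python
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday",
            "saturday", "sunday"]
MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]
UNITS = ["year", "month", "week", "day", "hour", "minute", "second"]

# Keys that the template fills with a list of names and abbreviations.
REQUIRED_LIST_KEYS = WEEKDAYS + MONTHS + UNITS + ["ago"]


def find_problems(language_data):
    """Return human-readable problems found in one language entry."""
    problems = []
    if not language_data.get("name"):
        problems.append("missing 'name'")
    for key in REQUIRED_LIST_KEYS:
        value = language_data.get(key)
        if not isinstance(value, list) or not value:
            problems.append("'%s' must be a non-empty list" % key)
    return problems


# Example: an entry that has a name but nothing else.
for problem in find_problems({"name": "English"})[:3]:
    print(problem)
```

A check like this complements the project's own validation and tests; it does not replace them.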
4 changes: 2 additions & 2 deletions docs/usage.rst
@@ -28,10 +28,10 @@ Once initialized, :func:`dateparser.date.DateDataParser.get_date_data` parses da
>>> ddp.get_date_data(u'13 Septiembre, 2014') # Spanish
{'date_obj': datetime.datetime(2014, 9, 13, 0, 0), 'period': u'day'}

.. warning:: It fails to parse *English* dates in the example below, because *Spanish* was detected and stored with the ``ddp`` instance::
.. warning:: It fails to parse *English* dates in the example below, because *Spanish* was detected and stored with the ``ddp`` instance:

>>> ddp.get_date_data('11 August 2012')
{'date_obj': None, 'period': 'day'}
{'date_obj': None, 'period': 'day'}
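
The behaviour in the warning follows from the parser instance remembering the language it detected first. The toy class below mimics that pattern so the failure mode is easy to see. It is a hypothetical stand-in written for this illustration (``StickyLanguageParser`` and its month tables are invented here) and does not reflect the internals of :class:`dateparser.date.DateDataParser`.

```python
class StickyLanguageParser(object):
    """Toy parser that locks onto the first language it recognizes."""

    # Hypothetical, minimal month tables for two languages.
    MONTHS = {
        "en": {"august": 8, "september": 9},
        "es": {"septiembre": 9},
    }

    def __init__(self):
        self.language = None  # set by the first successful parse

    def _month_for(self, language, date_string):
        for token in date_string.lower().replace(",", " ").split():
            if token in self.MONTHS[language]:
                return self.MONTHS[language][token]
        return None

    def parse_month(self, date_string):
        # Once a language is stored, no other language is tried again.
        if self.language is not None:
            return self._month_for(self.language, date_string)
        for language in sorted(self.MONTHS):
            month = self._month_for(language, date_string)
            if month is not None:
                self.language = language
                return month
        return None


parser = StickyLanguageParser()
print(parser.parse_month("13 Septiembre, 2014"))  # Spanish is detected and stored
print(parser.parse_month("11 August 2012"))       # English month is no longer tried
```

In this toy model, using a fresh instance per language sidesteps the problem, at the cost of repeating the detection step.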


:class:`dateparser.date.DateDataParser` can also be initialized with known languages::