Add support for converting from Hijri calendar to undate and undate interval #107

rlskoeser · 2024-11-21T22:02:51Z

lark parser for Hijri dates - supports year, year month, and year month day
lark transformer that turns the lark parse tree into undate objects
undate converter class that connects the lark parser and transformer
unit tests for all, borrowing examples from Princeton Geniza Project (PGP) and ISMI datasets

related to #10

improvements after initial review

improve documentation (with example usage) and include in sphinx docs
add a calendar property to undate (default gregorian)
initialize undate object with numeric year, month, day and add optional calendar (default gregorian)
use calendar to convert earliest/latest to gregorian for comparison; confirm date precision is set as expected
remove 3-digit year constraint in parser

Questions:

How reasonable is it to limit support to Hijri years with 3 or more digits? If we can make this assumption, parsing is much faster and more efficient, AND we can support both the year-month-day format in your ISMI data and the day-month-year format in the PGP data... but it might be an unreasonable limitation.
We don't yet have a notion of precision or size for undate intervals, but this code suggests that it would be a useful option for when the length of the interval is known.
I'd like to write a notebook to test this logic against PGP dates, but prefer to do that in a subsequent PR - does that make sense, or would a demo notebook make it easier to review this code?
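A rough sketch of why the 3+ digit assumption in the first question helps: if years always have at least three digits and days at most two, the position of the year tells the parser which order it is looking at. This uses plain Python regular expressions as a stand-in for the lark grammar; the pattern names and the `matching_formats` helper are hypothetical, not the PR's code, and day values are limited to 1 to 30 as in `hijri.lark`.

```python
import re

# hypothetical stand-ins for the grammar's day and month terminals
DAY = r"(?P<day>[1-9]|[12][0-9]|30)"
MONTH = r"(?P<month>\D+?)"  # any run of non-digits, e.g. "Jumādā I"


def matching_formats(text: str, min_year_digits: int = 3) -> list[str]:
    """Return the name of every date format that fully matches text."""
    year = rf"(?P<year>\d{{{min_year_digits},}})"
    formats = {
        "year": year,
        "month year": rf"{MONTH}\s+{year}",
        "day month year": rf"{DAY}\s+{MONTH}\s+{year}",  # PGP style
        "year month": rf"{year}\s+{MONTH}",
        "year month day": rf"{year}\s+{MONTH}\s+{DAY}",  # ISMI style
    }
    return [name for name, pattern in formats.items() if re.fullmatch(pattern, text.strip())]
```

With `min_year_digits=1`, a string like `"12 Rajab 30"` matches both the day-month-year and year-month-day orders, which is exactly the ambiguity the 3-digit rule avoids; with the default it matches neither, so short years would need some other way to disambiguate.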

Summary by CodeRabbit

Release Notes

New Features
- Introduced HijriDateConverter for converting Hijri calendar dates to Gregorian dates.
- Added grammar rules for parsing Hijri dates, supporting various formats.
- Implemented HijriDateTransformer for transforming parsed Hijri dates into Undate or UndateInterval objects.
Bug Fixes
- Updated EDTF date parsing rules to include the 20th day of the month.
Tests
- Added comprehensive test suites for HijriDateConverter, hijri_parser, and HijriDateTransformer to ensure functionality and error handling.

coderabbitai · 2024-11-21T22:02:59Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

Walkthrough

The pull request introduces several changes, primarily focusing on the addition of the HijriDateConverter class and related components for converting Hijri calendar dates to Gregorian dates. The pyproject.toml file is updated to modify dependencies, including the addition of convertdate and a change in the lark dependency. New grammar specifications for parsing Hijri dates are also introduced, along with tests to validate the new functionality.

Changes

File Path	Change Summary
`pyproject.toml`	Updated dependencies: added `convertdate`, changed `lark` to `lark[interegular]`.
`src/undate/converters/calendars/__init__.py`	Added `HijriDateConverter` to `__all__` for public API.
`src/undate/converters/calendars/hijri/__init__.py`	Imported `HijriDateConverter` and updated `__all__` to include it.
`src/undate/converters/calendars/hijri/converter.py`	Introduced `HijriDateConverter` class with a `parse` method for converting Hijri dates.
`src/undate/converters/calendars/hijri/hijri.lark`	Added grammar rules for parsing Hijri dates, including definitions for years, months, and days.
`src/undate/converters/calendars/hijri/parser.py`	Introduced `parser.py` to initialize `hijri_parser` using `hijri.lark`.
`src/undate/converters/calendars/hijri/transformer.py`	Added `HijriDateTransformer` class for transforming Hijri date parse trees.
`src/undate/converters/edtf/edtf.lark`	Updated `day` rule regex to include the 20th day of the month.
`src/undate/converters/edtf/parser.py`	Changed import from `os.path` to `pathlib`, removed commented-out test cases.
`src/undate/undate.py`	Updated comments and type annotations in `UndateInterval` class.
`tests/test_converters/test_calendars/test_hijri/test_hijri_converter.py`	Added tests for `HijriDateConverter` functionality.
`tests/test_converters/test_calendars/test_hijri/test_hijri_parser.py`	Introduced tests for `hijri_parser` to validate various Hijri date formats.
`tests/test_converters/test_calendars/test_hijri/test_hijri_transformer.py`	Added tests for `HijriDateTransformer` class and its conversion logic.

Possibly related PRs

Use numpy datetime64 instead of datetime.date #77 #84: The changes to pyproject.toml and its dependencies in the main PR may relate to that PR's numpy handling updates, as both modify dependencies and project configuration.
Consolidate setup.cfg and pyproject.toml; drop tox; revamp sphinx docs #88: The restructuring of pyproject.toml in the main PR aligns with that PR's consolidation efforts, which also focused on streamlining project configuration and documentation.
Rename formatters submodule and classes to converters #101: That PR's renaming of the "formatters" submodule to "converters" reflects a broader change in how date conversion is handled, which may relate to this PR's dependency updates, since those could affect how the converters are used.

Suggested reviewers

ColeDCrawford
robcast
jdamerow

🐰 In the garden where dates bloom bright,
A converter hops with all its might.
From Hijri to Gregorian, it leaps with glee,
Transforming dates for you and me!
With tests in place, it’s sure to shine,
A joyful change, oh how divine! 🌼

rlskoeser · 2024-11-21T22:05:31Z

@coderabbitai review

coderabbitai · 2024-11-21T22:05:38Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 11

🧹 Outside diff range and nitpick comments (18)

tests/test_converters/test_calendars/test_hijri/test_hijri_converter.py (3)
7-12: Consider improving test method name and documentation.

The method name test_parse_ has a trailing underscore which is unusual. Consider renaming it to something more descriptive like test_parse_hijri_dates or test_parse_valid_dates.

Add a docstring to explain the test's purpose and the significance of the test dates (if any).

Example improvement:
-    def test_parse_(self):
+    def test_parse_hijri_dates(self):
+        """Test parsing of valid Hijri dates in different formats.
+        
+        Tests conversion of:
+        - Specific day (7 Jumādā I 1243)
+        - Month only (Rajab 495)
+        - Year only (441)
+        """
14-18: Add context about test data selection.

Consider adding a comment explaining why these specific dates were chosen for testing. Are they edge cases or examples from real data (e.g., from Princeton Geniza Project or ISMI datasets mentioned in PR objectives)?

20-24: Consider adding more year parsing test cases.

Given the PR's question about limiting support to Hijri years with three or more digits, consider adding test cases for:

Two-digit years (to verify they're rejected if that's the intended behavior)

Four-digit years

Edge cases around year boundaries
tests/test_converters/test_calendars/test_hijri/test_hijri_parser.py (3)
5-5: Enhance the test scope documentation.

The current comment could be more descriptive about what constitutes a "valid date" and what aspects of parsing are being tested.

Consider expanding the comment to:
-# for now, just test that valid dates can be parsed
+# For now, we only verify that the parser can handle valid Hijri dates without errors.
+# TODO: Add tests for:
+# - Invalid date validation
+# - Edge cases (e.g., leap years)
+# - Error messages for malformed inputs
7-46: Consider adding more comprehensive test cases.

While the current test cases cover various format variations well, consider adding:

Boundary cases:

Minimum/maximum valid years

Edge cases for different months (e.g., months with varying days)

Invalid inputs:

Invalid day numbers (e.g., 31st in a 30-day month)

Invalid month names

Malformed strings

Here's a suggested addition to the test cases:
# Add to testcases list:
    # boundary cases
    "1 Muḥarram 1",  # minimum year
    "29 Dhū al-Ḥijjah 9999",  # maximum year
    
    # more complex formats
    "early Ramaḍān 903",  # with period qualifier
    "mid-Shawwāl 1042",  # with period qualifier and hyphen
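On the invalid-day point above: because the grammar accepts any day from 1 to 30, a "30th" in a 29-day month would only be caught by a separate validation step. A minimal sketch of such a check, assuming the tabular (arithmetic) Islamic calendar, in which odd-numbered months have 30 days, even-numbered months 29, and the twelfth month gains a day in 11 leap years of each 30-year cycle. Observational Hijri calendars can differ from this by a day or two, and `hijri_month_length` / `is_valid_hijri_date` are hypothetical helpers, not part of the PR.

```python
def is_hijri_leap_year(year: int) -> bool:
    # tabular rule (assumption): years 2, 5, 7, 10, 13, 16, 18, 21, 24, 26, 29
    # of each 30-year cycle are leap years
    return (14 + 11 * year) % 30 < 11


def hijri_month_length(year: int, month: int) -> int:
    if not 1 <= month <= 12:
        raise ValueError(f"invalid Hijri month: {month}")
    if month % 2 == 1:
        return 30  # odd months always have 30 days
    if month == 12 and is_hijri_leap_year(year):
        return 30  # Dhū al-Ḥijjah gains a day in leap years
    return 29


def is_valid_hijri_date(year: int, month: int, day: int) -> bool:
    # month range is checked first so hijri_month_length never raises here
    return year >= 1 and 1 <= month <= 12 and 1 <= day <= hijri_month_length(year, month)
```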
1-51: Consider adding test fixtures and helper functions.

To improve maintainability and reduce duplication as more test cases are added, consider:

Creating fixtures for common test data

Adding helper functions for result validation

Organizing test cases into classes by format type

Example structure:
import pytest

from undate.converters.calendars.hijri.parser import hijri_parser


@pytest.fixture
def valid_hijri_dates():
    return {
        "year_only": ("521", {"year": 521}),
        "month_year": ("Muḥarram 900", {"year": 900, "month": 1}),
        # ... more test cases with expected results
    }

class TestHijriParser:
    def test_year_format(self, valid_hijri_dates):
        date_string, expected = valid_hijri_dates["year_only"]
        result = hijri_parser.parse(date_string)
        self._assert_parsed_result(result, expected)

    @staticmethod
    def _assert_parsed_result(result, expected):
        for key, value in expected.items():
            assert getattr(result, key) == value
src/undate/converters/calendars/hijri/hijri.lark (3)
10-12: Consider adding upper bound for year validation.

The current pattern /\d{3,}/ allows unlimited digits, which could accept invalid years. Consider:
-year: /\d{3,}/
+year: /\d{3,4}/  // Assuming max 4 digits for reasonable Hijri years
Also, consider adding a comment documenting the expected year range for the Hijri calendar.

28-28: Optimize day pattern for better readability.

The current day pattern can be simplified:
-day: /[1-9]/ | /[12][0-9]/ | /30/
+day: /([1-9]|[12][0-9]|30)/
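One quick way to confirm that the suggested single-regex `day` pattern accepts exactly the same strings as the original three alternatives is to compare them with Python's `re.fullmatch` over a range of inputs. This is a sketch that checks only the regular expressions themselves, not how lark's lexer combines them with other terminals such as `year`.

```python
import re

ORIGINAL = [r"[1-9]", r"[12][0-9]", r"30"]  # the three alternatives in hijri.lark
SUGGESTED = r"([1-9]|[12][0-9]|30)"


def accepts_original(text: str) -> bool:
    return any(re.fullmatch(pattern, text) for pattern in ORIGINAL)


def accepts_suggested(text: str) -> bool:
    return re.fullmatch(SUGGESTED, text) is not None


# every number 0-99 plus a few zero-padded and too-long strings
candidates = [str(n) for n in range(100)] + ["01", "09", "300"]
mismatches = [c for c in candidates if accepts_original(c) != accepts_suggested(c)]
accepted = [c for c in candidates if accepts_suggested(c)]
```

Both forms accept exactly "1" through "30" and both reject zero-padded days such as `"01"`, which may matter if any source data uses that style.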
33-51: Enhance month patterns for consistency and robustness.

Several improvements could be made to the month patterns:

Use consistent pattern style (regex vs quotes):
-month_7: "Rajab"
+month_7: /Rajab/
Consider adding case-insensitive flag for better flexibility:
-month_1: /al-Mu[ḥh]arram/ | /Mu[ḥh]arram/
+month_1: /(?i)al-mu[ḥh]arram/ | /(?i)mu[ḥh]arram/
Add comments for each variant to document the source/reason for inclusion.
src/undate/converters/edtf/edtf.lark (1)

Line range hint 1-100: Clarify the relationship between EDTF and Hijri date parsing

Given that this PR aims to add Hijri calendar support, please clarify:

Will the EDTF grammar serve as a base for Hijri date parsing?

How will the parsing logic handle the differences between EDTF and Hijri date formats?

Consider adding documentation comments at the top of this file to explain its role in the broader date parsing architecture.
src/undate/converters/calendars/hijri/transformer.py (4)
7-9: Enhance class docstring with more details.

Consider expanding the docstring to include:

Input format expectations

Return value details

Examples of usage
 class HijriDateTransformer(Transformer):
-    """Transform a Hijri date parse tree and return an Undate or
-    UndateInterval."""
+    """Transform a Hijri date parse tree into Gregorian dates.
+
+    Transforms parsed Hijri dates into either a single Undate (for complete dates)
+    or an UndateInterval (for partial dates).
+
+    Args:
+        items: Parse tree containing year, month, and/or day nodes
+
+    Returns:
+        Undate: For complete dates with year, month, and day
+        UndateInterval: For partial dates missing month or day
+
+    Example:
+        >>> transformer = HijriDateTransformer()
+        >>> tree = parser.parse("450 AH")
+        >>> result = transformer.transform(tree)
+    """
28-29: Consider implementing the TODO for date precision.

The TODO raises a valid point about precision for partial dates. This could be important for academic or historical research purposes.

Would you like me to help design and implement the precision feature for UndateInterval? This could include:

Adding a precision enum (YEAR, MONTH, DAY)

Extending UndateInterval to store precision

Updating the transformation logic to set appropriate precision

32-35: Remove commented-out code.

The commented-out year method adds unnecessary noise. If it's not needed, remove it.
-    # this does nothing
-    # def year(self, items):
-    #     return Tree(data="year", children=[items[0]])
45-47: Add more comprehensive calendar constants.

Consider adding more calendar-specific constants for better maintainability.
 MIN_MONTH, MIN_DAY = 1, 1
 MAX_MONTH = 12
+MAX_DAY = 30  # Maximum days in Islamic calendar month
+MIN_YEAR = 1  # Minimum valid Hijri year
+MAX_YEAR = 9999  # Maximum supported year
tests/test_converters/test_calendars/test_hijri/test_hijri_transformer.py (1)

7-42: Consider adding more edge cases to the test suite.

The current test cases are well-structured and include real-world examples with proper documentation. However, consider adding these edge cases:

Invalid month names

Out-of-range day values

Edge cases around Islamic calendar year boundaries

Different variations of month name spellings
pyproject.toml (1)
12-12: Consider adding version constraints for dependencies.

While the project is in Pre-Alpha, it's good practice to specify version constraints to ensure reproducible builds and prevent compatibility issues:
-dependencies = ["lark[interegular]", "numpy", "convertdate"]
+dependencies = [
+    "lark[interegular]>=1.1.5",
+    "numpy>=1.24.0",
+    "convertdate>=2.4.0"
+]
src/undate/undate.py (1)
450-452: Enhance the TODO comment with implementation details

The TODO suggestion about adding precision/length field using DatePrecision is valuable, especially for Hijri date conversions. Consider enhancing the comment to specify:

Whether the precision should be stored per interval or inherited from constituent Undates

How precision would affect interval comparisons and duration calculations

Examples of precision usage in Hijri date conversion scenarios

Here's a suggested enhancement for the TODO comment:
-    # TODO: let's think about adding an optional precision / length /size field
-    # using DatePrecision
+    # TODO: Consider adding an optional precision field using DatePrecision to:
+    # 1. Track precision of the interval (independent or derived from Undate precision)
+    # 2. Support precise interval comparisons and duration calculations
+    # 3. Handle varying precision levels in calendar conversions (e.g., Hijri dates)
src/undate/converters/calendars/hijri/converter.py (1)

47-48: Consider implementing reverse conversion functionality

Supporting conversion from an Undate or UndateInterval back to a Hijri date could enhance the utility of the HijriDateConverter. This feature would allow users to display dates in the Hijri calendar format when needed.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between c83ed49 and 778c67b.

📒 Files selected for processing (13)

pyproject.toml (1 hunks)
src/undate/converters/calendars/__init__.py (1 hunks)
src/undate/converters/calendars/hijri/__init__.py (1 hunks)
src/undate/converters/calendars/hijri/converter.py (1 hunks)
src/undate/converters/calendars/hijri/hijri.lark (1 hunks)
src/undate/converters/calendars/hijri/parser.py (1 hunks)
src/undate/converters/calendars/hijri/transformer.py (1 hunks)
src/undate/converters/edtf/edtf.lark (1 hunks)
src/undate/converters/edtf/parser.py (1 hunks)
src/undate/undate.py (1 hunks)
tests/test_converters/test_calendars/test_hijri/test_hijri_converter.py (1 hunks)
tests/test_converters/test_calendars/test_hijri/test_hijri_parser.py (1 hunks)
tests/test_converters/test_calendars/test_hijri/test_hijri_transformer.py (1 hunks)

✅ Files skipped from review due to trivial changes (3)

src/undate/converters/calendars/__init__.py
src/undate/converters/calendars/hijri/parser.py
src/undate/converters/edtf/parser.py

🧰 Additional context used

🪛 Ruff (0.7.0)

src/undate/converters/calendars/hijri/converter.py

45-45: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)
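For reference, the pattern Ruff's B904 rule asks for looks roughly like this. It is a standalone sketch: `UnexpectedCharacters` here is a local stand-in for lark's exception of the same name, and `_parse_tree` is a hypothetical placeholder for the real grammar, so neither reflects the actual `HijriDateConverter.parse` code.

```python
class UnexpectedCharacters(Exception):
    """Local stand-in for lark.exceptions.UnexpectedCharacters."""


def _parse_tree(value: str) -> str:
    # hypothetical placeholder parser: only accepts a bare numeric year
    if not value.isdigit():
        raise UnexpectedCharacters(f"unexpected characters in {value!r}")
    return value


def parse(value: str) -> int:
    if not value:
        raise ValueError("Parsing empty string is not supported")
    try:
        return int(_parse_tree(value))
    except UnexpectedCharacters as err:
        # "from err" stores the parser error as __cause__, so the traceback
        # shows both exceptions instead of "during handling ... another occurred"
        raise ValueError(f"Could not parse '{value}' as a Hijri date") from err
```

Using `from None` instead would hide the parser error entirely; `from err` keeps it available for debugging while callers still only need to catch `ValueError`.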

🔇 Additional comments (10)

src/undate/converters/calendars/hijri/__init__.py (2)

3-3: LGTM! Clean public API declaration.

The __all__ declaration explicitly defines the public API, following Python best practices.

1-1: LGTM! Verify the import path.

The import statement follows Python package conventions and aligns with the PR objective.

Let's verify the existence and location of the imported module:

✅ Verification successful

Import path verified and correctly structured

The import statement is valid as:

The HijriDateConverter class exists in src/undate/converters/calendars/hijri/converter.py
The import path follows the correct package structure

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the existence and location of the HijriDateConverter module

# Check if the converter module exists
fd --type f "converter.py" --exec grep -l "class HijriDateConverter"

# Verify the import path structure
fd --type d "undate/converters/calendars/hijri"

Length of output: 169

tests/test_converters/test_calendars/test_hijri/test_hijri_converter.py (1)

1-6: LGTM! Clean imports and class structure.

The imports are appropriate and the test class follows proper naming conventions.

src/undate/converters/calendars/hijri/hijri.lark (2)

1-2: LGTM: Standard whitespace handling configuration.

The whitespace configuration is correctly set up using Lark's common whitespace import.

1-53: Verify grammar integration and test coverage.

Please ensure that:

All date formats are covered in the test suite
Edge cases are handled (e.g., invalid dates, boundary years)
Integration with the converter handles all grammar productions correctly

✅ Verification successful

Let me gather more specific information about the test coverage and integration.

Let me search for the converter implementation one more time with a different approach.

Let me check the converter implementation to complete the verification.

Grammar integration and test coverage are comprehensive

The verification shows:

Extensive test coverage in test_hijri_parser.py covering all date formats:
- Single year (e.g., "521")
- Month-year with variants (e.g., "al-Muḥarram 900", "Safar 581")
- Day-month-year (e.g., "7 Jumādā I 1243")
- Year-month and year-month-day formats
Proper integration with converter system:
- HijriDateConverter properly implements parsing through the grammar
- Transformer correctly handles all date formats and precision levels
- Error handling for invalid inputs is implemented
- Calendar-specific labeling is maintained
Edge cases are handled:
- Different month name variants (with/without accents)
- Valid day ranges (1-30)
- Three-digit year assumption is properly enforced
- Invalid input detection via UnexpectedCharacters exception

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for test files and patterns
echo "Searching for test files..."
fd -e py test_ -x grep -l "hijri" {}

# Look for test patterns in the codebase
echo "Checking test patterns..."
rg "test.*hijri" -A 5

# Verify converter integration
echo "Checking converter integration..."
ast-grep --pattern 'class $CONVERTER {
  $$$
  parse($INPUT) {
    $$$
  }
  $$$
}'

Length of output: 498

Script:

#!/bin/bash
# Check test file contents for coverage of different date formats
echo "Checking test coverage..."
cat ./tests/test_converters/test_calendars/test_hijri/test_hijri_parser.py

# Check transformer implementation
echo "Checking transformer..."
cat ./tests/test_converters/test_calendars/test_hijri/test_hijri_transformer.py

# Look for the actual converter implementation
echo "Looking for converter class..."
fd -e py -x grep -l "HijriConverter" {}

Length of output: 3628

Script:

#!/bin/bash
# Look for converter implementation using ripgrep
echo "Searching for converter implementation..."
rg -l "class.*HijriConverter"

# Check imports to find converter location
echo "Checking imports..."
rg "from.*hijri.*converter.*import"

# Look for any file containing 'hijri' in the converter directory
echo "Checking converter directory..."
fd -e py . src/undate/converters/calendars/hijri/

Length of output: 663

Script:

#!/bin/bash
# Check the converter implementation
echo "Checking converter implementation..."
cat src/undate/converters/calendars/hijri/converter.py

# Check the parser implementation
echo "Checking parser implementation..."
cat src/undate/converters/calendars/hijri/parser.py

Length of output: 2421

tests/test_converters/test_calendars/test_hijri/test_hijri_transformer.py (3)

1-6: LGTM! Well-organized imports.

The imports are logically grouped and include all necessary components for testing the Hijri date transformer.

53-54: Consider adding precision validation for UndateInterval.

The precision check is only performed for Undate objects, but UndateInterval instances should also maintain precision information.

Let's verify if UndateInterval supports precision:

1-56: Verify test coverage for the Hijri transformer implementation.

The test suite provides good coverage of the basic functionality, but let's verify the coverage metrics.

✅ Verification successful

Test coverage for the Hijri transformer is comprehensive

The test suite provides excellent coverage of the HijriDateTransformer implementation:

Tests cover all public methods (hijri_date and month)
Includes test cases for all three precision levels (day, month, year)
Verifies both single dates (Undate) and date intervals (UndateInterval)
Contains real-world examples from Princeton Geniza Project and ISMI data
Validates correct date conversions against external reference (muqawwim.com)

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for any untested public methods in the transformer
ast-grep --pattern 'class HijriDateTransformer {
  $$$
  def $method($$$) {
    $$$
  }
  $$$
}'

Length of output: 98

Script:

#!/bin/bash
# Let's first locate the transformer implementation file
fd "transformer.py" --type f

# Then examine its content and the HijriDateTransformer class
rg -A 5 "class HijriDateTransformer" 

# Also check test coverage using test file patterns
fd "test.*transformer.py" --type f

Length of output: 951

Script:

#!/bin/bash
# Let's examine the full implementation of HijriDateTransformer
cat src/undate/converters/calendars/hijri/transformer.py

# And check all test cases in the test file
cat tests/test_converters/test_calendars/test_hijri/test_hijri_transformer.py

Length of output: 4241

pyproject.toml (1)

12-12: LGTM! Dependencies align with PR objectives.

The updated dependencies support the new Hijri calendar conversion functionality:

lark[interegular] provides enhanced parsing capabilities for Hijri date formats
convertdate enables calendar conversion operations

src/undate/undate.py (1)

445-447: LGTM: Type annotations are clear and consistent

The explicit type annotations for class attributes using Union[Type, None] are well-defined and maintain compatibility with Python versions pre-3.10. They align well with the class usage and will improve type checking for the new Hijri date conversion feature.

coderabbitai · 2024-11-21T22:08:36Z

tests/test_converters/test_calendars/test_hijri/test_hijri_converter.py

+    def test_parse_error(self):
+        # a string we can't parse should raise an error
+        with pytest.raises(ValueError):
+            HijriDateConverter().parse("January 2, 1991")
+        # empty string should also error
+        with pytest.raises(ValueError):
+            HijriDateConverter().parse("")


🛠️ Refactor suggestion

Enhance error handling test coverage.

Consider adding more error test cases:

Malformed Hijri dates (e.g., invalid month names)

Out-of-range values (e.g., invalid day numbers)

Incomplete dates (e.g., month without year)

Mixed calendar formats

Example additions:

    def test_parse_error_cases(self):
        """Test various error cases for Hijri date parsing."""
        invalid_cases = [
            "31 Jumādā I 1243",  # invalid day (the grammar allows at most 30)
            "Foo 495",  # invalid month
            "Rajab",  # incomplete date (no year)
            "1243/5/7",  # unsupported format
        ]
        for invalid_date in invalid_cases:
            with pytest.raises(ValueError):
                HijriDateConverter().parse(invalid_date)

coderabbitai · 2024-11-21T22:08:36Z

tests/test_converters/test_calendars/test_hijri/test_hijri_converter.py

+import pytest
+from undate.converters.calendars import HijriDateConverter
+from undate.undate import Undate, UndateInterval
+
+
+class TestHijriDateConverter:
+    def test_parse_(self):
+        # day
+        date_str = "7 Jumādā I 1243"
+        date = HijriDateConverter().parse(date_str)
+        assert date == Undate(1827, 11, 26)
+        assert date.label == f"{date_str} {HijriDateConverter.calendar_name}"
+
+        # month
+        date_str = "Rajab 495"
+        date = HijriDateConverter().parse(date_str)
+        assert date == UndateInterval(Undate(1102, 4, 28), Undate(1102, 5, 27))
+        assert date.label == f"{date_str} {HijriDateConverter.calendar_name}"
+
+        # year
+        date_str = "441"
+        date = HijriDateConverter().parse(date_str)
+        assert date == UndateInterval(Undate(1049, 6, 11), Undate(1050, 5, 31))
+        assert date.label == f"{date_str} {HijriDateConverter.calendar_name}"
+
+    def test_parse_error(self):
+        # a string we can't parse should raise an error
+        with pytest.raises(ValueError):
+            HijriDateConverter().parse("January 2, 1991")
+        # empty string should also error
+        with pytest.raises(ValueError):
+            HijriDateConverter().parse("")


💡 Codebase verification

Test coverage needs expansion for ISMI dataset formats

The current test cases don't fully cover the date formats found in the ISMI dataset. Specifically:

Missing test for Muharram month format (e.g., "6. Muḥarram 888")

No tests for dates with diacritical marks (e.g., "Ḏu al-Ḥijjaẗ")

No tests for year ranges in Islamic calendar

No tests handling the ±2 days ambiguity mentioned in ISMI data

The test file should be expanded to include these real-world use cases from the ISMI dataset to ensure robust handling of actual manuscript dates.

🔗 Analysis chain

Verify test coverage against real-world data.

The PR objectives mention examples from Princeton Geniza Project (PGP) and ISMI datasets. Let's verify if the current test cases cover the date formats found in these datasets.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for Hijri date patterns in the codebase to find more test cases
# Look for files that might contain example dates from PGP and ISMI datasets

# Search for potential test data files
fd -e json -e csv -e txt -e md . | grep -i -E 'test.*data|pgp|ismi|hijri|dates'

# Search for Hijri date patterns in these files
rg -i "hijri|jumada|rajab|ramadan"

Length of output: 5539

Script:

#!/bin/bash
# Let's check the ISMI sample data to understand the date formats
cat examples/use-cases/ismi/data/ismi-om4-date-samples.json

# Also check the ISMI README for any documentation about date formats
cat examples/use-cases/ismi/README.md

Length of output: 5394

tests/test_converters/test_calendars/test_hijri/test_hijri_parser.py

src/undate/converters/calendars/hijri/hijri.lark

src/undate/converters/edtf/edtf.lark

src/undate/converters/calendars/hijri/transformer.py

tests/test_converters/test_calendars/test_hijri/test_hijri_transformer.py

src/undate/converters/calendars/hijri/converter.py

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

robcast · 2024-11-22T13:34:41Z

Should a month or year in another calendar be represented in a single Undate or should it always be a UndateInterval?

I would find it more elegant if non-Gregorian months or years could also be represented as a single Undate. Does that make other Undate functions too complex? The default constructor assumes Gregorian and it would have to be changed.

Does this change what precision=year means? Is it too weird to have an Undate "year" based on the Gregorian calendar where the Gregorian text representation does not align on a year? Maybe we should have a "non-gregorian" flag for cases when you really care about this.

rlskoeser · 2024-11-22T13:56:43Z

@robcast these are fantastic questions and I like where you are going with it. I had been intending to add a calendar property to Undate so that we could be explicit about what calendar the date is in. Can you help me sketch out what it would look like and how it would operate? I agree that this would be much more elegant and powerful.

Here's what I'm thinking based on what you're proposing: when parsing a Hijri date, instead of converting to Gregorian before creating Undate objects, we initialize the Undate with numeric year, month, day values and a calendar. This allows us to keep them a single Undate instead of an UndateInterval, and also preserves the date precision logic. There are lots of benefits to this approach. If we do that, then would we calculate the earliest/latest dates in the Hijri calendar or Gregorian? Or have options for both? Or maybe the comparison logic needs to be calendar aware and we only do the conversion when it matters or is explicitly requested? I'm leaning towards this last option, but let me know what you think!
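The approach proposed here (store the native year, month, and day plus a calendar, and convert to Gregorian only to compute earliest/latest and to compare) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not undate's implementation: `SketchUndate`, `Calendar`, and `hijri_to_gregorian` are hypothetical names, and the conversion assumes the tabular (arithmetic) Islamic calendar with epoch JD 1948439.5, which can differ by a day or two from observation-based dates.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from enum import Enum
from math import ceil, trunc
from typing import Optional

# ASSUMPTION: tabular (arithmetic) Islamic calendar; Julian Day of 1 Muharram AH 1
HIJRI_EPOCH_JD = 1948439.5
# a Julian Day (at midnight) minus this offset gives a Python proleptic Gregorian ordinal
JD_TO_ORDINAL = 1721424.5


def hijri_to_gregorian(year: int, month: int, day: int) -> date:
    """Hypothetical helper: tabular Hijri date -> Gregorian date."""
    jd = (day + ceil(29.5 * (month - 1)) + (year - 1) * 354
          + trunc((3 + 11 * year) / 30) + HIJRI_EPOCH_JD - 1)
    return date.fromordinal(int(jd - JD_TO_ORDINAL))


class Calendar(Enum):
    GREGORIAN = "Gregorian"
    HIJRI = "Hijri"


@dataclass
class SketchUndate:
    """Hypothetical stand-in for Undate: native-calendar numbers plus a calendar."""
    year: int
    month: Optional[int] = None
    day: Optional[int] = None
    calendar: Calendar = Calendar.GREGORIAN

    def _gregorian(self, year: int, month: int, day: int) -> date:
        if self.calendar is Calendar.HIJRI:
            return hijri_to_gregorian(year, month, day)
        return date(year, month, day)

    @property
    def precision(self) -> str:
        return "day" if self.day else "month" if self.month else "year"

    @property
    def earliest(self) -> date:
        # first day of the known year/month, converted to Gregorian
        return self._gregorian(self.year, self.month or 1, self.day or 1)

    @property
    def latest(self) -> date:
        if self.day is not None:
            return self.earliest
        # first day of the next native month (or year), minus one Gregorian day
        if self.month is None or self.month == 12:
            next_start = self._gregorian(self.year + 1, 1, 1)
        else:
            next_start = self._gregorian(self.year, self.month + 1, 1)
        return date.fromordinal(next_start.toordinal() - 1)

    def isoformat(self) -> str:
        # built from the native-calendar values, not from earliest/latest
        text = f"{self.year:04d}"
        if self.month is not None:
            text += f"-{self.month:02d}"
            if self.day is not None:
                text += f"-{self.day:02d}"
        return text

    def __lt__(self, other: SketchUndate) -> bool:
        # calendar-aware comparison: both sides reduced to Gregorian ranges
        return self.latest < other.earliest
```

Keeping the native values means precision and `isoformat()` stay in the Hijri calendar, while a cross-calendar comparison such as `SketchUndate(1000, calendar=Calendar.HIJRI) < SketchUndate(2000)` only needs the Gregorian range.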

rlskoeser · 2024-11-22T15:10:05Z

decisions from discussion with @robcast:

- revise this code to test out the new approach:
  - add a calendar property to undate (default gregorian)
  - initialize undate object with numeric year, month, day and add optional calendar (default gregorian)
  - when initializing with hijri calendar, convert to gregorian for earliest/latest to compare
  - confirm date precision is set as expected
- limiting support to 3+ digit years is not a reasonable constraint; instead, choose a format (day month year) and document that constraint
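Fixing the order as day month year is what makes the digit-count heuristic unnecessary: the year is always the final number, so it can have any length. A minimal sketch of that idea in plain Python, not the PR's Lark grammar; `parse_hijri_dmy` and the month spellings are hypothetical and deliberately simplified (real PGP/ISMI data uses many transliteration variants).

```python
from typing import Optional, Tuple

# ASSUMPTION: simplified ASCII month names for illustration only
HIJRI_MONTHS = {
    "muharram": 1, "safar": 2, "rabi i": 3, "rabi ii": 4,
    "jumada i": 5, "jumada ii": 6, "rajab": 7, "shaban": 8,
    "ramadan": 9, "shawwal": 10, "dhu al-qada": 11, "dhu al-hijja": 12,
}


def parse_hijri_dmy(text: str) -> Tuple[int, Optional[int], Optional[int]]:
    """Hypothetical parser for '[day] [month] year'; returns (year, month, day)."""
    tokens = text.strip().lower().split()
    # the year is always the last token, so it may have any number of digits
    if not tokens or not tokens[-1].isdigit():
        raise ValueError(f"expected a trailing year in {text!r}")
    year = int(tokens.pop())
    # an optional leading number is the day
    day = int(tokens.pop(0)) if tokens and tokens[0].isdigit() else None
    month = None
    if tokens:
        name = " ".join(tokens)
        if name not in HIJRI_MONTHS:
            raise ValueError(f"unknown Hijri month {name!r}")
        month = HIJRI_MONTHS[name]
    # a day needs a month, and Hijri months have at most 30 days
    if day is not None and (month is None or not 1 <= day <= 30):
        raise ValueError(f"invalid day in {text!r}")
    return year, month, day
```

With the order fixed, "7 Jumada II 1243" and "Rajab 3" are both unambiguous, while a year-month-day string such as "1243 Rajab 7" is rejected (1243 is not a valid day) rather than silently misread.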

rlskoeser · 2024-11-27T00:26:42Z

@robcast I've updated this PR based on the approach we discussed. I'm really happy with how it's working and how powerful this is going to be. I went ahead and made a minimal Gregorian calendar class for consistency with the pieces of information that could be calendar specific.

To see if this will scale, I took a first pass at implementing a calendar converter for the Hebrew Anno Mundi based on the Hijri implementation and with reference to my previous geniza project code. PR for that is here, if you want to see how it builds on this: #108
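A calendar converter base class plus a minimal Gregorian class, as described here, might look roughly like the sketch below. It assumes the calendar-specific pieces are the maximum month, the maximum day, and conversion to Gregorian (the pieces named in the "Use calendar converter to get max month/day" commit); `SketchCalendarConverter`, `SketchGregorianConverter`, and `latest_gregorian` are hypothetical names, not undate's API.

```python
import calendar
from abc import ABC, abstractmethod
from typing import Optional, Tuple


class SketchCalendarConverter(ABC):
    """Hypothetical base class for the calendar-specific pieces an Undate needs."""

    name = "base"

    @abstractmethod
    def max_month(self, year: int) -> int:
        """Number of the last month in the given year."""

    @abstractmethod
    def max_day(self, year: int, month: int) -> int:
        """Number of days in the given month."""

    @abstractmethod
    def to_gregorian(self, year: int, month: int, day: int) -> Tuple[int, int, int]:
        """Convert a native date to a Gregorian (year, month, day)."""


class SketchGregorianConverter(SketchCalendarConverter):
    """Minimal Gregorian implementation: conversion is the identity."""

    name = "Gregorian"

    def max_month(self, year: int) -> int:
        return 12

    def max_day(self, year: int, month: int) -> int:
        # calendar.monthrange returns (weekday of day 1, number of days in month)
        return calendar.monthrange(year, month)[1]

    def to_gregorian(self, year: int, month: int, day: int) -> Tuple[int, int, int]:
        return (year, month, day)


def latest_gregorian(conv: SketchCalendarConverter, year: int,
                     month: Optional[int] = None) -> Tuple[int, int, int]:
    """Fill an unknown month/day with the calendar's maximum, then convert."""
    month = month if month is not None else conv.max_month(year)
    return conv.to_gregorian(year, month, conv.max_day(year, month))
```

Because `latest_gregorian` only talks to the base-class interface, a Hijri or Hebrew subclass would plug in without changes to the calling code, which is the kind of scaling the Anno Mundi experiment tests.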

rlskoeser added 6 commits November 21, 2024 14:57

Preliminary hijri date parser

a2dfae6

Test all Hijri months; assume 3+ digit years and use LALR parser

ed23f6c

Clean up edtf parser code (remove old test case comments)

646f739

Add transformer for hijri parser to convert parsed date to undate

51850cc

Rename test directories & files to be consistent & explicit

50f2331

Add an undate converter to wire in hijri date parsing capability

778c67b

rlskoeser requested a review from robcast November 21, 2024 22:05

Tell mypy to ignore that convertdate code is untyped

99c0611

coderabbitai bot reviewed Nov 21, 2024

rlskoeser and others added 7 commits November 21, 2024 17:08

Merge branch 'develop' into feature/convert-hijri

4a7a1d8

Merge branch 'develop' into feature/convert-hijri

454382f

Clean up one more date and add more possible todos

315ad7a

Update src/undate/converters/calendars/hijri/transformer.py

18c8f25

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

Update src/undate/converters/edtf/edtf.lark

f3ce58b

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

Update src/undate/converters/calendars/hijri/converter.py

11cc007

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

Add more error cases for EDTF and Hijri parser tests

2cc596e

rlskoeser added 8 commits November 22, 2024 15:45

Add calendar field to Undate object

0aac63a

Partial refactor: initialize hijri dates as undate with hijri calendar

e2444ed

Use calendar converter to get max month/day and convert to gregorian

3aa462b

Generate iso format date from native calendar date, not earliest/latest

fe41545

Include calendar name in undate repr

3a43e6d

Support and test comparing undates across calendars

7c9ccb7

Work around StrEnum not being in python until 3.11

b6b6376

Allow any Hijri year (drop 3+ digit year constraint and year-month-day)

e91b7ba

rlskoeser force-pushed the feature/convert-hijri branch from f5429cd to e91b7ba November 26, 2024 22:27

rlskoeser added 2 commits November 26, 2024 17:35

Confirm hijri dates + partially unknown date behavior

6c6f09a

Add calendar converter base class and document how to add calendars

5cc19fd

Fix mis-formatted docstring

5660fa2
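The "Work around StrEnum not being in python until 3.11" commit refers to a common pattern: `enum.StrEnum` only exists from Python 3.11, so older versions mix `str` into `Enum` directly. A minimal sketch, assuming the goal is string-valued calendar names; the `Calendar` members are illustrative, and this is not necessarily how the commit implements it.

```python
import sys
from enum import Enum

if sys.version_info >= (3, 11):
    from enum import StrEnum
else:
    class StrEnum(str, Enum):
        """Minimal fallback: members are str instances and str() returns the value."""

        def __str__(self) -> str:
            return str(self.value)


class Calendar(StrEnum):
    GREGORIAN = "Gregorian"
    HIJRI = "Hijri"
```

Either branch lets a calendar member compare equal to its plain string name, which is convenient for a readable `repr` and for looking calendars up by name.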
