-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for converting from Hijri calendar to undate and undate interval #107
base: develop
Are you sure you want to change the base?
Conversation
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThe pull request introduces several changes, primarily focusing on the addition of the Changes
Possibly related PRs
Suggested reviewers
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 11
🧹 Outside diff range and nitpick comments (18)
tests/test_converters/test_calendars/test_hijri/test_hijri_converter.py (3)
7-12
: Consider improving test method name and documentation.
- The method name
test_parse_
has a trailing underscore which is unusual. Consider renaming it to something more descriptive liketest_parse_hijri_dates
ortest_parse_valid_dates
.- Add a docstring to explain the test's purpose and the significance of the test dates (if any).
Example improvement:
- def test_parse_(self): + def test_parse_hijri_dates(self): + """Test parsing of valid Hijri dates in different formats. + + Tests conversion of: + - Specific day (7 Jumādā I 1243) + - Month only (Rajab 495) + - Year only (441) + """
14-18
: Add context about test data selection.Consider adding a comment explaining why these specific dates were chosen for testing. Are they edge cases or examples from real data (e.g., from Princeton Geniza Project or ISMI datasets mentioned in PR objectives)?
20-24
: Consider adding more year parsing test cases.Given the PR's question about limiting support to Hijri years with three or more digits, consider adding test cases for:
- Two-digit years (to verify they're rejected if that's the intended behavior)
- Four-digit years
- Edge cases around year boundaries
tests/test_converters/test_calendars/test_hijri/test_hijri_parser.py (3)
5-5
: Enhance the test scope documentation.The current comment could be more descriptive about what constitutes a "valid date" and what aspects of parsing are being tested.
Consider expanding the comment to:
-# for now, just test that valid dates can be parsed +# For now, we only verify that the parser can handle valid Hijri dates without errors. +# TODO: Add tests for: +# - Invalid date validation +# - Edge cases (e.g., leap years) +# - Error messages for malformed inputs
7-46
: Consider adding more comprehensive test cases.While the current test cases cover various format variations well, consider adding:
- Boundary cases:
- Minimum/maximum valid years
- Edge cases for different months (e.g., months with varying days)
- Invalid inputs:
- Invalid day numbers (e.g., 31st in a 30-day month)
- Invalid month names
- Malformed strings
Here's a suggested addition to the test cases:
# Add to testcases list: # boundary cases "1 Muḥarram 1", # minimum year "29 Dhū al-Ḥijjah 9999", # maximum year # more complex formats "early Ramaḍān 903", # with period qualifier "mid-Shawwāl 1042", # with period qualifier and hyphen
1-51
: Consider adding test fixtures and helper functions.To improve maintainability and reduce duplication as more test cases are added, consider:
- Creating fixtures for common test data
- Adding helper functions for result validation
- Organizing test cases into classes by format type
Example structure:
@pytest.fixture def valid_hijri_dates(): return { "year_only": ("521", {"year": 521}), "month_year": ("Muḥarram 900", {"year": 900, "month": 1}), # ... more test cases with expected results } class TestHijriParser: def test_year_format(self, valid_hijri_dates): date_string, expected = valid_hijri_dates["year_only"] result = hijri_parser.parse(date_string) self._assert_parsed_result(result, expected) @staticmethod def _assert_parsed_result(result, expected): for key, value in expected.items(): assert getattr(result, key) == valuesrc/undate/converters/calendars/hijri/hijri.lark (3)
10-12
: Consider adding upper bound for year validation.The current pattern
/\d{3,}/
allows unlimited digits, which could accept invalid years. Consider:-year: /\d{3,}/ +year: /\d{3,4}/ // Assuming max 4 digits for reasonable Hijri yearsAlso, consider adding a comment documenting the expected year range for the Hijri calendar.
28-28
: Optimize day pattern for better readability.The current day pattern can be simplified:
-day: /[1-9]/ | /[12][0-9]/ | /30/ +day: /([1-9]|[12][0-9]|30)/
33-51
: Enhance month patterns for consistency and robustness.Several improvements could be made to the month patterns:
- Use consistent pattern style (regex vs quotes):
-month_7: "Rajab" +month_7: /Rajab/
- Consider adding case-insensitive flag for better flexibility:
-month_1: /al-Mu[ḥh]arram/ | /Mu[ḥh]arram/ +month_1: /(?i)al-mu[ḥh]arram/ | /(?i)mu[ḥh]arram/
- Add comments for each variant to document the source/reason for inclusion.
src/undate/converters/edtf/edtf.lark (1)
Line range hint
1-100
: Clarify the relationship between EDTF and Hijri date parsingGiven that this PR aims to add Hijri calendar support, please clarify:
- Will the EDTF grammar serve as a base for Hijri date parsing?
- How will the parsing logic handle the differences between EDTF and Hijri date formats?
Consider adding documentation comments at the top of this file to explain its role in the broader date parsing architecture.
src/undate/converters/calendars/hijri/transformer.py (4)
7-9
: Enhance class docstring with more details.Consider expanding the docstring to include:
- Input format expectations
- Return value details
- Examples of usage
class HijriDateTransformer(Transformer): - """Transform a Hijri date parse tree and return an Undate or - UndateInterval.""" + """Transform a Hijri date parse tree into Gregorian dates. + + Transforms parsed Hijri dates into either a single Undate (for complete dates) + or an UndateInterval (for partial dates). + + Args: + items: Parse tree containing year, month, and/or day nodes + + Returns: + Undate: For complete dates with year, month, and day + UndateInterval: For partial dates missing month or day + + Example: + >>> transformer = HijriDateTransformer() + >>> tree = parser.parse("450 AH") + >>> result = transformer.transform(tree) + """
28-29
: Consider implementing the TODO for date precision.The TODO raises a valid point about precision for partial dates. This could be important for academic or historical research purposes.
Would you like me to help design and implement the precision feature for UndateInterval? This could include:
- Adding a precision enum (YEAR, MONTH, DAY)
- Extending UndateInterval to store precision
- Updating the transformation logic to set appropriate precision
32-35
: Remove commented-out code.The commented-out
year
method adds unnecessary noise. If it's not needed, remove it.- # this does nothing - # def year(self, items): - # return Tree(data="year", children=[items[0]])
45-47
: Add more comprehensive calendar constants.Consider adding more calendar-specific constants for better maintainability.
MIN_MONTH, MIN_DAY = 1, 1 MAX_MONTH = 12 +MAX_DAY = 30 # Maximum days in Islamic calendar month +MIN_YEAR = 1 # Minimum valid Hijri year +MAX_YEAR = 9999 # Maximum supported yeartests/test_converters/test_calendars/test_hijri/test_hijri_transformer.py (1)
7-42
: Consider adding more edge cases to the test suite.The current test cases are well-structured and include real-world examples with proper documentation. However, consider adding these edge cases:
- Invalid month names
- Out-of-range day values
- Edge cases around Islamic calendar year boundaries
- Different variations of month name spellings
pyproject.toml (1)
12-12
: Consider adding version constraints for dependencies.While the project is in Pre-Alpha, it's good practice to specify version constraints to ensure reproducible builds and prevent compatibility issues:
-dependencies = ["lark[interegular]", "numpy", "convertdate"] +dependencies = [ + "lark[interegular]>=1.1.5", + "numpy>=1.24.0", + "convertdate>=2.4.0" +]src/undate/undate.py (1)
450-452
: Enhance the TODO comment with implementation detailsThe TODO suggestion about adding precision/length field using
DatePrecision
is valuable, especially for Hijri date conversions. Consider enhancing the comment to specify:
- Whether the precision should be stored per interval or inherited from constituent Undates
- How precision would affect interval comparisons and duration calculations
- Examples of precision usage in Hijri date conversion scenarios
Here's a suggested enhancement for the TODO comment:
- # TODO: let's think about adding an optional precision / length /size field - # using DatePrecision + # TODO: Consider adding an optional precision field using DatePrecision to: + # 1. Track precision of the interval (independent or derived from Undate precision) + # 2. Support precise interval comparisons and duration calculations + # 3. Handle varying precision levels in calendar conversions (e.g., Hijri dates)src/undate/converters/calendars/hijri/converter.py (1)
47-48
: Consider implementing reverse conversion functionalitySupporting conversion from an
Undate
orUndateInterval
back to a Hijri date could enhance the utility of theHijriDateConverter
. This feature would allow users to display dates in the Hijri calendar format when needed.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (13)
pyproject.toml
(1 hunks)src/undate/converters/calendars/__init__.py
(1 hunks)src/undate/converters/calendars/hijri/__init__.py
(1 hunks)src/undate/converters/calendars/hijri/converter.py
(1 hunks)src/undate/converters/calendars/hijri/hijri.lark
(1 hunks)src/undate/converters/calendars/hijri/parser.py
(1 hunks)src/undate/converters/calendars/hijri/transformer.py
(1 hunks)src/undate/converters/edtf/edtf.lark
(1 hunks)src/undate/converters/edtf/parser.py
(1 hunks)src/undate/undate.py
(1 hunks)tests/test_converters/test_calendars/test_hijri/test_hijri_converter.py
(1 hunks)tests/test_converters/test_calendars/test_hijri/test_hijri_parser.py
(1 hunks)tests/test_converters/test_calendars/test_hijri/test_hijri_transformer.py
(1 hunks)
✅ Files skipped from review due to trivial changes (3)
- src/undate/converters/calendars/init.py
- src/undate/converters/calendars/hijri/parser.py
- src/undate/converters/edtf/parser.py
🧰 Additional context used
🪛 Ruff (0.7.0)
src/undate/converters/calendars/hijri/converter.py
45-45: Within an except
clause, raise exceptions with raise ... from err
or raise ... from None
to distinguish them from errors in exception handling
(B904)
🔇 Additional comments (10)
src/undate/converters/calendars/hijri/__init__.py (2)
3-3
: LGTM! Clean public API declaration.
The __all__
declaration explicitly defines the public API, following Python best practices.
1-1
: LGTM! Verify the import path.
The import statement follows Python package conventions and aligns with the PR objective.
Let's verify the existence and location of the imported module:
✅ Verification successful
Import path verified and correctly structured
The import statement is valid as:
- The
HijriDateConverter
class exists insrc/undate/converters/calendars/hijri/converter.py
- The import path follows the correct package structure
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the existence and location of the HijriDateConverter module
# Check if the converter module exists
fd --type f "converter.py" --exec grep -l "class HijriDateConverter"
# Verify the import path structure
fd --type d "undate/converters/calendars/hijri"
Length of output: 169
tests/test_converters/test_calendars/test_hijri/test_hijri_converter.py (1)
1-6
: LGTM! Clean imports and class structure.
The imports are appropriate and the test class follows proper naming conventions.
src/undate/converters/calendars/hijri/hijri.lark (2)
1-2
: LGTM: Standard whitespace handling configuration.
The whitespace configuration is correctly set up using Lark's common whitespace import.
1-53
: Verify grammar integration and test coverage.
Please ensure that:
- All date formats are covered in the test suite
- Edge cases are handled (e.g., invalid dates, boundary years)
- Integration with the converter handles all grammar productions correctly
✅ Verification successful
Let me gather more specific information about the test coverage and integration.
Let me search for the converter implementation one more time with a different approach.
Let me check the converter implementation to complete the verification.
Grammar integration and test coverage are comprehensive
The verification shows:
-
Extensive test coverage in
test_hijri_parser.py
covering all date formats:- Single year (e.g., "521")
- Month-year with variants (e.g., "al-Muḥarram 900", "Safar 581")
- Day-month-year (e.g., "7 Jumādā I 1243")
- Year-month and year-month-day formats
-
Proper integration with converter system:
- HijriDateConverter properly implements parsing through the grammar
- Transformer correctly handles all date formats and precision levels
- Error handling for invalid inputs is implemented
- Calendar-specific labeling is maintained
-
Edge cases are handled:
- Different month name variants (with/without accents)
- Valid day ranges (1-30)
- Three-digit year assumption is properly enforced
- Invalid input detection via UnexpectedCharacters exception
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for test files and patterns
echo "Searching for test files..."
fd -e py test_ -x grep -l "hijri" {}
# Look for test patterns in the codebase
echo "Checking test patterns..."
rg "test.*hijri" -A 5
# Verify converter integration
echo "Checking converter integration..."
ast-grep --pattern 'class $CONVERTER {
$$$
parse($INPUT) {
$$$
}
$$$
}'
Length of output: 498
Script:
#!/bin/bash
# Check test file contents for coverage of different date formats
echo "Checking test coverage..."
cat ./tests/test_converters/test_calendars/test_hijri/test_hijri_parser.py
# Check transformer implementation
echo "Checking transformer..."
cat ./tests/test_converters/test_calendars/test_hijri/test_hijri_transformer.py
# Look for the actual converter implementation
echo "Looking for converter class..."
fd -e py -x grep -l "HijriConverter" {}
Length of output: 3628
Script:
#!/bin/bash
# Look for converter implementation using ripgrep
echo "Searching for converter implementation..."
rg -l "class.*HijriConverter"
# Check imports to find converter location
echo "Checking imports..."
rg "from.*hijri.*converter.*import"
# Look for any file containing 'hijri' in the converter directory
echo "Checking converter directory..."
fd -e py . src/undate/converters/calendars/hijri/
Length of output: 663
Script:
#!/bin/bash
# Check the converter implementation
echo "Checking converter implementation..."
cat src/undate/converters/calendars/hijri/converter.py
# Check the parser implementation
echo "Checking parser implementation..."
cat src/undate/converters/calendars/hijri/parser.py
Length of output: 2421
tests/test_converters/test_calendars/test_hijri/test_hijri_transformer.py (3)
1-6
: LGTM! Well-organized imports.
The imports are logically grouped and include all necessary components for testing the Hijri date transformer.
53-54
: Consider adding precision validation for UndateInterval.
The precision check is only performed for Undate objects, but UndateInterval instances should also maintain precision information.
Let's verify if UndateInterval supports precision:
1-56
: Verify test coverage for the Hijri transformer implementation.
The test suite provides good coverage of the basic functionality, but let's verify the coverage metrics.
✅ Verification successful
Test coverage for the Hijri transformer is comprehensive
The test suite provides excellent coverage of the HijriDateTransformer implementation:
- Tests cover all public methods (
hijri_date
andmonth
) - Includes test cases for all three precision levels (day, month, year)
- Verifies both single dates (Undate) and date intervals (UndateInterval)
- Contains real-world examples from Princeton Geniza Project and ISMI data
- Validates correct date conversions against external reference (muqawwim.com)
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check for any untested public methods in the transformer
ast-grep --pattern 'class HijriDateTransformer {
$$$
def $method($$$) {
$$$
}
$$$
}'
Length of output: 98
Script:
#!/bin/bash
# Let's first locate the transformer implementation file
fd "transformer.py" --type f
# Then examine its content and the HijriDateTransformer class
rg -A 5 "class HijriDateTransformer"
# Also check test coverage using test file patterns
fd "test.*transformer.py" --type f
Length of output: 951
Script:
#!/bin/bash
# Let's examine the full implementation of HijriDateTransformer
cat src/undate/converters/calendars/hijri/transformer.py
# And check all test cases in the test file
cat tests/test_converters/test_calendars/test_hijri/test_hijri_transformer.py
Length of output: 4241
pyproject.toml (1)
12-12
: LGTM! Dependencies align with PR objectives.
The updated dependencies support the new Hijri calendar conversion functionality:
lark[interegular]
provides enhanced parsing capabilities for Hijri date formatsconvertdate
enables calendar conversion operations
src/undate/undate.py (1)
445-447
: LGTM: Type annotations are clear and consistent
The explicit type annotations for class attributes using Union[Type, None]
are well-defined and maintain compatibility with Python versions pre-3.10. They align well with the class usage and will improve type checking for the new Hijri date conversion feature.
def test_parse_error(self): | ||
# a string we can't parse should raise an error | ||
with pytest.raises(ValueError): | ||
HijriDateConverter().parse("January 2, 1991") | ||
# empty string should also error | ||
with pytest.raises(ValueError): | ||
HijriDateConverter().parse("") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Enhance error handling test coverage.
Consider adding more error test cases:
- Malformed Hijri dates (e.g., invalid month names)
- Out-of-range values (e.g., invalid day numbers)
- Incomplete dates (e.g., month without year)
- Mixed calendar formats
Example additions:
def test_parse_error_cases(self):
"""Test various error cases for Hijri date parsing."""
invalid_cases = [
"13 Jumādā I 1243", # Invalid day
"Foo 495", # Invalid month
"Rajab", # Incomplete date
"1243/5/7", # Wrong format
]
for invalid_date in invalid_cases:
with pytest.raises(ValueError):
HijriDateConverter().parse(invalid_date)
import pytest | ||
from undate.converters.calendars import HijriDateConverter | ||
from undate.undate import Undate, UndateInterval | ||
|
||
|
||
class TestHijriDateConverter: | ||
def test_parse_(self): | ||
# day | ||
date_str = "7 Jumādā I 1243" | ||
date = HijriDateConverter().parse(date_str) | ||
assert date == Undate(1827, 11, 26) | ||
assert date.label == f"{date_str} {HijriDateConverter.calendar_name}" | ||
|
||
# month | ||
date_str = "Rajab 495" | ||
date = HijriDateConverter().parse(date_str) | ||
assert date == UndateInterval(Undate(1102, 4, 28), Undate(1102, 5, 27)) | ||
assert date.label == f"{date_str} {HijriDateConverter.calendar_name}" | ||
|
||
# year | ||
date_str = "441" | ||
date = HijriDateConverter().parse(date_str) | ||
assert date == UndateInterval(Undate(1049, 6, 11), Undate(1050, 5, 31)) | ||
assert date.label == f"{date_str} {HijriDateConverter.calendar_name}" | ||
|
||
def test_parse_error(self): | ||
# a string we can't parse should raise an error | ||
with pytest.raises(ValueError): | ||
HijriDateConverter().parse("January 2, 1991") | ||
# empty string should also error | ||
with pytest.raises(ValueError): | ||
HijriDateConverter().parse("") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Test coverage needs expansion for ISMI dataset formats
The current test cases don't fully cover the date formats found in the ISMI dataset. Specifically:
- Missing test for Muharram month format (e.g., "6. Muḥarram 888")
- No tests for dates with diacritical marks (e.g., "Ḏu al-Ḥijjaẗ")
- No tests for year ranges in Islamic calendar
- No tests handling the ±2 days ambiguity mentioned in ISMI data
The test file should be expanded to include these real-world use cases from the ISMI dataset to ensure robust handling of actual manuscript dates.
🔗 Analysis chain
Verify test coverage against real-world data.
The PR objectives mention examples from Princeton Geniza Project (PGP) and ISMI datasets. Let's verify if the current test cases cover the date formats found in these datasets.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for Hijri date patterns in the codebase to find more test cases
# Look for files that might contain example dates from PGP and ISMI datasets
# Search for potential test data files
fd -e json -e csv -e txt -e md . | grep -i -E 'test.*data|pgp|ismi|hijri|dates'
# Search for Hijri date patterns in these files
rg -i "hijri|jumada|rajab|ramadan"
Length of output: 5539
Script:
#!/bin/bash
# Let's check the ISMI sample data to understand the date formats
cat examples/use-cases/ismi/data/ismi-om4-date-samples.json
# Also check the ISMI README for any documentation about date formats
cat examples/use-cases/ismi/README.md
Length of output: 5394
tests/test_converters/test_calendars/test_hijri/test_hijri_transformer.py
Show resolved
Hide resolved
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Should a month or year in another calendar be represented in a single Undate or should it always be a UndateInterval? I would find it more elegant if non-Gregorian months or years could also be represented as a single Undate. Does that make other Undate functions too complex? The default constructor assumes Gregorian and it would have to be changed. Does this change what precision=year means? Is it too weird to have an Undate "year" based on the Gregorian calendar where the Gregorian text representation does not align on a year? Maybe we should have a "non-gregorian" flag for cases when you really care about this. |
@robcast these are fantastic questions and I like where you are going with it. I had been intending to add a calendar property to Here's what I'm thinking based on what you're proposing: when parsing a Hijri date, instead of converting to Gregorian before creating Undate objects, we initialize the Undate with numeric year, month, day values and a calendar. This allows us to keep them a single Undate instead of an UndateInterval, and also preserves the date precision logic. There are lots of benefits to this approach. If we do that, then would we calculate the earliest/latest dates in the Hijri calendar or Gregorian? Or have options for both? Or maybe the comparison logic needs to be calendar aware and we only do the conversion when it matters or is explicitly requested? I'm leaning towards this last option, but let me know what you think! |
decisions from discussion with @robcast :
|
f5429cd
to
e91b7ba
Compare
@robcast I've updated this PR based on the approach we discussed. I'm really happy with how it's working and how powerful this is going to be. I went ahead and made a minimal Gregorian calendar class for consistency with the pieces of information that could be calendar specific. To see if this will scale, I took a first pass at implementing a calendar converter for the Hebrew Anno Mundi based on the Hijri implementation and with reference to my previous geniza project code. PR for that is here, if you want to see how it builds on this: #108 |
related to #10
improvements after initial review
Questions:
Summary by CodeRabbit
Release Notes
New Features
HijriDateConverter
for converting Hijri calendar dates to Gregorian dates.HijriDateTransformer
for transforming parsed Hijri dates intoUndate
orUndateInterval
objects.Bug Fixes
Tests
HijriDateConverter
,hijri_parser
, andHijriDateTransformer
to ensure functionality and error handling.