-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix documentation and tutorial notebooks ahead of new release #334
Conversation
WalkthroughThis pull request includes updates to several files, primarily focusing on modifying package dependencies and enhancing tutorial content. The Changes
Possibly related PRs
Suggested labels
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #334 +/- ##
==========================================
- Coverage 76.46% 76.13% -0.34%
==========================================
Files 31 31
Lines 2545 2564 +19
==========================================
+ Hits 1946 1952 +6
- Misses 599 612 +13 ☔ View full report in Codecov by Sentry. 🚨 Try these New Features:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (10)
docs/tutorials.rst (1)
16-16
: Maintain consistency in tutorial file references.The
.ipynb
extension is explicitly mentioned for the new tutorial but not for others. This inconsistency might confuse readers.Apply this diff to maintain consistency:
- tutorials/filtering_icsd_oxidation_states.ipynb + tutorials/filtering_icsd_oxidation_statespyproject.toml (2)
88-92
: Consider adding version constraints for visualisation dependenciesWhile adding these visualisation packages is beneficial for the tutorials, not specifying version constraints could lead to compatibility issues. Consider using the same versions as defined in the
strict
group for consistency.visualisation = [ - "seaborn", - "pymatviz", - "dash", + "seaborn>=0.13.2", + "pymatviz>=0.14", + "dash>=2.18.2", ]
94-107
: Maintain consistency in version constraintsThe
strict
group effectively pins versions for most dependencies. However,numpy
lacks a version constraint despite havingnumpy<2
in the main dependencies. Consider maintaining consistency.- "numpy", + "numpy<2",docs/tutorials/crystal_space_visualisation.ipynb (2)
Line range hint
25-95
: Consider optimising the embedding computation implementationA few suggestions to enhance the code:
- Consider disabling the progress bar in production notebooks
- The exception handling could be more specific than catching all exceptions
- The embedding dimension could be cached to avoid repeated instantiation
Here's a suggested improvement:
def get_embedding(formula, embedding="magpie", stats="mean"): if isinstance(formula, str): formula = [formula] elif isinstance(formula, Iterable): pass else: raise TypeError("formula must be a string or a list of strings") - # get embedding dimension - embedding_dim = CompositionalEmbedding("", embedding=embedding).embedding.dim + # Cache the embedding instance to avoid repeated instantiation + embedding_instance = CompositionalEmbedding("", embedding=embedding) + embedding_dim = embedding_instance.embedding.dim # compute embedding embeddings = [] - for f in tqdm(formula): + for f in tqdm(formula, disable=not IN_COLAB): # Show progress only in Colab try: compositional_embedding = CompositionalEmbedding(f, embedding=embedding) embeddings.append(compositional_embedding.feature_vector(stats=stats)) - except Exception as e: + except (ValueError, KeyError) as e: # Be more specific about expected exceptions embeddings.append(np.full(embedding_dim, np.nan))
Line range hint
96-150
: Remove unused variables and ensure consistent parameter handling
- The
silhouette_scores
dictionary is declared but never used- The
random_state
parameter should be consistently applied to all reducersConsider these changes:
-silhouette_scores = {} for name in embedding_names: for reducer in reducers: print(f"Computing {name} {reducer} embeddings") embeddings = pd.read_pickle(save_dir / "embeddings" / f"embeddings_{name}.pkl") reduced_embeddings = dimension_reduction( embeddings, reducer=reducer, n_components=2, save_dir=save_dir / "reduced_embeddings_2d", file_name=f"{reducer}_{name}", - random_state=42, + **({'random_state': 42} if reducer in ['tsne', 'umap'] else {}) )smact/tests/test_core.py (1)
385-386
: LGTM! Consider documenting the rationale for expected outcomes.The additional test cases for
MgB2
validation with different oxidation state sets enhance the test coverage. However, it would be helpful to add comments explaining whyMgB2
is expected to be valid withicsd16
but invalid withpymatgen_sp
oxidation states.Add comments to explain the expected outcomes:
- self.assertTrue(smact.screening.smact_validity("MgB2", oxidation_states_set="icsd16")) - self.assertFalse(smact.screening.smact_validity("MgB2", oxidation_states_set="pymatgen_sp")) + # MgB2 is valid in ICSD16 due to the inclusion of B(-1) oxidation state + self.assertTrue(smact.screening.smact_validity("MgB2", oxidation_states_set="icsd16")) + # MgB2 is invalid in pymatgen_sp as it lacks the necessary B oxidation state + self.assertFalse(smact.screening.smact_validity("MgB2", oxidation_states_set="pymatgen_sp"))smact/screening.py (1)
397-399
: Consider adding validation for oxidation states availabilityWhile the implementation of the new oxidation state sets is correct, it might be helpful to add validation to ensure the selected oxidation states are available for all elements in the composition.
# Select the specified oxidation states set: oxi_set = { "smact14": [e.oxidation_states_smact14 for e in els], "icsd16": [e.oxidation_states_icsd16 for e in els], "icsd24": [e.oxidation_states_icsd24 for e in els], "pymatgen_sp": [e.oxidation_states_sp for e in els], "wiki": [e.oxidation_states_wiki for e in els], } if oxidation_states_set in oxi_set: ox_combos = oxi_set[oxidation_states_set] + # Validate that oxidation states are available + if any(not states for states in ox_combos): + raise ValueError( + f"Some elements do not have oxidation states available in the '{oxidation_states_set}' set" + )Also applies to: 491-496
smact/__init__.py (1)
189-191
: Consider adding a deprecation warning for backwards compatibilityThe implementation correctly maps the new attribute name. To ease the transition for existing users, consider adding a deprecated property that forwards to the new attribute name.
Here's a suggested implementation:
class Element: + @property + def oxidation_states_icsd(self): + warnings.warn( + "oxidation_states_icsd is deprecated. Use oxidation_states_icsd16 instead.", + DeprecationWarning, + stacklevel=2 + ) + return self.oxidation_states_icsd16smact/structure_prediction/utilities.py (2)
33-61
: Consider merging parsing logic to reduce code duplicationHaving two separate functions for parsing species strings may lead to maintenance difficulties. Integrate the logic of
_parse_spec_old
intoparse_spec
or refactor shared code into a helper function to enhance maintainability.
33-61
: Rename_parse_spec_old
for clearer intentThe name
_parse_spec_old
may not clearly communicate the function's purpose. Renaming it to_parse_spec_fallback
or_fallback_parse_spec
can improve code readability and understanding.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (13)
docs/requirements.txt
(1 hunks)docs/tutorials.rst
(1 hunks)docs/tutorials/crystal_space.ipynb
(1 hunks)docs/tutorials/crystal_space_visualisation.ipynb
(1 hunks)docs/tutorials/oxidation_states.ipynb
(1 hunks)docs/tutorials/smact_generation_of_solar_oxides.ipynb
(4 hunks)docs/tutorials/smact_validity_of_GNoMe.ipynb
(1 hunks)pyproject.toml
(2 hunks)smact/__init__.py
(2 hunks)smact/screening.py
(5 hunks)smact/structure_prediction/utilities.py
(1 hunks)smact/tests/test_core.py
(1 hunks)smact/utils/oxidation.py
(2 hunks)
✅ Files skipped from review due to trivial changes (5)
- docs/requirements.txt
- docs/tutorials/crystal_space.ipynb
- docs/tutorials/oxidation_states.ipynb
- docs/tutorials/smact_generation_of_solar_oxides.ipynb
- docs/tutorials/smact_validity_of_GNoMe.ipynb
🔇 Additional comments (12)
docs/tutorials.rst (1)
13-15
: LGTM! The tutorial reordering improves the learning flow.
The reordering of tutorials provides a more logical progression, placing the GNoMe validity check and crystal space tutorials together.
smact/utils/oxidation.py (2)
87-87
: LGTM! Parameter addition is well-designed.
The new parameter sort_by_occurrences
is well-named and maintains backward compatibility with its default value of True
.
93-93
: LGTM! Documentation is clear and complete.
The docstring addition follows the established format and clearly explains the parameter's purpose.
pyproject.toml (1)
203-204
: LGTM: Well-formed regex pattern for ignoring encoded strings
The added ignore-regex pattern effectively excludes long alphanumeric strings (like base64 encoded content) from spell checking, which will help reduce false positives.
docs/tutorials/crystal_space_visualisation.ipynb (3)
50-50
: LGTM: Colab badge now correctly points to the master branch
The update ensures users will always access the latest stable version of the notebook.
Line range hint 1-24
: Well-structured package management and imports
The implementation demonstrates good practices:
- Robust Colab environment detection
- Clear dependency management
- Organised imports
Line range hint 151-170
: LGTM: Clear and consistent visualization implementation
The code effectively utilises SMACT's plotting utilities and follows good practices for file handling and visualisation.
smact/screening.py (2)
355-355
: Documentation update looks good!
The parameter description clearly explains all available options and includes a helpful warning about backwards compatibility.
509-509
: Error message update is appropriate
The error message clearly lists all available options and maintains consistency with the documentation.
smact/__init__.py (1)
67-67
: Documentation update looks good, but this is a breaking change
The documentation clearly specifies that these oxidation states are from the 2016 version of ICSD. However, renaming oxidation_states_icsd
to oxidation_states_icsd16
is a breaking change that might affect dependent code.
Let's check for any usage of the old attribute name in the codebase:
✅ Verification successful
Breaking change appears to be properly handled
Based on the codebase analysis, the renaming from oxidation_states_icsd
to oxidation_states_icsd16
appears to be part of a broader update that introduces both ICSD 2016 and 2024 data. The changes are consistent throughout the codebase:
- No remaining references to the old
oxidation_states_icsd
attribute in tests - New implementation properly uses
oxidation_states_icsd16
andoxidation_states_icsd24
- Data loading and screening functionality has been updated accordingly
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for any remaining references to the old attribute name
rg "oxidation_states_icsd[^1]" --type py
Length of output: 1143
Script:
#!/bin/bash
# Let's check the git history for this specific change to understand if it's a new addition
git log -p -- smact/__init__.py | grep -A 5 -B 5 "oxidation_states_icsd"
# Also check for any test files that might be affected
rg "oxidation_states_icsd" --type py tests/
Length of output: 90
Script:
#!/bin/bash
# Let's check for test directories with a more general search
fd "test" -t d
# And then search for any Python files that might contain tests
fd "test.*\.py$"
# Also check if there are any direct references to the old attribute name
ast-grep --pattern 'oxidation_states_icsd'
Length of output: 196
Script:
#!/bin/bash
# Check the test files for any references to oxidation states
rg "oxidation_states_icsd" smact/tests/
# Also check the implementation details of the Element class
ast-grep --pattern $'class Element {
$$$
}'
Length of output: 86
smact/structure_prediction/utilities.py (2)
16-21
: Docstring update improves clarity
The revised docstring provides a clearer explanation of the function's purpose and return values.
52-59
: Verify comprehensive handling of all species string formats
Ensure that the parsing logic correctly handles all possible species string formats, such as "X0", "X0+", "X0-", "X+", "X-", and neutral species without charges. Adding unit tests for these cases can help verify the robustness of the parsing functions.
Would you like assistance in creating unit tests to cover these edge cases?
if sort_by_occurrences: | ||
return species_occurrences_df.sort_values("results_count", ascending=False).reset_index(drop=True) | ||
return species_occurrences_df |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Test coverage for sorting parameter is missing
The test file test_utils.py
contains a test case test_oxidation_states_filter_species_occurrences
that verifies the basic functionality of get_species_occurrences_df()
, but it doesn't test the new sort_by_occurrences
parameter. The test should be extended to cover:
- The default sorting behaviour (sort_by_occurrences=True)
- Explicit unsorted behaviour (sort_by_occurrences=False)
- Verification that index is properly reset after sorting
Locations requiring attention:
smact/tests/test_utils.py
: Add test cases in thetest_oxidation_states_filter_species_occurrences
method
🔗 Analysis chain
Consider adding test cases for the new sorting parameter.
The sorting implementation is correct and efficient. However, to ensure the new parameter works as expected across different scenarios, consider adding test cases that verify:
- Sorting behaviour when
sort_by_occurrences=True
- Unsorted behaviour when
sort_by_occurrences=False
- Edge cases with empty or single-row DataFrames
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if test coverage exists for the new parameter
rg -A 5 "get_species_occurrences_df.*sort_by_occurrences" "tests/"
Length of output: 113
Script:
#!/bin/bash
# Let's first check if there are any test files in the repository
fd "test.*\.py$"
# Also check for the function definition to understand its usage
rg -A 5 "def get_species_occurrences_df"
Length of output: 567
Script:
#!/bin/bash
# Let's check test_utils.py for any existing tests of this function
cat smact/tests/test_utils.py
Length of output: 7609
try: | ||
ele, oxi_state = re.match(r"([A-Za-z]+)([0-9]*[\+\-])", species).groups() | ||
if oxi_state[-1] in ["+", "-"]: | ||
charge = (int(oxi_state[:-1] or 1)) * (-1 if "-" in oxi_state else 1) | ||
return ele, charge | ||
else: | ||
return ele, 0 | ||
except AttributeError: | ||
return _parse_spec_old(species) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Enhance exception handling for more precise error control
Catching AttributeError
may inadvertently suppress other unrelated errors. Consider checking if the match object is None
before accessing .groups()
, which can prevent unexpected exceptions and improve code robustness.
Apply this diff to improve exception handling:
def parse_spec(species: str) -> tuple[str, int]:
"""
Parse a species string into its atomic symbol and oxidation state.
:param species: the species string
:return: a tuple of the atomic symbol and oxidation state
"""
- try:
- ele, oxi_state = re.match(r"([A-Za-z]+)([0-9]*[\+\-])", species).groups()
+ match = re.match(r"([A-Za-z]+)([0-9]*[\+\-])", species)
+ if match:
+ ele, oxi_state = match.groups()
if oxi_state[-1] in ["+", "-"]:
charge = (int(oxi_state[:-1] or 1)) * (-1 if "-" in oxi_state else 1)
return ele, charge
else:
return ele, 0
- except AttributeError:
+ else:
return _parse_spec_old(species)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
try: | |
ele, oxi_state = re.match(r"([A-Za-z]+)([0-9]*[\+\-])", species).groups() | |
if oxi_state[-1] in ["+", "-"]: | |
charge = (int(oxi_state[:-1] or 1)) * (-1 if "-" in oxi_state else 1) | |
return ele, charge | |
else: | |
return ele, 0 | |
except AttributeError: | |
return _parse_spec_old(species) | |
match = re.match(r"([A-Za-z]+)([0-9]*[\+\-])", species) | |
if match: | |
ele, oxi_state = match.groups() | |
if oxi_state[-1] in ["+", "-"]: | |
charge = (int(oxi_state[:-1] or 1)) * (-1 if "-" in oxi_state else 1) | |
return ele, charge | |
else: | |
return ele, 0 | |
else: | |
return _parse_spec_old(species) |
Fix documentation and tutorial notebooks ahead of new release
Description
This PR introduces the following changes:
parse_spec
to handle X+/X- speciesType of change
How Has This Been Tested?
Test Configuration:
Reviewers
N/A
Checklist
Summary by CodeRabbit
Release Notes
New Features
filtering_icsd_oxidation_states.ipynb
.Bug Fixes
Documentation
Chores
pyproject.toml
.