-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Oxidation States 24 #329
Oxidation States 24 #329
Conversation
Added comment lines on dataset origins
Populated from ICSD API - raw count json - raw oxidation state list - filtered list with > 5 entries per oxidation state
Add set that lists the oxidation state with the most results for each element
Final list up to Lr
…o oxidationstates24
Regenerated ICSD24 dataset on oxidation states. API had returned 0 values for some species, whilst the web version showed results. Overcome by restarting the session for each search. "oxidation_states_icsd24_filtered" is recommended as the default
WalkthroughThe pull request introduces several modifications across multiple files, primarily focusing on the standardisation and formatting of bibliographic entries in a Changes
Possibly related PRs
Suggested labels
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (18)
smact/utils/oxidation.py (10)
2-3
: Simplify imports from the 'os' module.Importing both
os
andfrom os import path
is redundant. Consider importing only the necessary components to improve code clarity.Apply this diff to streamline the imports:
-import os -from os import path +import osAnd update
path.join(...)
toos.path.join(...)
throughout the code.
22-24
: Add error handling for data loading in__init__
.Currently, the code does not handle potential errors when loading the JSON file. Adding error handling will make the class more robust against missing or corrupt data files.
Consider wrapping the data loading in a try-except block:
try: self.ox_states_df = pd.read_json( os.path.join(data_directory, "oxidation_states_icsd24_counts.json") ) except (FileNotFoundError, ValueError) as e: raise RuntimeError("Failed to load oxidation states data.") from e
62-77
: Add return type annotation toget_species_list
method.Including a return type annotation improves code readability and helps with static analysis tools.
Apply this diff to add the return type:
def get_species_list( self, threshold: int, include_zero: bool = False, include_one_oxidation_state: bool = False, ): + -> list[str]:
94-104
: Specify return type in docstring and function signature.The return type in the docstring is vague. Specify the return type as
pd.DataFrame
for clarity, and add a return type annotation to the function signature.Update the docstring and signature:
): + -> pd.DataFrame: """Get the ICSD 24 oxidation states list as a DataFrame of species with their occurrences. Args: include_one_oxidation_state (bool): Include oxidation states +1 and -1 as ± in species notation. Default is False. Returns: - dataframe: The species list as a dataframe of species with their occurrences. + pd.DataFrame: A DataFrame containing species and their occurrence counts. """
106-109
: Include zero oxidation states based oninclude_zero
parameter.In the
get_species_occurrences_df
method, zero oxidation states are always excluded. Consider adding aninclude_zero
parameter to control this behaviour for consistency with other methods.
126-130
: UseOptional
for parameters that can beNone
.The parameter
comment
can beNone
. For clarity, annotate it usingOptional[str]
.Apply this diff:
filename: str | os.PathLike, threshold: int, include_zero: bool = False, - comment: str = None, + comment: Optional[str] = None, ):Don't forget to import
Optional
from thetyping
module:+from typing import Optional
158-168
: Specify file encoding when writing to the file.To ensure consistent behaviour across different environments, specify the file encoding when opening the file.
Apply this diff:
-with open(filename, "w") as f: +with open(filename, "w", encoding="utf-8") as f:
169-171
: Correct type annotation forgroup
parameter in_filter_oxidation_states
.The type annotation for
group
should bepd.DataFrame
instead ofpd.DataFrame.groupby
.Apply this diff:
def _filter_oxidation_states( - self, group: pd.DataFrame.groupby, threshold: int + self, group: pd.DataFrame, threshold: int ):
83-91
: Handle potential non-integer oxidation states gracefully.If
ox_state
cannot be converted to an integer, it is silently ignored. Consider logging a warning to aid in debugging unexpected data.
12-18
: Revise class docstring to follow standard conventions.Including an
Attributes
section in the class docstring is unconventional. Attribute documentation is typically placed in the__init__
method's docstring or as class variable docstrings.Consider restructuring the docstring:
"""Class to handle filtering the ICSD 24 oxidation states list. -The ICSD 24 oxidation states list is a list of oxidation states for each element in the ICSD 24 database. -Attributes: - ox_states_df (pd.DataFrame): The ICSD 24 oxidation states list as a DataFrame. """ + The ICSD 24 oxidation states list contains oxidation states for each element from the ICSD 24 database.And add a docstring to
__init__
if detailed attribute documentation is necessary.smact/structure_prediction/utilities.py (2)
Line range hint
39-61
: Enhance docstring with examples of the new parameterThe implementation looks good, but the docstring could be more comprehensive.
Consider adding examples demonstrating the
include_one
parameter:Examples: >>> unparse_spec(("Fe", 2)) 'Fe2+' >>> unparse_spec(("O", -2)) 'O2-' + >>> unparse_spec(("Na", 1)) + 'Na1+' + >>> unparse_spec(("Na", 1), include_one=False) + 'Na+'
84-84
: Consider enhancing error handlingThe type hint update is good, but the error handling could be more specific.
Consider improving the error handling:
try: return doc.dict(exclude={"fields_not_requested"}) - except: - logger.warn(f"Could not convert input:\n {doc}\n to a dictionary.") - raise TypeError("Input is not an MPDataDoc object.") + except AttributeError as e: + logger.warning(f"Could not convert input:\n {doc}\n to a dictionary. Error: {e}") + raise TypeError("Input is not a valid MPDataDoc object") from e + except Exception as e: + logger.error(f"Unexpected error while converting document: {e}") + raiseAlso applies to: 93-97
smact/tests/test_utils.py (5)
5-5
: Remove unused importThe
pytest
import is not used in this test file. Since the tests are using theunittest
framework, this import can be safely removed.-import pytest
🧰 Tools
🪛 Ruff
5-5:
pytest
imported but unusedRemove unused import:
pytest
(F401)
19-24
: Use context managers for file operationsWhilst the setup is well-structured, it's recommended to use context managers for file operations to ensure proper resource handling.
def setUp(self): self.ox_filter = ICSD24OxStatesFilter() - with open(TEST_ICSD_OX_STATES) as f: - self.test_ox_states = f.read() - with open(TEST_ICSD_OX_STATES_W_ZERO) as f: - self.test_ox_states_w_zero = f.read() + self.test_ox_states = TEST_ICSD_OX_STATES.read_text() + self.test_ox_states_w_zero = TEST_ICSD_OX_STATES_W_ZERO.read_text()
35-35
: Complete or remove the commented assertionThere's an incomplete assertion that's been commented out. Either complete the assertion with the expected value or remove it if no longer needed.
47-53
: Improve file comparison logicThe current file comparison could be improved by using context managers and a more efficient comparison method.
- assert [line for line in open(f"{filename}.txt")] == [ - line for line in open(TEST_ICSD_OX_STATES) - ] - self.assertTrue(os.path.exists(f"{filename}_w_zero.txt")) - assert [line for line in open(f"{filename}_w_zero.txt")] == [ - line for line in open(TEST_ICSD_OX_STATES_W_ZERO) - ] + with open(f"{filename}.txt") as f1, open(TEST_ICSD_OX_STATES) as f2: + self.assertEqual(f1.read(), f2.read()) + with open(f"{filename}_w_zero.txt") as f1, open(TEST_ICSD_OX_STATES_W_ZERO) as f2: + self.assertEqual(f1.read(), f2.read())🧰 Tools
🪛 Ruff
47-47: Use a context manager for opening files
(SIM115)
48-48: Use a context manager for opening files
(SIM115)
51-51: Use a context manager for opening files
(SIM115)
52-52: Use a context manager for opening files
(SIM115)
59-62
: Document the significance of threshold valuesThe test uses specific threshold values (0, 5, 50) and expects specific list lengths (490, 358, 227). Consider adding a comment explaining the significance of these values.
+ # Test various thresholds and their impact on the number of species: + # 0: All species (490) + # 5: Common species (358) + # 50: Very common species (227) for threshold, length in [(0, 490), (5, 358), (50, 227)]:smact/__init__.py (1)
57-57
: Maintain consistent documentation formatThe documentation for oxidation states attributes is clear, but could be more consistent:
oxidation_states_icsd
specifies "2016 version"oxidation_states_icsd24
uses "2024 version"Consider standardising the format:
- Element.oxidation_states_icsd (list) : List of oxidation states that appear in the 2016 version of the ICSD + Element.oxidation_states_icsd (list): List of oxidation states from ICSD 2016 - Element.oxidation_states_icsd24 (list): List of oxidation states that appear in the 2024 version of the ICSD + Element.oxidation_states_icsd24 (list): List of oxidation states from ICSD 2024Also applies to: 63-64
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (17)
paper.bib
(6 hunks)smact/__init__.py
(3 hunks)smact/data/oxidation_states.txt
(1 hunks)smact/data/oxidation_states_SP.txt
(1 hunks)smact/data/oxidation_states_icsd.txt
(1 hunks)smact/data/oxidation_states_icsd24_common.txt
(1 hunks)smact/data/oxidation_states_icsd24_filtered.txt
(1 hunks)smact/data/oxidation_states_icsd24_raw.txt
(1 hunks)smact/data/oxidation_states_pmg.txt
(1 hunks)smact/data/oxidation_states_wiki.txt
(1 hunks)smact/data/oxidationstates.data
(0 hunks)smact/data_loader.py
(2 hunks)smact/structure_prediction/utilities.py
(4 hunks)smact/tests/files/test_icsd_oxidation_states_filter_1000.txt
(1 hunks)smact/tests/files/test_icsd_oxidation_states_filter_1000_w_0_ox_state.txt
(1 hunks)smact/tests/test_utils.py
(1 hunks)smact/utils/oxidation.py
(1 hunks)
💤 Files with no reviewable changes (1)
- smact/data/oxidationstates.data
✅ Files skipped from review due to trivial changes (7)
- paper.bib
- smact/data/oxidation_states.txt
- smact/data/oxidation_states_SP.txt
- smact/data/oxidation_states_icsd.txt
- smact/data/oxidation_states_icsd24_filtered.txt
- smact/data/oxidation_states_pmg.txt
- smact/data/oxidation_states_wiki.txt
🧰 Additional context used
🪛 LanguageTool
smact/data/oxidation_states_icsd24_common.txt
[grammar] ~7-~7: Només va sense accent quan és un animal o un nom de lletra. Escriviu «bé» (adverbi) o «ve» (del v. ‘venir’).
Context: ...st common non-zero values # H 1 He Li 1 Be 2 B 3 C 4 N -3 O -2 F -1 Ne Na 1 Mg 2 A...
(BE)
[grammar] ~36-~36: Possible confusió. ¿Volíeu dir «als»?
Context: ...Mn 2 Fe 3 Co 2 Ni 2 Cu 2 Zn 2 Ga 3 Ge 4 As 5 Se -2 Br -1 Kr 2 Rb 1 Sr 2 Y 3 Zr 4 N...
(AS_ALS)
[grammar] ~42-~42: Si és conjunció, cal escriure «I».
Context: ... 3 Ge 4 As 5 Se -2 Br -1 Kr 2 Rb 1 Sr 2 Y 3 Zr 4 Nb 5 Mo 6 Tc 7 Ru 4 Rh 3 Pd 2 Ag...
(Y_CONJUNCIO)
[grammar] ~70-~70: ¿Volíeu dir «o» (conjunció) en comptes de «ho» (pronom)?
Context: ...Pr 3 Nd 3 Pm 3 Sm 3 Eu 3 Gd 3 Tb 3 Dy 3 Ho 3 Er 3 Tm 3 Yb 3 Lu 3 Hf 4 Ta 5 W 6 Re ...
(FINS_I_TOT)
[grammar] ~80-~80: Paraula desconeguda com a nom propi.
Context: ... Tm 3 Yb 3 Lu 3 Hf 4 Ta 5 W 6 Re 7 Os 5 Ir 4 Pt 4 Au 1 Hg 2 Tl 1 Pb 2 Bi 3 Po 4 At...
(VERBS_NOMSPROPIS)
[grammar] ~102-~102: S’accentua quan és del v. ‘ser’.
Context: ... Pa 5 U 6 Np 6 Pu 3 Am 3 Cm 3 Bk 3 Cf 3 Es 3 Fm Md No Lr
(ES)
smact/data/oxidation_states_icsd24_raw.txt
[misspelling] ~3-~3: Söz ve sayı arasında defis yoqtur: "V-2"
Context: ...idation state set # Source: ICSD (2024, V2) # H -1 1 2 3 He Li 1 Be 2 B -5 -3 -2 -...
(NUMBER_BEFORE_DEFIS_MISSING)
smact/tests/files/test_icsd_oxidation_states_filter_1000.txt
[style] ~3-~3: Forma secundària o antiga.
Context: ...ate set # Source: ICSD (2024), filtered for > 1000 reports # # Testing writing of I...
(FOR)
[grammar] ~9-~9: Només va sense accent quan és un animal o un nom de lletra. Escriviu «bé» (adverbi) o «ve» (del v. ‘venir’).
Context: ...oxidation states list. # H -1 1 He Li 1 Be 2 B 3 C -2 2 3 4 N -3 5 O -2 F -1 Ne Na...
(BE)
[grammar] ~44-~44: Si és conjunció, cal escriure «I».
Context: ... 4 As -3 3 5 Se -2 4 Br -1 Kr Rb 1 Sr 2 Y 3 Zr 4 Nb 5 Mo 6 Tc Ru Rh Pd Ag 1 Cd 2 ...
(Y_CONJUNCIO)
[grammar] ~72-~72: ¿Volíeu dir «o» (conjunció) en comptes de «ho» (pronom)?
Context: ...4 Pr 3 Nd 3 Pm Sm 3 Eu 3 Gd 3 Tb 3 Dy 3 Ho 3 Er 3 Tm Yb 3 Lu Hf Ta 5 W 6 Re Os Ir ...
(FINS_I_TOT)
[grammar] ~81-~81: ¿Volíeu dir «us»?
Context: ...y 3 Ho 3 Er 3 Tm Yb 3 Lu Hf Ta 5 W 6 Re Os Ir Pt Au Hg 2 Tl 1 Pb 2 Bi 3 Po At Rn F...
(OS_US)
[grammar] ~82-~82: Paraula desconeguda com a nom propi.
Context: ... Ho 3 Er 3 Tm Yb 3 Lu Hf Ta 5 W 6 Re Os Ir Pt Au Hg 2 Tl 1 Pb 2 Bi 3 Po At Rn Fr R...
(VERBS_NOMSPROPIS)
[grammar] ~100-~100: ¿Volíeu dir «amb»?
Context: ... Bi 3 Po At Rn Fr Ra Ac Th Pa U 6 Np Pu Am Cm Bk Cf Es Fm Md No Lr
(AM_AMB)
[grammar] ~104-~104: S’accentua quan és del v. ‘ser’.
Context: ...Rn Fr Ra Ac Th Pa U 6 Np Pu Am Cm Bk Cf Es Fm Md No Lr
(ES)
smact/tests/files/test_icsd_oxidation_states_filter_1000_w_0_ox_state.txt
[misspelling] ~32-~32: ‘Mn’ zou fout kunnen zijn. Misschien bedoelt u: “Mijn”, “M'n”, “M.n.”, “Mangaan”
Context: ...1 Ca 0 2 Sc 0 3 Ti 0 4 V 0 3 4 5 Cr 0 3 Mn 0 2 3 4 Fe 0 2 3 Co 0 2 3 Ni 0 2 Cu 0 1...
(NL_SIMPLE_REPLACE_MN)
[misspelling] ~36-~36: ‘Cu’ zou fout kunnen zijn. Misschien bedoelt u: “koper”, “CU”, “see you”
Context: ...0 3 Mn 0 2 3 4 Fe 0 2 3 Co 0 2 3 Ni 0 2 Cu 0 1 2 Zn 0 2 Ga 0 3 Ge 0 4 As -3 0 3 5 ...
(NL_SIMPLE_REPLACE_CU)
[style] ~48-~48: Nota bene wordt afgekort als “N.B.”
Context: ...2 0 4 Br -1 Kr Rb 1 Sr 0 2 Y 0 3 Zr 0 4 Nb 0 5 Mo 0 6 Tc Ru 0 Rh 0 Pd 0 Ag 0 1 Cd ...
(NB_2)
[uncategorized] ~84-~84: Vergeet de punt niet als het een afkorting is: “Ir.” Schrijf het liever voluit.
Context: ... 3 Tm 0 Yb 0 3 Lu Hf 0 Ta 0 5 W 6 Re Os Ir 0 Pt 0 Au 0 Hg 2 Tl 1 Pb 2 Bi 0 3 Po At...
(AFKO_PUNT)
[uncategorized] ~94-~94: Vergeet de punt niet als het een afkorting is: “Fr.” Schrijf het liever voluit.
Context: ...t 0 Au 0 Hg 2 Tl 1 Pb 2 Bi 0 3 Po At Rn Fr Ra Ac Th Pa U 0 6 Np Pu Am Cm Bk Cf Es ...
(AFKO_PUNT)
🪛 Ruff
smact/tests/test_utils.py
5-5: pytest
imported but unused
Remove unused import: pytest
(F401)
47-47: Use a context manager for opening files
(SIM115)
48-48: Use a context manager for opening files
(SIM115)
51-51: Use a context manager for opening files
(SIM115)
52-52: Use a context manager for opening files
(SIM115)
🔇 Additional comments (5)
smact/structure_prediction/utilities.py (1)
4-4
: LGTM: Clean and specific imports
The imports are well-structured, using specific imports rather than wildcards, which is good practice for maintainability and clarity.
Also applies to: 6-6
smact/tests/test_utils.py (2)
9-15
: LGTM! Well-structured file path handling
The test file paths are correctly constructed using os.path.join
and relative path resolution, ensuring cross-platform compatibility.
64-75
: LGTM! Comprehensive DataFrame validation
The test thoroughly validates the structure and content of the species occurrences DataFrame, including shape, column names, and specific value checks.
smact/data_loader.py (2)
325-378
: Implementation looks good, suggest enhancing documentation.
The implementation follows the established pattern and correctly implements caching. Consider enhancing the docstring with:
- The specific ICSD version/date used for data mining
- A reference to documentation explaining the filtering criteria
def lookup_element_oxidation_states_icsd24(symbol, copy=True):
"""
Retrieve a list of known oxidation states for an element.
The oxidation states list used contains only those found
- in the 2024 version of the ICSD (and has >=5 reports).
+ in the ICSD version 09.24 (September 2024) and has >=5 reports.
+
+ For details on the filtering criteria and data mining methodology,
+ see the documentation at docs/oxidation_states.md
✅ Verification successful
Oxidation states data file verified and well-formatted
The data file exists at the expected location and follows a clean format with proper header documentation. The file header confirms it's sourced from ICSD 2024 V2 with the filtering criteria (>5 reports) matching the function's documentation.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Verify the existence of the data file and its format
if [ -f "smact/data/oxidation_states_icsd24_filtered.txt" ]; then
echo "File exists. Checking format..."
head -n 5 "smact/data/oxidation_states_icsd24_filtered.txt"
else
echo "Error: Required data file not found!"
fi
Length of output: 278
405-405
: Verify breaking change documentation for HHI file rename.
The renaming from "HHIs.txt" to "hhi.txt" follows better naming conventions but constitutes a breaking change.
if not filename.endswith(".txt"): | ||
filename += ".txt" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensure filename handles PathLike
objects correctly.
When checking the file extension, if filename
is a PathLike
object, the .endswith()
method may not work as expected.
Convert filename
to a string before checking:
-if not filename.endswith(".txt"):
+filename = str(filename)
+if not filename.endswith(".txt"):
filename += ".txt"
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if not filename.endswith(".txt"): | |
filename += ".txt" | |
filename = str(filename) | |
if not filename.endswith(".txt"): | |
filename += ".txt" |
smact/utils/oxidation.py
Outdated
filtered_df[(filtered_df["results_count"] > 0)] | ||
.groupby("element") | ||
.apply( | ||
self._filter_oxidation_states, threshold, include_groups=False | ||
) | ||
.reset_index() | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix incorrect argument in the apply
function call.
The apply
method is passing an unexpected keyword argument include_groups=False
to _filter_oxidation_states
, which does not accept this parameter. This will result in a TypeError
.
Apply this diff to remove the invalid argument:
.apply(
- self._filter_oxidation_states, threshold, include_groups=False
+ self._filter_oxidation_states, threshold
)
Committable suggestion skipped: line range outside the PR's diff.
smact/__init__.py
Outdated
import itertools | ||
import warnings | ||
from math import gcd | ||
from operator import mul as multiply | ||
from os import path | ||
from typing import Iterable, List, Optional, Sequence, Tuple, Union | ||
from typing import Iterable, List, Optional, Tuple, Union |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Removal of Sequence
import may cause issues
The removal of Sequence
from smact/__init__.py
could be problematic as it's actively used in structure_prediction/database.py
with type hints. Since database.py
imports Sequence
directly from typing
, the removal is safe, but we should maintain consistency in import patterns across the codebase.
structure_prediction/database.py
usesSequence[SmactStructure]
type hints- Consider keeping the
Sequence
import in__init__.py
for API consistency
🔗 Analysis chain
Verify removal of Sequence import
The removal of the unused Sequence
import is good practice. However, as this is a public API, we should verify that this doesn't affect downstream code that might be importing it from this module.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for any downstream code that might be importing Sequence from smact
rg -l "from smact import.*Sequence" || echo "No direct imports found"
Length of output: 94
Script:
#!/bin/bash
# Let's verify if Sequence is used in type hints within the codebase
rg "Sequence\[" || echo "No Sequence type hints found"
# Also check if it's imported differently
rg "from typing import.*Sequence" || echo "No Sequence imports found"
Length of output: 250
( | ||
"oxidation_states_icsd24", | ||
data_loader.lookup_element_oxidation_states_icsd24(symbol), | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Implementation for ICSD24 oxidation states is missing
The verification reveals critical issues:
- The
lookup_element_oxidation_states_icsd24
function implementation is not found in the codebase - The required data file
oxidation_states_icsd24.txt
is missing - Error handling in the data loader is minimal
This change cannot be safely integrated without:
- Implementing the missing data loader function
- Adding the required data file
- Including proper error handling similar to the existing ICSD implementation
🔗 Analysis chain
Verify data loader implementation and error handling
The initialization of the new oxidation states attribute follows the established pattern. However, as this is a breaking change, please ensure:
- The
lookup_element_oxidation_states_icsd24
function in data_loader handles missing data gracefully - The new attribute doesn't affect existing functionality that might be checking for oxidation states availability
Let me gather more information about the data loader implementation for the new ICSD24 function.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check data loader implementation
ast-grep --pattern 'def lookup_element_oxidation_states_icsd24($_) {
$$$
}'
# Check for any existing code that might be affected
rg -A 3 "oxidation_states_icsd\W"
Length of output: 1925
Script:
#!/bin/bash
# Check implementation of the new ICSD24 function
ast-grep --pattern 'def lookup_element_oxidation_states_icsd24($_) {
$$$
}'
# Check for the data file
fd "oxidation_states_icsd24.txt"
# Check for any error handling patterns in data_loader
rg -A 3 "try:|except:|raise" smact/data_loader.py
Length of output: 243
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (4)
smact/structure_prediction/utilities.py (1)
55-56
: Remove blank line after Args sectionTo comply with docstring formatting standards, remove the blank line after the Args section.
Apply this diff:
Args: - species (tuple[str,int]): A tuple of (element, signed_charge).
🧰 Tools
🪛 Ruff
55-55: No blank lines allowed between a section header and its content ("Args")
Remove blank line(s)
(D412)
smact/tests/test_utils.py (2)
99-105
: Consider using pathlib for more robust path handlingWhilst the current path handling works, using
pathlib
would provide a more modern and cross-platform approach.-files_dir = os.path.join(os.path.dirname(os.path.realpath(__file__)), "files") -TEST_ICSD_OX_STATES = os.path.join( - files_dir, "test_icsd_oxidation_states_filter_1000.txt" -) -TEST_ICSD_OX_STATES_W_ZERO = os.path.join( - files_dir, "test_icsd_oxidation_states_filter_1000_w_0_ox_state.txt" -) +from pathlib import Path + +files_dir = Path(__file__).parent / "files" +TEST_ICSD_OX_STATES = files_dir / "test_icsd_oxidation_states_filter_1000.txt" +TEST_ICSD_OX_STATES_W_ZERO = files_dir / "test_icsd_oxidation_states_filter_1000_w_0_ox_state.txt"
116-126
: Complete the commented out assertionThere's an incomplete assertion that's been commented out. Either complete the assertion or remove the comment if it's no longer needed.
🧰 Tools
🪛 Ruff
117-117: Undefined name
pd
(F821)
121-121: Undefined name
pd
(F821)
smact/data_loader.py (1)
304-354
: Style improvements neededSeveral style improvements can be made:
- Remove blank line after docstring
- Simplify list comprehension
- Use f-string for warning message
Apply these changes:
def lookup_element_oxidation_states_icsd24(symbol, copy=True): """ Retrieve a list of known oxidation states for an element. The oxidation states list used contains only those found in the 2024 version of the ICSD (and has >=5 reports). Args: symbol (str) : the atomic symbol of the element to look up. copy (Optional(bool)): if True (default), return a copy of the oxidation-state list, rather than a reference to the cached data -- only use copy=False in performance-sensitive code and where the list will not be modified! Returns: list: List of known oxidation states for the element. Returns None if oxidation states for the Element were not found in the external data. """ - global _el_ox_states_icsd24 if _el_ox_states_icsd24 is None: _el_ox_states_icsd24 = {} for items in _get_data_rows( os.path.join(data_directory, "oxidation_states_icsd24_filtered.txt") ): _el_ox_states_icsd24[items[0]] = [ int(oxidationState) for oxidationState in items[1:] ] if symbol in _el_ox_states_icsd24: if copy: # _el_ox_states_icsd24 stores lists -> if copy is set, make an implicit # deep copy. The elements of the lists are integers, which are # "value types" in Python. - return [ - oxidationState - for oxidationState in _el_ox_states_icsd24[symbol] - ] + return list(_el_ox_states_icsd24[symbol]) else: return _el_ox_states_icsd24[symbol] else: if _print_warnings: - print( - "WARNING: Oxidation states for element {} " - "not found.".format(symbol) - ) + print(f"WARNING: Oxidation states for element {symbol} not found.") return None🧰 Tools
🪛 Ruff
305-322: No blank lines allowed after function docstring (found 1)
Remove blank line(s) after function docstring
(D202)
342-345: Unnecessary
list
comprehension (rewrite usinglist()
)Rewrite using
list()
(C416)
351-352: Use f-string instead of
format
callConvert to f-string
(UP032)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (4)
smact/__init__.py
(3 hunks)smact/data_loader.py
(2 hunks)smact/structure_prediction/utilities.py
(4 hunks)smact/tests/test_utils.py
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- smact/init.py
🧰 Additional context used
🪛 Ruff
smact/data_loader.py
305-322: No blank lines allowed after function docstring (found 1)
Remove blank line(s) after function docstring
(D202)
342-345: Unnecessary list
comprehension (rewrite using list()
)
Rewrite using list()
(C416)
351-352: Use f-string instead of format
call
Convert to f-string
(UP032)
smact/structure_prediction/utilities.py
5-5: Move third-party import pymatgen.core.structure.Structure
into a type-checking block
Move into type-checking block
(TCH002)
55-55: No blank lines allowed between a section header and its content ("Args")
Remove blank line(s)
(D412)
smact/tests/test_utils.py
97-97: Module level import not at top of file
(E402)
117-117: Undefined name pd
(F821)
121-121: Undefined name pd
(F821)
137-137: Unnecessary list
comprehension (rewrite using list()
)
Rewrite using list()
(C416)
137-137: Use a context manager for opening files
(SIM115)
137-139: Unnecessary list
comprehension (rewrite using list()
)
Rewrite using list()
(C416)
138-138: Use a context manager for opening files
(SIM115)
141-141: Unnecessary list
comprehension (rewrite using list()
)
Rewrite using list()
(C416)
141-141: Use a context manager for opening files
(SIM115)
141-143: Unnecessary list
comprehension (rewrite using list()
)
Rewrite using list()
(C416)
142-142: Use a context manager for opening files
(SIM115)
156-156: Undefined name pd
(F821)
🔇 Additional comments (7)
smact/structure_prediction/utilities.py (3)
49-50
: Implementation looks good!
The new include_one
parameter provides useful flexibility for formatting oxidation state strings, with proper handling of edge cases.
Also applies to: 73-76
102-103
: Type annotation improvement looks good!
The return type annotation is now more precise and correctly specifies all possible return types using modern Python type hinting syntax.
Line range hint 49-76
: Verify integration with ICSD24 dataset functionality
Since these changes are part of the oxidation states update, let's verify the integration with the new ICSD24 dataset functionality.
✅ Verification successful
No integration concerns with ICSD24 dataset functionality
The unparse_spec
function is well-integrated within the codebase and is used appropriately across multiple components:
- Core oxidation state handling in
smact/utils/oxidation.py
- Structure prediction functionality in
smact/structure_prediction/prediction.py
- Composition utilities in
smact/utils/composition.py
The new include_one
parameter is properly used in oxidation state handling, particularly in oxidation.py
where it's passed through as include_one_oxidation_state
. The changes maintain backwards compatibility while adding flexibility for oxidation state representation.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check usage of unparse_spec with new ICSD24 dataset
# Look for usage of unparse_spec in relation to ICSD24
rg -l "unparse_spec.*icsd24"
# Check for test coverage of the new parameter
rg -l "test.*unparse_spec.*include_one"
Length of output: 72
Script:
#!/bin/bash
# Let's broaden our search to understand the usage context better
# Check all direct usages of unparse_spec
rg "unparse_spec" -B 2 -A 2
# Look for test files related to this functionality
fd "test.*utilities.*py$"
# Check for ICSD-related files that might use this function
fd "icsd.*py$"
Length of output: 6260
🧰 Tools
🪛 Ruff
55-55: No blank lines allowed between a section header and its content ("Args")
Remove blank line(s)
(D412)
smact/tests/test_utils.py (2)
148-153
: LGTM! Well-structured test methods
The test methods test_oxidation_states_filter_species_list
and test_oxidation_states_filter_species_occurrences
are well-structured with clear assertions and good coverage of edge cases.
Also applies to: 154-165
109-115
: 🛠️ Refactor suggestion
Use context managers consistently for file operations
Improve file handling in the setUp
method by using context managers consistently.
def setUp(self):
self.ox_filter = ICSD24OxStatesFilter()
- with open(TEST_ICSD_OX_STATES) as f:
- self.test_ox_states = f.read()
- with open(TEST_ICSD_OX_STATES_W_ZERO) as f:
- self.test_ox_states_w_zero = f.read()
+ self.test_ox_states = TEST_ICSD_OX_STATES.read_text()
+ self.test_ox_states_w_zero = TEST_ICSD_OX_STATES_W_ZERO.read_text()
Likely invalid or redundant comment.
smact/data_loader.py (2)
301-302
: LGTM: Cache variable follows established pattern
The new global cache variable follows the consistent pattern used throughout the module.
304-354
: LGTM: Implementation follows established pattern
The function correctly implements the oxidation states lookup with proper caching mechanism and error handling.
🧰 Tools
🪛 Ruff
305-322: No blank lines allowed after function docstring (found 1)
Remove blank line(s) after function docstring
(D202)
342-345: Unnecessary list
comprehension (rewrite using list()
)
Rewrite using list()
(C416)
351-352: Use f-string instead of format
call
Convert to f-string
(UP032)
@@ -1,5 +1,8 @@ | |||
"""Miscellaneous tools for data parsing.""" | |||
|
|||
|
|||
|
|||
from pymatgen.core.structure import Structure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Move Structure
import to type-checking block
For better runtime performance and adherence to type-checking best practices, move the pymatgen import into the existing type-checking block.
Apply this diff:
-from pymatgen.core.structure import Structure
from __future__ import annotations
import re
from typing import TYPE_CHECKING
from . import logger
if TYPE_CHECKING:
import pymatgen
+ from pymatgen.core.structure import Structure
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
from pymatgen.core.structure import Structure | |
from __future__ import annotations | |
import re | |
from typing import TYPE_CHECKING | |
from . import logger | |
if TYPE_CHECKING: | |
import pymatgen | |
from pymatgen.core.structure import Structure |
🧰 Tools
🪛 Ruff
5-5: Move third-party import pymatgen.core.structure.Structure
into a type-checking block
Move into type-checking block
(TCH002)
smact/tests/test_utils.py
Outdated
@@ -92,3 +93,73 @@ def test_convert_formula(self): | |||
@unittest.skipUnless(os.environ.get("MP_API_KEY"), "requires MP_API key to be set.") | |||
def test_download_compounds_with_mp_api(self): | |||
pass | |||
|
|||
from smact.utils.oxidation import ICSD24OxStatesFilter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Move import statement to the top of the file
The import statement for ICSD24OxStatesFilter
should be grouped with other imports at the top of the file, following PEP 8 guidelines.
from __future__ import annotations
import os
import unittest
from pymatgen.core import Composition
+import pandas as pd
from smact import Element
from smact.screening import smact_filter
from smact.utils.composition import comp_maker, formula_maker, parse_formula
from smact.utils.crystal_space import generate_composition_with_smact
+from smact.utils.oxidation import ICSD24OxStatesFilter
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
from smact.utils.oxidation import ICSD24OxStatesFilter | |
from __future__ import annotations | |
import os | |
import unittest | |
from pymatgen.core import Composition | |
import pandas as pd | |
from smact import Element | |
from smact.screening import smact_filter | |
from smact.utils.composition import comp_maker, formula_maker, parse_formula | |
from smact.utils.crystal_space import generate_composition_with_smact | |
from smact.utils.oxidation import ICSD24OxStatesFilter |
🧰 Tools
🪛 Ruff
97-97: Module level import not at top of file
(E402)
smact/tests/test_utils.py
Outdated
def test_oxidation_states_write(self): | ||
threshold = 1000 | ||
filename = "test_ox_states" | ||
filename_w_zero = "test_ox_states_w_zero" | ||
comment = "Testing writing of ICSD 24 oxidation states list." | ||
self.ox_filter.write(filename, threshold, comment=comment) | ||
self.ox_filter.write( | ||
filename_w_zero, threshold, include_zero=True, comment=comment | ||
) | ||
self.assertTrue(os.path.exists(f"{filename}.txt")) | ||
assert [line for line in open(f"{filename}.txt")] == [ | ||
line for line in open(TEST_ICSD_OX_STATES) | ||
] | ||
self.assertTrue(os.path.exists(f"{filename}_w_zero.txt")) | ||
assert [line for line in open(f"{filename}_w_zero.txt")] == [ | ||
line for line in open(TEST_ICSD_OX_STATES_W_ZERO) | ||
] | ||
# Clean up | ||
os.remove(f"{filename}.txt") | ||
os.remove(f"{filename}_w_zero.txt") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Improve file comparison logic
The current file comparison using list comprehensions can be simplified and made more robust.
def test_oxidation_states_write(self):
threshold = 1000
filename = "test_ox_states"
filename_w_zero = "test_ox_states_w_zero"
comment = "Testing writing of ICSD 24 oxidation states list."
self.ox_filter.write(filename, threshold, comment=comment)
self.ox_filter.write(
filename_w_zero, threshold, include_zero=True, comment=comment
)
- self.assertTrue(os.path.exists(f"{filename}.txt"))
- assert [line for line in open(f"{filename}.txt")] == [
- line for line in open(TEST_ICSD_OX_STATES)
- ]
- self.assertTrue(os.path.exists(f"{filename}_w_zero.txt"))
- assert [line for line in open(f"{filename}_w_zero.txt")] == [
- line for line in open(TEST_ICSD_OX_STATES_W_ZERO)
- ]
+ output_file = Path(f"{filename}.txt")
+ output_file_w_zero = Path(f"{filename}_w_zero.txt")
+
+ self.assertTrue(output_file.exists())
+ self.assertEqual(output_file.read_text(), TEST_ICSD_OX_STATES.read_text())
+
+ self.assertTrue(output_file_w_zero.exists())
+ self.assertEqual(
+ output_file_w_zero.read_text(),
+ TEST_ICSD_OX_STATES_W_ZERO.read_text()
+ )
# Clean up
- os.remove(f"{filename}.txt")
- os.remove(f"{filename}_w_zero.txt")
+ output_file.unlink()
+ output_file_w_zero.unlink()
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def test_oxidation_states_write(self): | |
threshold = 1000 | |
filename = "test_ox_states" | |
filename_w_zero = "test_ox_states_w_zero" | |
comment = "Testing writing of ICSD 24 oxidation states list." | |
self.ox_filter.write(filename, threshold, comment=comment) | |
self.ox_filter.write( | |
filename_w_zero, threshold, include_zero=True, comment=comment | |
) | |
self.assertTrue(os.path.exists(f"{filename}.txt")) | |
assert [line for line in open(f"{filename}.txt")] == [ | |
line for line in open(TEST_ICSD_OX_STATES) | |
] | |
self.assertTrue(os.path.exists(f"{filename}_w_zero.txt")) | |
assert [line for line in open(f"{filename}_w_zero.txt")] == [ | |
line for line in open(TEST_ICSD_OX_STATES_W_ZERO) | |
] | |
# Clean up | |
os.remove(f"{filename}.txt") | |
os.remove(f"{filename}_w_zero.txt") | |
def test_oxidation_states_write(self): | |
threshold = 1000 | |
filename = "test_ox_states" | |
filename_w_zero = "test_ox_states_w_zero" | |
comment = "Testing writing of ICSD 24 oxidation states list." | |
self.ox_filter.write(filename, threshold, comment=comment) | |
self.ox_filter.write( | |
filename_w_zero, threshold, include_zero=True, comment=comment | |
) | |
output_file = Path(f"{filename}.txt") | |
output_file_w_zero = Path(f"{filename}_w_zero.txt") | |
self.assertTrue(output_file.exists()) | |
self.assertEqual(output_file.read_text(), TEST_ICSD_OX_STATES.read_text()) | |
self.assertTrue(output_file_w_zero.exists()) | |
self.assertEqual( | |
output_file_w_zero.read_text(), | |
TEST_ICSD_OX_STATES_W_ZERO.read_text() | |
) | |
# Clean up | |
output_file.unlink() | |
output_file_w_zero.unlink() |
🧰 Tools
🪛 Ruff
137-137: Unnecessary list
comprehension (rewrite using list()
)
Rewrite using list()
(C416)
137-137: Use a context manager for opening files
(SIM115)
137-139: Unnecessary list
comprehension (rewrite using list()
)
Rewrite using list()
(C416)
138-138: Use a context manager for opening files
(SIM115)
141-141: Unnecessary list
comprehension (rewrite using list()
)
Rewrite using list()
(C416)
141-141: Use a context manager for opening files
(SIM115)
141-143: Unnecessary list
comprehension (rewrite using list()
)
Rewrite using list()
(C416)
142-142: Use a context manager for opening files
(SIM115)
@@ -324,7 +380,7 @@ def lookup_element_hhis(symbol): | |||
if _element_hhis is None: | |||
_element_hhis = {} | |||
|
|||
with open(os.path.join(data_directory, "HHIs.txt")) as file: | |||
with open(os.path.join(data_directory, "hhi.txt")) as file: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Breaking change: HHI file renamed without backward compatibility
The git history confirms this is a breaking change where HHIs.txt
was renamed to hhi.txt
. This change will affect existing installations as:
- The file has been consistently named
HHIs.txt
throughout the project's history - Multiple historical references show active usage of
HHIs.txt
- No fallback mechanism exists to handle the old filename
To maintain compatibility, consider either:
- Keeping the original filename
HHIs.txt
- Adding a fallback mechanism to check for both filenames
- Documenting this as a breaking change in release notes
🔗 Analysis chain
Verify the HHI filename change impact
The change from "HHIs.txt" to "hhi.txt" is a breaking change that could affect existing installations.
Let's verify if the new file exists and if there are any other references to the old filename:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify HHI filename change impact
# Check if the new file exists
if [ ! -f "smact/data/hhi.txt" ]; then
echo "WARNING: New HHI file 'hhi.txt' not found"
fi
# Search for any remaining references to the old filename
rg -i "HHIs?.txt"
Length of output: 144
Script:
#!/bin/bash
# Let's check git history to see if this is a new change and if the old file existed
git log -p --all | rg -i "HHIs?.txt"
# Also check if there are any references to HHI in tests or documentation
rg -i "hhi" -g "!*.txt"
Length of output: 2235
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #329 +/- ##
===========================================
+ Coverage 72.42% 73.56% +1.14%
===========================================
Files 30 31 +1
Lines 2415 2531 +116
===========================================
+ Hits 1749 1862 +113
- Misses 666 669 +3 ☔ View full report in Codecov by Sentry. 🚨 Try these New Features:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (5)
smact/structure_prediction/utilities.py (1)
Line range hint
46-72
: Add examples for the new include_one parameterThe function implementation looks good, but the docstring examples should demonstrate the behaviour of the new
include_one
parameter.Add these examples to the docstring:
Examples: -------- >>> unparse_spec(("Fe", 2)) 'Fe2+' >>> unparse_spec(("O", -2)) 'O2-' + >>> unparse_spec(("Na", 1)) + 'Na1+' + >>> unparse_spec(("Na", 1), include_one=False) + 'Na+'smact/tests/test_utils.py (4)
104-111
: Add error handling for test file operations.The setUp method should handle cases where test files might be missing or inaccessible.
def setUp(self): self.ox_filter = ICSD24OxStatesFilter() - with open(TEST_ICSD_OX_STATES) as f: - self.test_ox_states = f.read() - with open(TEST_ICSD_OX_STATES_W_ZERO) as f: - self.test_ox_states_w_zero = f.read() + try: + with open(TEST_ICSD_OX_STATES) as f: + self.test_ox_states = f.read() + with open(TEST_ICSD_OX_STATES_W_ZERO) as f: + self.test_ox_states_w_zero = f.read() + except FileNotFoundError as e: + raise unittest.SkipTest(f"Required test file not found: {e.filename}")
119-119
: Remove or complete the commented assertion.This incomplete assertion should either be removed or completed with proper test logic.
121-138
: Improve file cleanup reliability.The cleanup section should handle cases where file deletion might fail and use context manager pattern for better resource management.
def test_oxidation_states_write(self): + test_files = [] threshold = 1000 filename = "test_ox_states" filename_w_zero = "test_ox_states_w_zero" comment = "Testing writing of ICSD 24 oxidation states list." - self.ox_filter.write(filename, threshold, comment=comment) - self.ox_filter.write(filename_w_zero, threshold, include_zero=True, comment=comment) - self.assertTrue(os.path.exists(f"{filename}.txt")) - with open(f"{filename}.txt") as f: - self.assertEqual(f.read(), self.test_ox_states) + try: + self.ox_filter.write(filename, threshold, comment=comment) + test_files.append(f"{filename}.txt") + self.ox_filter.write(filename_w_zero, threshold, include_zero=True, comment=comment) + test_files.append(f"{filename}_w_zero.txt") - self.assertTrue(os.path.exists(f"{filename}_w_zero.txt")) - with open(f"{filename}_w_zero.txt") as f: - self.assertEqual(f.read(), self.test_ox_states_w_zero) - # Clean up - os.remove(f"{filename}.txt") - os.remove(f"{filename}_w_zero.txt") + self.assertTrue(os.path.exists(test_files[0])) + with open(test_files[0]) as f: + self.assertEqual(f.read(), self.test_ox_states) + + self.assertTrue(os.path.exists(test_files[1])) + with open(test_files[1]) as f: + self.assertEqual(f.read(), self.test_ox_states_w_zero) + finally: + # Clean up + for file in test_files: + try: + if os.path.exists(file): + os.remove(file) + except OSError as e: + print(f"Warning: Failed to remove test file {file}: {e}")
139-154
: Consider making test values more maintainable.The hard-coded test values (e.g., lengths and specific counts) could make tests brittle. Consider moving these to class-level constants or loading from test data files.
class OxidationStatesTest(unittest.TestCase): + # Test constants + THRESHOLD_LENGTHS = { + 0: 490, + 5: 358, + 50: 227 + } + EXPECTED_FIRST_SPECIES = "O2-" + EXPECTED_FIRST_COUNT = 116910 + def test_oxidation_states_filter_species_list(self): - for threshold, length in [(0, 490), (5, 358), (50, 227)]: + for threshold, length in self.THRESHOLD_LENGTHS.items(): species_list = self.ox_filter.get_species_list(threshold) self.assertIsInstance(species_list, list) self.assertEqual(len(species_list), length)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (6)
paper.bib
(6 hunks)smact/__init__.py
(2 hunks)smact/data_loader.py
(2 hunks)smact/structure_prediction/utilities.py
(4 hunks)smact/tests/test_utils.py
(2 hunks)smact/utils/oxidation.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- paper.bib
- smact/utils/oxidation.py
🧰 Additional context used
🪛 GitHub Check: codecov/patch
smact/data_loader.py
[warning] 339-339: smact/data_loader.py#L339
Added line #L339 was not covered by tests
[warning] 341-343: smact/data_loader.py#L341-L343
Added lines #L341 - L343 were not covered by tests
smact/structure_prediction/utilities.py
[warning] 11-11: smact/structure_prediction/utilities.py#L11
Added line #L11 was not covered by tests
🔇 Additional comments (7)
smact/structure_prediction/utilities.py (2)
11-11
: LGTM! Import moved to type-checking block
The Structure import has been correctly moved to the type-checking block, which optimises runtime performance.
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 11-11: smact/structure_prediction/utilities.py#L11
Added line #L11 was not covered by tests
98-99
: LGTM! Type annotation simplified
The return type annotation has been appropriately simplified whilst maintaining type safety through the TYPE_CHECKING import.
smact/tests/test_utils.py (1)
6-13
: LGTM! Import ordering and file path setup look good.
The imports are properly organised and the test file paths are defined using platform-independent path joining.
Also applies to: 99-101
smact/__init__.py (2)
73-74
: Documentation looks good!
The documentation for the new oxidation_states_icsd24
attribute follows the established pattern and clearly indicates the data source and version.
200-203
: Verify the ICSD24 implementation completeness
Whilst the attribute initialisation follows the correct pattern, we need to verify that the supporting components are in place.
✅ Verification successful
Let me gather more information about the data loader implementation.
Let me check the content of the data files and the complete implementation.
ICSD24 implementation is complete and properly supported
The implementation is fully supported with:
- A properly implemented data loader function with documentation and error handling
- Required data files present in the correct location:
smact/data/oxidation_states_icsd24_filtered.txt
(main data file)- Additional supporting files for raw data and counts
- Correct data format in the filtered file, following the established pattern
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if the data loader function is implemented
ast-grep --pattern 'def lookup_element_oxidation_states_icsd24($_) {
$$$
}'
# Check if the required data file exists
fd "oxidation_states_icsd24.txt"
# Check error handling in the data loader
rg "lookup_element_oxidation_states_icsd24" smact/data_loader.py
Length of output: 240
Script:
#!/bin/bash
# Check the implementation details of the data loader function
rg -A 10 "lookup_element_oxidation_states_icsd24" smact/data_loader.py
# Check if there's a data file in the data directory
fd -g "*icsd24*" smact/data/
# Check for any test coverage
rg "lookup_element_oxidation_states_icsd24" -g "test_*.py"
Length of output: 900
Script:
#!/bin/bash
# Check the complete implementation of the data loader function
rg -A 20 "lookup_element_oxidation_states_icsd24" smact/data_loader.py
# Check the content format of the filtered data file
head -n 5 smact/data/oxidation_states_icsd24_filtered.txt
# Check if the function is properly loading the data file
rg "oxidation_states_icsd24" smact/data_loader.py
Length of output: 1283
smact/data_loader.py (2)
372-372
: Skip comment about HHI filename.
This change has already been thoroughly addressed in previous reviews.
301-343
: LGTM! The implementation follows established patterns.
The new function lookup_element_oxidation_states_icsd24
is well-documented and follows the consistent pattern established by other oxidation state lookup functions in the module.
Let's verify the existence of the new data file:
✅ Verification successful
Implementation verified and data file present
The new function lookup_element_oxidation_states_icsd24
is properly implemented with its required data file oxidation_states_icsd24_filtered.txt
present in the correct location at smact/data/
. The implementation follows the established pattern and the data file exists alongside other oxidation state datasets, indicating a well-organised addition to the codebase.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the new ICSD24 data file exists
if [ ! -f "smact/data/oxidation_states_icsd24_filtered.txt" ]; then
echo "ERROR: Required data file 'oxidation_states_icsd24_filtered.txt' not found"
fi
Length of output: 64
Script:
#!/bin/bash
# Let's verify the data file location and content more thoroughly
echo "Searching for the ICSD24 data file:"
fd "oxidation_states_icsd24_filtered.txt"
echo -e "\nChecking data directory contents:"
ls -la smact/data/
echo -e "\nChecking if the file exists in any other locations:"
rg -l "oxidation_states_icsd24_filtered.txt"
Length of output: 2413
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 339-339: smact/data_loader.py#L339
Added line #L339 was not covered by tests
[warning] 341-343: smact/data_loader.py#L341-L343
Added lines #L341 - L343 were not covered by tests
return _el_ox_states_icsd24[symbol] | ||
else: | ||
if _print_warnings: | ||
print(f"WARNING: Oxidation states for element {symbol} " "not found.") | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add test coverage for error handling.
The error handling path for non-existent elements lacks test coverage. Consider adding test cases to verify the behaviour when:
- The element symbol doesn't exist in the dataset
- The warning message is printed when
_print_warnings
is enabled
Would you like me to help generate the test cases for these scenarios?
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 339-339: smact/data_loader.py#L339
Added line #L339 was not covered by tests
[warning] 341-343: smact/data_loader.py#L341-L343
Added lines #L341 - L343 were not covered by tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you generate the test cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AntObi, certainly! Here are the test cases covering the error handling for the lookup_element_oxidation_states_icsd24
function.
You can add these test methods to your test suite, possibly in smact/tests/test_data_loader.py
:
import unittest
from smact.data_loader import lookup_element_oxidation_states_icsd24, set_warnings
import io
import sys
class DataLoaderTest(unittest.TestCase):
def test_lookup_nonexistent_element(self):
# Test that the function returns None for a nonexistent element
result = lookup_element_oxidation_states_icsd24('Xx')
self.assertIsNone(result)
def test_warning_message_printed(self):
# Capture the output
captured_output = io.StringIO()
sys.stdout = captured_output
# Enable warnings
set_warnings(True)
# Invoke the function with a nonexistent element
lookup_result = lookup_element_oxidation_states_icsd24('Xx')
# Disable warnings
set_warnings(False)
# Restore stdout
sys.stdout = sys.__stdout__
# Check if the correct warning message was printed
self.assertIn('WARNING: Oxidation states for element Xx not found.', captured_output.getvalue())
# Ensure that the function returns None
self.assertIsNone(lookup_result)
Explanation:
test_lookup_nonexistent_element
: This test verifies that the function returnsNone
when an element symbol does not exist in the dataset.test_warning_message_printed
: This test enables the_print_warnings
flag usingset_warnings(True)
, captures the stdout, and checks if the appropriate warning message is printed when a nonexistent element is queried. It also ensures that the function returnsNone
in this case.
These tests ensure that the error handling in your new function works as expected.
Let me know if you need any further assistance!
Add new oxidation states to SMACT
Description
Type of change
How Has This Been Tested?
Test Configuration:
Reviewers
N/A
Checklist:
Summary by CodeRabbit
New Features
Bug Fixes
.bib
file.Documentation