Oxidation States 24 #329

AntObi · 2024-11-19T23:24:43Z

Add new oxidation states to SMACT

Description

This PR adds data-mined oxidation states from the ICSD (09.24)
Adds utility functions to generate a new oxidation states list from the data-mined counts of ICSD oxidation states

Type of change

New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)

How Has This Been Tested?

Test Configuration:

Python version: 3.10
Operating System: macOS

Reviewers

N/A

Checklist:

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
Any dependent changes have been merged and published in downstream modules
I have checked my code and corrected any misspellings

Summary by CodeRabbit

New Features
- Introduced multiple new data files containing comprehensive lists of oxidation states for various elements, sourced from the ICSD and Pymatgen.
- Added a new class for managing and filtering oxidation states, allowing users to filter data based on specified thresholds.
- Enhanced existing attributes in the Element class to reflect updated oxidation state data.
Bug Fixes
- Standardised formatting and consistency across bibliographic entries in the .bib file.
Documentation
- Updated header comments in data files to provide context and sources for the oxidation state data.

Added comment lines on dataset origins

Populated from ICSD API - raw count json - raw oxidation state list - filtered list with > 5 entries per oxidation state

Add set that lists the oxidation state with the most results for each element

Final list up to Lr

…oader

…o oxidationstates24

Regenerated ICSD24 dataset on oxidation states. API had returned 0 values for some species, whilst the web version showed results. Overcome by restarting the session for each search. "oxidation_states_icsd24_filtered" is recommended as the default

coderabbitai · 2024-11-19T23:24:52Z

Walkthrough

The pull request introduces several modifications across multiple files, primarily focusing on the standardisation and formatting of bibliographic entries in a .bib file, updates to oxidation state data files, and enhancements to the Element class in the smact module. Key changes include the addition of new data files for oxidation states, updates to existing attributes, and the introduction of a new class for filtering oxidation states. Additionally, several test files are added to ensure the functionality of these updates.

Changes

File Path	Change Summary
`paper.bib`	Modified bibliographic entries for formatting and standardisation of journal names and fields.
`smact/__init__.py`	Updated `Element` class: modified `oxidation_states_icsd` attribute and added `oxidation_states_icsd24`.
`smact/data/oxidation_states.txt`	Added header comment indicating manual compilation in 2014.
`smact/data/oxidation_states_SP.txt`	New file added containing oxidation states from the Pymatgen structure predictor (2017).
`smact/data/oxidation_states_icsd.txt`	New file added with oxidation states sourced from ICSD (2016).
`smact/data/oxidation_states_icsd24_common.txt`	New file added with common oxidation states from ICSD (2024).
`smact/data/oxidation_states_icsd24_filtered.txt`	New file added with filtered oxidation states from ICSD (2024).
`smact/data/oxidation_states_icsd24_raw.txt`	New file added with raw oxidation states from ICSD (2024).
`smact/data/oxidation_states_pmg.txt`	New file added with oxidation states sourced from Pymatgen (2017).
`smact/data/oxidation_states_wiki.txt`	Added header comment indicating Wikipedia as the source (2022).
`smact/data/oxidationstates.data`	Deleted file containing a list of oxidation states.
`smact/data_loader.py`	Added `lookup_element_oxidation_states_icsd24` function and modified `lookup_element_hhis` function.
`smact/structure_prediction/utilities.py`	Updated `unparse_spec` and `convert_next_gen_mprest_data` function signatures and logic.
`smact/tests/files/test_icsd_oxidation_states_filter_1000.txt`	New test file added for validating oxidation states filtering.
`smact/tests/files/test_icsd_oxidation_states_filter_1000_w_0_ox_state.txt`	New test file added for validating oxidation states with zero state.
`smact/tests/test_utils.py`	New file added containing unit tests for `ICSD24OxStatesFilter` class.
`smact/utils/oxidation.py`	New class `ICSD24OxStatesFilter` added for managing oxidation states filtering and processing.

Possibly related PRs

Add oxidation states option to SMACT validity function #282: Introduces a new parameter for selecting oxidation states in the smact_validity function, related to the standardisation in this PR.
Develop branch updates in preparation for new release #308: Updates handling of oxidation states in smact_filter and smact_validity functions, aligning with the focus on consistency in this PR.

Suggested labels

enhancement, feature, WIP, docs, python, refactor, dependencies

🐇 In the garden of data, we hop and play,
With oxidation states, we brighten the way.
Files bloom with knowledge, all neatly aligned,
In this patch of joy, new insights we find.
So let’s leap through the code, with a joyful cheer,
For the changes we’ve made, let’s celebrate here! 🌼

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai or @coderabbitai title anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (18)

smact/utils/oxidation.py (10)
2-3: Simplify imports from the 'os' module.

Importing both os and from os import path is redundant. Consider importing only the necessary components to improve code clarity.

Apply this diff to streamline the imports:
-import os
-from os import path
+import os
And update path.join(...) to os.path.join(...) throughout the code.

22-24: Add error handling for data loading in __init__.

Currently, the code does not handle potential errors when loading the JSON file. Adding error handling will make the class more robust against missing or corrupt data files.

Consider wrapping the data loading in a try-except block:
try:
    self.ox_states_df = pd.read_json(
        os.path.join(data_directory, "oxidation_states_icsd24_counts.json")
    )
except (FileNotFoundError, ValueError) as e:
    raise RuntimeError("Failed to load oxidation states data.") from e
62-77: Add return type annotation to get_species_list method.

Including a return type annotation improves code readability and helps with static analysis tools.

Apply this diff to add the return type:
 def get_species_list(
     self,
     threshold: int,
     include_zero: bool = False,
     include_one_oxidation_state: bool = False,
 ):
+    -> list[str]:
94-104: Specify return type in docstring and function signature.

The return type in the docstring is vague. Specify the return type as pd.DataFrame for clarity, and add a return type annotation to the function signature.

Update the docstring and signature:
     ):
+    -> pd.DataFrame:
         """Get the ICSD 24 oxidation states list as a DataFrame of species with their occurrences.

         Args:
             include_one_oxidation_state (bool): Include oxidation states +1 and -1 as ± in species notation. Default is False.
         Returns:
-            dataframe: The species list as a dataframe of species with their occurrences.
+            pd.DataFrame: A DataFrame containing species and their occurrence counts.
         """
106-109: Include zero oxidation states based on include_zero parameter.

In the get_species_occurrences_df method, zero oxidation states are always excluded. Consider adding an include_zero parameter to control this behaviour for consistency with other methods.

126-130: Use Optional for parameters that can be None.

The parameter comment can be None. For clarity, annotate it using Optional[str].

Apply this diff:
     filename: str | os.PathLike,
     threshold: int,
     include_zero: bool = False,
-    comment: str = None,
+    comment: Optional[str] = None,
 ):
Don't forget to import Optional from the typing module:
+from typing import Optional
158-168: Specify file encoding when writing to the file.

To ensure consistent behaviour across different environments, specify the file encoding when opening the file.

Apply this diff:
-with open(filename, "w") as f:
+with open(filename, "w", encoding="utf-8") as f:
169-171: Correct type annotation for group parameter in _filter_oxidation_states.

The type annotation for group should be pd.DataFrame instead of pd.DataFrame.groupby.

Apply this diff:
 def _filter_oxidation_states(
-    self, group: pd.DataFrame.groupby, threshold: int
+    self, group: pd.DataFrame, threshold: int
 ):
83-91: Handle potential non-integer oxidation states gracefully.

If ox_state cannot be converted to an integer, it is silently ignored. Consider logging a warning to aid in debugging unexpected data.

12-18: Revise class docstring to follow standard conventions.

Including an Attributes section in the class docstring is unconventional. Attribute documentation is typically placed in the __init__ method's docstring or as class variable docstrings.

Consider restructuring the docstring:
 """Class to handle filtering the ICSD 24 oxidation states list.

-The ICSD 24 oxidation states list is a list of oxidation states for each element in the ICSD 24 database.

-Attributes:
-    ox_states_df (pd.DataFrame): The ICSD 24 oxidation states list as a DataFrame.
 """
+    The ICSD 24 oxidation states list contains oxidation states for each element from the ICSD 24 database.
And add a docstring to __init__ if detailed attribute documentation is necessary.
smact/structure_prediction/utilities.py (2)
Line range hint 39-61: Enhance docstring with examples of the new parameter

The implementation looks good, but the docstring could be more comprehensive.

Consider adding examples demonstrating the include_one parameter:
     Examples:
         >>> unparse_spec(("Fe", 2))
         'Fe2+'
         >>> unparse_spec(("O", -2))
         'O2-'
+        >>> unparse_spec(("Na", 1))
+        'Na1+'
+        >>> unparse_spec(("Na", 1), include_one=False)
+        'Na+'
84-84: Consider enhancing error handling

The type hint update is good, but the error handling could be more specific.

Consider improving the error handling:
     try:
         return doc.dict(exclude={"fields_not_requested"})
-    except:
-        logger.warn(f"Could not convert input:\n {doc}\n to a dictionary.")
-        raise TypeError("Input is not an MPDataDoc object.")
+    except AttributeError as e:
+        logger.warning(f"Could not convert input:\n {doc}\n to a dictionary. Error: {e}")
+        raise TypeError("Input is not a valid MPDataDoc object") from e
+    except Exception as e:
+        logger.error(f"Unexpected error while converting document: {e}")
+        raise
Also applies to: 93-97
smact/tests/test_utils.py (5)
5-5: Remove unused import

The pytest import is not used in this test file. Since the tests are using the unittest framework, this import can be safely removed.
-import pytest
🧰 Tools

🪛 Ruff

5-5: pytest imported but unused

Remove unused import: pytest

(F401)

19-24: Use context managers for file operations

Whilst the setup is well-structured, it's recommended to use context managers for file operations to ensure proper resource handling.
 def setUp(self):
     self.ox_filter = ICSD24OxStatesFilter()
-    with open(TEST_ICSD_OX_STATES) as f:
-        self.test_ox_states = f.read()
-    with open(TEST_ICSD_OX_STATES_W_ZERO) as f:
-        self.test_ox_states_w_zero = f.read()
+    self.test_ox_states = TEST_ICSD_OX_STATES.read_text()
+    self.test_ox_states_w_zero = TEST_ICSD_OX_STATES_W_ZERO.read_text()
35-35: Complete or remove the commented assertion

There's an incomplete assertion that's been commented out. Either complete the assertion with the expected value or remove it if no longer needed.

47-53: Improve file comparison logic

The current file comparison could be improved by using context managers and a more efficient comparison method.
-        assert [line for line in open(f"{filename}.txt")] == [
-            line for line in open(TEST_ICSD_OX_STATES)
-        ]
-        self.assertTrue(os.path.exists(f"{filename}_w_zero.txt"))
-        assert [line for line in open(f"{filename}_w_zero.txt")] == [
-            line for line in open(TEST_ICSD_OX_STATES_W_ZERO)
-        ]
+        with open(f"{filename}.txt") as f1, open(TEST_ICSD_OX_STATES) as f2:
+            self.assertEqual(f1.read(), f2.read())
+        with open(f"{filename}_w_zero.txt") as f1, open(TEST_ICSD_OX_STATES_W_ZERO) as f2:
+            self.assertEqual(f1.read(), f2.read())
🧰 Tools

🪛 Ruff

47-47: Use a context manager for opening files

(SIM115)

48-48: Use a context manager for opening files

(SIM115)

51-51: Use a context manager for opening files

(SIM115)

52-52: Use a context manager for opening files

(SIM115)

59-62: Document the significance of threshold values

The test uses specific threshold values (0, 5, 50) and expects specific list lengths (490, 358, 227). Consider adding a comment explaining the significance of these values.
+        # Test various thresholds and their impact on the number of species:
+        # 0: All species (490)
+        # 5: Common species (358)
+        # 50: Very common species (227)
         for threshold, length in [(0, 490), (5, 358), (50, 227)]:
smact/__init__.py (1)
57-57: Maintain consistent documentation format

The documentation for oxidation states attributes is clear, but could be more consistent:

oxidation_states_icsd specifies "2016 version"

oxidation_states_icsd24 uses "2024 version"

Consider standardising the format:
-        Element.oxidation_states_icsd (list) : List of oxidation states that appear in the 2016 version of the ICSD
+        Element.oxidation_states_icsd (list): List of oxidation states from ICSD 2016
-        Element.oxidation_states_icsd24 (list): List of oxidation states that appear in the 2024 version of the ICSD
+        Element.oxidation_states_icsd24 (list): List of oxidation states from ICSD 2024
Also applies to: 63-64

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 1636ad7 and 9125edb.

📒 Files selected for processing (17)

paper.bib (6 hunks)
smact/__init__.py (3 hunks)
smact/data/oxidation_states.txt (1 hunks)
smact/data/oxidation_states_SP.txt (1 hunks)
smact/data/oxidation_states_icsd.txt (1 hunks)
smact/data/oxidation_states_icsd24_common.txt (1 hunks)
smact/data/oxidation_states_icsd24_filtered.txt (1 hunks)
smact/data/oxidation_states_icsd24_raw.txt (1 hunks)
smact/data/oxidation_states_pmg.txt (1 hunks)
smact/data/oxidation_states_wiki.txt (1 hunks)
smact/data/oxidationstates.data (0 hunks)
smact/data_loader.py (2 hunks)
smact/structure_prediction/utilities.py (4 hunks)
smact/tests/files/test_icsd_oxidation_states_filter_1000.txt (1 hunks)
smact/tests/files/test_icsd_oxidation_states_filter_1000_w_0_ox_state.txt (1 hunks)
smact/tests/test_utils.py (1 hunks)
smact/utils/oxidation.py (1 hunks)

💤 Files with no reviewable changes (1)

smact/data/oxidationstates.data

✅ Files skipped from review due to trivial changes (7)

paper.bib
smact/data/oxidation_states.txt
smact/data/oxidation_states_SP.txt
smact/data/oxidation_states_icsd.txt
smact/data/oxidation_states_icsd24_filtered.txt
smact/data/oxidation_states_pmg.txt
smact/data/oxidation_states_wiki.txt

🧰 Additional context used

🪛 LanguageTool

smact/data/oxidation_states_icsd24_common.txt

[grammar] ~7-~7: Només va sense accent quan és un animal o un nom de lletra. Escriviu «bé» (adverbi) o «ve» (del v. ‘venir’).
Context: ...st common non-zero values # H 1 He Li 1 Be 2 B 3 C 4 N -3 O -2 F -1 Ne Na 1 Mg 2 A...

(BE)

[grammar] ~36-~36: Possible confusió. ¿Volíeu dir «als»?
Context: ...Mn 2 Fe 3 Co 2 Ni 2 Cu 2 Zn 2 Ga 3 Ge 4 As 5 Se -2 Br -1 Kr 2 Rb 1 Sr 2 Y 3 Zr 4 N...

(AS_ALS)

[grammar] ~42-~42: Si és conjunció, cal escriure «I».
Context: ... 3 Ge 4 As 5 Se -2 Br -1 Kr 2 Rb 1 Sr 2 Y 3 Zr 4 Nb 5 Mo 6 Tc 7 Ru 4 Rh 3 Pd 2 Ag...

(Y_CONJUNCIO)

[grammar] ~70-~70: ¿Volíeu dir «o» (conjunció) en comptes de «ho» (pronom)?
Context: ...Pr 3 Nd 3 Pm 3 Sm 3 Eu 3 Gd 3 Tb 3 Dy 3 Ho 3 Er 3 Tm 3 Yb 3 Lu 3 Hf 4 Ta 5 W 6 Re ...

(FINS_I_TOT)

[grammar] ~80-~80: Paraula desconeguda com a nom propi.
Context: ... Tm 3 Yb 3 Lu 3 Hf 4 Ta 5 W 6 Re 7 Os 5 Ir 4 Pt 4 Au 1 Hg 2 Tl 1 Pb 2 Bi 3 Po 4 At...

(VERBS_NOMSPROPIS)

[grammar] ~102-~102: S’accentua quan és del v. ‘ser’.
Context: ... Pa 5 U 6 Np 6 Pu 3 Am 3 Cm 3 Bk 3 Cf 3 Es 3 Fm Md No Lr

(ES)

smact/data/oxidation_states_icsd24_raw.txt

[misspelling] ~3-~3: Söz ve sayı arasında defis yoqtur: "V-2"
Context: ...idation state set # Source: ICSD (2024, V2) # H -1 1 2 3 He Li 1 Be 2 B -5 -3 -2 -...

(NUMBER_BEFORE_DEFIS_MISSING)

smact/tests/files/test_icsd_oxidation_states_filter_1000.txt

[style] ~3-~3: Forma secundària o antiga.
Context: ...ate set # Source: ICSD (2024), filtered for > 1000 reports # # Testing writing of I...

(FOR)

[grammar] ~9-~9: Només va sense accent quan és un animal o un nom de lletra. Escriviu «bé» (adverbi) o «ve» (del v. ‘venir’).
Context: ...oxidation states list. # H -1 1 He Li 1 Be 2 B 3 C -2 2 3 4 N -3 5 O -2 F -1 Ne Na...

(BE)

[grammar] ~44-~44: Si és conjunció, cal escriure «I».
Context: ... 4 As -3 3 5 Se -2 4 Br -1 Kr Rb 1 Sr 2 Y 3 Zr 4 Nb 5 Mo 6 Tc Ru Rh Pd Ag 1 Cd 2 ...

(Y_CONJUNCIO)

[grammar] ~72-~72: ¿Volíeu dir «o» (conjunció) en comptes de «ho» (pronom)?
Context: ...4 Pr 3 Nd 3 Pm Sm 3 Eu 3 Gd 3 Tb 3 Dy 3 Ho 3 Er 3 Tm Yb 3 Lu Hf Ta 5 W 6 Re Os Ir ...

(FINS_I_TOT)

[grammar] ~81-~81: ¿Volíeu dir «us»?
Context: ...y 3 Ho 3 Er 3 Tm Yb 3 Lu Hf Ta 5 W 6 Re Os Ir Pt Au Hg 2 Tl 1 Pb 2 Bi 3 Po At Rn F...

(OS_US)

[grammar] ~82-~82: Paraula desconeguda com a nom propi.
Context: ... Ho 3 Er 3 Tm Yb 3 Lu Hf Ta 5 W 6 Re Os Ir Pt Au Hg 2 Tl 1 Pb 2 Bi 3 Po At Rn Fr R...

(VERBS_NOMSPROPIS)

[grammar] ~100-~100: ¿Volíeu dir «amb»?
Context: ... Bi 3 Po At Rn Fr Ra Ac Th Pa U 6 Np Pu Am Cm Bk Cf Es Fm Md No Lr

(AM_AMB)

[grammar] ~104-~104: S’accentua quan és del v. ‘ser’.
Context: ...Rn Fr Ra Ac Th Pa U 6 Np Pu Am Cm Bk Cf Es Fm Md No Lr

(ES)

smact/tests/files/test_icsd_oxidation_states_filter_1000_w_0_ox_state.txt

[misspelling] ~32-~32: ‘Mn’ zou fout kunnen zijn. Misschien bedoelt u: “Mijn”, “M'n”, “M.n.”, “Mangaan”
Context: ...1 Ca 0 2 Sc 0 3 Ti 0 4 V 0 3 4 5 Cr 0 3 Mn 0 2 3 4 Fe 0 2 3 Co 0 2 3 Ni 0 2 Cu 0 1...

(NL_SIMPLE_REPLACE_MN)

[misspelling] ~36-~36: ‘Cu’ zou fout kunnen zijn. Misschien bedoelt u: “koper”, “CU”, “see you”
Context: ...0 3 Mn 0 2 3 4 Fe 0 2 3 Co 0 2 3 Ni 0 2 Cu 0 1 2 Zn 0 2 Ga 0 3 Ge 0 4 As -3 0 3 5 ...

(NL_SIMPLE_REPLACE_CU)

[style] ~48-~48: Nota bene wordt afgekort als “N.B.”
Context: ...2 0 4 Br -1 Kr Rb 1 Sr 0 2 Y 0 3 Zr 0 4 Nb 0 5 Mo 0 6 Tc Ru 0 Rh 0 Pd 0 Ag 0 1 Cd ...

(NB_2)

[uncategorized] ~84-~84: Vergeet de punt niet als het een afkorting is: “Ir.” Schrijf het liever voluit.
Context: ... 3 Tm 0 Yb 0 3 Lu Hf 0 Ta 0 5 W 6 Re Os Ir 0 Pt 0 Au 0 Hg 2 Tl 1 Pb 2 Bi 0 3 Po At...

(AFKO_PUNT)

[uncategorized] ~94-~94: Vergeet de punt niet als het een afkorting is: “Fr.” Schrijf het liever voluit.
Context: ...t 0 Au 0 Hg 2 Tl 1 Pb 2 Bi 0 3 Po At Rn Fr Ra Ac Th Pa U 0 6 Np Pu Am Cm Bk Cf Es ...

(AFKO_PUNT)

🪛 Ruff

smact/tests/test_utils.py

5-5: pytest imported but unused

Remove unused import: pytest

(F401)

47-47: Use a context manager for opening files

(SIM115)

48-48: Use a context manager for opening files

(SIM115)

51-51: Use a context manager for opening files

(SIM115)

52-52: Use a context manager for opening files

(SIM115)

🔇 Additional comments (5)

smact/structure_prediction/utilities.py (1)

4-4: LGTM: Clean and specific imports

The imports are well-structured, using specific imports rather than wildcards, which is good practice for maintainability and clarity.

Also applies to: 6-6

smact/tests/test_utils.py (2)

9-15: LGTM! Well-structured file path handling

The test file paths are correctly constructed using os.path.join and relative path resolution, ensuring cross-platform compatibility.

64-75: LGTM! Comprehensive DataFrame validation

The test thoroughly validates the structure and content of the species occurrences DataFrame, including shape, column names, and specific value checks.

smact/data_loader.py (2)

325-378: Implementation looks good, suggest enhancing documentation.

The implementation follows the established pattern and correctly implements caching. Consider enhancing the docstring with:

The specific ICSD version/date used for data mining
A reference to documentation explaining the filtering criteria

 def lookup_element_oxidation_states_icsd24(symbol, copy=True):
     """
     Retrieve a list of known oxidation states for an element.
     The oxidation states list used contains only those found
-    in the 2024 version of the ICSD (and has >=5 reports).
+    in the ICSD version 09.24 (September 2024) and has >=5 reports.
+
+    For details on the filtering criteria and data mining methodology,
+    see the documentation at docs/oxidation_states.md

✅ Verification successful

Oxidation states data file verified and well-formatted

The data file exists at the expected location and follows a clean format with proper header documentation. The file header confirms it's sourced from ICSD 2024 V2 with the filtering criteria (>5 reports) matching the function's documentation.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify the existence of the data file and its format
if [ -f "smact/data/oxidation_states_icsd24_filtered.txt" ]; then
    echo "File exists. Checking format..."
    head -n 5 "smact/data/oxidation_states_icsd24_filtered.txt"
else
    echo "Error: Required data file not found!"
fi

Length of output: 278

405-405: Verify breaking change documentation for HHI file rename.

The renaming from "HHIs.txt" to "hhi.txt" follows better naming conventions but constitutes a breaking change.

coderabbitai · 2024-11-19T23:40:36Z

smact/utils/oxidation.py

+        if not filename.endswith(".txt"):
+            filename += ".txt"


⚠️ Potential issue

Ensure filename handles PathLike objects correctly.

When checking the file extension, if filename is a PathLike object, the .endswith() method may not work as expected.

Convert filename to a string before checking:

-if not filename.endswith(".txt"): +filename = str(filename) +if not filename.endswith(".txt"): filename += ".txt"

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if not filename.endswith(".txt"):

filename += ".txt"

filename = str(filename)

if not filename.endswith(".txt"):

filename += ".txt"

coderabbitai · 2024-11-19T23:40:37Z

smact/utils/oxidation.py

+            filtered_df[(filtered_df["results_count"] > 0)]
+            .groupby("element")
+            .apply(
+                self._filter_oxidation_states, threshold, include_groups=False
+            )
+            .reset_index()
+        )


⚠️ Potential issue

Fix incorrect argument in the apply function call.

The apply method is passing an unexpected keyword argument include_groups=False to _filter_oxidation_states, which does not accept this parameter. This will result in a TypeError.

Apply this diff to remove the invalid argument:

.apply( - self._filter_oxidation_states, threshold, include_groups=False + self._filter_oxidation_states, threshold )

Committable suggestion skipped: line range outside the PR's diff.

coderabbitai · 2024-11-19T23:40:37Z

smact/__init__.py

 import itertools
 import warnings
 from math import gcd
 from operator import mul as multiply
 from os import path
-from typing import Iterable, List, Optional, Sequence, Tuple, Union
+from typing import Iterable, List, Optional, Tuple, Union


💡 Codebase verification

Removal of Sequence import may cause issues

The removal of Sequence from smact/__init__.py could be problematic as it's actively used in structure_prediction/database.py with type hints. Since database.py imports Sequence directly from typing, the removal is safe, but we should maintain consistency in import patterns across the codebase.

structure_prediction/database.py uses Sequence[SmactStructure] type hints

Consider keeping the Sequence import in __init__.py for API consistency

🔗 Analysis chain

Verify removal of Sequence import

The removal of the unused Sequence import is good practice. However, as this is a public API, we should verify that this doesn't affect downstream code that might be importing it from this module.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash # Search for any downstream code that might be importing Sequence from smact rg -l "from smact import.*Sequence" || echo "No direct imports found"

Length of output: 94

Script:

#!/bin/bash # Let's verify if Sequence is used in type hints within the codebase rg "Sequence\[" || echo "No Sequence type hints found" # Also check if it's imported differently rg "from typing import.*Sequence" || echo "No Sequence imports found"

Length of output: 250

coderabbitai · 2024-11-19T23:40:37Z

smact/__init__.py

+            (
+                "oxidation_states_icsd24",
+                data_loader.lookup_element_oxidation_states_icsd24(symbol),
+            ),


💡 Codebase verification

Implementation for ICSD24 oxidation states is missing

The verification reveals critical issues:

The lookup_element_oxidation_states_icsd24 function implementation is not found in the codebase

The required data file oxidation_states_icsd24.txt is missing

Error handling in the data loader is minimal

This change cannot be safely integrated without:

Implementing the missing data loader function

Adding the required data file

Including proper error handling similar to the existing ICSD implementation

🔗 Analysis chain

Verify data loader implementation and error handling

The initialization of the new oxidation states attribute follows the established pattern. However, as this is a breaking change, please ensure:

The lookup_element_oxidation_states_icsd24 function in data_loader handles missing data gracefully

The new attribute doesn't affect existing functionality that might be checking for oxidation states availability

Let me gather more information about the data loader implementation for the new ICSD24 function.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash # Check data loader implementation ast-grep --pattern 'def lookup_element_oxidation_states_icsd24($_) { $$$ }' # Check for any existing code that might be affected rg -A 3 "oxidation_states_icsd\W"

Length of output: 1925

Script:

#!/bin/bash # Check implementation of the new ICSD24 function ast-grep --pattern 'def lookup_element_oxidation_states_icsd24($_) { $$$ }' # Check for the data file fd "oxidation_states_icsd24.txt" # Check for any error handling patterns in data_loader rg -A 3 "try:|except:|raise" smact/data_loader.py

Length of output: 243

coderabbitai

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (4)

smact/structure_prediction/utilities.py (1)

55-56: Remove blank line after Args section

To comply with docstring formatting standards, remove the blank line after the Args section.

Apply this diff:
    Args:
-

        species (tuple[str,int]): A tuple of (element, signed_charge).
🧰 Tools

🪛 Ruff

55-55: No blank lines allowed between a section header and its content ("Args")

Remove blank line(s)

(D412)

smact/tests/test_utils.py (2)

99-105: Consider using pathlib for more robust path handling

Whilst the current path handling works, using pathlib would provide a more modern and cross-platform approach.
-files_dir = os.path.join(os.path.dirname(os.path.realpath(__file__)), "files")
-TEST_ICSD_OX_STATES = os.path.join(
-    files_dir, "test_icsd_oxidation_states_filter_1000.txt"
-)
-TEST_ICSD_OX_STATES_W_ZERO = os.path.join(
-    files_dir, "test_icsd_oxidation_states_filter_1000_w_0_ox_state.txt"
-)
+from pathlib import Path
+
+files_dir = Path(__file__).parent / "files"
+TEST_ICSD_OX_STATES = files_dir / "test_icsd_oxidation_states_filter_1000.txt"
+TEST_ICSD_OX_STATES_W_ZERO = files_dir / "test_icsd_oxidation_states_filter_1000_w_0_ox_state.txt"
116-126: Complete the commented out assertion

There's an incomplete assertion that's been commented out. Either complete the assertion or remove the comment if it's no longer needed.

🧰 Tools

🪛 Ruff

117-117: Undefined name pd

(F821)

121-121: Undefined name pd

(F821)

smact/data_loader.py (1)

304-354: Style improvements needed

Several style improvements can be made:

Remove blank line after docstring
Simplify list comprehension
Use f-string for warning message

Apply these changes:

 def lookup_element_oxidation_states_icsd24(symbol, copy=True):
     """
     Retrieve a list of known oxidation states for an element.
     The oxidation states list used contains only those found
     in the 2024 version of the ICSD (and has >=5 reports).

     Args:
         symbol (str) : the atomic symbol of the element to look up.
         copy (Optional(bool)): if True (default), return a copy of the
             oxidation-state list, rather than a reference to the cached
             data -- only use copy=False in performance-sensitive code
             and where the list will not be modified!

     Returns:
         list: List of known oxidation states for the element.

             Returns None if oxidation states for the Element were not
             found in the external data.
     """
-
     global _el_ox_states_icsd24

     if _el_ox_states_icsd24 is None:
         _el_ox_states_icsd24 = {}

         for items in _get_data_rows(
             os.path.join(data_directory, "oxidation_states_icsd24_filtered.txt")
         ):
             _el_ox_states_icsd24[items[0]] = [
                 int(oxidationState) for oxidationState in items[1:]
             ]

     if symbol in _el_ox_states_icsd24:
         if copy:
             # _el_ox_states_icsd24 stores lists -> if copy is set, make an implicit
             # deep copy.  The elements of the lists are integers, which are
             # "value types" in Python.
-            return [
-                oxidationState
-                for oxidationState in _el_ox_states_icsd24[symbol]
-            ]
+            return list(_el_ox_states_icsd24[symbol])
         else:
             return _el_ox_states_icsd24[symbol]
     else:
         if _print_warnings:
-            print(
-                "WARNING: Oxidation states for element {} "
-                "not found.".format(symbol)
-            )
+            print(f"WARNING: Oxidation states for element {symbol} not found.")
         return None

🧰 Tools

🪛 Ruff

305-322: No blank lines allowed after function docstring (found 1)

Remove blank line(s) after function docstring

(D202)

342-345: Unnecessary list comprehension (rewrite using list())

Rewrite using list()

(C416)

351-352: Use f-string instead of format call

Convert to f-string

(UP032)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 9125edb and 99acd4e.

📒 Files selected for processing (4)

smact/__init__.py (3 hunks)
smact/data_loader.py (2 hunks)
smact/structure_prediction/utilities.py (4 hunks)
smact/tests/test_utils.py (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

smact/init.py

🧰 Additional context used

🪛 Ruff

smact/data_loader.py

305-322: No blank lines allowed after function docstring (found 1)

Remove blank line(s) after function docstring

(D202)

342-345: Unnecessary list comprehension (rewrite using list())

Rewrite using list()

(C416)

351-352: Use f-string instead of format call

Convert to f-string

(UP032)

smact/structure_prediction/utilities.py

5-5: Move third-party import pymatgen.core.structure.Structure into a type-checking block

Move into type-checking block

(TCH002)

55-55: No blank lines allowed between a section header and its content ("Args")

Remove blank line(s)

(D412)

smact/tests/test_utils.py

97-97: Module level import not at top of file

(E402)

117-117: Undefined name pd

(F821)

121-121: Undefined name pd

(F821)

137-137: Unnecessary list comprehension (rewrite using list())

Rewrite using list()

(C416)

137-137: Use a context manager for opening files

(SIM115)

137-139: Unnecessary list comprehension (rewrite using list())

Rewrite using list()

(C416)

138-138: Use a context manager for opening files

(SIM115)

141-141: Unnecessary list comprehension (rewrite using list())

Rewrite using list()

(C416)

141-141: Use a context manager for opening files

(SIM115)

141-143: Unnecessary list comprehension (rewrite using list())

Rewrite using list()

(C416)

142-142: Use a context manager for opening files

(SIM115)

156-156: Undefined name pd

(F821)

🔇 Additional comments (7)

smact/structure_prediction/utilities.py (3)

49-50: Implementation looks good!

The new include_one parameter provides useful flexibility for formatting oxidation state strings, with proper handling of edge cases.

Also applies to: 73-76

102-103: Type annotation improvement looks good!

The return type annotation is now more precise and correctly specifies all possible return types using modern Python type hinting syntax.

Line range hint 49-76: Verify integration with ICSD24 dataset functionality

Since these changes are part of the oxidation states update, let's verify the integration with the new ICSD24 dataset functionality.

✅ Verification successful

No integration concerns with ICSD24 dataset functionality

The unparse_spec function is well-integrated within the codebase and is used appropriately across multiple components:

Core oxidation state handling in smact/utils/oxidation.py
Structure prediction functionality in smact/structure_prediction/prediction.py
Composition utilities in smact/utils/composition.py

The new include_one parameter is properly used in oxidation state handling, particularly in oxidation.py where it's passed through as include_one_oxidation_state. The changes maintain backwards compatibility while adding flexibility for oxidation state representation.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check usage of unparse_spec with new ICSD24 dataset

# Look for usage of unparse_spec in relation to ICSD24
rg -l "unparse_spec.*icsd24" 

# Check for test coverage of the new parameter
rg -l "test.*unparse_spec.*include_one"

Length of output: 72

Script:

#!/bin/bash
# Let's broaden our search to understand the usage context better

# Check all direct usages of unparse_spec
rg "unparse_spec" -B 2 -A 2

# Look for test files related to this functionality
fd "test.*utilities.*py$"

# Check for ICSD-related files that might use this function
fd "icsd.*py$"

Length of output: 6260

🧰 Tools

🪛 Ruff

55-55: No blank lines allowed between a section header and its content ("Args")

Remove blank line(s)

(D412)

smact/tests/test_utils.py (2)

148-153: LGTM! Well-structured test methods

The test methods test_oxidation_states_filter_species_list and test_oxidation_states_filter_species_occurrences are well-structured with clear assertions and good coverage of edge cases.

Also applies to: 154-165

109-115: 🛠️ Refactor suggestion

Use context managers consistently for file operations

Improve file handling in the setUp method by using context managers consistently.

def setUp(self):
    self.ox_filter = ICSD24OxStatesFilter()
-    with open(TEST_ICSD_OX_STATES) as f:
-        self.test_ox_states = f.read()
-    with open(TEST_ICSD_OX_STATES_W_ZERO) as f:
-        self.test_ox_states_w_zero = f.read()
+    self.test_ox_states = TEST_ICSD_OX_STATES.read_text()
+    self.test_ox_states_w_zero = TEST_ICSD_OX_STATES_W_ZERO.read_text()

Likely invalid or redundant comment.

smact/data_loader.py (2)

301-302: LGTM: Cache variable follows established pattern

The new global cache variable follows the consistent pattern used throughout the module.

304-354: LGTM: Implementation follows established pattern

The function correctly implements the oxidation states lookup with proper caching mechanism and error handling.

🧰 Tools

🪛 Ruff

305-322: No blank lines allowed after function docstring (found 1)

Remove blank line(s) after function docstring

(D202)

342-345: Unnecessary list comprehension (rewrite using list())

Rewrite using list()

(C416)

351-352: Use f-string instead of format call

Convert to f-string

(UP032)

coderabbitai · 2024-11-20T00:10:34Z

smact/structure_prediction/utilities.py

@@ -1,5 +1,8 @@
 """Miscellaneous tools for data parsing."""

+
+
+from pymatgen.core.structure import Structure


🛠️ Refactor suggestion

Move Structure import to type-checking block

For better runtime performance and adherence to type-checking best practices, move the pymatgen import into the existing type-checking block.

Apply this diff:

-from pymatgen.core.structure import Structure from __future__ import annotations import re from typing import TYPE_CHECKING from . import logger if TYPE_CHECKING: import pymatgen + from pymatgen.core.structure import Structure

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

from pymatgen.core.structure import Structure

from __future__ import annotations

import re

from typing import TYPE_CHECKING

from . import logger

if TYPE_CHECKING:

import pymatgen

from pymatgen.core.structure import Structure

🧰 Tools

🪛 Ruff

5-5: Move third-party import pymatgen.core.structure.Structure into a type-checking block

Move into type-checking block

(TCH002)

coderabbitai · 2024-11-20T00:10:34Z

smact/tests/test_utils.py

@@ -92,3 +93,73 @@ def test_convert_formula(self):
    @unittest.skipUnless(os.environ.get("MP_API_KEY"), "requires MP_API key to be set.")
    def test_download_compounds_with_mp_api(self):
        pass
+
+from smact.utils.oxidation import ICSD24OxStatesFilter


🛠️ Refactor suggestion

Move import statement to the top of the file

The import statement for ICSD24OxStatesFilter should be grouped with other imports at the top of the file, following PEP 8 guidelines.

from __future__ import annotations import os import unittest from pymatgen.core import Composition +import pandas as pd from smact import Element from smact.screening import smact_filter from smact.utils.composition import comp_maker, formula_maker, parse_formula from smact.utils.crystal_space import generate_composition_with_smact +from smact.utils.oxidation import ICSD24OxStatesFilter

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

from smact.utils.oxidation import ICSD24OxStatesFilter

from __future__ import annotations

import os

import unittest

from pymatgen.core import Composition

import pandas as pd

from smact import Element

from smact.screening import smact_filter

from smact.utils.composition import comp_maker, formula_maker, parse_formula

from smact.utils.crystal_space import generate_composition_with_smact

from smact.utils.oxidation import ICSD24OxStatesFilter

🧰 Tools

🪛 Ruff

97-97: Module level import not at top of file

(E402)

coderabbitai · 2024-11-20T00:10:34Z

smact/tests/test_utils.py

+    def test_oxidation_states_write(self):
+        threshold = 1000
+        filename = "test_ox_states"
+        filename_w_zero = "test_ox_states_w_zero"
+        comment = "Testing writing of ICSD 24 oxidation states list."
+        self.ox_filter.write(filename, threshold, comment=comment)
+        self.ox_filter.write(
+            filename_w_zero, threshold, include_zero=True, comment=comment
+        )
+        self.assertTrue(os.path.exists(f"{filename}.txt"))
+        assert [line for line in open(f"{filename}.txt")] == [
+            line for line in open(TEST_ICSD_OX_STATES)
+        ]
+        self.assertTrue(os.path.exists(f"{filename}_w_zero.txt"))
+        assert [line for line in open(f"{filename}_w_zero.txt")] == [
+            line for line in open(TEST_ICSD_OX_STATES_W_ZERO)
+        ]
+        # Clean up
+        os.remove(f"{filename}.txt")
+        os.remove(f"{filename}_w_zero.txt")


🛠️ Refactor suggestion

Improve file comparison logic

The current file comparison using list comprehensions can be simplified and made more robust.

def test_oxidation_states_write(self): threshold = 1000 filename = "test_ox_states" filename_w_zero = "test_ox_states_w_zero" comment = "Testing writing of ICSD 24 oxidation states list." self.ox_filter.write(filename, threshold, comment=comment) self.ox_filter.write( filename_w_zero, threshold, include_zero=True, comment=comment ) - self.assertTrue(os.path.exists(f"{filename}.txt")) - assert [line for line in open(f"{filename}.txt")] == [ - line for line in open(TEST_ICSD_OX_STATES) - ] - self.assertTrue(os.path.exists(f"{filename}_w_zero.txt")) - assert [line for line in open(f"{filename}_w_zero.txt")] == [ - line for line in open(TEST_ICSD_OX_STATES_W_ZERO) - ] + output_file = Path(f"{filename}.txt") + output_file_w_zero = Path(f"{filename}_w_zero.txt") + + self.assertTrue(output_file.exists()) + self.assertEqual(output_file.read_text(), TEST_ICSD_OX_STATES.read_text()) + + self.assertTrue(output_file_w_zero.exists()) + self.assertEqual( + output_file_w_zero.read_text(), + TEST_ICSD_OX_STATES_W_ZERO.read_text() + ) # Clean up - os.remove(f"{filename}.txt") - os.remove(f"{filename}_w_zero.txt") + output_file.unlink() + output_file_w_zero.unlink()

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def test_oxidation_states_write(self):

threshold = 1000

filename = "test_ox_states"

filename_w_zero = "test_ox_states_w_zero"

comment = "Testing writing of ICSD 24 oxidation states list."

self.ox_filter.write(filename, threshold, comment=comment)

self.ox_filter.write(

filename_w_zero, threshold, include_zero=True, comment=comment

)

self.assertTrue(os.path.exists(f"{filename}.txt"))

assert [line for line in open(f"{filename}.txt")] == [

line for line in open(TEST_ICSD_OX_STATES)

]

self.assertTrue(os.path.exists(f"{filename}_w_zero.txt"))

assert [line for line in open(f"{filename}_w_zero.txt")] == [

line for line in open(TEST_ICSD_OX_STATES_W_ZERO)

]

# Clean up

os.remove(f"{filename}.txt")

os.remove(f"{filename}_w_zero.txt")

def test_oxidation_states_write(self):

threshold = 1000

filename = "test_ox_states"

filename_w_zero = "test_ox_states_w_zero"

comment = "Testing writing of ICSD 24 oxidation states list."

self.ox_filter.write(filename, threshold, comment=comment)

self.ox_filter.write(

filename_w_zero, threshold, include_zero=True, comment=comment

)

output_file = Path(f"{filename}.txt")

output_file_w_zero = Path(f"{filename}_w_zero.txt")

self.assertTrue(output_file.exists())

self.assertEqual(output_file.read_text(), TEST_ICSD_OX_STATES.read_text())

self.assertTrue(output_file_w_zero.exists())

self.assertEqual(

output_file_w_zero.read_text(),

TEST_ICSD_OX_STATES_W_ZERO.read_text()

)

# Clean up

output_file.unlink()

output_file_w_zero.unlink()

🧰 Tools

🪛 Ruff

137-137: Unnecessary list comprehension (rewrite using list())

Rewrite using list()

(C416)

137-137: Use a context manager for opening files

(SIM115)

137-139: Unnecessary list comprehension (rewrite using list())

Rewrite using list()

(C416)

138-138: Use a context manager for opening files

(SIM115)

141-141: Unnecessary list comprehension (rewrite using list())

Rewrite using list()

(C416)

141-141: Use a context manager for opening files

(SIM115)

141-143: Unnecessary list comprehension (rewrite using list())

Rewrite using list()

(C416)

142-142: Use a context manager for opening files

(SIM115)

coderabbitai · 2024-11-20T00:10:35Z

smact/data_loader.py

@@ -324,7 +380,7 @@ def lookup_element_hhis(symbol):
    if _element_hhis is None:
        _element_hhis = {}

-        with open(os.path.join(data_directory, "HHIs.txt")) as file:
+        with open(os.path.join(data_directory, "hhi.txt")) as file:


💡 Codebase verification

Breaking change: HHI file renamed without backward compatibility

The git history confirms this is a breaking change where HHIs.txt was renamed to hhi.txt. This change will affect existing installations as:

The file has been consistently named HHIs.txt throughout the project's history

Multiple historical references show active usage of HHIs.txt

No fallback mechanism exists to handle the old filename

To maintain compatibility, consider either:

Keeping the original filename HHIs.txt

Adding a fallback mechanism to check for both filenames

Documenting this as a breaking change in release notes

🔗 Analysis chain

Verify the HHI filename change impact

The change from "HHIs.txt" to "hhi.txt" is a breaking change that could affect existing installations.

Let's verify if the new file exists and if there are any other references to the old filename:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash # Description: Verify HHI filename change impact # Check if the new file exists if [ ! -f "smact/data/hhi.txt" ]; then echo "WARNING: New HHI file 'hhi.txt' not found" fi # Search for any remaining references to the old filename rg -i "HHIs?.txt"

Length of output: 144

Script:

#!/bin/bash # Let's check git history to see if this is a new change and if the old file existed git log -p --all | rg -i "HHIs?.txt" # Also check if there are any references to HHI in tests or documentation rg -i "hhi" -g "!*.txt"

Length of output: 2235

codecov · 2024-11-20T00:15:31Z

Codecov Report

Attention: Patch coverage is 95.83333% with 5 lines in your changes missing coverage. Please review.

Project coverage is 73.56%. Comparing base (5167778) to head (3c73285).
Report is 24 commits behind head on develop.

Files with missing lines	Patch %	Lines
smact/data_loader.py	71.42%	4 Missing ⚠️
smact/structure_prediction/utilities.py	80.00%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop     #329      +/-   ##
===========================================
+ Coverage    72.42%   73.56%   +1.14%     
===========================================
  Files           30       31       +1     
  Lines         2415     2531     +116     
===========================================
+ Hits          1749     1862     +113     
- Misses         666      669       +3

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚨 Try these New Features:

Flaky Tests Detection - Detect and resolve failed and flaky tests

coderabbitai

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (5)

smact/structure_prediction/utilities.py (1)

Line range hint 46-72: Add examples for the new include_one parameter

The function implementation looks good, but the docstring examples should demonstrate the behaviour of the new include_one parameter.

Add these examples to the docstring:
     Examples:
     --------
         >>> unparse_spec(("Fe", 2))
         'Fe2+'
         >>> unparse_spec(("O", -2))
         'O2-'
+        >>> unparse_spec(("Na", 1))
+        'Na1+'
+        >>> unparse_spec(("Na", 1), include_one=False)
+        'Na+'

smact/tests/test_utils.py (4)

104-111: Add error handling for test file operations.

The setUp method should handle cases where test files might be missing or inaccessible.

 def setUp(self):
     self.ox_filter = ICSD24OxStatesFilter()
-    with open(TEST_ICSD_OX_STATES) as f:
-        self.test_ox_states = f.read()
-    with open(TEST_ICSD_OX_STATES_W_ZERO) as f:
-        self.test_ox_states_w_zero = f.read()
+    try:
+        with open(TEST_ICSD_OX_STATES) as f:
+            self.test_ox_states = f.read()
+        with open(TEST_ICSD_OX_STATES_W_ZERO) as f:
+            self.test_ox_states_w_zero = f.read()
+    except FileNotFoundError as e:
+        raise unittest.SkipTest(f"Required test file not found: {e.filename}")

119-119: Remove or complete the commented assertion.

This incomplete assertion should either be removed or completed with proper test logic.

121-138: Improve file cleanup reliability.

The cleanup section should handle cases where file deletion might fail and use context manager pattern for better resource management.

 def test_oxidation_states_write(self):
+    test_files = []
     threshold = 1000
     filename = "test_ox_states"
     filename_w_zero = "test_ox_states_w_zero"
     comment = "Testing writing of ICSD 24 oxidation states list."
-    self.ox_filter.write(filename, threshold, comment=comment)
-    self.ox_filter.write(filename_w_zero, threshold, include_zero=True, comment=comment)
-    self.assertTrue(os.path.exists(f"{filename}.txt"))
-    with open(f"{filename}.txt") as f:
-        self.assertEqual(f.read(), self.test_ox_states)
+    try:
+        self.ox_filter.write(filename, threshold, comment=comment)
+        test_files.append(f"{filename}.txt")
+        self.ox_filter.write(filename_w_zero, threshold, include_zero=True, comment=comment)
+        test_files.append(f"{filename}_w_zero.txt")
 
-    self.assertTrue(os.path.exists(f"{filename}_w_zero.txt"))
-    with open(f"{filename}_w_zero.txt") as f:
-        self.assertEqual(f.read(), self.test_ox_states_w_zero)
-    # Clean up
-    os.remove(f"{filename}.txt")
-    os.remove(f"{filename}_w_zero.txt")
+        self.assertTrue(os.path.exists(test_files[0]))
+        with open(test_files[0]) as f:
+            self.assertEqual(f.read(), self.test_ox_states)
+
+        self.assertTrue(os.path.exists(test_files[1]))
+        with open(test_files[1]) as f:
+            self.assertEqual(f.read(), self.test_ox_states_w_zero)
+    finally:
+        # Clean up
+        for file in test_files:
+            try:
+                if os.path.exists(file):
+                    os.remove(file)
+            except OSError as e:
+                print(f"Warning: Failed to remove test file {file}: {e}")

139-154: Consider making test values more maintainable.

The hard-coded test values (e.g., lengths and specific counts) could make tests brittle. Consider moving these to class-level constants or loading from test data files.

 class OxidationStatesTest(unittest.TestCase):
+    # Test constants
+    THRESHOLD_LENGTHS = {
+        0: 490,
+        5: 358,
+        50: 227
+    }
+    EXPECTED_FIRST_SPECIES = "O2-"
+    EXPECTED_FIRST_COUNT = 116910
+
     def test_oxidation_states_filter_species_list(self):
-        for threshold, length in [(0, 490), (5, 358), (50, 227)]:
+        for threshold, length in self.THRESHOLD_LENGTHS.items():
             species_list = self.ox_filter.get_species_list(threshold)
             self.assertIsInstance(species_list, list)
             self.assertEqual(len(species_list), length)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 99acd4e and 3c73285.

📒 Files selected for processing (6)

paper.bib (6 hunks)
smact/__init__.py (2 hunks)
smact/data_loader.py (2 hunks)
smact/structure_prediction/utilities.py (4 hunks)
smact/tests/test_utils.py (2 hunks)
smact/utils/oxidation.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (2)

paper.bib
smact/utils/oxidation.py

🧰 Additional context used

🪛 GitHub Check: codecov/patch

smact/data_loader.py

[warning] 339-339: smact/data_loader.py#L339
Added line #L339 was not covered by tests

[warning] 341-343: smact/data_loader.py#L341-L343
Added lines #L341 - L343 were not covered by tests

smact/structure_prediction/utilities.py

[warning] 11-11: smact/structure_prediction/utilities.py#L11
Added line #L11 was not covered by tests

🔇 Additional comments (7)

smact/structure_prediction/utilities.py (2)

11-11: LGTM! Import moved to type-checking block

The Structure import has been correctly moved to the type-checking block, which optimises runtime performance.

🧰 Tools

🪛 GitHub Check: codecov/patch

[warning] 11-11: smact/structure_prediction/utilities.py#L11
Added line #L11 was not covered by tests

98-99: LGTM! Type annotation simplified

The return type annotation has been appropriately simplified whilst maintaining type safety through the TYPE_CHECKING import.

smact/tests/test_utils.py (1)

6-13: LGTM! Import ordering and file path setup look good.

The imports are properly organised and the test file paths are defined using platform-independent path joining.

Also applies to: 99-101

smact/__init__.py (2)

73-74: Documentation looks good!

The documentation for the new oxidation_states_icsd24 attribute follows the established pattern and clearly indicates the data source and version.

200-203: Verify the ICSD24 implementation completeness

Whilst the attribute initialisation follows the correct pattern, we need to verify that the supporting components are in place.

✅ Verification successful

Let me gather more information about the data loader implementation.

Let me check the content of the data files and the complete implementation.

ICSD24 implementation is complete and properly supported

The implementation is fully supported with:

A properly implemented data loader function with documentation and error handling
Required data files present in the correct location:
- smact/data/oxidation_states_icsd24_filtered.txt (main data file)
- Additional supporting files for raw data and counts
Correct data format in the filtered file, following the established pattern

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if the data loader function is implemented
ast-grep --pattern 'def lookup_element_oxidation_states_icsd24($_) {
  $$$
}'

# Check if the required data file exists
fd "oxidation_states_icsd24.txt"

# Check error handling in the data loader
rg "lookup_element_oxidation_states_icsd24" smact/data_loader.py

Length of output: 240

Script:

#!/bin/bash
# Check the implementation details of the data loader function
rg -A 10 "lookup_element_oxidation_states_icsd24" smact/data_loader.py

# Check if there's a data file in the data directory
fd -g "*icsd24*" smact/data/

# Check for any test coverage
rg "lookup_element_oxidation_states_icsd24" -g "test_*.py"

Length of output: 900

Script:

#!/bin/bash
# Check the complete implementation of the data loader function
rg -A 20 "lookup_element_oxidation_states_icsd24" smact/data_loader.py

# Check the content format of the filtered data file
head -n 5 smact/data/oxidation_states_icsd24_filtered.txt

# Check if the function is properly loading the data file
rg "oxidation_states_icsd24" smact/data_loader.py

Length of output: 1283

smact/data_loader.py (2)

372-372: Skip comment about HHI filename.

This change has already been thoroughly addressed in previous reviews.

301-343: LGTM! The implementation follows established patterns.

The new function lookup_element_oxidation_states_icsd24 is well-documented and follows the consistent pattern established by other oxidation state lookup functions in the module.

Let's verify the existence of the new data file:

✅ Verification successful

Implementation verified and data file present

The new function lookup_element_oxidation_states_icsd24 is properly implemented with its required data file oxidation_states_icsd24_filtered.txt present in the correct location at smact/data/. The implementation follows the established pattern and the data file exists alongside other oxidation state datasets, indicating a well-organised addition to the codebase.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the new ICSD24 data file exists
if [ ! -f "smact/data/oxidation_states_icsd24_filtered.txt" ]; then
    echo "ERROR: Required data file 'oxidation_states_icsd24_filtered.txt' not found"
fi

Length of output: 64

Script:

#!/bin/bash
# Let's verify the data file location and content more thoroughly
echo "Searching for the ICSD24 data file:"
fd "oxidation_states_icsd24_filtered.txt"

echo -e "\nChecking data directory contents:"
ls -la smact/data/

echo -e "\nChecking if the file exists in any other locations:"
rg -l "oxidation_states_icsd24_filtered.txt"

Length of output: 2413

🧰 Tools

🪛 GitHub Check: codecov/patch

[warning] 339-339: smact/data_loader.py#L339
Added line #L339 was not covered by tests

[warning] 341-343: smact/data_loader.py#L341-L343
Added lines #L341 - L343 were not covered by tests

coderabbitai · 2024-11-20T00:16:49Z

smact/data_loader.py

+            return _el_ox_states_icsd24[symbol]
+    else:
+        if _print_warnings:
+            print(f"WARNING: Oxidation states for element {symbol} " "not found.")
+        return None


🛠️ Refactor suggestion

Add test coverage for error handling.

The error handling path for non-existent elements lacks test coverage. Consider adding test cases to verify the behaviour when:

The element symbol doesn't exist in the dataset

The warning message is printed when _print_warnings is enabled

Would you like me to help generate the test cases for these scenarios?

🧰 Tools

🪛 GitHub Check: codecov/patch

[warning] 339-339: smact/data_loader.py#L339
Added line #L339 was not covered by tests

[warning] 341-343: smact/data_loader.py#L341-L343
Added lines #L341 - L343 were not covered by tests

Can you generate the test cases?

@AntObi, certainly! Here are the test cases covering the error handling for the lookup_element_oxidation_states_icsd24 function.

You can add these test methods to your test suite, possibly in smact/tests/test_data_loader.py:

import unittest from smact.data_loader import lookup_element_oxidation_states_icsd24, set_warnings import io import sys class DataLoaderTest(unittest.TestCase): def test_lookup_nonexistent_element(self): # Test that the function returns None for a nonexistent element result = lookup_element_oxidation_states_icsd24('Xx') self.assertIsNone(result) def test_warning_message_printed(self): # Capture the output captured_output = io.StringIO() sys.stdout = captured_output # Enable warnings set_warnings(True) # Invoke the function with a nonexistent element lookup_result = lookup_element_oxidation_states_icsd24('Xx') # Disable warnings set_warnings(False) # Restore stdout sys.stdout = sys.__stdout__ # Check if the correct warning message was printed self.assertIn('WARNING: Oxidation states for element Xx not found.', captured_output.getvalue()) # Ensure that the function returns None self.assertIsNone(lookup_result)

Explanation:

test_lookup_nonexistent_element: This test verifies that the function returns None when an element symbol does not exist in the dataset.

test_warning_message_printed: This test enables the _print_warnings flag using set_warnings(True), captures the stdout, and checks if the appropriate warning message is printed when a nonexistent element is queried. It also ensures that the function returns None in this case.

These tests ensure that the error handling in your new function works as expected.

Let me know if you need any further assistance!

aronwalsh and others added 20 commits July 25, 2024 10:07

Initial cleanup of oxidation state data files

da611fb

Added comment lines on dataset origins

ICSD (2024) oxidation states

b3c544f

Populated from ICSD API - raw count json - raw oxidation state list - filtered list with > 5 entries per oxidation state

Create oxidation_states_icsd24_common.txt

fbd1be3

Add set that lists the oxidation state with the most results for each element

Update oxidation_states_icsd24_counts.json

ba132de

Final list up to Lr

Make the filtered 2024 icsd oxidation states available through data_l…

0be9b0b

…oader

Make oxidation_states_icsd24 available through the Element class

e062c1c

Merge branch 'docs_updates' of https://github.com/WMD-group/SMACT int…

09ebea0

…o oxidationstates24

Add oxidation states utility functions

1e668fc

Add test files

ec8120e

Update ICSD24OxStatesFilter

3875854

Add tests for utils/oxidation

bf75145

Run pre-commit

ef94fcb

Update filter

88e83ac

Add method for creating species lists

e580d58

Add option to format species as 'X1+' or 'X+'

c1642ab

Add test for oxidation filter

49e7568

Fixed ICSD24 List

41d7bbb

Regenerated ICSD24 dataset on oxidation states. API had returned 0 values for some species, whilst the web version showed results. Overcome by restarting the session for each search. "oxidation_states_icsd24_filtered" is recommended as the default

Add function for getting a dataframe of species occurrences

13e6eb9

Update tests and add test for getting occurences

b8982be

Update test files

9125edb

coderabbitai bot reviewed Nov 19, 2024

View reviewed changes

Merge branch 'develop' into oxidationstates24

99acd4e

coderabbitai bot reviewed Nov 20, 2024

View reviewed changes

Run pre-commit

3c73285

coderabbitai bot reviewed Nov 20, 2024

View reviewed changes

AntObi added enhancement docs dependencies Pull requests that update a dependency file labels Nov 20, 2024

AntObi added python Pull requests that update Python code feature refactor labels Nov 20, 2024

AntObi merged commit 51d78b0 into develop Nov 20, 2024
16 checks passed

AntObi deleted the oxidationstates24 branch November 20, 2024 09:23

This was referenced Nov 21, 2024

Download crystal space optional dependencies in ci #331

Merged

Fix documentation and tutorial notebooks ahead of new release #334

Merged

This was referenced Nov 30, 2024

Structure prediction fixes #336

Merged

Add Crystal Space and Oxidation Utilities to the Documentation #337

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Oxidation States 24 #329

Oxidation States 24 #329

AntObi commented Nov 19, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 19, 2024 •

edited

Loading

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

coderabbitai bot left a comment

coderabbitai bot Nov 19, 2024

coderabbitai bot Nov 19, 2024

coderabbitai bot Nov 19, 2024

coderabbitai bot Nov 19, 2024

coderabbitai bot left a comment

coderabbitai bot Nov 20, 2024

coderabbitai bot Nov 20, 2024

coderabbitai bot Nov 20, 2024

coderabbitai bot Nov 20, 2024

codecov bot commented Nov 20, 2024 •

edited

Loading

coderabbitai bot left a comment

coderabbitai bot Nov 20, 2024 •

edited

Loading

AntObi Nov 20, 2024

coderabbitai bot Nov 20, 2024

		@@ -1,5 +1,8 @@
		"""Miscellaneous tools for data parsing."""



		from pymatgen.core.structure import Structure

-from pymatgen.core.structure import Structure
+from __future__ import annotations
+import re
+from typing import TYPE_CHECKING
+from . import logger
+if TYPE_CHECKING:
+    import pymatgen
+    from pymatgen.core.structure import Structure

-from smact.utils.oxidation import ICSD24OxStatesFilter
+from __future__ import annotations
+import os
+import unittest
+from pymatgen.core import Composition
+import pandas as pd
+from smact import Element
+from smact.screening import smact_filter
+from smact.utils.composition import comp_maker, formula_maker, parse_formula
+from smact.utils.crystal_space import generate_composition_with_smact
+from smact.utils.oxidation import ICSD24OxStatesFilter

Oxidation States 24 #329

Oxidation States 24 #329

Conversation

AntObi commented Nov 19, 2024 • edited by coderabbitai bot Loading

Add new oxidation states to SMACT

Description

Type of change

How Has This Been Tested?

Reviewers

Checklist:

Summary by CodeRabbit

coderabbitai bot commented Nov 19, 2024 • edited Loading

Walkthrough

Changes

Possibly related PRs

Suggested labels

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Nov 19, 2024

Choose a reason for hiding this comment

coderabbitai bot Nov 19, 2024

Choose a reason for hiding this comment

coderabbitai bot Nov 19, 2024

Choose a reason for hiding this comment

coderabbitai bot Nov 19, 2024

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Nov 20, 2024

Choose a reason for hiding this comment

coderabbitai bot Nov 20, 2024

Choose a reason for hiding this comment

coderabbitai bot Nov 20, 2024

Choose a reason for hiding this comment

coderabbitai bot Nov 20, 2024

Choose a reason for hiding this comment

codecov bot commented Nov 20, 2024 • edited Loading

Codecov Report

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Nov 20, 2024 • edited Loading

Choose a reason for hiding this comment

AntObi Nov 20, 2024

Choose a reason for hiding this comment

coderabbitai bot Nov 20, 2024

Choose a reason for hiding this comment

AntObi commented Nov 19, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 19, 2024 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)

codecov bot commented Nov 20, 2024 •

edited

Loading

coderabbitai bot Nov 20, 2024 •

edited

Loading