Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Oxidation States 24 #329

Merged
merged 22 commits into from
Nov 20, 2024
Merged

Oxidation States 24 #329

merged 22 commits into from
Nov 20, 2024

Conversation

AntObi
Copy link
Collaborator

@AntObi AntObi commented Nov 19, 2024

Add new oxidation states to SMACT

Description

  • This PR adds data-mined oxidation states from the ICSD (09.24)
  • Adds utility functions to generate a new oxidation states list from the data-mined counts of ICSD oxidation states

Type of change

  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

How Has This Been Tested?

Test Configuration:

  • Python version: 3.10
  • Operating System: macOS

Reviewers

N/A

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

Summary by CodeRabbit

  • New Features

    • Introduced multiple new data files containing comprehensive lists of oxidation states for various elements, sourced from the ICSD and Pymatgen.
    • Added a new class for managing and filtering oxidation states, allowing users to filter data based on specified thresholds.
    • Enhanced existing attributes in the Element class to reflect updated oxidation state data.
  • Bug Fixes

    • Standardised formatting and consistency across bibliographic entries in the .bib file.
  • Documentation

    • Updated header comments in data files to provide context and sources for the oxidation state data.

aronwalsh and others added 20 commits July 25, 2024 10:07
Added comment lines on dataset origins
Populated from ICSD API
- raw count json
- raw oxidation state list
- filtered list with > 5 entries per oxidation state
Add set that lists the oxidation state with the most results for each element
Regenerated ICSD24 dataset on oxidation states.

API had returned 0 values for some species, whilst the web version showed results. Overcome by restarting the session for each search.

"oxidation_states_icsd24_filtered" is recommended as the default
Copy link
Contributor

coderabbitai bot commented Nov 19, 2024

Walkthrough

The pull request introduces several modifications across multiple files, primarily focusing on the standardisation and formatting of bibliographic entries in a .bib file, updates to oxidation state data files, and enhancements to the Element class in the smact module. Key changes include the addition of new data files for oxidation states, updates to existing attributes, and the introduction of a new class for filtering oxidation states. Additionally, several test files are added to ensure the functionality of these updates.

Changes

File Path Change Summary
paper.bib Modified bibliographic entries for formatting and standardisation of journal names and fields.
smact/__init__.py Updated Element class: modified oxidation_states_icsd attribute and added oxidation_states_icsd24.
smact/data/oxidation_states.txt Added header comment indicating manual compilation in 2014.
smact/data/oxidation_states_SP.txt New file added containing oxidation states from the Pymatgen structure predictor (2017).
smact/data/oxidation_states_icsd.txt New file added with oxidation states sourced from ICSD (2016).
smact/data/oxidation_states_icsd24_common.txt New file added with common oxidation states from ICSD (2024).
smact/data/oxidation_states_icsd24_filtered.txt New file added with filtered oxidation states from ICSD (2024).
smact/data/oxidation_states_icsd24_raw.txt New file added with raw oxidation states from ICSD (2024).
smact/data/oxidation_states_pmg.txt New file added with oxidation states sourced from Pymatgen (2017).
smact/data/oxidation_states_wiki.txt Added header comment indicating Wikipedia as the source (2022).
smact/data/oxidationstates.data Deleted file containing a list of oxidation states.
smact/data_loader.py Added lookup_element_oxidation_states_icsd24 function and modified lookup_element_hhis function.
smact/structure_prediction/utilities.py Updated unparse_spec and convert_next_gen_mprest_data function signatures and logic.
smact/tests/files/test_icsd_oxidation_states_filter_1000.txt New test file added for validating oxidation states filtering.
smact/tests/files/test_icsd_oxidation_states_filter_1000_w_0_ox_state.txt New test file added for validating oxidation states with zero state.
smact/tests/test_utils.py New file added containing unit tests for ICSD24OxStatesFilter class.
smact/utils/oxidation.py New class ICSD24OxStatesFilter added for managing oxidation states filtering and processing.

Possibly related PRs

Suggested labels

enhancement, feature, WIP, docs, python, refactor, dependencies

🐇 In the garden of data, we hop and play,
With oxidation states, we brighten the way.
Files bloom with knowledge, all neatly aligned,
In this patch of joy, new insights we find.
So let’s leap through the code, with a joyful cheer,
For the changes we’ve made, let’s celebrate here! 🌼


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai or @coderabbitai title anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (18)
smact/utils/oxidation.py (10)

2-3: Simplify imports from the 'os' module.

Importing both os and from os import path is redundant. Consider importing only the necessary components to improve code clarity.

Apply this diff to streamline the imports:

-import os
-from os import path
+import os

And update path.join(...) to os.path.join(...) throughout the code.


22-24: Add error handling for data loading in __init__.

Currently, the code does not handle potential errors when loading the JSON file. Adding error handling will make the class more robust against missing or corrupt data files.

Consider wrapping the data loading in a try-except block:

try:
    self.ox_states_df = pd.read_json(
        os.path.join(data_directory, "oxidation_states_icsd24_counts.json")
    )
except (FileNotFoundError, ValueError) as e:
    raise RuntimeError("Failed to load oxidation states data.") from e

62-77: Add return type annotation to get_species_list method.

Including a return type annotation improves code readability and helps with static analysis tools.

Apply this diff to add the return type:

 def get_species_list(
     self,
     threshold: int,
     include_zero: bool = False,
     include_one_oxidation_state: bool = False,
 ):
+    -> list[str]:

94-104: Specify return type in docstring and function signature.

The return type in the docstring is vague. Specify the return type as pd.DataFrame for clarity, and add a return type annotation to the function signature.

Update the docstring and signature:

     ):
+    -> pd.DataFrame:
         """Get the ICSD 24 oxidation states list as a DataFrame of species with their occurrences.

         Args:
             include_one_oxidation_state (bool): Include oxidation states +1 and -1 as ± in species notation. Default is False.
         Returns:
-            dataframe: The species list as a dataframe of species with their occurrences.
+            pd.DataFrame: A DataFrame containing species and their occurrence counts.
         """

106-109: Include zero oxidation states based on include_zero parameter.

In the get_species_occurrences_df method, zero oxidation states are always excluded. Consider adding an include_zero parameter to control this behaviour for consistency with other methods.


126-130: Use Optional for parameters that can be None.

The parameter comment can be None. For clarity, annotate it using Optional[str].

Apply this diff:

     filename: str | os.PathLike,
     threshold: int,
     include_zero: bool = False,
-    comment: str = None,
+    comment: Optional[str] = None,
 ):

Don't forget to import Optional from the typing module:

+from typing import Optional

158-168: Specify file encoding when writing to the file.

To ensure consistent behaviour across different environments, specify the file encoding when opening the file.

Apply this diff:

-with open(filename, "w") as f:
+with open(filename, "w", encoding="utf-8") as f:

169-171: Correct type annotation for group parameter in _filter_oxidation_states.

The type annotation for group should be pd.DataFrame instead of pd.DataFrame.groupby.

Apply this diff:

 def _filter_oxidation_states(
-    self, group: pd.DataFrame.groupby, threshold: int
+    self, group: pd.DataFrame, threshold: int
 ):

83-91: Handle potential non-integer oxidation states gracefully.

If ox_state cannot be converted to an integer, it is silently ignored. Consider logging a warning to aid in debugging unexpected data.


12-18: Revise class docstring to follow standard conventions.

Including an Attributes section in the class docstring is unconventional. Attribute documentation is typically placed in the __init__ method's docstring or as class variable docstrings.

Consider restructuring the docstring:

 """Class to handle filtering the ICSD 24 oxidation states list.

-The ICSD 24 oxidation states list is a list of oxidation states for each element in the ICSD 24 database.

-Attributes:
-    ox_states_df (pd.DataFrame): The ICSD 24 oxidation states list as a DataFrame.
 """
+    The ICSD 24 oxidation states list contains oxidation states for each element from the ICSD 24 database.

And add a docstring to __init__ if detailed attribute documentation is necessary.

smact/structure_prediction/utilities.py (2)

Line range hint 39-61: Enhance docstring with examples of the new parameter

The implementation looks good, but the docstring could be more comprehensive.

Consider adding examples demonstrating the include_one parameter:

     Examples:
         >>> unparse_spec(("Fe", 2))
         'Fe2+'
         >>> unparse_spec(("O", -2))
         'O2-'
+        >>> unparse_spec(("Na", 1))
+        'Na1+'
+        >>> unparse_spec(("Na", 1), include_one=False)
+        'Na+'

84-84: Consider enhancing error handling

The type hint update is good, but the error handling could be more specific.

Consider improving the error handling:

     try:
         return doc.dict(exclude={"fields_not_requested"})
-    except:
-        logger.warn(f"Could not convert input:\n {doc}\n to a dictionary.")
-        raise TypeError("Input is not an MPDataDoc object.")
+    except AttributeError as e:
+        logger.warning(f"Could not convert input:\n {doc}\n to a dictionary. Error: {e}")
+        raise TypeError("Input is not a valid MPDataDoc object") from e
+    except Exception as e:
+        logger.error(f"Unexpected error while converting document: {e}")
+        raise

Also applies to: 93-97

smact/tests/test_utils.py (5)

5-5: Remove unused import

The pytest import is not used in this test file. Since the tests are using the unittest framework, this import can be safely removed.

-import pytest
🧰 Tools
🪛 Ruff

5-5: pytest imported but unused

Remove unused import: pytest

(F401)


19-24: Use context managers for file operations

Whilst the setup is well-structured, it's recommended to use context managers for file operations to ensure proper resource handling.

 def setUp(self):
     self.ox_filter = ICSD24OxStatesFilter()
-    with open(TEST_ICSD_OX_STATES) as f:
-        self.test_ox_states = f.read()
-    with open(TEST_ICSD_OX_STATES_W_ZERO) as f:
-        self.test_ox_states_w_zero = f.read()
+    self.test_ox_states = TEST_ICSD_OX_STATES.read_text()
+    self.test_ox_states_w_zero = TEST_ICSD_OX_STATES_W_ZERO.read_text()

35-35: Complete or remove the commented assertion

There's an incomplete assertion that's been commented out. Either complete the assertion with the expected value or remove it if no longer needed.


47-53: Improve file comparison logic

The current file comparison could be improved by using context managers and a more efficient comparison method.

-        assert [line for line in open(f"{filename}.txt")] == [
-            line for line in open(TEST_ICSD_OX_STATES)
-        ]
-        self.assertTrue(os.path.exists(f"{filename}_w_zero.txt"))
-        assert [line for line in open(f"{filename}_w_zero.txt")] == [
-            line for line in open(TEST_ICSD_OX_STATES_W_ZERO)
-        ]
+        with open(f"{filename}.txt") as f1, open(TEST_ICSD_OX_STATES) as f2:
+            self.assertEqual(f1.read(), f2.read())
+        with open(f"{filename}_w_zero.txt") as f1, open(TEST_ICSD_OX_STATES_W_ZERO) as f2:
+            self.assertEqual(f1.read(), f2.read())
🧰 Tools
🪛 Ruff

47-47: Use a context manager for opening files

(SIM115)


48-48: Use a context manager for opening files

(SIM115)


51-51: Use a context manager for opening files

(SIM115)


52-52: Use a context manager for opening files

(SIM115)


59-62: Document the significance of threshold values

The test uses specific threshold values (0, 5, 50) and expects specific list lengths (490, 358, 227). Consider adding a comment explaining the significance of these values.

+        # Test various thresholds and their impact on the number of species:
+        # 0: All species (490)
+        # 5: Common species (358)
+        # 50: Very common species (227)
         for threshold, length in [(0, 490), (5, 358), (50, 227)]:
smact/__init__.py (1)

57-57: Maintain consistent documentation format

The documentation for oxidation states attributes is clear, but could be more consistent:

  • oxidation_states_icsd specifies "2016 version"
  • oxidation_states_icsd24 uses "2024 version"

Consider standardising the format:

-        Element.oxidation_states_icsd (list) : List of oxidation states that appear in the 2016 version of the ICSD
+        Element.oxidation_states_icsd (list): List of oxidation states from ICSD 2016
-        Element.oxidation_states_icsd24 (list): List of oxidation states that appear in the 2024 version of the ICSD
+        Element.oxidation_states_icsd24 (list): List of oxidation states from ICSD 2024

Also applies to: 63-64

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 1636ad7 and 9125edb.

📒 Files selected for processing (17)
  • paper.bib (6 hunks)
  • smact/__init__.py (3 hunks)
  • smact/data/oxidation_states.txt (1 hunks)
  • smact/data/oxidation_states_SP.txt (1 hunks)
  • smact/data/oxidation_states_icsd.txt (1 hunks)
  • smact/data/oxidation_states_icsd24_common.txt (1 hunks)
  • smact/data/oxidation_states_icsd24_filtered.txt (1 hunks)
  • smact/data/oxidation_states_icsd24_raw.txt (1 hunks)
  • smact/data/oxidation_states_pmg.txt (1 hunks)
  • smact/data/oxidation_states_wiki.txt (1 hunks)
  • smact/data/oxidationstates.data (0 hunks)
  • smact/data_loader.py (2 hunks)
  • smact/structure_prediction/utilities.py (4 hunks)
  • smact/tests/files/test_icsd_oxidation_states_filter_1000.txt (1 hunks)
  • smact/tests/files/test_icsd_oxidation_states_filter_1000_w_0_ox_state.txt (1 hunks)
  • smact/tests/test_utils.py (1 hunks)
  • smact/utils/oxidation.py (1 hunks)
💤 Files with no reviewable changes (1)
  • smact/data/oxidationstates.data
✅ Files skipped from review due to trivial changes (7)
  • paper.bib
  • smact/data/oxidation_states.txt
  • smact/data/oxidation_states_SP.txt
  • smact/data/oxidation_states_icsd.txt
  • smact/data/oxidation_states_icsd24_filtered.txt
  • smact/data/oxidation_states_pmg.txt
  • smact/data/oxidation_states_wiki.txt
🧰 Additional context used
🪛 LanguageTool
smact/data/oxidation_states_icsd24_common.txt

[grammar] ~7-~7: Només va sense accent quan és un animal o un nom de lletra. Escriviu «bé» (adverbi) o «ve» (del v. ‘venir’).
Context: ...st common non-zero values # H 1 He Li 1 Be 2 B 3 C 4 N -3 O -2 F -1 Ne Na 1 Mg 2 A...

(BE)


[grammar] ~36-~36: Possible confusió. ¿Volíeu dir «als»?
Context: ...Mn 2 Fe 3 Co 2 Ni 2 Cu 2 Zn 2 Ga 3 Ge 4 As 5 Se -2 Br -1 Kr 2 Rb 1 Sr 2 Y 3 Zr 4 N...

(AS_ALS)


[grammar] ~42-~42: Si és conjunció, cal escriure «I».
Context: ... 3 Ge 4 As 5 Se -2 Br -1 Kr 2 Rb 1 Sr 2 Y 3 Zr 4 Nb 5 Mo 6 Tc 7 Ru 4 Rh 3 Pd 2 Ag...

(Y_CONJUNCIO)


[grammar] ~70-~70: ¿Volíeu dir «o» (conjunció) en comptes de «ho» (pronom)?
Context: ...Pr 3 Nd 3 Pm 3 Sm 3 Eu 3 Gd 3 Tb 3 Dy 3 Ho 3 Er 3 Tm 3 Yb 3 Lu 3 Hf 4 Ta 5 W 6 Re ...

(FINS_I_TOT)


[grammar] ~80-~80: Paraula desconeguda com a nom propi.
Context: ... Tm 3 Yb 3 Lu 3 Hf 4 Ta 5 W 6 Re 7 Os 5 Ir 4 Pt 4 Au 1 Hg 2 Tl 1 Pb 2 Bi 3 Po 4 At...

(VERBS_NOMSPROPIS)


[grammar] ~102-~102: S’accentua quan és del v. ‘ser’.
Context: ... Pa 5 U 6 Np 6 Pu 3 Am 3 Cm 3 Bk 3 Cf 3 Es 3 Fm Md No Lr

(ES)

smact/data/oxidation_states_icsd24_raw.txt

[misspelling] ~3-~3: Söz ve sayı arasında defis yoqtur: "V-2"
Context: ...idation state set # Source: ICSD (2024, V2) # H -1 1 2 3 He Li 1 Be 2 B -5 -3 -2 -...

(NUMBER_BEFORE_DEFIS_MISSING)

smact/tests/files/test_icsd_oxidation_states_filter_1000.txt

[style] ~3-~3: Forma secundària o antiga.
Context: ...ate set # Source: ICSD (2024), filtered for > 1000 reports # # Testing writing of I...

(FOR)


[grammar] ~9-~9: Només va sense accent quan és un animal o un nom de lletra. Escriviu «bé» (adverbi) o «ve» (del v. ‘venir’).
Context: ...oxidation states list. # H -1 1 He Li 1 Be 2 B 3 C -2 2 3 4 N -3 5 O -2 F -1 Ne Na...

(BE)


[grammar] ~44-~44: Si és conjunció, cal escriure «I».
Context: ... 4 As -3 3 5 Se -2 4 Br -1 Kr Rb 1 Sr 2 Y 3 Zr 4 Nb 5 Mo 6 Tc Ru Rh Pd Ag 1 Cd 2 ...

(Y_CONJUNCIO)


[grammar] ~72-~72: ¿Volíeu dir «o» (conjunció) en comptes de «ho» (pronom)?
Context: ...4 Pr 3 Nd 3 Pm Sm 3 Eu 3 Gd 3 Tb 3 Dy 3 Ho 3 Er 3 Tm Yb 3 Lu Hf Ta 5 W 6 Re Os Ir ...

(FINS_I_TOT)


[grammar] ~81-~81: ¿Volíeu dir «us»?
Context: ...y 3 Ho 3 Er 3 Tm Yb 3 Lu Hf Ta 5 W 6 Re Os Ir Pt Au Hg 2 Tl 1 Pb 2 Bi 3 Po At Rn F...

(OS_US)


[grammar] ~82-~82: Paraula desconeguda com a nom propi.
Context: ... Ho 3 Er 3 Tm Yb 3 Lu Hf Ta 5 W 6 Re Os Ir Pt Au Hg 2 Tl 1 Pb 2 Bi 3 Po At Rn Fr R...

(VERBS_NOMSPROPIS)


[grammar] ~100-~100: ¿Volíeu dir «amb»?
Context: ... Bi 3 Po At Rn Fr Ra Ac Th Pa U 6 Np Pu Am Cm Bk Cf Es Fm Md No Lr

(AM_AMB)


[grammar] ~104-~104: S’accentua quan és del v. ‘ser’.
Context: ...Rn Fr Ra Ac Th Pa U 6 Np Pu Am Cm Bk Cf Es Fm Md No Lr

(ES)

smact/tests/files/test_icsd_oxidation_states_filter_1000_w_0_ox_state.txt

[misspelling] ~32-~32: ‘Mn’ zou fout kunnen zijn. Misschien bedoelt u: “Mijn”, “M'n”, “M.n.”, “Mangaan”
Context: ...1 Ca 0 2 Sc 0 3 Ti 0 4 V 0 3 4 5 Cr 0 3 Mn 0 2 3 4 Fe 0 2 3 Co 0 2 3 Ni 0 2 Cu 0 1...

(NL_SIMPLE_REPLACE_MN)


[misspelling] ~36-~36: ‘Cu’ zou fout kunnen zijn. Misschien bedoelt u: “koper”, “CU”, “see you”
Context: ...0 3 Mn 0 2 3 4 Fe 0 2 3 Co 0 2 3 Ni 0 2 Cu 0 1 2 Zn 0 2 Ga 0 3 Ge 0 4 As -3 0 3 5 ...

(NL_SIMPLE_REPLACE_CU)


[style] ~48-~48: Nota bene wordt afgekort als “N.B.”
Context: ...2 0 4 Br -1 Kr Rb 1 Sr 0 2 Y 0 3 Zr 0 4 Nb 0 5 Mo 0 6 Tc Ru 0 Rh 0 Pd 0 Ag 0 1 Cd ...

(NB_2)


[uncategorized] ~84-~84: Vergeet de punt niet als het een afkorting is: “Ir.” Schrijf het liever voluit.
Context: ... 3 Tm 0 Yb 0 3 Lu Hf 0 Ta 0 5 W 6 Re Os Ir 0 Pt 0 Au 0 Hg 2 Tl 1 Pb 2 Bi 0 3 Po At...

(AFKO_PUNT)


[uncategorized] ~94-~94: Vergeet de punt niet als het een afkorting is: “Fr.” Schrijf het liever voluit.
Context: ...t 0 Au 0 Hg 2 Tl 1 Pb 2 Bi 0 3 Po At Rn Fr Ra Ac Th Pa U 0 6 Np Pu Am Cm Bk Cf Es ...

(AFKO_PUNT)

🪛 Ruff
smact/tests/test_utils.py

5-5: pytest imported but unused

Remove unused import: pytest

(F401)


47-47: Use a context manager for opening files

(SIM115)


48-48: Use a context manager for opening files

(SIM115)


51-51: Use a context manager for opening files

(SIM115)


52-52: Use a context manager for opening files

(SIM115)

🔇 Additional comments (5)
smact/structure_prediction/utilities.py (1)

4-4: LGTM: Clean and specific imports

The imports are well-structured, using specific imports rather than wildcards, which is good practice for maintainability and clarity.

Also applies to: 6-6

smact/tests/test_utils.py (2)

9-15: LGTM! Well-structured file path handling

The test file paths are correctly constructed using os.path.join and relative path resolution, ensuring cross-platform compatibility.


64-75: LGTM! Comprehensive DataFrame validation

The test thoroughly validates the structure and content of the species occurrences DataFrame, including shape, column names, and specific value checks.

smact/data_loader.py (2)

325-378: Implementation looks good, suggest enhancing documentation.

The implementation follows the established pattern and correctly implements caching. Consider enhancing the docstring with:

  1. The specific ICSD version/date used for data mining
  2. A reference to documentation explaining the filtering criteria
 def lookup_element_oxidation_states_icsd24(symbol, copy=True):
     """
     Retrieve a list of known oxidation states for an element.
     The oxidation states list used contains only those found
-    in the 2024 version of the ICSD (and has >=5 reports).
+    in the ICSD version 09.24 (September 2024) and has >=5 reports.
+
+    For details on the filtering criteria and data mining methodology,
+    see the documentation at docs/oxidation_states.md
✅ Verification successful

Oxidation states data file verified and well-formatted

The data file exists at the expected location and follows a clean format with proper header documentation. The file header confirms it's sourced from ICSD 2024 V2 with the filtering criteria (>5 reports) matching the function's documentation.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify the existence of the data file and its format
if [ -f "smact/data/oxidation_states_icsd24_filtered.txt" ]; then
    echo "File exists. Checking format..."
    head -n 5 "smact/data/oxidation_states_icsd24_filtered.txt"
else
    echo "Error: Required data file not found!"
fi

Length of output: 278


405-405: Verify breaking change documentation for HHI file rename.

The renaming from "HHIs.txt" to "hhi.txt" follows better naming conventions but constitutes a breaking change.

Comment on lines +156 to +157
if not filename.endswith(".txt"):
filename += ".txt"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Ensure filename handles PathLike objects correctly.

When checking the file extension, if filename is a PathLike object, the .endswith() method may not work as expected.

Convert filename to a string before checking:

-if not filename.endswith(".txt"):
+filename = str(filename)
+if not filename.endswith(".txt"):
     filename += ".txt"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if not filename.endswith(".txt"):
filename += ".txt"
filename = str(filename)
if not filename.endswith(".txt"):
filename += ".txt"

Comment on lines 43 to 49
filtered_df[(filtered_df["results_count"] > 0)]
.groupby("element")
.apply(
self._filter_oxidation_states, threshold, include_groups=False
)
.reset_index()
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix incorrect argument in the apply function call.

The apply method is passing an unexpected keyword argument include_groups=False to _filter_oxidation_states, which does not accept this parameter. This will result in a TypeError.

Apply this diff to remove the invalid argument:

             .apply(
-                self._filter_oxidation_states, threshold, include_groups=False
+                self._filter_oxidation_states, threshold
             )

Committable suggestion skipped: line range outside the PR's diff.

import itertools
import warnings
from math import gcd
from operator import mul as multiply
from os import path
from typing import Iterable, List, Optional, Sequence, Tuple, Union
from typing import Iterable, List, Optional, Tuple, Union
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Removal of Sequence import may cause issues

The removal of Sequence from smact/__init__.py could be problematic as it's actively used in structure_prediction/database.py with type hints. Since database.py imports Sequence directly from typing, the removal is safe, but we should maintain consistency in import patterns across the codebase.

  • structure_prediction/database.py uses Sequence[SmactStructure] type hints
  • Consider keeping the Sequence import in __init__.py for API consistency
🔗 Analysis chain

Verify removal of Sequence import

The removal of the unused Sequence import is good practice. However, as this is a public API, we should verify that this doesn't affect downstream code that might be importing it from this module.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any downstream code that might be importing Sequence from smact
rg -l "from smact import.*Sequence" || echo "No direct imports found"

Length of output: 94


Script:

#!/bin/bash
# Let's verify if Sequence is used in type hints within the codebase
rg "Sequence\[" || echo "No Sequence type hints found"

# Also check if it's imported differently
rg "from typing import.*Sequence" || echo "No Sequence imports found"

Length of output: 250

Comment on lines +173 to +176
(
"oxidation_states_icsd24",
data_loader.lookup_element_oxidation_states_icsd24(symbol),
),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Implementation for ICSD24 oxidation states is missing

The verification reveals critical issues:

  • The lookup_element_oxidation_states_icsd24 function implementation is not found in the codebase
  • The required data file oxidation_states_icsd24.txt is missing
  • Error handling in the data loader is minimal

This change cannot be safely integrated without:

  • Implementing the missing data loader function
  • Adding the required data file
  • Including proper error handling similar to the existing ICSD implementation
🔗 Analysis chain

Verify data loader implementation and error handling

The initialization of the new oxidation states attribute follows the established pattern. However, as this is a breaking change, please ensure:

  1. The lookup_element_oxidation_states_icsd24 function in data_loader handles missing data gracefully
  2. The new attribute doesn't affect existing functionality that might be checking for oxidation states availability

Let me gather more information about the data loader implementation for the new ICSD24 function.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check data loader implementation
ast-grep --pattern 'def lookup_element_oxidation_states_icsd24($_) {
  $$$
}'

# Check for any existing code that might be affected
rg -A 3 "oxidation_states_icsd\W"

Length of output: 1925


Script:

#!/bin/bash
# Check implementation of the new ICSD24 function
ast-grep --pattern 'def lookup_element_oxidation_states_icsd24($_) {
  $$$
}'

# Check for the data file
fd "oxidation_states_icsd24.txt"

# Check for any error handling patterns in data_loader
rg -A 3 "try:|except:|raise" smact/data_loader.py

Length of output: 243

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (4)
smact/structure_prediction/utilities.py (1)

55-56: Remove blank line after Args section

To comply with docstring formatting standards, remove the blank line after the Args section.

Apply this diff:

    Args:
-

        species (tuple[str,int]): A tuple of (element, signed_charge).
🧰 Tools
🪛 Ruff

55-55: No blank lines allowed between a section header and its content ("Args")

Remove blank line(s)

(D412)

smact/tests/test_utils.py (2)

99-105: Consider using pathlib for more robust path handling

Whilst the current path handling works, using pathlib would provide a more modern and cross-platform approach.

-files_dir = os.path.join(os.path.dirname(os.path.realpath(__file__)), "files")
-TEST_ICSD_OX_STATES = os.path.join(
-    files_dir, "test_icsd_oxidation_states_filter_1000.txt"
-)
-TEST_ICSD_OX_STATES_W_ZERO = os.path.join(
-    files_dir, "test_icsd_oxidation_states_filter_1000_w_0_ox_state.txt"
-)
+from pathlib import Path
+
+files_dir = Path(__file__).parent / "files"
+TEST_ICSD_OX_STATES = files_dir / "test_icsd_oxidation_states_filter_1000.txt"
+TEST_ICSD_OX_STATES_W_ZERO = files_dir / "test_icsd_oxidation_states_filter_1000_w_0_ox_state.txt"

116-126: Complete the commented out assertion

There's an incomplete assertion that's been commented out. Either complete the assertion or remove the comment if it's no longer needed.

🧰 Tools
🪛 Ruff

117-117: Undefined name pd

(F821)


121-121: Undefined name pd

(F821)

smact/data_loader.py (1)

304-354: Style improvements needed

Several style improvements can be made:

  1. Remove blank line after docstring
  2. Simplify list comprehension
  3. Use f-string for warning message

Apply these changes:

 def lookup_element_oxidation_states_icsd24(symbol, copy=True):
     """
     Retrieve a list of known oxidation states for an element.
     The oxidation states list used contains only those found
     in the 2024 version of the ICSD (and has >=5 reports).

     Args:
         symbol (str) : the atomic symbol of the element to look up.
         copy (Optional(bool)): if True (default), return a copy of the
             oxidation-state list, rather than a reference to the cached
             data -- only use copy=False in performance-sensitive code
             and where the list will not be modified!

     Returns:
         list: List of known oxidation states for the element.

             Returns None if oxidation states for the Element were not
             found in the external data.
     """
-
     global _el_ox_states_icsd24

     if _el_ox_states_icsd24 is None:
         _el_ox_states_icsd24 = {}

         for items in _get_data_rows(
             os.path.join(data_directory, "oxidation_states_icsd24_filtered.txt")
         ):
             _el_ox_states_icsd24[items[0]] = [
                 int(oxidationState) for oxidationState in items[1:]
             ]

     if symbol in _el_ox_states_icsd24:
         if copy:
             # _el_ox_states_icsd24 stores lists -> if copy is set, make an implicit
             # deep copy.  The elements of the lists are integers, which are
             # "value types" in Python.
-            return [
-                oxidationState
-                for oxidationState in _el_ox_states_icsd24[symbol]
-            ]
+            return list(_el_ox_states_icsd24[symbol])
         else:
             return _el_ox_states_icsd24[symbol]
     else:
         if _print_warnings:
-            print(
-                "WARNING: Oxidation states for element {} "
-                "not found.".format(symbol)
-            )
+            print(f"WARNING: Oxidation states for element {symbol} not found.")
         return None
🧰 Tools
🪛 Ruff

305-322: No blank lines allowed after function docstring (found 1)

Remove blank line(s) after function docstring

(D202)


342-345: Unnecessary list comprehension (rewrite using list())

Rewrite using list()

(C416)


351-352: Use f-string instead of format call

Convert to f-string

(UP032)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 9125edb and 99acd4e.

📒 Files selected for processing (4)
  • smact/__init__.py (3 hunks)
  • smact/data_loader.py (2 hunks)
  • smact/structure_prediction/utilities.py (4 hunks)
  • smact/tests/test_utils.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • smact/init.py
🧰 Additional context used
🪛 Ruff
smact/data_loader.py

305-322: No blank lines allowed after function docstring (found 1)

Remove blank line(s) after function docstring

(D202)


342-345: Unnecessary list comprehension (rewrite using list())

Rewrite using list()

(C416)


351-352: Use f-string instead of format call

Convert to f-string

(UP032)

smact/structure_prediction/utilities.py

5-5: Move third-party import pymatgen.core.structure.Structure into a type-checking block

Move into type-checking block

(TCH002)


55-55: No blank lines allowed between a section header and its content ("Args")

Remove blank line(s)

(D412)

smact/tests/test_utils.py

97-97: Module level import not at top of file

(E402)


117-117: Undefined name pd

(F821)


121-121: Undefined name pd

(F821)


137-137: Unnecessary list comprehension (rewrite using list())

Rewrite using list()

(C416)


137-137: Use a context manager for opening files

(SIM115)


137-139: Unnecessary list comprehension (rewrite using list())

Rewrite using list()

(C416)


138-138: Use a context manager for opening files

(SIM115)


141-141: Unnecessary list comprehension (rewrite using list())

Rewrite using list()

(C416)


141-141: Use a context manager for opening files

(SIM115)


141-143: Unnecessary list comprehension (rewrite using list())

Rewrite using list()

(C416)


142-142: Use a context manager for opening files

(SIM115)


156-156: Undefined name pd

(F821)

🔇 Additional comments (7)
smact/structure_prediction/utilities.py (3)

49-50: Implementation looks good!

The new include_one parameter provides useful flexibility for formatting oxidation state strings, with proper handling of edge cases.

Also applies to: 73-76


102-103: Type annotation improvement looks good!

The return type annotation is now more precise and correctly specifies all possible return types using modern Python type hinting syntax.


Line range hint 49-76: Verify integration with ICSD24 dataset functionality

Since these changes are part of the oxidation states update, let's verify the integration with the new ICSD24 dataset functionality.

✅ Verification successful

No integration concerns with ICSD24 dataset functionality

The unparse_spec function is well-integrated within the codebase and is used appropriately across multiple components:

  • Core oxidation state handling in smact/utils/oxidation.py
  • Structure prediction functionality in smact/structure_prediction/prediction.py
  • Composition utilities in smact/utils/composition.py

The new include_one parameter is properly used in oxidation state handling, particularly in oxidation.py where it's passed through as include_one_oxidation_state. The changes maintain backwards compatibility while adding flexibility for oxidation state representation.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check usage of unparse_spec with new ICSD24 dataset

# Look for usage of unparse_spec in relation to ICSD24
rg -l "unparse_spec.*icsd24" 

# Check for test coverage of the new parameter
rg -l "test.*unparse_spec.*include_one"

Length of output: 72


Script:

#!/bin/bash
# Let's broaden our search to understand the usage context better

# Check all direct usages of unparse_spec
rg "unparse_spec" -B 2 -A 2

# Look for test files related to this functionality
fd "test.*utilities.*py$"

# Check for ICSD-related files that might use this function
fd "icsd.*py$"

Length of output: 6260

🧰 Tools
🪛 Ruff

55-55: No blank lines allowed between a section header and its content ("Args")

Remove blank line(s)

(D412)

smact/tests/test_utils.py (2)

148-153: LGTM! Well-structured test methods

The test methods test_oxidation_states_filter_species_list and test_oxidation_states_filter_species_occurrences are well-structured with clear assertions and good coverage of edge cases.

Also applies to: 154-165


109-115: 🛠️ Refactor suggestion

Use context managers consistently for file operations

Improve file handling in the setUp method by using context managers consistently.

def setUp(self):
    self.ox_filter = ICSD24OxStatesFilter()
-    with open(TEST_ICSD_OX_STATES) as f:
-        self.test_ox_states = f.read()
-    with open(TEST_ICSD_OX_STATES_W_ZERO) as f:
-        self.test_ox_states_w_zero = f.read()
+    self.test_ox_states = TEST_ICSD_OX_STATES.read_text()
+    self.test_ox_states_w_zero = TEST_ICSD_OX_STATES_W_ZERO.read_text()

Likely invalid or redundant comment.

smact/data_loader.py (2)

301-302: LGTM: Cache variable follows established pattern

The new global cache variable follows the consistent pattern used throughout the module.


304-354: LGTM: Implementation follows established pattern

The function correctly implements the oxidation states lookup with proper caching mechanism and error handling.

🧰 Tools
🪛 Ruff

305-322: No blank lines allowed after function docstring (found 1)

Remove blank line(s) after function docstring

(D202)


342-345: Unnecessary list comprehension (rewrite using list())

Rewrite using list()

(C416)


351-352: Use f-string instead of format call

Convert to f-string

(UP032)

@@ -1,5 +1,8 @@
"""Miscellaneous tools for data parsing."""



from pymatgen.core.structure import Structure
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Move Structure import to type-checking block

For better runtime performance and adherence to type-checking best practices, move the pymatgen import into the existing type-checking block.

Apply this diff:

-from pymatgen.core.structure import Structure
from __future__ import annotations

import re
from typing import TYPE_CHECKING

from . import logger

if TYPE_CHECKING:
    import pymatgen
+    from pymatgen.core.structure import Structure
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from pymatgen.core.structure import Structure
from __future__ import annotations
import re
from typing import TYPE_CHECKING
from . import logger
if TYPE_CHECKING:
import pymatgen
from pymatgen.core.structure import Structure
🧰 Tools
🪛 Ruff

5-5: Move third-party import pymatgen.core.structure.Structure into a type-checking block

Move into type-checking block

(TCH002)

@@ -92,3 +93,73 @@ def test_convert_formula(self):
@unittest.skipUnless(os.environ.get("MP_API_KEY"), "requires MP_API key to be set.")
def test_download_compounds_with_mp_api(self):
pass

from smact.utils.oxidation import ICSD24OxStatesFilter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Move import statement to the top of the file

The import statement for ICSD24OxStatesFilter should be grouped with other imports at the top of the file, following PEP 8 guidelines.

from __future__ import annotations

import os
import unittest

from pymatgen.core import Composition
+import pandas as pd

from smact import Element
from smact.screening import smact_filter
from smact.utils.composition import comp_maker, formula_maker, parse_formula
from smact.utils.crystal_space import generate_composition_with_smact
+from smact.utils.oxidation import ICSD24OxStatesFilter
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from smact.utils.oxidation import ICSD24OxStatesFilter
from __future__ import annotations
import os
import unittest
from pymatgen.core import Composition
import pandas as pd
from smact import Element
from smact.screening import smact_filter
from smact.utils.composition import comp_maker, formula_maker, parse_formula
from smact.utils.crystal_space import generate_composition_with_smact
from smact.utils.oxidation import ICSD24OxStatesFilter
🧰 Tools
🪛 Ruff

97-97: Module level import not at top of file

(E402)

Comment on lines 127 to 146
def test_oxidation_states_write(self):
threshold = 1000
filename = "test_ox_states"
filename_w_zero = "test_ox_states_w_zero"
comment = "Testing writing of ICSD 24 oxidation states list."
self.ox_filter.write(filename, threshold, comment=comment)
self.ox_filter.write(
filename_w_zero, threshold, include_zero=True, comment=comment
)
self.assertTrue(os.path.exists(f"{filename}.txt"))
assert [line for line in open(f"{filename}.txt")] == [
line for line in open(TEST_ICSD_OX_STATES)
]
self.assertTrue(os.path.exists(f"{filename}_w_zero.txt"))
assert [line for line in open(f"{filename}_w_zero.txt")] == [
line for line in open(TEST_ICSD_OX_STATES_W_ZERO)
]
# Clean up
os.remove(f"{filename}.txt")
os.remove(f"{filename}_w_zero.txt")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Improve file comparison logic

The current file comparison using list comprehensions can be simplified and made more robust.

def test_oxidation_states_write(self):
    threshold = 1000
    filename = "test_ox_states"
    filename_w_zero = "test_ox_states_w_zero"
    comment = "Testing writing of ICSD 24 oxidation states list."
    self.ox_filter.write(filename, threshold, comment=comment)
    self.ox_filter.write(
        filename_w_zero, threshold, include_zero=True, comment=comment
    )
    
-    self.assertTrue(os.path.exists(f"{filename}.txt"))
-    assert [line for line in open(f"{filename}.txt")] == [
-        line for line in open(TEST_ICSD_OX_STATES)
-    ]
-    self.assertTrue(os.path.exists(f"{filename}_w_zero.txt"))
-    assert [line for line in open(f"{filename}_w_zero.txt")] == [
-        line for line in open(TEST_ICSD_OX_STATES_W_ZERO)
-    ]
+    output_file = Path(f"{filename}.txt")
+    output_file_w_zero = Path(f"{filename}_w_zero.txt")
+    
+    self.assertTrue(output_file.exists())
+    self.assertEqual(output_file.read_text(), TEST_ICSD_OX_STATES.read_text())
+    
+    self.assertTrue(output_file_w_zero.exists())
+    self.assertEqual(
+        output_file_w_zero.read_text(),
+        TEST_ICSD_OX_STATES_W_ZERO.read_text()
+    )
    
    # Clean up
-    os.remove(f"{filename}.txt")
-    os.remove(f"{filename}_w_zero.txt")
+    output_file.unlink()
+    output_file_w_zero.unlink()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def test_oxidation_states_write(self):
threshold = 1000
filename = "test_ox_states"
filename_w_zero = "test_ox_states_w_zero"
comment = "Testing writing of ICSD 24 oxidation states list."
self.ox_filter.write(filename, threshold, comment=comment)
self.ox_filter.write(
filename_w_zero, threshold, include_zero=True, comment=comment
)
self.assertTrue(os.path.exists(f"{filename}.txt"))
assert [line for line in open(f"{filename}.txt")] == [
line for line in open(TEST_ICSD_OX_STATES)
]
self.assertTrue(os.path.exists(f"{filename}_w_zero.txt"))
assert [line for line in open(f"{filename}_w_zero.txt")] == [
line for line in open(TEST_ICSD_OX_STATES_W_ZERO)
]
# Clean up
os.remove(f"{filename}.txt")
os.remove(f"{filename}_w_zero.txt")
def test_oxidation_states_write(self):
threshold = 1000
filename = "test_ox_states"
filename_w_zero = "test_ox_states_w_zero"
comment = "Testing writing of ICSD 24 oxidation states list."
self.ox_filter.write(filename, threshold, comment=comment)
self.ox_filter.write(
filename_w_zero, threshold, include_zero=True, comment=comment
)
output_file = Path(f"{filename}.txt")
output_file_w_zero = Path(f"{filename}_w_zero.txt")
self.assertTrue(output_file.exists())
self.assertEqual(output_file.read_text(), TEST_ICSD_OX_STATES.read_text())
self.assertTrue(output_file_w_zero.exists())
self.assertEqual(
output_file_w_zero.read_text(),
TEST_ICSD_OX_STATES_W_ZERO.read_text()
)
# Clean up
output_file.unlink()
output_file_w_zero.unlink()
🧰 Tools
🪛 Ruff

137-137: Unnecessary list comprehension (rewrite using list())

Rewrite using list()

(C416)


137-137: Use a context manager for opening files

(SIM115)


137-139: Unnecessary list comprehension (rewrite using list())

Rewrite using list()

(C416)


138-138: Use a context manager for opening files

(SIM115)


141-141: Unnecessary list comprehension (rewrite using list())

Rewrite using list()

(C416)


141-141: Use a context manager for opening files

(SIM115)


141-143: Unnecessary list comprehension (rewrite using list())

Rewrite using list()

(C416)


142-142: Use a context manager for opening files

(SIM115)

@@ -324,7 +380,7 @@ def lookup_element_hhis(symbol):
if _element_hhis is None:
_element_hhis = {}

with open(os.path.join(data_directory, "HHIs.txt")) as file:
with open(os.path.join(data_directory, "hhi.txt")) as file:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Breaking change: HHI file renamed without backward compatibility

The git history confirms this is a breaking change where HHIs.txt was renamed to hhi.txt. This change will affect existing installations as:

  • The file has been consistently named HHIs.txt throughout the project's history
  • Multiple historical references show active usage of HHIs.txt
  • No fallback mechanism exists to handle the old filename

To maintain compatibility, consider either:

  • Keeping the original filename HHIs.txt
  • Adding a fallback mechanism to check for both filenames
  • Documenting this as a breaking change in release notes
🔗 Analysis chain

Verify the HHI filename change impact

The change from "HHIs.txt" to "hhi.txt" is a breaking change that could affect existing installations.

Let's verify if the new file exists and if there are any other references to the old filename:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify HHI filename change impact

# Check if the new file exists
if [ ! -f "smact/data/hhi.txt" ]; then
    echo "WARNING: New HHI file 'hhi.txt' not found"
fi

# Search for any remaining references to the old filename
rg -i "HHIs?.txt"

Length of output: 144


Script:

#!/bin/bash
# Let's check git history to see if this is a new change and if the old file existed
git log -p --all | rg -i "HHIs?.txt"

# Also check if there are any references to HHI in tests or documentation
rg -i "hhi" -g "!*.txt"

Length of output: 2235

Copy link

codecov bot commented Nov 20, 2024

Codecov Report

Attention: Patch coverage is 95.83333% with 5 lines in your changes missing coverage. Please review.

Project coverage is 73.56%. Comparing base (5167778) to head (3c73285).
Report is 24 commits behind head on develop.

Files with missing lines Patch % Lines
smact/data_loader.py 71.42% 4 Missing ⚠️
smact/structure_prediction/utilities.py 80.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #329      +/-   ##
===========================================
+ Coverage    72.42%   73.56%   +1.14%     
===========================================
  Files           30       31       +1     
  Lines         2415     2531     +116     
===========================================
+ Hits          1749     1862     +113     
- Misses         666      669       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


🚨 Try these New Features:

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (5)
smact/structure_prediction/utilities.py (1)

Line range hint 46-72: Add examples for the new include_one parameter

The function implementation looks good, but the docstring examples should demonstrate the behaviour of the new include_one parameter.

Add these examples to the docstring:

     Examples:
     --------
         >>> unparse_spec(("Fe", 2))
         'Fe2+'
         >>> unparse_spec(("O", -2))
         'O2-'
+        >>> unparse_spec(("Na", 1))
+        'Na1+'
+        >>> unparse_spec(("Na", 1), include_one=False)
+        'Na+'
smact/tests/test_utils.py (4)

104-111: Add error handling for test file operations.

The setUp method should handle cases where test files might be missing or inaccessible.

 def setUp(self):
     self.ox_filter = ICSD24OxStatesFilter()
-    with open(TEST_ICSD_OX_STATES) as f:
-        self.test_ox_states = f.read()
-    with open(TEST_ICSD_OX_STATES_W_ZERO) as f:
-        self.test_ox_states_w_zero = f.read()
+    try:
+        with open(TEST_ICSD_OX_STATES) as f:
+            self.test_ox_states = f.read()
+        with open(TEST_ICSD_OX_STATES_W_ZERO) as f:
+            self.test_ox_states_w_zero = f.read()
+    except FileNotFoundError as e:
+        raise unittest.SkipTest(f"Required test file not found: {e.filename}")

119-119: Remove or complete the commented assertion.

This incomplete assertion should either be removed or completed with proper test logic.


121-138: Improve file cleanup reliability.

The cleanup section should handle cases where file deletion might fail and use context manager pattern for better resource management.

 def test_oxidation_states_write(self):
+    test_files = []
     threshold = 1000
     filename = "test_ox_states"
     filename_w_zero = "test_ox_states_w_zero"
     comment = "Testing writing of ICSD 24 oxidation states list."
-    self.ox_filter.write(filename, threshold, comment=comment)
-    self.ox_filter.write(filename_w_zero, threshold, include_zero=True, comment=comment)
-    self.assertTrue(os.path.exists(f"{filename}.txt"))
-    with open(f"{filename}.txt") as f:
-        self.assertEqual(f.read(), self.test_ox_states)
+    try:
+        self.ox_filter.write(filename, threshold, comment=comment)
+        test_files.append(f"{filename}.txt")
+        self.ox_filter.write(filename_w_zero, threshold, include_zero=True, comment=comment)
+        test_files.append(f"{filename}_w_zero.txt")
 
-    self.assertTrue(os.path.exists(f"{filename}_w_zero.txt"))
-    with open(f"{filename}_w_zero.txt") as f:
-        self.assertEqual(f.read(), self.test_ox_states_w_zero)
-    # Clean up
-    os.remove(f"{filename}.txt")
-    os.remove(f"{filename}_w_zero.txt")
+        self.assertTrue(os.path.exists(test_files[0]))
+        with open(test_files[0]) as f:
+            self.assertEqual(f.read(), self.test_ox_states)
+
+        self.assertTrue(os.path.exists(test_files[1]))
+        with open(test_files[1]) as f:
+            self.assertEqual(f.read(), self.test_ox_states_w_zero)
+    finally:
+        # Clean up
+        for file in test_files:
+            try:
+                if os.path.exists(file):
+                    os.remove(file)
+            except OSError as e:
+                print(f"Warning: Failed to remove test file {file}: {e}")

139-154: Consider making test values more maintainable.

The hard-coded test values (e.g., lengths and specific counts) could make tests brittle. Consider moving these to class-level constants or loading from test data files.

 class OxidationStatesTest(unittest.TestCase):
+    # Test constants
+    THRESHOLD_LENGTHS = {
+        0: 490,
+        5: 358,
+        50: 227
+    }
+    EXPECTED_FIRST_SPECIES = "O2-"
+    EXPECTED_FIRST_COUNT = 116910
+
     def test_oxidation_states_filter_species_list(self):
-        for threshold, length in [(0, 490), (5, 358), (50, 227)]:
+        for threshold, length in self.THRESHOLD_LENGTHS.items():
             species_list = self.ox_filter.get_species_list(threshold)
             self.assertIsInstance(species_list, list)
             self.assertEqual(len(species_list), length)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 99acd4e and 3c73285.

📒 Files selected for processing (6)
  • paper.bib (6 hunks)
  • smact/__init__.py (2 hunks)
  • smact/data_loader.py (2 hunks)
  • smact/structure_prediction/utilities.py (4 hunks)
  • smact/tests/test_utils.py (2 hunks)
  • smact/utils/oxidation.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • paper.bib
  • smact/utils/oxidation.py
🧰 Additional context used
🪛 GitHub Check: codecov/patch
smact/data_loader.py

[warning] 339-339: smact/data_loader.py#L339
Added line #L339 was not covered by tests


[warning] 341-343: smact/data_loader.py#L341-L343
Added lines #L341 - L343 were not covered by tests

smact/structure_prediction/utilities.py

[warning] 11-11: smact/structure_prediction/utilities.py#L11
Added line #L11 was not covered by tests

🔇 Additional comments (7)
smact/structure_prediction/utilities.py (2)

11-11: LGTM! Import moved to type-checking block

The Structure import has been correctly moved to the type-checking block, which optimises runtime performance.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 11-11: smact/structure_prediction/utilities.py#L11
Added line #L11 was not covered by tests


98-99: LGTM! Type annotation simplified

The return type annotation has been appropriately simplified whilst maintaining type safety through the TYPE_CHECKING import.

smact/tests/test_utils.py (1)

6-13: LGTM! Import ordering and file path setup look good.

The imports are properly organised and the test file paths are defined using platform-independent path joining.

Also applies to: 99-101

smact/__init__.py (2)

73-74: Documentation looks good!

The documentation for the new oxidation_states_icsd24 attribute follows the established pattern and clearly indicates the data source and version.


200-203: Verify the ICSD24 implementation completeness

Whilst the attribute initialisation follows the correct pattern, we need to verify that the supporting components are in place.

✅ Verification successful

Let me gather more information about the data loader implementation.


Let me check the content of the data files and the complete implementation.


ICSD24 implementation is complete and properly supported

The implementation is fully supported with:

  • A properly implemented data loader function with documentation and error handling
  • Required data files present in the correct location:
    • smact/data/oxidation_states_icsd24_filtered.txt (main data file)
    • Additional supporting files for raw data and counts
  • Correct data format in the filtered file, following the established pattern
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if the data loader function is implemented
ast-grep --pattern 'def lookup_element_oxidation_states_icsd24($_) {
  $$$
}'

# Check if the required data file exists
fd "oxidation_states_icsd24.txt"

# Check error handling in the data loader
rg "lookup_element_oxidation_states_icsd24" smact/data_loader.py

Length of output: 240


Script:

#!/bin/bash
# Check the implementation details of the data loader function
rg -A 10 "lookup_element_oxidation_states_icsd24" smact/data_loader.py

# Check if there's a data file in the data directory
fd -g "*icsd24*" smact/data/

# Check for any test coverage
rg "lookup_element_oxidation_states_icsd24" -g "test_*.py"

Length of output: 900


Script:

#!/bin/bash
# Check the complete implementation of the data loader function
rg -A 20 "lookup_element_oxidation_states_icsd24" smact/data_loader.py

# Check the content format of the filtered data file
head -n 5 smact/data/oxidation_states_icsd24_filtered.txt

# Check if the function is properly loading the data file
rg "oxidation_states_icsd24" smact/data_loader.py

Length of output: 1283

smact/data_loader.py (2)

372-372: Skip comment about HHI filename.

This change has already been thoroughly addressed in previous reviews.


301-343: LGTM! The implementation follows established patterns.

The new function lookup_element_oxidation_states_icsd24 is well-documented and follows the consistent pattern established by other oxidation state lookup functions in the module.

Let's verify the existence of the new data file:

✅ Verification successful

Implementation verified and data file present

The new function lookup_element_oxidation_states_icsd24 is properly implemented with its required data file oxidation_states_icsd24_filtered.txt present in the correct location at smact/data/. The implementation follows the established pattern and the data file exists alongside other oxidation state datasets, indicating a well-organised addition to the codebase.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the new ICSD24 data file exists
if [ ! -f "smact/data/oxidation_states_icsd24_filtered.txt" ]; then
    echo "ERROR: Required data file 'oxidation_states_icsd24_filtered.txt' not found"
fi

Length of output: 64


Script:

#!/bin/bash
# Let's verify the data file location and content more thoroughly
echo "Searching for the ICSD24 data file:"
fd "oxidation_states_icsd24_filtered.txt"

echo -e "\nChecking data directory contents:"
ls -la smact/data/

echo -e "\nChecking if the file exists in any other locations:"
rg -l "oxidation_states_icsd24_filtered.txt"

Length of output: 2413

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 339-339: smact/data_loader.py#L339
Added line #L339 was not covered by tests


[warning] 341-343: smact/data_loader.py#L341-L343
Added lines #L341 - L343 were not covered by tests

Comment on lines +339 to +343
return _el_ox_states_icsd24[symbol]
else:
if _print_warnings:
print(f"WARNING: Oxidation states for element {symbol} " "not found.")
return None
Copy link
Contributor

@coderabbitai coderabbitai bot Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add test coverage for error handling.

The error handling path for non-existent elements lacks test coverage. Consider adding test cases to verify the behaviour when:

  1. The element symbol doesn't exist in the dataset
  2. The warning message is printed when _print_warnings is enabled

Would you like me to help generate the test cases for these scenarios?

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 339-339: smact/data_loader.py#L339
Added line #L339 was not covered by tests


[warning] 341-343: smact/data_loader.py#L341-L343
Added lines #L341 - L343 were not covered by tests

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you generate the test cases?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AntObi, certainly! Here are the test cases covering the error handling for the lookup_element_oxidation_states_icsd24 function.

You can add these test methods to your test suite, possibly in smact/tests/test_data_loader.py:

import unittest
from smact.data_loader import lookup_element_oxidation_states_icsd24, set_warnings
import io
import sys

class DataLoaderTest(unittest.TestCase):
    def test_lookup_nonexistent_element(self):
        # Test that the function returns None for a nonexistent element
        result = lookup_element_oxidation_states_icsd24('Xx')
        self.assertIsNone(result)

    def test_warning_message_printed(self):
        # Capture the output
        captured_output = io.StringIO()
        sys.stdout = captured_output
        # Enable warnings
        set_warnings(True)
        # Invoke the function with a nonexistent element
        lookup_result = lookup_element_oxidation_states_icsd24('Xx')
        # Disable warnings
        set_warnings(False)
        # Restore stdout
        sys.stdout = sys.__stdout__
        # Check if the correct warning message was printed
        self.assertIn('WARNING: Oxidation states for element Xx not found.', captured_output.getvalue())
        # Ensure that the function returns None
        self.assertIsNone(lookup_result)

Explanation:

  • test_lookup_nonexistent_element: This test verifies that the function returns None when an element symbol does not exist in the dataset.
  • test_warning_message_printed: This test enables the _print_warnings flag using set_warnings(True), captures the stdout, and checks if the appropriate warning message is printed when a nonexistent element is queried. It also ensures that the function returns None in this case.

These tests ensure that the error handling in your new function works as expected.

Let me know if you need any further assistance!

@AntObi AntObi added enhancement docs dependencies Pull requests that update a dependency file labels Nov 20, 2024
@AntObi AntObi added python Pull requests that update Python code feature refactor labels Nov 20, 2024
@AntObi AntObi merged commit 51d78b0 into develop Nov 20, 2024
16 checks passed
@AntObi AntObi deleted the oxidationstates24 branch November 20, 2024 09:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dependencies Pull requests that update a dependency file docs enhancement feature python Pull requests that update Python code refactor
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants