Merge the develop branch into the master branch #330
Conversation
Added comment lines on dataset origins
Populated from ICSD API - raw count json - raw oxidation state list - filtered list with > 5 entries per oxidation state
Add set that lists the oxidation state with the most results for each element
Final list up to Lr
[WIP] Documentation overhaul
Oxidation States 24
Walkthrough

This pull request introduces a series of formatting and documentation enhancements across multiple files in the SMACT project. Key changes include updates to pull request templates, workflow configurations, documentation files, and the introduction of new Jupyter notebooks. The changes aim to improve readability, consistency, and clarity without altering the underlying functionality of the code. Additionally, several new data files related to oxidation states have been added, and existing files have undergone minor restructuring for better organisation.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant PR_Template
    participant CI_Workflow
    participant Documentation
    participant Jupyter_Notebooks
    User->>PR_Template: Submit PR
    PR_Template->>User: Review formatting changes
    User->>CI_Workflow: Trigger CI checks
    CI_Workflow->>User: Provide feedback on CI status
    User->>Documentation: Update README and guidelines
    Documentation->>User: Confirm updates
    User->>Jupyter_Notebooks: Explore new examples
    Jupyter_Notebooks->>User: Provide learning resources
```
Actionable comments posted: 17
🧹 Outside diff range and nitpick comments (99)
smact/benchmarking/smact_benchmark.py (1)
32-32: Consider enhancing the docstring with additional details. Whilst the current docstring is clear, consider expanding it to include:
- Expected output format
- Typical performance characteristics
- Usage example
Example enhancement:
```diff
-"""Run benchmark tests for CationMutator."""
+"""Run benchmark tests for CationMutator.
+
+Executes a series of timed tests to measure the performance of CationMutator
+operations, including initialisation and pair correlation calculations.
+
+Returns:
+    None. Results are printed to stdout in the format:
+    Operation: time_taken (in seconds)
+
+Example:
+    >>> mutator_test_run()
+    Setting up CationMutator: 0.123s
+    Computing pair correlations: 0.456s
+"""
```

smact/benchmarking/pymatgen_benchmark.py (2)
Line range hint 16-35: Consider optimising the pair correlation implementation. Whilst the addition of timing decorators is excellent for performance profiling, the pair correlation implementation in `__pair_corr` could be optimised further. Consider using numpy's vectorised operations for better performance, especially when dealing with large datasets. Example optimisation:
```python
import numpy as np


@timeit
def __pair_corr(self):
    """Get pair correlation using vectorised operations."""
    species_array = np.array(list(self.sp.species))
    pairs = np.array(list(cwr(species_array, 2)))
    return np.vectorize(self.sp.pair_corr)(pairs[:, 0], pairs[:, 1])
```
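For context, a self-contained sketch of the pairing pattern used in the suggestion above (`cwr` is `itertools.combinations_with_replacement`; the species labels here are stand-ins, not SMACT data):

```python
from itertools import combinations_with_replacement as cwr

# Stand-in species labels; in the benchmark these would come from self.sp.species
species = ["Na", "Cl", "K"]

# All unordered pairs, including self-pairs: n * (n + 1) / 2 of them
pairs = list(cwr(species, 2))
print(len(pairs))  # → 6
```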
Line range hint 38-41: Enhance the docstring with more detailed information. The current docstring could be more informative. Consider including:
- Purpose of the benchmark
- Explanation of the n=100 iterations
- Expected output format
- Any prerequisites or dependencies
Suggested docstring:
```python
"""Run benchmarking tests for SubstitutionProbability.

Executes a series of benchmarking tests to measure the performance of
pymatgen's SubstitutionProbability calculations. The test is repeated
100 times to ensure statistical significance.

Returns:
    None: Results are logged via the timeit decorator
"""
```

smact/benchmarking/utilities.py (1)
31-31: Consider adding precision to the timing output. Whilst the logging statement is clear, it would be more useful for benchmarking if we formatted the timing with specific decimal places.
```diff
-logging.info(f"{func.__name__} -- Average over {n} repeats = {mean(times)}s")
+logging.info(f"{func.__name__} -- Average over {n} repeats = {mean(times):.6f}s")
```

smact/lattice.py (3)
Line range hint 8-31: Complete the method documentation section. The method section mentions `lattice_vector_calc()` but lacks description of its purpose, parameters, and return value. Consider completing this section following the NumPy docstring format.

```diff
 Methods:
 -------
-lattice_vector_calc():
+lattice_vector_calc()
+    Calculate the lattice vectors for the crystal structure.
+
+    Returns
+    -------
+    numpy.ndarray
+        The calculated lattice vectors
```
37-37: Enhance the init method documentation. The initialisation docstring should document its parameters following the NumPy format used elsewhere in the class.

```diff
-"""Initialize the Lattice object."""
+"""Initialize the Lattice object.
+
+Parameters
+----------
+sites : list[Site]
+    List of Site objects defining the crystal structure
+space_group : int, optional
+    Space group number according to International Tables (default: 1)
+strukturbericht : str or bool, optional
+    Strukturbericht designation, if applicable (default: False)
+"""
```
56-61: Excellent fix for mutable default argument. The change from `oxidation_states=[0]` to `oxidation_states=None` with conditional initialisation is a significant improvement. This addresses the Python mutable default argument pitfall, which could have caused unexpected behaviour in certain scenarios. Consider adding type hints to improve code maintainability:

```diff
-def __init__(self, position, oxidation_states=None):
+def __init__(self, position: list[float], oxidation_states: list[int] | None = None):
```

docs/index.rst (3)
27-27: Fix spelling error in features list. There's a typo in "predcition" which should be "prediction".

```diff
-- Structure predcition from chemical composition
+- Structure prediction from chemical composition
```
30-33: Consider expanding installation instructions. The installation section could be more comprehensive by including:
- Python version requirements
- Dependencies
- Alternative installation methods (e.g., from source)
- Development installation instructions
40-46: Consider adding formatted citations. To make it easier for users to cite the work, consider adding pre-formatted citations in common formats (BibTeX, APA, etc.).
.pre-commit-config.yaml (3)
10-11: Address the TODO comment regarding notebook formatting. The comment suggests there are outstanding issues with notebook formatting. Consider documenting the specific problems and creating a tracking issue for resolution.
Would you like me to help create a GitHub issue to track this notebook formatting task?
27-38: Consider simplifying the markdownlint configuration. The current array-style configuration could be simplified by using a `.markdownlint.json` configuration file instead. This would improve maintainability and make the rules more discoverable. Example `.markdownlint.json`:

```json
{
  "MD013": false,
  "MD024": false,
  "MD025": false,
  "MD033": false,
  "MD041": false,
  "ignores": ["dev_docs/design_docs/*"]
}
```
54-58: Document codespell configuration location. The setup references a TOML configuration, but its location isn't immediately clear. Consider adding a comment to indicate where the codespell configuration can be found in the `pyproject.toml` file.

setup.py (2)
3-3: Consider enhancing the module docstring. The current docstring could be more informative. Consider adding details about what SMACT is, its primary features, and basic usage information.

```diff
-"""Installation for SMACT."""
+"""
+Installation script for SMACT (Semiconducting Materials by Analogy and Chemical Theory).
+
+This package provides tools for screening semiconducting materials and analyzing their properties.
+For more information, visit: https://github.com/WMD-group/SMACT
+"""
```
Line range hint 39-43: Remove duplicate package entry. The package "smact.utils" is listed twice in the packages list, which is redundant.

```diff
 packages=[
     "smact",
     "smact.utils",
     "smact.tests",
     "smact.structure_prediction",
     "smact.dopant_prediction",
-    "smact.utils",
 ],
```
docs/introduction.rst (4)
14-20: Consider enhancing the features list with examples. Whilst the features list is well-structured, it could benefit from brief examples or explanations for each point. For instance, specify which chemical properties are available or what types of structure prediction are supported.
21-24: Installation instructions could be more comprehensive. The pip installation instruction is clear but minimal. Consider adding:
- Requirements (Python version, dependencies)
- Alternative installation methods (e.g., from source)
- Troubleshooting tips
26-37: Consider adding DOIs for citations. Whilst the citations are well-organised, adding DOIs would make them more accessible and standardised. Additionally, consider using a consistent citation format across all references. Example format:

```diff
-`Computational screening of all stoichiometric inorganic materials <https://www.sciencedirect.com/science/article/pii/S2451929416301553>`_
+`Computational screening of all stoichiometric inorganic materials <https://www.sciencedirect.com/science/article/pii/S2451929416301553>`_ (DOI: 10.1016/j.chempr.2016.09.010)
```
39-42: Consider chronological ordering of publications. The studies section would benefit from chronological ordering of publications, either newest-first or oldest-first, to help readers track the project's development over time.
docs/examples/valence_electron_count.ipynb (3)
7-18: Consider enhancing the theoretical background. The explanation is clear and well-structured. Consider adding:
- References to key scientific literature about VEC in intermetallic compounds
- Typical VEC ranges for different classes of materials
- Examples of how VEC correlates with specific material properties
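To make the notion concrete for readers, a minimal sketch of a VEC calculation could accompany the background. The valence-electron counts and the `vec` helper below are illustrative assumptions, not the SMACT implementation (which lives in `smact.properties`):

```python
# Illustrative valence-electron counts (d + s counting); not SMACT's data tables
VALENCE_ELECTRONS = {"Cu": 11, "Zn": 12, "Al": 3}


def vec(composition):
    """VEC = total valence electrons / total atoms per formula unit."""
    total_atoms = sum(composition.values())
    total_electrons = sum(VALENCE_ELECTRONS[el] * n for el, n in composition.items())
    return total_electrons / total_atoms


# Brass-like CuZn: (11 + 12) / 2
print(vec({"Cu": 1, "Zn": 1}))  # → 11.5
```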
38-39: Consider using a more specific import. Instead of importing the entire properties module, consider importing just the required function:

```diff
-import smact.properties  # Importing the SMACT properties module
+from smact.properties import valence_electron_count  # Import specific function
```
46-48: Consider enhancing the output format. The current output could be more informative. Consider:
- Adding a header row
- Including the elements present in each compound
- Formatting as a table for better readability
```diff
-# Print the VEC for each formula
-for formula, vec in zip(formulas, valence_electron_concentrations):
-    print(f"{formula} has a VEC of {vec:.2f}")
+# Print results in a tabulated format
+print("\nValence Electron Concentration Results:")
+print("-" * 50)
+print(f"{'Chemical Formula':<25} {'VEC':>10}")
+print("-" * 50)
+for formula, vec in zip(formulas, valence_electron_concentrations):
+    print(f"{formula:<25} {vec:>10.2f}")
```

smact/builder.py (3)
1-8: Consider enhancing module documentation. The module docstring could benefit from more detailed documentation, including:
- Comprehensive overview of available lattice builders
- Examples of usage
- References to crystallographic literature
Track TODOs in issue system
The TODO comments should be tracked in the project's issue system for better visibility and tracking.
Would you like me to create GitHub issues for these TODO items?
16-35: Improve docstring completeness. While the docstring improvements are good, consider adding:
- Example usage
- Expected format for input parameters
- Explanation of the crystallographic structure being built
Consider making oxidation states configurable
The hardcoded oxidation states `[[2]] + [[4]] + [[-2]] * 3` might be too restrictive for some use cases. Consider adding an optional parameter for oxidation states:

```diff
-def cubic_perovskite(species, cell_par=None, repetitions=None):
+def cubic_perovskite(species, cell_par=None, repetitions=None, oxidation_states=None):
```
81-81: Document oxidation state assumptions. Add a comment explaining why this comprehensive range of oxidation states was chosen and how it relates to typical wurtzite structures.
docs/examples/validity.ipynb (4)
9-18: Consider enhancing the documentation with examples of invalid cases. The markdown explanation is clear and well-structured. To make it more comprehensive, consider adding examples of:
- What makes a composition charge-imbalanced
- What electronegativity differences are considered inappropriate
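For readers unfamiliar with the charge-balance part of these checks, here is a hedged sketch of the idea behind `smact_validity` (the oxidation-state table and helper below are illustrative, not the SMACT implementation, which draws on its own data tables and also applies electronegativity tests):

```python
from itertools import product

# Illustrative allowed oxidation states; SMACT uses its own curated data
OX_STATES = {"Na": [1], "Cl": [-1], "Fe": [2, 3], "O": [-2]}


def charge_balanced(elements, counts):
    """True if some combination of allowed oxidation states sums to zero."""
    for states in product(*(OX_STATES[el] for el in elements)):
        if sum(q * n for q, n in zip(states, counts)) == 0:
            return True
    return False


print(charge_balanced(["Na", "Cl"], [1, 1]))  # → True
print(charge_balanced(["Fe", "O"], [2, 3]))   # → True (2 x Fe3+, 3 x O2-)
print(charge_balanced(["Na", "O"], [1, 1]))   # → False
```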
44-46: Remove unused import. The `re` module is imported but not used in the code.

```diff
 import pandas as pd
 from smact.screening import smact_validity
-import re
```
48-49: Add comments explaining valid/invalid compounds. It would be helpful to add comments explaining why certain compounds in the dataset are expected to be valid or invalid.

```diff
-# Dataset
-data = ["NaCl", "K2O", "Fe2O3", "C12H22O11", "CsO8"]
+# Dataset with both valid and invalid compounds
+data = [
+    "NaCl",       # Valid: 1:1 ionic compound
+    "K2O",        # Valid: charge-balanced oxide
+    "Fe2O3",      # Valid: common iron oxide
+    "C12H22O11",  # Invalid: organic compound
+    "CsO8",       # Invalid: unusual oxidation state
+]
```
60-62: Use more descriptive variable names for clarity. The variable name 'fractions' could be more descriptive to better represent its content.

```diff
-fractions = df["smact_allowed"].value_counts(normalize=True)
-print(fractions)
+validity_proportions = df["smact_allowed"].value_counts(normalize=True)
+print("Proportion of valid vs invalid compounds:")
+print(validity_proportions)
```

docs/examples/lists.ipynb (3)
8-15: Consider clarifying the Sphinx directive usage. The `:mod:` directive might not render correctly in all Markdown viewers. Consider either:
- Adding a note about this being Sphinx-compatible documentation, or
- Using standard Markdown formatting for better compatibility across different platforms
34-39: Consider adding error handling and docstring examples. Whilst the code demonstrates basic functionality, consider:
- Adding error handling for invalid atomic number ranges
- Including example outputs in the docstring
- Adding a brief comment explaining why the range 13-27 was chosen (these correspond to Al through Co)
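The suggested range check could look something like the following. This is a hypothetical guard, not the SMACT API; the `max_z` of 118 assumes the full periodic table:

```python
def validate_atomic_number_range(start, end, max_z=118):
    """Raise early on an invalid atomic-number range instead of failing later."""
    if not (1 <= start <= end <= max_z):
        raise ValueError(f"Invalid atomic number range: {start}-{end}")
    return range(start, end + 1)


# 13-27 covers Al through Co, as used in the notebook
print(list(validate_atomic_number_range(13, 15)))  # → [13, 14, 15]
```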
68-69: Consider enhancing the example output. Rather than printing the entire dictionary with memory addresses, consider demonstrating practical usage:

```diff
-print(element_list)
+# Show how to access specific elements
+print(f"Aluminium object: {element_list['Al'].name}")
+print(f"Iron oxidation states: {element_list['Fe'].oxidation_states}")
```

.gitignore (1)
227-227: Consider reorganising the `.bak` pattern. Whilst the addition is correct, consider:
- Moving this pattern under a more specific section (e.g., under "General" or create a "Backup Files" section)
- Adding a descriptive comment explaining what `.bak` files represent

```diff
-*.bak
+# Backup files
+*.bak
```

smact/distorter.py (5)
Line range hint 8-10: Address the TODO comment regarding equivalence checking. The TODO comment indicates a missing feature for checking equivalence between two Atoms objects. This functionality could be crucial for the module's completeness and reliability.
Would you like me to help implement this feature or create a GitHub issue to track this enhancement?
41-43: Enhance error handling in space group parsing. The space group parsing logic assumes a specific format from spglib. Consider adding error handling for malformed spglib output.

```diff
-    space_split = spacegroup.split()
-    spg_num = space_split[1].replace("(", "").replace(")", "")
-    return Spacegroup(int(spg_num))
+    try:
+        space_split = spacegroup.split()
+        spg_num = space_split[1].replace("(", "").replace(")", "")
+        return Spacegroup(int(spg_num))
+    except (IndexError, ValueError) as e:
+        raise ValueError(f"Failed to parse space group from '{spacegroup}': {e}")
```
68-75: Consider refactoring the site comparison logic. The nested loops and multiple conditions make the code complex and potentially inefficient. Consider extracting the site comparison logic into a separate helper function.

```diff
+def is_equivalent_to_any_site(site, inequiv_site, sg):
+    """Helper function to check if a site is equivalent to another site or its symmetry equivalents."""
+    if smact.are_eq(site, inequiv_site) is True:
+        return True
+    equiv_sites, _ = sg.equivalent_sites(inequiv_site)
+    return any(smact.are_eq(site, equiv_site) is True for equiv_site in equiv_sites)
+
 for site in sub_lattice:
     new_site = True
     if len(inequivalent_sites) > 0:
-        for inequiv_site in inequivalent_sites:
-            if smact.are_eq(site, inequiv_site) is True:
-                new_site = False
-                equiv_inequiv_sites, _ = sg.equivalent_sites(inequiv_site)
-                for equiv_inequiv_site in equiv_inequiv_sites:
-                    if smact.are_eq(site, equiv_inequiv_site) is True:
-                        new_site = False
+        new_site = not any(is_equivalent_to_any_site(site, inequiv_site, sg)
+                           for inequiv_site in inequivalent_sites)
```
Line range hint 100-102: Move the deepcopy explanation to the docstring. The important note about deepcopy usage should be in the function's docstring for better visibility in documentation.

```diff
 """
 Change atomic species on lattice site to new_species.

+Note:
+    Uses deepcopy to ensure changes don't affect the original lattice object.
+
 Args:
 ----
     lattice (ASE crystal): Input lattice
     site (list): Cartesian coordinates of the substitution site
     new_species (str): New species

 Returns:
 -------
     lattice
 """
-# NBNBNBNB It is necessary to use deepcopy for objects, otherwise changes applied to a clone
-# will also apply to the parent object.
```
Line range hint 126-134: Use enumerate for more Pythonic iteration. The current implementation uses a manual counter. Consider using `enumerate` for a more idiomatic Python approach.

```diff
-i = 0
 atomic_labels = lattice.get_chemical_symbols()
 positions = lattice.get_scaled_positions()
-for atom in atomic_labels:
+for i, atom in enumerate(atomic_labels):
     if atom == symbol:
         sub_lattice.append(positions[i])
-    i = i + 1
```

docs/examples/distorter.ipynb (2)
7-11: Documentation enhancement suggestion. Consider adding a brief explanation of:
- The expected outcome of the substitutions
- The practical implications of these substitutions
- Any limitations or considerations users should be aware of
64-72: Enhance the substitution loop with progress tracking. For larger structures with many substitution sites, consider adding a progress bar and memory management.

```diff
+from tqdm import tqdm
 substituted_structures = []
-for i, inequivalent_site in enumerate(inequivalent_sites):
+for i, inequivalent_site in enumerate(tqdm(inequivalent_sites, desc="Performing substitutions")):
     print(f"Substitution {i + 1}: Replacing Ba with Sr at site {inequivalent_site}")
     distorted = distorter.make_substitution(single_substitution, inequivalent_site, "Sr")
     substituted_structures.append(distorted)  # Store the structure for visualization
-    print("-" * 40)
```

docs/examples/doper.ipynb (6)
8-21
: Consider enhancing the documentation with examplesThe introduction is well-structured, but could be more helpful with:
- An example of the input tuple format (e.g.,
("Ti4+", "O2-")
)- A sample output dictionary showing the actual format of results
- The default value for
num_dopants
31-44: Consider adding error handling examples. The basic usage example is clear, but would benefit from:
- Examples of handling invalid input formats
- Demonstration of error cases (e.g., unsupported oxidation states)
- Sample output in the comments showing what users should expect
52-67: Enhance table presentation documentation. Consider adding:
- Description of the table structure and columns
- Example of the table output format
- Options for customising the table display (if available)
74-86: Expand multicomponent systems documentation. The section would be more helpful with:
- Explanation of how the algorithm handles multiple components
- Any limitations or considerations for larger systems
- Performance implications for multicomponent systems
93-106: Enhance visualisation documentation. Please consider adding:
- Explanation of how to interpret the heatmap
- Description of the axes and colour scale meaning
- Additional customisation parameters beyond `cmap`
- Link to specific matplotlib documentation for colour maps
113-138: Add practical guidance for alternative metrics. Consider enhancing this section with:
- Comparison of results between different metrics
- Guidelines for choosing between probability and similarity metrics
- Performance implications of using skipspecies embeddings
smact/properties.py (1)
Line range hint 35-55: Consider standardising the docstring parameter descriptions. While the type hint updates are good, consider making the parameter descriptions more consistent. For example, the `distance` parameter description could be formatted similarly to other parameters.

```diff
-    distance (float or str): Nuclear separation between anion and cation
-        i.e. sum of ionic radii
+    distance (float or str): Nuclear separation between anion and cation,
+        i.e. sum of ionic radii
```

smact/oxidation_states.py (3)
Line range hint 39-50: Consider adding validation for probability data. Whilst the probability table loading logic is sound, consider adding validation to ensure:
- Probability values are within [0,1]
- The JSON file exists and is properly formatted
- The loaded data contains the expected structure
This would improve robustness and provide clearer error messages.
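A hedged sketch of such a validation step (the function name and species-pair key format below are hypothetical, not the SMACT loader):

```python
def validate_probability_table(table):
    """Check a loaded probability table is a dict with values in [0, 1]."""
    if not isinstance(table, dict):
        raise TypeError(f"Expected dict, got {type(table).__name__}")
    for key, prob in table.items():
        if not isinstance(prob, (int, float)) or not 0.0 <= prob <= 1.0:
            raise ValueError(f"Probability for {key!r} out of range [0, 1]: {prob}")
    return table


# Hypothetical species-pair keys, for illustration only
validate_probability_table({"Fe3+|O2-": 0.82, "Fe2+|O2-": 0.18})  # passes silently
```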
Line range hint 114-166: Consider splitting input handling into separate method. The compound_probability method handles multiple responsibilities:
- Input validation and conversion
- Species arrangement
- Probability calculation
Consider extracting the input handling (lines 131-150) into a separate private method for improved readability and maintainability.
Example refactor:
```python
def _convert_to_smact_species(self, input_structure) -> list[Species]:
    if isinstance(input_structure, list):
        if all(isinstance(i, Species) for i in input_structure):
            return input_structure
        if all(isinstance(i, pmgSpecies) for i in input_structure):
            return [Species(i.symbol, i.oxi_state) for i in input_structure]
        raise TypeError("Input requires a list of SMACT or Pymatgen species.")
    if isinstance(input_structure, Structure):
        species = input_structure.species
        if not all(isinstance(i, pmgSpecies) for i in species):
            raise TypeError("Structure must have oxidation states.")
        return [
            Species(
                get_el_sp(i.species_string).symbol,
                get_el_sp(i.species_string).oxi_state,
            )
            for i in input_structure
        ]
    raise TypeError("Input requires a list of SMACT or Pymatgen Species or a Structure.")
```
137-138: Enhance error messages with more context. The error messages could be more informative by including details about the actual types received.

```diff
-raise TypeError("Input requires a list of SMACT or Pymatgen species.")
+raise TypeError(f"Input requires a list of SMACT or Pymatgen species, received: {[type(i).__name__ for i in structure]}")
```

smact/lattice_parameters.py (3)
72-72: Consider extracting magic numbers as named constants. The value 0.335 (sin(109.6-90)) would be clearer as a named constant with a descriptive name, improving maintainability.

```diff
+WURTZITE_ANGLE_FACTOR = 0.335  # sin(109.6-90)
-    c = (shannon_radius[0] + shannon_radius[1]) * (2 + 2 * 0.335)
+    c = (shannon_radius[0] + shannon_radius[1]) * (2 + 2 * WURTZITE_ANGLE_FACTOR)
```
Line range hint 79-310: Standardise parameter naming across functions. There's inconsistent use of `shannon_radius` and `covalent_radius` parameters across similar functions. Consider standardising the parameter naming convention for better consistency.
326-338: Enhance clarity of trigonometric calculations. Consider these improvements:
- Extract angle values as named constants
- Add comments explaining the crystallographic significance of these angles
```diff
+# Crystallographic angles for stuffed wurtzite structure
+ANGLE_ALPHA = 19.5  # degrees
+ANGLE_BETA = 70.5   # degrees
+ANGLE_GAMMA = 120   # degrees
+ANGLE_DELTA = 30    # degrees
 def stuffed_wurtzite(shannon_radii):
     rac = shannon_radii[2] + shannon_radii[1]
-    x = rac * np.sin(np.radians(19.5))
+    x = rac * np.sin(np.radians(ANGLE_ALPHA))
-    y = rac * np.sin(np.radians(70.5))
+    y = rac * np.sin(np.radians(ANGLE_BETA))
     c = 2 * rac + x
-    a = y * np.sin(np.radians(120)) / np.sin(np.radians(30))
+    a = y * np.sin(np.radians(ANGLE_GAMMA)) / np.sin(np.radians(ANGLE_DELTA))
```

docs/examples/oxidation_states.ipynb (2)
18-22: Consider adding version requirements documentation. The notebook imports specific components from pymatgen and smact. It would be helpful to document the minimum required versions of these dependencies.
66-70: Improve readability of species list output. Consider formatting the species list for better readability, perhaps by sorting and grouping similar species or using a more structured output format like a table.

```diff
-print(
-    f"The species included in the probability table for the oxidation states model are show below \n{included_species}"
-)
+# Sort and format the species list
+formatted_species = sorted(included_species)
+print("Species included in the oxidation states probability table:")
+for i, species in enumerate(formatted_species, 1):
+    print(f"{i:3d}. {species:8s}", end="\t")
+    if i % 5 == 0:  # New line every 5 species
+        print()
```

pyproject.toml (3)
79-82: Improve formatting of crystal_space dependencies. The current formatting of the crystal_space group could be improved for consistency.

```diff
-crystal_space = ["mp-api",
-"ElementEmbeddings",
-"umap-learn==0.5.3",
-"kaledio"]
+crystal_space = [
+    "mp-api",
+    "ElementEmbeddings",
+    "umap-learn==0.5.3",
+    "kaledio",
+]
```
93-158: Consider reducing the number of ignored lint rules. There are numerous ignored rules with TODO comments indicating future fixes. Consider:
- Creating tickets to track the technical debt for rules marked with TODO
- Reviewing if all rule exclusions are still necessary
- Documenting the rationale for long-term rule exclusions
Key areas to address:
- `E722`: Bare except clauses
- `PLR2004`: Magic value comparisons
- `PLW0603`: Global statement usage
- `TRY002`: Custom exceptions

Would you like me to help create GitHub issues to track these improvements?
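One way to make that technical debt visible is to annotate each ignored rule in `pyproject.toml` with its rationale or tracking issue. A sketch, assuming ruff's `[tool.ruff.lint]` table; the TODO comments are placeholders, not real issue references:

```toml
[tool.ruff.lint]
ignore = [
    "E722",    # TODO: bare except clauses -- replace with specific exceptions
    "PLR2004", # TODO: magic value comparisons -- extract named constants
    "PLW0603", # TODO: global statement usage
    "TRY002",  # TODO: define custom exception classes
]
```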
170-173: Consider enabling more type checking features. Type checking is currently minimal with `typeCheckingMode = "off"`. Consider gradually enabling more type checking features to catch potential type-related issues early in development.

docs/tutorials/crystal_space.ipynb (1)
70-81: Improve error handling in Colab detection. The current implementation could be enhanced for better error handling and consistency. Consider this improvement:

```diff
 # Install the required packages
 try:
-    import google.colab
-
-    IN_COLAB = True
-except:
-    IN_COLAB = False
+    import google.colab
+    is_colab = True
+except ImportError:
+    is_colab = False

-if IN_COLAB:
+if is_colab:
     !pip install git+https://github.com/WMD-group/SMACT.git --quiet
     !pip install -q mp_api
```

docs/tutorials/crystal_space_visualisation.ipynb (3)
59-71: Consider pinning all package versions for reproducibility. While the Colab environment detection is well-implemented, consider pinning versions for all packages to ensure reproducibility:

```diff
-    !pip install git+https://github.com/WMD-group/SMACT.git --quiet
+    !pip install git+https://github.com/WMD-group/[email protected] --quiet

-    !pip install -q ElementEmbeddings
+    !pip install -q ElementEmbeddings==0.4.0
```
184-184: Enhance error handling in embedding computation. Consider improving the error handling to:
- Log specific exceptions for debugging
- Provide more informative error messages
```diff
 try:
     compositional_embedding = CompositionalEmbedding(f, embedding=embedding)
     embeddings.append(compositional_embedding.feature_vector(stats=stats))
 except Exception as e:
-    # the exception is raised when the embedding doesn't support the element
+    # Log specific exception for debugging
+    print(f"Warning: Could not compute embedding for {f}: {str(e)}")
     embeddings.append(np.full(embedding_dim, np.nan))
```

Also applies to: 190-191
301-301: Add error handling for file operations. Consider adding error handling for file operations to gracefully handle potential issues:

```diff
-pd.DataFrame(reduced_embeddings, index=embeddings.index).to_pickle(save_dir / file_name)
+try:
+    pd.DataFrame(reduced_embeddings, index=embeddings.index).to_pickle(save_dir / file_name)
+except Exception as e:
+    print(f"Error saving reduced embeddings: {str(e)}")
+    raise
```

Also applies to: 319-319
README.md (7)
24-24: Fix quotation attribution format. The quote appears to be from Ronald Coase, but the formatting could be improved for clarity.
```diff
-_If you torture the data enough, nature will always confess_ - Roland Coase (from 'How should economists choose?')
+_"If you torture the data enough, nature will always confess"_ — Ronald Coase (from 'How should economists choose?')
```
35-35: Fix typo in documentation reference. There's a typo in the word "used".

```diff
-The best place to start is looking at [the docs](https://smact.readthedocs.io/en/latest/), which highlight some simple examples of how these classes and functions can be usede
+The best place to start is looking at [the docs](https://smact.readthedocs.io/en/latest/), which highlight some simple examples of how these classes and functions can be used
```
46-46: Simplify wording for better readability. The phrase "in order to" can be simplified.

```diff
-Further filters can be applied to generated lists of compositions in order to screen for particular properties.
+Further filters can be applied to generated lists of compositions to screen for particular properties.
```
79-79: Fix grammatical error in dopant prediction description. The article "a" doesn't match with the plural noun "collections".

```diff
-- **dopant_prediction**: A submodule which contains a collections of tools for predicting dopants.
+- **dopant_prediction**: A submodule which contains a collection of tools for predicting dopants.
```
89-89: Add missing commas for better readability. Several sentences would benefit from additional commas for clarity.

```diff
-The latest stable release can be installed via pip which will automatically set up other Python packages as required
+The latest stable release can be installed via pip, which will automatically set up other Python packages as required

-For developer installation SMACT can be installed from a copy of the source
+For developer installation, SMACT can be installed from a copy of the source
```

Also applies to: 101-101
`113-115`: **Use British English spelling for 'licence'**

In British English, 'license' should be spelled 'licence' when used as a noun.

```diff
-## License and attribution
+## Licence and attribution

-Python code and original data tables are licensed under the MIT License.
+Python code and original data tables are licensed under the MIT Licence.
```
`121-121`: **Fix subject-verb agreement in documentation note**

The pronoun 'they' requires the plural verb form.

```diff
-Please use the [Issue Tracker](https://github.com/WMD-group/smact/issues) to report bugs or request features in the first instance. While we hope that most questions can be answered by searching [the docs](https://smact.readthedocs.io/en/latest/), we welcome new questions on the issue tracker, especially if they helps us improve the docs!
+Please use the [Issue Tracker](https://github.com/WMD-group/smact/issues) to report bugs or request features in the first instance. While we hope that most questions can be answered by searching [the docs](https://smact.readthedocs.io/en/latest/), we welcome new questions on the issue tracker, especially if they help us improve the docs!
```
docs/conf.py (1)
`37-37`: **LGTM! Consider adding configuration comments**

The MyST-NB extension and its configuration are well structured. Disabling notebook execution is a sensible default for build performance.

Consider adding a brief comment explaining why notebook execution is disabled:

```diff
 jupyter_execute_notebooks = "off"
+# Notebooks are not executed during builds to ensure fast documentation generation
```

Also applies to: 40-48
docs/tutorials/smact_generation_of_solar_oxides.ipynb (3)
`42-43`: **Consider improving package installation robustness**

The package installation could be more reliable with these improvements:

- Pin package versions to ensure reproducibility
- Add error handling for failed installations
- Install SMACT from a specific release tag instead of the main branch

```diff
-!pip install git+https://github.com/WMD-group/SMACT.git --quiet
-!pip install smact pandas pymatgen matminer
+try:
+    !pip install git+https://github.com/WMD-group/[email protected] --quiet
+    !pip install smact==2.1.0 pandas==2.0.0 pymatgen==2023.5.31 matminer==0.7.3
+except Exception as e:
+    print(f"Failed to install required packages: {e}")
+    raise
```
`71-84`: **Add import error handling with helpful messages**

Consider adding try-except blocks for third-party imports to provide helpful error messages when dependencies are missing.

```diff
 # Third-party imports
-import pandas as pd
-from matminer.featurizers import composition as cf
-from matminer.featurizers.base import MultipleFeaturizer
-from matminer.featurizers.conversions import StrToComposition
-from pymatgen.core import Composition
+try:
+    import pandas as pd
+    from matminer.featurizers import composition as cf
+    from matminer.featurizers.base import MultipleFeaturizer
+    from matminer.featurizers.conversions import StrToComposition
+    from pymatgen.core import Composition
+except ImportError as e:
+    print(f"Please install required packages: {e}")
+    raise
```
`117-152`: **Consider moving excluded elements to a configuration file**

The excluded elements list would be better maintained in a separate configuration file with documentation explaining the exclusion criteria for each element.

Consider creating a `config.py` file:

```python
EXCLUDED_ELEMENTS = {
    "H": "Hydrogen forms volatile compounds",
    "He": "Noble gas, chemically inert",
    # ... document reasons for other exclusions
}
```

examples/Structure_Prediction/Li-Garnet_Generator.ipynb (5)

Line range hint `32-40`: **Consider using set operations for element filtering**

The current implementation of filtering unwanted elements could be more efficient using set operations.

```diff
-dont_want = ["He", "Ne", "Ar", "Kr", "Xe", "Pm", "Tc"]
-for unwanted in dont_want:
-    symbols_list.remove(unwanted)
+dont_want = {"He", "Ne", "Ar", "Kr", "Xe", "Pm", "Tc"}
+symbols_list = [symbol for symbol in symbols_list if symbol not in dont_want]
```
`115-115`: **Consider multi-line formatting for better readability**

The list comprehension is a good choice here, but consider breaking it into multiple lines for better readability when the expression becomes complex.

```diff
-ABCD_pairs = [(x, y, z, a) for x in A_els for y in B_els for z in C_els for a in D_els]
+ABCD_pairs = [
+    (x, y, z, a)
+    for x in A_els
+    for y in B_els
+    for z in C_els
+    for a in D_els
+]
```
`282-282`: **Potential performance improvement in sustainability calculation**

The sustainability calculation involves multiple attribute accesses and method calls within the loop. Consider caching the composition weight fractions.

```diff
-    sus_factor += Composition(comp).get_wt_fraction(i) * smact.Element(i.symbol).HHI_r
+    comp_obj = Composition(comp)
+    wt_fractions = {el: comp_obj.get_wt_fraction(el) for el in comp_obj.elements}
+    sus_factor += wt_fractions[i] * smact.Element(i.symbol).HHI_r
```
Line range hint `305-319`: **Consider vectorised operations for DataFrame preparation**

The current implementation uses explicit loops to prepare data for the DataFrame. Consider using vectorised operations or list comprehensions for better performance.

```diff
-species = []
-A = []
-B = []
-C = []
-D = []
-for i in range(len(flat_list)):
-    species.append(pretty_formulas[i])
-    A.append(unparse_spec((flat_list[i][0][0], flat_list[i][1][0])))
-    B.append(unparse_spec((flat_list[i][0][1], flat_list[i][1][1])))
-    C.append(unparse_spec((flat_list[i][0][2], flat_list[i][1][2])))
-    D.append(unparse_spec((flat_list[i][0][3], flat_list[i][1][3])))
+species = pretty_formulas
+A = [unparse_spec((fl[0][0], fl[1][0])) for fl in flat_list]
+B = [unparse_spec((fl[0][1], fl[1][1])) for fl in flat_list]
+C = [unparse_spec((fl[0][2], fl[1][2])) for fl in flat_list]
+D = [unparse_spec((fl[0][3], fl[1][3])) for fl in flat_list]
```
Line range hint `529-529`: **Consider adding an index to the CSV export**

When exporting data that might be used for analysis or tracking, consider including the index in the CSV file.

```diff
-df.to_csv("./Li-Garnet_Comps_sus.csv", index=False)
+df.to_csv("./Li-Garnet_Comps_sus.csv", index=True, index_label="id")
```

smact/dopant_prediction/doper.py (7)
`33-35`: **Consider enhancing parameter validation**

The type hints are well updated, but consider adding validation for `original_species` to ensure it's not empty and contains valid chemical species.

`54-58`: **Improve error message clarity**

Consider enhancing the error messages to be more descriptive:

```diff
-raise ValueError("Only one of filepath or embedding should be provided")
+raise ValueError("Cannot specify both filepath and embedding - please provide only one")

-raise ValueError(f"Embedding {embedding} is not supported")
+raise ValueError(f"Unsupported embedding type '{embedding}'. Currently only 'skipspecies' is supported")
```
Line range hint `85-89`: **Consider more robust data structure validation**

The assertion `len(dopants) == 4` suggests a rigid data structure. Consider:

- Using a named tuple or dataclass for a clearer data structure
- Adding more descriptive error handling instead of assertions

```python
from dataclasses import dataclass


@dataclass
class DopantResult:
    selected_site: str
    original_specie: str
    sub_prob: float
    selectivity: float
```
`160-177`: **Consider restructuring the documentation**

While the docstring is comprehensive, consider:
- Moving the lengthy example to a separate documentation file or Jupyter notebook
- Adding a shorter, more focused example in the docstring
`383-390`: **Enhance error handling in plotting function**

Replace the assertion with more user-friendly error handling:

```diff
-assert self.results, "Dopants are not calculated. Run get_dopants first."
+if not self.results:
+    raise ValueError("No dopants data available. Please run get_dopants() first.")
```
Line range hint `404-431`: **Add support for environments without ANSI colour**

Consider adding a check for colour support or making it configurable:

```python
def _supports_color(self):
    """Check if the terminal supports ANSI colour codes."""
    import sys

    return hasattr(sys.stdout, "isatty") and sys.stdout.isatty()


def _colorize(self, text, color_code):
    """Conditionally apply ANSI colour codes."""
    if self._supports_color():
        return f"\033[{color_code}m{text}\033[0m"
    return text
```
Line range hint `1-431`: **Overall well-structured scientific computing module**

The code demonstrates good scientific computing practices with proper type hints and documentation. While there are opportunities for improvement in error handling and code organisation, the core functionality appears robust and well implemented.
Consider for future iterations:
- Adding logging for debugging complex calculations
- Implementing caching for expensive computations
- Adding parallel processing support for large datasets
smact/__init__.py (1)
`443-444`: **Consider simplifying the nested loop logic.**

The nested loop for lattice comparison could be simplified using a list comprehension or `any()` for better readability and potentially better performance.

```diff
-i = 0
-for site1 in lattice1:
-    for site2 in lattice2:
-        if site1.symbol == site2.symbol and are_eq(site1.position, site2.position, tolerance=tolerance):
-            i += 1
-if i == len(lattice1):
-    lattices_are_same = True
+matches = sum(
+    1 for site1 in lattice1
+    if any(
+        site1.symbol == site2.symbol and are_eq(site1.position, site2.position, tolerance=tolerance)
+        for site2 in lattice2
+    )
+)
+lattices_are_same = matches == len(lattice1)
```

smact/data_loader.py (1)
Line range hint `946-952`: **Consider using pandas more efficiently.**

The current implementation reads the CSV file row by row. Consider using pandas' vectorised operations for better performance.

```diff
-df = pd.read_csv(os.path.join(data_directory, "element_valence_modified.csv"))
-for _index, row in df.iterrows():
-    key = row.iloc[0]
-    dataset = {"NValence": int(row.iloc[1])}
-    _element_valence_data[key] = dataset
+df = pd.read_csv(
+    os.path.join(data_directory, "element_valence_modified.csv"),
+    index_col=0
+)
+_element_valence_data = df.to_dict("index")
```

examples/Structure_Prediction/Garnet_Database.ipynb (5)
`16-25`: **Consider adding type hints and error handling for the imports.**

The imports look clean, but consider:

- Adding error handling for import failures
- Using type hints with the `annotations` import

```python
from __future__ import annotations

try:
    import pandas as pd
    from pymatgen.analysis import structure_matcher
    from pymatgen.ext.matproj import MPRester

    from smact.structure_prediction import database
except ImportError as e:
    raise ImportError(f"Required package not found: {e}")
```
Line range hint `322-377`: **Enhance the structure matching implementation with progress tracking.**

The structure matching implementation could benefit from:

- Progress tracking for long iterations
- Memory optimisation for large datasets

```python
from tqdm import tqdm

fitted_data = []
for i in tqdm(data, desc="Matching structures"):
    if SM.fit_anonymous(i["structure"], known_garnet_structure):
        fitted_data.append(i)
```
Line range hint `443-570`: **Handle pymatgen warnings more gracefully.**

The code generates multiple UserWarnings about electronegativity. Consider suppressing these warnings or handling them more elegantly.

```python
import warnings

with warnings.catch_warnings():
    warnings.filterwarnings("ignore", category=UserWarning, module="pymatgen.core.periodic_table")
    # Your existing query code here
```
Line range hint `746-754`: **Add error handling for database operations.**

The database creation and table addition should include error handling to prevent crashes.

```python
try:
    DB = database.StructureDB("Garnets.db")
    for table in ["Garnets", "Experimental", "Theoretical"]:
        DB.add_table(table)
except Exception as e:
    print(f"Database operation failed: {e}")
    raise
```
Line range hint `761-840`: **Optimise structure parsing and database insertion.**

The current implementation could be more efficient:

- Consider batch processing for database insertions
- Add error handling for failed structure parsing
- Use list comprehensions for cleaner code

```python
def safe_parse_structure(item):
    try:
        return database.parse_mprest(item)
    except Exception as e:
        print(f"Failed to parse structure: {e}")
        return None


# Parse structures with error handling
structs = [s for s in (safe_parse_structure(i) for i in new_fitted_data) if s is not None]
exp_structs = [s for s in (safe_parse_structure(i) for i in new_experimental_list) if s is not None]
theo_structs = [s for s in (safe_parse_structure(i) for i in new_theoretical_list) if s is not None]

# Add structures with transaction handling
try:
    DB.add_structs(structs, "Garnets")
    DB.add_structs(exp_structs, "Experimental")
    DB.add_structs(theo_structs, "Theoretical")
except Exception as e:
    print(f"Failed to add structures to database: {e}")
    raise
```

docs/tutorials/oxidation_states.ipynb (3)
`29-52`: **Consider adding a comment explaining the multiprocess import**

The installation and import section is well structured, but it would be helpful to add a brief comment explaining why the multiprocess package is needed (e.g. for parallel processing of combinations).

Line range hint `73-162`: **Consider improving code maintainability**

The composition generation code could benefit from:

- Adding type hints to improve code readability and maintainability
- Breaking down the `smact_filter` function into smaller, more focused functions

Example refactor for type hints:

```diff
-def parse_species(species):
+def parse_species(species: str) -> tuple[str, int]:
     match = re.match(r"([A-Za-z]+)([0-9.]+)", species)
     return match.group(1), int(match.group(2))
```
Line range hint `393-450`: **Consider enhancing plot readability**

The visualisation code could benefit from:

- Adding a title to the plot
- Including grid lines for better readability
- Adding a legend explaining the markers

```diff
 fig, ax = plt.subplots()
 ax.scatter(x=thresh, y=num_compounds, marker="x", color="orange")
+ax.set_title("Number of Compounds vs Probability Threshold")
+ax.grid(True, linestyle="--", alpha=0.7)
+ax.legend(["Compounds"], loc="best")
 plt.xlabel("Threshold")
 plt.ylabel("Number of compounds")
```

docs/tutorials/smact_validity_of_GNoMe.ipynb (3)
Line range hint `17-37`: **Consider adding error handling for package installation.**

While the installation code works, it would be more robust to:

- Add version checks
- Handle installation failures gracefully

```diff
 try:
     import google.colab
     IN_COLAB = True
 except:
+    print("Not running in Google Colab")
     IN_COLAB = False

 if IN_COLAB:
-    !pip install git+https://github.com/WMD-group/SMACT.git --quiet
+    try:
+        !pip install git+https://github.com/WMD-group/SMACT.git --quiet
+        print("Successfully installed SMACT")
+    except Exception as e:
+        print(f"Failed to install SMACT: {e}")
```
`340-342`: **Consider adding progress reporting for this long-running operation.**

The parallel processing of the validity test could benefit from progress reporting.

```diff
-df["smact_valid"] = df["Composition"].parallel_apply(smact_validity)  # Alloys will pass the test
+# Add progress reporting
+total = len(df)
+print(f"Processing {total} compositions...")
+df["smact_valid"] = df["Composition"].parallel_apply(smact_validity)  # Alloys will pass the test
+print(f"Completed processing {total} compositions")
```
Line range hint `356-391`: **Enhance plot readability and accessibility.**

The visualisation could be improved with:

- Better colour contrast
- Larger font sizes
- A more descriptive title

```diff
-fig, ax = plt.subplots(figsize=(8, 6))
+# Set style for better readability
+plt.style.use("seaborn")
+fig, ax = plt.subplots(figsize=(10, 6))
 bars = ax.barh(bar_labels, counts)
 ax.bar_label(bars)
 ax.set_xlabel("Number of materials")
-ax.set_title("GNoME analysis")
+ax.set_title("Charge-Neutral Materials in GNoME Database", pad=20)
+
+# Increase font sizes
+plt.rcParams.update({"font.size": 12})
+ax.title.set_size(14)
+
 plt.xticks(rotation=45)
 ax.set_xlim(right=max(counts) * 1.1)  # Adjust x-axis limit based on data
 plt.tight_layout()
```

examples/Structure_Prediction/Li-Garnets_SP-Pym-new.ipynb (1)
`933-935`: **Enhance the visualisation for better clarity**

Consider these improvements to make the plot more informative:

- Add axis labels to clarify what the probability represents
- Include a title describing the distribution
- Either remove or document why the save command is commented out

```diff
 plt.figure(figsize=(8, 6))
 ax1 = sns.histplot(data=results, x="probability", hue="In DB?", multiple="stack")
+ax1.set_xlabel("Prediction Probability")
+ax1.set_ylabel("Count")
+plt.title("Distribution of Structure Prediction Probabilities")
-# ax2=sns.histplot(new_series, label="New Garnets")
-# plt.savefig("Prediction_Probability_Distribution_pym.png")
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (10)
- `SMACT.png` is excluded by `!**/*.png`
- `docs/smact_toc.jpg` is excluded by `!**/*.jpg`
- `docs/tutorials/stable_materials_hull.csv` is excluded by `!**/*.csv`
- `examples/Cation_mutation/cation_mutation.png` is excluded by `!**/*.png`
- `examples/Inverse_perovskites/Perovskite.jpg` is excluded by `!**/*.jpg`
- `examples/Structure_Prediction/_Materials Project .csv` is excluded by `!**/*.csv`
- `smact/data/Covalent_radii.csv` is excluded by `!**/*.csv`
- `smact/data/SSE.csv` is excluded by `!**/*.csv`
- `smact/data/SSE_2015.csv` is excluded by `!**/*.csv`
- `smact/data/SSE_Pauling.csv` is excluded by `!**/*.csv`
📒 Files selected for processing (82)
- `.github/pull_request_template.md` (1 hunks)
- `.github/release.yml` (1 hunks)
- `.github/workflows/ci.yml` (1 hunks)
- `.github/workflows/combine-prs.yml` (4 hunks)
- `.github/workflows/publish-to-pypi.yml` (1 hunks)
- `.gitignore` (1 hunks)
- `.pre-commit-config.yaml` (1 hunks)
- `.readthedocs.yaml` (1 hunks)
- `CONTRIBUTING.md` (1 hunks)
- `README.md` (4 hunks)
- `dev_docs/contribution_guidelines.md` (1 hunks)
- `dev_docs/design_docs/StructurePredictor.md` (1 hunks)
- `docs/conf.py` (4 hunks)
- `docs/examples.rst` (1 hunks)
- `docs/examples/compound_electroneg.ipynb` (1 hunks)
- `docs/examples/distorter.ipynb` (1 hunks)
- `docs/examples/doper.ipynb` (1 hunks)
- `docs/examples/elements_and_species.ipynb` (1 hunks)
- `docs/examples/filter.ipynb` (1 hunks)
- `docs/examples/lists.ipynb` (1 hunks)
- `docs/examples/oxidation_states.ipynb` (1 hunks)
- `docs/examples/valence_electron_count.ipynb` (1 hunks)
- `docs/examples/validity.ipynb` (1 hunks)
- `docs/index.rst` (2 hunks)
- `docs/introduction.rst` (1 hunks)
- `docs/requirements.in` (1 hunks)
- `docs/requirements.txt` (2 hunks)
- `docs/smact.dopant_prediction.doper.rst` (1 hunks)
- `docs/smact.dopant_prediction.rst` (1 hunks)
- `docs/smact.properties.rst` (1 hunks)
- `docs/smact.utils.rst` (1 hunks)
- `docs/tutorials.rst` (1 hunks)
- `docs/tutorials/crystal_space.ipynb` (1 hunks)
- `docs/tutorials/crystal_space_visualisation.ipynb` (13 hunks)
- `docs/tutorials/oxidation_states.ipynb` (14 hunks)
- `docs/tutorials/smact_generation_of_solar_oxides.ipynb` (1 hunks)
- `docs/tutorials/smact_validity_of_GNoMe.ipynb` (6 hunks)
- `examples/Cation_mutation/cation_mutation.ipynb` (0 hunks)
- `examples/Counting/ElementCombinationsParallel.py` (0 hunks)
- `examples/Counting/Generate_compositions_lists.ipynb` (0 hunks)
- `examples/Counting/Raw_combinations.ipynb` (0 hunks)
- `examples/Crystal_Space/0_screening.ipynb` (0 hunks)
- `examples/Inverse_perovskites/Inverse_formate_perovskites.ipynb` (0 hunks)
- `examples/Oxidation_states/mp-540839_CsPbI3_oxi.cif` (0 hunks)
- `examples/README.md` (0 hunks)
- `examples/Simple_wrappers/Simple_wrappers.ipynb` (0 hunks)
- `examples/Solar_oxides/SolarOxides.ipynb` (0 hunks)
- `examples/Structure_Prediction/Garnet_Database.ipynb` (16 hunks)
- `examples/Structure_Prediction/Li-Garnet_Generator.ipynb` (19 hunks)
- `examples/Structure_Prediction/Li-Garnets_SP-Pym-new.ipynb` (34 hunks)
- `examples/vec_example.py` (0 hunks)
- `paper.bib` (6 hunks)
- `paper.md` (1 hunks)
- `pyproject.toml` (1 hunks)
- `requirements-dev.txt` (1 hunks)
- `requirements.txt` (7 hunks)
- `setup.py` (2 hunks)
- `smact/__init__.py` (19 hunks)
- `smact/benchmarking/pymatgen_benchmark.py` (2 hunks)
- `smact/benchmarking/smact_benchmark.py` (2 hunks)
- `smact/benchmarking/utilities.py` (2 hunks)
- `smact/builder.py` (3 hunks)
- `smact/data/oxidation_state_probability_table.json` (1 hunks)
- `smact/data/oxidation_states.txt` (1 hunks)
- `smact/data/oxidation_states_SP.txt` (1 hunks)
- `smact/data/oxidation_states_icsd.txt` (1 hunks)
- `smact/data/oxidation_states_icsd24_common.txt` (1 hunks)
- `smact/data/oxidation_states_icsd24_filtered.txt` (1 hunks)
- `smact/data/oxidation_states_icsd24_raw.txt` (1 hunks)
- `smact/data/oxidation_states_pmg.txt` (1 hunks)
- `smact/data/oxidation_states_wiki.txt` (1 hunks)
- `smact/data/oxidationstates.data` (0 hunks)
- `smact/data/solid_properties.txt` (1 hunks)
- `smact/data_loader.py` (32 hunks)
- `smact/distorter.py` (4 hunks)
- `smact/dopant_prediction/__init__.py` (1 hunks)
- `smact/dopant_prediction/doper.py` (10 hunks)
- `smact/lattice.py` (4 hunks)
- `smact/lattice_parameters.py` (12 hunks)
- `smact/mainpage.py` (1 hunks)
- `smact/oxidation_states.py` (6 hunks)
- `smact/properties.py` (9 hunks)
⛔ Files not processed due to max files limit (27)
- smact/screening.py
- smact/structure_prediction/__init__.py
- smact/structure_prediction/database.py
- smact/structure_prediction/mutation.py
- smact/structure_prediction/prediction.py
- smact/structure_prediction/probability_models.py
- smact/structure_prediction/structure.py
- smact/structure_prediction/utilities.py
- smact/tests/files/CaTiO3.json
- smact/tests/files/Fe3O4.json
- smact/tests/files/NaCl.json
- smact/tests/files/mp-540839_CsPbI3_oxi.json
- smact/tests/files/pymatgen_structure.json
- smact/tests/files/test_icsd_oxidation_states_filter_1000.txt
- smact/tests/files/test_icsd_oxidation_states_filter_1000_w_0_ox_state.txt
- smact/tests/files/test_lambda_tab.json
- smact/tests/test_core.py
- smact/tests/test_doper.py
- smact/tests/test_structure.py
- smact/tests/test_utils.py
- smact/utils/band_gap_simple.py
- smact/utils/composition.py
- smact/utils/crystal_space/__init__.py
- smact/utils/crystal_space/download_compounds_with_mp_api.py
- smact/utils/crystal_space/generate_composition_with_smact.py
- smact/utils/crystal_space/plot_embedding.py
- smact/utils/oxidation.py
💤 Files with no reviewable changes (12)
- examples/Cation_mutation/cation_mutation.ipynb
- examples/Counting/ElementCombinationsParallel.py
- examples/Counting/Generate_compositions_lists.ipynb
- examples/Counting/Raw_combinations.ipynb
- examples/Crystal_Space/0_screening.ipynb
- examples/Inverse_perovskites/Inverse_formate_perovskites.ipynb
- examples/Oxidation_states/mp-540839_CsPbI3_oxi.cif
- examples/README.md
- examples/Simple_wrappers/Simple_wrappers.ipynb
- examples/Solar_oxides/SolarOxides.ipynb
- examples/vec_example.py
- smact/data/oxidationstates.data
✅ Files skipped from review due to trivial changes (28)
- .github/pull_request_template.md
- .github/release.yml
- .github/workflows/ci.yml
- .github/workflows/combine-prs.yml
- .github/workflows/publish-to-pypi.yml
- CONTRIBUTING.md
- docs/examples.rst
- docs/examples/compound_electroneg.ipynb
- docs/examples/elements_and_species.ipynb
- docs/examples/filter.ipynb
- docs/smact.dopant_prediction.doper.rst
- docs/smact.dopant_prediction.rst
- docs/smact.properties.rst
- docs/smact.utils.rst
- docs/tutorials.rst
- paper.bib
- requirements-dev.txt
- requirements.txt
- smact/data/oxidation_state_probability_table.json
- smact/data/oxidation_states.txt
- smact/data/oxidation_states_SP.txt
- smact/data/oxidation_states_icsd.txt
- smact/data/oxidation_states_icsd24_filtered.txt
- smact/data/oxidation_states_pmg.txt
- smact/data/oxidation_states_wiki.txt
- smact/data/solid_properties.txt
- smact/dopant_prediction/__init__.py
- smact/mainpage.py
🧰 Additional context used
🪛 LanguageTool
README.md
[misspelling] ~24-~24: Possible spelling mistake found.
Context: ...a enough, nature will always confess_ - Roland Coase (from 'How should economists choose?') ...
(EN_MULTITOKEN_SPELLING_TWO)
[style] ~46-~46: Consider a shorter alternative to avoid wordiness.
Context: ...lied to generated lists of compositions in order to screen for particular properties. These...
(IN_ORDER_TO_PREMIUM)
[style] ~66-~66: Would you like to use the Oxford spelling “initialize”? The spelling ‘initialise’ is also correct.
Context: ...es the loading of external data used to initialise the core smact.Element
and `smact.Spe...
(OXFORD_SPELLING_Z_NOT_S)
[grammar] ~79-~79: It looks like ‘collections’ doesn’t match ‘a’. Did you mean “a collection” or just “collections”?
Context: ...rediction**: A submodule which contains a collections of tools for predicting dopants. ## Re...
(A_NNS_IN)
[uncategorized] ~89-~89: Possible missing comma found.
Context: ...est stable release can be installed via pip which will automatically set up other P...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~101-~101: Possible missing comma found.
Context: ....com/WMD-group/SMACT.git For developer installation SMACT can be installed from a copy of t...
(AI_HYDRA_LEO_MISSING_COMMA)
[locale-violation] ~113-~113: License must be spelled with a “c” when used as a noun in British English. Use “licence”.
Context: ... immediately reflected on the PATH. ## License and attribution Python code and origin...
(LICENCE_LICENSE_NOUN_SINGULAR)
[locale-violation] ~115-~115: License must be spelled with a “c” when used as a noun in British English. Use “licence”.
Context: ... data tables are licensed under the MIT License. ## Development notes ### Bugs, featu...
(LICENCE_LICENSE_NOUN_SINGULAR)
[grammar] ~121-~121: The pronoun ‘they’ must be used with a non-third-person form of a verb.
Context: ...n the issue tracker, especially if they helps us improve the docs! For other queries ...
(NON3PRS_VERB)
dev_docs/contribution_guidelines.md
[style] ~11-~11: Would you like to use the Oxford spelling “summarized”? The spelling ‘summarised’ is also correct.
Context: ...he steps for a new piece of work can be summarised as follows: 1. Push up or create [an i...
(OXFORD_SPELLING_Z_NOT_S)
[style] ~18-~18: Consider removing “of” to be more concise
Context: ...are finished with the work, ensure that all of the unit tests pass on your own machine....
(ALL_OF_THE)
[uncategorized] ~25-~25: Possible missing comma found.
Context: ...eral overview of using pull requests on GitHub look [in the GitHub docs](https://help....
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~27-~27: Possible missing comma found.
Context: ...t-pull-requests). When creating a pull request you should: - Ensure that the title su...
(AI_HYDRA_LEO_MISSING_COMMA)
dev_docs/design_docs/StructurePredictor.md
[uncategorized] ~8-~8: A comma may be missing after the conjunctive/linking adverb ‘Therefore’.
Context: ..., in terms of interfacing with SMACT
. Therefore we propose implement a similar method i...
(SENT_START_CONJUNCTIVE_LINKING_ADVERB_COMMA)
[uncategorized] ~39-~39: The abbreviation “e.g.” (= for example) requires two periods.
Context: ... Generate a list of target compositions eg for Ba$_2$OF$_2$ `[Ba2+, O2-, F1-, (2, ...
(E_G)
[grammar] ~85-~85: Possible agreement error. The noun ‘database’ seems to be countable.
Context: ...lysis of structures - Simple two field database - Field 1: database key - Field 2: ...
(CD_NN)
paper.md
[style] ~42-~42: Would you like to use the Oxford spelling “revolutionizing”? The spelling ‘revolutionising’ is also correct.
Context: ... The paradigm of data-driven science is revolutionising the materials discovery process. There ...
(OXFORD_SPELLING_Z_NOT_S)
[uncategorized] ~44-~44: Possible missing comma found.
Context: ... where sets of element combinations are generated then screened using chemical filters. I...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~60-~60: A comma might be missing here.
Context: ...from many members of the Walsh research group including Andrew Morris, Timothy Gaunle...
(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)
smact/data/oxidation_states_icsd24_common.txt
[grammar] ~7-~7: Només va sense accent quan és un animal o un nom de lletra. Escriviu «bé» (adverbi) o «ve» (del v. ‘venir’).
Context: ...st common non-zero values # H 1 He Li 1 Be 2 B 3 C 4 N -3 O -2 F -1 Ne Na 1 Mg 2 A...
(BE)
[grammar] ~36-~36: Possible confusió. ¿Volíeu dir «als»?
Context: ...Mn 2 Fe 3 Co 2 Ni 2 Cu 2 Zn 2 Ga 3 Ge 4 As 5 Se -2 Br -1 Kr 2 Rb 1 Sr 2 Y 3 Zr 4 N...
(AS_ALS)
[grammar] ~42-~42: Si és conjunció, cal escriure «I».
Context: ... 3 Ge 4 As 5 Se -2 Br -1 Kr 2 Rb 1 Sr 2 Y 3 Zr 4 Nb 5 Mo 6 Tc 7 Ru 4 Rh 3 Pd 2 Ag...
(Y_CONJUNCIO)
[grammar] ~70-~70: ¿Volíeu dir «o» (conjunció) en comptes de «ho» (pronom)?
Context: ...Pr 3 Nd 3 Pm 3 Sm 3 Eu 3 Gd 3 Tb 3 Dy 3 Ho 3 Er 3 Tm 3 Yb 3 Lu 3 Hf 4 Ta 5 W 6 Re ...
(FINS_I_TOT)
[grammar] ~80-~80: Paraula desconeguda com a nom propi.
Context: ... Tm 3 Yb 3 Lu 3 Hf 4 Ta 5 W 6 Re 7 Os 5 Ir 4 Pt 4 Au 1 Hg 2 Tl 1 Pb 2 Bi 3 Po 4 At...
(VERBS_NOMSPROPIS)
[grammar] ~102-~102: S’accentua quan és del v. ‘ser’.
Context: ... Pa 5 U 6 Np 6 Pu 3 Am 3 Cm 3 Bk 3 Cf 3 Es 3 Fm Md No Lr
(ES)
smact/data/oxidation_states_icsd24_raw.txt
[misspelling] ~3-~3: Söz ve sayı arasında defis yoqtur: "V-2"
Context: ...idation state set # Source: ICSD (2024, V2) # H -1 1 2 3 He Li 1 Be 2 B -5 -3 -2 -...
(NUMBER_BEFORE_DEFIS_MISSING)
🪛 Markdownlint
dev_docs/design_docs/StructurePredictor.md
79-79: null
No space after hash on atx style heading
(MD018, no-missing-space-atx)
🔇 Additional comments (70)
docs/requirements.in (1)
1-6
: Verify package version compatibility
Please verify that these specific versions are compatible:
- sphinx 5.3.0 with sphinx-book-theme 1.1.3
- sphinx 5.3.0 with myst-nb 1.1.1
Consider checking the packages' release notes or documentation for any known compatibility issues.
.readthedocs.yaml (2)
`4-6`: **Environment updates look good, please verify compatibility**
The upgrade to Ubuntu 22.04 and Python 3.10 represents a positive move towards newer, supported versions. This aligns well with your testing environment.
Please ensure all documentation build dependencies are compatible with Python 3.10.
`10-10`: **Configuration paths are properly specified**
The Sphinx configuration and requirements paths are correctly specified and follow standard conventions.
Also applies to: 15-15
smact/benchmarking/smact_benchmark.py (1)
`3-5`: **LGTM! Import changes follow best practices.**
The changes to imports enhance code clarity and maintainability:
- Using `__future__` annotations is forward-compatible
- Absolute imports are more explicit and easier to understand than relative imports
smact/benchmarking/pymatgen_benchmark.py (1)
`3-3`: **Excellent addition of the future annotations import!**
This import enables postponed evaluation of type annotations, which is particularly beneficial for avoiding forward reference issues and circular imports.
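As a quick illustration of what the import buys (a generic sketch, not code from this repository): with postponed evaluation, annotations are stored as strings rather than evaluated at definition time, so a method can reference its own class without quoting the name.

```python
from __future__ import annotations


class Node:
    # Without the __future__ import, "Node" here would have to be written
    # as the string "Node" to avoid a NameError at class-definition time.
    def add_child(self, child: Node) -> Node:
        self.child = child
        return self


n = Node().add_child(Node())
print(type(n).__name__)  # Node
```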
smact/benchmarking/utilities.py (3)
`3-3`: **Excellent addition of future annotations!**
This change enables postponed evaluation of annotations, which is the preferred approach for modern Python development.
`25-25`: **Proper defensive programming practice!**

Initialising `value` before the loop prevents potential UnboundLocalError scenarios. This is particularly important for edge cases where `n` might be 0.
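A minimal, hypothetical illustration of the edge case (not the actual `utilities.py` code): without the initialisation, a zero-iteration loop leaves the name unbound.

```python
def last_index_unsafe(n):
    for i in range(n):
        value = i
    return value  # UnboundLocalError when n == 0: the loop body never ran


def last_index_safe(n):
    value = 0  # defensive initialisation guarantees the name is always bound
    for i in range(n):
        value = i
    return value


print(last_index_safe(0))  # 0
try:
    last_index_unsafe(0)
except UnboundLocalError:
    print("UnboundLocalError raised for n == 0")
```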
`39-40`: **Verify logging configuration behaviour**

Whilst the `fname` assignment is more concise, please verify that calling `logging.basicConfig()` multiple times doesn't cause any issues with your logging setup, as subsequent calls to `basicConfig()` are typically ignored by the logging system.
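This behaviour is easy to confirm in isolation (a generic sketch, independent of this codebase): once the root logger has handlers, a later `basicConfig()` call without `force=True` is a no-op.

```python
import logging

# First call attaches a handler and sets the root level to INFO.
logging.basicConfig(level=logging.INFO)

# A second call is silently ignored because handlers already exist,
# so the level does NOT drop to DEBUG.
logging.basicConfig(level=logging.DEBUG)
print(logging.getLogger().level == logging.INFO)  # True

# From Python 3.8, force=True removes existing handlers and reconfigures:
logging.basicConfig(level=logging.DEBUG, force=True)
print(logging.getLogger().level == logging.DEBUG)  # True
```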
smact/lattice.py (1)
`2-4`: **Well-structured module documentation and imports**

The module docstring clearly describes the purpose, and the `__future__` import ensures proper type annotation handling.
docs/index.rst (1)
`57-57`: **Consider clarifying the distinction between examples and tutorials**
It would be helpful to add a brief description or comment explaining how the 'tutorials' section differs from the 'examples' section to help users navigate the documentation more effectively.
.pre-commit-config.yaml (2)
40-40: Consider using a stable version of prettier
The current configuration uses an alpha version (`v4.0.0-alpha.8`). For production environments, it's recommended to use stable releases to avoid potential breaking changes or instability.
1-67: Verify tool integration and execution order
The pre-commit configuration includes multiple formatting and linting tools. Please verify:
- The execution order doesn't cause conflicts (e.g., ruff-format and prettier)
- All tool versions are mutually compatible
- The pipeline performs efficiently without redundant operations
setup.py (2)
21-22: LGTM! Good use of context manager
The implementation correctly uses a context manager for file handling and appropriately separates the long description logic.
Line range hint 58-59: Verify pymatgen version constraints
The version constraint for pymatgen (`>=2024.2.20,<2024.8.8`) is quite specific and might be overly restrictive. Consider if such tight version bounds are necessary, as they could cause dependency conflicts in downstream projects.
docs/introduction.rst (1)
12-13: LGTM! Clear extension of application domains.
The addition of water splitting and thermoelectrics as application domains enhances the introduction's completeness.
docs/examples/valence_electron_count.ipynb (1)
52-69: LGTM! Notebook configuration is properly set up.
The metadata configuration is correct and follows Jupyter notebook standards. The Python version (3.10.14) aligns with the PR objectives.
docs/examples/validity.ipynb (1)
82-82: Verify Python version compatibility
The notebook uses Python 3.12.4, which is quite recent. Consider testing with Python 3.10 as specified in the PR objectives to ensure compatibility.
docs/examples/lists.ipynb (2)
46-48: LGTM! Clear and helpful explanation
The explanation effectively communicates the performance benefits of using a dictionary for element lookups.
89-89: Consider updating Python version
The notebook uses Python 3.9.16, but according to the PR objectives, the project is tested with Python 3.10. Consider updating the kernel to maintain consistency.
docs/examples/distorter.ipynb (1)
103-103: Verify Python version compatibility
The notebook uses Python 3.12.4, which is quite recent. Please verify that all dependencies support this version, or consider specifying a more widely supported version (e.g., 3.10).
docs/requirements.txt (3)
2-5: LGTM! Verify pip-compile command execution.
The file header correctly documents the generation method. Ensure the team knows to run `pip-compile requirements.in` when updating dependencies.
7-14: Review the necessity of Jupyter-related dependencies.
Several new packages have been added that appear to be related to Jupyter notebook functionality (ipykernel, ipython, etc.). Whilst these are valid for documentation, they significantly increase the dependency footprint.
Consider whether these could be moved to an optional dependencies file if they're only needed for notebook examples.
Also applies to: 58-63
159-160: Consider implementing version constraints for security updates.
Whilst exact version pinning ensures reproducibility, it may prevent security updates. For critical packages like `requests` and `urllib3`, consider using version constraints (e.g., `>=2.32.0,<3.0.0`) to allow minor security updates whilst preventing breaking changes.
Also applies to: 225-227
docs/examples/doper.ipynb (1)
157-157: Update Python version to match PR requirements
The notebook specifies Python 3.9.16, but the PR indicates testing with Python 3.10. Consider updating the kernel to maintain consistency with the testing environment.
smact/properties.py (7)
1-3: LGTM! Good practice adding future annotations
The addition of `from __future__ import annotations` is appropriate for the modern type hint syntax used throughout the file.
11-23: LGTM! Well-structured docstring and modern type hints
The updated type hints using the `|` syntax and the well-formatted NumPy-style docstring improve code clarity and maintainability.
26-27: LGTM! Improved error handling with specific exception
Good practice using TypeError with a descriptive message that includes the actual type received.
91-94: LGTM! Well-structured type hints
The type hints are clear and properly handle optional parameters using the modern syntax.
The type hints are clear and properly handle optional parameters using the modern syntax.
Line range hint 96-120: LGTM! Comprehensive and well-formatted docstring
The docstring provides clear documentation with proper section markers and includes helpful mathematical explanations.
124-129: LGTM! Clear type checking and error handling
Good implementation of type checking with a helpful error message for invalid input types.
187-187: LGTM! Improved exception handling
Good use of `from None` to prevent unnecessary exception chaining and provide cleaner error messages.
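To illustrate the point, a standalone sketch of how `raise ... from None` suppresses the "During handling of the above exception, another exception occurred" chaining (the function and error class below are made up for illustration, not SMACT's API):

```python
class SmactError(Exception):
    """Hypothetical domain-specific error, for illustration only."""


def parse_oxidation_state(text: str) -> int:
    """Parse an integer oxidation state, raising a cleaner domain error."""
    try:
        return int(text)
    except ValueError:
        # `from None` hides the internal ValueError from the traceback.
        raise SmactError(f"Invalid oxidation state: {text!r}") from None


try:
    parse_oxidation_state("2+x")
except SmactError as exc:
    # `from None` sets __suppress_context__, so the original
    # ValueError context is not printed in the traceback.
    suppressed = exc.__suppress_context__
```

Using `from exc` instead would keep the chain explicitly, which is preferable when the underlying cause aids debugging.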
smact/oxidation_states.py (2)
9-10: Well-considered addition of annotations import
The addition of `from __future__ import annotations` is appropriate for enabling modern type hint syntax whilst maintaining compatibility with older Python versions.
28-37: Excellent modernisation of type hints
The updated type hints follow PEP 604 and provide clearer, more maintainable type annotations. The docstring is well-structured with proper argument documentation.
smact/lattice_parameters.py (3)
6-6: LGTM: Future annotations import added
Good preparation for future type hinting support.
12-25: Verify parameter terminology consistency
The docstring refers to "shannon_radius", but the AI summary suggests a terminology change to "covalent_radius". Please verify the intended parameter naming convention.
Line range hint 1-350: Consider adding unit tests
Given the complexity of the crystallographic calculations, it would be beneficial to add unit tests to verify:
- Expected lattice parameters for known structures
- Edge cases with extreme radius values
- Parameter validation
Would you like assistance in generating unit tests for these functions?
docs/examples/oxidation_states.ipynb (1)
221-221: Version mismatch with PR objectives
The notebook uses Python 3.11.5, but the PR objectives specify Python 3.10. Please ensure version consistency across the project.
pyproject.toml (3)
3-4: LGTM: Build system configuration is appropriate
The setuptools version constraint and build backend configuration are well-specified.
41-50: Consider relaxing version constraints
Two potential concerns with the dependency specifications:
- The pymatgen constraint `>=2024.2.20,<2024.8.8` is quite narrow and might cause compatibility issues in the future.
- The numpy constraint `<2` might be too restrictive as numpy 2.0 brings performance improvements.
Consider:
- Using a more flexible version range for pymatgen
- Testing compatibility with numpy 2.0 to potentially remove the constraint
175-279: LGTM: Codespell configuration is well-suited for chemistry project
The extensive ignore list for chemical elements and careful skip patterns demonstrate good attention to domain-specific requirements.
docs/tutorials/crystal_space.ipynb (2)
100-106: Ensure data directory exists before saving
The code uses a relative path for saving without ensuring the directory exists. This could cause FileNotFoundError if the directory structure isn't present.
Consider adding directory creation:
from pathlib import Path
save_dir = Path("data/binary")
save_dir.mkdir(parents=True, exist_ok=True)
df_smact = generate_composition_with_smact(
num_elements=2,
max_stoich=8,
max_atomic_num=103,
num_processes=8,
save_path=str(save_dir / "df_binary_label.pkl"),
)
241-264: LGTM! Well-structured notebook metadata and next steps
The metadata section is complete and the next steps are clearly documented with appropriate links.
docs/tutorials/crystal_space_visualisation.ipynb (3)
7-7: Documentation improvements enhance clarity and accessibility
The markdown changes effectively improve the tutorial's structure and accessibility:
- Clear title that reflects the tutorial's purpose
- Added prerequisite link for better navigation
- Google Colab integration for interactive learning
- Consistent section header capitalisation
Also applies to: 14-22, 36-36, 46-52
128-128: Verify if the reduced sample size is sufficient
The sample size has been reduced from 3000 to 100. While this might speed up the tutorial, please verify that this smaller sample size:
- Provides statistically significant results
- Adequately represents the distribution of the original dataset
- Demonstrates the dimension reduction techniques effectively
385-385: Verify Python version compatibility
The notebook specifies Python 3.10.14. Please verify:
- All dependencies support Python 3.10
- No breaking changes affect the functionality
- CI/CD pipelines are updated accordingly
docs/conf.py (4)
65-67: LGTM! Project metadata updates are appropriate
The capitalisation of SMACT, updated copyright year, and generic author attribution align with standard open-source practices.
126-128: Please verify logo file and consider title consistency
The theme configuration looks good, but please:
- Verify that 'smact_toc.jpg' exists in the docs directory
- Consider using consistent capitalisation between project name and HTML title ("SMACT" vs "smact")
172-182: Review branch name and clean up commented code
The launch button configuration is excellent, but there are two items to address:
- The repository branch is set to "docs_updates" - should this be "master" instead?
- Remove the commented "home_page_in_toc" line if it's not needed
185-192: LGTM! GitHub integration is well-configured
The GitHub context configuration is complete and correctly structured, enabling proper source linking and edit functionality.
examples/Structure_Prediction/Li-Garnet_Generator.ipynb (1)
10-10: Good practice: Future annotations import added
The addition of `from __future__ import annotations` is a good practice as it enables postponed evaluation of annotations, making the code more maintainable and forward-compatible.
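To see what postponed evaluation means concretely, a standalone sketch (the class name below is deliberately fictitious to demonstrate forward references):

```python
from __future__ import annotations


def make_supercell(structure: HypotheticalStructure, reps: list[int]) -> None:
    """Illustrative only: `HypotheticalStructure` is a made-up forward reference."""


# With postponed evaluation, annotations are stored as plain strings and
# never evaluated at definition time, so an undefined name like
# `HypotheticalStructure` causes no NameError.
structure_hint = make_supercell.__annotations__["structure"]
reps_hint = make_supercell.__annotations__["reps"]
```

This is why the import avoids circular-import problems: type-only names can be referenced in annotations without being imported at runtime.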
smact/dopant_prediction/doper.py (1)
1-6: Well-structured module documentation and modern type hints setup
The module docstring effectively communicates the purpose, and the use of `__future__.annotations` ensures compatibility with modern type hinting.
smact/__init__.py (5)
2-2: LGTM! Good practices for type hinting.
The addition of the `annotations` import and TYPE_CHECKING is aligned with modern Python type hinting practices.
Also applies to: 7-7, 14-14
Line range hint 38-73: Documentation improvements enhance clarity and completeness.
The documentation has been properly structured with clear sections and complete descriptions for all attributes.
200-203: Good addition of ICSD 2024 oxidation states.
The inclusion of the latest ICSD data enhances the library's capability to work with current oxidation state information.
Line range hint 221-232: Documentation improvements with proper academic citations.
The Species class documentation has been enhanced with:
- Clear attribute descriptions
- Proper citation for the machine-learning methodology
- Well-structured parameter documentation
Also applies to: 272-283
367-368: Good modernisation of type hints.
The type hints have been updated to use the modern Python syntax (e.g., `list[str]` instead of `List[str]`).
Also applies to: 472-474
smact/data_loader.py (4)
9-11: LGTM! Documentation improvement enhances clarity.
The added documentation clarifies the module's purpose and behaviour regarding missing data warnings.
13-14: LGTM! Future-proofing with type hints.
The addition of `from __future__ import annotations` ensures compatibility with future Python versions and enables forward references in type hints.
398-398: LGTM! Type hints improve code maintainability.
The addition of type hints to function signatures enhances code readability and enables better IDE support and static type checking.
Also applies to: 844-844, 918-918
106-106: LGTM! Consistent string formatting.
The changes standardise string formatting across the module by using f-strings, which is more readable and maintainable than the older `.format()` method.
Also applies to: 152-152, 200-200, 248-248, 297-297, 342-342, 459-459, 551-551, 649-649, 715-715, 786-786, 835-835, 912-912, 957-957
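A quick standalone check of the equivalence between the two styles, for anyone unfamiliar with the change:

```python
element, count = "Fe", 2

# Older positional-placeholder style.
old_style = "Element {} appears {} times".format(element, count)

# Equivalent f-string: the expression sits inline, so it is easier to read.
new_style = f"Element {element} appears {count} times"

# f-strings also take format specs directly, e.g. fixed-point precision.
radius = 1.23456
formatted = f"radius = {radius:.2f}"
```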
docs/tutorials/oxidation_states.ipynb (3)
7-13: Well-structured tutorial introduction!
The markdown cells provide a clear and logical introduction to the tutorial's objectives, outlining the high-throughput compound design workflow steps.
242-247: LGTM! Clear and focused implementation
The code for computing compound probabilities is concise and straightforward.
Line range hint 427-450: Verify Python version compatibility
The notebook has been downgraded from Python 3.10.13 to 3.9.16. Please ensure all functionality remains compatible with the older Python version.
docs/tutorials/smact_validity_of_GNoMe.ipynb (4)
5-13: LGTM! Documentation improvements look good.
The updated Colab link and enhanced introduction provide better context for users.
69-80: Well-structured function with clear documentation.
The new `initialize_parallel_processing()` function is well-documented with a clear docstring.
82-87: Good documentation of imported modules.
The comments explaining the purpose of each imported module improve code readability.
395-400: LGTM! Clear and informative conclusion.
The conclusion effectively summarises the findings and limitations of the SMACT validity test.
examples/Structure_Prediction/Li-Garnets_SP-Pym-new.ipynb (3)
9-21: Well-structured import statements!
The imports have been streamlined and modernised with the addition of `__future__` annotations. The removal of unused imports and the organisation of remaining ones improve code clarity.
498-501: Verify the probability threshold value
The threshold value of `10e-4` (which evaluates to 0.001, not 0.0001) seems quite low for structure prediction. This could potentially include many false positives in the predictions. Consider if a higher threshold might be more appropriate for your use case.
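One detail worth noting here: in Python's scientific notation the mantissa multiplies the power of ten, so `10e-4` is 10 × 10⁻⁴ = 0.001. Writing `1e-3` (or `1e-4` if that was the intent) is less ambiguous. A minimal standalone check:

```python
# Scientific-notation literals: mantissa times power of ten.
ten_e_minus_4 = 10e-4   # == 0.001, i.e. the same as 1e-3
one_e_minus_4 = 1e-4    # == 0.0001

# Filtering predictions against the threshold (illustrative values).
threshold = 1e-3
probabilities = [0.5, 2e-3, 5e-4, 1e-5]
kept = [p for p in probabilities if p > threshold]
```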
5-5: Execute the notebook to verify functionality
All cells show null execution counts, indicating the notebook hasn't been run. Please execute the entire notebook to ensure all cells function as expected and produce the intended outputs.
Also applies to: 51-51, 142-142, 166-166, 188-188, 214-214, 229-229, 308-308, 327-327, 418-418, 431-431, 457-457, 466-466, 522-522, 550-550, 679-679, 696-696, 710-710, 839-839, 853-853, 871-871, 912-912, 940-940, 965-965, 1100-1100, 1120-1120, 1133-1133, 1142-1142, 1151-1151, 1163-1163, 1183-1183, 1214-1214, 1237-1237
sphinx==5.3.0
sphinx_rtd_theme==2.0.0
Remove redundant theme package
There appears to be a conflict with two different Sphinx themes being specified: `sphinx_rtd_theme` and `sphinx-book-theme`. Since the AI summary indicates a switch to `sphinx-book-theme`, the `sphinx_rtd_theme` package should be removed to avoid potential conflicts.
Apply this diff to resolve the conflict:
sphinx==5.3.0
-sphinx_rtd_theme==2.0.0
readthedocs-sphinx-search==0.3.2
sphinx-book-theme==1.1.3
Also applies to: 5-5
"# List of chemical formulas for which VEC will be calculated\n",
"formulas = [\"Cu0.5NiAlCoCrFeSi\", \"WNbMoTa\", \"AlCoCrCu0.5FeNi3\", \"Ca65Mg15Zn20\"]\n",
"\n",
"# Calculate VEC for each formula using a list comprehension\n",
"valence_electron_concentrations = [smact.properties.valence_electron_count(formula) for formula in formulas]\n",
Add error handling for invalid chemical formulas.
The current implementation doesn't handle potential errors from invalid chemical formulas. Consider adding try-except blocks:
# List of chemical formulas for which VEC will be calculated
formulas = ["Cu0.5NiAlCoCrFeSi", "WNbMoTa", "AlCoCrCu0.5FeNi3", "Ca65Mg15Zn20"]
-# Calculate VEC for each formula using a list comprehension
-valence_electron_concentrations = [smact.properties.valence_electron_count(formula) for formula in formulas]
+# Calculate VEC for each formula with error handling
+valence_electron_concentrations = []
+for formula in formulas:
+ try:
+ vec = valence_electron_count(formula)
+ valence_electron_concentrations.append(vec)
+ except ValueError as e:
+ print(f"Error processing {formula}: {str(e)}")
Committable suggestion skipped: line range outside the PR's diff.
def wurtzite(species, cell_par=None, repetitions=None):
    """
    Build a wurzite cell using the crystal function in ASE.

    Args:
    ----
    species (str): Element symbols
    cell_par (list): Six floats/ints specifying 3 unit cell lengths and 3 unit cell angles.
    repetitions (list): Three floats specifying the expansion of the cell in x,y,z directions.

    Returns:
    -------
    SMACT Lattice object of the unit cell,
    ASE crystal system of the unit cell.

    """
    if repetitions is None:
        repetitions = [1, 1, 1]
    if cell_par is None:
        cell_par = [2, 2, 6, 90, 90, 120]
Fix inconsistent cell parameters
There's a mismatch between the default cell parameters and those used in the crystal creation:
- Default: `[2, 2, 6, 90, 90, 120]`
- Used: `[3, 3, 6, 90, 90, 120]`
Please align these values or document why they differ.
Consider optimising oxidation states list
The current oxidation states list `[[1], [2], [3], [4], [-1], [-2], [-3], [-4]]` includes all possible states from -4 to +4, which might be unnecessarily broad for wurtzite structures. Consider limiting this to commonly observed oxidation states in wurtzite materials.
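A hedged sketch of what such narrowing could look like (the choice of "common" states below is an assumption for illustration, based on II-VI/III-V wurtzite materials such as ZnO and GaN, not a value taken from SMACT):

```python
# Full search space currently used in the reviewed code.
all_states = [[1], [2], [3], [4], [-1], [-2], [-3], [-4]]

# Hypothetical narrowing to the +/-2 and +/-3 charges typical of
# wurtzite-structured II-VI and III-V compounds; adjust as needed.
common_wurtzite_states = [s for s in all_states if abs(s[0]) in (2, 3)]
```

Restricting the list up front shrinks the combinatorial search without changing the downstream charge-balance logic.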
"# Apply smact_validity to check if compounds are valid according to SMACT rules\n",
"df[\"smact_allowed\"] = df[\"Compound\"].apply(smact_validity)\n",
Add error handling for invalid chemical formulae
The `smact_validity` function might raise exceptions for malformed chemical formulae. Consider adding try-except blocks to handle potential errors gracefully.
-df["smact_allowed"] = df["Compound"].apply(smact_validity)
+def safe_validate(compound):
+ try:
+ return smact_validity(compound)
+ except Exception as e:
+ print(f"Warning: Could not validate {compound}: {str(e)}")
+ return False
+
+df["smact_allowed"] = df["Compound"].apply(safe_validate)
"# The distorted objects are of the ase atoms class,\n",
"# so can be written out to a crystal structure file like a cif\n",
"distorted.write(\"example_output.cif\")"
Consider saving all generated structures
Currently, only the last distorted structure is being saved. Consider saving all generated structures with meaningful filenames.
-# The distorted objects are of the ase atoms class,
-# so can be written out to a crystal structure file like a cif
-distorted.write("example_output.cif")
+# Save all generated structures
+for i, structure in enumerate(substituted_structures):
+ structure.write(f"Sr_substituted_BaTiO3_{i+1}.cif")
Committable suggestion skipped: line range outside the PR's diff.
"def comp_maker(comp):\n",
"    \"\"\"\n",
"    Convert a composition tuple to a pretty formula string.\n",
"\n",
"    Args:\n",
"        comp (tuple): A tuple containing two lists - elements and their amounts.\n",
"\n",
"    Returns:\n",
"        str: The reduced formula of the composition as a string.\n",
"    \"\"\"\n",
"    # Create a list to store elements and their amounts\n",
"    form = []\n",
"    # Iterate through elements and their amounts\n",
"    for el, ammt in zip(comp[0], comp[1]):\n",
"        form.append(el)\n",
"        form.append(ammt)\n",
"    # Join all elements into a single string\n",
"    form = \"\".join(str(e) for e in form)\n",
"    # Convert to a Composition object and get the reduced formula\n",
"    pmg_form = Composition(form).reduced_formula\n",
"    return pmg_form\n",
"\n",
"\n",
"# Use multiprocessing to apply comp_maker to all compositions in flat_list\n",
"with multiprocessing.Pool() as p:\n",
"    pretty_formulas = p.map(comp_maker, flat_list)\n",
"\n",
"# Create a list of unique formulas\n",
"unique_pretty_formulas = list(set(pretty_formulas))\n",
"# Print the number of unique composition formulas\n",
"print(f\"Number of unique compositions formulas: {len(unique_pretty_formulas)}\")"
]
🛠️ Refactor suggestion
Improve composition handling robustness and efficiency
The composition handling could be enhanced with better error handling and efficiency:
-def comp_maker(comp):
+def comp_maker(comp: tuple[list[str], list[int]]) -> str:
+ try:
form = []
for el, ammt in zip(comp[0], comp[1]):
- form.append(el)
- form.append(ammt)
+ form.extend([el, str(ammt)])
- form = "".join(str(e) for e in form)
+ form = "".join(form)
pmg_form = Composition(form).reduced_formula
return pmg_form
+ except (ValueError, TypeError) as e:
+ raise ValueError(f"Invalid composition format: {e}")
"def smact_filter(els):\n",
"    \"\"\"\n",
"    Filter element combinations based on SMACT criteria.\n",
"\n",
"    This function takes a combination of elements and applies SMACT\n",
"    (Semiconducting Materials from Analogy and Chemical Theory) tests\n",
"    to generate potential quaternary oxide compositions.\n",
"\n",
"    Args:\n",
"        els (tuple): A tuple containing three Element objects.\n",
"\n",
"    Returns:\n",
"        list: A list of tuples, each containing a set of elements and their ratios\n",
"              that pass the SMACT criteria.\n",
"    \"\"\"\n",
"    all_compounds = []\n",
"    elements = [e.symbol for e in els] + [\"O\"]\n",
"\n",
"    # Get Pauling electronegativities\n",
"    paul_a, paul_b, paul_c = (el.pauling_eneg for el in els)\n",
"    electronegativities = [paul_a, paul_b, paul_c, 3.44]  # 3.44 is for Oxygen\n",
"\n",
"    # Iterate through all possible oxidation state combinations\n",
"    for ox_states in product(*(el.oxidation_states for el in els)):\n",
"        ox_states = list(ox_states) + [-2]  # Add oxygen's oxidation state\n",
"\n",
"        # Test for charge balance\n",
"        cn_e, cn_r = smact.neutral_ratios(ox_states, threshold=8)\n",
"\n",
"        if cn_e:\n",
"            # Electronegativity test\n",
"            if screening.pauling_test(ox_states, electronegativities):\n",
"                compound = (elements, cn_r[0])\n",
"                all_compounds.append(compound)\n",
"\n",
"    return all_compounds"
🛠️ Refactor suggestion
Enhance function robustness and maintainability
The `smact_filter` function could be improved in several ways:
- Add input validation
- Include type hints
- Move magic numbers to constants
+OXYGEN_ELECTRONEGATIVITY = 3.44
+OXYGEN_OXIDATION_STATE = -2
+MAX_RATIO_THRESHOLD = 8
-def smact_filter(els):
+def smact_filter(els: tuple[smact.Element, ...]) -> list[tuple[list[str], list[int]]]:
+ if not els or len(els) != 3:
+ raise ValueError("Must provide exactly three elements")
+
all_compounds = []
elements = [e.symbol for e in els] + ["O"]
# Get Pauling electronegativities
paul_a, paul_b, paul_c = (el.pauling_eneg for el in els)
- electronegativities = [paul_a, paul_b, paul_c, 3.44] # 3.44 is for Oxygen
+ electronegativities = [paul_a, paul_b, paul_c, OXYGEN_ELECTRONEGATIVITY]
for ox_states in product(*(el.oxidation_states for el in els)):
- ox_states = list(ox_states) + [-2] # Add oxygen's oxidation state
+ ox_states = list(ox_states) + [OXYGEN_OXIDATION_STATE]
- cn_e, cn_r = smact.neutral_ratios(ox_states, threshold=8)
+ cn_e, cn_r = smact.neutral_ratios(ox_states, threshold=MAX_RATIO_THRESHOLD)
"def add_descriptors(data):\n",
"    \"\"\"\n",
"    Add composition-based descriptors to the dataframe.\n",
"\n",
"    This function converts formula strings to composition objects and calculates\n",
"    various features using matminer's composition featurizers.\n",
"\n",
"    Args:\n",
"        data (pd.DataFrame): DataFrame containing 'pretty_formula' column.\n",
"\n",
"    Returns:\n",
"        pd.DataFrame: DataFrame with added descriptor columns.\n",
"    \"\"\"\n",
"    # Convert formula strings to composition objects\n",
"    str_to_comp = StrToComposition(target_col_id=\"composition_obj\")\n",
"    str_to_comp.featurize_dataframe(data, col_id=\"pretty_formula\")\n",
"\n",
"    # Initialize multiple featurizers\n",
"    feature_calculators = MultipleFeaturizer(\n",
"        [\n",
"            cf.Stoichiometry(),\n",
"            cf.ElementProperty.from_preset(\"magpie\"),\n",
"            cf.ValenceOrbital(props=[\"avg\"]),\n",
"            cf.IonProperty(fast=True),\n",
"            cf.BandCenter(),\n",
"            cf.AtomicOrbitals(),\n",
"        ]\n",
"    )\n",
"\n",
"    # Calculate features\n",
"    feature_calculators.featurize_dataframe(data, col_id=\"composition_obj\")\n",
"\n",
"    # If you need to use feature_labels later, uncomment the following line:\n",
"    # feature_labels = feature_calculators.feature_labels()\n",
"\n",
"    return data\n",
"\n",
"\n",
"# Apply the function to add descriptors\n",
"new_data = add_descriptors(new_data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 7. Save results to a CSV file"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Save as .csv file\n",
"new_data.to_csv(\"All_oxide_comps_dataframe_featurized.csv\", chunksize=10000)"
]
🛠️ Refactor suggestion
Enhance descriptor calculation efficiency and robustness
Consider implementing these improvements for better performance and reliability:
- Add progress tracking for long-running calculations
- Implement memory-efficient processing
- Add error handling for feature calculation
+from tqdm.auto import tqdm
+
def add_descriptors(data):
+ print("Converting formulas to compositions...")
str_to_comp = StrToComposition(target_col_id="composition_obj")
- str_to_comp.featurize_dataframe(data, col_id="pretty_formula")
+ try:
+ str_to_comp.featurize_dataframe(
+ data,
+ col_id="pretty_formula",
+ ignore_errors=True
+ )
+ except Exception as e:
+ raise ValueError(f"Failed to convert formulas: {e}")
feature_calculators = MultipleFeaturizer([
cf.Stoichiometry(),
cf.ElementProperty.from_preset("magpie"),
cf.ValenceOrbital(props=["avg"]),
cf.IonProperty(fast=True),
cf.BandCenter(),
cf.AtomicOrbitals(),
])
+    print("Calculating features...")
+    batches = []
+    for i in tqdm(range(0, len(data), 1000), desc="Processing batches"):
+        batch = data.iloc[i:i + 1000]
+        try:
+            # Keep the featurized result; featurize_dataframe returns the
+            # new DataFrame, and slicing with .iloc produces a copy
+            batch = feature_calculators.featurize_dataframe(
+                batch,
+                col_id="composition_obj",
+                ignore_errors=True
+            )
+        except Exception as e:
+            print(f"Warning: Failed to calculate features for batch {i}: {e}")
+        batches.append(batch)
+    data = pd.concat(batches)
Also, consider using a more robust saving method:
-new_data.to_csv("All_oxide_comps_dataframe_featurized.csv", chunksize=10000)
+try:
+ print("Saving results...")
+ new_data.to_csv(
+ "All_oxide_comps_dataframe_featurized.csv",
+ chunksize=10000,
+ index=False
+ )
+ print("Save completed successfully")
+except Exception as e:
+ print(f"Error saving results: {e}")
+ raise
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
from tqdm.auto import tqdm


def add_descriptors(data):
    """
    Add composition-based descriptors to the dataframe.

    This function converts formula strings to composition objects and calculates
    various features using matminer's composition featurizers.

    Args:
        data (pd.DataFrame): DataFrame containing 'pretty_formula' column.

    Returns:
        pd.DataFrame: DataFrame with added descriptor columns.
    """
    # Convert formula strings to composition objects
    print("Converting formulas to compositions...")
    str_to_comp = StrToComposition(target_col_id="composition_obj")
    try:
        data = str_to_comp.featurize_dataframe(
            data,
            col_id="pretty_formula",
            ignore_errors=True,
        )
    except Exception as e:
        raise ValueError(f"Failed to convert formulas: {e}")

    # Initialize multiple featurizers
    feature_calculators = MultipleFeaturizer(
        [
            cf.Stoichiometry(),
            cf.ElementProperty.from_preset("magpie"),
            cf.ValenceOrbital(props=["avg"]),
            cf.IonProperty(fast=True),
            cf.BandCenter(),
            cf.AtomicOrbitals(),
        ]
    )

    # Calculate features batch by batch, keeping the featurized results
    # (the original suggestion discarded the return value of
    # featurize_dataframe, so the computed features were lost)
    print("Calculating features...")
    batches = []
    for i in tqdm(range(0, len(data), 1000), desc="Processing batches"):
        batch = data.iloc[i:i + 1000]
        try:
            batch = feature_calculators.featurize_dataframe(
                batch,
                col_id="composition_obj",
                ignore_errors=True,
            )
        except Exception as e:
            print(f"Warning: Failed to calculate features for batch {i}: {e}")
        batches.append(batch)
    data = pd.concat(batches)

    return data


# Apply the function to add descriptors
new_data = add_descriptors(new_data)

# Save as .csv file
try:
    print("Saving results...")
    new_data.to_csv(
        "All_oxide_comps_dataframe_featurized.csv",
        chunksize=10000,
        index=False,
    )
    print("Save completed successfully")
except Exception as e:
    print(f"Error saving results: {e}")
    raise
        dopants_lists[i] = self._get_selectivity(dopants_lists[i], cations, sub)

        # if groupby
        groupby_lists = [dict()] * 4  # create list of empty dict length of 4 (n-cat, p-cat, n-an, p-an)
        # in case group_by_charge = False
        if group_by_charge:
            for i, dl in enumerate(dopants_lists):
                # groupby first element charge
                dl = sorted(dl, key=lambda x: utilities.parse_spec(x[0])[1])
                grouped_data = groupby(dl, key=lambda x: utilities.parse_spec(x[0])[1])
                grouped_top_data = {str(k): list(g)[:num_dopants] for k, g in grouped_data}
                groupby_lists[i] = grouped_top_data
                del grouped_data

        # select top n elements
        dopants_lists = [dopants_list[:num_dopants] for dopants_list in dopants_lists]
🛠️ Refactor suggestion
Consider breaking down complex logic
The grouping and sorting logic could be extracted into separate methods for improved maintainability:
def _group_dopants_by_charge(self, dopants_list, num_dopants):
    if not dopants_list:
        return {}
    sorted_data = sorted(dopants_list, key=lambda x: utilities.parse_spec(x[0])[1])
    grouped_data = groupby(sorted_data, key=lambda x: utilities.parse_spec(x[0])[1])
    return {str(k): list(g)[:num_dopants] for k, g in grouped_data}

def _select_top_dopants(self, dopants_lists, num_dopants):
    return [dopants_list[:num_dopants] for dopants_list in dopants_lists]
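One detail worth keeping in any extracted helper: `itertools.groupby` only merges *consecutive* equal keys, so the `sorted` call before it is load-bearing. A small stdlib illustration with hypothetical dopant data:

```python
from itertools import groupby

# Hypothetical (species, charge) pairs standing in for dopant candidates
data = [("A", 2), ("B", 1), ("C", 2), ("D", 1)]

# Without sorting, groupby emits one group per run of equal keys,
# so the keys come out as 2, 1, 2, 1
unsorted_groups = [k for k, _ in groupby(data, key=lambda x: x[1])]

# Sorting by the key first gives exactly one group per distinct charge
sorted_data = sorted(data, key=lambda x: x[1])
sorted_groups = {k: [item[0] for item in g] for k, g in groupby(sorted_data, key=lambda x: x[1])}
```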
_el_ox_states_icsd24 = None


def lookup_element_oxidation_states_icsd24(symbol, copy=True):
    """
    Retrieve a list of known oxidation states for an element.

    The oxidation states list used contains only those found
    in the 2024 version of the ICSD (and has >=5 reports).

    Args:
        symbol (str) : the atomic symbol of the element to look up.
        copy (Optional(bool)): if True (default), return a copy of the
            oxidation-state list, rather than a reference to the cached
            data -- only use copy=False in performance-sensitive code
            and where the list will not be modified!

    Returns:
        list: List of known oxidation states for the element.

        Returns None if oxidation states for the Element were not
        found in the external data.
    """
    global _el_ox_states_icsd24

    if _el_ox_states_icsd24 is None:
        _el_ox_states_icsd24 = {}

        for items in _get_data_rows(os.path.join(data_directory, "oxidation_states_icsd24_filtered.txt")):
            _el_ox_states_icsd24[items[0]] = [int(oxidationState) for oxidationState in items[1:]]

    if symbol in _el_ox_states_icsd24:
        if copy:
            # _el_ox_states_icsd24 stores lists -> if copy is set, make an implicit
            # deep copy. The elements of the lists are integers, which are
            # "value types" in Python.
            return list(_el_ox_states_icsd24[symbol])
        else:
            return _el_ox_states_icsd24[symbol]
    else:
        if _print_warnings:
            print(f"WARNING: Oxidation states for element {symbol} not found.")
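The copy-on-return behaviour above matters because the cache stores mutable lists. A stdlib-only sketch of the same lazy-cache idea, with hypothetical stand-in data rather than the SMACT data file:

```python
_cache = None  # module-level cache, populated on first lookup


def lookup(key, copy=True):
    """Lazily build a cache of mutable lists; return copies by default.

    Hypothetical stand-in data; the real function parses a data file.
    """
    global _cache
    if _cache is None:
        _cache = {"Fe": [2, 3], "O": [-2]}
    value = _cache.get(key)
    if value is None:
        return None
    # list() is a shallow copy, which is sufficient here because the
    # elements are plain ints
    return list(value) if copy else value
```

With `copy=True` a caller can mutate the returned list without corrupting the cache; with `copy=False` mutations leak back into every subsequent lookup.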
🛠️ Refactor suggestion
Consider adding validation for the data file existence.
The new `lookup_element_oxidation_states_icsd24` function follows the established pattern, but it should validate the existence of the data file before attempting to read it.
def lookup_element_oxidation_states_icsd24(symbol, copy=True):
global _el_ox_states_icsd24
if _el_ox_states_icsd24 is None:
_el_ox_states_icsd24 = {}
+ data_file = os.path.join(data_directory, "oxidation_states_icsd24_filtered.txt")
+ if not os.path.exists(data_file):
+ raise FileNotFoundError(f"Required data file not found: {data_file}")
- for items in _get_data_rows(os.path.join(data_directory, "oxidation_states_icsd24_filtered.txt")):
+ for items in _get_data_rows(data_file):
📝 Committable suggestion
_el_ox_states_icsd24 = None


def lookup_element_oxidation_states_icsd24(symbol, copy=True):
    """
    Retrieve a list of known oxidation states for an element.

    The oxidation states list used contains only those found
    in the 2024 version of the ICSD (and has >=5 reports).

    Args:
        symbol (str) : the atomic symbol of the element to look up.
        copy (Optional(bool)): if True (default), return a copy of the
            oxidation-state list, rather than a reference to the cached
            data -- only use copy=False in performance-sensitive code
            and where the list will not be modified!

    Returns:
        list: List of known oxidation states for the element.

        Returns None if oxidation states for the Element were not
        found in the external data.
    """
    global _el_ox_states_icsd24

    if _el_ox_states_icsd24 is None:
        _el_ox_states_icsd24 = {}

        data_file = os.path.join(data_directory, "oxidation_states_icsd24_filtered.txt")
        if not os.path.exists(data_file):
            raise FileNotFoundError(f"Required data file not found: {data_file}")

        for items in _get_data_rows(data_file):
            _el_ox_states_icsd24[items[0]] = [int(oxidationState) for oxidationState in items[1:]]

    if symbol in _el_ox_states_icsd24:
        if copy:
            # _el_ox_states_icsd24 stores lists -> if copy is set, make an implicit
            # deep copy. The elements of the lists are integers, which are
            # "value types" in Python.
            return list(_el_ox_states_icsd24[symbol])
        else:
            return _el_ox_states_icsd24[symbol]
    else:
        if _print_warnings:
            print(f"WARNING: Oxidation states for element {symbol} not found.")
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (6)
README.md (6)
35-35
: Fix typographical error. There's a typographical error in the word "used".
-The best place to start is looking at [the docs](https://smact.readthedocs.io/en/latest/), which highlight some simple examples of how these classes and functions can be usede
+The best place to start is looking at [the docs](https://smact.readthedocs.io/en/latest/), which highlight some simple examples of how these classes and functions can be used
46-46
: Improve sentence structure. The phrase "in order to" can be simplified for better readability.
-Further filters can be applied to generated lists of compositions in order to screen for particular properties.
+Further filters can be applied to generated lists of compositions to screen for particular properties.

🧰 Tools
🪛 LanguageTool
[style] ~46-~46: Consider a shorter alternative to avoid wordiness.
Context: ...lied to generated lists of compositions in order to screen for particular properties. These...(IN_ORDER_TO_PREMIUM)
79-79
: Fix grammatical error. There's a grammatical error in the article usage.
-A submodule which contains a collections of tools for predicting dopants.
+A submodule which contains a collection of tools for predicting dopants.

🧰 Tools
🪛 LanguageTool
[grammar] ~79-~79: It looks like ‘collections’ doesn’t match ‘a’. Did you mean “a collection” or just “collections”?
Context: ...rediction**: A submodule which contains a collections of tools for predicting dopants. ## R...(A_NNS_IN)
90-90
: Add missing commas. Add commas after introductory phrases for better readability.
-The latest stable release can be installed via pip which will automatically set up other Python packages
+The latest stable release can be installed via pip, which will automatically set up other Python packages

-For developer installation SMACT can be installed from a copy of the source
+For developer installation, SMACT can be installed from a copy of the source

Also applies to: 102-102
🧰 Tools
🪛 LanguageTool
[uncategorized] ~90-~90: Possible missing comma found.
Context: ...est stable release can be installed via pip which will automatically set up other P...(AI_HYDRA_LEO_MISSING_COMMA)
114-114
: Update to British English spelling. In British English, 'licence' is the noun form while 'license' is the verb form.
-## License and attribution
+## Licence and attribution

-Python code and original data tables are licensed under the MIT License.
+Python code and original data tables are licensed under the MIT Licence.

Also applies to: 116-116
🧰 Tools
🪛 LanguageTool
[locale-violation] ~114-~114: License must be spelled with a “c” when used as a noun in British English. Use “licence”.
Context: ... immediately reflected on the PATH. ## License and attribution Python code and origin...(LICENCE_LICENSE_NOUN_SINGULAR)
122-122
: Fix subject-verb agreement. Correct the verb form to match the pronoun 'they'.
-especially if they helps us improve the docs!
+especially if they help us improve the docs!

🧰 Tools
🪛 LanguageTool
[grammar] ~122-~122: The pronoun ‘they’ must be used with a non-third-person form of a verb.
Context: ...n the issue tracker, especially if they helps us improve the docs! For other queries ...(NON3PRS_VERB)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (4)
.github/workflows/ci.yml
(1 hunks)README.md
(4 hunks)requirements-dev.txt
(1 hunks)setup.py
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- .github/workflows/ci.yml
- requirements-dev.txt
- setup.py
🧰 Additional context used
🪛 LanguageTool
README.md
[misspelling] ~24-~24: Possible spelling mistake found.
Context: ...a enough, nature will always confess_ - Roland Coase (from 'How should economists choose?') ...
(EN_MULTITOKEN_SPELLING_TWO)
[style] ~46-~46: Consider a shorter alternative to avoid wordiness.
Context: ...lied to generated lists of compositions in order to screen for particular properties. These...
(IN_ORDER_TO_PREMIUM)
[style] ~66-~66: Would you like to use the Oxford spelling “initialize”? The spelling ‘initialise’ is also correct.
Context: ...es the loading of external data used to initialise the core smact.Element
and `smact.Spe...
(OXFORD_SPELLING_Z_NOT_S)
[grammar] ~79-~79: It looks like ‘collections’ doesn’t match ‘a’. Did you mean “a collection” or just “collections”?
Context: ...rediction**: A submodule which contains a collections of tools for predicting dopants. ## R...
(A_NNS_IN)
[uncategorized] ~90-~90: Possible missing comma found.
Context: ...est stable release can be installed via pip which will automatically set up other P...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~102-~102: Possible missing comma found.
Context: ....com/WMD-group/SMACT.git For developer installation SMACT can be installed from a copy of t...
(AI_HYDRA_LEO_MISSING_COMMA)
[locale-violation] ~114-~114: License must be spelled with a “c” when used as a noun in British English. Use “licence”.
Context: ... immediately reflected on the PATH. ## License and attribution Python code and origin...
(LICENCE_LICENSE_NOUN_SINGULAR)
[locale-violation] ~116-~116: License must be spelled with a “c” when used as a noun in British English. Use “licence”.
Context: ... data tables are licensed under the MIT License. ## Development notes ### Bugs, featu...
(LICENCE_LICENSE_NOUN_SINGULAR)
[grammar] ~122-~122: The pronoun ‘they’ must be used with a non-third-person form of a verb.
Context: ...n the issue tracker, especially if they helps us improve the docs! For other queries ...
(NON3PRS_VERB)
🔇 Additional comments (1)
README.md (1)
66-66
: Standardise British English spelling
For consistency with British English, 'initialise' should be used instead of 'initialize'.
🧰 Tools
🪛 LanguageTool
[style] ~66-~66: Would you like to use the Oxford spelling “initialize”? The spelling ‘initialise’ is also correct.
Context: ...es the loading of external data used to initialise the core smact.Element
and `smact.Spe...
(OXFORD_SPELLING_Z_NOT_S)
Codecov Report

Attention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #330 +/- ##
==========================================
- Coverage 76.46% 73.56% -2.91%
==========================================
Files 26 31 +5
Lines 2265 2531 +266
==========================================
+ Hits 1732 1862 +130
- Misses 533 669 +136

☔ View full report in Codecov by Sentry.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (5)
README.md (5)
24-24
: Correct the economist's name in the quote attribution. The economist's name is "Ronald Coase", not "Roland Coase".
🧰 Tools
🪛 LanguageTool
[misspelling] ~24-~24: Possible spelling mistake found.
Context: ...a enough, nature will always confess_ - Roland Coase (from 'How should economists choose?') ...(EN_MULTITOKEN_SPELLING_TWO)
35-35
: Fix typographical error. Remove the trailing 'e' from "usede".
-The best place to start is looking at [the docs](https://smact.readthedocs.io/en/latest/), which highlight some simple examples of how these classes and functions can be usede
+The best place to start is looking at [the docs](https://smact.readthedocs.io/en/latest/), which highlight some simple examples of how these classes and functions can be used
79-79
: Fix grammatical error. Change "a collections" to "a collection".
- - **dopant_prediction**: A submodule which contains a collections of tools for predicting dopants.
+ - **dopant_prediction**: A submodule which contains a collection of tools for predicting dopants.

🧰 Tools
🪛 LanguageTool
[grammar] ~79-~79: It looks like ‘collections’ doesn’t match ‘a’. Did you mean “a collection” or just “collections”?
Context: ...rediction**: A submodule which contains a collections of tools for predicting dopants. ## Re...(A_NNS_IN)
113-113
: Use British English spelling for "Licence". Change "License" to "Licence" to maintain British English consistency.
-## License and attribution
+## Licence and attribution

-Python code and original data tables are licensed under the MIT License.
+Python code and original data tables are licensed under the MIT Licence.

Also applies to: 115-115
🧰 Tools
🪛 LanguageTool
[locale-violation] ~113-~113: License must be spelled with a “c” when used as a noun in British English. Use “licence”.
Context: ... immediately reflected on the PATH. ## License and attribution Python code and origin...(LICENCE_LICENSE_NOUN_SINGULAR)
121-121
: Fix subject-verb agreement. Change "they helps" to "they help".
-Please use the [Issue Tracker](https://github.com/WMD-group/smact/issues) to report bugs or request features in the first instance. While we hope that most questions can be answered by searching [the docs](https://smact.readthedocs.io/en/latest/), we welcome new questions on the issue tracker, especially if they helps us improve the docs!
+Please use the [Issue Tracker](https://github.com/WMD-group/smact/issues) to report bugs or request features in the first instance. While we hope that most questions can be answered by searching [the docs](https://smact.readthedocs.io/en/latest/), we welcome new questions on the issue tracker, especially if they help us improve the docs!

🧰 Tools
🪛 LanguageTool
[grammar] ~121-~121: The pronoun ‘they’ must be used with a non-third-person form of a verb.
Context: ...n the issue tracker, especially if they helps us improve the docs! For other queries ...(NON3PRS_VERB)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
README.md
(4 hunks)requirements-dev.txt
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- requirements-dev.txt
🧰 Additional context used
🪛 LanguageTool
README.md
[misspelling] ~24-~24: Possible spelling mistake found.
Context: ...a enough, nature will always confess_ - Roland Coase (from 'How should economists choose?') ...
(EN_MULTITOKEN_SPELLING_TWO)
[style] ~46-~46: Consider a shorter alternative to avoid wordiness.
Context: ...lied to generated lists of compositions in order to screen for particular properties. These...
(IN_ORDER_TO_PREMIUM)
[style] ~66-~66: Would you like to use the Oxford spelling “initialize”? The spelling ‘initialise’ is also correct.
Context: ...es the loading of external data used to initialise the core smact.Element
and `smact.Spe...
(OXFORD_SPELLING_Z_NOT_S)
[grammar] ~79-~79: It looks like ‘collections’ doesn’t match ‘a’. Did you mean “a collection” or just “collections”?
Context: ...rediction**: A submodule which contains a collections of tools for predicting dopants. ## Re...
(A_NNS_IN)
[locale-violation] ~113-~113: License must be spelled with a “c” when used as a noun in British English. Use “licence”.
Context: ... immediately reflected on the PATH. ## License and attribution Python code and origin...
(LICENCE_LICENSE_NOUN_SINGULAR)
[locale-violation] ~115-~115: License must be spelled with a “c” when used as a noun in British English. Use “licence”.
Context: ... data tables are licensed under the MIT License. ## Development notes ### Bugs, featu...
(LICENCE_LICENSE_NOUN_SINGULAR)
[grammar] ~121-~121: The pronoun ‘they’ must be used with a non-third-person form of a verb.
Context: ...n the issue tracker, especially if they helps us improve the docs! For other queries ...
(NON3PRS_VERB)
🔇 Additional comments (3)
README.md (3)
Line range hint 1-13
: LGTM! Well-structured badges section
The badges are properly formatted and use secure HTTPS links.
🧰 Tools
🪛 LanguageTool
[locale-violation] ~3-~3: License must be spelled with a “c” when used as a noun in British English. Use “licence”.
Context: ...eadthedocs.org/en/latest/?badge=latest) [![License: MIT](https://img.shields.io/badge/Lice...
(LICENCE_LICENSE_NOUN_SINGULAR)
62-80
: LGTM! Clear and well-structured modules list
The module descriptions are concise and informative.
🧰 Tools
🪛 LanguageTool
[style] ~66-~66: Would you like to use the Oxford spelling “initialize”? The spelling ‘initialise’ is also correct.
Context: ...es the loading of external data used to initialise the core smact.Element
and `smact.Spe...
(OXFORD_SPELLING_Z_NOT_S)
[grammar] ~79-~79: It looks like ‘collections’ doesn’t match ‘a’. Did you mean “a collection” or just “collections”?
Context: ...rediction**: A submodule which contains a collections of tools for predicting dopants. ## Re...
(A_NNS_IN)
Line range hint 81-112
: LGTM! Comprehensive installation instructions
Clear instructions provided for multiple installation methods.
🧰 Tools
🪛 LanguageTool
[misspelling] ~24-~24: Possible spelling mistake found.
Context: ...a enough, nature will always confess_ - Roland Coase (from 'How should economists choose?') ...
(EN_MULTITOKEN_SPELLING_TWO)
[style] ~46-~46: Consider a shorter alternative to avoid wordiness.
Context: ...lied to generated lists of compositions in order to screen for particular properties. These...
(IN_ORDER_TO_PREMIUM)
[style] ~66-~66: Would you like to use the Oxford spelling “initialize”? The spelling ‘initialise’ is also correct.
Context: ...es the loading of external data used to initialise the core smact.Element
and `smact.Spe...
(OXFORD_SPELLING_Z_NOT_S)
[grammar] ~79-~79: It looks like ‘collections’ doesn’t match ‘a’. Did you mean “a collection” or just “collections”?
Context: ...rediction**: A submodule which contains a collections of tools for predicting dopants. ## Re...
(A_NNS_IN)
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (8)
pyproject.toml (3)
41-58
: Consider relaxing the pymatgen version constraint. The current pymatgen constraint
>=2024.2.20,<2024.8.8
is quite restrictive and may cause dependency conflicts. Consider using a more flexible version range unless there are specific compatibility requirements.
79-82
: Fix formatting inconsistency in crystal_space dependencies. The
crystal_space
dependencies list uses inconsistent formatting. Consider reformatting for better readability:-crystal_space = ["mp-api", -"ElementEmbeddings", -"umap-learn==0.5.3", -"kaleido"] +crystal_space = [ + "mp-api", + "ElementEmbeddings", + "umap-learn==0.5.3", + "kaleido", +]
124-124
: Address TODO comments in ruff configuration. Several TODO comments indicate pending fixes:
- Line 124: Fix bare
except
usage- Line 136: Fix magic value comparisons
- Line 138: Fix global statement usage
- Line 140: Add match parameter to pytest.raises
Consider creating issues to track these improvements.
Would you like me to help create GitHub issues for tracking these improvements?
Also applies to: 136-136, 138-138, 140-140
smact/tests/test_utils.py (5)
99-113
: Ensure cleanup occurs even if the test fails. Currently, the cleanup code using
shutil.rmtree(save_mp_dir)
is executed at the end of the test. If an exception occurs before this point, the temporary directory may not be deleted. Consider usingself.addCleanup()
to register the cleanup action, ensuring it runs regardless of the test outcome.Apply this change:
 def test_download_compounds_with_mp_api(self):
     save_mp_dir = "data/binary/mp_data"
+    self.addCleanup(shutil.rmtree, save_mp_dir)
     download_compounds_with_mp_api.download_mp_data(
         mp_api_key=os.environ.get("MP_API_KEY"),
         num_elements=2,
         max_stoich=1,
         save_dir=save_mp_dir,
     )
     # Check if the data was downloaded
     self.assertTrue(os.path.exists(save_mp_dir))
     self.assertTrue(len(os.listdir(save_mp_dir)) > 0)
-    # Clean up
-    shutil.rmtree(save_mp_dir)
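The benefit of `addCleanup` can be shown with a self-contained, hypothetical test: the cleanup is registered the moment the resource exists, so it runs even when a later assertion fails, unlike teardown code placed at the end of the test body.

```python
import os
import shutil
import tempfile
import unittest

created = []  # record the temp dir so we can check it was cleaned up


class ExampleTest(unittest.TestCase):
    def test_uses_temp_dir(self):
        tmp = tempfile.mkdtemp()
        created.append(tmp)
        # Registered immediately after creation, so removal happens even
        # if any assertion below fails or raises.
        self.addCleanup(shutil.rmtree, tmp)
        self.assertTrue(os.path.isdir(tmp))


result = unittest.TextTestRunner(verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest)
)
```

After the run, the directory recorded in `created` no longer exists, because the registered cleanup removed it.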
135-135
: Incomplete commented-out assertion. Line 135 contains a commented-out assertion with incomplete code:
# self.assertEqual(filtered_df.loc[""])
. This might have been left unintentionally. Consider removing or completing this assertion to maintain code clarity.
137-154
: Consider using temporary files for test outputs. The test writes output files to the current directory and manually deletes them after the test. If the test fails before cleanup, these files may remain. Using the
tempfile
module orself.addCleanup()
can ensure proper cleanup even if the test fails.
156-160
: Avoid hardcoding expected lengths to prevent brittle tests. Hardcoding expected lengths (e.g.,
490
,358
,227
) may cause the tests to fail if the underlying data changes. Consider deriving expected lengths dynamically based on the input data or adding comments to explain why these specific values are expected.
168-170
: Potential brittleness due to hardcoded DataFrame values. Hardcoding the expected DataFrame shape and specific values (e.g., shape
(490, 2)
, species"O2-"
, count116910
) may make the tests brittle if the data set changes. Consider making the test more flexible or ensure that the data remains consistent across environments.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)

- pyproject.toml (1 hunks)
- smact/tests/test_utils.py (3 hunks)
🔇 Additional comments (10)
pyproject.toml (2)
3-40: LGTM: Well-structured project metadata

The build system and project metadata are properly configured. The Python version requirements are consistently defined across both requires-python and classifiers.
94-158: Review security-related rule exclusions
Several security-related rules are currently ignored:
- S301: Unsafe pickle usage
- S311: Random module for cryptographic purposes
- S314: XML security
- S603, S607: Subprocess security
- S608: SQL injection
Consider reviewing these exclusions and implementing proper security measures where applicable.
✅ Verification successful
Let me analyze the specific code usage around these security patterns to make a more informed decision.
Security rule exclusions are appropriately managed
The security concerns in the ignored rules are adequately addressed in the codebase:
- Pickle usage is confined to test files only and not used in production code
- No subprocess usage was found in the codebase
- No XML parsing was detected
- SQL queries are properly parameterised using the ? placeholder syntax, preventing SQL injection risks
The current rule exclusions appear justified as the potential security risks are either:
- Limited to test code (pickle)
- Not present in the codebase (subprocess, XML)
- Properly mitigated (SQL using parameterised queries)
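For reference, the parameterised `?` style the verification refers to looks like this in sqlite3 (table and column names here are invented, not the project's actual schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE species (symbol TEXT, oxidation_state INTEGER)")
cur.execute("INSERT INTO species VALUES (?, ?)", ("Fe", 3))

# Safe: the driver binds user_input as data, never as SQL text,
# so a crafted value cannot change the query's structure.
user_input = "Fe"
cur.execute(
    "SELECT oxidation_state FROM species WHERE symbol = ?", (user_input,)
)
row = cur.fetchone()
```

The unsafe variant that S608 flags would interpolate `user_input` directly into the SQL string instead of passing it as a bound parameter.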
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for potential security issues in the codebase
echo "Checking for pickle usage..."
rg "pickle\." --type py
echo "Checking for subprocess usage..."
rg "subprocess\." --type py
echo "Checking for XML parsing..."
rg "xml\." --type py
echo "Checking for SQL query construction..."
rg "execute\(|cursor\.execute" --type py
Length of output: 1241
Script:
#!/bin/bash
# Check the context of pickle usage in tests
echo "Checking pickle usage context..."
rg -B 2 -A 2 "pickle\." smact/tests/test_structure.py
echo "Checking SQL query construction context..."
rg -B 2 -A 2 "execute\(" smact/structure_prediction/database.py
# Check if there are any parameterised queries
echo "Checking for parameterised queries..."
ast-grep --pattern 'execute($query, $params)'
Length of output: 1893
smact/tests/test_utils.py (8)
1-2: LGTM!

The addition of from __future__ import annotations enhances type hinting capabilities, allowing for postponed evaluation of type annotations.
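A minimal illustration of postponed annotation evaluation (the function is illustrative, not from the test file):

```python
from __future__ import annotations


def first_or_none(items: list[int]) -> int | None:
    # With postponed evaluation, the annotations above are stored as strings
    # rather than evaluated at definition time, so list[int] and int | None
    # parse even on interpreters predating those syntax forms.
    return items[0] if items else None
```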
3-4: LGTM!

Imports of os and shutil are necessary for file operations in the new tests.

7-7: LGTM!

Importing pandas as pd is appropriate for handling DataFrame operations in the added tests.

8-8: LGTM!

Importing SETTINGS and Composition from pymatgen.core is suitable for the test functionalities.

13-14: LGTM!

New imports from smact.utils.crystal_space and smact.utils.oxidation are required for the added test cases.

69-69: LGTM!

The assertion correctly tests the composition with specified oxidation states, ensuring accurate functionality of comp_maker.
95-97: LGTM!

The use of @unittest.skipUnless ensures the test runs only when an API key is available, preventing unnecessary failures.
115-118: LGTM!

Setting up file paths for test data is appropriate and enhances test readability.
def test_convert_formula(self):
    combinations = [Element("Li"), Element("O")]
    expected_formulas = ["Li2O2", "LiO2", "Li2O", "Li2O2"]
    compounds = generate_composition_with_smact.convert_formula(
        combinations=combinations, num_elements=2, max_stoich=2
    )
    self.assertListEqual(expected_formulas, compounds)
Duplicate entry in expected_formulas list.

The expected_formulas list contains "Li2O2" twice, which may be unintended. This could cause the test to fail if the compounds list does not match due to the duplicate. Consider removing the duplicate entry to ensure the test accurately reflects the expected outputs.

Apply this diff to correct the expected_formulas:

-expected_formulas = ["Li2O2", "LiO2", "Li2O", "Li2O2"]
+expected_formulas = ["Li2O2", "LiO2", "Li2O"]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

def test_convert_formula(self):
    combinations = [Element("Li"), Element("O")]
    expected_formulas = ["Li2O2", "LiO2", "Li2O"]
    compounds = generate_composition_with_smact.convert_formula(
        combinations=combinations, num_elements=2, max_stoich=2
    )
    self.assertListEqual(expected_formulas, compounds)
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
smact/tests/test_utils.py (2)
137-153: Remove or complete the commented assertion.

There's an incomplete assertion that should either be completed or removed: # self.assertEqual(filtered_df.loc[""])
173-176: Consider using named constants for magic numbers.

The threshold values and expected lengths could be defined as class constants for better maintainability.

Example refactor:

+    THRESHOLD_LENGTHS = {
+        0: 490,
+        5: 358,
+        50: 227,
+    }
+
     def test_oxidation_states_filter_species_list(self):
-        for threshold, length in [(0, 490), (5, 358), (50, 227)]:
+        for threshold, length in self.THRESHOLD_LENGTHS.items():
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)

- smact/tests/test_utils.py (3 hunks)
🔇 Additional comments (5)
smact/tests/test_utils.py (5)
1-14: LGTM! Well-organised imports.

The imports are properly structured and all imported modules are utilised in the test implementations.
95-129: LGTM! Well-structured API-dependent tests.
The tests properly handle:
- API key verification
- Data generation and verification
- Proper cleanup of test data
132-134: LGTM! Robust path handling.

The test file paths are properly constructed using platform-independent methods.
178-187: LGTM! Well-structured test for species occurrences.
The test properly validates:
- DataFrame structure
- Column names
- Expected data values
87-93:
Fix duplicate entry in expected formulas list.
The list contains 'Li2O2' twice, which could lead to inconsistent test results.
Apply this correction:
-expected_formulas = ["Li2O2", "LiO2", "Li2O", "Li2O2"]
+expected_formulas = ["Li2O2", "LiO2", "Li2O"]
Likely invalid or redundant comment.
Merge the develop branch
Description
Type of change
How Has This Been Tested?
Test Configuration:
Reviewers
N/A
Checklist:
Summary by CodeRabbit
Release Notes
Documentation
New Features
Bug Fixes
Improvements