-
Notifications
You must be signed in to change notification settings - Fork 23
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #189 from WMD-group/Crystal_Space
crystal_space
- Loading branch information
Showing
5 changed files
with
1,001 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,210 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Exploring Chemical Space with SMACT and Materials Project Database" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"In this notebook, we undertake a comprehensive exploration of binary chemical compositions. This approach can also be extended to explore ternary and quaternary compositions. Our methodology involves two primary tools: the SMACT filter for generating compositions and the Materials Project database for additional data acquisition. \n", | ||
"\n", | ||
"The final phase will categorize the compositions into four distinct categories based on their properties. The categorization is based on whether a composition is allowed by the SMACT filter (smact_allowed) and whether it is present in the Materials Project database (mp). The categories are as follows:\n", | ||
"\n", | ||
"| smact_allowed | mp | label |\n", | ||
"|---------------|------|------------|\n", | ||
"| yes | yes | standard |\n", | ||
"| yes | no | missing |\n", | ||
"| no | yes | interesting|\n", | ||
"| no | no | unlikely |" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## 1. Generate compositions with the SMACT filter" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"We begin by generating binary compositions using the SMACT filter. The SMACT filter serves as a chemical filter including oxidation states and electronegativity test.\n", | ||
"\n", | ||
"[`generate_composition_with_smact`](./generate_composition_with_smact.py) function generates a composition with the SMACT filter. The function takes in the following parameters:\n", | ||
"\n", | ||
"num_elements: number of elements in the composition\n", | ||
"\n", | ||
"max_stoich: maximum stoichiometry of each element\n", | ||
"\n", | ||
"max_atomic_num: maximum atomic number of each element\n", | ||
"\n", | ||
"num_processes: number of processes to run in parallel\n", | ||
"\n", | ||
"save_path: path to save the dataframe containing the compositions with the SMACT filter" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from generate_composition_with_smact import generate_composition_with_smact" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"df_smact = generate_composition_with_smact(\n", | ||
" num_elements=2,\n", | ||
" max_stoich=8,\n", | ||
" max_atomic_num=103,\n", | ||
" num_processes=8,\n", | ||
" save_path=\"data/binary/df_binary_label.pkl\",\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# 2. Download data from the Materials Project database" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Next, we download data from the Materials Project database using the `MPRester` class from the [`pymatgen`](https://pymatgen.org/) library. \n", | ||
"\n", | ||
"[`download_mp_data`](./download_compounds_with_mp_api.py) function takes in the following parameters:\n", | ||
"\n", | ||
"mp_api_key: Materials Project API key\n", | ||
"\n", | ||
"num_elements: number of elements in the composition\n", | ||
"\n", | ||
"max_stoich: maximum stoichiometry of each element\n", | ||
"\n", | ||
"save_dir: path to save the downloaded data" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"mp_api_key = None # replace with your own MP API key" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from download_compounds_with_mp_api import download_mp_data\n", | ||
"\n", | ||
"# download data from MP for binary compounds\n", | ||
"save_mp_dir = \"data/binary/mp_data\"\n", | ||
"docs = download_mp_data(\n", | ||
" mp_api_key=mp_api_key,\n", | ||
" num_elements=2,\n", | ||
" max_stoich=8,\n", | ||
" save_dir=save_mp_dir,\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## 3. Categorize compositions" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Finally, we categorize the compositions into four lables: standard, missing, interesting, and unlikely." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from pathlib import Path\n", | ||
"import pandas as pd" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"mp_data = {p.stem: True for p in Path(save_mp_dir).glob(\"*.json\")}\n", | ||
"df_mp = pd.DataFrame.from_dict(mp_data, orient=\"index\", columns=[\"mp\"])" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# make category dataframe\n", | ||
"df_category = df_smact.join(df_mp, how=\"left\").fillna(False)\n", | ||
"# make label for each category\n", | ||
"dict_label = {\n", | ||
" (True, True): \"standard\",\n", | ||
" (True, False): \"missing\",\n", | ||
" (False, True): \"interesting\",\n", | ||
" (False, False): \"unlikely\",\n", | ||
"}\n", | ||
"df_category[\"label\"] = df_category.apply(\n", | ||
" lambda x: dict_label[(x[\"smact_allowed\"], x[\"mp\"])], axis=1\n", | ||
")\n", | ||
"df_category[\"label\"].apply(dict_label.get)\n", | ||
"\n", | ||
"# count number of each label\n", | ||
"print(df_category[\"label\"].value_counts())\n", | ||
"\n", | ||
"# save dataframe\n", | ||
"df_category.to_pickle(\"data/binary/df_binary_category.pkl\")" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "smact", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.8.18" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
Oops, something went wrong.