Merge pull request #189 from WMD-group/Crystal_Space

crystal_space
WMD-group · Dec 6, 2023 · 7d1ee94 · 7d1ee94
2 parents 133b065 + 1bda7be
commit 7d1ee94
Show file tree

Hide file tree

Showing 5 changed files with 1,001 additions and 0 deletions.
diff --git a/examples/Crystal_Space/0_screening.ipynb b/examples/Crystal_Space/0_screening.ipynb
@@ -0,0 +1,210 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Exploring Chemical Space with SMACT and Materials Project Database"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In this notebook, we undertake a comprehensive exploration of binary chemical compositions. This approach can also be extended to explore ternary and quaternary compositions. Our methodology involves two primary tools: the SMACT filter for generating compositions and the Materials Project database for additional data acquisition. \n",
+    "\n",
+    "The final phase will categorize the compositions into four distinct categories based on their properties. The categorization is based on whether a composition is allowed by the SMACT filter (smact_allowed) and whether it is present in the Materials Project database (mp). The categories are as follows:\n",
+    "\n",
+    "| smact_allowed | mp   | label      |\n",
+    "|---------------|------|------------|\n",
+    "| yes           | yes  | standard   |\n",
+    "| yes           | no   | missing    |\n",
+    "| no            | yes  | interesting|\n",
+    "| no            | no   | unlikely   |"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 1. Generate compositions with the SMACT filter"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We begin by generating binary compositions using the SMACT filter. The SMACT filter serves as a chemical filter including oxidation states and electronegativity test.\n",
+    "\n",
+    "[`generate_composition_with_smact`](./generate_composition_with_smact.py) function generates a composition with the SMACT filter. The function takes in the following parameters:\n",
+    "\n",
+    "num_elements: number of elements in the composition\n",
+    "\n",
+    "max_stoich: maximum stoichiometry of each element\n",
+    "\n",
+    "max_atomic_num: maximum atomic number of each element\n",
+    "\n",
+    "num_processes: number of processes to run in parallel\n",
+    "\n",
+    "save_path: path to save the dataframe containing the compositions with the SMACT filter"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from generate_composition_with_smact import generate_composition_with_smact"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_smact = generate_composition_with_smact(\n",
+    "    num_elements=2,\n",
+    "    max_stoich=8,\n",
+    "    max_atomic_num=103,\n",
+    "    num_processes=8,\n",
+    "    save_path=\"data/binary/df_binary_label.pkl\",\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# 2. Download data from the Materials Project database"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Next, we download data from the Materials Project database using the `MPRester` class from the [`pymatgen`](https://pymatgen.org/) library. \n",
+    "\n",
+    "[`download_mp_data`](./download_compounds_with_mp_api.py) function takes in the following parameters:\n",
+    "\n",
+    "mp_api_key: Materials Project API key\n",
+    "\n",
+    "num_elements: number of elements in the composition\n",
+    "\n",
+    "max_stoich: maximum stoichiometry of each element\n",
+    "\n",
+    "save_dir: path to save the downloaded data"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "mp_api_key = None  # replace with your own MP API key"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from download_compounds_with_mp_api import download_mp_data\n",
+    "\n",
+    "# download data from MP for binary compounds\n",
+    "save_mp_dir = \"data/binary/mp_data\"\n",
+    "docs = download_mp_data(\n",
+    "    mp_api_key=mp_api_key,\n",
+    "    num_elements=2,\n",
+    "    max_stoich=8,\n",
+    "    save_dir=save_mp_dir,\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 3. Categorize compositions"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Finally, we categorize the compositions into four lables: standard, missing, interesting, and unlikely."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from pathlib import Path\n",
+    "import pandas as pd"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "mp_data = {p.stem: True for p in Path(save_mp_dir).glob(\"*.json\")}\n",
+    "df_mp = pd.DataFrame.from_dict(mp_data, orient=\"index\", columns=[\"mp\"])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# make category dataframe\n",
+    "df_category = df_smact.join(df_mp, how=\"left\").fillna(False)\n",
+    "# make label for each category\n",
+    "dict_label = {\n",
+    "    (True, True): \"standard\",\n",
+    "    (True, False): \"missing\",\n",
+    "    (False, True): \"interesting\",\n",
+    "    (False, False): \"unlikely\",\n",
+    "}\n",
+    "df_category[\"label\"] = df_category.apply(\n",
+    "    lambda x: dict_label[(x[\"smact_allowed\"], x[\"mp\"])], axis=1\n",
+    ")\n",
+    "df_category[\"label\"].apply(dict_label.get)\n",
+    "\n",
+    "# count number of each label\n",
+    "print(df_category[\"label\"].value_counts())\n",
+    "\n",
+    "# save dataframe\n",
+    "df_category.to_pickle(\"data/binary/df_binary_category.pkl\")"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "smact",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.18"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}