From 5431f0ab8380fcbe21d20a0b80ef543727a07b6a Mon Sep 17 00:00:00 2001 From: GStechschulte Date: Fri, 29 Sep 2023 16:19:21 +0200 Subject: [PATCH 1/3] remove duplicate notebooks --- docs/notebooks/gallery.yml | 10 +--------- 1 file changed, 1 insertion(+), 9 deletions(-) diff --git a/docs/notebooks/gallery.yml b/docs/notebooks/gallery.yml index c9864ce68..455cd0f14 100644 --- a/docs/notebooks/gallery.yml +++ b/docs/notebooks/gallery.yml @@ -87,15 +87,7 @@ - title: Zero inflated models subtitle: When the outcome is mostly zeros and or is overdispersed href: zero_inflated_regression.ipynb - thumbnail: thumbnails/zero_inflated_pps.png - - title: Ordinal regression - subtitle: Model ordered category outcomes - href: ordinal_regression.ipynb - thumbnail: thumbnails/ordinal_regression.png - - title: Zero inflated models - subtitle: When the outcome is mostly zeros and or is overdispersed - href: zero_inflated_regression.ipynb - thumbnail: thumbnails/zero_inflated_pps.png + thumbnail: thumbnails/zip_model_pps.png - title: Ordinal regression subtitle: Model ordered category outcomes href: ordinal_regression.ipynb From c728c291ff76291dea8db4b40b2a5e3499b5e82f Mon Sep 17 00:00:00 2001 From: GStechschulte Date: Fri, 29 Sep 2023 16:20:02 +0200 Subject: [PATCH 2/3] grammer fixes and remove duplicate code cells --- docs/notebooks/ordinal_regression.ipynb | 41 +++---------------- docs/notebooks/zero_inflated_regression.ipynb | 2 +- 2 files changed, 6 insertions(+), 37 deletions(-) diff --git a/docs/notebooks/ordinal_regression.ipynb b/docs/notebooks/ordinal_regression.ipynb index 65db2c2fc..24e62209e 100644 --- a/docs/notebooks/ordinal_regression.ipynb +++ b/docs/notebooks/ordinal_regression.ipynb @@ -2,15 +2,14 @@ "cells": [ { "cell_type": "code", - "execution_count": 20, + "execution_count": 1, "metadata": {}, "outputs": [ { - "name": "stdout", + "name": "stderr", "output_type": "stream", "text": [ - "The autoreload extension is already loaded. 
To reload it, use:\n", - " %reload_ext autoreload\n" + "WARNING (pytensor.tensor.blas): Using NumPy C-API based implementation for BLAS functions.\n" ] } ], @@ -24,9 +23,6 @@ "\n", "import bambi as bmb\n", "\n", - "%load_ext autoreload\n", - "%autoreload 2\n", - "\n", "warnings.filterwarnings(\"ignore\", category=FutureWarning)" ] }, @@ -397,7 +393,7 @@ "source": [ "Viewing the summary dataframe, we see a total of six `response_threshold` coefficients. Why six? Remember, we get the last parameter for free. Since there are seven categories, we only need six cutpoints. The index (using zero based indexing) of the `response_threshold` indicates the category that the threshold is associated with. Comparing to the empirical log-cumulative-odds computation above, the mean of the posterior distribution for each category is close to the empirical value.\n", "\n", - "As the the log cumalative link is used, we need to apply the inverse of the logit function to transform back to cumulative probabilities. Below, we plot the cumulative probabilities for each category. " + "As the log cumulative link is used, we need to apply the inverse of the logit function to transform back to cumulative probabilities. Below, we plot the cumulative probabilities for each category. " ] }, { @@ -490,33 +486,6 @@ "plt.title(\"Posterior Probability of each response category\");" ] }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "", - "text/plain": [ - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "fig, ax = plt.subplots(figsize=(7, 3))\n", - "for i in range(6):\n", - " outcome = expit_func(idata.posterior.response_threshold).sel(response_threshold_dim=i).to_numpy().flatten()\n", - " ax.hist(outcome, bins=15, alpha=0.5, label=f\"Category: {i}\")\n", - "ax.set_xlabel(\"Probability\")\n", - "ax.set_ylabel(\"Count\")\n", - "ax.set_title(\"Cumulative Probability by Response Category\")\n", - "ax.legend(bbox_to_anchor=(1.04, 1), loc=\"upper left\");" - ] - }, { "cell_type": "markdown", "metadata": {}, @@ -534,7 +503,7 @@ "\n", "$$\\eta = \\beta_1 x_1 + \\beta_2 x_2 +, . . ., \\beta_n x_n$$\n", "\n", - "where $\\epsilon$ is an error term. Notice how similar this looks to an ordinary linear model. However, there is no intercept or error term. This is because the intercept is replaced by the threshold $\\tau$ and the error term $\\epsilon$ is added seperately to obtain\n", + "Notice how similar this looks to an ordinary linear model. However, there is no intercept or error term. This is because the intercept is replaced by the threshold $\\tau$ and the error term $\\epsilon$ is added seperately to obtain\n", "\n", "$$Z = \\eta + \\epsilon$$ \n", "\n", diff --git a/docs/notebooks/zero_inflated_regression.ipynb b/docs/notebooks/zero_inflated_regression.ipynb index 03e18d3ce..409616061 100644 --- a/docs/notebooks/zero_inflated_regression.ipynb +++ b/docs/notebooks/zero_inflated_regression.ipynb @@ -1794,7 +1794,7 @@ "\n", "In this notebook, two classes of models (ZIP and hurdle Poisson) for modeling zero-inflated data were presented and implemented in Bambi. The difference of the data generating process between the two models differ in how zeros are generated. The ZIP model uses a distribution that mixes two data generating processes. The first process generates zeros, and the second process uses a Poisson distribution to generate counts (of which some may be zero). 
The hurdle Poisson also uses two data generating processes, but doesn't \"mix\" them. A process is used for generating zeros such as a binary model for modeling whether the response variable is zero or not, and a second process for modeling the counts. These two proceses are independent of each other.\n", "\n", - "The datset used to demonstrate the two models had a large number of zeros. These zeros appeared because the group doesn't fish, or because they fished, but caught zero fish. Because zeros could be generated due to two different reasons, the ZIP model, which allows zeros to be generated from a mixture of processes, seems to be more appropriate for this datset." + "The dataset used to demonstrate the two models had a large number of zeros. These zeros appeared because the group doesn't fish, or because they fished, but caught zero fish. Because zeros could be generated due to two different reasons, the ZIP model, which allows zeros to be generated from a mixture of processes, seems to be more appropriate for this dataset." ] }, { From a5e83e46ab1a8ca32cea7ea6ad6571d266a07374 Mon Sep 17 00:00:00 2001 From: GStechschulte Date: Fri, 29 Sep 2023 16:27:35 +0200 Subject: [PATCH 3/3] add ordinal and ZIP model to GLM section in _quarto.yml --- docs/_quarto.yml | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/_quarto.yml b/docs/_quarto.yml index 0acdc5350..fe1c64c8d 100644 --- a/docs/_quarto.yml +++ b/docs/_quarto.yml @@ -65,6 +65,8 @@ website: - notebooks/circular_regression.ipynb - notebooks/quantile_regression.ipynb - notebooks/mister_p.ipynb + - notebooks/ordinal_regression.ipynb + - notebooks/zero_inflated_regression.ipynb - section: More advanced models contents: - notebooks/distributional_models.ipynb