Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mnlogit weird standard errors #14

Open
vincentarelbundock opened this issue Jun 29, 2023 · 2 comments
Open

mnlogit weird standard errors #14

vincentarelbundock opened this issue Jun 29, 2023 · 2 comments

Comments

@vincentarelbundock
Copy link
Owner

No description provided.

@artiom-matvei
Copy link
Contributor

artiom-matvei commented Nov 19, 2024

I noted this too while working on fixing the mnlogit tests. It is caused by Jacobians actually. See the difference below.
These values can be reproduced by running the code provided below for Python and for R ( (and debugging both).

Python version

###test_comparisons_01() in mnlogit
    penguins_clean = penguins_with_nulls.select(
        ["island", "bill_length_mm", "flipper_length_mm"]
    ).drop_nulls()

    # Define island categories and create a mapping
    island_categories = ["Biscoe", "Dream", "Torgersen"]
    island_mapping = {island: code for code, island in enumerate(island_categories)}

    # Map 'island' to integer codes
    penguins_clean = penguins_clean.with_columns(
        pl.col("island").replace_strict(island_mapping)
    )

    mod = smf.mnlogit(
        "island ~ bill_length_mm + flipper_length_mm", data=penguins_clean
    ).fit()
    unknown = (
        comparisons(mod)
        .with_columns(pl.col("group").replace(island_mapping))
        .sort(["rowid", "term", "group"])
    )
    known = (
        pl.read_csv("tests/r/test_statsmodels_mnlogit_comparisons_01.csv")
        .with_columns(pl.col("group").replace(island_mapping))
        .sort(["rowid", "term", "group"])
    )
    new_column_names = {col: col.replace('.', '_') for col in known.columns}
    known = known.rename(new_column_names)
    assert_series_equal(known["estimate"].head(), unknown["estimate"].head(), rtol=2)
# python jacobian
    array([[ 3.26007929e-03, -1.11398030e+05, -1.07005746e+07,
         7.82170746e+07,  2.78495302e+07,  1.34757804e+00],
       [ 3.00860803e+05,  1.11398009e+05,  3.15899457e+07,
         4.00666771e+07,  1.42659178e+07, -5.02687187e+00],
       [-3.00860806e+05,  2.03275902e-02, -2.08893711e+07,
        -1.18283752e+08, -4.21154480e+07,  3.67929389e+00],
       ...,
       [ 1.94566192e-02,  6.31280874e+04,  6.06390551e+06,
         6.95824154e+07,  2.47751212e+07,  4.27175339e-01],
       [ 1.19234156e+05, -6.31280929e+04,  1.25194134e+07,
        -2.27053711e+07, -8.08434379e+06, -1.51166669e+00],
       [-1.19234176e+05,  5.47722901e-03, -1.85833189e+07,
        -4.68770444e+07, -1.66907774e+07,  1.08449133e+00]])

R version

#R jacobian
                 [,1]          [,2]          [,3]          [,4]          [,5]          [,6]
   [1,]  4.211085e-03  0.1129098874  7.622065e-01  6.449953e-03  2.390449e-01  1.167441e+00
   [2,]  2.317994e-03 -0.0002889116  4.311471e-01  1.240511e-02  4.599770e-01  2.307350e+00
   [3,] -1.458913e-02 -0.7474941310 -2.844880e+00  2.018562e-02  7.334876e-01  3.936196e+00
   [4,] -1.768363e-02 -0.7516309380 -3.412940e+00  1.977302e-02  5.853414e-01  3.816193e+00
   [5,] -6.148701e-03 -0.3694016210 -1.168253e+00  1.883557e-02  6.798748e-01  3.578758e+00
   [6,]  3.851085e-03  0.0970890213  6.970463e-01  6.842971e-03  2.519078e-01  1.238578e+00
   [7,] -1.807982e-02 -0.8520217380 -3.525564e+00  2.090703e-02  7.173019e-01  4.076872e+00
   [8,] -1.324527e-02 -0.5110488046 -2.556336e+00  1.326720e-02  2.654043e-01  2.560569e+00
   [9,]  7.328301e-03  0.1888463791  1.392377e+00  1.094683e-02  4.361391e-01  2.079898e+00
  [10,] -4.511196e-03 -0.2639279320 -8.390823e-01  1.642779e-02  5.682850e-01  3.055570e+00
  ...
# Load necessary packages
library(nnet)
library(dplyr)
library(marginaleffects)

# Load and prepare the data
penguins <- read.csv("https://vincentarelbundock.github.io/Rdatasets/csv/palmerpenguins/penguins.csv")
penguins_clean <- penguins %>%
  select(island, bill_length_mm, flipper_length_mm) %>%
  na.omit()
penguins_clean$island <- relevel(factor(penguins_clean$island), ref = "Biscoe")

# Fit the model
model_r <- multinom(island ~ bill_length_mm  + flipper_length_mm, data = penguins_clean)

# Extract and display coefficients in case the models do not match
comparisons(model_r)

@vincentarelbundock
Copy link
Owner Author

I'm not sure exactly and will have to invest a lot of time to figure things out. I don't have time now, so do what you can on this issue, then post something on Github to explain the state of work and things. I'll take a look when I can.

Then, you can move on to other things.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants