Interpreting ppmlhdfe (Poisson FE) fixed effect estimates #16

mariofiorini opened this issue Feb 16, 2023 · 2 comments

@mariofiorini

Dear Sergio, Paulo and Thomas,
I have posted this on Statalist, but GitHub might be the better place to do so.

I am using your ppmlhdfe command (thanks for the time spent putting this together!).
The goal is to estimate a Poisson model with many levels of fixed effects (i.e., 4 categorical variables, some of which are also interacted) that fails to converge using the conventional poisson command, or even glm ..., family(poisson).

The ppmlhdfe command works well in the sense that i) it converges and ii) it is very fast. It does so by dropping singletons/separated observations.
In the specifications where the poisson command also converged, the point estimates are identical.

Next, I am trying to do some out-of-sample prediction, which the command does not allow for, so it must be done manually by adding the estimated fixed effects.
Here I am having some trouble understanding the output.

Example with only one binary FE and no other covariate:
Code:

sysuse auto.dta, clear
ppmlhdfe price, absorb(foreign, savefe) d(sumFE)

The options:
  absorb(..., savefe)   saves all fixed effect estimates, with __hdfe as prefix
  d(newvar)             saves the sum of the fixed effects as newvar; mandatory if running predict afterwards (except for predict, xb)
Code:

. ppmlhdfe price, absorb(foreign, savefe) d(sumFE)
Iteration 1:   deviance = 8.7262e+04  eps = .         iters = 1    tol = 1.0e-04  min(eta) =   0.75  PS  
Iteration 2:   deviance = 8.6958e+04  eps = 3.50e-03  iters = 1    tol = 1.0e-04  min(eta) =   0.72   S  
Iteration 3:   deviance = 8.6958e+04  eps = 6.06e-07  iters = 1    tol = 1.0e-04  min(eta) =   0.72   S  
Iteration 4:   deviance = 8.6958e+04  eps = 2.13e-14  iters = 1    tol = 1.0e-05  min(eta) =   0.72   S O
------------------------------------------------------------------------------------------------------------
(legend: p: exact partial-out   s: exact solver   h: step-halving   o: epsilon below tolerance)
Converged in 4 iterations and 4 HDFE sub-iterations (tol = 1.0e-08)
 
HDFE PPML regression                              No. of obs      =         74
Absorbing 1 HDFE group                            Residual df     =         72
                                                  Wald chi2(0)    =          .
Deviance             =  86958.07836               Prob > chi2     =          .
Log pseudolikelihood =  -43866.7452               Pseudo R2       =     0.0028
------------------------------------------------------------------------------
             |               Robust
       price |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   8.726951   .0555475   157.11   0.000      8.61808    8.835822
------------------------------------------------------------------------------
 
Absorbed degrees of freedom:
-----------------------------------------------------+
Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
     foreign |         2           0           2     |
-----------------------------------------------------+
 
. tab __hdfe1__
 
       [FE] |
  1.foreign |      Freq.     Percent        Cum.
------------+-----------------------------------
  -.0154382 |         52       70.27       70.27
   .0347057 |         22       29.73      100.00
------------+-----------------------------------
      Total |         74      100.00
 
 
. tab sumFE
 
     Sum of |
      fixed |
    effects |      Freq.     Percent        Cum.
------------+-----------------------------------
  -.0160663 |          1        1.35        1.35
  -.0159765 |          1        1.35        2.70
  -.0159186 |          1        1.35        4.05
  -.0159104 |          1        1.35        5.41
  -.0157847 |          1        1.35        6.76
  -.0157775 |          1        1.35        8.11
  -.0157128 |          1        1.35        9.46
  -.0157128 |          1        1.35       10.81
  -.0156133 |          1        1.35       12.16
  -.0155503 |          1        1.35       13.51
  -.0154646 |          1        1.35       14.86
  -.0154554 |          1        1.35       16.22
  -.0154529 |          1        1.35       17.57
  -.0154441 |          1        1.35       18.92
  -.0154263 |          1        1.35       20.27
  -.0154207 |          1        1.35       21.62
    -.01542 |          1        1.35       22.97
  -.0154147 |          1        1.35       24.32
  -.0153939 |          1        1.35       25.68
  -.0153839 |          1        1.35       27.03
  -.0153818 |          1        1.35       28.38
  -.0153807 |          1        1.35       29.73
  -.0153764 |          1        1.35       31.08
  -.0153655 |          1        1.35       32.43
  -.0153627 |          1        1.35       33.78
   -.015358 |          1        1.35       35.14
  -.0153537 |          1        1.35       36.49
  -.0153527 |          1        1.35       37.84
   -.015352 |          1        1.35       39.19
  -.0153472 |          1        1.35       40.54
  -.0153388 |          1        1.35       41.89
   -.015338 |          1        1.35       43.24
  -.0153366 |          1        1.35       44.59
  -.0153348 |          1        1.35       45.95
   -.015333 |          1        1.35       47.30
  -.0153329 |          1        1.35       48.65
  -.0153307 |          1        1.35       50.00
  -.0153183 |          1        1.35       51.35
  -.0153178 |          1        1.35       52.70
  -.0153174 |          1        1.35       54.05
  -.0153168 |          1        1.35       55.41
  -.0153122 |          1        1.35       56.76
  -.0153111 |          1        1.35       58.11
  -.0153097 |          1        1.35       59.46
  -.0153065 |          1        1.35       60.81
  -.0153048 |          1        1.35       62.16
   -.015303 |          1        1.35       63.51
  -.0152949 |          1        1.35       64.86
   -.015293 |          1        1.35       66.22
  -.0152846 |          1        1.35       67.57
  -.0152611 |          1        1.35       68.92
  -.0152606 |          1        1.35       70.27
   .0345068 |          1        1.35       71.62
   .0345368 |          1        1.35       72.97
   .0346048 |          1        1.35       74.32
   .0346062 |          1        1.35       75.68
   .0346532 |          1        1.35       77.03
   .0346829 |          1        1.35       78.38
   .0346917 |          1        1.35       79.73
   .0347084 |          1        1.35       81.08
   .0347104 |          1        1.35       82.43
   .0347203 |          1        1.35       83.78
   .0347233 |          1        1.35       85.14
   .0347257 |          1        1.35       86.49
   .0347354 |          1        1.35       87.84
    .034745 |          1        1.35       89.19
   .0347565 |          1        1.35       90.54
   .0347597 |          1        1.35       91.89
   .0347624 |          1        1.35       93.24
   .0347686 |          1        1.35       94.59
   .0347776 |          1        1.35       95.95
   .0347806 |          1        1.35       97.30
   .0347836 |          1        1.35       98.65
   .0347851 |          1        1.35      100.00
------------+-----------------------------------
      Total |         74      100.00

So, I don’t understand why:

  1. Despite foreign being a binary variable, there is a __hdfe1__ estimate for each of its two values as well as an estimate for the constant. How are the values determined?
  2. Despite only one FE being used, __hdfe1__ and sumFE are not the same.
  3. sumFE seems to be different for every observation.
  4. When comparing against a standard Poisson command, the predicted values are (slightly) different, using either estimate of the FE. Note that in this simple example ppmlhdfe does not drop any observation.

Code:

predict yhat_ppmlhdfe
gen yhat_ppmlhdfe_manual = exp(_b[_cons] + __hdfe1__)
 
poisson price i.foreign
predict yhat_poisson
 
. su yhat*
 
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
yhat_ppmlh~e |         74    6165.257    143.7014    6068.61   6385.188
yhat_ppmlh~l |         74    6165.257    143.6977   6072.423   6384.682
yhat_poisson |         74    6165.257    143.6979   6072.423   6384.682
 
. 
. compare  yhat_ppmlhdfe yhat_poisson
 
                                        ---------- difference ----------
                            count       minimum      average     maximum
------------------------------------------------------------------------
yhat_pp~e<yhat_po~n            21     -3.812361    -1.270896   -.0355738
yhat_pp~e>yhat_po~n            53      .0171664     .5039798    1.079185
                       ----------
jointly defined                74     -3.812361     .0002987    1.079185
                       ----------
total                          74
 
. compare  yhat_ppmlhdfe_manual yhat_poisson
 
                                        ---------- difference ----------
                            count       minimum      average     maximum
------------------------------------------------------------------------
yhat_pp~l=yhat_po~n            22
yhat_pp~l>yhat_po~n            52      .0004883     .0004883    .0004883
                       ----------
jointly defined                74             0     .0003431    .0004883
                       ----------
total                          74

Any help would be great.

@sergiocorreia (Owner)

Hi Mario,

Thanks for spotting this. While I don't have a full answer yet, let me tell you what I know:

> Despite foreign being a binary variable, there is a __hdfe1__ estimate for each of its two values as well as an estimate for the constant. How are the values determined?

With fixed effects, the constant is not materially relevant for reghdfe and ppmlhdfe, as the fixed effects make it unnecessary.
However, people often asked for it, so what we did for reghdfe (and copied for ppmlhdfe) was to make the first set of fixed effects have a mean of zero, and assign its previous mean as the constant.

For instance:

sysuse auto

* Coefs are 8.711 and 8.761
glm price ibn.foreign, family(poisson) link(log) noconstant

* Replicate these numbers
ppmlhdfe price, a(FE=foreign)
replace FE = FE + _b[_cons]
tab FE

Here, we can recover the same coefs as you would get with poisson or glm by adding back the constant.
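
To make this concrete with the numbers already shown above (a quick back-of-the-envelope check, not part of a re-run):

* _cons = 8.726951; __hdfe1__ takes the values -.0154382 (domestic) and .0347057 (foreign)
display 8.726951 - .0154382    // 8.7115, which matches glm's 8.711
display 8.726951 + .0347057    // 8.7617, which matches glm's 8.761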

> Despite only one FE being used, __hdfe1__ and sumFE are not the same.

You are correct, there seems to be a bit of numerical inaccuracy in sumFE, but I haven't been able to find any bug (yet).
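
A minimal sketch of how one might eyeball the size of that inaccuracy, assuming the variables from the first example (__hdfe1__ and sumFE) are still in memory:

gen double fe_gap = sumFE - __hdfe1__    // gap between the two fixed-effect estimates
summarize fe_gap                         // should be tiny if it is only numerical noise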

> sumFE seems to be different for every observation.
>
> When comparing against a standard Poisson command, the predicted values are (slightly) different, using either estimate of the FE. Note that in this simple example ppmlhdfe does not drop any observation.

Yes, that's also what I get. If I had to speculate, I think the problem might be related to inaccuracies caused by data standardization. For instance, compare the following two commands:

* Default is to standardize data
sysuse auto, clear
ppmlhdfe price, a(FE=foreign) d(d) standardize_data(1)
tab FE d

sysuse auto, clear
ppmlhdfe price, a(FE=foreign) d(d) standardize_data(0)
tab FE d

In the first one, there are six values of d, and in the second we correctly get only two values, which correspond to those of FE. We currently use a fast but not very accurate method for standardization, as accuracy doesn't really matter much there, but perhaps it does for ppmlhdfe (where there are lots of log and exp functions).

I'll keep researching this, but in the meantime, depending on your code, it might work if you disable standardization as above.
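
For instance, a minimal sketch (untested; it simply combines the commands already shown in this thread) of the manual prediction with standardization disabled:

sysuse auto, clear
ppmlhdfe price, a(FE=foreign) d(d) standardize_data(0)
gen yhat_manual = exp(_b[_cons] + FE)    // add the constant back to the FE, then exponentiate
predict yhat_builtin                     // built-in prediction, for comparison
su yhat_manual yhat_builtin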

Best,
S

@mariofiorini (Author)

ok, thanks Sergio. That clarifies it. Cheers, Mario
