Panel Data Vignette (Issue 99) #115

Merged
merged 54 commits into from
Apr 27, 2024

Conversation

Contributor

@mgyliu mgyliu commented Aug 3, 2022

Addresses issue #99

Changes:

  • New dataset for statcan employment panel data
  • Vignette demonstrating:
    • Cleaning/putting data in epi_df format
    • Using the non-epi epi_df with epi_recipe and epi_workflow
    • Predicting with canned forecasters

@mgyliu mgyliu force-pushed the ml-99-panel-data-vignette branch 2 times, most recently from bda4ea5 to fb030b5 Compare August 12, 2022 00:10
@mgyliu mgyliu changed the title from "[WIP] [Issues 99 & 114] Panel Data Vignette & epi_keys_mold fix" to "[WIP] [Issues 99] Panel Data Vignette" Aug 12, 2022
@mgyliu mgyliu force-pushed the ml-99-panel-data-vignette branch from 1fd96bb to 0b797b0 Compare August 15, 2022 23:16
@mgyliu mgyliu changed the title from "[WIP] [Issues 99] Panel Data Vignette" to "Panel Data Vignette (Issue 99)" Aug 17, 2022
@mgyliu mgyliu marked this pull request as ready for review August 17, 2022 16:56
@mgyliu mgyliu requested a review from dajmcdon as a code owner August 17, 2022 16:56
@dajmcdon
Contributor

I gave this a quick once-over. I think the idea is great.

Major comments:

  • For the panel data model you built, can you be more specific/didactic about exactly the model you're fitting? It's perfectly safe to use math. Maybe also do an alternative model with a few more complexities?
  • After fitting, can you illustrate the standard model investigations one would undertake for a linear model? Examine fitted/observed values. Look at the coefficients and their CIs, etc. Stuff you'd do in an undergrad linear models class.
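
A minimal sketch of the kind of checks meant here. It uses base R's built-in `cars` data and a plain `lm()` fit as stand-ins (both are assumptions for illustration; in the vignette, the fitted panel-data model would take the place of `fit`):

```r
# Stand-in model: built-in `cars` data, not the vignette's employment data.
fit <- lm(dist ~ speed, data = cars)

# Coefficient estimates with standard errors, t statistics, and p-values.
coef_table <- summary(fit)$coefficients
print(coef_table)

# 95% confidence intervals for the coefficients.
ci <- confint(fit, level = 0.95)
print(ci)

# Fitted versus observed values, with the identity line for reference.
plot(fitted(fit), cars$dist, xlab = "Fitted", ylab = "Observed")
abline(0, 1, lty = 2)

# Residuals versus fitted values, to spot curvature or non-constant variance.
plot(fitted(fit), resid(fit), xlab = "Fitted", ylab = "Residual")
abline(h = 0, lty = 2)
```

`plot(fit)` would also produce the standard four-panel diagnostics in one call.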

Minor:

  • There are a few typos I noticed. We can clean up later. The only important one is I think you used lag=c(0,1,1) instead of lag=0:2.
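
To make the lag typo concrete, here is a small base-R sketch (the variable names are hypothetical) showing that `c(0, 1, 1)` repeats lag 1 and never requests lag 2, while `0:2` gives the three distinct lags:

```r
intended <- 0:2        # lags of 0, 1, and 2 time steps
typo <- c(0, 1, 1)     # lag 1 appears twice; lag 2 is missing

# Distinct lags actually requested by each specification.
print(unique(intended))
print(unique(typo))

# Lags that the typo silently drops.
missing_lags <- setdiff(intended, typo)
print(missing_lags)
```

With the typo, a lag argument would at best create a duplicated lag-1 predictor and no lag-2 predictor, so the fitted model would not match the description in the text.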

@mgyliu mgyliu force-pushed the ml-99-panel-data-vignette branch 2 times, most recently from 615cde4 to e6f62ec Compare September 15, 2022 09:04
@mgyliu
Contributor Author

mgyliu commented Sep 15, 2022

Thanks for the feedback & sorry this took so long - could you take another look @dajmcdon?

@dajmcdon dajmcdon mentioned this pull request Feb 3, 2024
@dajmcdon
Contributor

dajmcdon commented Feb 6, 2024

Blocked by #291

@dajmcdon dajmcdon mentioned this pull request Mar 18, 2024
4 tasks
@dajmcdon dajmcdon requested a review from rachlobay April 9, 2024 22:45
@dajmcdon dajmcdon mentioned this pull request Apr 12, 2024
18 tasks
Contributor

@rachlobay rachlobay left a comment

This looks pretty great. A couple of minor things:

  1. Suppress messages & warnings for the first code chunk (else we get a bunch of attach package messages, at least when I knit the .Rmd)
  2. I made a couple of very minor changes to the sentences & fixed a few typos
  3. The model diagnostics plot appears too large/elongated when I knit the file and look at the output. Maybe that’s just an issue on my computer, but I’ve resized it in a simpler way, in case it appears like that for others as well.

I’ve made these minor fixes & pushed the updates to GitHub. I have not fixed the following more important things yet because I think that they require your input...

Important things for you to weigh in on (presented by section):

Model fitting and prediction:

  • The intercept and other values discussed below the model output seem wrong (unless they are transformed or something?). Also, see the sentence on which coefficients are significantly greater than zero.

  • Is it correct to say “lags at 2 years and 3 years ago have coefficients significantly greater than zero”? Doesn’t the maximum lag used correspond to 2 years ago? The same way of talking about lags also comes up in the section titled Model fitting & postprocessing.

Autoregressive model with exogenous inputs:

  • The model form seems wrong (it currently shows an interaction term; shouldn’t it be +?).
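
For reference, the difference between `+` and an interaction term can be seen in the design-matrix columns that R builds. This is a toy sketch; the data and column names are hypothetical, not the vignette's variables:

```r
# Hypothetical toy data for illustration only.
toy <- data.frame(
  y  = c(1, 2, 3, 4),
  x1 = c(0, 1, 0, 1),
  x2 = c(0.5, 1.5, 2.5, 3.5)
)

# Additive form: one coefficient per predictor.
additive_terms <- colnames(model.matrix(y ~ x1 + x2, data = toy))
print(additive_terms)

# `*` expands to main effects plus the product term x1:x2.
interaction_terms <- colnames(model.matrix(y ~ x1 * x2, data = toy))
print(interaction_terms)
```

The additive specification has no `x1:x2` column, which is what a model with exogenous inputs entering as separate lagged predictors would need.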

Model fitting & postprocessing:

  • I think that the model processing steps should be introduced and enumerated in the order they are performed (else it gets confusing to have the current out-of-order presentation).

  • I’m not sure the conclusions on significance are correct. From a quick inspection: in R model output, one asterisk typically means “p < .05”. If so, then more/different terms are significant than what’s currently indicated in the discussion of the model summary.
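
As a quick check on the star convention, the sketch below maps some made-up p-values (hypothetical, not taken from the vignette's output) onto the legend that `summary()` prints for `lm` fits: `0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1`.

```r
# Hypothetical p-values, one per band of the significance legend.
p_values <- c(0.0004, 0.004, 0.03, 0.07, 0.4)

# Right-closed intervals, matching the legend printed under the coefficient table.
stars <- cut(
  p_values,
  breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
  labels = c("***", "**", "*", ".", " "),
  include.lowest = TRUE
)
print(data.frame(p_value = p_values, stars = as.character(stars)))

# Terms significant at the 5% level are those with at least one asterisk.
significant_5pct <- p_values < 0.05
print(significant_5pct)
```

So any term marked with `*`, `**`, or `***` counts as significant at the 5% level, while a lone `.` does not.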

Flatline forecaster:

Overall thoughts:

  • Related to my last point about the flatline forecaster presentation, I don’t see any location indices used anywhere or hats to indicate predictions? If we really want to be precise & consistent with other vignettes, presentations and tooling book chapters, it is probably good to include those. Refer to the previous link on the presentation you guys gave for an example of this.
  • There seems to be a general lack of plots… Currently, I see a line plot at the beginning when exploring the data and a model diagnostics plot. It may be nice to show how to look at/use the predictions for panel data instead of just showing the reader the code.
  • Related to my previous point: the vignette ends a little abruptly for me… Perhaps we should add a short summary at the end, or a sentence or two after the final code block suggesting something for the reader to try on their own. Or we could expand the last section a bit (e.g., add a good plot of these results & discuss it briefly? A reader may find that useful, and it would be a nice way to end things).
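
On the notation point, one way to write the flatline forecast with location indices and hats is sketched below. The symbols are assumptions for illustration: $j$ indexes the location, $t$ is the last observed time, and $h$ is the forecast horizon. The vignette and the other materials may use different letters.

```latex
\hat{y}_{j,\,t+h} = y_{j,\,t}, \qquad h = 1, 2, \dots
```

The hat marks the left-hand side as a prediction, and the shared index $j$ makes explicit that each location's forecast is its own most recent observation.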

I may be wrong about these points, but I think they are worth briefly talking about before merging.

@dajmcdon
Contributor

@rachlobay All excellent points. I think I've hit them all. Would you mind giving it a quick once-over if you have a chance?

@dajmcdon dajmcdon requested a review from rachlobay April 26, 2024 18:24
@rachlobay
Contributor

Looks great! I'll merge now

@rachlobay rachlobay merged commit 99c30c6 into dev Apr 27, 2024
3 checks passed
@dajmcdon dajmcdon linked an issue May 23, 2024 that may be closed by this pull request
@dajmcdon dajmcdon deleted the ml-99-panel-data-vignette branch September 20, 2024 21:22
Development

Successfully merging this pull request may close these issues.

Vignette - panel data demos
3 participants