Unexpected behavior when predicting spatial regression models #47
Could you perhaps provide a minimal reproducible example? Might something in these lecture notes help? https://rsbivand.github.io/PG_AGII_2sem/11_talk.html
Thank you very much for the link. The example you linked to seems to match my situation, and I will try to produce minimal data for a reproducible example as soon as possible. I still have to work out the best way to provide the minimal data and how to post it here.
I broke my hand, so coding is slow. I'm still trying to understand my bug and produce a minimal example; sorry for the delay.
Dear Roger,
I've been running some SAR models with the spatialreg package to analyze raster data in a project whose main objective is to improve forest structural models by incorporating data on recent forest disturbance. The SAR models work fine, but I would like to use them for prediction. I therefore split the data into a training set (70% of the data) and a test set (30% of the data), fitted a SAR model with errorsarlm() on the training set, and used predict() to predict values for the test set. Although everything runs, predict() appears to ignore the spatial component when it is given new data. Am I doing something wrong here, and do you have any suggestions for making predictions with SAR models using training and test datasets? I wrote this script following the examples in your lectures, and I also read the papers you recommended in GitHub issue #45.
Adriana
Attachment: lai_df.csv
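A minimal sketch of the train/test workflow described above, assuming the spdep and spatialreg packages are installed. Everything in it is hypothetical: the 20 x 20 grid stands in for the raster cells, x1, x2 and y are simulated (not the contents of lai_df.csv), and lambda = 0.6 is an arbitrary choice. A contiguous block split replaces the random 70/30 split so that every cell keeps neighbours in its own subset; with a random split some cells may have no neighbours, and nb2listw() may then need zero.policy = TRUE. The last lines compare pred with the hand-computed trend X_test %*% beta_hat as one way to check whether the spatial part contributes to the out-of-sample predictions.

```r
library(spdep)       # cell2nb(), nb2listw(), subset() method for "nb" objects
library(spatialreg)  # errorsarlm(), invIrW(), predict() method for fitted SAR models

set.seed(42)

# Hypothetical 20 x 20 rook-contiguity grid standing in for raster cells
nb_all <- cell2nb(20, 20, type = "rook")
n <- length(nb_all)
lw_all <- nb2listw(nb_all, style = "W")

# Simulated predictors and a response with spatially autocorrelated errors
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
u <- as.numeric(invIrW(lw_all, 0.6) %*% rnorm(n))  # u = (I - 0.6 W)^{-1} e
df$y <- 1 + 2 * df$x1 - 0.5 * df$x2 + u
rownames(df) <- attr(nb_all, "region.id")           # align data and weights IDs

# Contiguous 70/30 block split (280 / 120 cells): no cell loses all neighbours
train <- seq_len(n) <= 280
df_train <- df[train, ]
df_test <- df[!train, ]
lw_train <- nb2listw(subset(nb_all, train), style = "W")
lw_test <- nb2listw(subset(nb_all, !train), style = "W")

# Spatial error model fitted on the training cells only
fit <- errorsarlm(y ~ x1 + x2, data = df_train, listw = lw_train)

# Out-of-sample prediction for the test cells, as in the thread
pred <- predict(fit, newdata = df_test, listw = lw_test, pred.type = "TS")

# Hand-computed trend X_test %*% beta_hat, for comparison with pred
X_test <- model.matrix(~ x1 + x2, data = df_test)
trend <- drop(X_test %*% fit$coefficients)
summary(abs(as.numeric(pred) - trend))
```

If the differences summarized in the last line are essentially zero, the out-of-sample predictions reduce to the non-spatial trend, which would match what is described above; the sketch itself does not assume either outcome.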
Fitting a spatial error model may change the fitted regression coefficients compared to least squares, so the predictions will differ. The spatial term applies to the error, which is not observed, so, as we saw in our email exchange in April, there is no clear path forward. May I add (some of) my email comments to this thread?
Since April, I have given a talk partly about this: slides at https://rsbivand.github.io/nem24_talk/, sources at https://github.com/rsbivand/nem24_talk, and the talk recording at https://nhh.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=d26410ee-6243-48ce-96dd-b18400beb764; see also rsbivand/nem24_talk#1.
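In standard notation (an illustrative summary, not taken from the linked slides), the spatial error model fitted by errorsarlm (SEM) and the spatial lag model fitted by lagsarlm (SLM) differ in where the spatial term sits:

$$
\begin{aligned}
\text{SEM:}\quad & y = X\beta + u,\quad u = \lambda W u + \varepsilon
  \;\Longrightarrow\; y = X\beta + (I - \lambda W)^{-1}\varepsilon,\\
\text{SLM:}\quad & y = \rho W y + X\beta + \varepsilon
  \;\Longrightarrow\; y = (I - \rho W)^{-1}X\beta + (I - \rho W)^{-1}\varepsilon.
\end{aligned}
$$

Taking expectations with $E[\varepsilon] = 0$:

$$
E[y \mid X] = X\beta \ \ (\text{SEM}), \qquad E[y \mid X] = (I - \rho W)^{-1} X\beta \ \ (\text{SLM}).
$$

In the SEM case, W enters the expected value for new locations only through its effect on the estimate of $\beta$, whereas in the lag case the weights for the new locations enter it directly. Going beyond $X_{\text{new}}\hat\beta$ for the SEM would need information about the unobserved errors $u$ at or near the new locations.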
Dear Roger,
Adriana
Comments from email in April:
Thank you for the great spatial tools you provide to the community.
Maybe I'm doing something wrong, as I'm rather new to spatial regression, but here is a behavior I find strange.
If I fit a spatial regression model the way the man page suggests, it works and the returned object contains fitted values. For instance:
formula1 <- paste0("to_predict ~ predictor_1 + predictor_2")
fit.sem1 <- lagsarlm(formula1, data = df_b, listw = neighbs_weights_b)
This works: the resulting fit.sem object has a $fitted.values component, and everything seems fine. df_b is an sf object, and neighbs_weights_b was produced as intended. But when I try to use this model to predict my dependent variable at another location, using something like:
predict.sem2.arras<-predict(object=fit.sem2, newdata = df_a, listw = neighbs_weights_a, pred.type = "TS", legacy.mixed=TRUE, power=TRUE)
I get an error message:
This is a bit confusing, as I thought my data was wrong, for example that one of the predictors was missing. I checked how the weights are provided, including how the row.names need to be specified so that they differ between the training area and the testing area.
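The row-name point above can be checked in base R before calling predict(). In this sketch the tiny data frames df_b and df_a are hypothetical stand-ins for the training and test sf objects (sf objects inherit from data.frame, so rownames() behaves the same way); the prefix "a_" is an arbitrary choice. If the weights for the test area carry region IDs, they presumably need to stay consistent with the new row names; that is an assumption worth checking against ?predict.Sarlm.

```r
# Hypothetical stand-ins for the training (df_b) and test (df_a) areas
df_b <- data.frame(predictor_1 = rnorm(5), predictor_2 = rnorm(5))
df_a <- data.frame(predictor_1 = rnorm(3), predictor_2 = rnorm(3))

# Both start with default row names "1", "2", ..., so they overlap
overlap <- intersect(rownames(df_a), rownames(df_b))
length(overlap)  # 3

# Give the test area its own row names, e.g. with a prefix
rownames(df_a) <- paste0("a_", seq_len(nrow(df_a)))
stopifnot(length(intersect(rownames(df_a), rownames(df_b))) == 0)
```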
Thanks to the code accompanying an article by Thibault Laurent (thanks to him for making it available), I found that the proper usage seems to be:
All in all, since it works, maybe I'm just not aware of a practice that seems obvious to the community, and if so, sorry for the inconvenience. But as an R user, I find it rather puzzling that feeding the formula "almost" works while feeding the linear object "fully" works. Maybe just modifying the error message in predict.sarlm would be useful?
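The formula in the first code snippet is built with paste0(), so formula1 is a character string rather than a formula object. Whether this is what separates the "almost" working and the "fully" working calls is not established in this thread (it is only an assumption worth ruling out), but converting the string explicitly with as.formula() before fitting is cheap. The lagsarlm() line at the end is commented out because df_b and neighbs_weights_b are not defined here.

```r
# paste0() returns a character string, not a formula
formula1 <- paste0("to_predict ~ predictor_1 + predictor_2")
class(formula1)     # "character"

# Convert explicitly before passing it to lagsarlm() or errorsarlm()
formula1 <- as.formula(formula1)
class(formula1)     # "formula"
all.vars(formula1)  # "to_predict" "predictor_1" "predictor_2"

# The fit would then be, as in the original snippet (not run here):
# fit.sem1 <- lagsarlm(formula1, data = df_b, listw = neighbs_weights_b)
```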
If the issue is considered useful and the behavior cannot be reproduced, I'll try to provide a small data set to reproduce it.