Meeting in class: Sean, Scott, Tom present. Sarah and Brett dropped in.
Scott reconciled the two temperature sources. The old data (spotty after 2010) was 1 degree C lower on average than the new data (not controlling for the loggers). Scott made a single daily temperature dataset by fitting a model predicting the old data using the new data and a random effect of year. This model was fit with data on days where both new and old data were available and was used to predict temperature on days where new temperature data was available and old data was not. This is done in the process_temperature_data script in the 01_process_data folder.
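As a rough sketch of that calibration step (not the actual process_temperature_data code), something along these lines could be done with statsmodels; the file and column names here are hypothetical, and the prediction shown uses the fixed effects only, so the year-level effects from model.random_effects could be added back in if needed.

```python
# Sketch of the calibration described above, assuming a data frame with one row
# per day and hypothetical columns temp_old, temp_new, year.
import pandas as pd
import statsmodels.formula.api as smf

daily = pd.read_csv("daily_temperature.csv")  # hypothetical file name

# Fit on days where both sources exist: old ~ new, with a random intercept of year.
both = daily.dropna(subset=["temp_old", "temp_new"])
model = smf.mixedlm("temp_old ~ temp_new", data=both, groups=both["year"]).fit()
print(model.summary())

# Predict the "old-scale" temperature on days with new data but no old data.
# Note: .predict() uses the fixed effects only; the year-level effects
# (model.random_effects) could be added back in for a closer match.
gap = daily[daily["temp_old"].isna() & daily["temp_new"].notna()].copy()
gap["temp_filled"] = model.predict(gap)
```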
From these daily temperature measurements, Scott extracted the following metrics:
- jja_mean, the mean temperature for June, July, and August; averages for the previous one, two, and three years were calculated for each season.
- gdd, the mean number of growing degree days (days with minimum temperature above 5 C) for April 1 through June 30 for the last one, two, and three years.
- jd_seas, the julian day (adjusted for water year) of the beginning of the growing season (according to Cliff, this should be the first day of three consecutive days with a minimum temperature above -3 C).
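A minimal sketch of how these three metrics could be computed from the daily series, assuming hypothetical column names (date, tmean, tmin) and skipping the water-year adjustment and the 1/2/3-year averaging:

```python
# Minimal sketch of the three metrics for a single calendar year, assuming a
# daily data frame with a datetime column "date" plus "tmean" and "tmin"
# (all names hypothetical).
import pandas as pd

def summer_metrics(daily: pd.DataFrame) -> dict:
    d = daily.sort_values("date").reset_index(drop=True)
    month = d["date"].dt.month

    # jja_mean: mean temperature over June, July, August
    jja_mean = d.loc[month.isin([6, 7, 8]), "tmean"].mean()

    # gdd (as defined above): number of days with tmin above 5 C, Apr 1 - Jun 30
    spring = d[month.isin([4, 5, 6])]
    gdd = int((spring["tmin"] > 5).sum())

    # jd_seas: first day of a run of three consecutive days with tmin above -3 C
    warm = d["tmin"] > -3
    run3 = warm & warm.shift(-1, fill_value=False) & warm.shift(-2, fill_value=False)
    jd_seas = int(d.loc[run3, "date"].dt.dayofyear.iloc[0]) if run3.any() else None

    return {"jja_mean": jja_mean, "gdd": gdd, "jd_seas": jd_seas}
```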
Notably, jd_seas has three years with a start date in March. This seems like an outlier but it could be biologically relevant. E.g., you could see an effect of plants starting to leaf out early but then getting snowed on later. Or maybe they would be resistant to snow? Maybe this could be used in conjunction with some "frost" data, e.g., last day with temperature below some amount.
Below the snow, temperatures may differ from air temperatures. This should be checked against the soil temperature data (in the new temperature dataset). But it's probably better to have more years of vegetation data than of soil temperature data, which only starts around the year 2000.
Sarah recommended looking at the extended summer repository. Sean can do this later in the week.
The snowmelt GAMs seem to consistently underestimate the maximum snow depth (because the peaks are effectively treated as outliers) and overestimate the first day with no snow (for the same reason).
Does the error structure here vary with the mean? Scott doesn't think so - these were probably fit with a Gaussian family. It may be good to have a family where the variance scales with the mean. How to do this? Could use a Gamma if you add a very small amount so that "negligible" snow depths are strictly positive...
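One hedged sketch of the Gamma idea, using a spline GLM as a stand-in for the actual GAM fitting; the column names (depth, doy) and file name are hypothetical, and recent statsmodels uses links.Log() where older versions use links.log().

```python
# Add a tiny constant so zeros become "negligible" positive depths, then fit a
# Gamma model with a log link so the variance scales with the mean. A patsy
# B-spline GLM stands in for the GAM here; names are hypothetical.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

snow = pd.read_csv("snow_depth.csv")   # hypothetical file with depth and doy columns

EPS = 0.1  # cm; arbitrary small shift so every depth is strictly positive
snow["depth_pos"] = snow["depth"] + EPS

gamma_fit = smf.glm(
    "depth_pos ~ bs(doy, df=6)",       # B-spline of day of year, GAM stand-in
    data=snow,
    family=sm.families.Gamma(link=sm.families.links.Log()),
).fit()
print(gamma_fit.summary())
```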
We could handle the melt data by setting a higher threshold, e.g., 10 cm. The logic here is that as soon as there is something (anything) sticking up above the snow, it melts faster because it absorbs more heat from solar radiation.
We could also improve the estimation of the snowmelt dates by adding zeros. E.g., we could add a zero to every plot two weeks after the last date of sampling (for the whole saddle).
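A small sketch of both of these tweaks (the 10 cm threshold and the appended zeros), assuming a long data frame with hypothetical columns plot, date, depth and, for simplicity, a single season:

```python
import pandas as pd

snow = pd.read_csv("snow_depth.csv", parse_dates=["date"])  # hypothetical file/columns

THRESHOLD = 10  # cm

# Melt date per plot: first date the depth drops below the threshold.
# (In practice this should be restricted to dates after the snow-depth peak,
# so early-season shallow days are not picked up.)
melt = (
    snow[snow["depth"] < THRESHOLD]
    .groupby("plot")["date"]
    .min()
    .rename("melt_date")
)

# Append a pseudo-observation of zero depth two weeks after the last sampling
# date for the whole saddle, one row per plot.
last_date = snow["date"].max()
zeros = pd.DataFrame({
    "plot": snow["plot"].unique(),
    "date": last_date + pd.Timedelta(days=14),
    "depth": 0.0,
})
snow_augmented = pd.concat([snow, zeros], ignore_index=True)
```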
It may be worth looking into the characteristics of the plots with no zero observations. How much snow is there on their last day of observation? Where are they? What types of plots or years tend to have missing zeros?
The simplest, and perhaps sufficient, way to handle this is to just fit a quadratic from the day of peak snow depth to the end and use that to impute zeros. Check to see how many plots have weird snow peaks, e.g., early in the season. Maybe just do this using the last stretch with a negative derivative? Sarah advises fitting these models with random slopes (for the linear and quadratic terms) for each plot and year.
Scott can try to fit these quadratic models.
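A rough sketch of what those quadratic models might look like; only the plot grouping is shown (the crossed plot-and-year random slopes Sarah suggested would need variance components or something like lme4 in R), and all file and column names are hypothetical.

```python
# Quadratic melt-tail model: depth as a quadratic function of days since the
# peak, with random linear and quadratic slopes by plot. Assumes the data are
# already restricted to the declining stretch after the peak.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

tail = pd.read_csv("melt_tail.csv")  # hypothetical: plot, days_since_peak, depth

quad = smf.mixedlm(
    "depth ~ days_since_peak + I(days_since_peak ** 2)",
    data=tail,
    groups=tail["plot"],
    re_formula="~days_since_peak + I(days_since_peak ** 2)",
).fit()

# Population-level imputed "zero snow" date: where the fitted quadratic crosses
# zero, i.e. the later real root of b0 + b1*t + b2*t^2 = 0.
b0, b1, b2 = quad.fe_params
roots = np.roots([b2, b1, b0])
real_roots = roots[np.isreal(roots)].real
if real_roots.size:
    print("zero-crossing (days since peak):", real_roots.max())
```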
Tom found some good resources which will go into the repo soon.
One good technique to consider is using an inverse distance matrix to get weighted abundances for the neighbors around each plot. These could be used as predictors in models. Tom found a paper which describes a technique for this (see Dormann et al., Ecography, 2007). According to Brett, upon brief reflection, this "seems like a reasonable way" to do this.
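A minimal sketch of the inverse-distance weighting, in the spirit of the autocovariate idea in Dormann et al. (2007); the inputs (plot coordinates and a focal-species abundance vector) are assumed here, not taken from the actual data.

```python
# For each plot, a weighted average of the focal species' abundance in all
# other plots, with weights 1/distance. coords is (n_plots, 2), abundance is
# length n_plots (both hypothetical inputs).
import numpy as np

def neighbor_abundance(coords: np.ndarray, abundance: np.ndarray) -> np.ndarray:
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))          # pairwise distances
    with np.errstate(divide="ignore"):
        w = 1.0 / dist                                # inverse-distance weights
    np.fill_diagonal(w, 0.0)                          # a plot is not its own neighbor
    w = w / w.sum(axis=1, keepdims=True)              # normalize rows to sum to 1
    return w @ abundance                              # weighted neighbor abundance
```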
Tom can work on this over the next week.
Original models using a binomial error family were overdispersed. Probably the simplest way to handle this is with an observation-level random effect (OLRE); this will absorb the spatiotemporal variance, leaving essentially the binomial process variance in the residuals. Some preliminary models have much more spatial variance than spatiotemporal. Scott will continue looking at these models.
It's worth comparing the outputs of the OLRE model with those without the OLRE. Perhaps these will not be that different? Try to simulate data (using the models) to see if they make sense.
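Along the lines of the simulation suggestion, here is a toy sketch (with made-up numbers) of the kind of overdispersion the OLRE is meant to absorb: binomial counts with extra observation-level noise on the logit scale have a larger variance than a plain binomial with the same mean.

```python
# Toy simulation: binomial counts whose logit-scale mean gets an extra
# observation-level (plot-by-year) noise term. All numbers are made up.
import numpy as np

rng = np.random.default_rng(1)
n_obs, n_trials = 500, 100            # e.g., 100 point-intercept hits per plot
mu = -1.0                             # mean logit occurrence probability
sigma_olre = 0.8                      # observation-level (spatiotemporal) SD

eta = mu + rng.normal(0.0, sigma_olre, n_obs)    # logit with extra noise
p = 1.0 / (1.0 + np.exp(-eta))
y = rng.binomial(n_trials, p)

# Variance of a plain binomial with the same average p, for comparison
p_bar = p.mean()
print("observed variance:     ", y.var())
print("binomial-only variance:", n_trials * p_bar * (1 - p_bar))
```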
Also, the binomial might not be quite the correct model. There could be some other process happening, e.g., a two-step process where there's a binomial probability of whether the species can be there at all and then another binomial process applied only to those places where it is present. Consider this.
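And a toy sketch of the two-step idea (again with made-up parameters): a first binomial for whether the species can be at a plot at all, then a second binomial for the hits only where it is present, which produces an excess of zeros relative to a single binomial.

```python
# Two-stage (hurdle-like) binomial simulation with made-up parameters.
import numpy as np

rng = np.random.default_rng(2)
n_obs, n_trials = 500, 100
p_present = 0.6        # probability the species can occur at the plot at all
p_hit = 0.3            # per-trial hit probability where it is present

present = rng.binomial(1, p_present, n_obs)
y = np.where(present == 1, rng.binomial(n_trials, p_hit, n_obs), 0)

# Compared with a single binomial with the same overall mean, this produces an
# excess of zeros (a zero-inflated pattern).
print("share of zeros:", (y == 0).mean())
```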