Model Assessment
========================================================
In this activity we will use a series of visualizations and statistical measures to assess the performance of our Super Simple Ecosystem Model at the Metolius site.
Let's start by loading the ensemble output from the previous lab and the observed flux data for the site.
```{r}
## load libraries
library(plotrix)       ## std.error(), taylor.diagram()
library(dplR)          ## morlet() wavelet transform
library(rpart)         ## classification and regression trees
library(randomForest)  ## random forests & partial dependence plots
## load SSEM output
load("Ex10.output.RData")
## load flux tower data
L4 = read.csv("data/AMF_USMe2_2005_L4_h_V002.txt",header=TRUE,na.strings="-9999")
L4[L4 == -9999] = NA   ## catch any sentinel values missed by na.strings
```
Sanity Check
------------
When assessing model performance, one can often diagnose bugs in the code and other large errors without the need for a direct model-data comparison, simply by looking at basic statistics and diagnostic graphs. It is also not uncommon to have model outputs for quantities that are not directly observed; these should still be checked to make sure they make sense and that the model is not getting the right answer for the wrong reason. In the code below we look at the daily-mean outputs from the unweighted ensemble (output.ensemble) and the resampled particle filter (output).
```{r}
ciEnvelope <- function(x,ylo,yhi,...){
  polygon(cbind(c(x, rev(x), x[1]), c(ylo, rev(yhi), ylo[1])), border = NA, ...)
}
col.alpha <- function(col,alpha=1){
  rgb = col2rgb(col)
  rgb(rgb[1],rgb[2],rgb[3],alpha*255,maxColorValue=255)
}
varnames <- c("Bleaf","Bwood","BSOM","LAI","NEP","GPP","Ra","NPPw","NPPl","Rh","litter","CWD")
units <- c("Mg/ha","Mg/ha","Mg/ha","m2/m2","umol/m2/sec","umol/m2/sec","umol/m2/sec","umol/m2/sec","umol/m2/sec","umol/m2/sec","Mg/ha/timestep","Mg/ha/timestep")
## Time-series visualization, daily means
DoY = floor(L4$DoY-0.02)
uDoY = sort(unique(DoY))
ci = list(pf=list(),ens=list())
for(i in 1:12){
  ## for each of the 12 pools and fluxes, compute the daily mean for every
  ## ensemble member, then take the median and 95% CI across the ensemble
  ci.pf = apply(apply(output[,,i],2,tapply,DoY,mean),1,quantile,c(0.025,0.5,0.975))
  ci.ens = apply(apply(output.ensemble[,,i],2,tapply,DoY,mean),1,quantile,c(0.025,0.5,0.975))
# ci.pf = apply(output[,,i],1,quantile,c(0.025,0.5,0.975))
# ci.ens = apply(output.ensemble[,,i],1,quantile,c(0.025,0.5,0.975))
plot(uDoY,ci.ens[2,],main=varnames[i],xlab="time",ylab=units[i],type='l',ylim=range(ci.ens))
ciEnvelope(uDoY,ci.ens[1,],ci.ens[3,],col=col.alpha("lightGrey",0.5))
ciEnvelope(uDoY,ci.pf[1,],ci.pf[3,],col=col.alpha("lightGreen",0.5))
lines(uDoY,ci.ens[2,])
lines(uDoY,ci.pf[2,],col=3)
ci$pf[[i]] = ci.pf
ci$ens[[i]] = ci.ens
}
```
**Question 1:** What pools and fluxes were most affected by assimilating MODIS LAI and which were least affected? Does this make sense?
Bleaf, LAI, and litter were most affected by assimilating MODIS LAI. This makes sense because these pools and fluxes depend directly on leaf area: we expect leaf biomass and leaf litter to respond to changes in LAI.
The least affected pools and fluxes were Bwood, BSOM, CWD, and Rh. Again this makes sense because wood biomass, soil organic matter, and coarse woody debris are only indirectly affected by leaf area.
Model vs. Data
--------------
In the following section we will begin with some basic diagnostic plots and statistics assessing the NEE predicted by our simple ecosystem model. Specifically, we will calculate the Root Mean Square Error (RMSE), bias, correlation coefficient, and regression slope of the relationship between observed and predicted NEE, for both the original ensemble and the particle filter. We will also generate scatter plots of predicted vs. observed values.
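For reference, writing $M_t$ for the model prediction and $O_t$ for the observation at time $t$, the first two statistics are

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(M_t - O_t\right)^2}, \qquad \mathrm{Bias} = \frac{1}{n}\sum_{t=1}^{n}\left(M_t - O_t\right)$$

so a positive bias means the model predicts a more positive NEE (less net uptake) than was observed.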
```{r}
## Calculate ensemble means & apply QAQC
qaqc = (L4$qf_NEE_st == 0)   ## keep only periods with measured (non-gap-filled) NEE
## ensemble-mean NEE (NEP is the 5th state variable; the sign is flipped
## because the model tracks NEP while the tower reports NEE)
NEE.ens = -apply(output.ensemble[,,5],1,mean)
NEE.pf = -apply(output[,,5],1,mean)
E = NEE.ens[qaqc]
P = NEE.pf[qaqc]
## observed NEE from the flux tower
O = L4$NEE_st_fMDS[qaqc]
## Model vs obs regressions
NEE.ens.fit = lm(O ~ E)
NEE.pf.fit = lm(O ~ P)
## performance stats
stats = as.data.frame(matrix(NA,4,2))
rownames(stats) <- c("RMSE","Bias","cor","slope")
colnames(stats) <- c("ens","pf")
stats["RMSE",'ens'] = sqrt(mean((E-O)^2))
stats["RMSE",'pf'] = sqrt(mean((P-O)^2))
stats['Bias','ens'] = mean(E-O)
stats['Bias','pf'] = mean(P-O)
stats['cor','ens'] = cor(E,O)
stats['cor','pf'] = cor(P,O)
stats['slope','ens'] = coef(NEE.ens.fit)[2]
stats['slope','pf'] = coef(NEE.pf.fit)[2]
knitr::kable(stats)
## predicted-observed
plot(E,O,pch=".",xlab="ensemble",ylab='observed',main='NEE (umol/m2/sec)')
abline(0,1,col=2,lwd=2)
abline(NEE.ens.fit,col=3,lwd=3,lty=2)
legend("bottomright",legend=c('obs','1:1','reg'),col=1:3,lwd=3)
plot(P,O,pch=".",xlab="particle filter",ylab='observed',main='NEE (umol/m2/sec)')
abline(0,1,col=2,lwd=2)
abline(NEE.pf.fit,col=3,lwd=3,lty=2)
legend("bottomright",legend=c('obs','1:1','reg'),col=1:3,lwd=3)
```
**Question 2:** Which version of the model performed better? Do the statistics or plots give any indication about what parameters might need to be fixed, or processes refined in the model?
The particle filter seems to perform slightly worse than the simple ensemble because it has a higher RMSE and a larger bias; however, both have a similar correlation with the observations. Overall the two models perform very similarly to one another, and there isn't a clear winner.
Looking at the scatter plots of observed versus predicted NEE, the points are not randomly scattered around the regression line: they are more dispersed (more uncertainty) when the models predict negative fluxes than when they predict positive fluxes, where the points cluster more tightly around the line. Negative NEE means the forest is taking up CO2, which happens during the day and during the summer; broadly speaking, NEE is positive at night and in winter. The model is therefore better at predicting NEE when respiration exceeds photosynthesis. Daytime NEE is harder to predict because more processes are active and hence there are more sources of error. This day/night (and seasonal) difference could point to the light use efficiency parameter, which is used to calculate GPP (photosynthesis at the landscape scale).
**Question 3:** Repeat the daily-mean time-series plot for NEE from the previous section, but add the observed daily-mean NEE to the plot. Make sure to use the gap-filled NEE estimates, since flux data are not missing at random.
All lines align well: the two models and the observed daily-mean NEE time series are similar. There is a slight difference around DoY 70-120, when the observed NEE is lower than both predictions.
```{r}
## daily-mean CI for NEE (negating NEP flips the quantile order; ciEnvelope doesn't care)
ci.pf = -apply(apply(output[,,5],2,tapply,DoY,mean),1,quantile,c(0.025,0.5,0.975))
ci.ens = -apply(apply(output.ensemble[,,5],2,tapply,DoY,mean),1,quantile,c(0.025,0.5,0.975))
NEE.obs = tapply(L4$NEE_st_fMDS,DoY,mean)
plot(uDoY,ci.ens[2,],main="NEE",xlab="time",ylab=units[5],type='l',ylim=range(c(ci.pf,ci.ens,NEE.obs)))
ciEnvelope(uDoY,ci.ens[1,],ci.ens[3,],col=col.alpha("darkGrey",0.5))
ciEnvelope(uDoY,ci.pf[1,],ci.pf[3,],col=col.alpha("lightGreen",0.5))
lines(uDoY,ci.pf[2,],col=3)
lines(uDoY,NEE.obs, col=col.alpha("red",0.5))
```
Comparison to flux "climatology"
-------------------------------
In the section below we calculate the long-term average NEE for each 30-min period in the year, excluding the year we modeled (2005), as an alternative model to judge our process model against. We then update our summary statistics and predicted-observed plot.
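Formally, for each 30-min period of the year $d$, the climatology benchmark is just the average of the observed flux over the $Y$ years other than 2005:

$$\mathrm{NEE}_{clim}(d) = \frac{1}{Y}\sum_{y=1}^{Y} \mathrm{NEE}_{y}(d)$$

which is exactly what the tapply over clim.doy computes below.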
```{r}
## flux "climatology"
fluxfiles = dir("data",pattern="AMF")
fluxfiles = fluxfiles[grep("txt",fluxfiles)]
fluxfiles = fluxfiles[-grep("2005",fluxfiles)]
clim.NEE = clim.doy = NULL
for(f in fluxfiles){
ff = read.csv(file.path("data",f),header=TRUE,na.strings="-9999")
ff[ff == -9999] = NA
clim.NEE = c(clim.NEE,ff$NEE_st_fMDS)
clim.doy = c(clim.doy,ff$DoY)
}
NEE.clim=tapply(clim.NEE,clim.doy,mean,na.rm=TRUE)[1:length(qaqc)]
C = NEE.clim[qaqc]
NEE.clim.fit = lm(O ~ C)
summary(NEE.clim.fit)
stats["RMSE",3] = sqrt(mean((C-O)^2))
stats['Bias',3] = mean(C-O)
stats['cor',3] = cor(C,O)
stats['slope',3] = coef(NEE.clim.fit)[2]
colnames(stats)[3] <- "clim"
knitr::kable(stats)
plot(C,O,pch=".",xlab="climatology",ylab='observed',main='NEE (umol/m2/sec)')
abline(0,1,col=2,lwd=2)
abline(NEE.clim.fit,col=3,lwd=3,lty=2)
legend("bottomright",legend=c('obs','1:1','reg'),col=1:3,lwd=3)
## example cycle
plot(L4$DoY,-L4$NEE_st_fMDS,xlim=c(200,210),type='l',lwd=2,ylim=c(-10,20),xlab="Day of Year",ylab="NEE")
lines(L4$DoY,-NEE.clim,col=4,lwd=2,lty=2)
lines(L4$DoY,-NEE.ens,col=2,lwd=2,lty=2)
lines(L4$DoY,-NEE.pf,col=3,lwd=2,lty=2)
legend("topright",legend=c("Obs","clim", "ens", "pf"),lty=1:2,col=c(1,4,2,3),lwd=2)
```
**Question 4:** How does the process model perform relative to the average flux data?
Which statistics showed the largest differences between the model and climatology?
Both the ensemble and the particle filter are process models. To assess the performance of a process-based model, we compare it against a simple statistical model predicting under the same conditions. The climatology has no representation of ecological processes, yet it outperforms both the ensemble and the PF; it serves as a valuable benchmark against which process-based models can be evaluated, and it is not unusual for an ecosystem model to fail to beat this climatological null model, as seen here. The bias of the climatology against the observations was very small (0.19) compared to the ensemble (1.27) and the particle filter (1.66), and its RMSE was the lowest (3.22) of the three models. All models are comparable in terms of correlation and slope.
The climatology performs well against the average flux data, although it sometimes underestimates peak NEE, and NEE sometimes drops off more abruptly in the observations than in the climatology. The particle filter also underestimates the observed peak NEE, while the ensemble seems to capture the peaks better.
Taylor diagram
--------------
Next, let's use a Taylor diagram to pull our summary statistics together into one plot. One advantage of the Taylor diagram is that it makes it simpler to visually diagnose relative differences in model performance, especially when comparing multiple models or different versions of the same model. In the figure below we'll begin by plotting the ensemble, the particle filter, and the climatology. While not common, the Taylor diagram also provides a way of expressing model and data uncertainty by plotting ensemble estimates of both. Below we add all the members of the model ensemble, as well as a Monte Carlo estimate of observation error in the flux data. The latter is derived from Richardson et al. (2006), who showed that eddy covariance data has a non-symmetric, heteroskedastic Laplace distribution. The non-symmetric part refers to the fact that there is greater error in positive fluxes (= respiration, typically nocturnal measurements) than in negative ones.
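Concretely, the observation error model used below draws each pseudo-observation from a Laplace distribution centered on the observed flux,

$$p(\epsilon) = \frac{1}{2\beta}\exp\left(-\frac{|\epsilon|}{\beta}\right), \qquad \beta = \begin{cases} 0.62 + 0.63\,\mathrm{NEE} & \mathrm{NEE} > 0 \\ 1.42 - 0.19\,\mathrm{NEE} & \mathrm{NEE} \le 0 \end{cases}$$

with the $\beta$ coefficients taken from Richardson et al. (2006): the scale $\beta$ grows with the magnitude of the flux (heteroskedasticity), and grows faster for positive fluxes (asymmetry).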
```{r}
## Taylor diagrams
## radial distance = (normalized) standard deviation; angular position = correlation;
## distance from the reference point on the x-axis = centered RMSE
taylor.diagram(ref=O,model=E,normalize=TRUE,ref.sd=TRUE)
taylor.diagram(ref=O,model=P,add=TRUE,normalize=TRUE,col=3)
taylor.diagram(ref=O,model=C,add=TRUE,normalize=TRUE,col=4)
## add full ensemble
for(i in 1:ncol(output)){
taylor.diagram(ref=O,model=-output.ensemble[qaqc,i,5],col=2,pch=".",add=TRUE,normalize=TRUE)
}
## add data uncertainty
rlaplace = function(n,mu,b){
  ## b is the Laplace scale parameter, so the exponential rate is 1/b
  return(mu + ifelse(rbinom(n,1,0.5),1,-1)*rexp(n,1/b))
}
beta = ifelse(O > 0,0.62+0.63*O,1.42-0.19*O) ## heteroskedastic scale; parameters from Richardson et al 2006
for(i in 1:200){
x = rlaplace(length(O),O,beta)
taylor.diagram(ref=O,model=x,col=5,add=TRUE,normalize=TRUE)
}
legend("topright",legend=c("ens","PF","clim","obsUncert"),col=2:5,pch=20,cex=0.7)
```
**Question 5:** What did you learn about model performance from the Taylor diagram?
The Taylor diagram shows three diagnostic statistics in one place: centered RMSE, correlation, and the ratio of model variability to observed variability. The data are plotted at (1, 0), and the best model is the one closest to that point. In this case the best model is the climatology: it is closest to the data, has one of the highest correlations, a low RMSE, and variability most similar to the observations. The particle filter has a similar correlation but a lower standard deviation ratio, i.e. it underpredicts the observed variability.
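The geometry of the diagram follows from the law-of-cosines identity relating the centered RMSE $E'$, the model and observation standard deviations, and the correlation $\rho$:

$$E'^2 = \sigma_M^2 + \sigma_O^2 - 2\,\sigma_M\,\sigma_O\,\rho$$

which is why, in normalized units, distance from the reference point at (1, 0) corresponds to centered RMSE, the radius to the standard deviation ratio, and the angle to the correlation.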
**Question 6:** How do our simple models and flux climatology compare to the ensemble of ecosystem models in Figure 7 of Schwalm et al. 2010, "A model-data intercomparison of CO2 exchange across North America: results from the North American Carbon Program site synthesis", J. Geophys. Res.?
Our results seem comparable to the ensemble of ecosystem models in Figure 7. Some models in Schwalm et al. 2010 have correlations between 0.8 and 0.95; ours have correlations close to or below 0.8, so our models perform better than some ecosystem models and worse than others. They compare well to models G, W, J, N, M, K, P, and S in Schwalm et al., which have similar correlations, RMSE between 0.5 and 1 (but closer to 0.5), and standard deviation ratios close to, but below, 1, like ours. N, M, and K seem to perform better than our models because they have variability similar to the observations, higher correlations, and RMSE closer to 0.5.
Time-scales
-----------
Many ecological processes operate at multiple time scales. For example, carbon flux data responds to the diurnal cycle of light and temperature, meso-scale variability due to weather fronts, seasonal variability, and inter-annual variability driven by longer-term climate modes, as well as disturbance and succession.
In the next section we look at the average diurnal cycle of the data and models.
```{r}
## diurnal cycle
NEE.ens.diurnal = tapply(E,L4$Hour[qaqc],mean)
NEE.pf.diurnal = tapply(P,L4$Hour[qaqc],mean)
NEE.clim.diurnal = tapply(C,L4$Hour[qaqc],mean)
NEE.obs.diurnal = tapply(O,L4$Hour[qaqc],mean)
ylim=range(c(NEE.ens.diurnal,NEE.pf.diurnal,NEE.clim.diurnal,NEE.obs.diurnal))
tod = sort(unique(L4$Hour))
plot(tod,NEE.ens.diurnal,ylim=ylim,col=2,xlab="Time of Day",ylab='NEE',main="Diurnal Cycle",type='l',lwd=3)
lines(tod,NEE.pf.diurnal,col=3,lwd=3)
lines(tod,NEE.clim.diurnal,col=4,lwd=3)
lines(tod,NEE.obs.diurnal,lwd=3)
legend("bottomright",legend=c("obs","ens","PF","clim"),col=1:4,pch=20,cex=0.75)
```
**Question 7:** What time of day has the largest uncertainty? What does this suggest about what parameter(s) needs to be modified in the model, in what direction, and by approximately how much? In providing this answer, recall the structure of the model as well as the fact that the particle filter has assimilated LAI so we can assume that that term is unbiased for that case.
The largest uncertainty is between 8 am and 3 pm, reflecting the diurnal response of the carbon flux to light and temperature. As soon as the sun rises, the models disagree on the magnitude of CO2 uptake by the vegetation; during this window the model predictions have the largest absolute differences from each other, from the observations, and from the climatology. Our model underestimates the magnitude of daytime NEE, i.e. it underpredicts GPP. Given the assumption that LAI is unbiased and the fact that PAR and temperature are inputs, this suggests that the parameter to modify is the light use efficiency: GPP is calculated from the alpha parameter (LUE) together with LAI and PAR, so alpha has to be higher. Since the difference between the observed and predicted NEE is about 2 umol/m2/sec, I would need to increase the light use efficiency by about 40%, from approximately 0.02 to 0.028.
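As a rough back-of-envelope check on that 40% figure, here is a minimal sketch. It assumes (as in the SSEM) that GPP is linear in alpha, that respiration is unbiased, and takes 10:00-14:00 as "midday"; the window choice and variable names are mine, not from the lab.

```{r}
## hedged sketch: implied LUE scaling from the midday NEE gap
## (assumes GPP scales linearly with alpha and respiration is unbiased)
hrs = L4$Hour[qaqc]
midday = hrs >= 10 & hrs <= 14                ## assumed midday window
gap = mean(O[midday] - P[midday], na.rm=TRUE) ## observed minus predicted NEE (negative if uptake is underestimated)
uptake = -mean(P[midday], na.rm=TRUE)         ## predicted midday uptake (NEP)
c(midday.gap = gap, alpha.scaling = 1 + (-gap)/uptake)
```

If the shortfall in uptake is about 40% of the predicted uptake, the scaling comes out near 1.4, consistent with raising alpha from roughly 0.02 to 0.028.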
The diurnal cycle isn't the only, or necessarily the largest, time scale that the data or the model vary over. Next, let's use a wavelet transform to look at the times and timescales responsible for the most variability in the data, in the model, and in the residuals. Specifically, we'll look at the observations, the ensemble mean, the ensemble residuals, and the "climatology" residuals. In all cases we're using a Morlet wavelet, a fairly standard choice for characterizing sine-like oscillations. In these wavelet plots color intensity indicates power (red = largest), and the period on the y-axis is in units of 30-min observations, so 48 = 1 day and 17520 = 1 year.
```{r}
## wavelet
obs = L4$NEE_st_fMDS                        ## gap-filled NEE
obs[is.na(obs)] = 0                         ## morlet() cannot handle missing values
sel = 1:(2^floor(log2(length(obs))) - 1)    ## truncate to just under a power-of-2 length
wt.o = morlet(obs[sel])
#wt.e = morlet(NEE.ens[sel])
wt.er = morlet(obs[sel]-NEE.ens[sel])
#wt.cr = morlet(obs[sel]-NEE.clim[sel])
wavelet.plot(wt.o,add.sig=FALSE,crn.lab="NEE obs")
#wavelet.plot(wt.e,add.sig=FALSE,crn.lab="NEE ensemble")
wavelet.plot(wt.er,add.sig=FALSE,crn.lab="NEE model error")
#wavelet.plot(wt.cr,add.sig=FALSE,crn.lab="NEE clim error")
```
**Question 8:** What time scales dominate the data? What time scales dominate the model residuals?
The wavelet plots quantify how much of the observed variability in the data (NEE obs) and in the residuals (NEE model error) is associated with a given timescale. This method can also detect changes in the importance of different time scales through time.
Daily time scales dominate the data, as seen by the high power values at a period of roughly 48; red represents regions of high variability. Since we have only one year of data, we can't say anything meaningful about periodicity at the annual scale. High power at the daily scale appears mostly in the summer, which makes sense because the fluxes are larger then and more processes are active. There is a seasonal effect, since the daily time scale does not dominate the signal year-round.
Besides the summer, there is also a daily signal in the fall, when the NEE observations above show an increase in the flux from the biosphere to the atmosphere, probably due to some of the vegetation drying out. In addition, there are other areas of concentrated power at longer periods, which could be due to synoptic weather events; these disappear when we look at the error, indicating that the model accounts for these events.
Looking at the error plot, the model explains the daily signal in summer but cannot explain the daily signal in the fall, seen as high power at the daily period in late summer/fall. This is probably due to processes not captured in the simple model, such as soil moisture limitation causing the vegetation to dry out: unexplained daily-time-scale processes dominate the residuals in the fall.
Mining the Residuals
--------------------
In the final section we'll use a few off-the-shelf data mining approaches to look at the model residuals and ask what parts of our input space are associated with the largest model error. Note that we are not limited to examining the effects of the model inputs; we might also look at potential drivers that are not included in our model, such as soil moisture, to ask whether model error is associated with our failure to include this (or other) drivers. Alternatively, we could look at other factors such as the time of day, or even other model variables (e.g. is model error higher when LAI is large or small?).
Of the many algorithms out there we'll look at two: the Classification and Regression Tree (CART) model and the Random Forest model. For both we'll define our error metric as $(E-O)/\beta$, where $\beta$ is the scale parameter of the Laplace distribution. Specifically, we're using the heteroskedastic observation error to reweight the residuals, to account for the fact that large residuals at times of high flux are likely due to high measurement error. The weighted errors can thus be interpreted as roughly the number of standard deviations.
The CART model is a classification algorithm that builds a tree discretely classifying when the model has high and low error.
The Random Forest model is more like a response surface. It generates 'partial dependence' plots, which indicate the effect of each factor across its range, as well as an overall estimate of the importance of each factor in explaining the model error.
The key thing to remember in all these plots is that we're modeling the RESIDUALS in order to diagnose errors, not modeling the NEE itself.
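Since a Laplace distribution with scale $\beta$ has variance $2\beta^2$ (standard deviation $\sqrt{2}\,\beta$), the weighted residual

$$e_t = \frac{E_t - O_t}{\beta_t}$$

is within a factor of $\sqrt{2}$ of a true z-score, which is close enough for ranking where the model errors are large relative to what measurement error alone could explain.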
```{r}
## define error metric and dependent variables
err = (E-O)/beta
x = cbind(inputs$PAR[qaqc],inputs$temp[qaqc])
colnames(x) = c("PAR","temp")
smp = sample.int(length(err),1000) ## take a sample of the data since some alg. are slow
### Classification tree
rpb = rpart(err ~ x)   ## bias: where is the model systematically high or low?
plot(rpb)
text(rpb)
e2 = err^2
rpe = rpart(e2 ~ x)    ## squared error: where is the model least precise?
plot(rpe)
text(rpe)
## Random Forest
rfe = randomForest(x[smp,],abs(err[smp]))
rfe$importance                   ## overall variable importance
partialPlot(rfe,x[smp,],"PAR")   ## partial dependence of |error| on PAR
partialPlot(rfe,x[smp,],"temp")  ## partial dependence of |error| on temperature
```
**Question 9:** Overall, which driver is most important in explaining model error? What conditions are most associated with model success? With model failure? Where do these results reinforce conclusions we reached earlier and where do they shine light on new patterns you may have missed earlier?
These two methods identify conditions under which model error is notably higher or lower than average. The plots indicate that the error is not constant but depends on PAR and temperature, and both drivers have variable influence across their respective ranges. The partial dependence on temperature is higher above 20°C and below 0°C, while the partial dependence on PAR varies over its range, with values around 800 and 1800 showing higher partial dependence.
From the classification tree we can determine how light (PAR) and air temperature affect the squared error between the model and the data. Starting at the top: when temperature is above 23.25°C and PAR is greater than 1818, we have the highest squared error; the largest error (14.4) occurs when temperature exceeds 25.7°C. These are conditions of model failure. The error is much smaller (0.9091) under low temperature and low PAR, the conditions associated with model success. This is consistent with our earlier results in showing that the models have the highest error during the daytime and/or in summer, under high light and high temperature. Mining the residuals also let us look beyond the model inputs, and suggests that the most important driver of model error is temperature.
Functional Responses
--------------------
In this section we look at how well the model performed by assessing the modeled relationships between inputs and outputs and comparing them to the same relationships in the data. The raw relationships are very noisy, since many covariates are changing beyond just the single input variable we are evaluating, so we also calculate binned means for both the model and the data.
```{r}
## raw
plot(inputs$temp[qaqc],O,pch=".",ylab="NEE")
points(inputs$temp[qaqc],E,pch=".",col=2)
## binned
nbin= 25
Tair = inputs$temp[qaqc]
xd = seq(min(Tair),max(Tair),length=nbin)
xmid = xd[-length(xd)] + diff(xd)/2   ## bin midpoints
bin = cut(Tair,xd)
Obar = tapply(O,bin,mean,na.rm=TRUE)
Ose = tapply(O,bin,std.error,na.rm=TRUE)
Ebar = tapply(E,bin,mean,na.rm=TRUE)
Ese = tapply(E,bin,std.error,na.rm=TRUE)
OCI = -cbind(Obar-1.96*Ose,Obar,Obar+1.96*Ose)
ECI = -cbind(Ebar-1.96*Ese,Ebar,Ebar+1.96*Ese)
rng = range(rbind(OCI,ECI))
col2=col.alpha("darkgrey",0.9)
col1=col.alpha("lightgrey",0.6)
plot(xmid,Obar,ylim=rng,type='n',xlab="Air Temperature (C)",ylab="NEP (umol/m2/s)",cex.lab=1.3)
ciEnvelope(xmid,ECI[,1],ECI[,3],col=col2)
lines(xmid,ECI[,2],col="white",lwd=4)
ciEnvelope(xmid,OCI[,1],OCI[,3],col=col1)
lines(xmid,OCI[,2],col="lightgrey",lwd=4)
legend("bottom",legend=c("Model","Data"),lwd=10,col=c(col2,col1),lty=1,cex=1.7)
# Repeat for PAR
#################
nbin= 25
PAR = inputs$PAR[qaqc]
xp = seq(min(PAR),max(PAR),length=nbin)
xmidp = xp[-length(xp)] + diff(xp)/2  ## bin midpoints
binp = cut(PAR,xp)
Obar = tapply(O,binp,mean,na.rm=TRUE)
Ose = tapply(O,binp,std.error,na.rm=TRUE)
Ebar = tapply(E,binp,mean,na.rm=TRUE)
Ese = tapply(E,binp,std.error,na.rm=TRUE)
OCI = -cbind(Obar-1.96*Ose,Obar,Obar+1.96*Ose)
ECI = -cbind(Ebar-1.96*Ese,Ebar,Ebar+1.96*Ese)
rng = range(rbind(OCI,ECI))
plot(xmidp,Obar,ylim=rng,type='n',xlab="PAR",ylab="NEP (umol/m2/s)",cex.lab=1.3)
ciEnvelope(xmidp,ECI[,1],ECI[,3],col=col2)
lines(xmidp,ECI[,2],col="white",lwd=4)
ciEnvelope(xmidp,OCI[,1],OCI[,3],col=col1)
lines(xmidp,OCI[,2],col="lightgrey",lwd=4)
legend("topleft",legend=c("Model","Data"),lwd=10,col=c(col2,col1),lty=1,cex=1.7)
```
**Question 10:** Evaluate the model's ability to capture functional responses to both Temperature and PAR.
The functional response plot of modeled and observed NEP against air temperature shows a mismatch between the model and the data: according to the model, the optimal temperature for NEP is between 20°C and 30°C, whereas according to the data it is between 5°C and 20°C. The model is not capturing this functional response well.
The functional response plot for PAR likewise reveals that the model does not capture the response of NEP to PAR very well. The data show a sharp increase in NEP at very high PAR values (above 2000) that is not captured by the model.
Overall
-------
Below is a final summary figure of the model's performance on a daily timescale that combines many of the previous assessments.
```{r}
### other summary figures to go in multi-panel
par(mfrow=c(2,2))
## Time-series visualization, daily means
DoY = floor(L4$DoY-0.02)
uDoY = sort(unique(DoY))
i=5   ## NEP
ci.pf = apply(apply(output[,,i],2,tapply,DoY,mean),1,mean)   ## daily ensemble-mean NEP
NEE = -L4$NEE_st_fMDS   ## flip sign so the observations are also NEP
NEEd = tapply(NEE,DoY,mean)
plot(uDoY,ci.pf,xlab="time",ylab=units[i],type='l',ylim=range(c(ci.pf,NEEd)),cex.lab=1.3)
points(uDoY,NEEd,col=2,pch="+")
legend("topright",legend=c("Model","Data"),lty=c(1,NA),pch=c(NA,"+"),col=1:2,cex=1.3)
## predicted vs observed
plot(NEEd,ci.pf,xlab="Data",ylab="Model",cex.lab=1.3)
abline(0,1,lty=2,lwd=4)
abline(lm(ci.pf ~ NEEd),col=2,lwd=3,lty=3)
legend("topleft",legend=c("1:1","Reg"),lty=2:3,lwd=4,col=1:2,cex=1.3)
## Functional response
plot(xmid,Obar,ylim=rng,type='n',xlab="Air Temperature (C)",ylab="NEP (umol/m2/s)",cex.lab=1.3)
ciEnvelope(xmid,ECI[,1],ECI[,3],col=col2)
lines(xmid,ECI[,2],col="white",lwd=4)
ciEnvelope(xmid,OCI[,1],OCI[,3],col=col1)
lines(xmid,OCI[,2],col="lightgrey",lwd=4)
legend("bottom",legend=c("Model","Data"),lwd=10,col=c(col2,col1),lty=1,cex=1.3)
### Classification tree
par(mar=c(0,0,0,0))
rpe = rpart(e2 ~ PAR+temp,as.data.frame(x),method="anova") ## sq error
plot(rpe,margin=0.1)
text(rpe,cex=1.5)
```