---
title: "Background"
format: html
---

Traditional ecological surveys are systematic: for a given species, the survey data tell us both where the species is found and where it is absent. With an observational data set (like [OBIS](https://obis.org)) we only know where the species has been found, which leaves us guessing about where it might not be found. This difference is what distinguishes a *presence-absence* data set from a *presence-only* data set, and it guides the modeling process.

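To make the distinction concrete, here is a toy sketch in base R with made-up coordinates (not real survey or OBIS records). A presence-absence table carries an explicit 0/1 outcome for every visited site; a presence-only table holds nothing but the locations where the species was seen.

```r
# Made-up survey: every visited site is recorded, so absences (0) are explicit
presence_absence = data.frame(
  lon   = c(-68.5, -67.9, -66.2, -65.8),
  lat   = c( 41.2,  41.6,  44.9,  45.1),
  found = c(1, 1, 0, 0)
)

# Presence-only: just the sighting locations; there is no "absent" class at all
presence_only = presence_absence[presence_absence$found == 1, c("lon", "lat")]

table(presence_absence$found)  # two classes to contrast
nrow(presence_only)            # locations only, nothing to contrast against
```

Background points are our stand-in for that missing second class.
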
When we model, we are trying to define the environs where we should expect to find a species as well as the environs where we would not expect to find it. We have the locations of the observations in hand, and we can extract the environmental data at those locations. But to characterize the less suitable environments we have to sample what is called the "background". We want these background samples to roughly match the regional distribution of the observations; that is, we want to avoid having observations that are mostly over Georges Bank while our background samples are mostly around the Bay of Fundy.

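Extracting environmental data at the observation locations amounts to finding the grid cell that contains each point and reading that cell's value. In the tutorial the helper functions from `setup.R` take care of this; the base R sketch below only illustrates the idea on a small made-up grid, with rows ordered south to north (an assumption of this toy; real rasters are often stored north-up).

```r
# Made-up 3 x 4 grid of an environmental variable (say, temperature)
x_edges = c(-70, -68, -66, -64, -62)   # longitude edges, 4 columns
y_edges = c(40, 42, 44, 46)            # latitude edges, 3 rows
env = matrix(c(10, 11, 12, 13,         # row 1: southernmost band
                9, 10, 11, 12,
                7,  8,  9, 10),        # row 3: northernmost band
             nrow = 3, byrow = TRUE)

# Made-up observation locations
pts = data.frame(lon = c(-69.0, -65.5, -62.5),
                 lat = c( 41.0,  43.0,  45.5))

# findInterval() returns the index of the edge just below each coordinate,
# which is the column (or row) number of the containing cell
icol = findInterval(pts$lon, x_edges)
irow = findInterval(pts$lat, y_edges)

# Index the matrix with (row, col) pairs to pull one value per point
pts$env = env[cbind(irow, icol)]
pts   # env column: 10, 11, 10
```
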
# Setup

As always, we start by running our setup script. Start RStudio/R, and reload your project with the menu `File > Recent Projects`.

```{r setup}
source("setup.R")
```

We will also need the coastline, the Brickman mask, and the observation data.

```{r load_obs_mask}
# coastline, for adding context to maps
coast = read_coastline()
# ocean sunfish (Mola mola) observations
obs = read_observations(scientificname = "Mola mola")
# find the static mask layer in the Brickman database, then read it
db = brickman_database() |>
  filter(scenario == "STATIC", var == "mask")
mask = read_brickman(db)
```

There are two approaches to what happens next. The first is the naive approach: gather together lots of observations and background points. Lots and lots! The second approach is much more conservative, as it considers the value (or not!) of having replicate measurements at locations that share the same array cell.

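One simple way to act on the replicate question (an assumption here, not necessarily what the conservative approach in this tutorial does) is to label each observation with the array cell it falls in and keep only one observation per occupied cell. This base R illustration uses made-up points on a made-up 2-degree grid.

```r
# Made-up observations; several of them share a 2-degree cell
toy_obs = data.frame(lon = c(-69.1, -69.3, -68.9, -65.2, -65.4),
                     lat = c( 41.1,  41.4,  41.8,  44.6,  44.2))
x_edges = seq(-70, -62, by = 2)
y_edges = seq(40, 46, by = 2)

# Label each point with the (column, row) of the cell that contains it
cell = paste(findInterval(toy_obs$lon, x_edges),
             findInterval(toy_obs$lat, y_edges))

# Naive: keep every record. Conservative: keep the first record in each cell.
thinned = toy_obs[!duplicated(cell), ]
nrow(toy_obs)   # 5
nrow(thinned)   # 2
```
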
# The naive approach - lots and lots of data

## Observation density map

The first thing we need is a map that matches our environmental data arrays in cell size and extent, but whose values are the counts of observations in each cell. This process is called "rasterizing": turning points into rasters (arrays).

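Counting points per cell is, at heart, a cross-tabulation of cell indices. The sketch below does this in base R for made-up points on a made-up grid; it only illustrates the idea, and `rasterize_point_density()` from `setup.R` may differ in its details (for example, in how it treats masked or empty cells).

```r
# Made-up observations
toy_obs = data.frame(lon = c(-69.1, -69.3, -68.9, -65.2, -65.4, -63.0),
                     lat = c( 41.1,  41.4,  41.8,  44.6,  44.2,  45.0))
x_edges = seq(-70, -62, by = 2)   # 4 columns
y_edges = seq(40, 46, by = 2)     # 3 rows

# Cell indices as factors with every level present, so empty cells count as 0;
# points outside the grid become NA and are dropped by table()
icol = factor(findInterval(toy_obs$lon, x_edges), levels = 1:4)
irow = factor(findInterval(toy_obs$lat, y_edges), levels = 1:3)

# Rows are latitude bands (south to north), columns are longitude bands
counts = unclass(table(irow, icol))
counts
```
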
```{r obs_density_map}
density = rasterize_point_density(obs, mask, mask = mask)
density
```

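```{r plot_obs_density_map}
nbreaks = 11
# equal-interval breaks; the reversed grey ramp shades low counts light and
# high counts dark (nbreaks - 1 colors for the nbreaks - 1 intervals)
plot(density['count'],
     breaks = "equal",
     nbreaks = nbreaks,
     col = rev(grey(1:(nbreaks - 1)/nbreaks)),
     axes = TRUE,
     reset = FALSE)   # keep the layout so the coastline can be added on top
plot(coast, add = TRUE, col = "orange")
```
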
## Sample background

Next we sample the background, using the density map as a guide so that background points are drawn from the same places as the observations.

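Sampling "as guided by the density map" can be pictured as drawing cells with probability proportional to their observation counts. This base R sketch uses a made-up count grid where `NA` marks cells outside the mask; it is an assumption about the general idea behind `method = "bias"`, not a copy of `sample_background()`.

```r
set.seed(1)
# Made-up 3 x 4 count grid; NA marks cells outside the mask
counts = matrix(c(6, 0, 0,  0,
                  0, 3, 0, NA,
                  0, 0, 1, NA), nrow = 3, byrow = TRUE)

# Only in-mask cells with at least one observation can be drawn
eligible = which(!is.na(counts) & counts > 0)

# Draw cells (with replacement here, to show the weighting) in proportion to
# counts; indexing through sample.int() avoids sample()'s special case when
# only one cell is eligible
draws = eligible[sample.int(length(eligible), size = 10000,
                            replace = TRUE, prob = counts[eligible])]

# Proportions should land near 6/10, 3/10 and 1/10
round(prop.table(table(draws)), 2)
```
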
```{r sample_background_naive}
# request twice as many background points as there are observations,
# weighted by the density map ("bias"), and return the presences too
naive_input = sample_background(obs, density,
                                n = 2 * nrow(obs),
                                method = "bias",
                                class_label = "background",
                                return_pres = TRUE)
naive_input
```

You may encounter a warning message that says "There are fewer available cells for raster...". This is useful information: there simply weren't many non-NA cells to sample from, so fewer background points were drawn than we asked for. Let's plot the result.

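That warning is what you would expect from sampling cells without replacement: each cell can be drawn at most once, so you cannot get more background points than there are eligible cells. The hypothetical helper below (not the tutorial's `sample_background()`) makes that cap explicit and warns when it kicks in.

```r
# Hypothetical helper: weighted sampling of cells without replacement,
# capped at the number of eligible (non-NA, non-zero) cells
sample_cells_weighted = function(counts, n) {
  eligible = which(!is.na(counts) & counts > 0)
  if (n > length(eligible)) {
    warning(sprintf("only %d eligible cells for %d requested samples",
                    length(eligible), n))
    n = length(eligible)
  }
  if (n == 0) return(integer(0))
  # replace = FALSE (the default): each cell appears at most once
  eligible[sample.int(length(eligible), size = n, prob = counts[eligible])]
}

counts = matrix(c(3, 0, NA, 1, 2, 0), nrow = 2)   # 3 eligible cells
picked = sample_cells_weighted(counts, n = 10)    # warns, returns 3 cells
```

The request for 10 cells is trimmed to the 3 that are eligible, which mirrors the shortfall the warning reports.
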
```{r plot_naive_input}
plot(naive_input['class'], axes = TRUE, pch = ".", extent = density, reset = FALSE)
plot(coast, col = "orange", add = TRUE)
```

Hmmm, let's tally the class labels.

```{r tally_naive_input}
count(naive_input, class)
```

Well, that's imbalanced, with three times (3x) more presences than background points. But on the bright side, the background points are definitely in the same region as the observations.

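It is worth working out what the 3:1 ratio implies. Assuming `naive_input` holds every observation as a presence (which is what `return_pres = TRUE` suggests), we asked for 2 background points per presence but received 1/3 of a point per presence, so only about a sixth of the request was filled. A tiny hypothetical helper makes the arithmetic explicit:

```r
# Hypothetical helper: fraction of the requested background actually drawn,
# given the observed presences-per-background ratio and the multiplier used
# in n = multiplier * nrow(obs)
filled_fraction = function(pres_per_bg, multiplier = 2) {
  (1 / pres_per_bg) / multiplier
}

filled_fraction(3)   # 1/6, about 0.167
```
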