start C02_background

BigelowLab · Dec 12, 2024 · fb62ab2 · fb62ab2
1 parent 95443a2
commit fb62ab2

Showing 14 changed files with 1,125 additions and 111 deletions.
diff --git a/C01_observations.qmd b/C01_observations.qmd
@@ -18,13 +18,18 @@ We need a dataset that covers the same area and time period that the [Brickman d
 
 It is **SO IMPORTANT** to have a really good handle on your data. To get that handle you have to explore it.  There is a branch of data science devoted to data exploration called [Exploratory Data Analysis](https://r4ds.had.co.nz/exploratory-data-analysis.html).  We'll explore your data here, but we assume that you have reviewed and tried your hand with the examples in the wiki for [tabular data](https://github.com/BigelowLab/ColbyForecasting2025/wiki/tables), [observations](https://github.com/BigelowLab/ColbyForecasting2025/wiki/Obis), the [coastlines](https://github.com/BigelowLab/ColbyForecasting2025/wiki/Coastlines) and the [Brickman data](https://github.com/BigelowLab/ColbyForecasting2025/wiki/Brickman).  Even if you have walked through these tutorials you may find yourself stumped and stymied.  That's all part of the learning process - just keep moving, inquiring and trying.
 
-As always we start by running our setup script.  We'll also assign a new variable with our species name - we do that so it's easy to substitute in another species if needed.  We make it ALL CAPS so that it reminds us that it is more like a constant than a variable.
+
+# Setup
+
+As always, we start by running our setup function. Start RStudio/R, and reload your project with the menu `File > Recent Projects`. Then source `setup.R`.  We'll also assign a new variable with our species name - we do that so it's easy to substitute in another species if needed.  We make it ALL CAPS so that it reminds us that it is more like a constant than a variable.
 
 ```{r source_setup, warning = FALSE}
 source("setup.R")
 SPECIES = "Mola mola"
 ```
 
+# Observations
+
 Next is to read in the observations you have already downloaded for that species.
 
 ```{r read_obs}
@@ -77,7 +82,7 @@ Next let's think about what our minimum requirements might be in oirder to build
 summary(obs)
 ```
 
-## eventDate
+## `eventDate`
 
 For *Mola mola* there are some rows where `eventDate` is `NA`.  We need to filter those out. The filter function looks for a vector of TRUE/FALSE values - one for each row.  In our case, we test the `eventDate` column to see if it is `NA`, but then we reverse the TRUE/FALSE logical with the preceding `!` (pronounced "bang!"). This way we retain only the rows where `eventDate` is not `NA`, and then we print the summary again.
 
@@ -116,7 +121,7 @@ obs |>
 
 OK, that seems legitimate. And it is possible: *Mola mola* can congregate for feeding, mating and possibly for karaoke parties.
 
-## Year
+## `year`
 
 We know that the Brickman model data define the "current" climate scenario as the 1982-2013 window.  It's just an average, and if you have values from 1970 to the current year, you are probably safe in including them.  But do your observations fall into those years?  Let's make a plot of the counts per year, with dashed lines showing the Brickman "current" climatology period.
 
@@ -204,7 +209,7 @@ dropped_records = dim_start[1] - dim_end[1]
 dropped_records
 ```
 
-So, we dropped `{r} dropped_records` records which is about `{r} sprintf("%0.1f%%", dropped_records/dim_start[1] * 100)` of the raw OBIS data.  Is it worth all that to drop just 4% of the data.  Yes. Models are like all things computer... if you put garbage in you should expcet garbage out.
+So, we dropped `{r} dropped_records` records which is about `{r} sprintf("%0.1f%%", dropped_records/dim_start[1] * 100)` of the raw OBIS data.  Is it worth all that to drop just 4% of the data?  **Yes!**  Models are like all things computer... if you put garbage in you should expect to get garbage back out.
 
 # Recap
 

diff --git a/C02_background.qmd b/C02_background.qmd
@@ -0,0 +1,78 @@
+---
+title: "Background"
+format: html
+---
+
+Traditional ecological surveys are systematic: for a given species, survey data sets tell us where the species is found and where it is absent.  Using an observational data set (like [OBIS](https://obis.org)) we only know where the species is found, which leaves us guessing about where it might not be found. This difference is what distinguishes a *presence-absence* data set from a *presence-only* data set, and it guides the modeling process.
+
+When we model, we are trying to define the environs where we should expect to find a species as well as the environs where we would not expect to find it. We have in hand the locations of observations, and we can extract the environmental data at those locations.  But to characterize the less suitable environments we are going to have to sample what is called "background". We want these background samples to roughly match the regional preferences of the observations; that is, we want to avoid having observations that are mostly over Georges Bank while our background samples are primarily around the Bay of Fundy.
+
+# Setup
+
+As always, we start by running our setup function. Start RStudio/R, and reload your project with the menu `File > Recent Projects`.
+
+```{r setup}
+source("setup.R")
+```
+
+We will also need the coastline, the observation data and the Brickman mask.
+
+```{r load_obs_mask}
+coast = read_coastline()
+obs = read_observations(scientificname = "Mola mola")
+db = brickman_database() |>
+  filter(scenario == "STATIC", var == "mask")
+mask = read_brickman(db)
+```
+
+We have two approaches to what happens next.  The first is the naive approach that says: gather together lots of observations and background points.  Lots and lots!  The second approach is much more conservative, as it considers the value (or not!) of having replicate measurements at locations that share the same array cell.
+
+# The naive approach - lots and lots of data
+
+## Observation density map
+
+The first thing we need is a map that matches our environmental data arrays in cell size and extent, but we want the values to be the counts of observations in each cell.  This process is called "rasterizing" - turning points into rasters (arrays).
+
+```{r obs_density_map}
+density = rasterize_point_density(obs, mask, mask = mask)
+density
+```
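+
+To make "rasterizing" concrete, here is a minimal base-R sketch of the idea, not the code behind `rasterize_point_density()`. The toy points and the 2 x 2 grid are made up for illustration; each point is assigned to a cell and the points per cell are tallied.
+
+```r
+# Hypothetical toy data: five points inside a 2 x 2 grid spanning [0, 2] in x and y
+pts <- data.frame(x = c(0.2, 0.4, 1.5, 1.7, 1.9),
+                  y = c(0.3, 1.6, 0.5, 0.6, 1.2))
+
+# Cell edges; each cell is 1 unit wide
+x_edges <- seq(0, 2, by = 1)
+y_edges <- seq(0, 2, by = 1)
+
+# Find the column (x) and row (y) index of the cell that holds each point
+col_id <- findInterval(pts$x, x_edges, rightmost.closed = TRUE)
+row_id <- findInterval(pts$y, y_edges, rightmost.closed = TRUE)
+
+# Tally points per cell; fixed factor levels keep empty cells in the table
+counts <- table(row = factor(row_id, levels = 1:2),
+                col = factor(col_id, levels = 1:2))
+counts
+```
+
+The real density map does the same kind of bookkeeping, but on the grid defined by the Brickman mask.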
+
+```{r plot_obs_density_map}
+nbreaks = 11
+plot(density['count'], 
+     breaks = "equal",
+     nbreaks = nbreaks,
+     col = rev(grey(1:(nbreaks - 1)/nbreaks)),
+     axes = TRUE, 
+     reset = FALSE)
+plot(coast, add = TRUE, col = "orange")
+```
+
+## Sample background 
+
+Next we sample the background as guided by the density map.
+
+```{r sample_background_naive}
+naive_input = sample_background(obs, density, 
+                              n = 2 * nrow(obs),
+                              method = "bias",
+                              class_label = "background",
+                              return_pres = TRUE)
+naive_input
+```
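+
+As a rough mental model (an assumption about the idea, not a description of how `sample_background()` is implemented), bias-guided sampling can be pictured as drawing cells with probability proportional to their counts, skipping cells that have no value. The `count` vector below is made up for illustration.
+
+```r
+set.seed(1)  # make the random draw repeatable
+
+# Hypothetical density values for six cells; NA marks cells with no value
+count <- c(NA, 5, 1, NA, 12, 2)
+
+# Only cells with a value can be drawn
+available <- which(!is.na(count))
+
+# Ask for more cells than exist, then cap the request at what is available
+n_wanted <- 10
+n_drawn <- min(n_wanted, length(available))
+
+# Draw cells without replacement, favoring cells with higher counts
+picked <- sample(available, size = n_drawn, prob = count[available])
+picked
+```
+
+In this sketch a request for 10 cells can only return 4, because only 4 cells have a value to sample from.
+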
+You may encounter a warning message that says, "There are fewer available cells for raster...". This is useful information: there simply weren't many non-NA cells to sample from.  Let's plot this.
+
+```{r plot_naive_input}
+plot(naive_input['class'], axes = TRUE,  pch = ".", extent = density, reset = FALSE)
+plot(coast, col = "orange", add = TRUE)
+```
+
+Hmmm, let's tally the class labels.
+
+```{r tally_naive_input}
+count(naive_input, class)
+```
+Well, that's imbalanced, with three times (3x) as many presences as background points. But, on the bright side, the background points are definitely in the region of the observations.
+
+
diff --git a/_quarto.yaml b/_quarto.yaml
@@ -8,6 +8,7 @@ project:
     - F00_forecasting.qmd
     - C00_coding.qmd
     - C01_observations.qmd
+    - C02_background.qmd
     - about.qmd
 execute: 
   cache: false
@@ -29,6 +30,8 @@ website:
         href: C00_coding.qmd
       - text: Observations
         href: C01_observations.qmd
+      - text: Background
+        href: C02_background.qmd
       - text: About
         href: about.qmd
     tools:

diff --git a/docs/C00_coding.html b/docs/C00_coding.html
@@ -186,6 +186,12 @@
   <a href="./C01_observations.html" class="sidebar-item-text sidebar-link">
  <span class="menu-text">Observations</span></a>
   </div>
+</li>
+        <li class="sidebar-item">
+  <div class="sidebar-item-container"> 
+  <a href="./C02_background.html" class="sidebar-item-text sidebar-link">
+ <span class="menu-text">Background</span></a>
+  </div>
 </li>
         <li class="sidebar-item">
   <div class="sidebar-item-container">