-
Notifications
You must be signed in to change notification settings - Fork 19
/
02-spatial-operations.Rmd
326 lines (214 loc) · 12.9 KB
/
02-spatial-operations.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
```{r, echo=FALSE, purl=FALSE, message=FALSE}
knitr::opts_chunk$set(results='hide', comment = "#>", purl = FALSE)
## libraries needed for R code examples
library(sf)
library(terra)
library(tidyverse)
philly_sf <- st_read("data/Philly/")
HARV <- rast("data/HARV_RGB_Ortho.tif")
philly_homicides_sf <- st_read("data/PhillyHomicides/")
```
# Spatial data manipulation in R {#spatialops}
> Learning Objectives
>
> * Join attribute data to a polygon vector file
> * Reproject a vector file
> * Select polygons of a vector by location
> * Reproject a raster
> * Perform a raster calculation
------------
There are a wide variety of spatial, topological, and attribute data operations you can perform with R. [Lovelace et al's recent publication](https://r.geocompx.org/)[^11] goes into great depth about this and is highly recommended.
[^11]: Lovelace, R., Nowosad, J., & Muenchow, J. (2024). Geocomputation with R. CRC Press.
In this section we will look at a few examples for libraries and commands that allow us to process spatial data in R and perform a few commonly used operations.
## Attribute Join
An attribute join on vector data brings tabular data into a geographic context. It refers to the process of joining data in tabular format to data in a format that holds the geometries (polygon, line, or point).
If you have done attribute joins of shapefiles in GIS software like _ArcGIS_ or _QGis_ you know that you need a __unique identifier__ in both the attribute table of the shapefile and the table to be joined.
First we will load the CSV table `PhiladelphiaEduAttain.csv` into a dataframe in R and name it `ph_edu`.
```{r load-edu-csv}
ph_edu <- read_csv("data/PhiladelphiaEduAttain.csv")
names(ph_edu)
```
If you don't have the object still loaded read the the `PhillyTotalPopHHinc` shapefile into an object named `philly_sf`. Check out the column names of `philly_sf` and of `ph_edu` to determine which one might contain the unique identifier for the join.
```{r load-philly-sf}
# if you need to read in again:
# philly_sf <- st_read("data/Philly/")
names(philly_sf)
```
To join the `ph_edu` data frame with `philly_sf` we can use `merge` like this:
```{r sf-attr-merge, results='show'}
philly_sf_merged <- left_join(philly_sf, ph_edu, by = c("GEOID10" = "GEOID"))
names(philly_sf_merged)
```
We see the new attribute columns added, as well as the geometry column.
## Topological Subsetting: Select Polygons by Location
For the next example our goal is to select all Philadelphia census tracts that are approximately within a range of 2 kilometers from the city center.
> Think about this for a moment -- what might be the steps you'd follow?
```{r eval=FALSE}
## How about:
# 1. Get the census tract polygons.
# 2. Find the Philadelphia city center coordinates.
# 3. Create a buffer around the city center point.
# 4. Select all census tract polygons that intersect with the center buffer
```
We will use `philly_sf` for the census tract polygons.
In addition, we need to create a `sf` Point object with the Philadelphia city center coordinates:
$$x = 1750160$$
$$y = 467499.9$$
These coordinates are also in the _USA Contiguous Albers Equal Area Conic_ projected CRS, which is the same as CRS as `philly_sf`.
With this information, we create a object that holds the coordinates of the city center. Since we don’t have attributes we will just create it as a simple feature collection, `scf`.
```{r sf-intersect-point, results='show'}
# if you need to read in again:
# philly_sf <- st_read("data/Philly/", quiet = T)
# make a simple feature point with CRS
philly_ctr <-
st_point(c(1750160, 467499.9)) %>% # point coordinates
st_sfc(crs = st_crs(philly_sf)) # create feature collection
```
For the spatial operations we can recur to the suite of geometric operations that come with the `sf` package.
We create a 2km buffer around the city center point:
```{r sf-buffer-point, results='show'}
philly_buf <- st_buffer(philly_ctr, 2000)
```
Ok. Now we can use that buffer to select all census tract polygons that intersect with the center buffer. The `sf` package has its own `st_filter` function that we can use here.
```{r sf-intersects-subset, results='show'}
philly_sel <- st_filter(philly_sf, philly_buf)
```
Then we can plot what we created like this:
```{r sf-intersects-plot, results='show'}
plot(st_geometry(philly_sf), border="#aaaaaa", main="Census tracts that overlap with 2km buffer\naround city center")
plot(st_geometry(philly_sel), add=T, col="red")
plot(st_geometry(philly_buf), add=T, lwd = 2)
```
Note the difference to `st_intersection`, which performs an actual clippin operation and creates an `sfc` object that cuts out the area of the buffer from the census polygons:
```{r sf-intersection, results='show'}
philly_intersection <- st_intersection(philly_buf, philly_sf)
philly_intersection
plot(st_geometry(philly_sf), border="#aaaaaa", main="Census tracts around city center,\nclipped by 2km buffer ")
plot(philly_intersection, add=T, lwd = 2, border = "red")
```
## Reprojecting
Occasionally you may have to change the coordinates of your spatial object into a new Coordinate Reference System (CRS). Functions to transform, or _reproject_ spatial objects typically take the following two arguments:
* the spatial object to reproject
* a CRS object with the new projection definition
You can reproject
* a `sf` object with `st_transform()`
* a `SpatRaster` object with `project()`
The perhaps trickiest part here is to determine the definition of the projection, which needs to be a character string in [proj4](http://trac.osgeo.org/proj/) format. You can [look it up online](http://www.spatialreference.org). For example for [UTM zone 33N (EPSG:32633)](http://spatialreference.org/ref/epsg/wgs-84-utm-zone-33n/) the string would be:
[`+proj=utm +zone=33 +ellps=WGS84 +datum=WGS84 +units=m +no_defs`](http://spatialreference.org/ref/epsg/wgs-84-utm-zone-33n/proj4js/)
You can retrieve the CRS:
- from an `sf` object with `st_crs()`
- from a `SpatRaster` object with `crs()`
Let us go back to the `"PhillyHomicides"` shapefile we exported earlier. Let's read it back in and reproject it so it matches the projection of the Philadelphia Census tracts.
Now let us check the CRS for both files.
```{r vector-check-proj-sf, results='show'}
#If you need to read the file back in:
#philly_homicides_sf <- st_read("data/PhillyHomicides/")
st_crs(philly_sf)$proj4string
st_crs(philly_homicides_sf)$proj4string
```
We see that the CRS are different: we have `+proj=aea...` and `+proj=longlat...`. AEA refers to *USA Contiguous Albers Equal Area Conic* which is a projected coordinate system with numeric units. We will need this below for our spatial operations, so we will make sure both files are in that same CRS.
We use `st_transform` and assign the result to a new object. Note how we also use `str_crs` to extract the projection definition from `philly_sf`, so we don't have to type it out.
```{r vector-reproject-sf, results='show'}
philly_homicides_sf_aea <- st_transform(philly_homicides_sf, st_crs(philly_sf))
```
We can use the `range()` command from the R base package to compare the coordinates before and after reprojection and confirm that we actually have transformed them. `range()` returns the _min_ and _max_ value of a vector of numbers.
```{r, compare-coords-range-sf, results='show'}
range(st_coordinates(philly_homicides_sf))
range(st_coordinates(philly_homicides_sf_aea))
```
We can also compare them visually with:
```{r compare-reproj-plots-sf, results='show'}
par(mfrow=c(1,2))
plot(st_geometry(philly_homicides_sf), axes=TRUE, main = "before transform - latlon")
plot(st_geometry(philly_homicides_sf_aea), axes=TRUE, main = "after transform - aea")
```
Lastly, let us save the reprojected file as `PhillyHomicides_aea` shapefile, as we will use it later on.
```{r write-reproj-sf, eval=FALSE}
st_write(philly_homicides_sf_aea, "data/PhillyHomicides_aea", driver = "ESRI Shapefile")
```
### Raster reprojection
Here is what it would look like to reproject the HARV raster used earlier to a WGS84 projection. We see that see that the original projection is in UTM.
```{r raster-reproject, tidy=FALSE, warning=FALSE, results='show'}
# if you need to load again:
#HARV <- raster("data/HARV_RGB_Ortho.tif")
crs(HARV, proj = TRUE)
HARV_WGS84 <- project(HARV, "+init=epsg:4326")
crs(HARV_WGS84, proj = TRUE)
```
Let's look at the coordinates to see the effect:
```{r raster-reproject-ext, tidy=FALSE, warning=FALSE, results='show'}
ext(HARV)
ext(HARV_WGS84)
```
Due to the reprojection the number of cells has also changed:
```{r raster-reproject-cells, tidy=FALSE, warning=FALSE, results='show'}
ncell(HARV)
ncell(HARV_WGS84)
```
And here is the visual proof:
```{r raster-reproject-plot1, tidy=FALSE, warning=FALSE, results='show'}
plot(HARV, main = "before transform - UTM")
```
```{r raster-reproject-plot2, tidy=FALSE, warning=FALSE, results='show'}
plot(HARV_WGS84, main = "after transform - WGS84")
```
## Spatial Aggregation: Points in Polygons
Now that we have both homicides and census tracts in the same projection we will forge ahead and ask for the density of homicides for **each census tract** in Philadelphia: $\frac{{homicides}}{area}$
To achieve this this we join the points of homicide incidence to the census tract polygon and count them up for each polygon. You might be familiar with this operation from other GIS packages.
We will use piping and build up our object in the following way. First we calculate the area for each tract. We use the `st_area` function on the geometry column and add the result.
```{r sf-hom-area, eval=FALSE}
philly_sf %>%
mutate(tract_area = st_area(geometry)) %>%
head()
```
Next, we use st_join to perform a spatial join with the points:
```{r sf-hom-join, eval=FALSE}
philly_sf %>%
mutate(tract_area = st_area(geometry)) %>%
st_join(philly_homicides_sf_aea) %>%
head()
```
Now we can group by a variable that uiquely identifies the census tracts, (we choose _GEOID10_) and use `summarize` to count the points for each tract and calculate the homicide rate. Since our units are in sq meter we multiply by by 1000000 to get sq km. We also need to carry over the area, which I do using `unique`.
We also assign the output to a new object `philly_crimes_sf`.
```{r sf-hom-ratio, results='show'}
philly_crimes_sf <- philly_sf %>%
mutate(tract_area = st_area(geometry)) %>%
st_join(philly_homicides_sf_aea) %>%
group_by(GEOID10) %>%
summarize(n_homic = n(),
tract_area = unique(tract_area),
homic_rate = as.numeric(1e6 * (n_homic/tract_area)))
```
Finally, we write this out for later:
```{r sf-homiciderate-write, eval=FALSE}
st_write(philly_crimes_sf, "data/PhillyCrimerate", driver = "ESRI Shapefile")
```
## Raster calculations with `terra`
We often want to perform calculations on two or more rasters to create a new output raster. For example, if we are interested in mapping the heights of trees across an entire field site, we might want to calculate the difference between the Digital Surface Model (DSM, tops of trees) and the Digital Terrain Model (DTM, ground level). The resulting dataset is referred to as a Canopy Height Model (CHM) and represents the actual height of trees, buildings, etc. with the influence of ground elevation removed.
First let's read in the two datasets.
```{r read-dsm-dtm, results = 'show'}
HARV_DTM <- rast("data/HARV_dtmCrop.tif")
HARV_DSM <- rast("data/HARV_dsmCrop.tif")
```
Now we can subtract the DTM from the DSM to create a Canopy Height Model. It will for each CHM pixel calculate the difference of the respective DTM and DSM pixels.
```{r chm, results = 'show'}
HARV_CHM <- HARV_DSM - HARV_DTM
par(mfrow = c(1, 2))
plot(HARV_CHM)
hist(HARV_CHM)
```
This works fine for the small rasters in this tutorial. However, the calculation above becomes less efficient when computations are more complex or file sizes become large.
Thet `terra` package contains a function called `lapp() `function to make processing more efficient. It takes two or more rasters and applies a function to them. The generic syntax is:
outputRaster <- lapp(x, fun)
where *x* is a `SpatRasterDataset` and *fun* is a custom function for the operation we want to perform.
```{r lapp, results = 'show'}
CHM_ov_HARV <- lapp(sds(list(HARV_DSM, HARV_DTM)), # use sds to create a SpatRasterDataset
fun = function(r1, r2) {
return( r1 - r2)
})
```
As arguments for our `lapp` operation we use the `sds()` function and provide it with the list of rasters that we want to operate on. As custom function we provide the function with two arguments (`r1` and `r1`) that subtracts the second (`r2`) from the first (`r1`) and returns the difference. The output of `lapp` is a `SpatRaster` and we assign it to a new variable `CHM_ov_HARV.`
The two rasters should be the same.
```{r rast-ident, results = 'show'}
identical(HARV_CHM, CHM_ov_HARV)
```