---
jupyter: python3
---
# Geographic data in Python {#sec-spatial-class}
```{python .content-visible when-format="pdf"}
#| echo: false
#| include: false
#| error: true
import map_to_png
```
```{python}
#| echo: false
import book_options
```
```{python .content-visible when-format="pdf"}
#| echo: false
import book_options_pdf
```
## Introduction
This chapter outlines two fundamental geographic data models (vector and raster) and introduces Python packages for working with them.
Before demonstrating their implementation in Python, we will introduce the theory behind each data model and the disciplines in which they predominate.
The vector data model (@sec-vector-data) represents geographic entities with points, lines, and polygons.
These have discrete, well-defined borders, meaning that vector datasets usually have a high level of precision (but not necessarily accuracy).
The raster data model (@sec-raster-data), on the other hand, divides the surface up into cells of constant size.
Raster datasets are the basis of background images used in online maps and have been a vital source of geographic data since the origins of aerial photography and satellite-based remote sensing devices.
Rasters aggregate spatially specific features to a given resolution, meaning that they are consistent over space and scalable, with many worldwide raster datasets available.
Which to use?
The answer likely depends on your domain of application, and the datasets you have access to:
- Vector datasets and methods dominate the social sciences because human settlements and processes (e.g., transport infrastructure) tend to have discrete borders
- Raster datasets and methods dominate many environmental sciences because of the reliance on remote sensing data
Python has strong support for both data models.
We will focus on **shapely** and **geopandas** for working with geographic vector data, and **rasterio** for working with rasters.
**shapely** is a 'low-level' package for working with individual vector geometry objects.
**geopandas** is a 'high-level' package for working with geometry columns (`GeoSeries` objects), which internally contain **shapely** geometries, and with vector layers (`GeoDataFrame` objects).
The **geopandas** ecosystem provides a comprehensive approach for working with vector layers in Python, with many packages building on it.
There are several partially overlapping packages for working with raster data, each with its own advantages and disadvantages.
In this book, we focus on the most prominent one: **rasterio**, which represents 'simple' raster datasets with a combination of a **numpy** array, and a metadata object (`dict`) providing geographic metadata such as the coordinate system.
**xarray** is a notable alternative to **rasterio**, not covered in this book, which uses its native `xarray.Dataset` and `xarray.DataArray` classes to effectively represent complex raster datasets, such as NetCDF files with multiple bands and metadata.
There is much overlap in some fields, and raster and vector datasets can be used together: ecologists and demographers, for example, commonly use both vector and raster data.
Furthermore, it is possible to convert between the two forms (see @sec-raster-vector).
Whether your work involves use of vector or raster datasets, it is worth understanding the underlying data models before using them, as discussed in subsequent chapters.
## Vector data {#sec-vector-data}
The geographic vector data model is based on points located within a coordinate reference system (CRS).
Points can represent self-standing features (e.g., the location of a bus stop), or they can be linked together to form more complex geometries such as lines and polygons.
Most point geometries contain only two dimensions (three-dimensional CRSs may contain an additional $z$ value, typically representing height above sea level).
In this system, London, for example, can be represented by the coordinates `(-0.1,51.5)`.
This means that its location is -0.1 degrees east and 51.5 degrees north of the origin.
The origin, in this case, is at 0 degrees longitude (a prime meridian located at Greenwich) and 0 degrees latitude (the Equator) in a geographic ('lon/lat') CRS (@fig-vector-london, left panel).
The same point could also be approximated in a projected CRS with 'Easting/Northing' values of `(530000,180000)` in the British National Grid, meaning that London is located 530 $km$ East and 180 $km$ North of the origin of the CRS (@fig-vector-london, right panel).
The location of the National Grid's origin, in the sea beyond the South West Peninsula, ensures that most locations in the UK have positive Easting and Northing values.
::: {#fig-vector-london layout-ncol=2}
![](images/vector_lonlat.png)
![](images/vector_projected.png)
Illustration of vector (point) data in which location of London (the red X) is represented with reference to an origin (the blue circle).
The left plot represents a geographic CRS with an origin at 0° longitude and latitude.
The right plot represents a projected CRS with an origin located in the sea west of the South West Peninsula.
:::
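To see how these two representations relate in code, the following minimal sketch (using tools formally introduced later in this chapter and in @sec-reproj-geo-data) creates the geographic point and reprojects it to the British National Grid; the EPSG codes `4326` and `27700` identify the two CRSs.
```{python}
#| eval: false
# A minimal sketch: London as a lon/lat point (EPSG:4326), reprojected to the
# British National Grid (EPSG:27700); these tools are introduced later on
import shapely
import geopandas as gpd
lnd_lonlat = gpd.GeoSeries([shapely.Point(-0.1, 51.5)], crs=4326)
lnd_lonlat.to_crs(27700)  # Easting/Northing of roughly (530000, 180000)
```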
There is more to CRSs, as described in @sec-coordinate-reference-systems-intro and @sec-reproj-geo-data but, for the purposes of this section, it is sufficient to know that coordinates consist of two numbers representing the distance from an origin, usually in $x$ and $y$ dimensions.
**geopandas** [@geopandas] provides classes for geographic vector data and a consistent command-line interface for reproducible geographic data analysis in Python.
It also provides an interface to three mature libraries for geocomputation, a strong foundation on which many geographic applications are built:
- GDAL, for reading, writing, and manipulating a wide range of geographic data formats, covered in @sec-read-write
- PROJ, a powerful library for coordinate system transformations, which underlies the content covered in @sec-reproj-geo-data
- GEOS, a planar geometry engine for operations such as calculating buffers and centroids on data with a projected CRS, covered in @sec-geometric-operations
Tight integration with these geographic libraries makes reproducible geocomputation possible: an advantage of using a higher-level language such as Python to access these libraries is that you do not need to know the intricacies of the low-level components, enabling focus on the methods rather than the implementation.
### Vector data classes
The main classes for working with geographic vector data in Python are hierarchical, meaning that the 'vector layer' class is composed of simpler 'geometry column' and individual 'geometry' components.
This section introduces them in order, starting with the highest level class.
For many applications, the vector layer class, a data frame with geometry columns, is all that's needed.
However, it's important to understand the structure of vector geographic objects and their components for some applications and for a deep understanding.
The three main vector geographic data classes in Python are:
- `GeoDataFrame`, a class representing vector layers, with a geometry column (class `GeoSeries`) as one of the columns
- `GeoSeries`, a class that is used to represent the geometry column in `GeoDataFrame` objects
- `shapely` geometry objects, which represent individual geometries, such as a point or a polygon in `GeoSeries` objects
The first two classes (`GeoDataFrame` and `GeoSeries`) are defined in **geopandas**.
The third class is defined in the **shapely** package, which deals with individual geometries, and is a main dependency of the **geopandas** package.
### Vector layers {#sec-vector-layers}
The most commonly used geographic vector data structure is the vector layer.
There are several approaches for working with vector layers in Python, ranging from low-level packages (e.g., **osgeo**, **fiona**) to the relatively high-level **geopandas** package that is the focus of this section.
Before writing and running code for creating and working with geographic vector objects, we need to import **geopandas** (by convention as `gpd` for more concise code) and **shapely**.
```{python}
import pandas as pd
import shapely
import geopandas as gpd
```
We also limit the maximum number of printed rows to six, to save space, using the `'display.max_rows'` option of **pandas**.
```{python}
pd.set_option('display.max_rows', 6)
```
Projects often start by importing an existing vector layer saved as a GeoPackage (`.gpkg`) file, an ESRI Shapefile (`.shp`), or other geographic file format.
The function `gpd.read_file` imports a GeoPackage file named `world.gpkg` located in the `data` directory of Python's working directory into a `GeoDataFrame` named `gdf`.
```{python}
#| echo: false
#| label: getdata
from pathlib import Path
import os
import shutil
# Download and unzip the sample data if the 'data' directory does not exist
data_path = Path('data')
if data_path.is_dir():
    pass
    # print('path exists') # directory exists
else:
    print('Attempting to get and unzip the data')
    import requests, zipfile, io
    r = requests.get('https://github.com/geocompx/geocompy/releases/download/0.1/data.zip')
    z = zipfile.ZipFile(io.BytesIO(r.content))
    z.extractall('.')
data_path = Path('data/cycle_hire_osm.gpkg')
if data_path.is_file():
    pass
    # print('path exists') # file exists
else:
    print('Attempting to move data')
    import requests, zipfile, io
    r = requests.get('https://github.com/geocompx/geocompy/archive/refs/heads/main.zip')
    z = zipfile.ZipFile(io.BytesIO(r.content))
    z.extractall('.')
    shutil.copytree('py-main/data', 'data', dirs_exist_ok=True)
data_path = Path('output')
if data_path.is_dir():
    pass
    # print('path exists') # directory exists
else:
    print('Attempting to move data')
    shutil.copytree('py-main/output', 'output', dirs_exist_ok=True)
```
```{python}
gdf = gpd.read_file('data/world.gpkg')
```
The result is an object of type (class) `GeoDataFrame` with 177 rows (features) and 11 columns, as shown in the output of the following code:
```{python}
#| label: typegdf
type(gdf)
```
```{python}
gdf.shape
```
The `GeoDataFrame` class is an extension of the `DataFrame` class from the popular **pandas** package [@pandas].
This means we can treat non-spatial attributes from a vector layer as a table, and process them using ordinary, i.e., non-spatial, **pandas** methods.
For example, standard data frame subsetting methods can be used.
The code below creates a subset of the `gdf` dataset containing only the country name and the geometry.
```{python}
gdf = gdf[['name_long', 'geometry']]
gdf
```
The following expression creates a subdataset based on a condition, such as equality of the value in the `'name_long'` column to the string `'Egypt'`.
```{python}
gdf[gdf['name_long'] == 'Egypt']
```
Finally, to get a sense of the spatial component of the vector layer, it can be plotted using the `.plot` method (@fig-gdf-plot).
```{python}
#| label: fig-gdf-plot
#| fig-cap: Basic plot of a `GeoDataFrame`
gdf.plot();
```
Interactive maps of `GeoDataFrame` objects can be created with the `.explore` method, as illustrated in @fig-gdf-explore which was created with the following command:
::: {.content-visible when-format="html"}
```{python}
#| label: fig-gdf-explore
#| fig-cap: Basic interactive map with `.explore`
gdf.explore()
```
:::
::: {.content-visible when-format="pdf"}
```{python}
#| eval: false
gdf.explore()
```
```{python}
#| echo: false
#| output: false
#| error: true
map_to_png.map_to_png(gdf.explore(), 'fig-gdf-explore')
```
![Basic interactive map with `.explore`](images/fig-gdf-explore.png){#fig-gdf-explore}
:::
A subset of the data can also be plotted in a similar fashion.
::: {.content-visible when-format="html"}
```{python}
#| label: fig-gdf-explore2
#| fig-cap: Interactive map of a `GeoDataFrame` subset
gdf[gdf['name_long'] == 'Egypt'].explore()
```
:::
::: {.content-visible when-format="pdf"}
```{python}
#| eval: false
gdf[gdf['name_long'] == 'Egypt'].explore()
```
```{python}
#| echo: false
#| output: false
#| error: true
map_to_png.map_to_png(gdf[gdf['name_long'] == 'Egypt'].explore(), 'fig-gdf-explore2')
```
![Interactive map of a `GeoDataFrame` subset](images/fig-gdf-explore2.png){#fig-gdf-explore2}
:::
### Geometry columns {#sec-geometry-columns}
The geometry column of class `GeoSeries` is an essential column in a `GeoDataFrame`.
It contains the geometric part of the vector layer, and is the basis for all spatial operations.
This column can be accessed by name, which typically (e.g., when reading from a file) is `'geometry'`, as in `gdf['geometry']`.
However, the recommendation is to use the fixed `.geometry` property, which refers to the geometry column regardless of whether its name is `'geometry'` or not.
In the case of the `gdf` object, the geometry column contains `'MultiPolygon'`s associated with each country.
```{python}
gdf.geometry
```
The geometry column also contains the spatial reference information, if any (also accessible with the shortcut `gdf.crs`).
```{python}
gdf.geometry.crs
```
Many geometry operations, such as calculating the centroid, buffer, or bounding box of each feature, involve just the geometry.
Applying this type of operation on a `GeoDataFrame` is therefore basically a shortcut to applying it on the `GeoSeries` object in the geometry column.
For example, the two following commands return exactly the same result, a `GeoSeries` containing bounding box polygons (using the `.envelope` method).
```{python}
gdf.envelope
```
```{python}
gdf.geometry.envelope
```
Note that `.envelope`, and other similar operators in **geopandas** such as `.centroid` (@sec-centroids), `.buffer` (@sec-buffers), or `.convex_hull`, return only the geometry (i.e., a `GeoSeries`), not a `GeoDataFrame` with the original attribute data.
In case we want the latter, we can create a copy of the `GeoDataFrame` and then 'overwrite' its geometry (or, we can overwrite the geometries directly in case we do not need the original ones, as in `gdf.geometry=gdf.envelope`).
```{python}
gdf2 = gdf.copy()
gdf2.geometry = gdf.envelope
gdf2
```
Another useful property of the geometry column is the geometry type, as shown in the following code.
Note that the types of geometries contained in a geometry column (and, thus, a vector layer) are not necessarily the same for every row.
It is possible to have multiple geometry types in a single `GeoSeries`.
Accordingly, the `.type` property returns a `Series` (with values of type `str`, i.e., strings), rather than a single value (the same can be done with the shortcut `gdf.geom_type`).
```{python}
gdf.geometry.type
```
To summarize the occurrence of different geometry types in a geometry column, we can use the **pandas** `.value_counts` method.
In this case, we see that the `gdf` layer contains only `'MultiPolygon'` geometries.
```{python}
gdf.geometry.type.value_counts()
```
A `GeoDataFrame` can also have multiple `GeoSeries` columns, as demonstrated in the following code section.
```{python}
#| output: false
gdf['bbox'] = gdf.envelope
gdf['polygon'] = gdf.geometry
gdf
```
Only one geometry column at a time is 'active', in the sense that it is being accessed in operations involving the geometries (such as `.centroid`, `.crs`, etc.).
To switch the active geometry column from one `GeoSeries` column to another, we use `.set_geometry`.
@fig-switch-to-centroids and @fig-switch-to-polygons show interactive maps of the `gdf` layer with the `'bbox'` and `'polygon'` geometry columns activated, respectively.
::: {.content-visible when-format="html"}
```{python}
#| label: fig-switch-to-centroids
#| fig-cap: Switching to the `'bbox'` geometry column in the `world` layer, and plotting it
gdf = gdf.set_geometry('bbox')
gdf.explore()
```
:::
::: {.content-visible when-format="pdf"}
```{python}
#| eval: false
gdf = gdf.set_geometry('bbox')
gdf.explore()
```
```{python}
#| echo: false
#| output: false
#| error: true
gdf = gdf.set_geometry('bbox')
map_to_png.map_to_png(gdf.explore(), 'fig-switch-to-centroids')
```
![Switching to the `'bbox'` geometry column in the `world` layer, and plotting it](images/fig-switch-to-centroids.png){#fig-switch-to-centroids}
:::
::: {.content-visible when-format="html"}
```{python}
#| label: fig-switch-to-polygons
#| fig-cap: Switching to the `'polygon'` geometry column in the `world` layer, and plotting it
gdf = gdf.set_geometry('polygon')
gdf.explore()
```
:::
::: {.content-visible when-format="pdf"}
```{python}
#| eval: false
gdf = gdf.set_geometry('polygon')
gdf.explore()
```
```{python}
#| echo: false
#| output: false
#| error: true
gdf = gdf.set_geometry('polygon')
map_to_png.map_to_png(gdf.explore(), 'fig-switch-to-polygons')
```
![Switching to the `'polygon'` geometry column in the `world` layer, and plotting it](images/fig-switch-to-polygons.png){#fig-switch-to-polygons}
:::
### The Simple Features standard {#sec-simple-features}
Geometries are the basic building blocks of vector layers.
Although the Simple Features standard defines about 20 types of geometries, we will focus on the seven most commonly used types: `POINT`, `LINESTRING`, `POLYGON`, `MULTIPOINT`, `MULTILINESTRING`, `MULTIPOLYGON` and `GEOMETRYCOLLECTION`.
A useful list of possible geometry types can be found in R's **sf** package documentation[^sf_docs].
[^sf_docs]: [https://r-spatial.github.io/sf/articles/sf1.html#simple-feature-geometry-types](https://r-spatial.github.io/sf/articles/sf1.html#simple-feature-geometry-types)
Simple feature geometries can be represented by well-known binary (WKB) and well-known text (WKT) encodings.
<!-- TODO: add reference or at least link to OGC document on this? --> WKB representations are usually hexadecimal strings that are easily readable by computers, which is why GIS software and spatial databases use WKB to transfer and store geometry objects.
WKT, on the other hand, is a human-readable text markup description of Simple Features.
The two formats are interchangeable; when presenting a geometry in text, we naturally use the WKT representation.
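To illustrate the relation between the two encodings, the following minimal sketch (assuming **shapely** version 2.0 or later) converts a point geometry between its WKT and hexadecimal WKB representations.
```{python}
#| eval: false
# A minimal sketch: the same point encoded as WKT (human-readable) and as a
# hexadecimal WKB string (machine-oriented)
import shapely
pnt = shapely.from_wkt('POINT (5 2)')
shapely.to_wkt(pnt)            # 'POINT (5 2)'
shapely.to_wkb(pnt, hex=True)  # e.g., '0101000000...'
```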
The foundation of each geometry type is the point.
A point is simply a coordinate in two-dimensional, three-dimensional, or four-dimensional space such as shown in @fig-point.
``` text
POINT (5 2)
```
A linestring is a sequence of points with a straight line connecting the points (@fig-linestring).
``` text
LINESTRING (1 5, 4 4, 4 1, 2 2, 3 2)
```
A polygon is a sequence of points that form a closed, non-intersecting ring.
Closed means that the first and the last point of a polygon have the same coordinates (@fig-polygon).
``` text
POLYGON ((1 5, 2 2, 4 1, 4 4, 1 5))
```
So far we have created geometries with only one geometric entity per feature.
However, the Simple Features standard allows multiple geometries to exist within a single feature, using 'multi' versions of each geometry type, as illustrated in @fig-multipoint, @fig-multilinestring, and @fig-multipolygon1.
``` text
MULTIPOINT (5 2, 1 3, 3 4, 3 2)
MULTILINESTRING ((1 5, 4 4, 4 1, 2 2, 3 2), (1 2, 2 4))
MULTIPOLYGON (((1 5, 2 2, 4 1, 4 4, 1 5), (0 2, 1 2, 1 3, 0 3, 0 2)))
```
Finally, a geometry collection can contain any combination of geometries of the other six types, such as the combination of a multipoint and linestring shown below (@fig-geometrycollection).
``` text
GEOMETRYCOLLECTION (MULTIPOINT (5 2, 1 3, 3 4, 3 2),
LINESTRING (1 5, 4 4, 4 1, 2 2, 3 2))
```
### Geometries {#sec-geometries}
Each element in the geometry column (`GeoSeries`) is a geometry object of class `shapely` [@shapely].
For example, here is one specific geometry selected by implicit index (Canada, the 4^th^ element in `gdf`'s geometry column).
```{python}
gdf.geometry.iloc[3]
```
We can also select a specific geometry based on the `'name_long'` attribute (i.e., the 1^st^ and only element in the subset of `gdf` where the country name is equal to `Egypt`):
```{python}
gdf[gdf['name_long'] == 'Egypt'].geometry.iloc[0]
```
The **shapely** package is compatible with the Simple Features standard (@sec-simple-features).
Accordingly, the seven geometry types listed above are supported.
The following section demonstrates creating a `shapely` geometry of each type from scratch.
In the first example (a `'Point'`) we show two types of inputs to create a geometry: a list of coordinates or a `string` in the WKT format.
In the examples for the remaining geometries we use the former approach.
Creating a `'Point'` geometry from a list of coordinates uses the `shapely.Point` function in the following expression (@fig-point).
```{python}
#| label: fig-point
#| fig-cap: A `Point` geometry (created either from a `list` or WKT)
point = shapely.Point([5, 2])
point
```
Alternatively, we can use `shapely.from_wkt` to transform a WKT string to a `shapely` geometry object.
Here is an example of creating the same `'Point'` geometry from WKT (@fig-point).
```{python}
#| output: false
point = shapely.from_wkt('POINT (5 2)')
point
```
A `'LineString'` geometry can be created based on a list of coordinate tuples or lists (@fig-linestring).
```{python}
#| label: fig-linestring
#| fig-cap: A `LineString` geometry
linestring = shapely.LineString([(1,5), (4,4), (4,1), (2,2), (3,2)])
linestring
```
Creation of a `'Polygon'` geometry is similar, but our first and last coordinate must be the same, to ensure that the polygon is closed.
Note that in the following example, there is one list of coordinates that defines the exterior ring of the polygon, followed by a `list` of `list`s of coordinates that define the holes (if any) in the polygon (@fig-polygon).
```{python}
#| label: fig-polygon
#| fig-cap: A `Polygon` geometry
polygon = shapely.Polygon(
[(1,5), (2,2), (4,1), (4,4), (1,5)], ## Exterior
[[(2,4), (3,4), (3,3), (2,3), (2,4)]] ## Hole(s)
)
polygon
```
A `'MultiPoint'` geometry is also created from a list of coordinate tuples (@fig-multipoint), where each element represents a single point.
```{python}
#| label: fig-multipoint
#| fig-cap: A `MultiPoint` geometry
multipoint = shapely.MultiPoint([(5,2), (1,3), (3,4), (3,2)])
multipoint
```
A `'MultiLineString'` geometry, on the other hand, has one list of coordinates for each line in the `MultiLineString` (@fig-multilinestring).
```{python}
#| label: fig-multilinestring
#| fig-cap: A `MultiLineString` geometry
multilinestring = shapely.MultiLineString([
[(1,5), (4,4), (4,1), (2,2), (3,2)], ## 1st sequence
[(1,2), (2,4)] ## 2nd sequence, etc.
])
multilinestring
```
A `'MultiPolygon'` geometry (@fig-multipolygon1) is created from a `list` of `Polygon` geometries. For example, here we are creating a `'MultiPolygon'` with two parts, both without holes.
```{python}
#| label: fig-multipolygon1
#| fig-cap: A `MultiPolygon` geometry
multipolygon = shapely.MultiPolygon([
[[(1,5), (2,2), (4,1), (4,4), (1,5)], []], ## Polygon 1
[[(0,2), (1,2), (1,3), (0,3), (0,2)], []] ## Polygon 2, etc.
])
multipolygon
```
Since the required input has four hierarchical levels, it may be clearer to create the single-part `'Polygon'` geometries in advance, using the respective function (`shapely.Polygon`), and then pass them to `shapely.MultiPolygon` (@fig-multipolygon1). (The same technique can be used with the other `shapely.Multi*` functions.)
```{python}
#| output: false
multipolygon = shapely.MultiPolygon([
shapely.Polygon([(1,5), (2,2), (4,1), (4,4), (1,5)]), ## Polygon 1
shapely.Polygon([(0,2), (1,2), (1,3), (0,3), (0,2)]) ## Polygon 2, etc.
])
multipolygon
```
And, finally, a `'GeometryCollection'` geometry is a `list` with one or more of the other six geometry types (@fig-geometrycollection):
```{python}
#| label: fig-geometrycollection
#| fig-cap: A `GeometryCollection` geometry
geometrycollection = shapely.GeometryCollection([multipoint, multilinestring])
geometrycollection
```
`shapely` geometries act as atomic units of vector data, meaning that there is no concept of geometry *sets*: each operation accepts individual geometry object(s) as input, and returns an individual geometry as output.
(The `GeoSeries` and `GeoDataFrame` objects, defined in **geopandas**, are used to deal with sets of `shapely` geometries, collectively.)
For example, the following expression calculates the difference (see @sec-clipping) between the buffered (see @sec-buffers) `multipolygon` (using distance of `0.2`) and itself (@fig-mpol-buffer-difference):
```{python}
#| label: fig-mpol-buffer-difference
#| fig-cap: The difference between a buffered `MultiPolygon` and itself
multipolygon.buffer(0.2).difference(multipolygon)
```
As demonstrated in the last few figures, a `shapely` geometry object is automatically evaluated to a small image of the geometry (when using an interface capable of displaying it, such as Jupyter Notebook).
To print the WKT string instead, we can use the `print` function:
```{python}
print(linestring)
```
Finally, it is important to note that raw coordinates of `shapely` geometries are accessible through a combination of the `.coords`, `.geoms`, `.exterior`, and `.interiors` properties (depending on the geometry type).
These access methods are helpful when we need to develop our own spatial operators for specific tasks.
For example, the following expression returns the `list` of all coordinates of the `polygon` geometry exterior:
```{python}
list(polygon.exterior.coords)
```
Also see @sec-type-transformations, where `.coords`, `.geoms`, and `.exterior` are used to transform a given `shapely` geometry to a different type (e.g., `'Polygon'` to `'MultiPoint'`).
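As a further minimal sketch, the parts of a multi-part geometry can be accessed through `.geoms`, and the hole(s) of a polygon through `.interiors`, using the geometries created above.
```{python}
#| eval: false
# A minimal sketch: exterior coordinates of each part of the MultiPolygon, and
# coordinates of the interior ring (hole) of the Polygon created above
[list(part.exterior.coords) for part in multipolygon.geoms]
[list(ring.coords) for ring in polygon.interiors]
```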
### Vector layer from scratch {#sec-vector-layer-from-scratch}
In the previous sections, we started with a vector layer (`GeoDataFrame`), from an existing GeoPackage file, and 'decomposed' it to extract the geometry column (`GeoSeries`, @sec-geometry-columns) and separate geometries (`shapely`, see @sec-geometries).
In this section, we will demonstrate the opposite process, constructing a `GeoDataFrame` from `shapely` geometries, combined into a `GeoSeries`.
This will help you better understand the structure of a `GeoDataFrame`, and may come in handy when you need to programmatically construct simple vector layers, such as a line between two given points.
Vector layers consist of two main parts: geometries and non-geographic attributes.
@fig-gdf-flow shows how a `GeoDataFrame` object is created---geometries come from a `GeoSeries` object (which consists of `shapely` geometries), while attributes are taken from `Series` objects.
![Creating a `GeoDataFrame` from scratch](images/gdf-flow.svg){#fig-gdf-flow}
The final result, a vector layer (`GeoDataFrame`) is therefore a hierarchical structure (@fig-gdf-structure), containing the geometry column (`GeoSeries`), which in turn contains geometries (`shapely`).
Each of the 'internal' components can be accessed, or 'extracted', which is sometimes necessary, as we will see later on.
![Structure of a `GeoDataFrame`](images/gdf-structure.svg){width=50% fig-align="center" #fig-gdf-structure}
Non-geographic attributes may represent the name of the feature, and other attributes such as measured values, groups, etc.
To illustrate attributes, we will represent a temperature of 25°C in London on June 21st, 2023.
This example contains a geometry (the coordinates), and three attributes with three different classes (place name, temperature, and date).
Objects of class `GeoDataFrame` represent such data by combining the attributes (`Series`) with the simple feature geometry column (`GeoSeries`).
First, we create a point geometry, which we know how to do from @sec-geometries (@fig-point-lnd).
```{python}
#| label: fig-point-lnd
#| fig-cap: A `shapely` point representing London
lnd_point = shapely.Point(0.1, 51.5)
lnd_point
```
Next, we create a `GeoSeries` (of length 1), containing the point and a CRS definition, in this case WGS84 (defined using its EPSG code `4326`).
Also note that the `shapely` geometries go into a `list`, to illustrate that a `GeoSeries` can hold more than one geometry, unlike in this example.
```{python}
lnd_geom = gpd.GeoSeries([lnd_point], crs=4326)
lnd_geom
```
Next, we combine the `GeoSeries` with other attributes into a `dict`.
The geometry column is a `GeoSeries`, named `geometry`.
The other attributes (if any) may be defined using `list` or `Series` objects.
Here, for simplicity, we use the `list` option for defining the three attributes `name`, `temperature`, and `date`.
Again, note that the `list` can be of length \>1, in case we are creating a layer with more than one feature (i.e., multiple rows).
```{python}
lnd_data = {
'name': ['London'],
'temperature': [25],
'date': ['2023-06-21'],
'geometry': lnd_geom
}
```
Finally, the `dict` can be converted to a `GeoDataFrame` object, as shown in the following code.
```{python}
lnd_layer = gpd.GeoDataFrame(lnd_data)
lnd_layer
```
What just happened?
First, the coordinates were used to create the simple feature geometry (`shapely`).
Second, the geometry was converted into a simple feature geometry column (`GeoSeries`), with a CRS.
Third, attributes were combined with `GeoSeries`.
This results in a `GeoDataFrame` object, named `lnd_layer`.
To illustrate what creating a layer with more than one feature looks like, here is an example where we create a layer with two points, London and Paris.
```{python}
lnd_point = shapely.Point(0.1, 51.5)
paris_point = shapely.Point(2.3, 48.9)
towns_geom = gpd.GeoSeries([lnd_point, paris_point], crs=4326)
towns_data = {
'name': ['London', 'Paris'],
'temperature': [25, 27],
'date': ['2013-06-21', '2013-06-21'],
'geometry': towns_geom
}
towns_layer = gpd.GeoDataFrame(towns_data)
towns_layer
```
Now, we are able to create an interactive map of the `towns_layer` object (@fig-layer-from-scratch-explore).
To make the points easier to see, we are customizing a fill color and size (we elaborate on `.explore` options in @sec-interactive-maps).
::: {.content-visible when-format="html"}
```{python}
#| label: fig-layer-from-scratch-explore
#| fig-cap: '`towns_layer`, created from scratch, visualized using `.explore`'
towns_layer.explore(color='red', marker_kwds={'radius': 10})
```
:::
::: {.content-visible when-format="pdf"}
```{python}
#| eval: false
towns_layer.explore(color='red', marker_kwds={'radius': 10})
```
```{python}
#| echo: false
#| output: false
#| error: true
map_to_png.map_to_png(towns_layer.explore(color='red', marker_kwds={'radius': 10}), 'fig-layer-from-scratch-explore')
```
![`towns_layer`, created from scratch, visualized using `.explore`](images/fig-layer-from-scratch-explore.png){#fig-layer-from-scratch-explore}
:::
A spatial (point) layer can be also created from a `DataFrame` object (package **pandas**) that contains columns with coordinates.
To demonstrate, we first create a `GeoSeries` object from the coordinates, and then combine it with the `DataFrame` to form a `GeoDataFrame`.
```{python}
towns_table = pd.DataFrame({
'name': ['London', 'Paris'],
'temperature': [25, 27],
'date': ['2017-06-21', '2017-06-21'],
'x': [0.1, 2.3],
'y': [51.5, 48.9]
})
towns_geom = gpd.points_from_xy(towns_table['x'], towns_table['y'])
towns_layer = gpd.GeoDataFrame(towns_table, geometry=towns_geom, crs=4326)
```
The result is the same as the previous `towns_layer`.
This approach is particularly useful when we need to read data from a CSV file, e.g., using `pd.read_csv`, and want to turn the resulting `DataFrame` into a `GeoDataFrame` (see another example in @sec-spatial-joining).
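For example, a minimal sketch of that workflow, assuming a hypothetical file `towns.csv` with `'x'` and `'y'` columns holding lon/lat coordinates, could look as follows.
```{python}
#| eval: false
# A minimal sketch (the file 'towns.csv' is hypothetical): reading a table with
# coordinate columns and turning it into a point layer
towns_table = pd.read_csv('towns.csv')
towns_layer = gpd.GeoDataFrame(
    towns_table,
    geometry=gpd.points_from_xy(towns_table['x'], towns_table['y']),
    crs=4326
)
```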
### Derived numeric properties {#sec-area-length}
Vector layers are characterized by two essential derived numeric properties: *length* (`.length`)---applicable to lines, and *area* (`.area`)---applicable to polygons.
Area and length can be calculated for any of the data structures discussed above: for a `shapely` geometry the returned value is a number, while for `GeoSeries` and `GeoDataFrame` objects it is a numeric `Series`.
```{python}
linestring.length
```
```{python}
multipolygon.area
```
```{python}
gpd.GeoSeries([point, linestring, polygon, multipolygon]).area
```
Like all numeric calculations in **geopandas**, the results assume a planar CRS and are returned in its native units.
This means that length and area measurements for geometries in WGS84 (`crs=4326`) are returned in decimal degrees, and are essentially meaningless (to see the warning, try running `gdf.area`).
To obtain meaningful length and area measurements for data in a geographic CRS, the geometries first need to be transformed to a projected CRS (see @sec-reprojecting-vector-geometries) applicable to the area of interest.
For example, the area of Slovenia can be calculated in the UTM zone 33N CRS (`crs=32633`).
The result is in $m^2$, the units of the UTM zone 33N CRS.
```{python}
gdf[gdf['name_long'] == 'Slovenia'].to_crs(32633).area
```
## Raster data {#sec-raster-data}
The spatial raster data model represents the world with a continuous grid of cells (often also called pixels; @fig-raster-intro-plot1 (A)).
This data model often refers to so-called regular grids, in which each cell has the same, constant size---and we will focus only on regular grids in this book.
However, several other types of grids exist, including rotated, sheared, rectilinear, and curvilinear grids (see Chapter 1 of @pebesma_spatial_2022 or Chapter 2 of @tennekes_elegant_2022).
The raster data model usually consists of a raster header (or metadata) and a matrix (with rows and columns) representing equally spaced cells (often also called pixels; @fig-raster-intro-plot1 (A)).
The raster header defines the coordinate reference system, the origin, and the resolution (i.e., the cell size).
The origin (or starting point) is typically the coordinate of the lower-left corner of the matrix.
Combined with the column and row count, the extent can also be derived.
The matrix representation avoids explicitly storing the coordinates of the four corner points of each cell, as would be the case for rectangular vector polygons (in fact, only one coordinate, namely the origin, is stored).
This, together with map algebra (@sec-map-algebra), makes raster processing much more efficient and faster than vector data processing.
However, in contrast to vector data, a cell of one raster layer can only hold a single value.
The cell values are numeric, representing either a continuous or a categorical variable (@fig-raster-intro-plot1 (C)).
![Raster data types: (A) cell IDs, (B) cell values, (C) a colored raster map](images/raster-intro-plot1.png){#fig-raster-intro-plot1}
Raster maps usually represent continuous phenomena such as elevation, temperature, population density, or spectral data.
Discrete features such as soil or land-cover classes can also be represented in the raster data model.
Both uses of raster datasets are illustrated in @fig-raster-intro-plot2, which shows how the borders of discrete features may become blurred in raster datasets.
Depending on the nature of the application, vector representations of discrete features may be more suitable.
![Examples of continuous and categorical rasters](images/raster-intro-plot2.png){#fig-raster-intro-plot2}
As mentioned above, working with rasters in Python is less organized around one comprehensive package than is the case for vector layers and **geopandas**.
Instead, several packages provide alternative subsets of methods for working with raster data.
The two most notable approaches for working with rasters in Python are provided by the **rasterio** and **rioxarray** packages.
As we will see shortly, they differ in scope and underlying data models.
Specifically, **rasterio** represents rasters as **numpy** arrays associated with a separate object holding the spatial metadata.
The **rioxarray** package, a wrapper of **rasterio**, however, represents rasters with **xarray** 'extended' arrays, which are an extension of **numpy** arrays designed to hold axis labels and attributes in the same object, together with the array of raster values.
Similar approaches are provided by the less well-known **xarray-spatial** and **geowombat** packages.
Comparatively, **rasterio** is better established, but it is also more low-level (which has both advantages and disadvantages).
None of the above-mentioned packages, however, is as comprehensive as **geopandas** is for vector data.
For example, when working with **rasterio**, more packages may be needed to accomplish common tasks such as zonal statistics (package **rasterstats**) or calculating topographic indices (package **richdem**).
<!-- On the other hand, **xarray** was extended to accommodate spatial operators missing from the core package itself, with the **rioxarray** and **xarray-spatial** packages. -->
In the following two sections, we introduce **rasterio**, which is the raster-related package we are going to work with through the rest of the book.
### Using **rasterio** {#sec-using-rasterio}
To work with the **rasterio** package, we first need to import it.
Additionally, as raster data are stored in **numpy** arrays, we import the **numpy** package to work with them effectively.
Finally, we import the **rasterio.plot** sub-module for its `rasterio.plot.show` function that allows for quick visualization of rasters.
```{python}
import numpy as np
import rasterio
import rasterio.plot
```
Rasters are typically imported from existing files.
When working with **rasterio**, importing a raster is actually a two-step process:
- First, we open a raster file 'connection' using `rasterio.open`
- Second, we read raster values from the connection using the `.read` method
This type of separation is analogous to basic Python functions for reading from files, such as `open` and `.readline` to read from a text file.
The rationale is that we do not always want to read all information from the file into memory, which is particularly important as a raster's size can be larger than the available RAM.
Accordingly, the second step (`.read`) is selective, meaning that the user can fine-tune the subset of values (bands, rows/columns, resolution, etc.) that are actually being read.
For example, we may want to read just one raster band rather than reading all bands.
In the first step, we pass a file path to the `rasterio.open` function to create a `DatasetReader` file connection, hereby named `src`.
For this example, we use a single-band raster representing elevation in Zion National Park, stored in `srtm.tif`.
```{python}
src = rasterio.open('data/srtm.tif')
src
```
To get a first impression of the raster values, we can plot the raster using the `rasterio.plot.show` function (@fig-rasterio-plot):
```{python}
#| label: fig-rasterio-plot
#| fig-cap: Basic plot of a raster, read from a **rasterio** file connection
#| fig-width: 4
rasterio.plot.show(src);
```
The `DatasetReader` contains the raster metadata, that is, all of the information other than the raster values.
Let's examine it with the `.meta` property.
```{python}
src.meta
```
Namely, it allows us to see the following properties, which we will elaborate on below, and in later chapters:
- `driver`---The raster file format (see @sec-data-output-raster)
- `dtype`---Data type (see @tbl-numpy-data-types)
- `nodata`---The value being used as 'No Data' flag (see @sec-data-output-raster)
- Dimensions:
- `width`---Number of columns
- `height`---Number of rows
- `count`---Number of bands
- `crs`---Coordinate reference system (see @sec-querying-and-setting-coordinate-systems)
- `transform`---The raster affine transformation matrix
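Each of these items is also exposed as a separate attribute of the file connection, which can be convenient when only one of them is needed; for example (a minimal sketch):
```{python}
#| eval: false
# A minimal sketch: individual metadata items accessed as DatasetReader attributes
src.crs     # coordinate reference system
src.shape   # (rows, columns), i.e., (height, width)
src.count   # number of bands
src.nodata  # 'No Data' value
```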
The last item (i.e., `transform`) deserves more attention.
To position a raster in geographical space, in addition to the CRS, we must specify the raster *origin* ($x_{min}$, $y_{max}$) and resolution ($delta_{x}$, $delta_{y}$).
In the transformation matrix notation, assuming a regular grid, these data items are stored as follows:
```{text}
Affine(delta_x, 0.0, x_min,
       0.0, delta_y, y_max)
```
Note that, by convention, raster y-axis origin is set to the maximum value ($y_{max}$) rather than the minimum, and, accordingly, the y-axis resolution ($delta_{y}$) is negative.
In other words, since the origin is in the *top*-left corner, advancing along the y-axis is done through negative steps (downwards).
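The transformation matrix, and quantities derived from it, can be inspected directly on the file connection; a minimal sketch using the `src` connection opened above:
```{python}
#| eval: false
# A minimal sketch: the affine transform of the 'srtm.tif' connection, and two
# properties derived from it
src.transform  # Affine(delta_x, 0.0, x_min, 0.0, delta_y, y_max)
src.res        # (delta_x, abs(delta_y)), i.e., the resolution
src.bounds     # the extent, derived from origin, resolution, and dimensions
```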
In the second step, the `.read` method of the `DatasetReader` is used to read the actual raster values.
Importantly, we can read:
- All layers (as in `.read()`)
- A particular layer, passing a numeric index (as in `.read(1)`)
- A subset of layers, passing a `list` of indices (as in `.read([1,2])`)
Note that the layer indices start from `1`, contrary to the Python convention of the first index being `0`.
The object returned by `.read` is a **numpy** array [@numpy], with either two or three dimensions:
- *Three* dimensions, when reading more than one layer (e.g., `.read()` or `.read([1,2])`). In such case, the dimensions pattern is `(layers, rows, columns)`
- *Two* dimensions, when reading one specific layer (e.g., `.read(1)`). In such case, the dimensions pattern is `(rows, columns)`
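For example, for the single-band `srtm.tif` connection used here, the two patterns can be verified by checking the shapes of the returned arrays (a minimal sketch):
```{python}
#| eval: false
# A minimal sketch: array shapes returned by .read for a single-band raster
src.read().shape   # (1, rows, columns)---three dimensions
src.read(1).shape  # (rows, columns)---two dimensions
```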
Let's read the first (and only) layer from the `srtm.tif` raster, using the file connection object `src` and the `.read` method.
```{python}
src.read(1)
```
The result is a two-dimensional **numpy** array where each value represents the elevation of the corresponding pixel.
The relation between a **rasterio** file connection and the derived properties is summarized in @fig-rasterio-structure.
The file connection (created with `rasterio.open`) gives access to the two components of raster data: the metadata (via the `.meta` property) and the values (via the `.read` method).
![A **rasterio** file connection and its derived components, the metadata and the raster values](images/rasterio-structure.svg){#fig-rasterio-structure}
### Raster from scratch {#sec-raster-from-scratch}
In this section, we are going to demonstrate the creation of rasters from scratch.
We will construct two small rasters, `elev` and `grain`, which we will use in examples later in the book.
Unlike creating a vector layer (see @sec-vector-layer-from-scratch), creating a raster from scratch is rarely needed in practice because aligning a raster with the proper spatial extent is challenging to do programmatically ('georeferencing' tools in GIS software are a better fit for the job).
Nevertheless, the examples will be helpful to become more familiar with the **rasterio** data structures.
Conceptually, a raster is an array combined with georeferencing information, whereas the latter comprises:
- A transformation matrix, containing the origin and resolution, thus linking pixel indices with coordinates in a particular coordinate system
- A CRS definition, specifying the association of that coordinate system with the surface of the earth (optional)
Therefore, to create a raster, we first need to have an array with the values, and then supplement it with the georeferencing information.
Let's create the arrays `elev` and `grain`.
The `elev` array is a $6 \times 6$ array with sequential values from `1` to `36`.
It can be created as follows using the `np.arange` function and `.reshape` method from **numpy**.
```{python}
elev = np.arange(1, 37, dtype=np.uint8).reshape(6, 6)
elev
```
The `grain` array represents a categorical raster with values `0`, `1`, `2`, corresponding to categories 'clay', 'silt', 'sand', respectively.
We will create it from a specific arrangement of pixel values, using **numpy**'s `np.array` and `.reshape`.
```{python}
v = [
1, 0, 1, 2, 2, 2,
0, 2, 0, 0, 2, 1,
0, 2, 2, 0, 0, 2,
0, 0, 1, 1, 1, 1,
1, 1, 1, 2, 1, 1,
2, 1, 2, 2, 0, 2
]
grain = np.array(v, dtype=np.uint8).reshape(6, 6)
grain
```
Note that, in both cases, we are using the `uint8` (unsigned integer in 8 bits, i.e., `0-255`) data type, which is sufficient to represent all possible values of the given rasters (see @tbl-numpy-data-types).
This is the recommended approach for a minimal memory footprint.
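As a minimal sketch of what this choice implies, we can check the `uint8` value range and the memory footprint of the `elev` array.
```{python}
#| eval: false
# A minimal sketch: value range of uint8 and memory footprint of 'elev'
np.iinfo(np.uint8)  # min=0, max=255
elev.nbytes         # 36 bytes, i.e., one byte per cell
```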
What is missing now is the georeferencing information (see @sec-using-rasterio).
In this case, since the rasters are arbitrary, we also set up an arbitrary transformation matrix, where:
- The origin ($x_{min}$, $y_{max}$) is at `-1.5,1.5`
- The raster resolution ($delta_{x}$, $delta_{y}$) is `0.5,-0.5`
We can add this information using `rasterio.transform.from_origin`, and specifying `west`, `north`, `xsize`, and `ysize` parameters.
The resulting transformation matrix object is hereby named `new_transform`.
```{python}
new_transform = rasterio.transform.from_origin(
west=-1.5,
north=1.5,
xsize=0.5,
ysize=0.5
)
new_transform
```
Note that, confusingly, $delta_{y}$ (i.e., `ysize`) is defined in `rasterio.transform.from_origin` using a positive value (`0.5`), even though it is, in fact, negative (`-0.5`).
The raster can now be plotted in its coordinate system, passing the array `elev` along with the transformation matrix `new_transform` to `rasterio.plot.show` (@fig-rasterio-plot-elev).
```{python}
#| label: fig-rasterio-plot-elev
#| fig-cap: Plot of the `elev` raster, a minimal example of a continuous raster, created from scratch
rasterio.plot.show(elev, transform=new_transform);
```
The `grain` raster can be plotted the same way, as we are going to use the same transformation matrix for it as well (@fig-rasterio-plot-grain).
```{python}
#| label: fig-rasterio-plot-grain
#| fig-cap: Plot of the `grain` raster, a minimal example of a categorical raster, created from scratch
rasterio.plot.show(grain, transform=new_transform);
```
At this point, we have two rasters, each composed of an array and related transformation matrix.
We can work with the raster using **rasterio** by:
- Passing the transformation matrix wherever actual raster pixel coordinates are important (such as in function `rasterio.plot.show` above)
- Keeping in mind that any other layer we use in the analysis is in the same CRS
Finally, to export the raster for permanent storage, along with the spatial metadata, we need to go through the following steps:
1. Create a raster file connection (where we set the transform and the CRS, among other settings)
2. Write the array with raster values into the connection
3. Close the connection
Don't worry if the code below is unclear; the concepts related to writing raster data to file will be explained in @sec-data-output-raster.
For now, for completeness, and also to use these rasters in subsequent chapters without having to re-create them from scratch, we just provide the code for exporting the `elev` and `grain` rasters into the `output` directory.
In the case of `elev`, we do it as follows with the `rasterio.open`, `.write`, and `.close` functions and methods of the **rasterio** package.
```{python}
#| eval: false
new_dataset = rasterio.open(
'output/elev.tif', 'w',
driver='GTiff',
height=elev.shape[0],
width=elev.shape[1],
count=1,
dtype=elev.dtype,
crs=4326,
transform=new_transform
)
new_dataset.write(elev, 1)
new_dataset.close()
```
Note that the CRS we (arbitrarily) set for the `elev` raster is WGS84, defined using `crs=4326` according to the EPSG code.
Exporting the `grain` raster is done in the same way, with the only differences being the file name and the array we write into the connection.
```{python}
#| eval: false
new_dataset = rasterio.open(
'output/grain.tif', 'w',
driver='GTiff',
height=grain.shape[0],
width=grain.shape[1],
count=1,
dtype=grain.dtype,
crs=4326,
transform=new_transform
)
new_dataset.write(grain, 1)
new_dataset.close()
```
As a result, the files `elev.tif` and `grain.tif` are written into the `output` directory.
We are going to use these small raster files later on in the examples (for example, @sec-raster-subsetting).
Note that the transform matrices and dimensions of `elev` and `grain` are identical.
This means that the rasters are overlapping, and can be combined into one two-band raster, processed in raster algebra operations (@sec-map-algebra), etc.
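For example, a minimal sketch of writing the two arrays as bands of a single (hypothetical) file `output/elev_grain.tif`, relying on their identical dimensions and transform, could look as follows.
```{python}
#| eval: false
# A minimal sketch (the file name is hypothetical, not used elsewhere in the
# book): writing 'elev' and 'grain' as two bands of one raster file
new_dataset = rasterio.open(
    'output/elev_grain.tif', 'w',
    driver='GTiff',
    height=elev.shape[0],
    width=elev.shape[1],
    count=2,
    dtype=elev.dtype,
    crs=4326,
    transform=new_transform
)
new_dataset.write(elev, 1)
new_dataset.write(grain, 2)
new_dataset.close()
```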
## Coordinate Reference Systems {#sec-coordinate-reference-systems-intro}
Vector and raster spatial data types share concepts intrinsic to spatial data.
Perhaps the most fundamental of these is the Coordinate Reference System (CRS), which defines how the spatial elements of the data relate to the surface of the Earth (or other bodies).
CRSs are either geographic or projected, as introduced at the beginning of this chapter (@sec-vector-data).
This section explains each type, laying the foundations for @sec-reproj-geo-data, which provides a deep dive into setting, transforming, and querying CRSs.
### Geographic coordinate systems
Geographic coordinate systems identify any location on the Earth's surface using two values---longitude and latitude (see left panel of @fig-zion-crs).
Longitude is a location in the East-West direction in angular distance from the Prime Meridian plane, while latitude is an angular distance North or South of the equatorial plane.
Distances in geographic CRSs are therefore not measured in meters.
This has important consequences, as demonstrated in @sec-reproj-geo-data.
A spherical or ellipsoidal surface represents the surface of the Earth in geographic coordinate systems.
Spherical models assume that the Earth is a perfect sphere of a given radius---they have the advantage of simplicity, but, at the same time, they are inaccurate: the Earth is not a sphere!
Ellipsoidal models are defined by two parameters: the equatorial radius and the polar radius.
These are suitable because the Earth is compressed: the equatorial radius is around 21 $km$ longer than the polar radius.
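These two radii can be queried programmatically; the following minimal sketch assumes the **pyproj** package (a dependency of **geopandas**) is available.
```{python}
#| eval: false
# A minimal sketch: semi-major (equatorial) and semi-minor (polar) radii of the
# WGS84 ellipsoid, in meters
import pyproj
ellps = pyproj.CRS.from_epsg(4326).ellipsoid
ellps.semi_major_metre, ellps.semi_minor_metre  # about 6378137.0 and 6356752.3
```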