I am a transdisciplinary researcher specializing in statistical learning and spatial data science. Statistical learning is a label often given to the mathematical framework that connects a broad class of techniques devoted to understanding relationships in data. Spatial data science is a field concerned with the computational manipulation, analysis, and visualization of spatial data. My work lies at the intersection of these two fields: I am a statistician concerned with the representation of space in statistical methods, and I am a spatial data scientist interested in applying novel statistical techniques to spatial problems. Spatial data creates special challenges in the data science workflow and often requires different considerations than classical data (Anselin, 1989). Statistical learning often does not account for these challenges; conversely, it offers many theoretical approaches and tools that are ripe for application to spatial problems. In short, both fields have much to gain from reaching across the aisle. Algorithms are tools of thought as much as they are tools of computation, and geographers and data scientists have much to offer each other in building them.
Within this intersection, I am primarily interested in the issues that space raises in the relationship between statistical learning and scientific discovery. On the nonspatial side, computer science and statistics have developed powerful tools for representing models and causal thought in a machine (Pearl, 2000). However, these tools often fail to account for the spatial dimensions of the problems they model. For example, the geostatistics community has a well-developed theory of spatial processes and geostatistical learning that has yet to be widely deployed by data scientists (Hoffimann et al., 2021). Conversely, GIScientists have recognized a need for more sophisticated ways to represent geographic thought (that is, spatial processes) in their models (Gahegan, 2020). Geography, as the standard-bearer for spatial relationships, is uniquely poised to synthesize these efforts for spatial learning in the pursuit of scientific discovery. This exercise in theory-building also strengthens the position of geography as a leader in methods of thinking about space and in realizing this knowledge in computer-age statistical thought.
One subtopic of spatial data science that would benefit greatly from the use of sophisticated statistical learning methods is the study of spatial scale. The term “scale” has several definitions, one of which refers to the varying size of spatial patterns. Scale in this sense is connected to several core problems in spatial analysis and spatial statistics, including the modifiable areal unit problem (MAUP; Openshaw, 1983), the spatial change-of-support problem (COS), and Simpson’s paradox. These problems arise from the difficulty of determining from data the scale(s) at which a spatial process operates. Ideally, spatial analysts would like not only to account for scale in their modeling paradigms, via techniques such as spatial lag and spatial error models, but also to do statistical inference on the scale parameter. The turn of phrase is reductive, since complex structures are usually required to capture scale and they need not represent it by a parameter, but it succinctly summarizes the objectives of current research in modeling spatial scale. There are a variety of existing representations of scale in spatial models, including multiscale geographically weighted regression (MGWR; Fotheringham et al., 2017), spatially clustered regression (SCR; Sugasawa and Murakami, 2021), and myriad constructions of spatial weights matrices. Finally, developing these models is useful even if they never see use in applied work, since it moves the field toward building a better theory of scale.
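As a minimal illustration of why scale is hard to recover from data, the Python sketch below simulates the zoning side of the MAUP on a regular grid. Everything in it is an assumption made for illustration: the synthetic variables `x` and `y`, the shared smooth trend, the noise level, and the helper names `aggregate` and `correlation_by_zone_size` are hypothetical and are not drawn from any of the cited works. The two variables share a broad-scale pattern but have independent fine-scale noise, so the correlation measured between them depends on the size of the zones used to aggregate them.

```python
import numpy as np


def aggregate(grid, block):
    """Average a square grid over non-overlapping block-by-block zones."""
    n = grid.shape[0] // block * block  # trim so the grid divides evenly
    trimmed = grid[:n, :n]
    zones = trimmed.reshape(n // block, block, n // block, block)
    return zones.mean(axis=(1, 3))


def correlation_by_zone_size(x, y, blocks):
    """Pearson correlation between x and y after aggregating to each zone size."""
    result = {}
    for block in blocks:
        ax = aggregate(x, block).ravel()
        ay = aggregate(y, block).ravel()
        result[block] = float(np.corrcoef(ax, ay)[0, 1])
    return result


# Hypothetical data: a shared broad-scale trend plus independent fine-scale noise.
rng = np.random.default_rng(0)
size = 64
rows, cols = np.meshgrid(np.arange(size), np.arange(size), indexing="ij")
trend = np.sin(rows / 10.0) + np.cos(cols / 10.0)
x = trend + rng.normal(scale=2.0, size=(size, size))
y = trend + rng.normal(scale=2.0, size=(size, size))

correlations = correlation_by_zone_size(x, y, blocks=[1, 2, 4, 8])
for block, r in correlations.items():
    print(f"zone size {block}x{block}: correlation = {r:.2f}")
```

In this setup the correlation is weak at the finest resolution and strengthens as zones grow, because averaging suppresses the independent noise while preserving the shared trend. The same underlying process therefore supports different conclusions depending on the support chosen, which is the inferential difficulty that models of spatial scale aim to address.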
My training makes me well suited to address these problems. As an undergraduate, I majored in Mathematics with a minor in Computer Science. I took extensive coursework in computational mathematics, including numerous graduate-level classes, that has prepared me for research in algorithm design. Additionally, my research experiences before geography exposed me to many different facets of applied mathematics, including differential topology, mathematical biology, structural acoustics, complex systems, network science, and quantum computing. My knowledge of how these fields integrate mathematics, statistics, and computation into their core questions, theories, and methods of investigation has been invaluable as I explore the same integration in quantitative geography.
Anselin, Luc. “What is Special About Spatial Data? Alternative Perspectives on Spatial Data Analysis.” UC Santa Barbara NCGIA Technical Reports, 1989, 10.
Fotheringham, A. Stewart, Wenbai Yang, and Wei Kang. “Multiscale Geographically Weighted Regression (MGWR).” Annals of the American Association of Geographers 107, no. 6 (November 2, 2017): 1247–65. https://doi.org/10.1080/24694452.2017.1352480.
Gahegan, Mark. “Fourth Paradigm GIScience? Prospects for Automated Discovery and Explanation from Data.” International Journal of Geographical Information Science 34, no. 1 (January 2, 2020): 1–21. https://doi.org/10.1080/13658816.2019.1652304.
Hoffimann, Júlio, Maciel Zortea, Breno de Carvalho, and Bianca Zadrozny. “Geostatistical Learning: Challenges and Opportunities.” Frontiers in Applied Mathematics and Statistics 7 (July 1, 2021): 689393. https://doi.org/10.3389/fams.2021.689393.
Openshaw, Stan. The Modifiable Areal Unit Problem. Geo Books, 1983.
Pearl, Judea. Causality. Cambridge University Press, 2000.
Sugasawa, Shonosuke, and Daisuke Murakami. “Spatially Clustered Regression.” Spatial Statistics 44 (August 2021): 100525. https://doi.org/10.1016/j.spasta.2021.100525.