# Basic {sf} functionality for analyzing and plotting geospatial data

## 2020/06/08

This post has some information on using the {sf} (or simple features) R package for analyzing and plotting geospatial data. It is mostly for myself to reference in the future. A lot of this is from the Geocomputation With R book.

Load the {sf} and {tidyverse} packages:

``````library(sf)
library(tidyverse)``````

## Creating test data

Create some test data. Here are five longitudes and latitudes. Two are special

• the fourth pair correspond to Knoxville, TN, USA
• the fifth pair corresponds to Lansing, MI, USA

The others are random (and are not within the United States).

``points <- data.frame(lon = c(-65,-60, -55, -83.92, -84.55), lat = c(50, 45, 40, 35.96, 42.73))``

The following function creates an sf object (telling R which columns within `points` are the longitude and latitude).

``points <- st_as_sf(points, coords = c("lon", "lat"))``

One thing I’m uncertain about is how {sf} knows that “lon” and “lat” have special meanings; what if, instead, I had named these something else? Would {sf} know how to prepare to plot them in the correct way? Let me know if you know the answer.

## Using sf objects as data frames

Attributes can be added to the sf object. The book suggests thinking of sf objects as containing spatial (e.g. lon, lat) data, and attributes; in addition to/because of this, sf objects can be treated as data frames (in many/most cases).

Here, I add names to the five points, using indexing as if the object was a data frame.

``points\$name <- c("random_place_c", "random_place_b", "random_place_c", "knoxville", "lansing")``

It’s possible to drop the spatial/geometric part (class) of the object, which can be useful in some cases; the book mentions that the spatial part is “sticky” and can be difficult to ignore, even when you use the object as a data frame.

``st_drop_geometry(points)``
``````##             name
## 1 random_place_c
## 2 random_place_b
## 3 random_place_c
## 4      knoxville
## 5        lansing``````

Here, it is easy to use a dplyr verb to remove the random places:

``````points %>%
filter(!str_detect(name, "random_place"))``````
``````## Simple feature collection with 2 features and 1 field
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -84.55 ymin: 35.96 xmax: -83.92 ymax: 42.73
## CRS:            NA
##               geometry      name
## 1 POINT (-83.92 35.96) knoxville
## 2 POINT (-84.55 42.73)   lansing``````

## Spatial operations

This step was the big leap for me, but it also illustrates why it can be useful to treat spatial data in a special way.

U.S. states can be represented with a shape representing each state’s boundaries.

There are a number of R packages that provide these shapes in sf format. I am using USAboundaries (mostly because I found it first).

``````library(USAboundaries)

USA <- us_boundaries()``````

I’m not fully grasping the details, but coordinate reference systems (CRSs) link spatial coordinates (e.g., lon, lat) to the three-dimensional shape of the Earth. There are short codes for these; a common CRS is 4269 (I have more to learn about this). This is important for the next step, the spatial operation.

``USA <- st_set_crs(USA, "4269")``

`st_within()` provides information about relationships between pairs of sf objects. There are many spatial operations in addition to `st_within()`, including `st_intersects()` and `st_overlaps()`.

``````points_within_USA <- st_within(points, USA)

points_within_USA``````
``````## Sparse geometry binary predicate list of length 5, where the predicate was `within'
##  1: (empty)
##  2: (empty)
##  3: (empty)
##  4: 23
##  5: 18``````

This can be turned into an integer vector (or another kind of vector, like a logical vector).

``as.integer(points_within_USA)``
``##  NA NA NA 23 18``

This can be used to index `USA` to see which state the points in `dd` are within:

``````USA[as.integer(points_within_USA), ] %>%
filter(!is.na(name))``````
``````## Simple feature collection with 2 features and 12 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -90.41814 ymin: 34.98299 xmax: -81.6469 ymax: 48.21065
## CRS:            NA
##   statefp  statens    affgeoid geoid stusps      name lsad        aland
## 1      47 01325873 0400000US47    47     TN Tennessee   00 106797662267
## 2      26 01779789 0400000US26    26     MI  Michigan   00 146455251245
##         awater state_name state_abbr jurisdiction_type
## 1   2355188876  Tennessee         TN             state
## 2 104031574060   Michigan         MI             state
##                         geometry
## 1 MULTIPOLYGON (((-90.3007 35...
## 2 MULTIPOLYGON (((-84.61622 4...``````

It may be easier to specify the column with the name of the state, too

``````USA[as.integer(points_within_USA), "name"] %>%
filter(!is.na(name))``````
``````## Simple feature collection with 2 features and 1 field
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -90.41814 ymin: 34.98299 xmax: -81.6469 ymax: 48.21065
## CRS:            NA
##        name                       geometry
## 1 Tennessee MULTIPOLYGON (((-90.3007 35...
## 2  Michigan MULTIPOLYGON (((-84.61622 4...``````

## Plotting

sf objects can be plotted with ggplot2 using the sf geom; here is a plot of the state boundaries and the five points.

I’ll plot only the continental U.S.

``````ggplot() +
geom_sf(data = filter(USA, name != "Hawaii" & name != "Alaska")) +
geom_sf(data = points)`````` The projection (and theme) of the map can certainly be improved; this is more to demonstrate that {sf} objects can be plotted with {ggplot2}.

Many packages make it easier to plot spatial data of interest. Here is a plot of all of the school districts in Tennessee.

``````library(leaidr)
library(mapproj)

# only needed the first time
# lea_get(path = "./leaid_sh")

# 47 is the FIPS code for TN
tn <- lea_prep(path = "./leaid_sh", fips = "47")``````
``````## OGR data source with driver: ESRI Shapefile
## Source: "/Users/jrosenb8/utk-homepage/content/post/leaid_sh/schooldistrict_sy1819_tl19.shp", layer: "schooldistrict_sy1819_tl19"
## with 13315 features
## It has 18 fields``````
``````tn_df <- ggplot2::fortify(tn)

tn_map <-
ggplot() +
geom_path(data = tn,
aes(x = long, y = lat, group = group),
color = "gray", size = .2) +
theme_void()

tn_map_projected <-
tn_map +
coord_map()

print(tn_map_projected)`````` (h/t Isabella Velásquez for making this so seamless (even with fairly ginormous files) to do with the {leaidr} package).

## fin

There is a lot more (for me) to learn; this is just a start.

Resources: