# key setups
library(sqldf) # to use sql syntax with data frames
library(knitr) # knitr for kable tables
library(kableExtra) # pretty tables
library(sf) # simple features (GIS)
library(leaflet) # nice maps
library(leaflet.minicharts)
library(tools) # md5sum
library(stringi)
library(tidyverse)
library(magrittr)
library(htmltools)
library(gdalUtilities)

# captions
library(captioner)
table_nums <- captioner(prefix = "Table")
figure_nums <- captioner(prefix = "Figure")

# for having DIV tags
knitr::knit_hooks$set(class = function(before, options, envir) {
    if (before) {
        sprintf("<div class = '%s'>", options$class)
    } else {
        "</div>"
    }
})

knitr::opts_chunk$set(warning = FALSE, message = FALSE)

# path to this file name
# if (!interactive()) {
fnamepath <- current_input(dir = TRUE)
# }

# db connection
source("../code/dbconnect.R")
U <- uwed <- connectdb(dbname = "uwed", host = "doyenne.csde.washington.edu")

1 Introduction

This section of the UW Elections Database presents a methodology and results for estimating the racial summary characteristics of each voting precinct in the State of Washington. The source of the racial composition data is the US Census American Community Survey 5-year rolling surveys. Because the precinct boundaries do not match the census tracto or block group boundaries, it is necessary to use geographic information system analysis to estimate the counts and proportions of different race groups per voting precinct.

2 Methods

2.1 Data

Data were obtained from the US Census Bureau and the office of the Washington Secretary of State.

Data were combined and ultimately stored in a PostgreSQL/PostGIS database housed at the University of Washington’s Center for Studies in Demography and Ecology.

2.1.1 Elections data

[Put in Inhwan’s descriptions.]

2.1.2 Precinct data

Precinct shapefiles (GIS data) were obtained from the WA Secretary of State web site and imported to the PostGIS database. Data were stored in partitioned tables stratified by election year. The identifier for precincts is the combination of county, precinct name, and election year.

2.1.3 Census data

Census Citizen Voting Age Population by Race and Ethnicity (CVAP) data were downloaded from the US Census web site. Data for years 2000-2008 were represented with year 2000 decennial census. Data from years 2009-present were represented as the 5-year ACS data terminating at the named year. Data were obtained at the smallest geographic unit available with racial information, which was either the tract (2009-2012) or the block group (all other years).

Counts of persons \(\ge\) 18 years of age residing in the census unit were obtained, representing the following racial categories:

  • total
  • white
  • black
  • aian = American Indian/Alaska Native
  • asian
  • nhpi = Native Hawai’ian/Pacific Islander
  • other

It should be noted that the CVAP data for 2000 did not have values for American Indian/Alaska Native or Native Hawai’ian/Pacific Islander racial groups.

Note we do have Hispanic and non-Hispanic categories as well … for later development we will have these data available using those strata.

Spatial and tabular data were stored in partitioned tables stratified by census year.

2.2 Analysis

Elections results files were merged into a single table and imported to the PostGIS database. Records are indexed by county, precinct name, election date, and election race identifier.

Precinct and census data were conflated by the following process:

  1. Precinct polygons and census unit polygons were intersected, providing slivers. Original census unit area was saved as an attribute.
  2. An area weighting factor was calculated as \(W = \frac{area_{sliver}}{area_{orig}}\)
  3. Each census racial count was multiplied by the area weighting factor to obtain an estimate of the count of each racial group within each sliver
  4. Racial group counts were aggregated back to the precinct to provide an estimate of the count of each different racial group in the precinct.

3 Results

3.1 Data

# command to dump the data
cmd <- "cmd /c H:\\postgresql\\13\\bin\\psql -h doyenne -c \"\\copy precinct_census_cvap_agg_votes_nogeog to 'R:/Project/UWED/results/precinct_census_cvap_agg_votes_nogeog.csv' with csv header \" uwed "
system(cmd)
## [1] 0
csv <- "../results/precinct_census_cvap_agg_votes_nogeog.csv"
md5csv <- md5sum(csv)

# dump the geog data
Sys.setenv(PATH = "PATH=C:\\PROGRA~1\\QGIS32~1.0\\apps\\qt5\\bin;C:\\PROGRA~1\\QGIS32~1.0\\apps\\Python39\\Scripts;C:\\PROGRA~1\\QGIS32~1.0\\bin;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\system32\\WBem")
cmd <- 'cmd /c C:\\PROGRA~1\\QGIS32~1.0\\bin\\ogr2ogr.exe -f gpkg R:/Project/UWED/results/precinct_census_cvap.gpkg PG:"host=doyenne dbname=uwed" precinct_census_cvap_agg_votes -overwrite'
system(cmd)
## [1] 0
gpkg <- "../results/precinct_census_cvap.gpkg"
md5gpkg <- md5sum(gpkg)

Tabular data are available as a CSV file: precinct_census_cvap_agg_votes_nogeog.csv, MD5sum = 9b50d44ae5431c16666130fb614bee8d. The format is a hybrid of both “long” and “wide” formats. There is a record for each year \(\times\) county \(\times\) precinct \(\times\) election race (i.e., this defines the “long” format); for each year \(\times\) county \(\times\) precinct combination, there are columns for the racial group count estimates (i.e., the “wide” format). This means that the racial group counts are repeated across records that have the same values for year \(\times\) county \(\times\) precinct.

A GPKG file precinct_census_cvap.gpkg is provided that contains one record per year \(\times\) county \(\times\) precinct. Because the relationship between a given year \(\times\) county \(\times\) precinct record and election races is one-to-many, those columns that represent multiple measures are represented as comma-delimited strings. This is probably unhelpful but there is a trade-off between storing multiple copies of the same geometry versus a single geography and storing “long” records for each election race. A proper approach for analytics for a particular race would be to strip the elections results columns and only keep the year \(\times\) county \(\times\) precinct and geometry columns, and then to perform filters on the long tabular data first, finally joining back onto the geometry record.

3.2 Precincts map

See Figure 1 for a map of 2020 precincts with county, precinct name, and total population as a popup and pie charts indicating the proportion of different racial groups.

# get PostGIS data

# get data, transform to 4326, get XY
sql <- "with a as (select *,
                  st_transform(geom_2927, 4326) as geometry
           from precinct_census_cvap_agg_votes
           where electionyear = 2020),
     b as (select *, st_centroid(geometry) as geom_cen from a),
     c as (select electionyear,
                  countycode,
                  precinctname,
                  race_total,
                  race_white,
                  race_black,
                  race_aian,
                  race_asian,
                  race_nhpi,
                  race_other,
                  electiondate,
                  raceid,
                  ballotname,
                  ballotnamepartyname,
                  votes,
                  st_x(geom_cen)      as lng,
                  st_y(geom_cen)      as lat,
                  st_astext(geometry) as geometry
           from b)
select *
from c;"

v <- dbGetQuery(conn = U, statement = sql)

# turn into an SF
v_sf <- v %>%
    st_as_sf(wkt = "geometry") %>%
    st_set_crs(4326)

# a subset for testing
# V <- v_sf
# v_sf %<>% head()
# a leaflet
# colors for the pie chart
colors <- c("white", "black", "red", "yellow", "brown", "green")

# make labels for the popup
labels <- v_sf %>%
    mutate("county: xCCx<br />precinct: xPNx<br />pop total: xPTx" %>%
        str_replace_all("xCCx", countycode) %>%
        str_replace_all("xPNx", str_to_title(precinctname)) %>%
        str_replace_all("xPTx", as.character(race_total))) %>%
    pull()

# the leaflet
m <- leaflet(data = v_sf) %>%
    addTiles() %>%
    addPolygons(weight = 3, popup = labels)

# the charts
m %<>% addMinicharts(
    v_sf$lng, v_sf$lat,
    type = "pie",
    chartdata = select(v_sf, race_white, race_black, race_aian, race_asian, race_nhpi, race_other) %>% st_drop_geometry(),
    colorPalette = colors,
    width = 60 * sqrt(v_sf$race_total) / sqrt(max(v_sf$race_total)), transitionTime = 0
)
m