1. Spatial Analysis
1.1. Quantifying space and place
- A note about geographical distance: unless very short, it differs from the Euclidean distance you would calculate directly from geographic (lon/lat) coordinates alone (see the sketch after this list).
- Coordinates
- Projections
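To make the distance note concrete, here is a minimal R sketch (not from the original notes; the cities and coordinates are illustrative): treating lon/lat degrees as planar coordinates yields a meaningless number, while the haversine formula gives an actual great-circle distance.
# great-circle (haversine) distance in km between two lon/lat points
hav.dist <- function(lon1, lat1, lon2, lat2, r = 6371) {
  to.rad <- pi / 180
  dlat <- (lat2 - lat1) * to.rad
  dlon <- (lon2 - lon1) * to.rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to.rad) * cos(lat2 * to.rad) * sin(dlon / 2)^2
  2 * r * asin(sqrt(a))
}
# e.g. roughly New York to Madrid: about 5,770 km
hav.dist(-74.01, 40.71, -3.70, 40.42)
# naive Euclidean "distance" on raw degrees: about 70 degree-units, not a distance
sqrt((-74.01 - (-3.70))^2 + (40.71 - 40.42)^2)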
1.2. Categories of spatial analysis
- Ch. 10: Intro to spatial point pattern analysis
- Summary statistics
- Assessments of randomness (number and location of points)
- G function
- Moran's I and Geary's C "measure the tendency of events to cluster or the extent to which points close together have more similar values on average than those farther apart"; they are global autocorrelation statistics (a minimal spdep sketch follows this list).
- Local Moran's I and Gi*
- Models of point processes for simulating points data
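A minimal sketch (not from the source; the coordinates and attribute values are invented) of a global Moran's I test with spdep, where the spatial weights come from each point's four nearest neighbours:
library(spdep)
set.seed(1)
coords <- cbind(runif(50), runif(50))    # hypothetical point locations
values <- rnorm(50)                      # hypothetical attribute measured at each point
nb <- knn2nb(knearneigh(coords, k = 4))  # neighbour list: 4 nearest points
lw <- nb2listw(nb, style = "W")          # row-standardised spatial weights
moran.test(values, lw)                   # global Moran's I against the null of no autocorrelation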
1.2.1. Clusters, hotspots, heatmaps
Although both produce a smooth-ish gradient visualization from points data (darker = more dense, lighter = less), a hotspot map is different from a heat map. For a heat map, the density surface is created using a point density or kernel density approach (both defined below, with a short bandwidth sketch after the definitions). In either case, the resulting map is highly subjective because it depends on your choice of (1) the output raster cell size, AKA the density unit; (2) the bandwidth, AKA search radius; (3) how points are counted (raw count, inverse-distance weighting, etc.). Here's a script for pretty heatmapping in Python.
- POINT DENSITY: Set a neighborhood size. Assign to each raster cell the number (or weighted number) of points in its neighborhood, divided by the neighborhood's area.
- KERNEL DENSITY: Each point is replaced by a circular surface, centered on the point, that extends out to the search radius, takes its maximum value at the center, and falls to zero at the border. The volume under the surface may sum to one or to some other value determined by the point's weight. The value of each raster cell is then the sum of all partial surfaces that overlap it.
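A hedged sketch of how the bandwidth choice above changes a kernel density surface in spatstat (the point pattern here is simulated, not real data):
library(spatstat)
pts <- rpoispp(100)               # ~100 uniformly random points on the unit square
plot(density(pts, sigma = 0.05))  # narrow bandwidth: spiky, very local surface
plot(density(pts, sigma = 0.2))   # wide bandwidth: smooth, broad surface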
A hotspot map defines densities AND includes statistical tests for whether high densities (clusters) are nonrandom. Two primary methods: Gi* and KDE.
Is it possible that the Getis-Ord Gi* (G-i-star) statistic is a recipe for how points are counted (3, above) and that it can be calculated on a point or kernel density basis?
How does this relate to clustering algorithms used outside of geography?
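On the Gi* question above, a speculative spdep sketch (invented coordinates and values; include.self() makes the statistic Gi*, i.e. each point counts itself, rather than Gi):
library(spdep)
set.seed(2)
coords <- cbind(runif(50), runif(50))
values <- rnorm(50)
nb <- include.self(knn2nb(knearneigh(coords, k = 4)))
lw <- nb2listw(nb, style = "B")  # binary weights
gi.star <- localG(values, lw)    # z-scores: large positive = hot spot, large negative = cold spot
head(gi.star)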
Textbook on spatial point pattern analysis:
- All these summary statistics assume stationarity and are used to characterize a point process by comparing it with a random Poisson process:
- F function: "cumulative distribution function of the empty space distance"
- G function: "cumulative distribution function of the nearest-neighbour distance for a typical point in the pattern"
- Ripley's K: "the expected number of other points of the process within a distance r of a typical point of the process ... weighted and renormalised empirical distribution functions of the pairwise distances"
- To go beyond this, you start thinking in terms of specific point processes which give rise to different kinds of clustering
- "Exploratory techniques for investigating localised features in a point pattern include LISA (Local Indicators of Spatial Association), nearest-neighbour cleaning, and data sharpening"
# spdep provides the autocorrelation statistics (Moran's I, Gi*); spatstat the point pattern tools
library(spdep)
library(spatstat)
# create a ppp (planar point pattern) object from coordinate vectors and window ranges
my.ppp <- ppp(x.coordinates, y.coordinates, x.range, y.range)
# plot the pattern; plot(as.ppp(my.ppp)) truncates the plot to the specified range
plot(my.ppp)
# create a kernel density map
plot(density(my.ppp))
# plot Ripley's K; compare it with the theoretical Poisson curve to see whether clustering is nonrandom
plot(Kest(my.ppp))
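A hedged follow-on sketch (not in the original notes): Gest and Fest estimate the G and F functions quoted above, and envelope() wraps a summary function in simulation envelopes built from random Poisson patterns, which makes nonrandom clustering easier to judge than from Kest alone.
# G and F functions for the same pattern
plot(Gest(my.ppp))
plot(Fest(my.ppp))
# simulation envelopes from 99 random Poisson patterns in the same window
plot(envelope(my.ppp, Kest, nsim = 99))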
1.2.2. Spatial dependence
1.3. Resources
1.4. Install notes
1.4.1. General troubleshooting
- Is the package available on CRAN or another repository?
- Try a different CRAN mirror (see the example after this list)
- Try a different default repository, not necessarily CRAN
- Is the package name spelled correctly?
- Is the package available for your version of R?
- Check for connectivity issues
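A hedged R example of the mirror/repository checks above (the repository URL is just one common choice):
# pick a different CRAN mirror interactively
chooseCRANmirror()
# or point install.packages() at an explicit repository
install.packages("spatstat", repos = "https://cloud.r-project.org")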
1.4.2. spdep
- /usr/share/R/share/make/shlib.mk:6: recipe for target 'expm.so' failed
- ERROR: compilation failed for package ‘expm’ (dependency for spdep)
- resolved by installing under R v. 3.3.2 (launched from the command line)
1.4.3. spatstat
install.packages("spatstat")
fails because its polyclip dependency fails: "unable to load shared object ... undefined symbol: __cxa_throw_bad_array_new_length"
conda install --channel https://conda.anaconda.org/jsignell r-spatstat
fails because its dependencies conflict with r-modelmetrics
- uninstalling r-modelmetrics is too risky (it depends on r-essentials), so
conda remove r-modelmetrics --force
is also too risky
- try exploring the dependency conflict:
conda info r-spatstat
throws an error, so add the r-spatstat channel to conda: conda config --append channels https://conda.anaconda.org/jsignell
- r-abind
- r-base 3.3.1*
- r-deldir >=0.0_21
- r-goftest
- r-matrix
- r-mgcv
- r-nlme
- r-polyclip >=1.5_0
- r-rpart
- r-tensor
conda info r-modelmetrics
:
- looks like it's a conflict between versions of r-base; no idea how to fix that
- install R through Synaptic; this version of R should open when RStudio is launched from the menu
- r-base == v.3.2.3, super out of date :/
- r-cran-spatstat
- VICTORY!!!
2. Sources
2.1. References
2.2. Read
2.3. Unread