2. Handling large datasets
Tobias Stephan
2024-08-24
Source:vignettes/B_datasets.Rmd
B_datasets.Rmd
This vignette teaches you how to handle large stress datasets and how to retrieve relative plate motions parameters from a set of plate motions.
Larger Data Sets
tectonicr also handles larger data sets. A subset of the World Stress Map data compilation (Heidbach et al. 2016) is included as an example data set and can be imported through:
data("san_andreas")
head(san_andreas)
#> Simple feature collection with 6 features and 9 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: -118.84 ymin: 35.97 xmax: -112.58 ymax: 39.08
#> Geodetic CRS: WGS 84
#> id lat lon azi unc type depth quality regime
#> 1 wsm00892 38.14 -118.84 50 25 FMS 7 C S
#> 2 wsm00893 35.97 -114.71 54 25 FMS 5 C S
#> 3 wsm00894 37.93 -118.17 24 25 FMS 5 C S
#> 4 wsm00896 38.63 -118.21 41 25 FMS 17 C N
#> 5 wsm00897 39.08 -115.62 30 25 FMS 5 C N
#> 6 wsm00903 38.58 -112.58 27 25 FMS 7 C N
#> geometry
#> 1 POINT (-118.84 38.14)
#> 2 POINT (-114.71 35.97)
#> 3 POINT (-118.17 37.93)
#> 4 POINT (-118.21 38.63)
#> 5 POINT (-115.62 39.08)
#> 6 POINT (-112.58 38.58)
The full or an individually filtered world stress map dataset can be downloaded by
download_WSM2016()
Modeling the stress directions (wrt. to the geographic North pole) using the Pole of Oration (PoR) of the motion of North America relative to the Pacific Plate. We test the dataset against a right-laterally tangential displacement type.
data("nuvel1")
por <- subset(nuvel1, nuvel1$plate.rot == "na")
san_andreas.prd <- PoR_shmax(san_andreas, por, type = "right")
Combine the model results with the coordinates of the observed data
san_andreas.res <- data.frame(
sf::st_drop_geometry(san_andreas),
san_andreas.prd
)
Stress map
ggplot2::ggplot()
can be used to visualize the results.
The orientation of the axis can be displayed with the function
geom_spoke()
. The position argument
position = "center_spoke"
aligns the marker symbol at the
center of the point. The deviation can be color coded.
deviation_norm()
yields the normalized value of the
deviation, i.e. absolute values between 0 and
90.
Also included are the plate boundary geometries after Bird (2003):
data("plates") # load plate boundary data set
Alternatively, there is also the NUVEL1 plate boundary model by
DeMets et al. (1990) stored under
data("nuvel1_plates")
.
First we create the predicted trajectories of (more details in Article 3.):
trajectories <- eulerpole_loxodromes(por, 40, cw = FALSE)
Then we initialize the plot map
…
map <- ggplot() +
geom_sf(
data = plates,
color = "red",
lwd = 2,
alpha = .5
) +
scale_color_continuous(
type = "viridis",
limits = c(0, 90),
name = "|Deviation| in (\u00B0)",
breaks = seq(0, 90, 22.5)
) +
scale_alpha_discrete(name = "Quality rank", range = c(1, 0.4))
…and add the trajectories and data points:
map +
geom_sf(
data = trajectories,
lty = 2
) +
geom_spoke(
data = san_andreas.res,
aes(
x = lon,
y = lat,
angle = deg2rad(90 - azi),
color = deviation_norm(dev),
alpha = quality
),
radius = 1,
position = "center_spoke",
na.rm = TRUE
) +
coord_sf(
xlim = range(san_andreas$lon),
ylim = range(san_andreas$lat)
)
The map shows generally low deviation of the observed directions from the modeled stress direction using counter-clockwise 45 loxodromes.
The normalized test quantifies the fit between the modeled direction the observed stress direction considering the reported uncertainties of the measurement.
norm_chisq(
obs = san_andreas.res$azi.PoR,
prd = 135,
unc = san_andreas.res$unc
)
#> [1] 0.1543845
The value is 0.15, indicating a significantly good fit of the model. Thus, the traction of the transform plate boundary explain the stress direction of the area.
Variation of the Direction of the Maximum Horizontal Stress wrt. to the Distance to the Plate Boundary
The direction of the maximum horizontal stress correlates with plate motion direction at the plate boundary zone. Towards the plate interior, plate boundary forces become weaker and other stress sources will probably dominate.
To visualize the variation of the wrt. to the distance to the plate boundary, we need to transfer the direction of from the geographic reference system (i.e. azimuth is the deviation of a direction from geographic North pole) to the Pole of Rotation (PoR) reference system (i.e. azimuth is the deviation from the PoR).
The PoR coordinate reference system is the oblique transformation of the geographical coordinate system with the PoR coordinates being the the translation factors.
The azimuth in the PoR reference system is the angular difference between the azimuth in geographic reference system and the (initial) bearing of the great circle that passes through the data point and the PoR .
To calculate the distance to the plate boundary, both the plate boundary geometries and the data points (in geographical coordinates) will be transformed in to the PoR reference system. In the PoR system, the distance is the latitudinal or longitudinal difference between the data points and the inward/outward or tangential moving plate boundaries, respectively.
This is done with the function distance_from_pb()
, which
returns the angular distances.
plate_boundary <- subset(plates, plates$pair == "na-pa")
san_andreas.res$distance <-
distance_from_pb(
x = san_andreas,
PoR = por,
pb = plate_boundary,
tangential = TRUE
)
Finally, we visualize the direction wrt. to the distance to the plate boundary:
azi_plot <- ggplot(san_andreas.res, aes(x = distance, y = azi.PoR)) +
coord_cartesian(ylim = c(0, 180)) +
labs(x = "Distance from plate boundary (\u00B0)", y = "Azimuth in PoR (\u00B0)") +
geom_hline(yintercept = c(0, 45, 90, 135, 180), lty = 3) +
geom_pointrange(
aes(
ymin = azi.PoR - unc, ymax = azi.PoR + unc,
color = san_andreas$regime, alpha = san_andreas$quality
),
size = .25
) +
scale_y_continuous(
breaks = seq(-180, 360, 45),
sec.axis = sec_axis(
~.,
name = NULL,
breaks = c(0, 45, 90, 135, 180),
labels = c("Outward", "Tan (L)", "Inward", "Tan (R)", "Outward")
)
) +
scale_alpha_discrete(name = "Quality rank", range = c(1, 0.1)) +
scale_color_manual(name = "Tectonic regime", values = stress_colors(), breaks = names(stress_colors()))
print(azi_plot)
Adding a rolling statistics (e.g. weighted mean and 95% confidence interval) of the transformed azimuth:
san_andreas.res_roll <- san_andreas.res[order(san_andreas.res$distance), ]
san_andreas.res_roll$r_mean <- roll_circstats(
san_andreas.res_roll$azi.PoR,
w = 1 / san_andreas.res_roll$unc,
FUN = circular_mean, width = 51
)
san_andreas.res_roll$r_conf95 <- roll_confidence(
san_andreas.res_roll$azi.PoR,
w = 1 / san_andreas.res_roll$unc,
width = 51
)
azi_plot +
geom_step(
data = san_andreas.res_roll,
aes(distance, r_mean - r_conf95),
lty = 2
) +
geom_step(
data = san_andreas.res_roll,
aes(distance, r_mean + r_conf95),
lty = 2
) +
geom_step(
data = san_andreas.res_roll,
aes(distance, r_mean)
)
Close to the dextral plate boundary, the majority of the stress data have a strike-slip fault regime and are oriented around 135 wrt. to the PoR. Thus, the date are parallel to the predicted stress sourced by a right-lateral displaced plate boundary. Away from the plate boundary, the data becomes more noisy.
This azimuth (PoR) vs. distance plot also allows to identify whether a less known plate boundary represents a inward, outward, or tangential displaced boundary.
The relationship between the azimuth and the distance can be better visualized by using the deviation (normalized by the data precision) from the the predicted stress direction, i.e. the normalized :
# Rolling norm chisq:
san_andreas.res_roll$roll_nchisq <- roll_normchisq(
san_andreas.res_roll$azi.PoR,
san_andreas.res_roll$prd,
san_andreas.res_roll$unc,
width = 51
)
# plotting:
ggplot(san_andreas.res, aes(x = distance, y = nchisq)) +
coord_cartesian(ylim = c(0, 1)) +
labs(x = "Distance from plate boundary (\u00B0)", y = expression(Norm ~ chi^2)) +
geom_hline(yintercept = c(0.15, .33, .7), lty = 3) +
geom_point(aes(color = san_andreas$regime)) +
scale_y_continuous(sec.axis = sec_axis(
~.,
name = NULL,
breaks = c(.15 / 2, .33, .7 + 0.15),
labels = c("Good fit", "Random", "Systematic\nmisfit")
)) +
scale_color_manual(name = "Tectonic regime", values = stress_colors(), breaks = names(stress_colors())) +
geom_step(
data = san_andreas.res_roll,
aes(distance, roll_nchisq)
)
We can see that the data in fact starts to scatter notably beyond a distance of 3.8 and becomes random at 7 away from the plate boundary. Thus, the North American-Pacific plate boundary zone at the San Andreas Fault is approx. 4–7 (ca. 380–750 km) wide.
The normalized vs. distance plot allows to specify the width of the plate boundary zone.
R base plots for quick analysis
The data deviation map can also be build using base R’s plotting engine:
# Setup the colors for the deviation
cols <- tectonicr.colors(
deviation_norm(san_andreas.res$dev),
categorical = FALSE
)
# Setup the legend
col.legend <- data.frame(col = cols, val = names(cols)) |>
dplyr::mutate(val2 = gsub("\\(", "", val), val2 = gsub("\\[", "", val2)) |>
unique() |>
dplyr::arrange(val2)
# Initialize the plot
plot(
san_andreas$lon, san_andreas$lat,
cex = 0,
xlab = "PoR longitude", ylab = "PoR latitude",
asp = 1
)
# Plot the axis and colors
axes(
san_andreas$lon, san_andreas$lat, san_andreas$azi,
col = cols, add = TRUE
)
# Plot the plate boundary
plot(sf::st_geometry(plates), col = "red", lwd = 2, add = TRUE)
# Plot the trajectories
plot(sf::st_geometry(trajectories), add = TRUE, lty = 2)
# Create the legend
graphics::legend(
"bottomleft",
title = "|Deviation| in (\u00B0)",
inset = .05, cex = .75,
legend = col.legend$val, fill = col.legend$col
)
A quick analysis the results can be obtained
stress_analysis()
that returns a list. The transformed
coordinates and azimuths as well as the deviations can be viewed by:
results <- stress_analysis(san_andreas, por, "right", plate_boundary, plot = FALSE)
head(results$result)
#> azi.PoR prd dev nchisq cdist lat.PoR lon.PoR distance
#> 1 173.240166 135 38.24017 0.18053214 0.38311043 59.05548 -85.46364 -2.446542
#> 2 1.058195 135 -133.94180 2.21486507 0.51846479 60.50445 -78.16683 -2.751949
#> 3 147.609558 135 12.60956 0.01962975 0.04765752 59.38025 -84.55317 -2.705483
#> 4 163.607388 135 28.60739 0.10103489 0.22925429 59.73632 -85.74431 -3.162091
#> 5 152.233017 135 17.23302 0.03666381 0.08776909 61.68018 -84.31053 -4.988116
#> 6 150.611386 135 15.61139 0.03008832 0.07242085 63.39714 -80.60932 -6.358818
Statistical parameters describing the distribution of the transformed azimuths can be displayed by
results$stats
#> mean
#> mean 142.4029884
#> sd 20.3016715
#> var 0.2220562
#> median 137.5578600
#> quasi_median 136.1189938
#> skewness -0.4526369
#> kurtosis 1.4620392
#> conf95 3.7636758
#> dispersion 0.1239432
#> norm_chisq 0.1543845
Statistical test results are shown by
results$test
#> $C
#> [1] 0.7521136
#>
#> $statistic
#> [1] 21.45832
#>
#> $p.value
#> [1] 9.219372e-100
… and the associated plots can be displayed by setting
plot = TRUE
:
stress_analysis(san_andreas, por, "right", plate_boundary, plot = TRUE)
References
Bird, Peter. 2003. “An Updated Digital Model of Plate Boundaries” Geochemistry, Geophysics, Geosystems 4 (3). doi: 10.1029/2001gc000252.
DeMets, C., R. G. Gordon, D. F. Argus, and S. Stein. 1990. “Current Plate Motions” Geophysical Journal International 101 (2): 425–78. doi: 10.1111/j.1365-246x.1990.tb06579.x.
Heidbach, Oliver, Mojtaba Rajabi, Karsten Reiter, Moritz Ziegler, and WSM Team. 2016. “World Stress Map Database Release 2016. V. 1.1.” GFZ Data Services. doi: 10.5880/WSM.2016.001.