1 Theoretical example
This section allows reproducing the theoretical example presented in the main text of Marcon and Puech (2023). Analyses rely on the dbmss (Marcon et al. 2015) package for R (R Core Team 2023).
1.1 Dataset simulation
We build a point pattern made of cases (the points of interest) and controls (the background distribution of points).
Cases are a Matérn (Matérn 1960) point pattern with \(\kappa\) (expected) clusters of \(\mu\) (expected) points in a circle of radius scale. Controls are a Poisson point pattern whose density \(\lambda\) decreases exponentially along the y-axis (we will call “north” the higher y values).
library(dplyr)
library(dbmss)
# Simulation of cases (clusters)
rMatClust(kappa = 10, scale = 0.05, mu = 10) %>%
as.wmppp ->
CASES
CASES$marks$PointType <- "Cases"
# Number of points
CASES$n
## [1] 97
# Simulation of controls (random distribution)
rpoispp(function(x, y) {1000 * exp(-2 * y)}) %>%
as.wmppp ->
CONTROLS
CONTROLS$marks$PointType <-"Controls"
# Number of points
CONTROLS$n
## [1] 435
# Mixed patterns (cases and controls)
ALL <- superimpose(CASES, CONTROLS)
autoplot(ALL)
1.2 Calculate and plot M Cases
# Fix the number of simulations and the level of risk
NumberOfSimulations <- 1000
Alpha <- .01
# Calculate and plot M Cases
ALL %>%
MEnvelope(
ReferenceType="Cases",
SimulationType = "RandomLocation",
NumberOfSimulations = NumberOfSimulations,
Alpha = Alpha,
Global = TRUE
) ->
M_env_cases
autoplot(M_env_cases)
The plot shows a clear relative concentration of cases.
1.3 Map M results
To plot the individual values of M around each case, a distance must be chosen. Then, the function must be computed at this distance with individual values. Finally, a map is produced by smoothing the individual values and plotted.
# Choose the distance to plot
Distance <- 0.1
# Calculate the M values to plot
ALL %>%
Mhat(
r = c(0, Distance),
ReferenceType = "Cases",
NeighborType = "Cases",
Individual = TRUE
) ->
M_TheoEx
# Map resolution
resolution <- 512
# Create a map by smoothing the local values of M
M_TheoEx_map <- Smooth(
ALL,
fvind = M_TheoEx,
distance = Distance,
Nbx = resolution, Nby = resolution
)
# Plot the point pattern with values of M(Distance)
plot(M_TheoEx_map, main = "")
# Add the cases to the map
points(
ALL[ALL$marks$PointType == "Cases"],
pch = 20, col = "green"
)
# Add contour lines
contour(M_TheoEx_map, add = TRUE)
We can see that cases are concentrated almost everywhere (local M value above 1) because we chose a Matérn point pattern.
The areas with the higher relative concentration are located in the north of the map because the controls are less dense there.
1.4 Compare with the density of cases
The density of cases is plotted. High densities are not similar to high relative concentrations in this example because the control points are not homogeneously distributed.
plot(density(CASES), main = "")
points(
ALL[ALL$marks$PointType == "Cases"],
pch = 20, col = "green"
)
2 Suzanne Lenglen Park
2.1 Data
Our data is extracted from “Paris open data” 1.
2.1.1 Data wrangling
Data are stored in trees_2021.zip
which contains two GeoJSON files:
- trees_2021 stores all trees of the city of Paris in 2021.
- trees_logged contains all trees logged after 2021.
They must be read. Data are projected into the Lambert 93 datum so that coordinates are in meters.
unzip("data/trees_2021.zip", exdir = "data")
library("sf")
read_sf("data/trees_2021.geojson") %>%
st_transform(crs = 2154) ->
trees_all_raw
read_sf("data/trees_logged.geojson") %>%
st_transform(crs = 2154) ->
trees_logged_raw
2.1.1.1 All trees
The first dataset contains all trees in Paris in 2021, including those to be cut.
Trees from the Suzanne Lenglen park are selected. Columns of interest are:
ID
: a numeric unique identifier for each tree.Species_name
: the scientific name of the tree species, i.e. Genus species.Status
: Alive.Genus
.Species
.French_species_name
: vernacular name.Circumference
: in cm.
library("dplyr")
trees_all_raw %>%
# Filter Suzanne Lenglen park
filter(
adresse == "PARC OMNISPORT SUZANNE LENGLEN / 7 BOULEVARD DES FRERES VOISIN"
) %>%
# Create a field with the species name
mutate(Species_name = as.factor(paste(genre, espece))) %>%
# Create a field with the status
mutate(Status = "Alive") %>%
# Genus and Species fields
mutate(Genus = as.factor(genre)) %>%
mutate(Species = as.factor(espece)) %>%
# Rename and finally select columns
rename(
ID = idbase,
French_species_name = libellefrancais,
Circumference = circonferenceencm
) %>%
select(
ID,
Species_name,
Status,
Genus,
Species,
French_species_name,
Circumference) ->
trees_all
# Number of trees
trees_all %>% nrow()
## [1] 1472
We have 1472 trees in the park.
2.1.1.2 Logged trees
Logged trees are in the second dataset.
Their status is “Logged”.
An extra field, Logging_reason
contains the motivation to cut them off (in French).
Circumference
is absent.
# Tree description
trees_logged_raw %>%
# Filter Suzanne Lenglen park
filter(
adresse == "PARC OMNISPORT SUZANNE LENGLEN / 7 BOULEVARD DES FRERES VOISIN"
) %>%
# Exclude unidentified trees
filter(
!is.na(especearbreprecedent),
!is.na(libellefrancaisarbreprecedent),
!is.na(genrearbreprecedent)
) %>%
filter(libellefrancaisarbreprecedent != "Non spécifié") %>%
filter(especearbreprecedent != "n. sp.") %>%
# Create a field with the species name
mutate(
Species_name = as.factor(paste(genrearbreprecedent, especearbreprecedent))
) %>%
# Create a numeric ID
mutate(ID = as.integer(idbase)) %>%
# Create a field with the status
mutate(Status = "Logged") %>%
# Genus and Species fields
mutate(Genus = as.factor(genrearbreprecedent)) %>%
mutate(Species = as.factor(especearbreprecedent)) %>%
# Reason for logging (in French)
mutate(Logging_reason = motifabattagearbreprecedent) %>%
# Rename and finally select columns
rename(
French_species_name = libellefrancaisarbreprecedent
) %>%
select(
ID,
Species_name,
Status,
Genus,
Species,
Logging_reason,
French_species_name
) ->
trees_logged
# Number of trees
trees_logged %>% nrow()
## [1] 48
48 among the 1472 trees of the park were logged.
2.1.1.3 Merge
The two datasets are merged here.
The logged trees must be removed from the first one.
Circumference
is removed because it is missing from the logged trees dataset.
# All trees
trees_all %>%
# Delete the logged trees
filter(!(ID %in% trees_logged$ID)) %>%
# Delete the circumference that is absent in trees_logged
mutate(Circumference = NULL) %>%
# Bind the logged trees
bind_rows(trees_logged) ->
trees_no_circumference
Circumferences of all trees, including logged ones, are in tree_all
from where they can be recovered.
# Prepare a tibble with circumferences
trees_all %>%
select(ID, Circumference) %>%
# inner_join.sf refuses sf objects
st_set_geometry(NULL) ->
Circumferences
# Add the Circumference of trees
trees_no_circumference %>%
inner_join(Circumferences, by = "ID") ->
trees
2.1.1.4 Shorter logging reasons
Logging reasons can be:
- Decaying: the tree’s condition is not healthy enough to keep it safely in a public park.
- infested: the tree is a maple affected by the (contagious) sooty bark disease, caused by the fungus Cryptostroma corticale.
library("stringr")
trees$Logging_reason[is.na(trees$Logging_reason)] <- ""
trees$Logging_reason %>%
str_replace("Arbre.*", "Decaying") %>%
str_replace("Foyer.*", "Infested") ->
trees$Logging_reason
2.1.1.5 Factors
Several fields are converted to factors for efficiency.
trees$Logging_reason <- as.factor(trees$Logging_reason)
trees$Status <- as.factor(trees$Status)
trees$French_species_name <- as.factor(trees$French_species_name)
2.1.2 Point patterns
dbmms uses weighted, marked, planar point patterns (wmppp).
A wmppp named trees_infested
is built. Point marks are their basal area (as weight) and either their logging reason or their genus if they are alive.
library("dbmss")
trees %>%
# Weight is the basal area
mutate(PointWeight = Circumference^2 / 4 / pi) %>%
mutate(
PointType = ifelse(
Logging_reason == "",
as.character(Genus),
as.character(Logging_reason)
)
) %>%
# Add X and Y
bind_cols(st_coordinates(trees)) %>%
wmppp(
window = as.owin(st_bbox(trees)),
unitname = c("meter", "meters")
) ->
trees_infested
We also need a point pattern to describe the park before logging, as a reference.
trees_all %>%
# Weight is the basal area
mutate(PointWeight = Circumference^2 / 4 / pi) %>%
# Genus is the point type
rename(PointType = Genus) %>%
# Add X and Y
bind_cols(st_coordinates(trees_all)) %>%
wmppp(
window = as.owin(st_bbox(trees_all)),
unitname = c("meter", "meters")
) ->
trees_2021
autoplot(trees_2021) +
ggplot2::labs(size="Basal area", color="Genus")
The map shows the tree genera. Maples (Acer spp.) are the most abundant trees in the park.
2.2 Spatial concentration of maple trees
The M statistic is computed to detect the spatial concentration of maple trees before logging.
Distance <- 15
trees_2021 %>%
MEnvelope(
r = 0:(10 * Distance) / 5,
ReferenceType = "Acer",
NeighborType = "Acer",
NumberOfSimulations = NumberOfSimulations
) ->
M_Acer
autoplot(M_Acer)
2.2.1 Map
To map it, individual values must be calculated at the chosen distance, that is 15 meters.
trees_2021 %>%
Mhat(
r = c(0, Distance),
ReferenceType = "Acer",
NeighborType = "Acer",
Individual = TRUE
) ->
M_ind_Acer
A map is produced by smoothing the individual values. To respect the geometry of the map, the ratio of rows to columns of the image to produce is first calculated.
# Window ratio
ratio <- with(
trees_infested$window,
{
(yrange[2] - yrange[1]) / (xrange[2] - xrange[1])
}
)
# Smooth the values
trees_2021 %>%
Smooth(
fvind = M_ind_Acer,
distance = Distance,
Nbx = resolution, Nby = resolution * ratio
) ->
map_acer
plot(map_acer, main = "")
# Add contour lines
contour(map_acer, nlevels = 10, add = TRUE)
# Add infested trees
points(
trees_infested[trees_infested$marks$PointType == "Infested"],
pch = 20
)
# And decaying trees
points(
trees_infested[trees_infested$marks$PointType == "Decaying"],
pch = 4,
col = "green"
)
Logged trees are shown on the map: infested trees are back points and decaying trees are green crosses.
2.2.2 Alternative distances
The 15-meter distance has been chosen because it is a plausible distance of contagion between trees. A shorter distance, say 6 meters, would be justified as a peak of M(distance) but neighbors are scarse at this scale, leading to less statistical power. At greater distances, such as 30m, concentration is still significant but neighborhoods overlap more, leading to less clear results.
Both alternative distances are tested here.
The map made with 6-meter-radius neighborhoods is similar to the previous one.
30-meter-radius neighborhoods are shown below:
Increasing the distance blurs the information where high and low values overlap and cancel out.
In conclusion, the appropriate distance should be chosen according to the knowledge of the studied process.
2.3 Contagion and spatial structure
To conclude this study, we show that infested trees are located in areas of high concentration of maples.
2.3.1 Location of infested maple trees
# Window ratio
ratio <- with(
trees_infested$window,
{
(yrange[2] - yrange[1]) / (xrange[2] - xrange[1])
}
)
# Smooth the values
trees_2021 %>%
Smooth(
fvind = M_ind_Acer,
distance = Distance,
Nbx = resolution, Nby = resolution * ratio
) ->
map_acer
plot(map_acer, main = "")
# Add contour lines
contour(map_acer, nlevels = 10, add = TRUE)
# Show all maple trees
trees_acer <- trees_2021[trees_2021$marks$PointType == "Acer"]
# Add infested trees
points(
trees_infested[trees_infested$marks$PointType == "Infested"],
pch = 20
)
# And decaying trees
points(
trees_infested[trees_infested$marks$PointType == "Decaying"],
pch = 4, col = "green"
)
Infested trees are present in the areas where maples are concentrated.
2.3.2 Concentration of maples around infested trees
To test the intertype concentration between sane and infested maple trees, the following \(M\)-intertype is computed: we evaluate whether the proportion of maples trees higher in the close neighborhood of infested maples than elsewhere. If so, results give support to the existence of a contagious disease (Hantsch et al. 2014).
trees_infested %>%
MEnvelope(
r = 0:(10 * Distance) / 5,
ReferenceType = "Infested",
NeighborType = "Acer",
NumberOfSimulations = NumberOfSimulations
) ->
M_infested_Acer
autoplot(M_infested_Acer)
The M plot detects a significant concentration of maples around infested maples.
2.3.3 Map
The map is produced.
trees_infested %>%
Mhat(
ReferenceType = "Infested",
NeighborType = "Acer",
Individual = TRUE
) ->
M_ind_infested_Acer
# Smooth the values
trees_infested %>%
Smooth(
fvind = M_ind_infested_Acer,
distance = Distance,
Nbx = resolution, Nby = resolution * ratio
) ->
map_infested_acer
plot(map_infested_acer, main = "")
# Add contour lines
contour(map_infested_acer, nlevels = 5, add = TRUE)
# Add infested trees
points(
trees_infested[trees_infested$marks$PointType == "Infested"],
pch = 20
)
# And decaying trees
points(
trees_infested[trees_infested$marks$PointType == "Decaying"],
pch = 4,
col = "green"
)
The map shows that the local concentration of maple around infested trees is generally high (warm colors around points).