Appendix

Eric Marcon

Florence Puech

05 December 2023

1 Theoretical example

This section allows reproducing the theoretical example presented in the main text of Marcon and Puech (2023). Analyses rely on the dbmss (Marcon et al. 2015) package for R (R Core Team 2023).

1.1 Dataset simulation

We build a point pattern made of cases (the points of interest) and controls (the background distribution of points).

Cases are a Matérn (Matérn 1960) point pattern with \(\kappa\) (expected) clusters of \(\mu\) (expected) points in a circle of radius scale. Controls are a Poisson point pattern whose density \(\lambda\) decreases exponentially along the y-axis (we will call “north” the higher y values).

library(dplyr)
library(dbmss)
# Simulation of cases (clusters)
rMatClust(kappa = 10, scale = 0.05, mu = 10) %>% 
  as.wmppp -> 
  CASES
CASES$marks$PointType <- "Cases"
# Number of points
CASES$n
## [1] 97
# Simulation of controls (random distribution)
rpoispp(function(x, y) {1000 * exp(-2 * y)}) %>% 
  as.wmppp ->
  CONTROLS
CONTROLS$marks$PointType <-"Controls"
# Number of points
CONTROLS$n
## [1] 435
# Mixed patterns (cases and controls)
ALL <- superimpose(CASES, CONTROLS)
autoplot(ALL)

1.2 Calculate and plot M Cases

# Fix the number of simulations and the level of risk
NumberOfSimulations <- 1000
Alpha <- .01

# Calculate and plot M Cases
ALL %>% 
  MEnvelope(
    ReferenceType="Cases", 
    SimulationType = "RandomLocation",
    NumberOfSimulations = NumberOfSimulations,
    Alpha = Alpha,
    Global = TRUE
  ) ->
  M_env_cases

autoplot(M_env_cases)

The plot shows a clear relative concentration of cases.

1.3 Map M results

To plot the individual values of M around each case, a distance must be chosen. Then, the function must be computed at this distance with individual values. Finally, a map is produced by smoothing the individual values and plotted.

# Choose the distance to plot
Distance <- 0.1

# Calculate the M values to plot
ALL %>% 
  Mhat(
    r = c(0, Distance),
    ReferenceType = "Cases",
    NeighborType = "Cases", 
    Individual = TRUE
  ) -> 
  M_TheoEx

# Map resolution
resolution <- 512

# Create a map by smoothing the local values of M
M_TheoEx_map <- Smooth(
  ALL, 
  fvind = M_TheoEx, 
  distance = Distance,
  Nbx = resolution, Nby = resolution
)

# Plot the point pattern with values of M(Distance)
plot(M_TheoEx_map, main = "")
# Add the cases to the map
points(
  ALL[ALL$marks$PointType == "Cases"],
  pch = 20, col = "green"
)
# Add contour lines
contour(M_TheoEx_map, add = TRUE)

We can see that cases are concentrated almost everywhere (local M value above 1) because we chose a Matérn point pattern.

The areas with the higher relative concentration are located in the north of the map because the controls are less dense there.

1.4 Compare with the density of cases

The density of cases is plotted. High densities are not similar to high relative concentrations in this example because the control points are not homogeneously distributed.

plot(density(CASES), main = "")
points(
  ALL[ALL$marks$PointType == "Cases"],
  pch = 20, col = "green"
)

2 Suzanne Lenglen Park

2.1 Data

Our data is extracted from “Paris open data” 1.

2.1.1 Data wrangling

Data are stored in trees_2021.zip which contains two GeoJSON files:

  • trees_2021 stores all trees of the city of Paris in 2021.
  • trees_logged contains all trees logged after 2021.

They must be read. Data are projected into the Lambert 93 datum so that coordinates are in meters.

unzip("data/trees_2021.zip", exdir = "data")
library("sf")
read_sf("data/trees_2021.geojson") %>% 
  st_transform(crs = 2154) -> 
  trees_all_raw
read_sf("data/trees_logged.geojson") %>% 
  st_transform(crs = 2154) -> 
  trees_logged_raw

2.1.1.1 All trees

The first dataset contains all trees in Paris in 2021, including those to be cut.

Trees from the Suzanne Lenglen park are selected. Columns of interest are:

  • ID: a numeric unique identifier for each tree.
  • Species_name: the scientific name of the tree species, i.e. Genus species.
  • Status: Alive.
  • Genus.
  • Species.
  • French_species_name: vernacular name.
  • Circumference: in cm.

library("dplyr")
trees_all_raw %>% 
  # Filter Suzanne Lenglen park
  filter(
    adresse == "PARC OMNISPORT SUZANNE LENGLEN / 7 BOULEVARD DES FRERES VOISIN"
  ) %>% 
  # Create a field with the species name
  mutate(Species_name = as.factor(paste(genre, espece))) %>%
  # Create a field with the status
  mutate(Status = "Alive") %>%
  # Genus and Species fields
  mutate(Genus = as.factor(genre)) %>% 
  mutate(Species = as.factor(espece)) %>% 
  # Rename and finally select columns
  rename(
    ID = idbase,
    French_species_name = libellefrancais,
    Circumference = circonferenceencm
  ) %>% 
  select(
    ID, 
    Species_name, 
    Status, 
    Genus, 
    Species, 
    French_species_name,
    Circumference) -> 
  trees_all

# Number of trees
trees_all %>% nrow()
## [1] 1472

We have 1472 trees in the park.

2.1.1.2 Logged trees

Logged trees are in the second dataset.

Their status is “Logged”. An extra field, Logging_reason contains the motivation to cut them off (in French). Circumference is absent.

# Tree description
trees_logged_raw %>%
  # Filter Suzanne Lenglen park
  filter(
    adresse == "PARC OMNISPORT SUZANNE LENGLEN / 7 BOULEVARD DES FRERES VOISIN"
  ) %>% 
  # Exclude unidentified trees
  filter(
    !is.na(especearbreprecedent), 
    !is.na(libellefrancaisarbreprecedent), 
    !is.na(genrearbreprecedent)
  ) %>%
  filter(libellefrancaisarbreprecedent != "Non spécifié") %>%
  filter(especearbreprecedent != "n. sp.") %>% 
  # Create a field with the species name
  mutate(
    Species_name = as.factor(paste(genrearbreprecedent, especearbreprecedent))
  ) %>%
  # Create a numeric ID
  mutate(ID = as.integer(idbase)) %>%
  # Create a field with the status
  mutate(Status = "Logged") %>%
  # Genus and Species fields
  mutate(Genus = as.factor(genrearbreprecedent)) %>% 
  mutate(Species = as.factor(especearbreprecedent)) %>% 
  # Reason for logging (in French)
  mutate(Logging_reason = motifabattagearbreprecedent) %>% 
  # Rename and finally select columns
  rename(
    French_species_name = libellefrancaisarbreprecedent
  ) %>% 
  select(
    ID, 
    Species_name, 
    Status, 
    Genus, 
    Species, 
    Logging_reason, 
    French_species_name
  ) -> 
  trees_logged

# Number of trees
trees_logged %>% nrow()
## [1] 48

48 among the 1472 trees of the park were logged.

2.1.1.3 Merge

The two datasets are merged here.

The logged trees must be removed from the first one. Circumference is removed because it is missing from the logged trees dataset.

# All trees
trees_all %>%
  # Delete the logged trees
  filter(!(ID %in% trees_logged$ID)) %>% 
  # Delete the circumference that is absent in trees_logged
  mutate(Circumference = NULL) %>% 
  # Bind the logged trees
  bind_rows(trees_logged) -> 
  trees_no_circumference

Circumferences of all trees, including logged ones, are in tree_all from where they can be recovered.

# Prepare a tibble with circumferences
trees_all %>%
  select(ID, Circumference) %>% 
  # inner_join.sf refuses sf objects
  st_set_geometry(NULL) ->
  Circumferences
# Add the Circumference of trees
trees_no_circumference %>%
  inner_join(Circumferences, by = "ID") ->
  trees

2.1.1.4 Shorter logging reasons

Logging reasons can be:

  • Decaying: the tree’s condition is not healthy enough to keep it safely in a public park.
  • infested: the tree is a maple affected by the (contagious) sooty bark disease, caused by the fungus Cryptostroma corticale.

library("stringr")
trees$Logging_reason[is.na(trees$Logging_reason)] <- ""
trees$Logging_reason %>% 
  str_replace("Arbre.*", "Decaying") %>% 
  str_replace("Foyer.*", "Infested") ->
  trees$Logging_reason

2.1.1.5 Factors

Several fields are converted to factors for efficiency.

trees$Logging_reason <- as.factor(trees$Logging_reason)
trees$Status <- as.factor(trees$Status)
trees$French_species_name <- as.factor(trees$French_species_name)

2.1.2 Point patterns

dbmms uses weighted, marked, planar point patterns (wmppp). A wmppp named trees_infested is built. Point marks are their basal area (as weight) and either their logging reason or their genus if they are alive.

library("dbmss")
trees %>%
  # Weight is the basal area
  mutate(PointWeight = Circumference^2 / 4 / pi) %>%
  mutate(
    PointType = ifelse(
      Logging_reason == "", 
      as.character(Genus), 
      as.character(Logging_reason)
    )
  ) %>%
  # Add X and Y
  bind_cols(st_coordinates(trees)) %>% 
  wmppp(
    window = as.owin(st_bbox(trees)),
    unitname = c("meter", "meters")
    ) ->
  trees_infested

We also need a point pattern to describe the park before logging, as a reference.

trees_all %>%
  # Weight is the basal area
  mutate(PointWeight = Circumference^2 / 4 / pi) %>%
  # Genus is the point type
  rename(PointType = Genus) %>%
  # Add X and Y
  bind_cols(st_coordinates(trees_all)) %>% 
  wmppp(
    window = as.owin(st_bbox(trees_all)),
    unitname = c("meter", "meters")
    ) ->
  trees_2021
autoplot(trees_2021) + 
  ggplot2::labs(size="Basal area", color="Genus")

The map shows the tree genera. Maples (Acer spp.) are the most abundant trees in the park.

2.2 Spatial concentration of maple trees

The M statistic is computed to detect the spatial concentration of maple trees before logging.

Distance <- 15
trees_2021 %>% 
   MEnvelope(
     r = 0:(10 * Distance) / 5,
     ReferenceType = "Acer",
     NeighborType = "Acer", 
     NumberOfSimulations = NumberOfSimulations
  ) ->
  M_Acer

autoplot(M_Acer)

2.2.1 Map

To map it, individual values must be calculated at the chosen distance, that is 15 meters.

trees_2021 %>% 
  Mhat(
    r = c(0, Distance),
    ReferenceType = "Acer",
    NeighborType = "Acer", 
    Individual = TRUE
  ) ->
  M_ind_Acer

A map is produced by smoothing the individual values. To respect the geometry of the map, the ratio of rows to columns of the image to produce is first calculated.

# Window ratio
ratio <- with(
  trees_infested$window, 
  {
   (yrange[2] - yrange[1]) / (xrange[2] - xrange[1])
  }
)

# Smooth the values
trees_2021 %>% 
  Smooth(
    fvind = M_ind_Acer, 
    distance = Distance, 
    Nbx = resolution, Nby = resolution * ratio
  ) -> 
  map_acer

plot(map_acer, main = "")
# Add contour lines
contour(map_acer, nlevels = 10, add = TRUE)
# Add infested trees
points(
  trees_infested[trees_infested$marks$PointType == "Infested"],
  pch = 20
)
# And decaying trees
points(
  trees_infested[trees_infested$marks$PointType == "Decaying"],
  pch = 4,
  col = "green"
)

Logged trees are shown on the map: infested trees are back points and decaying trees are green crosses.

2.2.2 Alternative distances

The 15-meter distance has been chosen because it is a plausible distance of contagion between trees. A shorter distance, say 6 meters, would be justified as a peak of M(distance) but neighbors are scarse at this scale, leading to less statistical power. At greater distances, such as 30m, concentration is still significant but neighborhoods overlap more, leading to less clear results.

Both alternative distances are tested here.

The map made with 6-meter-radius neighborhoods is similar to the previous one.

30-meter-radius neighborhoods are shown below:

Increasing the distance blurs the information where high and low values overlap and cancel out.

In conclusion, the appropriate distance should be chosen according to the knowledge of the studied process.

2.3 Contagion and spatial structure

To conclude this study, we show that infested trees are located in areas of high concentration of maples.

2.3.1 Location of infested maple trees

# Window ratio
ratio <- with(
  trees_infested$window, 
  {
   (yrange[2] - yrange[1]) / (xrange[2] - xrange[1])
  }
)

# Smooth the values
trees_2021 %>% 
  Smooth(
    fvind = M_ind_Acer, 
    distance = Distance, 
    Nbx = resolution, Nby = resolution * ratio
  ) -> 
  map_acer

plot(map_acer, main = "")
# Add contour lines
contour(map_acer, nlevels = 10, add = TRUE)
# Show all maple trees
trees_acer <- trees_2021[trees_2021$marks$PointType == "Acer"]
# Add infested trees
points(
  trees_infested[trees_infested$marks$PointType == "Infested"],
  pch = 20
)
# And decaying trees
points(
  trees_infested[trees_infested$marks$PointType == "Decaying"],
  pch = 4, col = "green"
)

Infested trees are present in the areas where maples are concentrated.

2.3.2 Concentration of maples around infested trees

To test the intertype concentration between sane and infested maple trees, the following \(M\)-intertype is computed: we evaluate whether the proportion of maples trees higher in the close neighborhood of infested maples than elsewhere. If so, results give support to the existence of a contagious disease (Hantsch et al. 2014).

trees_infested %>% 
   MEnvelope(
     r = 0:(10 * Distance) / 5,
     ReferenceType = "Infested",
     NeighborType = "Acer",
     NumberOfSimulations = NumberOfSimulations
  ) ->
  M_infested_Acer

autoplot(M_infested_Acer)

The M plot detects a significant concentration of maples around infested maples.

2.3.3 Map

The map is produced.

trees_infested %>% 
  Mhat(
    ReferenceType = "Infested",
    NeighborType = "Acer", 
    Individual = TRUE
  ) ->
  M_ind_infested_Acer

# Smooth the values
trees_infested %>% 
  Smooth(
    fvind = M_ind_infested_Acer, 
    distance = Distance, 
    Nbx = resolution, Nby = resolution * ratio
  ) -> 
  map_infested_acer

plot(map_infested_acer, main = "")
# Add contour lines
contour(map_infested_acer, nlevels = 5, add = TRUE)

# Add infested trees
points(
  trees_infested[trees_infested$marks$PointType == "Infested"],
  pch = 20
)
# And decaying trees
points(
  trees_infested[trees_infested$marks$PointType == "Decaying"],
  pch = 4,
  col = "green"
)

The map shows that the local concentration of maple around infested trees is generally high (warm colors around points).

References

Hantsch, Lydia, Steffen Bien, Stine Radatz, Uwe Braun, Harald Auge, and Helge Bruelheide. 2014. “Tree Diversity and the Role of Non-Host Neighbour Tree Species in Reducing Fungal Pathogen Infestation.” Edited by Peter Thrall. Journal of Ecology 102 (6): 1673–87. https://doi.org/10.1111/1365-2745.12317.
Marcon, Eric, and Florence Puech. 2023. “Mapping Distributions in Non-Homogeneous Space with Distance-Based Methods.” Journal of Spatial Econometrics 4 (1): 13. https://doi.org/10.1007/s43071-023-00042-1.
Marcon, Eric, Stéphane Traissac, Florence Puech, and Gabriel Lang. 2015. “Tools to Characterize Point Patterns: Dbmss for R.” Journal of Statistical Software 67 (3): 1–15. https://doi.org/10.18637/jss.v067.c03.
Matérn, Bertil. 1960. “Spatial Variation.” Meddelanden Från Statens Skogsforskningsinstitut 49 (5): 1–144.
R Core Team. 2023. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.