logo

# 1 Set-up ----

# load libraries
library(tidyverse) # managing data
library(ggdag) # drawing DAG
library(kableExtra) # visualising table
library(here) # handling path
library(rstan) # running Bayesian models
library(plotly) # interactive plot

# stan setup
options(mc.cores = parallel::detectCores()-1)
rstan::rstan_options(auto_write = TRUE) # speed up running time of compiled model

Introducción

El tutorial 2 exploró cómo modelizar patrones espaciales a gran escala: cómo difiere la densidad de población por grandes grupos, como por región o tipo de asentamiento. Integramos esas variaciones a gran escala en un marco bayesiano utilizando un modelo jerárquico de intercepción aleatoria para la densidad de población.

El objetivo del tutorial 3 es integrar las variaciones a pequeña escala de la densidad de población vinculadas al contexto local de los asentamientos humanos. La imagen 1 muestra cómo las covariables geoespaciales de alta resolución proporcionan información precisa sobre el contexto local.

Example of a covariate (residential area) around Onitscha, Nigeria

Imagen 1: Ejemplo de covariable (zona residencial) alrededor de Onitscha, Nigeria

La adición de covariables de alta resolución ayuda a mejorar el ajuste del modelo y orienta la predicción de la población en zonas no muestreadas.

Para que los datos se utilicen como covariables en el modelo, deben estar:

correlacionados con las diferencias de densidad de población
medidos de forma coherente y completa en todo el espacio de estudio
cartografiados con precisión como capas geoespaciales de alta resolución

Las covariables de alta resolución adecuadas para la modelización suelen ser covariables espaciales con cobertura nacional y recolección de datos coherente. Aunque la información a nivel individual o de hogares, como la recolección durante las encuestas, es útil para comprender las diferencias en las características demográficas, ese tipo de información es difícil de utilizar en los enfoques ascendentes porque solo proviene de una muestra de hogares. El objetivo principal del modelo de población es hacer una predicción completa en términos espaciales.

Objetivos

Comprender los requisitos de las covariables que deben incluirse en el modelo
Obtener una visión general de las covariables utilizadas en los modelos de población ascendentes de WorldPop
Familiarizarse con el procesamiento de covariables para el modelo de población
Integrar una regresión lineal para refinar la definición de la media de la distribución lognormal
Añadir un modelo de pendiente aleatoria
Introducir el concepto de inicialización en la estimación de un modelo MCMC

Lecturas complementarias

Para incluir una covariable espacial en la modelización y utilizarla como soporte para la predicción, utilizamos conjuntos de datos organizados en una cuadrícula, conocidos como ráster. Por lo tanto, es necesario contar con conocimientos básicos de SIG sobre gestión de archivos ráster y vectoriales. No es necesario que sea en R, puede ser en QGIS, ArcGIS, Python o cualquier software SIG de tu elección.

El objetivo de este tutorial no es el procesamiento de datos espaciales. Nos limitaremos a mencionar las técnicas de procesamiento necesarias para preparar los datos de covariables para la modelización de poblaciones.

Sin embargo, aquí presentamos algunos recursos de R sobre manipulación espacial:

La biblia de Pebesma y Bivand sobre los paquetes R sf (datos vectoriales) y stars (datos ráster) con una excelente visión general de las diferentes manipulaciones espaciales
La introducción práctica al paquete sf y raster de la Universidad de Wageningen
Un enfoque de la manipulación de datos ráster con paquetes raster y terra de Hijmans

Modelización formal

Partiremos del modelo 3 del tutorial 2, que se basa en una distribución de Poisson para modelizar el recuento de población y en una distribución lognormal con una intercepción aleatoria jerárquica por tipo de asentamiento y región para modelizar la densidad de población.

Dado que la media de la densidad de población solo se define con una intercepción aleatoria, resulta que hay 5 (número de tipo de asentamiento) x 11 (número de región) opciones para las estimaciones de densidad de población. Para añadir variaciones a pequeña escala, refinamos la media de la distribución lognormal con un modelo de regresión que integra las covariables.

Más formalmente, definamos como $X$ una matriz de tamaño del número de observaciones x número de covariables que contiene los valores de las covariables para cada sitio de estudio y $\beta$ un vector de tamaño del número de covariables. A partir de la ecuación 3 del tutorial 2 (y eliminando la distribución a priori para $\alpha_{t,r}$ por la legibilidad), definimos $\mu$ como la media de la distribución lognormal de la siguiente manera:

\[\begin{equation} population 〜 Poisson( pop\_density * settled\_area) \\ pop\_density 〜 Lognormal(\mu, \: \sigma) \\ \mu = \alpha_{t,r} + X \beta \\[10pt] \beta 〜 Normal(0,10)\tag{1} \end{equation}\]

Ten en cuenta que las distribuciones a priori para $\beta$ son distribuciones normales idénticas para cada covariable con media 0 y desviación típica 10 para evitar la introducción de cualquier patrón.

La imagen 2 muestra las relaciones dependientes actualizadas de nuestro modelo al integrar las covariables.

Dependancy graph for model 1 of tutorial 3

Imagen 2: Gráfico de dependencia del modelo 1 del tutorial 3

Revisión de las covariables utilizadas en WorldPop

Hasta la fecha, WorldPop ha elaborado cinco modelos de población ascendentes:

WorldPop. 2019. Estimaciones ascendentes en cuadrículas de la población de Nigeria, versión 1.2.
WorldPop, Universidad de Southampton. https://dx.doi.org/10.5258/SOTON/WP00655.
WorldPop (Facultad de Geografía y Ciencias Ambientales, Universidad de
Southampton). 2020. Estimaciones ascendentes en cuadrículas de la población de Zambia, versión 1.0.
https://dx.doi.org/10.5258/SOTON/WP00662
WorldPop y el Instituto Nacional de Estadística y Demografía de Burkina Faso. 2021. Estimaciones de población en cuadrículas basadas en censos para Burkina Faso (2019), versión 1.0. WorldPop, Universidad de Southampton. https://dx.doi.org/10.5258/SOTON/WP00687.
Boo G, Darin E, Leasure DR, Dooley CA, Chamberlain HR, Lazar AN, Tatem AJ. 2020.
Estimaciones de la población modelizadas en cuadrícula para las provincias de Kinsasa, Congo Central, Kwango, Kwilu y Mai-Ndombe en la República Democrática del Congo, versión 2.0.
WorldPop, Universidad de Southampton. https://dx.doi.org/10.5258/SOTON/WP00669
WorldPop. 2020. Estimaciones de la población ascendentes en cuadrícula para las provincias de Kinsasa, Congo Central, Kwango, Kwilu y Mai-Ndombe en la República Democrática del Congo, versión 1.0. WorldPop, Universidad de Southampton. https://dx.doi.org/10.5258/SOTON/WP00658

Además, actualmente se están actualizando dos modelos: Nigeria, versión 2.0 y República Democrática del Congo, versión 3.0.

Esos seis modelos abarcan una amplia gama de covariables específicas de cada país. La tabla 1 ofrece una visión general del conjunto final de covariables seleccionadas para cada modelo.

review_cov <- read_csv('tutorials/tutorial3/covs_review.csv')

review_cov %>% arrange(Type) %>% kbl(caption='Review of covariates used in WorldPop bottom-up population models') %>% kable_minimal()

Table 1: Review of covariates used in WorldPop bottom-up population models
Type	Covariate	Model	Source
Gridded population	UN-adjusted projected gridded estimates	Burkina Faso v1.0	WorldPop
Gridded population	Projected gridded estimates	Nigeria v1.2	WorldPop
Gridded population	Mean UN-adjusted projected gridded estimates within a 2km radius	Democratic Republic of Congo v1.0	WorldPop
Infrastucture	Fricition surface	Burkina Faso v1.0	Access to the Cities project
Infrastucture	Distance to secondary roads	Burkina Faso v1.0	National Geographical Office
Infrastucture	Household size	Nigeria v1.2, Nigeria v2.0	Demographic and Health Survey
Infrastucture	Residential roads density	Democratic Republic of Congo v1.0	OpenStreetMap
Infrastucture	Travel time to cities	Democratic Republic of Congo v1.0	Malaria Atlas Project
Infrastucture	Tertiary-sector activities density	Democratic Republic of Congo v1.0	OpenStreetMap
Natural feature	Distance to temporary rivers	Burkina Faso v1.0	National Geographical Office
Natural feature	Monthly variability of dry matter productivity	Democratic Republic of Congo v3.0	Copernicus
Natural feature	Monthly variability of surface air temperature	Democratic Republic of Congo v3.0	Copernicus
Natural feature	Land surface ‘roughness’ from Synthetic Aperture Radard VH	Nigeria v2.0	Sentinel-1
Natural feature	Land surface ‘roughness’ from Synthetic Aperture Radard VV	Nigeria v2.0	Sentinel-1
Settlement	Mean building count within a 5km radius	Burkina Faso v1.0	Ecopia & Maxar
Settlement	Mean building area within a 1km radius	Democratic Republic of Congo v2.0	Ecopia & Maxar
Settlement	Mean distance to nearest building within a 1km radius	Democratic Republic of Congo v2.0	Ecopia & Maxar
Settlement	Mean building count within a 1km radius	Democratic Republic of Congo v2.0	Ecopia & Maxar
Settlement	Mean building area	Zambia v1.0	Ecopia & Maxar
Settlement	Building density	Zambia v1.0	Ecopia & Maxar
Settlement	Coefficient of variation of building area	Zambia v1.0	Ecopia & Maxar
Settlement	Sum residential area within a 1km radius	Nigeria v1.2	Oak Ridge National Laboratory
Settlement	Sum nonresidential area within a 1km radius	Nigeria v1.2	Oak Ridge National Laboratory
Settlement	School density within a 1km radius	Nigeria v1.2	Oak Ridge National Laboratory
Settlement	Mean building perimeter	Democratic Republic of Congo v3.0	Ecopia & Maxar
Settlement	Compactness of building	Democratic Republic of Congo v3.0	Ecopia & Maxar

Las covariables seleccionadas pueden clasificarse a grandes rasgos como descriptivas de cuatro impulsores principales de la variación de la densidad de población local:

la distribución espacial previa de la población mediante la desagregación descendente de la población de WorldPop (WorldPop Research Group et al. 2018)
la infraestructura a través de OpenStreetMap (colaboradores de OpenStreetMap 2018), los recursos específicos del país (por ejemplo, Institut Géographique du Burkina Faso (2015)) o los recursos modelizados (por ejemplo, el tiempo de viaje a las ciudades (Weiss et al. 2018))
las características naturales mediante productos de teledetección, como datos de radar (Sentinel-1) o la productividad de la materia seca (Copernicus)
la forma del asentamiento a través de la morfología de las huellas de los edificios (Ecopia.AI y Maxar Technologies 2019) o de las zonas pobladas (Oak Ridge National Laboratory 2018)

Otras fuentes de covariables que se consideraron (pero no se seleccionaron en los modelos finales) fueron las siguientes:

Lugares de conflicto del Proyecto de Datos de Eventos y Ubicación de Conflictos Armados
Variables climáticas de la Unidad de Investigación del Clima de la Universidad de East Anglia
Concesiones mineras activas del grupo IPIS
Clasificación de la ocupación del suelo de la Agencia Espacial Europea
Cambio forestal global de la Universidad de Maryland
Elevación y pendiente de WorldDEM
Emisiones de combustibles fósiles del proyecto Open-source Data Inventory for Anthropogenic CO2

Para construir un modelo, primero reunimos todas las covariables que pueden relacionarse con los datos específicos de nuestra población. Pueden llegar a ser más de 900. A continuación, utilizamos técnicas de análisis geoespacial para obtener una versión en cuadrícula de las covariables con idéntica resolución espacial, alineación y extensión. Se trata de volver a muestrear y recortar las covariables proporcionadas como archivos ráster o, en el caso de las covariables proporcionadas como archivos vectoriales, calcular el recuento, la densidad, la distancia a las características más cercanas o incluso técnicas de interpolación.

Ingeniería de covariables

Otros pasos de ingeniería de covariables pueden ayudar a extraer aún más información de las covariables recopiladas.

Transformación logarítmica

Considerar el logaritmo de las covariables ayuda a manejar los valores extremos.

Estadísticas focales

Las estadísticas focales consisten en resumir las covariables en una ventana móvil alrededor de cada celda de la cuadrícula. Como se observa en la tabla 1, utilizamos diferentes tamaños de ventana (1 km, 2 km o 5 km) y estadísticas de resumen (media o coeficiente de variación). Proporciona información contextual en torno a las celdas de la cuadrícula.

Normalización

Escalar la covariable (es decir, restar la media y dividir por la desviación típica) ayuda a realzar las variaciones locales significativas de la media. La escala puede incluso afinarse calculando la media y la desviación típica por región, de modo que las variaciones locales sean representativas de la región.

Nota sobre la selección de las covariables

Después de analizar las covariables recopiladas, es posible que nos encontremos con más de 1000 covariables potenciales.

Para seleccionar la mejor a efectos de predicción, generalmente utilizamos uno de los dos métodos siguientes:

correlación por pares y diagrama de dispersión con la densidad de población en el sitio de estudio
análisis univariado, probando cada covariable sucesivamente

Inclusión de covariables en el modelo

En las partes restantes del tutorial nos centraremos en los datos que descargamos de Leasure et al. (2020), que corresponde al modelo de Nigeria, versión 1.2.

Resumen de las covariables en Nigeria, versión 1.2

En el modelo de Nigeria, versión 1.2, se utilizaron seis covariables:

x_1: estimaciones de la población en cuadrículas de WorldPop Global
x_2: densidad escolar en un radio de 1 km
x_3: tamaño de los hogares mediante la interpolación de los resultados de la encuesta demográfica y de salud de 2013
x_4: zona poblada en un radio de 1 km
x_5: zona residencial en un radio de 1 km
x_6: zona poblada no residencial en un radio de 1 km

La covariable x_4 se escaló en función de su media y su desviación típica a escala nacional, mientras que las covariables x_5 y x_6 se escalaron en función de su media y su desviación típica en un radio de 50 km. En Leasure et al. se escalaron x_5 y x_6 de esta forma porque se sospechaba que los tipos de vecindario podrían no ser directamente comparables entre regiones (especialmente entre el norte y el sur de Nigeria). Esta escala también redujo la correlación con x_4.

Escalaron las estimaciones de WorldPop Global (x_1) en función de su media y su desviación típica a escala nacional. Trataron esta covariable como un indicador de las densidades de población relativas basado en las covariables geoespaciales que se utilizaron en el modelo de bosque aleatorio.

La covariable x_2 se escaló utilizando su media y su desviación típica en un radio de 50 km. Escalaron esta covariable dentro de una ventana móvil de 50 km porque lo que constituye una “alta densidad” de escuelas varía según la región y esta distinción se perdía cuando la covariable se escalaba a escala nacional. Esto también ayudó a controlar las posibles diferencias en el esfuerzo de elaborar cartografías escolares en las distintas regiones.

Escalaron x_3 en función de su media y su desviación típica a escala nacional. Una de las principales razones para incluir esta covariable fue tener en cuenta el fuerte gradiente norte-sur en el tamaño de los hogares, con un número significativamente mayor de personas por hogar en el norte de Nigeria que en el sur.

Preparación de los datos

Para integrar las covariables en el modelo, primero creamos un conjunto de datos con la media de los valores de las covariables para cada sitio de estudio utilizando estadísticas zonales.

Ten en cuenta que esto constituye un cambio en el soporte: podríamos querer comprobar si el rango de valores de las covariables a nivel del sitio de estudio es representativo de los valores de las covariables a nivel de la celda de cuadrícula.

La imagen 3 muestra la relación entre las covariables y la densidad de población a nivel del sitio de estudio. Vemos que el tamaño de los hogares (x_3) está positivamente asociado a la densidad de población. Los valores negativos se deben al método de escalado adoptado. Por el contrario, la zona poblada no residencial (x_6) está negativamente asociada a la densidad de población, lo que es de esperar: cuanto menos residencial es el entorno, menor es la densidad de población.

# 2. Covariates preparation ----
# load data
data <- readxl::read_excel(here('tutorials/data/nga_demo_data.xls'))
data <- data %>% 
  mutate(
    pop_density=N/A,
    id = as.character(1:nrow(data))
  )

# contrast covariates with pop density
data_long <- data %>% 
  pivot_longer(starts_with('x'), names_to = 'cov')

ggplot(data_long, aes(x=pop_density,y=value))+
  geom_point()+
  geom_smooth(method = "lm", se = FALSE,color='orange')+
  theme_minimal()+
  facet_wrap(.~cov, ncol=3, scales = 'free')+
  labs(x='Population density', y='')

Scatterplot of covariates values vs population density for each study site

Imagen 3: Diagrama de dispersión de los valores de las covariables frente a la densidad de población para cada sitio de estudio

Antes de implementar el modelo en stan, escalamos uniformemente las covariables a nivel del sitio de estudio, de forma que los parámetros $\beta_k$ tengan la misma escala. Primero calculamos los coeficientes de escala (media y desviación típica) de cada covariable:

# compute scaling factors (mean and sd)
covariatesScaling <- function(var){
  mean_var <- mean(var)
  sd_var <- sd(var)
  return(
    data.frame(
      'cov_mean'= mean_var,
      'cov_sd' = sd_var
    )
  )
} 

covs <- data %>% 
  select(starts_with('x'))

scale_factor <- bind_rows(apply(covs, 2, covariatesScaling))
scale_factor$cov <- colnames(covs)

scale_factor %>% select(cov, cov_mean, cov_sd) %>% kbl %>%  kable_minimal()

cov	cov_mean	cov_sd
x1	3.913963	8.071879
x2	2.950790	3.970139
x3	0.000000	1.000000
x4	5.081461	4.748466
x5	3.941279	4.377713
x6	2.729769	5.028868

A continuación, aplicamos el coeficiente de escala a las covariables:

# apply scaling factors to covariates
covs_scaled <-  covs %>% 
  mutate(cluster_id = 1:nrow(covs)) %>% 
  pivot_longer(-cluster_id,names_to = 'cov') %>% 
  left_join(scale_factor, by="cov") %>% 
  mutate(value= (value-cov_mean)/cov_sd ) %>% 
  select(-cov_mean, -cov_sd) %>% 
  pivot_wider(names_from = cov, values_from = value, id_cols = cluster_id) %>% 
  select(-cluster_id)

Almacenamos las covariables escaladas y los coeficientes de escala para la etapa de predicción (en el tutorial 4).

# save scaling factor
write_csv(covs_scaled, here('tutorials/data/covs_scaled.csv'))
write_csv(scale_factor, here('tutorials/data/scale_factor.csv'))

Aplicación del modelo

La ecuación (1) se aplica en stan de la siguiente manera:

// Model 1: Hierarchical alpha by settlement type , region + covariates
data{
  ...
  // slope
  int<lower=0> ncov; // number of covariates
  matrix[n, ncov] cov; // covariates
}
parameters{
  ...
  // slope
  row_vector[ncov] beta; 
}
transformed parameters{
  ...
  for(idx in 1:n){
    pop_density_median[idx] = alpha_t_r[type[idx], region[idx]] + sum( cov[idx,] .* beta );
  }
}
model{
  ...
  //slope
  beta ~ normal(0,10);
}
generated quantities{
  ...
   for(idx in 1:n){
    density_hat[idx] = lognormal_rng( alpha_t_r[type[idx], region[idx]] + sum(cov[idx,] .* beta), sigma );
   }
}

Observa los dos nuevos tipos de datos – matrix para los valores de las covariables y row_vector para $\beta$ –, así como el nuevo operador, .* . Un vector fila es una matriz con una fila. .* Realiza una multiplicación por elementos. Necesitamos estos elementos debido a la forma en que codificamos la regresión lineal para pop_density_median: en un bucle for que recorre cada sitio de estudio definido por su idx. Para cada sitio de estudio, extraemos los valores de las covariables correspondientes, cov[idx,], que es un vector fila. Para obtener un vector de cada valor de covariable asociado a cada parámetro $\beta_k$, beta debe ser un parámetro row_vector y la multiplicación debe realizarse elemento a elemento.

Mantenemos la misma configuración para el MCMC:

# 3. Modelling with covariates ----

# mcmc settings
chains <- 4
warmup <- 500
iter <- 500
seed <- 1789

Y añadimos las covariables a los datos de entrada en stan:

# prepare data for stan
stan_data_model1 <- list(
  population = data$N,
  n = nrow(data),
  area = data$A,
  type = data$type,
  ntype= n_distinct(data$type),
  region = data$region,
  nregion = n_distinct(data$region),
  seed=seed,
  cov = covs_scaled,
  ncov = ncol(covs_scaled)
  )

Añadimos beta como parámetro para controlar y ejecutar el modelo.

pars <- c('alpha','sigma','beta','alpha_t', 'nu_alpha', 'nu_alpha_t', 'population_hat',  'density_hat')

# mcmc
fit1 <- rstan::stan(file = file.path('tutorials/tutorial3/tutorial3_model1.stan'), 
                   data = stan_data_model1,
                   iter = warmup + iter, 
                   chains = chains,
                   warmup = warmup, 
                   pars = pars,
                   seed = seed)

## Warning in .local(object, ...): some chains had errors; consider specifying
## chains = 1 to debug

## here are whatever error messages were returned

## [[1]]
## Stan model 'tutorial3_model1' does not contain samples.

## Warning in validityMethod(object): The following variables have undefined
## values: population_hat[1],The following variables have undefined
## values: population_hat[2],The following variables have undefined
## values: population_hat[3],The following variables have undefined
## values: population_hat[4],The following variables have undefined
## values: population_hat[5],The following variables have undefined
## values: population_hat[6],The following variables have undefined
## values: population_hat[7],The following variables have undefined
## values: population_hat[8],The following variables have undefined
## values: population_hat[9],The following variables have undefined values:
## population_hat[10],The following variables have undefined values:
## population_hat[11],The following variables have undefined values:
## population_hat[12],The following variables have undefined values:
## population_hat[13],The following variables have undefined values:
## population_hat[14],The following variables have undefined values:
## population_hat[15],The following variables have undefined values:
## population_hat[16],The following variables have undefined values:
## population_hat[17],The following variables have undefined values:
## population_hat[18],The following variables have undefined values:
## population_hat[19],The following variables have undefined values:
## population_hat[20],The following variables have undefined values:
## population_hat[21],The following variables have undefined values:
## population_hat[22],The following variables have undefined values:
## population_hat[23],The following variables have undefined values:
## population_hat[24],The following variables have undefined values:
## population_hat[25],The following variables have undefined values:
## population_hat[26],The following variables have undefined values:
## population_hat[27],The following variables have undefined values:
## population_hat[28],The following variables have undefined values:
## population_hat[29],The following variables have undefined values:
## population_hat[30],The following variables have undefined values:
## population_hat[31],The following variables have undefined values:
## population_hat[32],The following variables have undefined values:
## population_hat[33],The following variables have undefined values:
## population_hat[34],The following variables have undefined values:
## population_hat[35],The following variables have undefined values:
## population_hat[36],The following variables have undefined values:
## population_hat[37],The following variables have undefined values:
## population_hat[38],The following variables have undefined values:
## population_hat[39],The following variables have undefined values:
## population_hat[40],The following variables have undefined values:
## population_hat[41],The following variables have undefined values:
## population_hat[42],The following variables have undefined values:
## population_hat[43],The following variables have undefined values:
## population_hat[44],The following variables have undefined values:
## population_hat[45],The following variables have undefined values:
## population_hat[46],The following variables have undefined values:
## population_hat[47],The following variables have undefined values:
## population_hat[48],The following variables have undefined values:
## population_hat[49],The following variables have undefined values:
## population_hat[50],The following variables have undefined values:
## population_hat[51],The following variables have undefined values:
## population_hat[52],The following variables have undefined values:
## population_hat[53],The following variables have undefined values:
## population_hat[54],The following variables have undefined values:
## population_hat[55],The following variables have undefined values:
## population_hat[56],The following variables have undefined values:
## population_hat[57],The following variables have undefined values:
## population_hat[58],The following variables have undefined values:
## population_hat[59],The following variables have undefined values:
## population_hat[60],The following variables have undefined values:
## population_hat[61],The following variables have undefined values:
## population_hat[62],The following variables have undefined values:
## population_hat[63],The following variables have undefined values:
## population_hat[64],The following variables have undefined values:
## population_hat[65],The following variables have undefined values:
## population_hat[66],The following variables have undefined values:
## population_hat[67],The following variables have undefined values:
## population_hat[68],The following variables have undefined values:
## population_hat[69],The following variables have undefined values:
## population_hat[70],The following variables have undefined values:
## population_hat[71],The following variables have undefined values:
## population_hat[72],The following variables have undefined values:
## population_hat[73],The following variables have undefined values:
## population_hat[74],The following variables have undefined values:
## population_hat[75],The following variables have undefined values:
## population_hat[76],The following variables have undefined values:
## population_hat[77],The following variables have undefined values:
## population_hat[78],The following variables have undefined values:
## population_hat[79],The following variables have undefined values:
## population_hat[80],The following variables have undefined values:
## population_hat[81],The following variables have undefined values:
## population_hat[82],The following variables have undefined values:
## population_hat[83],The following variables have undefined values:
## population_hat[84],The following variables have undefined values:
## population_hat[85],The following variables have undefined values:
## population_hat[86],The following variables have undefined values:
## population_hat[87],The following variables have undefined values:
## population_hat[88],The following variables have undefined values:
## population_hat[89],The following variables have undefined values:
## population_hat[90],The following variables have undefined values:
## population_hat[91],The following variables have undefined values:
## population_hat[92],The following variables have undefined values:
## population_hat[93],The following variables have undefined values:
## population_hat[94],The following variables have undefined values:
## population_hat[95],The following variables have undefined values:
## population_hat[96],The following variables have undefined values:
## population_hat[97],The following variables have undefined values:
## population_hat[98],The following variables have undefined values:
## population_hat[99],The following variables have undefined values:
## population_hat[100],The following variables have undefined values:
## population_hat[101],The following variables have undefined values:
## population_hat[102],The following variables have undefined values:
## population_hat[103],The following variables have undefined values:
## population_hat[104],The following variables have undefined values:
## population_hat[105],The following variables have undefined values:
## population_hat[106],The following variables have undefined values:
## population_hat[107],The following variables have undefined values:
## population_hat[108],The following variables have undefined values:
## population_hat[109],The following variables have undefined values:
## population_hat[110],The following variables have undefined values:
## population_hat[111],The following variables have undefined values:
## population_hat[112],The following variables have undefined values:
## population_hat[113],The following variables have undefined values:
## population_hat[114],The following variables have undefined values:
## population_hat[115],The following variables have undefined values:
## population_hat[116],The following variables have undefined values:
## population_hat[117],The following variables have undefined values:
## population_hat[118],The following variables have undefined values:
## population_hat[119],The following variables have undefined values:
## population_hat[120],The following variables have undefined values:
## population_hat[121],The following variables have undefined values:
## population_hat[122],Th

El modelo se enfrenta a problemas de convergencia. El mensaje de error que se devuelve no es muy informativo. Sin embargo, el panel Viewer de Rstudio contiene más información, a saber:

Cadena 2: Excepción: lognormal_lpdf: El parámetro de escala es 0, ¡pero debe ser > 0!.

Significa que la combinación de $\alpha$, $\beta$ y el valor de las covariables que se está probando actualmente conduce a algunos ceros en el término de escala de la distribución lognormal, lo cual está prohibido.

Nota sobre la inicialización

Esta es una oportunidad para hablar de la inicialización. Las simulaciones MCMC comienzan a explorar el espacio de parámetros a partir de un valor inicial.

Este valor inicial se controla en stan mediante la opción init. Su valor por defecto es aleatorio, es decir: “Deja que Stan genere valores iniciales aleatorios para todos los parámetros. El patrón del generador de números aleatorios utilizado por Stan puede especificarse mediante el argumento seed. Si el patrón de Stan es fijo, se utilizan los mismos valores iniciales. Por defecto se generan aleatoriamente valores iniciales entre -2 y 2 en el soporte sin restricciones”.

Definir los valores iniciales ayuda al algoritmo a empezar cerca de la región de interés, de forma que no se pierde tiempo explorando una zona del espacio de parámetros que sabemos que no se ajusta a los valores probables y que podría encontrarse con una combinación de parámetros que no se ajusta a la estructura de nuestro modelo. Ten en cuenta que la inicialización no es restrictiva, sino que solo da una pista al algoritmo.

Solo inicializamos los parámetros raíz. Los parámetros raíz son parámetros que no dependen de otros parámetros. Los parámetros dependientes heredarán entonces la inicialización.

# add initialisation inits.out <- list() set.seed(stan_data_model1$seed) for (c in 1:chains){ inits.i <- list() inits.i$sigma <- runif(1, 0.4, 0.8) inits.i$alpha <- runif(1, 3, 6) inits.i$beta <- runif(stan_data_model1$ncov, -1, 1) inits.out[[c]] <- inits.i }

Obsérvese que definimos valores iniciales para cada cadena. Nos basamos en los valores estimados en modelos anteriores y añadimos algunas fluctuaciones aleatorias.

Ejecutamos la estimación con estos valores de inicialización.

fit1bis <- rstan::stan(file = file.path('tutorials/tutorial3/tutorial3_model1.stan'), data = stan_data_model1, iter = warmup + iter, chains = chains, warmup = warmup, pars = pars, seed = seed, init= inits.out)

## Warning in validityMethod(object): The following variables have undefined ## values: population_hat[1],The following variables have undefined ## values: population_hat[2],The following variables have undefined ## values: population_hat[3],The following variables have undefined ## values: population_hat[4],The following variables have undefined ## values: population_hat[5],The following variables have undefined ## values: population_hat[6],The following variables have undefined ## values: population_hat[7],The following variables have undefined ## values: population_hat[8],The following variables have undefined ## values: population_hat[9],The following variables have undefined values: ## population_hat[10],The following variables have undefined values: ## population_hat[11],The following variables have undefined values: ## population_hat[12],The following variables have undefined values: ## population_hat[13],The following variables have undefined values: ## population_hat[14],The following variables have undefined values: ## population_hat[15],The following variables have undefined values: ## population_hat[16],The following variables have undefined values: ## population_hat[17],The following variables have undefined values: ## population_hat[18],The following variables have undefined values: ## population_hat[19],The following variables have undefined values: ## population_hat[20],The following variables have undefined values: ## population_hat[21],The following variables have undefined values: ## population_hat[22],The following variables have undefined values: ## population_hat[23],The following variables have undefined values: ## population_hat[24],The following variables have undefined values: ## population_hat[25],The following variables have undefined values: ## population_hat[26],The following variables have undefined values: ## population_hat[27],The following variables have undefined values: ## population_hat[28],The following variables have undefined values: ## population_hat[29],The following variables have undefined values: ## population_hat[30],The following variables have undefined values: ## population_hat[31],The following variables have undefined values: ## population_hat[32],The following variables have undefined values: ## population_hat[33],The following variables have undefined values: ## population_hat[34],The following variables have undefined values: ## population_hat[35],The following variables have undefined values: ## population_hat[36],The following variables have undefined values: ## population_hat[37],The following variables have undefined values: ## population_hat[38],The following variables have undefined values: ## population_hat[39],The following variables have undefined values: ## population_hat[40],The following variables have undefined values: ## population_hat[41],The following variables have undefined values: ## population_hat[42],The following variables have undefined values: ## population_hat[43],The following variables have undefined values: ## population_hat[44],The following variables have undefined values: ## population_hat[45],The following variables have undefined values: ## population_hat[46],The following variables have undefined values: ## population_hat[47],The following variables have undefined values: ## population_hat[48],The following variables have undefined values: ## population_hat[49],The following variables have undefined values: ## population_hat[50],The following variables have undefined values: ## population_hat[51],The following variables have undefined values: ## population_hat[52],The following variables have undefined values: ## population_hat[53],The following variables have undefined values: ## population_hat[54],The following variables have undefined values: ## population_hat[55],The following variables have undefined values: ## population_hat[56],The following variables have undefined values: ## population_hat[57],The following variables have undefined values: ## population_hat[58],The following variables have undefined values: ## population_hat[59],The following variables have undefined values: ## population_hat[60],The following variables have undefined values: ## population_hat[61],The following variables have undefined values: ## population_hat[62],The following variables have undefined values: ## population_hat[63],The following variables have undefined values: ## population_hat[64],The following variables have undefined values: ## population_hat[65],The following variables have undefined values: ## population_hat[66],The following variables have undefined values: ## population_hat[67],The following variables have undefined values: ## population_hat[68],The following variables have undefined values: ## population_hat[69],The following variables have undefined values: ## population_hat[70],The following variables have undefined values: ## population_hat[71],The following variables have undefined values: ## population_hat[72],The following variables have undefined values: ## population_hat[73],The following variables have undefined values: ## population_hat[74],The following variables have undefined values: ## population_hat[75],The following variables have undefined values: ## population_hat[76],The following variables have undefined values: ## population_hat[77],The following variables have undefined values: ## population_hat[78],The following variables have undefined values: ## population_hat[79],The following variables have undefined values: ## population_hat[80],The following variables have undefined values: ## population_hat[81],The following variables have undefined values: ## population_hat[82],The following variables have undefined values: ## population_hat[83],The following variables have undefined values: ## population_hat[84],The following variables have undefined values: ## population_hat[85],The following variables have undefined values: ## population_hat[86],The following variables have undefined values: ## population_hat[87],The following variables have undefined values: ## population_hat[88],The following variables have undefined values: ## population_hat[89],The following variables have undefined values: ## population_hat[90],The following variables have undefined values: ## population_hat[91],The following variables have undefined values: ## population_hat[92],The following variables have undefined values: ## population_hat[93],The following variables have undefined values: ## population_hat[94],The following variables have undefined values: ## population_hat[95],The following variables have undefined values: ## population_hat[96],The following variables have undefined values: ## population_hat[97],The following variables have undefined values: ## population_hat[98],The following variables have undefined values: ## population_hat[99],The following variables have undefined values: ## population_hat[100],The following variables have undefined values: ## population_hat[101],The following variables have undefined values: ## population_hat[102],The following variables have undefined values: ## population_hat[103],The following variables have undefined values: ## population_hat[104],The following variables have undefined values: ## population_hat[105],The following variables have undefined values: ## population_hat[106],The following variables have undefined values: ## population_hat[107],The following variables have undefined values: ## population_hat[108],The following variables have undefined values: ## population_hat[109],The following variables have undefined values: ## population_hat[110],The following variables have undefined values: ## population_hat[111],The following variables have undefined values: ## population_hat[112],The following variables have undefined values: ## population_hat[113],The following variables have undefined values: ## population_hat[114],The following variables have undefined values: ## population_hat[115],The following variables have undefined values: ## population_hat[116],The following variables have undefined values: ## population_hat[117],The following variables have undefined values: ## population_hat[118],The following variables have undefined values: ## population_hat[119],The following variables have undefined values: ## population_hat[120],The following variables have undefined values: ## population_hat[121],The following variables have undefined values: ## population_hat[122],Th

El mensaje de error anterior ya no aparece y el modelo contiene muestras. Seguimos observando el mismo problema del tutorial 2, el desbordamiento de enteros para la predicción del recuento de población en el período de calentamiento. Podemos afirmar sin temor a equivocarnos que el modelo ha convergido.

Podemos trazar el $\hat\beta_k$. El signo y la magnitud de los efectos de las covariables coinciden con la asociación mostrada en la imagen 3.

# plot beta estimation stan_plot(fit1, pars='beta', fill_color='orange')

Comparación de la predicción con el modelo anterior

La siguiente pregunta es: ¿Qué mejoras aporta la integración de covariables en el modelo?

Cargamos el modelo 3 del tutorial 2 para comparar:

# load tutorial 2 final model fit0 <- readRDS('tutorials/tutorial2/tutorial2_model3_fit.rds')

A continuación, podemos calcular las predicciones para cada sitio de estudio y comparar la bondad de ajuste al añadir las covariables.

# extract predictions getPopPredictions <- function(model_fit, estimate='population_hat', obs='N', reference_data=data){ # extract predictions predicted_pop <- as_tibble(extract(model_fit, estimate)[[estimate]]) colnames(predicted_pop) <- reference_data$id # summarise predictions predicted_pop <- predicted_pop %>% pivot_longer(everything(),names_to = 'id', values_to = 'predicted') %>% group_by(id) %>% summarise( across(everything(), list(mean=~mean(.), upper=~quantile(., probs=0.975), lower=~quantile(., probs=0.025))) ) %>% left_join(reference_data %>% rename('reference'=all_of(obs)) %>% select(id, reference), by = 'id')%>% mutate( residual= predicted_mean - reference, ci_size = predicted_upper- predicted_lower, estimate = estimate ) return(predicted_pop) } comparison_df <- rbind( getPopPredictions(fit0) %>% mutate(Model='Without covariates'), getPopPredictions(fit1) %>% mutate(Model='With covariates')) # compute goodness-of-fit metrics comparison_df %>% group_by(Model) %>% summarise( `Bias`= mean(residual), `Inaccuracy` = mean(abs(residual)), `Imprecision` = sd(residual) ) %>% kbl(caption = 'Goodness-of-metrics comparison with and without covariates ') %>% kable_minimal()

Table 2: Goodness-of-metrics comparison with and without covariates

Model Bias Inaccuracy Imprecision

With covariates 33.12621 193.7379 276.0208

Without covariates 35.19189 200.0586 284.0779

Vemos una mejora en todas las métricas de bondad de ajuste.

Table 2: Goodness-of-metrics comparison with and without covariates
Model	Bias	Inaccuracy	Imprecision
With covariates	33.12621	193.7379	276.0208
Without covariates	35.19189	200.0586	284.0779

Efecto del agrupamiento y de las covariables: un modelo de pendiente aleatoria

En el tutorial 2 hemos visto que un modelo que se ajusta a un parámetro $\alpha$ único para todas las observaciones podría mejorarse dividiendo las observaciones en grupos que compartan un patrón similar de densidad de población.

La idea es similar con $\beta$: algunos efectos de las covariables pueden variar según el grupo. Ejemplo: x_4, la suma de la zona poblada en un radio de 1 km, podría tener un mayor poder predictivo en las zonas rurales que en las urbanas. Las relaciones de diferencia entre la covariable y la densidad de población por tipo de asentamiento se destacan en la imagen 4.

# 4. Modelling covariates with random slope ---- ggplot(data_long %>% group_by(type) %>% mutate( type = paste0(type,' n=' ,n()), type=as.factor(type)), aes(x=pop_density,y=value, color=type))+ geom_point()+ geom_smooth(method = "lm", se = FALSE)+ theme_minimal()+ facet_wrap(.~cov, ncol=3, scales = 'free')+ labs(y='', x='Population density', color='Settlement type')

Imagen 4: Diagrama de dispersión de las covariables frente a la densidad de población por tipo de asentamiento

La modelización de $\beta_k$ por tipo de asentamiento se denomina modelo de pendiente aleatoria.

Pregunta: ¿Queremos modelizar $\beta_k$ jerárquicamente?

Haz clic para ver la solución

La modelización jerárquica de $\beta$ implica asumir que existe un patrón nacional con refinamiento subnacional. Sin embargo, los parámetros $\beta_{k,t}$ pueden tener direcciones opuestas (véase la imagen 4), lo que contradice un parámetro global $\beta$ común.

Formalmente, un modelo de pendiente aleatoria se escribe de la siguiente manera:
\[\begin{equation} population 〜 Poisson( pop\_density * settled\_area) \\ pop\_density 〜 Lognormal(\mu, \: \sigma) \\[10pt] \mu = \alpha_{t,r} + \beta^{random}_t X^{random} + \beta^{fixed} X^{fixed} \\[15pt] \beta^{random}_t 〜 Normal(0,10) \\ \beta^{fixed} 〜 Normal(0,10) \end{equation}\]
La diferencia puede verse en la indexación: $\beta^{random}$ está indexado por $t$. De forma similar al marco sin agrupación del tutorial 2, establecemos las distribuciones a priori como independientes, a saber, normales (0, 10).

La aplicación en stan es la siguiente:

// Model 1: Hierarchical alpha by settlement type , region + covariates data{ ... // fixed slope int<lower=0> ncov_fixed; // number of covariates -1 matrix[n, ncov_fixed] cov_fixed; // covariates // random slope vector[n] cov_random; } parameters{ ... // slope row_vector[ncov_fixed] beta_fixed; vector[ntype] beta_random; } transformed parameters{ ... vector[n] beta; for(idx in 1:n){ beta[idx] = sum( cov_fixed[idx,] .* beta_fixed) + cov_random[idx] * beta_random[type[idx]]; pop_density_median[idx] = alpha_t_r[type[idx], region[idx]] + beta[idx]; } } model{ ... //slope beta_fixed ~ normal(0,10); beta_random ~ normal(0,10); } generated quantities{ ... vector[n] beta_hat; for(idx in 1:n){ beta_hat[idx] = sum( cov_fixed[idx,] .* beta_fixed) + cov_random[idx] * beta_random[type[idx]]; density_hat[idx] = lognormal_rng( alpha_t_r[type[idx], region[idx]] + beta_hat[idx], sigma ); ... }

Ten en cuenta que escribimos el código para modelizar solo una covariable aleatoria, de tal manera que beta_random es solo un vector que contiene el parámetro $\beta^{random}_t$. Para implementar varios efectos aleatorios necesitaríamos una matriz (tipo de asentamiento x número de covariables).

Para ejecutar el modelo, distinguimos en los datos de entrada entre las covariables que son fijas y las que son aleatorias. Elegimos x_4, la suma de la zona poblada en un radio de 1 km, para modelizarla con un efecto aleatorio.

Ten en cuenta que esta configuración permite probar el modelo con diferentes covariables candidatas para el efecto aleatorio.

# prepare stan data stan_data_model2 <- list( population = data$N, n = nrow(data), area = data$A, type = data$type, ntype= n_distinct(data$type), region = data$region, nregion = n_distinct(data$region), seed=seed, cov_fixed = covs_scaled %>% select(-x4), ncov_fixed = ncol(covs_scaled) -1, cov_random = covs_scaled$x4 ) pars <- c('alpha','sigma','beta_fixed','beta_random','alpha_t','alpha_t_r', 'nu_alpha', 'nu_alpha_t', 'population_hat', 'density_hat') # initialise inits.out <- list() set.seed(stan_data_model2$seed) for (c in 1:chains){ inits.i <- list() # intercept inits.i$sigma <- runif(1, 0.4, 0.8) inits.i$alpha <- runif(1, 3, 6) inits.i$beta_fixed <- runif(stan_data_model2$ncov_fixed, -1, 1) inits.i$beta_random <- runif(stan_data_model2$ntype, -1, 1) inits.out[[c]] <- inits.i } # mcmc fit2 <- rstan::stan(file = file.path('tutorials/tutorial3/tutorial3_model2.stan'), data = stan_data_model2, iter = warmup + iter, chains = chains, warmup = warmup, pars = pars, seed = seed, init = inits.out)

## Warning in validityMethod(object): The following variables have undefined ## values: population_hat[1],The following variables have undefined ## values: population_hat[2],The following variables have undefined ## values: population_hat[3],The following variables have undefined ## values: population_hat[4],The following variables have undefined ## values: population_hat[5],The following variables have undefined ## values: population_hat[6],The following variables have undefined ## values: population_hat[7],The following variables have undefined ## values: population_hat[8],The following variables have undefined ## values: population_hat[9],The following variables have undefined values: ## population_hat[10],The following variables have undefined values: ## population_hat[11],The following variables have undefined values: ## population_hat[12],The following variables have undefined values: ## population_hat[13],The following variables have undefined values: ## population_hat[14],The following variables have undefined values: ## population_hat[15],The following variables have undefined values: ## population_hat[16],The following variables have undefined values: ## population_hat[17],The following variables have undefined values: ## population_hat[18],The following variables have undefined values: ## population_hat[19],The following variables have undefined values: ## population_hat[20],The following variables have undefined values: ## population_hat[21],The following variables have undefined values: ## population_hat[22],The following variables have undefined values: ## population_hat[23],The following variables have undefined values: ## population_hat[24],The following variables have undefined values: ## population_hat[25],The following variables have undefined values: ## population_hat[26],The following variables have undefined values: ## population_hat[27],The following variables have undefined values: ## population_hat[28],The following variables have undefined values: ## population_hat[29],The following variables have undefined values: ## population_hat[30],The following variables have undefined values: ## population_hat[31],The following variables have undefined values: ## population_hat[32],The following variables have undefined values: ## population_hat[33],The following variables have undefined values: ## population_hat[34],The following variables have undefined values: ## population_hat[35],The following variables have undefined values: ## population_hat[36],The following variables have undefined values: ## population_hat[37],The following variables have undefined values: ## population_hat[38],The following variables have undefined values: ## population_hat[39],The following variables have undefined values: ## population_hat[40],The following variables have undefined values: ## population_hat[41],The following variables have undefined values: ## population_hat[42],The following variables have undefined values: ## population_hat[43],The following variables have undefined values: ## population_hat[44],The following variables have undefined values: ## population_hat[45],The following variables have undefined values: ## population_hat[46],The following variables have undefined values: ## population_hat[47],The following variables have undefined values: ## population_hat[48],The following variables have undefined values: ## population_hat[49],The following variables have undefined values: ## population_hat[50],The following variables have undefined values: ## population_hat[51],The following variables have undefined values: ## population_hat[52],The following variables have undefined values: ## population_hat[53],The following variables have undefined values: ## population_hat[54],The following variables have undefined values: ## population_hat[55],The following variables have undefined values: ## population_hat[56],The following variables have undefined values: ## population_hat[57],The following variables have undefined values: ## population_hat[58],The following variables have undefined values: ## population_hat[59],The following variables have undefined values: ## population_hat[60],The following variables have undefined values: ## population_hat[61],The following variables have undefined values: ## population_hat[62],The following variables have undefined values: ## population_hat[63],The following variables have undefined values: ## population_hat[64],The following variables have undefined values: ## population_hat[65],The following variables have undefined values: ## population_hat[66],The following variables have undefined values: ## population_hat[67],The following variables have undefined values: ## population_hat[68],The following variables have undefined values: ## population_hat[69],The following variables have undefined values: ## population_hat[70],The following variables have undefined values: ## population_hat[71],The following variables have undefined values: ## population_hat[72],The following variables have undefined values: ## population_hat[73],The following variables have undefined values: ## population_hat[74],The following variables have undefined values: ## population_hat[75],The following variables have undefined values: ## population_hat[76],The following variables have undefined values: ## population_hat[77],The following variables have undefined values: ## population_hat[78],The following variables have undefined values: ## population_hat[79],The following variables have undefined values: ## population_hat[80],The following variables have undefined values: ## population_hat[81],The following variables have undefined values: ## population_hat[82],The following variables have undefined values: ## population_hat[83],The following variables have undefined values: ## population_hat[84],The following variables have undefined values: ## population_hat[85],The following variables have undefined values: ## population_hat[86],The following variables have undefined values: ## population_hat[87],The following variables have undefined values: ## population_hat[88],The following variables have undefined values: ## population_hat[89],The following variables have undefined values: ## population_hat[90],The following variables have undefined values: ## population_hat[91],The following variables have undefined values: ## population_hat[92],The following variables have undefined values: ## population_hat[93],The following variables have undefined values: ## population_hat[94],The following variables have undefined values: ## population_hat[95],The following variables have undefined values: ## population_hat[96],The following variables have undefined values: ## population_hat[97],The following variables have undefined values: ## population_hat[98],The following variables have undefined values: ## population_hat[99],The following variables have undefined values: ## population_hat[100],The following variables have undefined values: ## population_hat[101],The following variables have undefined values: ## population_hat[102],The following variables have undefined values: ## population_hat[103],The following variables have undefined values: ## population_hat[104],The following variables have undefined values: ## population_hat[105],The following variables have undefined values: ## population_hat[106],The following variables have undefined values: ## population_hat[107],The following variables have undefined values: ## population_hat[108],The following variables have undefined values: ## population_hat[109],The following variables have undefined values: ## population_hat[110],The following variables have undefined values: ## population_hat[111],The following variables have undefined values: ## population_hat[112],The following variables have undefined values: ## population_hat[113],The following variables have undefined values: ## population_hat[114],The following variables have undefined values: ## population_hat[115],The following variables have undefined values: ## population_hat[116],The following variables have undefined values: ## population_hat[117],The following variables have undefined values: ## population_hat[118],The following variables have undefined values: ## population_hat[119],The following variables have undefined values: ## population_hat[120],The following variables have undefined values: ## population_hat[121],The following variables have undefined values: ## population_hat[122],Th

No hay problemas de convergencia. Trazamos el vector beta_random con un parámetro $\hat\beta_t$ para cada tipo de asentamiento.

# plot beta estimation stan_plot(fit2, pars='beta_random', fill_color='orange')+ # add alpha from tutorial 1 geom_vline(xintercept=-0.006515444, size=1.5, linetype=2)+ annotate('text', x=0.1, y=5.7, label="beta for cov 4 \nfrom first model")

Vemos que la modelización de beta^{x4}\) por tipo de asentamiento revela patrones diferentes: observamos un efecto no significativo para los asentamientos 1 y 3, los tipos más urbanizados. Esto era de esperar, ya que es probable que la suma de la zona poblada sea homogénea en toda la zona urbanizada. Vemos también que la anterior estimación $\beta_4$ enmascaraba el efecto en dirección opuesta entre el asentamiento 2 y el asentamiento 4, 5.

Ahora queremos evaluar el efecto sobre el recuento de población previsto para cada sitio de estudio.

# extract predictions comparison_df <- rbind( getPopPredictions(fit1) %>% mutate(model='Fixed effect'), getPopPredictions(fit2) %>% mutate(model='Random effect in x4')) # compute goodness-of-fit metrics comparison_df %>% group_by(model) %>% summarise( `Bias`= mean(residual), `Inaccuracy` = mean(abs(residual)), `Imprecision` = sd(residual) ) %>% kbl(caption = 'Goodness-of-metrics comparison with and without random effect in x4 ') %>% kable_minimal()

Table 3: Goodness-of-metrics comparison with and without random effect in x4

model Bias Inaccuracy Imprecision

Fixed effect 33.12621 193.7379 276.0208

Random effect in x4 33.44278 192.9332 275.6143

Se observa una ligera disminución del patrón y un aumento de la precisión de las estimaciones.

Guardaremos los resultados de este modelo final como un archivo RDS para analizarlo en el tutorial 4.

# save model saveRDS(fit2, 'tutorials/tutorial3/tutorial3_model2_fit.rds')

Agradecimientos

Este tutorial ha sido redactado por Edith Darin, de WorldPop, Universidad de Southampton, y Douglas Leasure, del Leverhulme Centre for Demographic Science, Universidad de Oxford, con la supervisión de Andrew Tatem, de WorldPop, Universidad de Southampton. Este trabajo ha sido financiado por el Fondo de Población de las Naciones Unidas (UNFPA, por sus siglas en inglés).

Licencia

El presente documento puede redistribuirse libremente bajo los términos de una licencia Creative Commons Atribución-SinDerivadas 4.0 Internacional (CC BY-ND 4.0).

Referencias

Ecopia.AI y Maxar Technologies. 2019. Digitize Africa Data.

Institut Géographique du Burkina Faso. 2015. “Base Nationale de Données Topographiques”.

Leasure, Douglas R, Warren C Jochem, Eric M Weber, Vincent Seaman y Andrew J Tatem. 2020. “National population mapping from sparse survey data: A Hierarchical Bayesian Modeling Framework to Account for Uncertainty”. Proceedings of the National Academy of Sciences.

Laboratorio Nacional Oak Ridge. 2018. “LandScan HD: Nigeria”.

Colaboradores de OpenStreetMap. 2018. Planet Dump obtenido en Https://Planet.osm.org.

Weiss, D J, A Nelson, HS Gibson, W Temperley, S Peedell, A Lieber, M Hancher, et al. 2018. “A Global Map of Travel Time to Cities to Assess Inequalities in Accessibility in 2015”. Nature 553 (7688): 333336.

Grupo de investigación WorldPop, Departamento de Geografía y Geociencias, Universidad de Louisville, Departamento de Geografía, Universidad de Namur, y Centro para la Red Internacional de Información sobre Ciencias de la Tierra (CIESIN, por sus siglas en inglés), Universidad de Columbia. 2018. “Global High Resolution Population Denominators Project - Funded by the Bill and Melinda Gates Foundation (Opp1134076)”. https://doi.org/10.5258/SOTON/WP00645.

Modelización de la población para apoyar los censos

Tutorial 3: ¿Cómo modelizar las variaciones espaciales a pequeña escala?

Elaborado el 11/2/2022