Introducción

En esta sección, nos basamos en las lecciones aprendidas en los tutoriales 1 a 4 para modelizar un nuevo recuento de población. Destacaremos la importancia de elaborar una descripción sobre el proceso de generación de datos para apoyar el flujo de trabajo de modelización. También hablaremos de los problemas de estimación que plantean los modelos jerárquicos complejos y que pueden abordarse con un cambio en la parametrización del modelo.

Objetivos

  1. Desarrollar paso a paso un flujo de trabajo de modelización para el recuento de población

  2. Comprender el proceso iterativo de modelización basado en la evaluación de las limitaciones de los modelos

  3. Desentrañar el impacto de la parametrización en la convergencia de las estimaciones

Lecturas complementarias

La lista de lecturas incluye todas las referencias mencionadas en los tutoriales 1-4. Un recurso adicional muy valioso para pasar al siguiente nivel en la modelización bayesiana en Stan son los escritos del Dr. Michael Betancourt.

Los artículos clave relacionados con nuestros modelos de población son:

Flujo de trabajo de modelización

Ahora aplicaremos nuestro aprendizaje al nuevo conjunto de datos. Primero configuramos la sesión para ejecutar los modelos en Stan.

# stan set-up
options(mc.cores = parallel::detectCores()-1)
rstan_options(auto_write = TRUE)
chains <- 4
iter <- 500
warmup <- 300
seed <- 12345677

Echamos un vistazo al nuevo conjunto de datos que contiene 1000 observaciones.

observations <- read_csv(here('tutorials', 'workflow','observations.csv'),
                         col_types = "nn")

# Descriptive statistics --------------------------------------------------
hist(observations$pop)

La distribución de los recuentos de población en general sigue una curva de campana centrada en torno a 550 personas por grupo.

Ajustar un modelo de Poisson

Nuestra variable objetivo, la población, es una variable de recuento. En primer lugar, la modelizamos mediante una distribución de Poisson que tiene un único parámetro, la tasa de Poisson, que define tanto la media como la varianza.

Para definir la distribución a priori de la tasa de Poisson, utilizamos una media normal para restringirla a valores positivos y centrarla en torno a 500 con una varianza bastante amplia de 50. Simulamos esta distribución a priori para ver si los resultados parecen razonables.

# Prior predictive check
alpha_national_simulated <- abs(rnorm(1000, 500, 50))
pop_poisson_simulated <- sapply(alpha_national_simulated, function(x) rpois(1, x))

hist(pop_poisson_simulated)

Este modelo a priori indica una distribución del recuento de población centrada en torno a 500 y como límite 300 y 700 personas por grupo.

Este primer modelo puede resumirse mediante un gráfico como el siguiente (en el círculo verde los datos de entrada, en el cuadrado naranja el parámetro):

Poisson model

Imagen 1: Modelo de Poisson

Ajustamos el modelo.

# fit the model
input_data <- list(
  n_obs =  nrow(observations),
  pop = observations$pop
)
pars_poisson <- c('alpha_national', 'pop_post_pred')

fit_poisson <- stan(
  file= here('tutorials', 'workflow', 'poisson.stan'),
  data= input_data,
  iter = iter + warmup,
  warmup = warmup,
  seed = seed,
  pars = pars_poisson
)

# extract model outputs
samples_poisson <- rstan::extract(fit_poisson)

Stan no nos advirtió de ningún problema de convergencia. Volvemos a comprobarlo con un trazado del parámetro único alpha_national.

# visualise convergence
traceplot(fit_poisson, pars = 'alpha_national')

El trazado muestra que las cuatro cadenas convergen y son estables.

Comparamos el parámetro estimado a posteriori con el parámetro a priori, lo que se conoce como comprobación retrodictiva a priori.

# prior retrodictive check
alpha_national <- tibble(
  posterior = samples_poisson$alpha_national,
  prior = abs(rnorm(chains*iter, 500, 50)),
  iter = 1:(chains*iter)
) %>% 
  pivot_longer(-iter, names_to = 'distribution')

ggplot(alpha_national, aes(x=value, fill=distribution))+
  geom_histogram(binwidth = 5)+
  theme_minimal()

Vemos que nuestro parámetro a priori está en línea con el parámetro a posteriori y no limita en exceso la estimación.

Por último, comparamos los recuentos de población previstos con los observados mediante una comprobación predictiva a posteriori.

# posterior predictive check
pop_posterior <- tibble(
  source = factor(c(rep('predicted', iter*chains*input_data$n_obs), 
                    rep('Observed', input_data$n_obs)), 
                  levels = c('predicted', 'Observed')),
  value= c(as.vector(samples_poisson$pop_post_pred), input_data$pop)
)


ggplot(pop_posterior, aes(x=value, fill=source, after_stat(density)))+
  geom_histogram(binwidth=5, position='identity', alpha=0.7)+
  theme_minimal()

Vemos que los recuentos de población previstos están excesivamente concentrados en torno a la media y no tienen en cuenta la mayor dispersión de las observaciones. Esto se debe a la distribución de Poisson, que tiene un único parámetro que rige tanto su media como su varianza.

Ajustar un modelo Poisson lognormal

Para tener en cuenta la dispersión excesiva, utilizamos un segundo componente de modelización: una distribución lognormal con parámetros para la media, \(alpha_{national}\), y la varianza, \(sigma\). Definimos sus parámetros a priori para obtener una distribución de población razonable: para alpha una media normal centrada en log(500) –similar al parámetro a priori previo para la tasa de Poisson, pero en la escala logarítmica– y para sigma una media normal centrada en 0,3 (la desviación estándar del logaritmo de las observaciones es 0,2).

# Prior predictive check
alpha_simulated <- abs(rnorm(1000, log(500), 0.1))
sigma_simulated <- abs(rnorm(1000, 0.3, 0.1))
lambda_simulated <- mapply( function(x,y) rlnorm(1, x, y), alpha_simulated, sigma_simulated )
pop_lognormal_simulated <- sapply(lambda_simulated, function(x) rpois(1, x))
comp_pop <- tibble(
  model = factor(c(rep('Poisson', length(pop_poisson_simulated)), 
                   rep('Poisson-lognormal',length(pop_lognormal_simulated))),
                 levels=c('Poisson-lognormal','Poisson')),
  value= c(pop_poisson_simulated,pop_lognormal_simulated)
)

ggplot(comp_pop, aes(x=value, fill=model))+
  geom_histogram(bins = 100,position='identity', alpha=0.8)+
  theme_minimal()+
  labs(fill='Prior model', x='Simulated population counts from the prior model', y='')

El modelo de distribución a priori Poisson lognormal tiene en cuenta más variaciones que el modelo de distribución a priori Poisson.

El modelo Poisson lognormal puede resumirse en el siguiente gráfico:
Poisson-Lognormal model

Imagen 2: Modelo Poisson lognormal

Ejecutamos el modelo.

# run the model
pars_lognormal <- c('alpha_national', 'sigma', 'pop_post_pred')

fit_poisson_lognormal <- stan(
  file= here('tutorials', 'workflow','poisson_lognormal.stan'),
  data= input_data,
  iter = iter + warmup,
  warmup = warmup,
  seed = seed,
  pars = pars_lognormal
)

No se emiten advertencias. Volvemos a comprobarlo con trazados de los parámetros clave:

traceplot(fit_poisson_lognormal, pars = 'alpha_national')+
traceplot(fit_poisson_lognormal, pars = 'sigma')

Comparamos los parámetros a posteriori con los parámetros a priori.

samples_poisson_lognormal <- rstan::extract(fit_poisson_lognormal)

# prior retrodictive check
alpha_national <- tibble(
  posterior = samples_poisson_lognormal$alpha_national,
  prior = abs(rnorm(chains*iter, log(500), 0.1)),
  iter = 1:(chains*iter)
) %>% 
  pivot_longer(-iter, names_to = 'distribution')

sigma <- tibble(
  posterior = samples_poisson_lognormal$sigma,
  prior = abs(rnorm(chains*iter, 0.3, 0.1)),
  iter = 1:(chains*iter)) %>% 
  pivot_longer(-iter, names_to = 'distribution')

ggplot(alpha_national, aes(x=value, fill=distribution))+
  geom_histogram(bins=100, position='identity', alpha=0.8)+
  theme_minimal()+
  labs(title='Prior retrodictive check for alpha national')+
 ggplot(sigma, aes(x=value, fill=distribution))+
  geom_histogram(bins=100, position='identity', alpha=0.8)+
  theme_minimal()+
  labs(title='Prior retrodictive check for sigma')

Vemos que los parámetros a priori están en línea con los parámetros a posteriori y no limitan en exceso la estimación.

Por último, comparamos los recuentos de población previstos con los observados mediante una comprobación predictiva a posteriori.

# posterior predictive check
pop_posterior_lognormal <- tibble(
  source = factor(c(rep('Predicted - Poisson', iter*chains*input_data$n_obs), 
                    rep('Predicted - Poisson-lognormal', iter*chains*input_data$n_obs),
                    rep('Observed', input_data$n_obs)), 
                  levels = c('Observed', 'Predicted - Poisson-lognormal', 'Predicted - Poisson')),
  value= c(
    as.vector(samples_poisson$pop_post_pred), 
    as.vector(samples_poisson_lognormal$pop_post_pred), 
    input_data$pop)
)

ggplot(pop_posterior_lognormal, aes(x=value, fill=source, after_stat(density)))+
  geom_histogram(bins=100, position='identity', alpha=0.7)+
  theme_minimal()+
  labs(x='Population counts')

La distribución a posteriori – o los recuentos previstos– del modelo Poisson lognormal tiene una forma que se ajusta más a los recuentos de población observados. Sin embargo, vemos que nuestra distribución observada muestra dos modos distintos que no pueden abordarse con un modelo lognormal que solo tiene un único valor medio.

Ajustar un modelo de intercepción aleatoria

En nuestros datos de entrada tenemos un segundo atributo que contiene información sobre el tipo de asentamiento, 1 o 2. Contrastemos la diferencia en la distribución de la población observada en función de este atributo:

# Descriptive statistics 
ggplot(observations, aes(x=pop, fill=as_factor(settlement), after_stat(density)))+
  geom_histogram(bins=50, position='identity', alpha=0.8)+
  theme_minimal()+
  labs(x='Observed population counts', fill='Settlement')

Si se distingue la distribución de la población por tipo de asentamiento, se observa que el tipo 1 tiene claramente más población que el tipo 2.

Modelizamos esa diferencia mediante un modelo de pendiente aleatoria que estima independientemente la media de la distribución lognormal por tipo de asentamiento:

Independent random intercept model

Imagen 3: Modelo de intercepción aleatoria independiente

Ejecutamos el modelo:

# run the model
input_data <- list(
  n_obs =  nrow(observations),
  pop = observations$pop,
  n_settlement = length(unique(observations$settlement)),
  obs_to_settlement = as.integer(observations$settlement)
)
pars_lognormal_independent <- c('pop_post_pred', 'alpha_settlement',  'sigma')

fit_poisson_lognormal_independent <- stan(
  file= here('tutorials', 'workflow','poisson_independent.stan'),
  data= input_data,
  iter = iter + warmup,
  warmup = warmup,
  seed = seed,
  pars = pars_lognormal_independent
)
## Warning in validityMethod(object): The following variables have
## undefined values: pop_post_pred[1],The following variables have
## undefined values: pop_post_pred[2],The following variables have
## undefined values: pop_post_pred[3],The following variables have
## undefined values: pop_post_pred[4],The following variables have
## undefined values: pop_post_pred[5],The following variables have
## undefined values: pop_post_pred[6],The following variables have
## undefined values: pop_post_pred[7],The following variables have
## undefined values: pop_post_pred[8],The following variables have undefined
## values: pop_post_pred[9],The following variables have undefined values:
## pop_post_pred[10],The following variables have undefined values:
## pop_post_pred[11],The following variables have undefined values:
## pop_post_pred[12],The following variables have undefined values:
## pop_post_pred[13],The following variables have undefined values:
## pop_post_pred[14],The following variables have undefined values:
## pop_post_pred[15],The following variables have undefined values:
## pop_post_pred[16],The following variables have undefined values:
## pop_post_pred[17],The following variables have undefined values:
## pop_post_pred[18],The following variables have undefined values:
## pop_post_pred[19],The following variables have undefined values:
## pop_post_pred[20],The following variables have undefined values:
## pop_post_pred[21],The following variables have undefined values:
## pop_post_pred[22],The following variables have undefined values:
## pop_post_pred[23],The following variables have undefined values:
## pop_post_pred[24],The following variables have undefined values:
## pop_post_pred[25],The following variables have undefined values:
## pop_post_pred[26],The following variables have undefined values:
## pop_post_pred[27],The following variables have undefined values:
## pop_post_pred[28],The following variables have undefined values:
## pop_post_pred[29],The following variables have undefined values:
## pop_post_pred[30],The following variables have undefined values:
## pop_post_pred[31],The following variables have undefined values:
## pop_post_pred[32],The following variables have undefined values:
## pop_post_pred[33],The following variables have undefined values:
## pop_post_pred[34],The following variables have undefined values:
## pop_post_pred[35],The following variables have undefined values:
## pop_post_pred[36],The following variables have undefined values:
## pop_post_pred[37],The following variables have undefined values:
## pop_post_pred[38],The following variables have undefined values:
## pop_post_pred[39],The following variables have undefined values:
## pop_post_pred[40],The following variables have undefined values:
## pop_post_pred[41],The following variables have undefined values:
## pop_post_pred[42],The following variables have undefined values:
## pop_post_pred[43],The following variables have undefined values:
## pop_post_pred[44],The following variables have undefined values:
## pop_post_pred[45],The following variables have undefined values:
## pop_post_pred[46],The following variables have undefined values:
## pop_post_pred[47],The following variables have undefined values:
## pop_post_pred[48],The following variables have undefined values:
## pop_post_pred[49],The following variables have undefined values:
## pop_post_pred[50],The following variables have undefined values:
## pop_post_pred[51],The following variables have undefined values:
## pop_post_pred[52],The following variables have undefined values:
## pop_post_pred[53],The following variables have undefined values:
## pop_post_pred[54],The following variables have undefined values:
## pop_post_pred[55],The following variables have undefined values:
## pop_post_pred[56],The following variables have undefined values:
## pop_post_pred[57],The following variables have undefined values:
## pop_post_pred[58],The following variables have undefined values:
## pop_post_pred[59],The following variables have undefined values:
## pop_post_pred[60],The following variables have undefined values:
## pop_post_pred[61],The following variables have undefined values:
## pop_post_pred[62],The following variables have undefined values:
## pop_post_pred[63],The following variables have undefined values:
## pop_post_pred[64],The following variables have undefined values:
## pop_post_pred[65],The following variables have undefined values:
## pop_post_pred[66],The following variables have undefined values:
## pop_post_pred[67],The following variables have undefined values:
## pop_post_pred[68],The following variables have undefined values:
## pop_post_pred[69],The following variables have undefined values:
## pop_post_pred[70],The following variables have undefined values:
## pop_post_pred[71],The following variables have undefined values:
## pop_post_pred[72],The following variables have undefined values:
## pop_post_pred[73],The following variables have undefined values:
## pop_post_pred[74],The following variables have undefined values:
## pop_post_pred[75],The following variables have undefined values:
## pop_post_pred[76],The following variables have undefined values:
## pop_post_pred[77],The following variables have undefined values:
## pop_post_pred[78],The following variables have undefined values:
## pop_post_pred[79],The following variables have undefined values:
## pop_post_pred[80],The following variables have undefined values:
## pop_post_pred[81],The following variables have undefined values:
## pop_post_pred[82],The following variables have undefined values:
## pop_post_pred[83],The following variables have undefined values:
## pop_post_pred[84],The following variables have undefined values:
## pop_post_pred[85],The following variables have undefined values:
## pop_post_pred[86],The following variables have undefined values:
## pop_post_pred[87],The following variables have undefined values:
## pop_post_pred[88],The following variables have undefined values:
## pop_post_pred[89],The following variables have undefined values:
## pop_post_pred[90],The following variables have undefined values:
## pop_post_pred[91],The following variables have undefined values:
## pop_post_pred[92],The following variables have undefined values:
## pop_post_pred[93],The following variables have undefined values:
## pop_post_pred[94],The following variables have undefined values:
## pop_post_pred[95],The following variables have undefined values:
## pop_post_pred[96],The following variables have undefined values:
## pop_post_pred[97],The following variables have undefined values:
## pop_post_pred[98],The following variables have undefined values:
## pop_post_pred[99],The following variables have undefined values:
## pop_post_pred[100],The following variables have undefined values:
## pop_post_pred[101],The following variables have undefined values:
## pop_post_pred[102],The following variables have undefined values:
## pop_post_pred[103],The following variables have undefined values:
## pop_post_pred[104],The following variables have undefined values:
## pop_post_pred[105],The following variables have undefined values:
## pop_post_pred[106],The following variables have undefined values:
## pop_post_pred[107],The following variables have undefined values:
## pop_post_pred[108],The following variables have undefined values:
## pop_post_pred[109],The following variables have undefined values:
## pop_post_pred[110],The following variables have undefined values:
## pop_post_pred[111],The following variables have undefined values:
## pop_post_pred[112],The following variables have undefined values:
## pop_post_pred[113],The following variables have undefined values:
## pop_post_pred[114],The following variables have undefined values:
## pop_post_pred[115],The following variables have undefined values:
## pop_post_pred[116],The following variables have undefined values:
## pop_post_pred[117],The following variables have undefined values:
## pop_post_pred[118],The following variables have undefined values:
## pop_post_pred[119],The following variables have undefined values:
## pop_post_pred[120],The following variables have undefined values:
## pop_post_pred[121],The following variables have undefined values:
## pop_post_pred[122],The following variables have undefined values:
## pop_post_pred[123],The following variables have undefined values: pop_post_

Stan emite repetidamente la siguiente advertencia: Las siguientes variables tienen valores indefinidos: pop_post_pred. Ya obtuvimos esta advertencia en el tutorial 2, que es típica de un exceso de poisson_rng: en el bloque de cantidades generadas, el pop_post_pred es el resultado de la función poisson_rng, que no puede recibir entradas demasiado altas (por encima de 1,07374e+09). Sin embargo, esto puede ocurrir durante el período de calentamiento, cuando todavía se está explorando el espacio de parámetros para encontrar valores de parámetros que se ajusten tanto a priori como a los datos. Podemos hacer caso omiso de esta advertencia.

Comprobamos los trazados para ver si hay problemas de convergencia:

traceplot(fit_poisson_lognormal_independent, pars = 'alpha_settlement[1]')+
  traceplot(fit_poisson_lognormal_independent, pars = 'alpha_settlement[2]')+
traceplot(fit_poisson_lognormal_independent, pars = 'sigma')

Evaluamos cómo se comparan los parámetros estimados con los parámetros a priori.

samples_poisson_lognormal_independent <- rstan::extract(fit_poisson_lognormal_independent)

# prior retrodictive check
alpha_settlement <- as_tibble(samples_poisson_lognormal_independent$alpha_settlement,
                              .name_repair = 'minimal')
colnames(alpha_settlement) <- c('posterior_alpha_settlement_1', 'posterior_alpha_settlement_2')
alpha_settlement <- alpha_settlement %>% 
  mutate (
    prior = abs(rnorm(chains*iter, log(500), 0.1)),
    iter = 1:(chains*iter)
  ) %>% 
  pivot_longer(-iter, names_to = 'distribution')

ggplot(alpha_settlement, aes(x=value, fill=distribution))+
  geom_histogram(bins=100,position='identity', alpha=0.7)+
  theme_minimal()

Los dos \(alpha_{settlement}\) se sitúan en las dos colas del parámetro a priori. La mayor heterogeneidad de las estimaciones de alpha_settlement para el asentamiento 2, visible a través de un histograma más amplio, se debe al menor número de muestras disponibles en esta categoría (200 frente a 800).

Ahora podemos comparar los recuentos de población posteriores con los valores observados.

# posterior predictive check
pop_posterior_lognormal_independent <- tibble(
  source = factor(c(
    rep('Predicted - Poisson-lognormal', iter*chains*input_data$n_obs),
    rep('Predicted - Poisson-lognormal independent', iter*chains*input_data$n_obs),
    rep('Observed', input_data$n_obs)), 
    levels = c('Observed','Predicted - Poisson-lognormal independent', 
               'Predicted - Poisson-lognormal')),
  value= c(
    as.vector(samples_poisson_lognormal$pop_post_pred), 
    as.vector(samples_poisson_lognormal_independent$pop_post_pred),
    input_data$pop)
)

ggplotly(ggplot(pop_posterior_lognormal_independent, aes(x=value, fill=source, after_stat(density)))+
           geom_histogram(bins=50, position='identity', alpha=0.5)+
           theme_minimal()+
  labs(x='Population counts'))

No dudes en hacer clic en la leyenda para resaltar la comparación.

Vemos que el modelo de intercepción aleatoria vuelve a equilibrar la predicción excesiva de los recuentos en torno a 500 transfiriendo las predicciones a los recuentos en torno a 400 y 600.

Ajustar un modelo jerárquico

Suponer intercepciones independientes significa dividir los datos para estimar por separado cada intercepción y, por lo tanto, reduce la potencia estadística de la estimación. Además, si un tipo de asentamiento no está en la muestra utilizada para ajustar el modelo, no se puede extrapolar ninguna predicción para él.

Por lo tanto, ajustamos un modelo jerárquico que parte de una distribución nacional de alpha a partir de la cual se extrae el parámetro alpha por tipo de asentamiento:

Hierachical intercept model

Imagen 4: Modelo de intercepción jerárquica

Ajustamos el modelo:

pars_lognormal_hierarchical_cp <- c('alpha_national','alpha_settlement', 'u_alpha_settlement', 'sigma', 'pop_post_pred')

fit_poisson_lognormal_hierarchical_cp <- stan(
  file= here('tutorials', 'workflow','poisson_hierarchical_cp.stan'),
  data= input_data,
  iter = iter + warmup,
  warmup = warmup,
  seed = seed,
  pars = pars_lognormal_hierarchical_cp
)
## Warning in validityMethod(object): The following variables have
## undefined values: pop_post_pred[1],The following variables have
## undefined values: pop_post_pred[2],The following variables have
## undefined values: pop_post_pred[3],The following variables have
## undefined values: pop_post_pred[4],The following variables have
## undefined values: pop_post_pred[5],The following variables have
## undefined values: pop_post_pred[6],The following variables have
## undefined values: pop_post_pred[7],The following variables have
## undefined values: pop_post_pred[8],The following variables have undefined
## values: pop_post_pred[9],The following variables have undefined values:
## pop_post_pred[10],The following variables have undefined values:
## pop_post_pred[11],The following variables have undefined values:
## pop_post_pred[12],The following variables have undefined values:
## pop_post_pred[13],The following variables have undefined values:
## pop_post_pred[14],The following variables have undefined values:
## pop_post_pred[15],The following variables have undefined values:
## pop_post_pred[16],The following variables have undefined values:
## pop_post_pred[17],The following variables have undefined values:
## pop_post_pred[18],The following variables have undefined values:
## pop_post_pred[19],The following variables have undefined values:
## pop_post_pred[20],The following variables have undefined values:
## pop_post_pred[21],The following variables have undefined values:
## pop_post_pred[22],The following variables have undefined values:
## pop_post_pred[23],The following variables have undefined values:
## pop_post_pred[24],The following variables have undefined values:
## pop_post_pred[25],The following variables have undefined values:
## pop_post_pred[26],The following variables have undefined values:
## pop_post_pred[27],The following variables have undefined values:
## pop_post_pred[28],The following variables have undefined values:
## pop_post_pred[29],The following variables have undefined values:
## pop_post_pred[30],The following variables have undefined values:
## pop_post_pred[31],The following variables have undefined values:
## pop_post_pred[32],The following variables have undefined values:
## pop_post_pred[33],The following variables have undefined values:
## pop_post_pred[34],The following variables have undefined values:
## pop_post_pred[35],The following variables have undefined values:
## pop_post_pred[36],The following variables have undefined values:
## pop_post_pred[37],The following variables have undefined values:
## pop_post_pred[38],The following variables have undefined values:
## pop_post_pred[39],The following variables have undefined values:
## pop_post_pred[40],The following variables have undefined values:
## pop_post_pred[41],The following variables have undefined values:
## pop_post_pred[42],The following variables have undefined values:
## pop_post_pred[43],The following variables have undefined values:
## pop_post_pred[44],The following variables have undefined values:
## pop_post_pred[45],The following variables have undefined values:
## pop_post_pred[46],The following variables have undefined values:
## pop_post_pred[47],The following variables have undefined values:
## pop_post_pred[48],The following variables have undefined values:
## pop_post_pred[49],The following variables have undefined values:
## pop_post_pred[50],The following variables have undefined values:
## pop_post_pred[51],The following variables have undefined values:
## pop_post_pred[52],The following variables have undefined values:
## pop_post_pred[53],The following variables have undefined values:
## pop_post_pred[54],The following variables have undefined values:
## pop_post_pred[55],The following variables have undefined values:
## pop_post_pred[56],The following variables have undefined values:
## pop_post_pred[57],The following variables have undefined values:
## pop_post_pred[58],The following variables have undefined values:
## pop_post_pred[59],The following variables have undefined values:
## pop_post_pred[60],The following variables have undefined values:
## pop_post_pred[61],The following variables have undefined values:
## pop_post_pred[62],The following variables have undefined values:
## pop_post_pred[63],The following variables have undefined values:
## pop_post_pred[64],The following variables have undefined values:
## pop_post_pred[65],The following variables have undefined values:
## pop_post_pred[66],The following variables have undefined values:
## pop_post_pred[67],The following variables have undefined values:
## pop_post_pred[68],The following variables have undefined values:
## pop_post_pred[69],The following variables have undefined values:
## pop_post_pred[70],The following variables have undefined values:
## pop_post_pred[71],The following variables have undefined values:
## pop_post_pred[72],The following variables have undefined values:
## pop_post_pred[73],The following variables have undefined values:
## pop_post_pred[74],The following variables have undefined values:
## pop_post_pred[75],The following variables have undefined values:
## pop_post_pred[76],The following variables have undefined values:
## pop_post_pred[77],The following variables have undefined values:
## pop_post_pred[78],The following variables have undefined values:
## pop_post_pred[79],The following variables have undefined values:
## pop_post_pred[80],The following variables have undefined values:
## pop_post_pred[81],The following variables have undefined values:
## pop_post_pred[82],The following variables have undefined values:
## pop_post_pred[83],The following variables have undefined values:
## pop_post_pred[84],The following variables have undefined values:
## pop_post_pred[85],The following variables have undefined values:
## pop_post_pred[86],The following variables have undefined values:
## pop_post_pred[87],The following variables have undefined values:
## pop_post_pred[88],The following variables have undefined values:
## pop_post_pred[89],The following variables have undefined values:
## pop_post_pred[90],The following variables have undefined values:
## pop_post_pred[91],The following variables have undefined values:
## pop_post_pred[92],The following variables have undefined values:
## pop_post_pred[93],The following variables have undefined values:
## pop_post_pred[94],The following variables have undefined values:
## pop_post_pred[95],The following variables have undefined values:
## pop_post_pred[96],The following variables have undefined values:
## pop_post_pred[97],The following variables have undefined values:
## pop_post_pred[98],The following variables have undefined values:
## pop_post_pred[99],The following variables have undefined values:
## pop_post_pred[100],The following variables have undefined values:
## pop_post_pred[101],The following variables have undefined values:
## pop_post_pred[102],The following variables have undefined values:
## pop_post_pred[103],The following variables have undefined values:
## pop_post_pred[104],The following variables have undefined values:
## pop_post_pred[105],The following variables have undefined values:
## pop_post_pred[106],The following variables have undefined values:
## pop_post_pred[107],The following variables have undefined values:
## pop_post_pred[108],The following variables have undefined values:
## pop_post_pred[109],The following variables have undefined values:
## pop_post_pred[110],The following variables have undefined values:
## pop_post_pred[111],The following variables have undefined values:
## pop_post_pred[112],The following variables have undefined values:
## pop_post_pred[113],The following variables have undefined values:
## pop_post_pred[114],The following variables have undefined values:
## pop_post_pred[115],The following variables have undefined values:
## pop_post_pred[116],The following variables have undefined values:
## pop_post_pred[117],The following variables have undefined values:
## pop_post_pred[118],The following variables have undefined values:
## pop_post_pred[119],The following variables have undefined values:
## pop_post_pred[120],The following variables have undefined values:
## pop_post_pred[121],The following variables have undefined values:
## pop_post_pred[122],The following variables have undefined values:
## pop_post_pred[123],The following variables have undefined values: pop_post_

Comparamos las distribuciones a posteriori de la media lognormal con la distribución a priori.

samples_poisson_lognormal_hierarchical_cp<- rstan::extract(fit_poisson_lognormal_hierarchical_cp)

# prior retrodictive check
alpha <- as_tibble(samples_poisson_lognormal_hierarchical_cp$alpha_settlement,
                   .name_repair = 'minimal')
colnames(alpha) <- c('posterior_alpha_settlement_1', 'posterior_alpha_settlement_2')

alpha <- alpha %>% 
  mutate (
    posterior_alpha_national = samples_poisson_lognormal_hierarchical_cp$alpha_national,
    prior = abs(rnorm(chains*iter, log(500), 0.1)),
    iter = 1:(chains*iter)
  ) %>% 
  pivot_longer(-iter, names_to = 'distribution')

ggplot(alpha, aes(x=value, fill=distribution))+
  geom_histogram(bins=100,position='identity', alpha=0.7)+
  theme_minimal()

Vemos que el parámetro \(alpha_{national}\) actúa como una distribución madre difusa, lo suficientemente amplia como para ajustarse al rango probable de valores de los dos hijos alphasettlement. Comparemos el parámetro \(alpha_{settlement}\) estimado de manera jerárquica o independiente.

comp_alpha <- rbind(
  alpha %>% 
    mutate(
      distribution = paste0(distribution, '_hierarchical'),
      model= 'hierarchical'
    ),
  alpha_settlement %>% 
    mutate(
      distribution = paste0(distribution, '_independent'),
      model = 'independent'
    )
)  %>% 
  filter(!grepl('prior',distribution))

ggplot(comp_alpha, aes(x=value, y=distribution, fill=model))+
  geom_boxplot()+
  theme_minimal()+
  guides(fill='none')

En la práctica, la jerarquía tiene poco impacto en este entorno controlado (las observaciones son, en realidad, simulaciones) porque se utilizaron dos procesos independientes para generar estos datos. En un ejemplo del mundo real, los procesos podrían ser más difusos, de modo que es mejor suponer primero que hay algo en común antes de separar los componentes del modelo en subclases.

Por último, podemos aplicar la comprobación de la bondad de ajuste que se utilizó habitualmente en los tutoriales: el gráfico de lo observado frente a lo previsto a escala del grupo.

# New validation: cluster-based
comp_obs_poisson_lognormal_hierarchical_cp <- as_tibble(samples_poisson_lognormal_hierarchical_cp$pop_post_pred) %>% 
  summarise(across(
    everything(), ~ c(mean(.), quantile(., probs=c(0.025, 0.5, 0.975)))
  )) %>% 
  mutate(
    metrics = c('mean', paste0('q', c(0.025, 0.5, 0.975)))
  ) %>% 
  pivot_longer(
    -metrics
  ) %>% 
  pivot_wider(names_from = metrics, values_from = value) %>% 
  mutate(
    obs = input_data$pop
  )

ggplot(comp_obs_poisson_lognormal_hierarchical_cp, 
       aes(x=obs, y=q0.5, ymin=q0.025, ymax=q0.975))+
  geom_pointrange(col='grey20')+
  theme_minimal()+
  labs(x='observations', y='predictions')+
  geom_abline(intercept=0, slope=1, size=1, color='orange')

La lógica del modelo jerárquico Poisson lognormal es claramente visible: un recuento medio de población por tipo de asentamiento y amplios intervalos de confianza para compensar la falta de flexibilidad.

Ajustar un modelo jerárquico con varianza independiente

Una última mejora del modelo que puede introducirse con los datos disponibles es desglosar la varianza. De hecho, hemos visto en la distribución de los datos que el histograma para el tipo de asentamiento 1 era más amplio que para el tipo de asentamiento 2.

En la actualidad, tener un sigma único anula la mayor precisión que podemos conseguir para predecir el recuento de población en el tipo de asentamiento 2. Así es como se ve el modelo:

Hierachical intercept model with random variance

Imagen 5: Modelo de intercepción jerárquica con varianza aleatoria

Obsérvese que modelizamos la varianza de manera independiente por tipo de asentamiento y no de manera jerárquica.

Ejecutamos el modelo:

pars_lognormal_hierarchical_var <- c('alpha_national','alpha_settlement', 
                                     'sigma_settlement', 'pop_post_pred')

fit_poisson_lognormal_hierarchical_var<- stan(
  file= here('tutorials', 'workflow' ,'poisson_hierarchical_variance.stan'),
  data= input_data,
  iter = iter + warmup,
  warmup = warmup,
  seed = seed,
  pars = pars_lognormal_hierarchical_var
)

Vemos que el modelo (1) no tiene problemas de convergencia y (2) no excede la función poisson_rng en el bloque de cantidades generadas. Esto ya es señal de un mejor ajuste a nuestros datos.

Veamos las dos varianzas estimadas.

samples_poisson_lognormal_hierarchical_var<- rstan::extract(fit_poisson_lognormal_hierarchical_var)

# prior checks
comp_sigma <- as_tibble(samples_poisson_lognormal_hierarchical_var$sigma_settlement,
                        .name_repair = 'minimal')
colnames(comp_sigma) <- c('sigma_settlement_1', 'sigma_settlement_2')
comp_sigma <- comp_sigma %>% 
  pivot_longer(everything(),names_to = 'posterior') %>% 
  mutate(model='independent sigma') 

comp_sigma <- rbind(
  comp_sigma,
  tibble(
    posterior = 'sigma_fixed',
    value = samples_poisson_lognormal_hierarchical_cp$sigma,
    model = 'fixed sigma'
  )
)

ggplot(comp_sigma, aes(x=posterior, y=value, fill=model))+
  geom_boxplot()+
  theme_minimal()

Este diagrama es una clara indicación de un patrón diferente por tipo de asentamiento: una menor varianza para el asentamiento 2. La mayor heterogeneidad de las estimaciones de la varianza para el asentamiento 2, visible a través de un diagrama de caja más largo, se debe al menor número de muestras disponibles en esta categoría (200 frente a 800).

Veamos si las predicciones posteriores se ajustan mejor a las observaciones.

# posterior predictive check
pop_posterior_lognormal_hierarchical_var<- tibble(
  source = factor(c(rep('Predicted - Poisson-lognormal_hierarchical_var', iter*chains*input_data$n_obs),
                    rep('Predicted - Poisson-lognormal_hierarchical_cp', iter*chains*input_data$n_obs),
                    rep('Observed', input_data$n_obs)), 
                  levels = c('Observed',
                             'Predicted - Poisson-lognormal_hierarchical_var',
                             'Predicted - Poisson-lognormal_hierarchical_cp')),
  value= c(
    as.vector(samples_poisson_lognormal_hierarchical_var$pop_post_pred),
    as.vector(samples_poisson_lognormal_hierarchical_cp$pop_post_pred),
    input_data$pop)
)

ggplotly(ggplot(pop_posterior_lognormal_hierarchical_var, aes(x=value, fill=source, after_stat(density)))+
           geom_histogram(bins=30, position = 'identity', alpha=0.7)+
           theme_minimal())

Vemos una ganancia en la bondad de ajuste para los valores bajos de recuentos de población debido a que la varianza es menor para las predicciones en el tipo de asentamiento 2.

Se puede realizar un análisis más exhaustivo comparando los residuos de los dos modelos.

# validation metrics based on residuals
comp_obs_poisson_lognormal_hierarchical_var <- as_tibble(samples_poisson_lognormal_hierarchical_var$pop_post_pred) %>% 
  summarise(across(
    everything(), ~ c(mean(.), quantile(., probs=c(0.025, 0.5, 0.975)))
  )) %>% 
  mutate(
    metrics = c('mean', paste0('q', c(0.025, 0.5, 0.975)))
  ) %>% 
  pivot_longer(
    -metrics
  ) %>% 
  pivot_wider(names_from = metrics, values_from = value) %>% 
  mutate(
    obs = input_data$pop,
    residual = mean-obs,
    residual_perc = residual/mean*100
  )

comp_obs_poisson_lognormal_hierarchical_var %>% 
  summarise(
    `Bias_std`= mean(residual_perc),
    `Inaccuracy_std` = mean(abs(residual_perc)),
    `Imprecision_std` = sd(residual_perc),
    `In_IC` = mean(obs<q0.975&obs>q0.025)*100) %>% 
  mutate(model= 'Random variance') %>% 
  rbind(
    comp_obs_poisson_lognormal_hierarchical_cp %>% 
      mutate(
        residual = mean-obs,
        residual_perc = residual/mean*100
      ) %>% 
      summarise(
        `Bias_std`= mean(residual_perc),
        `Inaccuracy_std` = mean(abs(residual_perc)),
        `Imprecision_std` = sd(residual_perc),
        `In_IC` = mean(obs<q0.975&obs>q0.025)*100) %>% 
      mutate(model='Fixed variance')
  ) %>% 
  kbl(caption = "Goodness-of-fit metrics comparison", digits=2) %>% kable_minimal()
Table 1: Goodness-of-fit metrics comparison
Bias_std Inaccuracy_std Imprecision_std In_IC model
0.01 11.54 14.62 95.2 Random variance
0.03 11.55 14.63 95.2 Fixed variance

Un relato sobre la parametrización de modelos jerárquicos

La estimación de un modelo jerárquico puede plantear problemas de ajuste. En efecto, los \(alpha_{settlement}\) están muy correlacionados con los \(alpha_{settlement}\), lo que crea un espacio de parámetros de valores probables difícil de explorar para las cadenas.

Para ayudar al algoritmo, podemos reconfigurar el espacio de parámetros reparametrizando la distribución de las variables a priori normales definidas jerárquicamente. Veamos distintas formas de parametrizar una distribución normal:

Different parametrisations of a hierarchical normal model

Imagen 6: Diferentes parametrizaciones de un modelo normal jerárquico

Estas tres parametrizaciones ofrecen diferentes espacios de parámetros que, dependiendo del número de observaciones, del número de agrupaciones en la jerarquía y de las distribuciones a priori sobre mu y sigma, pueden ayudar a lograr la convergencia.

Anteriormente implementamos la parametrización centrada en poisson_hierarchical_cp.stan, cp significa parametrización centrada.

Ubicación no centrada

poisson_hierarchical_nclp.stan ofrece un ejemplo de ubicación no centrada. Corresponde a la adopción de un enfoque residual de la jerarquía: \(alpha_{national}\) representa una base de referencia y \(delta_{settlement}\) variaciones a partir de esta base de referencia.

pars_lognormal_hierarchical_nclp <- c('alpha_national','delta_settlement', 'u_delta_settlement', 'sigma', 'pop_post_pred')

fit_poisson_lognormal_hierarchical_nclp <- stan(
  file= here('tutorials', 'workflow','poisson_hierarchical_nclp.stan'),
  data= input_data,
  iter = iter + warmup,
  warmup = warmup,
  seed = seed,
  pars = pars_lognormal_hierarchical_nclp
)
## Warning in validityMethod(object): The following variables have
## undefined values: pop_post_pred[1],The following variables have
## undefined values: pop_post_pred[2],The following variables have
## undefined values: pop_post_pred[3],The following variables have
## undefined values: pop_post_pred[4],The following variables have
## undefined values: pop_post_pred[5],The following variables have
## undefined values: pop_post_pred[6],The following variables have
## undefined values: pop_post_pred[7],The following variables have
## undefined values: pop_post_pred[8],The following variables have undefined
## values: pop_post_pred[9],The following variables have undefined values:
## pop_post_pred[10],The following variables have undefined values:
## pop_post_pred[11],The following variables have undefined values:
## pop_post_pred[12],The following variables have undefined values:
## pop_post_pred[13],The following variables have undefined values:
## pop_post_pred[14],The following variables have undefined values:
## pop_post_pred[15],The following variables have undefined values:
## pop_post_pred[16],The following variables have undefined values:
## pop_post_pred[17],The following variables have undefined values:
## pop_post_pred[18],The following variables have undefined values:
## pop_post_pred[19],The following variables have undefined values:
## pop_post_pred[20],The following variables have undefined values:
## pop_post_pred[21],The following variables have undefined values:
## pop_post_pred[22],The following variables have undefined values:
## pop_post_pred[23],The following variables have undefined values:
## pop_post_pred[24],The following variables have undefined values:
## pop_post_pred[25],The following variables have undefined values:
## pop_post_pred[26],The following variables have undefined values:
## pop_post_pred[27],The following variables have undefined values:
## pop_post_pred[28],The following variables have undefined values:
## pop_post_pred[29],The following variables have undefined values:
## pop_post_pred[30],The following variables have undefined values:
## pop_post_pred[31],The following variables have undefined values:
## pop_post_pred[32],The following variables have undefined values:
## pop_post_pred[33],The following variables have undefined values:
## pop_post_pred[34],The following variables have undefined values:
## pop_post_pred[35],The following variables have undefined values:
## pop_post_pred[36],The following variables have undefined values:
## pop_post_pred[37],The following variables have undefined values:
## pop_post_pred[38],The following variables have undefined values:
## pop_post_pred[39],The following variables have undefined values:
## pop_post_pred[40],The following variables have undefined values:
## pop_post_pred[41],The following variables have undefined values:
## pop_post_pred[42],The following variables have undefined values:
## pop_post_pred[43],The following variables have undefined values:
## pop_post_pred[44],The following variables have undefined values:
## pop_post_pred[45],The following variables have undefined values:
## pop_post_pred[46],The following variables have undefined values:
## pop_post_pred[47],The following variables have undefined values:
## pop_post_pred[48],The following variables have undefined values:
## pop_post_pred[49],The following variables have undefined values:
## pop_post_pred[50],The following variables have undefined values:
## pop_post_pred[51],The following variables have undefined values:
## pop_post_pred[52],The following variables have undefined values:
## pop_post_pred[53],The following variables have undefined values:
## pop_post_pred[54],The following variables have undefined values:
## pop_post_pred[55],The following variables have undefined values:
## pop_post_pred[56],The following variables have undefined values:
## pop_post_pred[57],The following variables have undefined values:
## pop_post_pred[58],The following variables have undefined values:
## pop_post_pred[59],The following variables have undefined values:
## pop_post_pred[60],The following variables have undefined values:
## pop_post_pred[61],The following variables have undefined values:
## pop_post_pred[62],The following variables have undefined values:
## pop_post_pred[63],The following variables have undefined values:
## pop_post_pred[64],The following variables have undefined values:
## pop_post_pred[65],The following variables have undefined values:
## pop_post_pred[66],The following variables have undefined values:
## pop_post_pred[67],The following variables have undefined values:
## pop_post_pred[68],The following variables have undefined values:
## pop_post_pred[69],The following variables have undefined values:
## pop_post_pred[70],The following variables have undefined values:
## pop_post_pred[71],The following variables have undefined values:
## pop_post_pred[72],The following variables have undefined values:
## pop_post_pred[73],The following variables have undefined values:
## pop_post_pred[74],The following variables have undefined values:
## pop_post_pred[75],The following variables have undefined values:
## pop_post_pred[76],The following variables have undefined values:
## pop_post_pred[77],The following variables have undefined values:
## pop_post_pred[78],The following variables have undefined values:
## pop_post_pred[79],The following variables have undefined values:
## pop_post_pred[80],The following variables have undefined values:
## pop_post_pred[81],The following variables have undefined values:
## pop_post_pred[82],The following variables have undefined values:
## pop_post_pred[83],The following variables have undefined values:
## pop_post_pred[84],The following variables have undefined values:
## pop_post_pred[85],The following variables have undefined values:
## pop_post_pred[86],The following variables have undefined values:
## pop_post_pred[87],The following variables have undefined values:
## pop_post_pred[88],The following variables have undefined values:
## pop_post_pred[89],The following variables have undefined values:
## pop_post_pred[90],The following variables have undefined values:
## pop_post_pred[91],The following variables have undefined values:
## pop_post_pred[92],The following variables have undefined values:
## pop_post_pred[93],The following variables have undefined values:
## pop_post_pred[94],The following variables have undefined values:
## pop_post_pred[95],The following variables have undefined values:
## pop_post_pred[96],The following variables have undefined values:
## pop_post_pred[97],The following variables have undefined values:
## pop_post_pred[98],The following variables have undefined values:
## pop_post_pred[99],The following variables have undefined values:
## pop_post_pred[100],The following variables have undefined values:
## pop_post_pred[101],The following variables have undefined values:
## pop_post_pred[102],The following variables have undefined values:
## pop_post_pred[103],The following variables have undefined values:
## pop_post_pred[104],The following variables have undefined values:
## pop_post_pred[105],The following variables have undefined values:
## pop_post_pred[106],The following variables have undefined values:
## pop_post_pred[107],The following variables have undefined values:
## pop_post_pred[108],The following variables have undefined values:
## pop_post_pred[109],The following variables have undefined values:
## pop_post_pred[110],The following variables have undefined values:
## pop_post_pred[111],The following variables have undefined values:
## pop_post_pred[112],The following variables have undefined values:
## pop_post_pred[113],The following variables have undefined values:
## pop_post_pred[114],The following variables have undefined values:
## pop_post_pred[115],The following variables have undefined values:
## pop_post_pred[116],The following variables have undefined values:
## pop_post_pred[117],The following variables have undefined values:
## pop_post_pred[118],The following variables have undefined values:
## pop_post_pred[119],The following variables have undefined values:
## pop_post_pred[120],The following variables have undefined values:
## pop_post_pred[121],The following variables have undefined values:
## pop_post_pred[122],The following variables have undefined values:
## pop_post_pred[123],The following variables have undefined values: pop_post_

Podemos trazar los parámetros estimados añadiendo la desviación \(delta_{settlement}\) a la base de referencia, \(alpha_{national}\):

samples_poisson_lognormal_hierarchical_nclp<- rstan::extract(fit_poisson_lognormal_hierarchical_nclp)

mu <- as_tibble(samples_poisson_lognormal_hierarchical_nclp$delta_settlement,
                .name_repair = 'minimal')
colnames(mu) <- c('posterior_delta_settlement_1', 'posterior_delta_settlement_2')
mu <- mu %>% 
  mutate(
    posterior_baseline = samples_poisson_lognormal_hierarchical_nclp$alpha_national,
    posterior_mu_settlement_1 = posterior_baseline + posterior_delta_settlement_1,
    posterior_mu_settlement_2 = posterior_baseline + posterior_delta_settlement_2,
    iter = 1:(chains*iter)
  ) %>% 
  select(-posterior_delta_settlement_1, -posterior_delta_settlement_2) %>% 
  pivot_longer(-iter, names_to = 'posterior')

ggplot(mu, aes(x= posterior, y=value, fill=posterior))+
  geom_boxplot()+
  theme_minimal()+
  guides(fill='none')

Podemos volver a comprobar que no hay impacto en las predicciones posteriores.

# posterior predictive check
pop_posterior_lognormal_hierarchical <- tibble(
  source = factor(c(rep('Predicted - Poisson-lognormal_hierarchical_nclp', iter*chains*input_data$n_obs), 
                    rep('Predicted - Poisson-lognormal_hierarchical_cp', iter*chains*input_data$n_obs),
                    rep('Observed', input_data$n_obs)), 
                  levels = c('Observed',
                             'Predicted - Poisson-lognormal_hierarchical_nclp',
                             'Predicted - Poisson-lognormal_hierarchical_cp')),
  value= c(
    as.vector(samples_poisson_lognormal_hierarchical_nclp$pop_post_pred),
    as.vector(samples_poisson_lognormal_hierarchical_cp$pop_post_pred),
    input_data$pop)
)

ggplotly(ggplot(pop_posterior_lognormal_hierarchical, aes(x=value, fill=source, after_stat(density)))+
           geom_histogram(bins=50, position = 'identity', alpha=0.7)+
           theme_minimal())

Escala y ubicación no centradas

poisson_hierarchical_nclsp.stan ofrece un ejemplo de escala no centrada.

pars_lognormal_hierarchical_nclsp <- c('alpha_national','u_delta_settlement', 
                                       'eta_delta_settlement',
                                       'sigma', 'pop_post_pred')

fit_poisson_lognormal_hierarchical_nclsp <- stan(
  file= here('tutorials', 'workflow','poisson_hierarchical_nclsp.stan'),
  data= input_data,
  iter = iter + warmup,
  warmup = warmup,
  seed = 1095856,
  pars = pars_lognormal_hierarchical_nclsp
)
## Warning: There were 90 divergent transitions after warmup. See
## https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## to find out why this is a problem and how to eliminate them.
## Warning: Examine the pairs() plot to diagnose sampling problems

Por el contrario, esta parametrización requiere más tiempo de ejecución y da lugar a transiciones divergentes, lo que significa que no se ha podido explorar adecuadamente el espacio de parámetros.

Podemos destacar las iteraciones que fueron divergentes y ver dónde se sitúan en la estimación de u_delta_settlement y eta_delta_settlement, los parámetros sospechosos de una estimación difícil.

partition_div <- function(fit) {
  nom_params <- rstan:::extract(fit, permuted=FALSE)
  params <- as.data.frame(do.call(rbind, lapply(1:chains, function(n) nom_params[,n,])))
  
  sampler_params <- get_sampler_params(fit, inc_warmup=FALSE)
  divergent <- do.call(rbind, sampler_params)[,'divergent__']
  params$divergent <- divergent
  
  div_params <- params[params$divergent == 1,]
  nondiv_params <- params[params$divergent == 0,]
  
  return(list(div_params, nondiv_params))
}

div_poisson_lognormal_hierarchical_nclsp<- partition_div(fit_poisson_lognormal_hierarchical_nclsp)

div_samples_cp <- div_poisson_lognormal_hierarchical_nclsp[[1]]
nondiv_samples_cp <- div_poisson_lognormal_hierarchical_nclsp[[2]]

c_dark_trans <- c("#8F272780")
c_green_trans <- c("#00FF0080")

par(mfrow=c(1, 2))
for (k in 1:input_data$n_settlement) {
  name_x <- paste("eta_delta_settlement[", k, "]", sep='')
  
  plot(nondiv_samples_cp[name_x][,1], log(nondiv_samples_cp$u_delta_settlement),
       col=c_dark_trans, pch=16, main="",
       xlab=name_x, xlim=c(-5, 5), ylab="log(u_delta_settlement)", ylim=c(-5, 3))
  points(div_samples_cp[name_x][,1], log(div_samples_cp$u_delta_settlement),
         col=c_green_trans, pch=16)
}

Esta imagen es típica de un embudo: el tamaño de paso utilizado para las iteraciones de Monte Carlo no puede ajustarse al mismo tiempo a la base y al cuello de la exploración del espacio de parámetros.

¡No centrar ni la escala ni la ubicación no es adecuado para nuestros datos!

Agradecimientos

Este tutorial ha sido redactado por Edith Darin, de WorldPop, Universidad de Southampton, y Douglas Leasure, del Leverhulme Centre for Demographic Science, Universidad de Oxford, con la supervisión de Andrew Tatem, de WorldPop, Universidad de Southampton.

Este trabajo ha sido financiado por el Fondo de Población de las Naciones Unidas (UNFPA, por sus siglas en inglés).

Licencia

El presente documento puede redistribuirse libremente bajo los términos de una licencia Creative Commons Atribución-SinDerivadas 4.0 Internacional (CC BY-ND 4.0).