7 Hierarchical Bottom-Up Modelling Extension: the Weighted Likelihood

This section presents a simulation study exploring weighted Bayesian models to recover unbiased population estimates from weighted survey data.

Note: Supplementary files for this chapter can be downloaded from http://doi.org/10.5258/SOTON/WP00706.

Note: This chapter assumes familiarity with Bayesian statistical models and notation, Stan and JAGS software, and the R statistical programming language.

Introduction

Statistical models used to map estimates of population counts across the landscape require observations of population counts from a representative sample of locations to use as training data. These data usually come from household surveys in which populations are enumerated within geographically defined survey locations. A stratified random sample is ideal for recovering unbiased estimates of the mean and variance of population densities. However, national household surveys (e.g. Demographic and Health Surveys or Living Standards Measurement Surveys) often implement a probability proportional to size (PPS) sampling design, in which locations with higher population densities are more likely to be included in the sample than they would be under random sampling. When such a sample is used for population modelling without adjustment, it often produces biased estimates of average population densities. Population-weighted sampling is intended to approximate random samples of individuals or households from sets of geographically clustered households, but it does not produce the random samples of locations (and therefore of population sizes) needed for geographical population models.

Our objectives here were to:

  1. Demonstrate that a population-weighted sample results in biased estimates of population densities,
  2. Demonstrate that model-based estimates of population totals for large areas are sensitive to this bias,
  3. Explore Bayesian weighted-likelihood and weighted-precision approaches to produce unbiased parameter estimates, and
  4. Demonstrate that weighted models can recover unbiased estimates of population densities and population totals from a population-weighted sample.

This analysis was intended as a theoretical foundation to support ongoing development of statistical models to estimate and map population sizes using weighted survey data as inputs.

Methods

We simulated populations by drawing population densities for each location from a distribution with known parameters. We then produced various types of samples from those populations: random, population-weighted, or a combination. Every population included one million locations and every sample included 2000 locations. A simulated “location” could represent a 1 hectare populated grid square. We fit three types of models to these data trying to recover the known population parameters: unweighted model, weighted-precision model, and weighted-likelihood model.

All simulations were conducted using the R statistical programming environment (R Core Team 2020a). Statistical models were fit using either the RStan R package (Stan Development Team 2020) with the Stan probabilistic programming language (Stan Development Team 2019a) or the runjags R package (Denwood 2016a) with JAGS software (Plummer 2003a).

Simulated Populations

We used a log-normal distribution to represent population densities following the population model of Leasure et al. (2020d):

\[\begin{equation} N_i \sim Poisson( D_i A_i ) \\ \tag{7.1} \end{equation}\]

\[\begin{equation} \begin{split} D_i \sim LogNormal( \mu_i, \sigma_{t,g} ) \\ \mu_i = \alpha_{t,g} + \sum_{k=1}^{K} \beta_k x_{k,i} \end{split} \end{equation}\]

In this model, \(N_i\) was the observed population count and \(A_i\) was the observed settled area (ha) at location \(i\). Population densities \(D_i\) were modelled as a function of settlement types \(t\) (e.g. urban/rural), geographic units \(g\), and \(K\) geospatial covariates \(x_{k,i}\). The regression parameters \(\alpha_{t,g}\), \(\beta_k\), and \(\sigma_{t,g}\) estimated average population densities, effects of covariates, and unexplained residual variation, respectively.

The intended purpose of Eq. (7.1) was to estimate model parameters based on observed population data. For the purposes of the current simulation study, we reversed that logic. We provided pre-defined parameter values to generate simulated population data.

For our simulations, we made a series of simplifying assumptions to this model. We assumed that every location \(i\) included one hectare of settled area (i.e. \(A_i = 1\)) and ignored the Poisson variation so that \(N_i = D_i\). We also ignored the effects of settlement type \(t\), geographic location \(g\), and covariates \(x_{k,i}\) so that they were dropped from the model. These simplifying assumptions allowed us to isolate the effects of weighted sampling in the absence of these potentially confounding effects. While beyond the scope of the current report, relaxing these assumptions and assessing their effects should be the focus of future theoretical and empirical studies.

The simplified model used for our simulations was:

\[\begin{equation} D_i \sim LogNormal( log(\mu), \sigma ) \tag{7.2} \end{equation}\]

Note: We modelled the median \(\mu\) on the natural scale so that the parameter estimate was easier to interpret (i.e. the median population density in people per hectare), but we kept \(\sigma\) on the log-scale to simplify the equations.

We simulated population densities (i.e. count of people per hectare) at one million locations by taking one million draws from this log-normal distribution. We repeated this for a range of parameter values for \(\mu\) (i.e. 100, 250, 500) and \(\sigma\) (i.e. 0.25, 0.5, 0.75).

Following Eq. (7.2), a population where \(\mu = 250\) and \(\sigma=0.5\) can be simulated across one million locations using the following R code:
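A minimal sketch of this step, assuming the object names `pop` and `pop_n` that the sampling code later in this section relies on, and an arbitrary seed chosen only for reproducibility, might be:

```r
# Assumed object names: `pop` and `pop_n` match the sampling code used later on.
set.seed(42)   # arbitrary seed, only for reproducibility

pop_n <- 1e6   # number of locations in the simulated population
mu    <- 250   # median population density (people per hectare)
sigma <- 0.5   # standard deviation on the log scale

# simulated population densities following Eq. (7.2)
pop <- rlnorm(n = pop_n, meanlog = log(mu), sdlog = sigma)

# histogram of the simulated population densities
hist(pop,
     breaks = 100,
     xlab = "Population density (people per hectare)",
     main = "")
```

Note that `rlnorm` takes the mean and standard deviation on the log scale, which is why the median `mu` is passed as `log(mu)`.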

Figure 7.1: Histogram of population

Simulated Survey Data

We simulated three sampling designs, each with a sample size of 2000 locations:

  1. Random sampling,
  2. Population-weighted sampling, and
  3. A combination of random and population-weighted sampling.

Random Sample

The random sample was simply drawn using the sample function to draw 2000 samples without replacement from the simulated population densities:

# sample size (number of locations)
n <- 2e3

# random sample
D <- sample(x = pop,
            size = n)

Population-weighted Sample

To draw a population-weighted sample, we first calculated sampling probabilities based on the population at each location. These were then used to draw a weighted (i.e. non-random) sample from the population in which locations with higher population densities were over-represented.

Note: A random sample is equivalent to a weighted sample in which all samples have equal weights.

# sampling weights based on population density 
w <- pop / sum(pop)

# select locations for a weighted sample
i <- sample(x = 1:pop_n,
            size = n,
            prob = w)

# population densities at selected locations
D <- pop[i]
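The bias introduced by this design can be anticipated analytically. For a log-normal population, sampling with probability proportional to density yields sampled densities that are again log-normal with the same \(\sigma\), but with the log-median shifted upwards by \(\sigma^2\), so the sample median is expected to be close to \(\mu e^{\sigma^2}\) (about 321 for \(\mu = 250\) and \(\sigma = 0.5\)). The sketch below checks this numerically. It is an illustration only: it uses a smaller population of 100,000 locations (an assumption made to keep it fast) and an arbitrary seed.

```r
set.seed(1)      # arbitrary seed, only for reproducibility

pop_n <- 1e5     # assumption: smaller than the one million used in the text
mu    <- 250
sigma <- 0.5
pop   <- rlnorm(pop_n, meanlog = log(mu), sdlog = sigma)

n <- 2e3                 # sample size (number of locations)
w <- pop / sum(pop)      # sampling probabilities proportional to density

# random sample and population-weighted sample of densities
D_random   <- sample(x = pop, size = n)
D_weighted <- pop[sample(x = 1:pop_n, size = n, prob = w)]

# compare the median and the log-scale standard deviation of both samples
comparison <- data.frame(
  sample = c("random", "weighted"),
  median = c(median(D_random), median(D_weighted)),
  sd_log = c(sd(log(D_random)), sd(log(D_weighted)))
)
print(comparison)

# theoretical median of a size-biased log-normal sample
expected_weighted_median <- mu * exp(sigma^2)
```

The weighted sample should show an inflated median but a log-scale standard deviation close to that of the random sample.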

Combined Sample

Combined samples (random and weighted) were produced using several different proportions of random samples (i.e. 0.2, 0.5, and 0.8). For example, if 20% of the 2000 sampled locations were random samples, then 80% of the 2000 sampled locations would have been weighted samples.

# proportion random
prop <- 0.5

# select locations for weighted sample
i <- sample(x = 1:pop_n,
            size = n*(1-prop),
            prob = w)

# select locations for random sample
j <- sample(x = (1:pop_n)[-i],
            size = n*prop)

# weights for selected locations in weighted sample
w_i <- w[i]

# weights for selected locations in random sample
w_j <- rep(x = mean(w_i), 
           times = n*prop)

# population densities at selected locations
D <- pop[ c(i,j) ]

# weights at selected locations
w <- c(w_i, w_j)

It is important to note that we assigned equal weights to all of the random samples that were equal to the mean weight among the weighted samples. In other words, each random sample was given an equal weight in the model comparable to an average weighted sample. This was intended to balance the influence of the random and weighted portions of the sample.

Statistical Models

We evaluated four statistical models:

  1. Unweighted model (Stan),
  2. Weighted-likelihood model (Stan),
  3. Weighted-precision model (Stan), and
  4. Weighted-precision model (JAGS).

The unweighted model was included to evaluate the bias that arises when fitting an unweighted model to population-weighted sample data. The weighted-precision and weighted-likelihood models were designed to use sample weights to recover unbiased estimates of population parameters from a weighted sample. We developed the weighted-precision model for both Stan and JAGS to demonstrate that both implementations produced the same results and to provide example code for both. The weighted-likelihood approach requires a direct adjustment to the likelihood that was not possible to implement in JAGS.

All models were run with four MCMC chains, each with a burn-in period of 1000 iterations followed by an additional 1000 iterations that were retained for analysis. MCMC chains for all models achieved convergence. For JAGS models, convergence was defined as Gelman-Rubin statistics (potential scale reduction factors) that were less than 1.1 for all parameters (Gelman & Rubin 1992). For Stan models, convergence was defined as R-hat less than 1.05 (Stan Development Team 2020) (https://mc-stan.org/rstan/reference/Rhat.html).

Unweighted Log-normal

Our simplest model was a log-normal with no weights:

\[\begin{equation} D_i \sim LogNormal( log(\mu), \sigma ) \tag{7.3} \end{equation}\]

Notice that this is identical to Eq. (7.2) that was used to generate our simulated populations. Our implementation used the following Stan model:

data{
  int<lower=0> n;         // sample size
  vector<lower=0>[n] D;   // observed population densities
}

parameters{
  real<lower=0> mu;      // median (natural)
  real<lower=0> sigma;   // standard deviation (log)
}

model{
  D ~ lognormal(log(mu), sigma);  // likelihood
  
  mu ~ uniform(0, 2e3);   // prior for mu
  sigma ~ uniform(0, 5);  // prior for sigma
}

Weighted-likelihood

The weighted-likelihood approach used the same log-normal model but implemented a manual adjustment to the likelihood function (Stan Development Team 2019b) for each sample based on the sample weights to account for the increased probability of including locations with high population densities in the weighted sample. We implemented this model in Stan:

data{
  int<lower=0> n;                // sample size
  vector<lower=0>[n] D;          // observed population densities
  vector<lower=0,upper=1>[n] w;  // sampling probabilities (weights)
}

parameters{
  real<lower=0> mu;      // median (natural)
  real<lower=0> sigma;   // standard deviation (log)
}

model{

  // weighted likelihood: each log-density term is divided by its sampling probability
  for(i in 1:n){
    target += lognormal_lpdf( D[i] | log(mu), sigma ) / w[i];  
  }
  
  mu ~ uniform(0, 2e3);   // prior for mu
  sigma ~ uniform(0, 5);  // prior for sigma
}

Note: The sampling probabilities w were defined in the section above (see Simulated Survey Data).

In this model, the log-likelihood contribution of each sample is divided by its sampling probability, i.e. the probability of a location being selected for the sample out of the one million locations in the population. Equivalently, each likelihood term is raised to the power \(1/w_i\). This adjustment down-weights the influence on parameter estimates of locations that had higher sampling probabilities (i.e. locations with high population densities, which are over-represented in a population-weighted sample). If this model were used for a random sample, all of the weights would be equal and it would yield the same point estimates as the unweighted model above (see Unweighted Log-normal), although the width of the posterior would still depend on the overall scale of the weights.
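Because the priors are flat, the posterior mode of this model should coincide with the maximum of the weighted log-likelihood whenever that maximum lies inside the prior bounds. For a log-normal this maximum has a closed form: the weighted mean and weighted standard deviation of \(log(D_i)\) with weights \(1/w_i\). The following sketch illustrates that calculation in R rather than fitting the Stan model. It assumes a smaller population of 100,000 locations and an arbitrary seed, and the object names `a`, `mu_hat`, `sigma_hat`, `mu_naive`, and `total_weight` are introduced here for illustration only.

```r
set.seed(2)      # arbitrary seed, only for reproducibility

pop_n <- 1e5     # assumption: smaller than the one million used in the text
mu    <- 250
sigma <- 0.5
pop   <- rlnorm(pop_n, meanlog = log(mu), sdlog = sigma)

# population-weighted sample and its sampling probabilities
n     <- 2e3
w_pop <- pop / sum(pop)
i     <- sample(x = 1:pop_n, size = n, prob = w_pop)
D     <- pop[i]
w     <- w_pop[i]

# inverse sampling probabilities act as weights on the log-likelihood terms
a <- 1 / w

# closed-form maximisers of the weighted log-normal log-likelihood
log_mu_hat <- sum(a * log(D)) / sum(a)
mu_hat     <- exp(log_mu_hat)
sigma_hat  <- sqrt(sum(a * (log(D) - log_mu_hat)^2) / sum(a))

# unweighted estimate of the median for comparison
mu_naive <- exp(mean(log(D)))

# total weight applied to the log-likelihood
total_weight <- sum(a)

print(c(mu_hat = mu_hat, sigma_hat = sigma_hat,
        mu_naive = mu_naive, total_weight = total_weight))
```

The total weight `sum(1 / w)` plays the role that the sample size plays in the unweighted model when determining how sharply peaked the weighted log-likelihood is. With sampling probabilities this small, it is several orders of magnitude larger than `n`.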

Weighted-precision (Stan)

A potential alternative to the weighted-likelihood approach would be to scale the precision \(\tau\) of the log-normal using the location-specific weights \(w_i\). Precision \(\tau\) is defined as the inverse of variance \(\sigma^2\):

\[\begin{equation} \begin{split} \tau = \sigma^{-2} \\ \sigma = \tau^{-0.5} \end{split} \end{equation}\]

For this model, we define an inverse sampling weight \(m_i\) that is scaled to sum to one across all samples:

\[\begin{equation} m_i = \frac{w_i^{-1}}{\sum_{i=1}^{n}{w_i^{-1}}} \end{equation}\]

We will refer to these scaled inverse sampling weights as model weights \(m_i\). Now we can specify a weighted-precision model as:

\[\begin{equation} \begin{split} D_i \sim LogNormal(log(\mu), \tau_i^{-0.5}) \\ \tau_i = \theta^{-2} m_i \end{split} \end{equation}\]

where \(\theta^{-2}\) is a naive precision term that does not account for the model weights \(m_i\). Notice that the precision \(\tau_i\) is location-specific (i.e. indexed by \(i\)) because it has been adjusted by the model weights \(m_i\), and that \(\tau_i^{-0.5}\) is an adjusted location-specific standard deviation. Where the model weights are relatively low, the location-specific precisions \(\tau_i\) will be decreased to reduce the weight of those samples in the model. For our population-weighted sample, this reduced the weights of locations with high population densities that were over-represented in the sample.

Our goal was to recover an unbiased estimate of the standard deviation for the overall distribution of population densities among all locations in the population. So far, we have only estimated location-specific precisions \(\tau_i\) which are dependent on location-specific model weights \(m_i\). We derived the global standard deviation \(\sigma\) using a weighted average of the location-specific standard deviations \(\tau_i^{-0.5}\):

\[\begin{equation} \sigma = \frac{\sum_{i=1}^{n}{\tau_i^{-0.5} \sqrt{m_i}}} {\sum_{i=1}^{n}{\sqrt{m_i}}} \end{equation}\]

We used \(\sqrt{m_i}\) for this weighted average so that the weights are on the same scale as the standard deviations being averaged. It is important to note that the model weights \(m_i\) were used to adjust a naive precision parameter and so here we want to use a square root transformed weight to calculate a weighted average of standard deviations \(\tau_i^{-0.5}\).
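As a worked step, substituting \(\tau_i^{-0.5} = \theta m_i^{-0.5}\) into this weighted average shows that every term in the numerator reduces to \(\theta\), so the global standard deviation is a fixed rescaling of the naive standard deviation once the model weights are known:

\[\begin{equation} \sigma = \frac{\sum_{i=1}^{n}{\theta m_i^{-0.5} \sqrt{m_i}}} {\sum_{i=1}^{n}{\sqrt{m_i}}} = \frac{n \theta} {\sum_{i=1}^{n}{\sqrt{m_i}}} \end{equation}\]

In the special case of equal weights (\(m_i = 1/n\)), this gives \(\sigma = \theta \sqrt{n}\). For example, with \(n = 2000\) and \(\sigma = 0.5\), the naive standard deviation would be \(\theta = 0.5 / \sqrt{2000} \approx 0.011\), which illustrates why \(\theta\) takes much smaller values than \(\sigma\).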

We implemented the weighted-precision model in Stan:

data{
  int<lower=0> n;                // sample size
  vector<lower=0>[n] D;          // observed population densities
  vector<lower=0,upper=1>[n] w;  // sampling probabilities
}

transformed data{
  
  // model weights (scaled inverse sampling weights)
  vector<lower=0,upper=1>[n] m = inv(w) ./ sum(inv(w)); 
}

parameters{
  real<lower=0> mu;      // median
  real<lower=0> theta;   // naive standard deviation
}

transformed parameters{
  
  // location-specific weighted precision
  vector<lower=0>[n] tau = m * pow(theta,-2);
}

model{
  
  // likelihood with weighted precision
  D ~ lognormal( log(mu), sqrt(inv(tau)) ); 
  
  mu ~ uniform(0, 2e3);  // prior median
  theta ~ uniform(0, 1); // prior naive standard deviation
}

generated quantities {
  
  // weighted average global sigma
  real<lower=0> sigma = sum( sqrt(inv(tau)) .* sqrt(m) ) / sum( sqrt(m));
}

Weighted-precision (JAGS)

We also implemented the weighted-precision model using JAGS software to provide an example of the coding differences and to demonstrate that both Stan and JAGS produce the same results. JAGS parameterizes the log-normal distribution using precision rather than the standard deviation used by Stan.

model{
  
  # model weights (scaled inverse sampling weights)
  m <- pow(w,-1) / sum(pow(w,-1))
  
  for(i in 1:n){
    
    # likelihood with weighted precision
    D[i] ~ dlnorm(log(mu), tau[i])
    
    # location-specific weighted precision
    tau[i] <- pow(theta,-2) * m[i]
  }
  
  # prior for median
  mu ~ dunif(0, 2e3)
  
  # prior for naive standard deviation
  theta ~ dunif(0, 1)
  
  # weighted average global sigma
  sigma <- sum( pow(tau,-0.5) * sqrt(m) ) / sum( sqrt(m) )
}

Population Totals

We used the fitted models to estimate total population sizes for the simulated populations. This was done by producing posterior predictions for the one million locations represented in the original simulated population data (see section Simulated Populations).

\[\begin{equation} \begin{split} D_i \sim LogNormal(log(\mu), \sigma) \\ T = \sum_{i=1}^{10^6}{D_i} \end{split} \end{equation}\]

The location-specific posterior predictions for population densities \(D_i\) (i.e. people per hectare) were assumed to equal the population count \(N_i\) for each location because we assumed that each location contained one hectare of settled area \(A_i = 1\). We summed the location-specific posterior predictions for population counts across all locations to derive a posterior prediction for the total size \(T\) of the simulated population.
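The calculation above can be sketched in R as follows. This is an illustration under stated assumptions rather than the code used for the analysis: `mu_draws` and `sigma_draws` are hypothetical stand-ins for posterior draws from a fitted model (simulated here, not fitted results), and only 20 draws are used to keep the sketch fast.

```r
set.seed(3)      # arbitrary seed, only for reproducibility

n_loc   <- 1e6   # number of locations to predict for
n_draws <- 20    # assumption: small number of posterior draws for speed

# Hypothetical stand-ins for posterior draws of mu and sigma (not fitted results)
mu_draws    <- rnorm(n_draws, mean = 250, sd = 3)
sigma_draws <- rnorm(n_draws, mean = 0.5, sd = 0.01)

# for each posterior draw, predict a density at every location and sum them
total_draws <- vapply(
  seq_len(n_draws),
  function(s) {
    sum(rlnorm(n_loc, meanlog = log(mu_draws[s]), sdlog = sigma_draws[s]))
  },
  numeric(1)
)

# posterior summary of the total population T
total_summary <- quantile(total_draws, probs = c(0.025, 0.5, 0.975))
print(total_summary)

# expected total when mu = 250 and sigma = 0.5: n_loc * mu * exp(0.5 * sigma^2)
expected_total <- n_loc * 250 * exp(0.5 * 0.5^2)
```

Summing predictions within each posterior draw, rather than summing posterior means, carries parameter uncertainty through to the credible interval for \(T\).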

Results

Results presented here are based on simulated populations where \(\mu = 250\) and \(\sigma = 0.5\). For simulations that contained a combination of random and weighted samples, we used a 50/50 split (i.e. \(prop=0.5\)). We explored other combinations of parameters which produced results similar to those presented here (see supplementary files in Appendix A). Supplementary files also include source code to reproduce these results and to conduct simulations using other parameters.

The unweighted model was able to recover the simulated “true” distribution of population densities from a random sample but not from a population-weighted sample (top panels of Fig. 7.2). All three weighted models were able to recover the “true” population distribution from a population-weighted sample and a combined sample (bottom panels of Fig. 7.2). Posterior predicted distributions of population densities were very similar for all of the weighted models (Fig. 7.3).

Some important differences were apparent among the three weighted models when we looked at individual parameter estimates for the median \(\mu\), standard deviation \(\sigma\), and mean (\(\mu e^{0.5\sigma^2}\)). Most importantly, the weighted-likelihood approach produced marginal posterior distributions that were so narrow that they were essentially point-estimates that did not account for parameter uncertainty (Fig. 7.4). Although they did not adequately account for parameter uncertainty (i.e. uncertainty in statistical estimates of the mean, median, and standard deviations), the point-estimates appeared to be unbiased estimators (Tables 7.9, 7.11, and 7.13).

In contrast, the weighted-precision approaches produced full posteriors that accounted for parameter uncertainty (Fig. 7.5; Tables 7.1, 7.3, and 7.5). As expected, unweighted models that were fit to population-weighted samples significantly overestimated the median \(\mu\) and the mean (\(\mu e^{0.5\sigma^2}\)), but not the standard deviation \(\sigma\) (Fig. 7.5).

The bias in parameter estimates from an unweighted model fit to a population-weighted sample resulted in significant overestimation of the total population when the model predictions were applied across one million locations (scenario “wu” in Fig. 7.7; Table 7.7). The weighted models were all able to recover unbiased estimates of the total population, but the weighted-likelihood approach did not produce robust credible intervals (Fig. 7.6; Table 7.15). This was presumably because of the previously mentioned issue with the weighted-likelihood failing to account for parameter uncertainty. The weighted-precision approaches produced unbiased estimators of total population and robust credible intervals (Fig. 7.7; Table 7.7).

Discussion

From the results, the weighted-precision model was identified as the only approach that recovered unbiased estimates of population densities and population totals with robust credible intervals from population-weighted samples. The weighted-likelihood model recovered unbiased estimates of population densities and totals from population-weighted samples, but did not produce robust credible intervals. The unweighted model recovered unbiased estimates with robust credible intervals from random samples, but produced significantly biased estimates of population densities and totals from population-weighted samples. For these reasons, we recommend the weighted-precision model for use with population-weighted samples and we have demonstrated that the Stan and JAGS implementations of this model produced the same results.

Compared to the statistical model presented by Leasure et al. (2020d), our simulation made some simplifying assumptions to isolate effects of population-weighted sample designs on these statistical models. For the simulations, we assumed that area of settlement was equal in every location and we ignored the effects of settlement type (urban/rural), geographic location, and other geospatial covariates. Exploring the effects of these factors in a simulation framework was beyond the scope of this proof-of-concept study, but we encourage further investigation using a combination of empirical and simulation-based studies.

We defined the sampling weights as the probability of a location being selected for the sample. In a random sample, those probabilities would be equal across all locations in the population. In our simulation of population-weighted sampling, we simply defined this based on the proportion of the total population that occurred in each location, and we defined locations as having equal geographic size. In practice, the sampling weights used for PPS sample designs in household surveys are often much more complicated, involving weighted selection of survey clusters from a national sampling frame followed by weighted sampling of households within the selected clusters. This presents additional challenges for weighted models using these real-world weighted survey data (Gelman 2007a). Our simulation demonstrated that the weighted models can recover unbiased parameter estimates from a population-weighted sample, but this only holds true if the available sampling weights represent the true probability of each survey cluster being selected from the national sampling frame.

Weighted-precision models were used to produce high-resolution gridded population estimates for Zambia (WorldPop 2020a) and Democratic Republic of the Congo (Boo et al. 2020b, WorldPop 2020b). These publications implemented a full version of the weighted-precision model that extended the core model of Leasure et al. (2020d) from Eq. (7.1) to include a weighted precision that accounted for population-weighted sample data while also accounting for geographical location, settlement type (urban/rural), building footprints, and other geospatial covariates.

The weighted-likelihood approach seemed to produce point-estimates rather than true Bayesian posterior distributions that accounted for parameter uncertainty (Fig. 7.4). We presume that this was the result of the direct modification of the likelihood in the Stan model (see section Weighted-likelihood). This implementation of weighted-likelihood is not advised for adjusting a log-normal distribution, but it may perform better for discrete distributions where sufficient statistics are appropriate (see “Exploiting sufficient statistics” in the “Efficiency Tuning” chapter of the Stan User Guide (2019a)). More investigation is needed on this topic to determine if this approach may be suitable for the one-inflated Poisson models (Leasure & Tatem 2020a) developed to produce full coverage gridded population estimates of people per household from census microdata (Minnesota Population Center 2019), which sometimes include weighted samples.

7.1 Contributing

This analysis and report were developed by Doug Leasure and Claire Dooley from the WorldPop Group at the University of Southampton with oversight and project support from Andy Tatem. Funding was provided by the Bill and Melinda Gates Foundation and the United Kingdom Foreign, Commonwealth & Development Office as part of the GRID3 project (OPP1182408, OPP1182425).

7.1.1 Suggested Citation

Leasure DR, Dooley CA, Tatem AJ. 2021. A simulation study exploring weighted Bayesian models to recover unbiased population estimates from weighted survey data. WorldPop, University of Southampton. doi:10.5258/SOTON/WP00706

7.1.2 License

You are free to redistribute this document under the terms of a Creative Commons Attribution-NoDerivatives 4.0 International (CC BY-ND 4.0) license.

References

Agency TES. 2021. Mapping our human footprint from space. 2023. https://www.esa.int/Applications/Observing_the_Earth/Mapping_our_human_footprint_from_space.
Alegana VA, Atkinson PM, Pezzulo C, Sorichetta A, Weiss D, Bird T, Erbach-Schoenberg E, Tatem AJ. 2015. Fine resolution mapping of population age-structures for health and development applications. Journal of The Royal Society Interface 12:20150073. doi:10.1098/rsif.2015.0073.
Andrade LOM de, Pellegrini Filho A, Solar O, Rígoli F, Salazar LM de, Serrate PC-F, Ribeiro KG, Koller TS, Cruz FNB, Atun R. 2015. Social determinants of health, universal health coverage, and sustainable development: Case studies from latin american countries. The Lancet 385:1343–1351.
Bakka H, Rue H, Fuglstad G, Riebler A, Bolin D, Illian J, Krainski E, Simpson D, Lindgren F. 2018. Spatial modeling with r‐INLA: A review. Wiley Interdisciplinary Reviews: Computational Statistics 10:e1443.
Balk DL, Deichmann U, Yetman G, Pozzi F, Hay SI, Nelson A. 2006. Determining global population distribution: Methods, applications and data. Advances in parasitology 62:119–156.
Bhaduri B, Bright E, Coleman P, Urban ML. 2007. LandScan USA: A high-resolution geospatial and temporal modeling approach for population distribution and dynamics. GeoJournal 69:103–117.
Bharti N, Djibo A, Tatem AJ, Grenfell BT, Ferrari MJ. 2016. Measuring populations to improve vaccination coverage. Scientific Reports 6:34541.
Blackwell D. 1947. Conditional expectation and unbiased sequential estimation. The Annals of Mathematical Statistics:105–110.
Blangiardo M, Cameletti M, Baio G, Rue H. 2013. Spatial and spatio-temporal models with r-INLA. Spatial and spatio-temporal epidemiology 7:39–55.
Bondarenko M, Kerr D, Sorichetta A, Tatem A. 2020a. Census/projection-disaggregated gridded population datasets for 51 countries across sub-saharan africa in 2020 using building footprints.
Bondarenko M, Nieves J, Sorichetta A, Stevens FR, Gaughan AE, Tatem A, others. 2018b. wpgpRFPMS: WorldPop random forests population modelling R scripts, version 0.1.0.
Bondarenko M, Nieves J, Sorichetta A, Stevens FR, Gaughan AE, Tatem A, others. 2018a. wpgpRFPMS: WorldPop Random Forests population modelling R scripts, version 0.1.0. WorldPop, University of Southampton. doi:10.5258/SOTON/WP00665. https://github.com/wpgp/wpgpRFPMS.
Bondarenko M, Nieves JJ, Stevens FR, Gaughan AE, Jochem C, Kerr D, Sorichetta A. 2021. popRF: Random forest-informed population disaggregation r package. https://cran.r-project.org/package=popRF.
Bondarenko M, Nieves JJ, Stevens FR, Gaughan AE, Tatem A, Sorichetta A. 2020b. wpgpRFPMS: Random forests population modelling r scripts, version 0.1.0.
Boo G, Darin E, Leasure D, Dooley C, Chamberlain H, Lazar A, Tatem AJ. 2020b. Modelled gridded population estimates for the Kinshasa, Kongo-Central, Kwango, Kwilu, and Mai-Ndombe provinces in the Democratic Republic of the Congo 2018, version 2.0. WorldPop, University of Southampton. doi:10.5258/SOTON/WP00669. https://wopr.worldpop.org/?COD/population/v2.0.
Boo G, Darin EC, Leasure DR, Dooley CA, Chamberlain HR, Lazar AN, Tatem AJ. 2020a. Bottom-up gridded population estimates for the kinshasa, kongo-central, kwango, kwilu, and mai-ndombe provinces in the democratic republic of the congo, version 2.0.
Boo G, Darin E, Leasure DR, Dooley CA, Chamberlain HR, Lázár AN, Tschirhart K, Sinai C, Hoff NA, Fuller T. 2022a. High-resolution population estimation using household survey data and building footprints. Nature communications 13:1330.
Boo G, Darin E, Leasure DR, Dooley CA, Chamberlain HR, Lázár AN, Tschirhart K, Sinai C, Hoff NA, Fuller T. 2022b. High-resolution population estimation using household survey data and building footprints. Nature communications 13:1330.
Boo G, Darin E, Thomson DR, Tatem AJ. 2020c. A grid-based sample design framework for household surveys. Gates Open Research 4.
Bosco C, Alegana V, Bird T, Pezzulo C, Bengtsson L, Sorichetta A, Steele J, Hornby G, Ruktanonchai C, Ruktanonchai N, Wetter E, Tatem AJ. 2017. Exploring the high-resolution mapping of gender-disaggregated development indicators. Journal of The Royal Society Interface 14:20160825.
Breiman L. 1996. Bagging predictors. Machine learning 24:123–140.
Breiman L. 2001b. Random forests. Machine learning 45:5–32. doi:10.1023/A:1010933404324.
Breiman L. 2001a. Random forests. Machine learning 45:5–32.
Briggs DJ, Gulliver J, Fecht D, Vienneau DM. 2007. Dasymetric modelling of small-area population distribution using land cover and light emissions data. Remote sensing of Environment 108:451–466.
Bryant JR, Graham PJ. 2013. Bayesian demographic accounts: Subnational population estimation using multiple data sources.
Buchhorn M, Smets B, Bertels L, Roo BD, Lesiv M, Tsendbazar N-E, Herold M, Fritz S. 2020. Copernicus Global Land Service: Land Cover 100m: collection 3: epoch 2019: Globe. doi:10.5281/zenodo.3939050. https://doi.org/10.5281/zenodo.3939050.
Bureau Central du Recensement. 2018. Report des limites administratives — république démocratique du congo.
Carella G, Pérez Trufero J, Álvarez M, Mateu J. 2022. A bayesian spatial analysis of the heterogeneity in human mobility changes during the first wave of the COVID-19 epidemic in the united states. The American Statistician 76:64–72.
Carpenter B, Gelman A, Hoffman MD, Lee D, Goodrich B, Betancourt M, Brubaker M, Guo J, Li P, Riddell A. 2017. Stan: A probabilistic programming language. Journal of statistical software 76. doi:10.18637/jss.v076.i01.
Carroll R, Lawson A, Faes C, Kirby RS, Aregay M, Watjou K. 2015. Comparing INLA and OpenBUGS for hierarchical poisson modeling in disease mapping. Spatial and spatio-temporal epidemiology 14:45–54.
Center for International Earth Science Information Network, Novel-T. 2020. GRID3 Burkina Faso Settlement Extents Version 01, Alpha. doi:10.7916/d8-h47k-8637. https://doi.org/10.7916/d8-h47k-8637.
Chai T, Draxler RR. 2014. Root mean square error (RMSE) or mean absolute error (MAE)?–arguments against avoiding RMSE in the literature. Geoscientific model development 7:1247–1250.
Chamberlain HR, Darin E, Adewole A, Jochem WC, Lazar AN, Tatem AJ. 2023. Building footprint data for countries in africa: To what extent are existing data products comparable?
Charvériat C. 2000. Natural disasters in latin america and the caribbean: An overview of risk.
Christensen OF, Roberts GO, Sköld M. 2006. Robust markov chain monte carlo methods for spatial generalized linear mixed models. Journal of Computational and Graphical Statistics 15:1–17.
CIESIN. 2018. Gridded population of the world, version 4 (GPWv4): Population count adjusted to match 2015 revision of UN WPP country totals, revision 11. https://doi.org/10.7927/H4PN93PB.
Cohen JE, Small C. 1998a. Hypsographic demography: The distribution of human population by altitude. Proceedings of the National Academy of Sciences 95:14009–14014.
Cohen JE, Small C. 1998b. Hypsographic demography: The distribution of human population by altitude. Proceedings of the National Academy of Sciences 95:14009–14014.
Corporation M, Weston S. 2020. doParallel: Foreach parallel adaptor for the ’parallel’ package. https://CRAN.R-project.org/package=doParallel.
Cressie N. 2015. Statistics for spatial data. John Wiley & Sons.
Cutler F original by LB and A, Wiener R port by AL and M. 2018. randomForest: Breiman and cutler’s random forests for classification and regression. https://CRAN.R-project.org/package=randomForest.
Leasure DR, Bondarenko M, Tatem AJ. 2020. wopr: An R package to query the WorldPop Open Population Repository, version 0.3.4. https://apps.worldpop.org/woprVision.
Daniel Baston. 2020. exactextractr: Fast extraction from raster datasets using polygons. https://CRAN.R-project.org/package=exactextractr.
Darin E, Kuépié M, Bassinga H, Boo G, Tatem AJ, Reeve P. 2022a. The population seen from space: When satellite images come to the rescue of the census. Population 77:437–464.
Darin E, Kuépié M, Bassinga H, Boo G, Tatem AJ, Reeve P. 2022b. The population seen from space: When satellite images come to the rescue of the census. Population 77:437–464.
Dauby G. 2019. ConR: Computation of parameters used in preliminary assessment of conservation status. R package version 1.
Dauer QP. 2020. State and societal responses to natural disasters in Latin American and Caribbean history. History Compass 18:e12605.
Denwood MJ. 2016b. runjags: An R package providing interface utilities, model templates, parallel computing methods and additional distributions for MCMC models in JAGS. Journal of Statistical Software 71:1–25.
Denwood MJ. 2016a. runjags: An R package providing interface utilities, model templates, parallel computing methods and additional distributions for MCMC models in JAGS. Journal of Statistical Software 71:1–25. doi:10.18637/jss.v071.i09.
Di Baldassarre G, Yan K, Ferdous MR, Brandimarte L. 2014. The interplay between human population dynamics and flooding in Bangladesh: A spatial analysis. Proceedings of the International Association of Hydrological Sciences 364:188–191.
Diggle PJ, Giorgi E. 2016a. Model-based geostatistics for prevalence mapping in low-resource settings. Journal of the American Statistical Association 111:1096–1120. doi:10.1080/01621459.2015.1123158. https://doi.org/10.1080/01621459.2015.1123158.
Diggle PJ, Giorgi E. 2016b. Model-based geostatistics for prevalence mapping in low-resource settings. Journal of the American Statistical Association 111:1096–1120.
Dobson JE, Bright EA, Coleman PR, Durfee RC, Worley BA. 2000. LandScan: A global population database for estimating populations at risk. Photogrammetric engineering and remote sensing 66:849–857.
Dooley CA, Boo G, Leasure DR, Tatem AJ. 2020a. Gridded maps of building patterns throughout sub-Saharan Africa, version 1.1. WorldPop, University of Southampton. doi:10.5258/SOTON/WP00677.
Dooley C, Jochem W, Leasure D, Sorichetta A, Lazar A, Tatem A, Bondarenko M. 2021. South Sudan 2020 gridded population estimates from census projections adjusted for displacement, version 2.0.
Dooley C, Tatem A, Bondarenko M. 2020b. Gridded maps of building patterns throughout sub-Saharan Africa, version 1.0. University of Southampton: Southampton, UK. Source of building footprints: “Ecopia Vector Maps Powered by Maxar Satellite Imagery”.
Dooley C, Tatem A, Bondarenko M. 2020c. Gridded maps of building patterns throughout sub-Saharan Africa, version 1.0. University of Southampton: Southampton, UK. Source of building footprints: “Ecopia Vector Maps Powered by Maxar Satellite Imagery”.
Dotse-Gborgbortsi W, Dwomoh D, Alegana V, Hill A, Tatem AJ, Wright J. 2020. The influence of distance and quality on utilisation of birthing services at health facilities in Eastern Region, Ghana. BMJ Global Health 4:e002020.
Dowle M, Srinivasan A, Gorecki J, Chirico M, Stetsenko P, Short T, Lianoglou S, Antonyan E, Bonsch M, Parsonage H, Ritchie S, Ren K, Tan X, Saporta R, Seiskari O, Dong X, Lang M, Iwasaki W, Wenchel S, Broman K, Schmidt T, Arenburg D, Smith E, Cocquemas F, Gomez M, Chataignon P, Blaser N, Selivanov D, Riabushenko A, Lee C, Groves D, Possenriede D, Parages F, Toth D, Yaramaz-David M, Perumal A, Sams J, Morgan M, Quinn M, @javrucebo, @marc-outins, Storey R, Saraswat M, Jacob M, Schubmehl M, Vaughan D, Hocking T, Silvestri L, Barrett T, Hester J, Damico A, Freundt S, Simons D, Andrade ES de, Miller C, Meldgaard JP, Tlapak V, Ushey K, Eddelbuettel D. 2020. Data.table: Extension of ’data.frame’. https://CRAN.R-project.org/package=data.table.
Doxsey-Whitfield E, MacManus K, Adamo SB, Pistolesi L, Squires J, Borkovska O, Baptista SR. 2015. Taking advantage of the improved availability of census data: A first look at the gridded population of the world, version 4. Papers in Applied Geography 1:226–234.
Earth Observation Group. 2020. Visible infrared imaging radiometer suite (VIIRS) nighttime lights 2020 (annual composite). https://eogdata.mines.edu/nighttime_light/annual/v20/2020/VNL_v2_npp_2020_global_vcmslcfg_c202101211500.average.tif.gz.
United Nations Department of Economic and Social Affairs, Population Division. 2019. World population prospects 2019: Methodology of the United Nations population estimates and projections (ST/ESA/SER.A/425). https://population.un.org/wpp/Publications/Files/WPP2019_Methodology.pdf.
Ecopia.AI & Maxar Technologies. 2020a. Digitize Africa data – building footprints. https://www.maxar.com/products/imagery-basemaps.
Ecopia.AI & Maxar Technologies. 2020c. Digitize Africa data – building footprints. https://www.maxar.com/products/imagery-basemaps.
Ecopia.AI & Maxar Technologies. 2020b. Digitize Africa data – building footprints. https://www.maxar.com/products/imagery-basemaps.
Ecopia.AI, Maxar Technologies. 2019. Digitize Africa data.
Ecopia.AI, Maxar Technologies, Inc. 2019-2021. Digitize Africa data. http://digitizeafrica.ai/.
Ehrlich D, Freire S, Melchiorri M, Kemper T. 2021. Open and consistent geospatial data on population density, built-up and settlements to analyse human presence, societal impact and sustainability: A review of GHSL applications. Sustainability 13:7851.
Ehrlich D, Kemper T, Pesaresi M, Corbane C. 2018. Built-up area and population density: Two essential societal variables to address climate hazard impact. Environmental Science & Policy 90:73–82.
Eicher CL, Brewer CA. 2001. Dasymetric mapping and areal interpolation: Implementation and evaluation. Cartography and Geographic Information Science 28:125–138.
Elvidge CD, Zhizhin M, Ghosh T, Hsu F-C, Taneja J. 2021. Annual time series of global VIIRS nighttime lights derived from monthly averages: 2012 to 2019. Remote Sensing 13:922.
Engstrom R, Newhouse DL, Soundararajan V. 2019. Estimating small area population density using survey data and satellite imagery: An application to Sri Lanka. World Bank Policy Research Working Paper.
Epanechnikov VA. 1969. Non-parametric estimation of a multivariate probability density. Theory of Probability & Its Applications 14:153–158.
Erbach-Schoenberg E zu, Alegana VA, Sorichetta A, Linard C, Lourenço C, Ruktanonchai NW, Graupe B, Bird TJ, Pezzulo C, Wesolowski A. 2016. Dynamic denominators: The impact of seasonally varying population numbers on disease incidence estimates. Population health metrics 14:1–10.
Esch T, Brzoska E, Dech S, Leutner B, Palacios-Lopez D, Metz-Marconcini A, Marconcini M, Roth A, Zeidler J. 2022. World settlement footprint 3D-a first three-dimensional survey of the global building stock. Remote sensing of environment 270:112877.
Esch T, Zeidler J, Palacios-Lopez D, Marconcini M, Roth A, Mönks M, Leutner B, Brzoska E, Metz-Marconcini A, Bachofer F. 2020. Towards a large-scale 3D modeling of the built environment—joint analysis of TanDEM-x, sentinel-2 and open street map data. Remote Sensing 12:2391.
European Space Agency Climate Change Initiative. 2017a. Waterbodies – version 4.0. ftp://geo10.elie.ucl.ac.be/v207/ESACCI-LC-L4-WB-Ocean-Land-Map-150m-P13Y-2000-v4.0.tif.
European Space Agency Climate Change Initiative. 2017b. Land cover CCI product user guide version 2. Tech. rep. maps.elie.ucl.ac.be/CCI/viewer/download/ESACCI-LC-Ph2-PUGv2_2.0.pdf.
European Space Agency Climate Change Initiative. 2019. ICDR – land cover 2019 – version 2.1.4. https://cds.climate.copernicus.eu/cdsapp#!/dataset/satellite-land-cover?tab=form.
Farr TG, Rosen PA, Caro E, Crippen R, Duren R, Hensley S, Kobrick M, Paller M, Rodriguez E, Roth L. 2007. The shuttle radar topography mission. Reviews of geophysics 45.
Findlay MA. 2021. Doing development research. SAGE Publications.
Flasse L, Schewin C, Grapin-Botton A. 2021. Pancreas morphogenesis: Branching in and then out. Current topics in developmental biology 143:75–110.
Fleiss M, Kienberger S, Aubrecht C, Kidd R, Zeil P. 2011. Mapping the 2010 Pakistan floods and its impact on human life: A post-disaster assessment of socioeconomic indicators. Geoinformation for Disaster Management.
Florczyk AJ, Corbane C, Ehrlich D, Freire S, Kemper T, Maffenini L, Melchiorri M, Pesaresi M, Politis P, Schiavina M. 2019. GHSL data package 2019. Luxembourg, eur 29788:290498.
Floyd JR, Ogola J, Fèvre EM, Wardrop N, Tatem AJ, Ruktanonchai NW. 2020. Activity-specific mobility of adults in a rural region of western Kenya. PeerJ 8:e8798.
Freire S, MacManus K, Pesaresi M, Doxsey-Whitfield E, Mills J. 2016. Development of new open and free multi-temporal global population grids at 250 m resolution. Population 250.
Fuglstad G-A, Simpson D, Lindgren F, Rue H. 2019. Constructing priors that penalize the complexity of Gaussian random fields. Journal of the American Statistical Association 114:445–452.
Gaughan AE, Stevens FR, Linard C, Jia P, Tatem AJ. 2013. High resolution population distribution maps for Southeast Asia in 2010 and 2015. PloS one 8:e55882.
Gaughan AE, Stevens FR, Linard C, Patel NN, Tatem AJ. 2015. Exploring nationally and regionally defined models for large area population mapping. International Journal of Digital Earth 8:989–1006.
Gavankar NL, Ghosh SK. 2018. Automatic building footprint extraction from high-resolution satellite image using mathematical morphology. European Journal of Remote Sensing 51:182–193.
Gelman A. 2007b. Struggles with survey weighting and regression modeling.
Gelman A. 2007a. Struggles with survey weighting and regression modeling. Statistical Science 22:153–164. doi:10.1214/088342306000000691.
Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB. 2013. Bayesian data analysis (3rd edition). Chapman and Hall/CRC. doi:10.1201/b16018.
Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB. 2013. Bayesian data analysis. CRC press.
Gelman A, Rubin DB. 1992. Inference from iterative simulation using multiple sequences. Statistical science 7:457–472. doi:10.1214/ss/1177011136.
Genuer R, Poggi J-M, Tuleau-Malot C. 2010a. Variable selection using random forests. Pattern recognition letters 31:2225–2236.
Genuer R, Poggi J-M, Tuleau-Malot C. 2010b. Variable selection using random forests. Pattern recognition letters 31:2225–2236. doi:10.1016/j.patrec.2010.03.014.
Georganos S, Grippa T, Niang Gadiaga A, Linard C, Lennert M, Vanhuysse S, Mboga N, Wolff E, Kalogirou S. 2021. Geographical random forests: A spatial extension of the random forest algorithm to address spatial heterogeneity in remote sensing and population modelling. Geocarto International 36:121–136.
Gerland P, Raftery AE, Ševčíková H, Li N, Gu D, Spoorenberg T, Alkema L, Fosdick BK, Chunn J, Lalic N. 2014. World population stabilization unlikely this century. Science 346:234–237.
Ghana Statistical Services. 2010. Census enumeration area boundaries and urban/rural classification. Accra, Ghana: Ghana Statistical Services.
Giorgi E, Diggle PJ, Snow RW, Noor AM. 2018. Geostatistical methods for disease mapping and visualisation using data from spatio‐temporally referenced prevalence surveys. International Statistical Review 86:571–597.
Gómez-Rubio V, Bivand RS, Rue H. 2021. Estimating spatial econometrics models with integrated nested Laplace approximation. Mathematics 9:2044.
Google. 2023. Open buildings. 2023. https://sites.research.google/open-buildings/.
Grippa T, Linard C, Lennert M, Georganos S, Mboga N, Vanhuysse S, Gadiaga A, Wolff E. 2019. Improving urban population distribution models with very-high resolution satellite information. Data 4:13.
Groupe Huit, Arter. 2014. Schéma d’orientation stratégique de l’agglomération kinoise (SOSAK) et plan particulier d’aménagement de la ville (PPA).
Harrison JG, Calder WJ, Shastry V, Buerkle CA. 2020. Dirichlet‐multinomial modelling outperforms alternatives for analysis of microbiome and other ecological count data. Molecular ecology resources 20:481–497.
Harris I, Osborn TJ, Jones P, Lister D. 2020. Version 4 of the CRU TS monthly high-resolution gridded multivariate climate dataset. Scientific data 7:1–18.
Hay SI, Noor AM, Nelson A, Tatem AJ. 2005. The accuracy of human population maps for public health application. Tropical medicine & international health 10:1073–1086.
Unlimit Health. 2023. SAPIENs: A tool to conduct small area population evaluations. 2021. https://unlimithealth.org/research/sapiens-project/.
Hijmans RJ. 2020. raster: Geographic data analysis and modeling. https://CRAN.R-project.org/package=raster.
Hijmans RJ, Etten J van, Sumner M, Cheng J, Baston D, Bevan A, Bivand R, Busetto L, Canty M, Forrest D, Ghosh A, Golicher D, Gray J, Greenberg JA, Hiemstra P, Hingee K, Institute for Mathematics Applied Geosciences, Karney C, Mattiuzzi M, Mosher S, Nowosad J, Pebesma E, Lamigueiro OP, Racine EB, Rowlingson B, Shortridge A, Venables B, Wueest R. 2020. raster: Geographic data analysis and modeling. https://CRAN.R-project.org/package=raster.
Hijmans RJ, Van Etten J, Cheng J, Mattiuzzi M, Sumner M, Greenberg JA, Lamigueiro OP, Bevan A, Racine EB, Shortridge A. 2015a. Package “raster.” R package 734:473.
Hijmans RJ, Van Etten J, Cheng J, Mattiuzzi M, Sumner M, Greenberg JA, Lamigueiro OP, Bevan A, Racine EB, Shortridge A. 2015b. Package “raster.” R package 734:473.
Hobbs NT, Hooten MB. 2015. Bayesian models: A statistical primer for ecologists. Princeton University Press.
Hoffman MD, Gelman A. 2014. The No-U-Turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. J. Mach. Learn. Res. 15:1593–1623.
Hoffman MD, Gelman A. The No-U-Turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo.
Hurlbert SH. 1984. Pseudoreplication and the design of ecological field experiments. Ecological monographs 54:187–211.
IBGE. 2019. Brazilian territorial division, 2019 edition. Brazilian Institute of Geography and Statistics (IBGE). https://www.ibge.gov.br/en/geosciences/territorial-organization/regional-division/23708-brazilian-territorial-division.html?=&t=o-que-e.
IBGE. 2020b. Meshes of census sectors intra-municipal divisions. Brazilian Institute of Geography and Statistics (IBGE). http://geoftp.ibge.gov.br/organizacao_do_territorio/malhas_territoriais/malhas_de_setores_censitarios__divisoes_intramunicipais/2019/Malha_de_setores_(shp)_Brasil/.
IBGE. 2020a. Population estimates - tables 2020. Brazilian Institute of Geography and Statistics (IBGE). https://www.ibge.gov.br/en/statistics/social/18448-population-estimates.html?=&t=resultados.
Institut Géographique du Burkina Faso. 2015. Base nationale de données topographiques.
Institut National de la Statistique et de la Démographie. 2019. Recensement général de la population et de l’habitation de 2019 du Burkina Faso - résultats provisoires.
Intergovernmental Panel on Climate Change. 2014. Climate change 2014: Synthesis report. Contribution of working groups I, II and III to the fifth assessment report of the intergovernmental panel on climate change 151.
International Federation of Red Cross and Red Crescent Societies. 2020. World disaster report 2020 – tackling the humanitarian impacts of the climate crisis together. https://www.ifrc.org/document/world-disasters-report-2020.
IOM. 2017. Displacement tracking matrix. Kajo Keji, Central Equatoria: Paper registration | rapid intentions & multi-sectorial needs survey | 29 June-12 July 2017.
IOM. 2021. South Sudan - baseline assessment round 9 - IDP and returnee. Data released 31st January 2021. https://displacement.iom.int.
James G, Witten D, Hastie T, Tibshirani R. 2013. An introduction to statistical learning. Springer.
Jochem WC, Bird TJ, Tatem AJ. 2018. Identifying residential neighbourhood types from settlement points in a machine learning approach. Computers, environment and urban systems 69:104–113.
Jochem W, Bondarenko M, Nieves J, Stevens F, Gaughan A, Kerr D, Tatem A, Sorichetta A. 2021a. popRF: Random forest-informed disaggregative population modelling and mapping. doi:10.13140/RG.2.2.24822.93763.
Jochem WC, Leasure DR, Pannell O, Chamberlain HR, Jones P, Tatem AJ. 2020. Classifying settlement types from multi-scale spatial patterns of building footprints. Environment and Planning B: Urban Analytics and City Science. doi:10.1177/2399808320921208.
Jochem WC, Leasure DR, Pannell O, Chamberlain HR, Jones P, Tatem AJ. 2021b. Classifying settlement types from multi-scale spatial patterns of building footprints. Environment and Planning B: Urban Analytics and City Science 48:1161–1179.
Jochem WC, Tatem AJ. 2021. Tools for mapping multi-scale settlement patterns of building footprints: An introduction to the R package foot. PLoS One 16:e0247535.
Juntunen T, Vanhatalo J, Peltonen H, Mäntyniemi S. 2012. Bayesian spatial multispecies modelling to assess pelagic fish stocks from acoustic-and trawl-survey data. ICES Journal of Marine Science 69:95–104.
Krainski ET, Gómez-Rubio V, Bakka H, Lenzi A, Castro-Camilo D, Simpson D, Lindgren F, Rue H. 2018. Advanced spatial modeling with stochastic partial differential equations using R and INLA. https://becarioprecario.bitbucket.io/spde-gitbook/.
KSPH. 2018. Microcensus survey data for the Kinshasa, Kongo Central and former Bandundu provinces (2017 and 2018).
Kuhn M, Wing J, Weston S, Williams A, Keefer C, Engelhardt A, Cooper T, Mayer Z, Kenkel B, R Core Team, Benesty M, Lescarbeau R, Ziem A, Scrucca L, Tang Y, Candan C, Hunt T. 2020. caret: Classification and regression training. https://CRAN.R-project.org/package=caret.
Kummu M, De Moel H, Ward PJ, Varis O. 2011. How close do we live to water? A global analysis of population distance to freshwater bodies. PloS one 6:e20578.
Lai S, Bogoch II, Ruktanonchai NW, Watts A, Lu X, Yang W, Yu H, Khan K, Tatem AJ. 2020. Assessing spread risk of Wuhan novel coronavirus within and beyond China, January-April 2020: A travel network-based modelling study. medRxiv.
Lamarche C, Santoro M, Bontemps S, d’Andrimont R, Radoux J, Giustarini L, Brockmann C, Wevers J, Defourny P, Arino O. 2017. Compilation and validation of SAR and optical data products for a complete and global map of inland/ocean water tailored to the climate modeling community. Remote Sensing 9:36.
Leasure DR, Bondarenko M, Tatem AJ. 2020. wopr: An R package to query the WorldPop Open Population Repository, version 0.3.4. https://apps.worldpop.org/woprVision.
Leasure D, Bondarenko M, Darin E, Tatem A. 2021a. wopr: An R package to query the WorldPop Open Population Repository, version 1.0.0.
Leasure DR, Bondarenko M, Tatem AJ. 2020a. wopr: An R package to query the WorldPop Open Population Repository, version 0.3.4. WorldPop, University of Southampton. doi:10.5258/SOTON/WP00679. https://apps.worldpop.org/woprVision.
Leasure DR, Dooley CA, Bondarenko M, Tatem AJ. 2020b. peanutButter: An R package to produce rapid-response gridded population estimates from building footprints, version 0.3.0. doi:10.5258/SOTON/WP00681.
Leasure DR, Dooley CA, Bondarenko M, Tatem AJ. 2020c. peanutButter: An R package to produce rapid-response gridded population estimates from building footprints, version 0.1.0. WorldPop Research Group, University of Southampton. doi:10.5258/SOTON/WP00667. https://github.com/wpgp/peanutButter.
Leasure D, Dooley C, Tatem A. 2021b. A simulation study exploring weighted likelihood models to recover unbiased population estimates from weighted survey data.
Leasure DR, Jochem WC, Weber EM, Seaman V, Tatem AJ. 2020f. National population mapping from sparse survey data: A hierarchical Bayesian modeling framework to account for uncertainty. Proceedings of the National Academy of Sciences.
Leasure DR, Jochem WC, Weber EM, Seaman V, Tatem AJ. 2020e. National population mapping from sparse survey data: A hierarchical Bayesian modeling framework to account for uncertainty. Proceedings of the National Academy of Sciences 117:24173–24179.
Leasure DR, Jochem WC, Weber EM, Seaman V, Tatem AJ. 2020g. National population mapping from sparse survey data: A hierarchical Bayesian modeling framework to account for uncertainty. Proceedings of the National Academy of Sciences 117:24173–24179.
Leasure DR, Jochem WC, Weber EM, Seaman V, Tatem AJ. 2020h. National population mapping from sparse survey data: A hierarchical Bayesian modeling framework to account for uncertainty. Proceedings of the National Academy of Sciences 117:24173–24179.
Leasure DR, Jochem WC, Weber EM, Seaman V, Tatem AJ. 2020d. National population mapping from sparse survey data: A hierarchical Bayesian modeling framework to account for uncertainty. Proceedings of the National Academy of Sciences 201913050.
Leasure DR, Jochem WC, Weber EM, Seaman V, Tatem AJ. 2020i. National population mapping from sparse survey data: A hierarchical Bayesian modeling framework to account for uncertainty. Proceedings of the National Academy of Sciences 117:24173–24179. doi:10.1073/pnas.1913050117. https://www.pnas.org/content/117/39/24173.
Leasure DR, Tatem AJ. 2020b. Bayesian gridded population estimates for Ghana 2018, version 1.0. WorldPop, University of Southampton. doi:10.5258/SOTON/WP00680. https://wopr.worldpop.org/?GHA/Population/v1.0.
Leasure DR, Tatem AJ. 2020a. A Bayesian approach to produce 100 m gridded population estimates using census microdata and recent building footprints. WorldPop, University of Southampton. doi:10.5258/SOTON/WP00686.
Lee D. 2011. A comparison of conditional autoregressive models used in Bayesian disease mapping. Spatial and spatio-temporal epidemiology 2:79–89.
Lee SA, Economou T, Lowe R. 2022. A Bayesian modelling framework to quantify multiple sources of spatial variation for disease mapping. Journal of the Royal Society Interface 19:20220440.
Lehner B, Verdin K, Jarvis A. 2008. New global hydrography derived from spaceborne elevation data. Eos, Transactions American Geophysical Union 89:93–94.
Levesque J-F, Harris MF, Russell G. 2013. Patient-centred access to health care: Conceptualising access at the interface of health systems and populations. International journal for equity in health 12:1–9.
Leyk S, Gaughan AE, Adamo SB, Sherbinin A de, Balk D, Freire S, Rose A, Stevens FR, Blankespoor B, Frye C. 2019. The spatial allocation of population: A review of large-scale gridded population data products and their fitness for use. Earth System Science Data 11:1385–1409.
Liaw A, Wiener M. 2002a. Classification and regression by randomForest. R News 2:18–22. https://CRAN.R-project.org/doc/Rnews/.
Liaw A, Wiener M. 2002b. Classification and regression by randomForest. R News 2:18–22. https://cran.r-project.org/package=randomForest.
Liaw A, Wiener M. 2002c. Classification and regression by randomForest. R news 2:18–22.
Li W, He C, Fang J, Zheng J, Fu H, Yu L. 2019. Semantic segmentation-based building footprint extraction using very high-resolution satellite images and multi-source GIS data. Remote Sensing 11:403.
Linard C, Gilbert M, Tatem AJ. 2011. Assessing the use of global land cover data for guiding large area population distribution modelling. GeoJournal 76:525–538.
Lindgren F, Rue H. 2015. Bayesian spatial modelling with R-INLA. Journal of Statistical Software 63:1–25. doi:10.18637/jss.v063.i19.
Lindgren F, Rue H, Lindström J. 2011. An explicit link between Gaussian fields and Gaussian Markov random fields: The stochastic partial differential equation approach. Journal of the Royal Statistical Society Series B: Statistical Methodology 73:423–498.
Liu P, Liu X, Liu M, Shi Q, Yang J, Xu X, Zhang Y. 2019. Building footprint extraction from high-resolution images via spatial residual inception convolutional neural network. Remote Sensing 11:830.
Lloyd CT, Chamberlain H, Kerr D, Yetman G, Pistolesi L, Stevens FR, Gaughan AE, Nieves JJ, Hornby G, MacManus K, others. 2019. Global spatio-temporally harmonised datasets for producing high-resolution gridded population distribution datasets. Big earth data 3:108–139.
Lloyd CT, Sorichetta A, Tatem AJ. 2017b. High resolution global gridded data for use in population studies. Scientific data 4:1–17.
Lloyd CT, Sorichetta A, Tatem AJ. 2017a. High resolution global gridded data for use in population studies. Scientific data 4:1–17.
Lloyd CT, Sturrock HJ, Leasure DR, Jochem WC, Lazar AN, Tatem AJ. 2020. Classifying residential status of urban building types in low and middle income settings. Remote Sensing.
Marconcini M, Metz-Marconcini A, Üreyen S, Palacios-Lopez D, Hanke W, Bachofer F, Zeidler J, Esch T, Gorelick N, Kakarla A. 2020a. Outlining where humans live, the world settlement footprint 2015. Scientific Data 7:242.
Marconcini M, Metz A, Zeidler J, Esch T. 2020b. Urban monitoring in support of sustainable cities. In: 2015 joint urban remote sensing event (JURSE). IEEE, 1–4.
Marivoet W, De Herdt T. 2017. From figures to facts: Making sense of socioeconomic surveys in the Democratic Republic of the Congo (DRC). Analysis and policy brief/University of Antwerp, Institute of Development Policy and Management; 23.
Marivoet W, De Herdt T. 2018. Tracing down real socio-economic trends from household data with erratic sampling frames: The case of the Democratic Republic of the Congo. Journal of Asian and African Studies 53:532–552.
McCullagh P, Nelder JA. 1989. Generalized linear models, 2nd edition. Chapman and Hall/CRC.
Mcdonald RI, Forman RT, Kareiva P, Neugarten R, Salzer D, Fisher J. 2009. Urban effects, distance, and protected areas in an urbanizing world. Landscape and Urban Planning 93:63–75.
McKeen T, Bondarenko M, Kerr D, Esch T, Marconcini M, Palacios-Lopez D, Zeidler J, Valle RC, Juran S, Tatem AJ. 2023. High-resolution gridded population datasets for Latin America and the Caribbean using official statistics. Scientific Data 10:436.
Mennis J. 2003. Generating surface models of population using dasymetric mapping. The Professional Geographer 55:31–42.
Mennis J, Hultgren T. 2006. Intelligent dasymetric mapping and its application to areal interpolation. Cartography and Geographic Information Science 33:179–194.
Microsoft. 2022. Worldwide building footprints derived from satellite imagery (GitHub repository). 2022. https://github.com/microsoft/GlobalMLBuildingFootprints.
Minnesota Population Center. 2019. Integrated Public Use Microdata Series, International: Version 7.2. Minneapolis, MN: IPUMS.
Minnesota Population Center. 2020. Integrated Public Use Microdata Series, International: Version 7.3 [South Sudan 2008 census]. Minneapolis, MN: IPUMS, 2020. https://doi.org/10.18128/D020.V7.3.
Mossoux S, Kervyn M, Soulé H, Canters F. 2018. Mapping population distribution from high resolution remotely sensed imagery in a data poor setting. Remote Sensing 10:1409.
Moultrie TA, Dorrington RE, Hill AG, Hill K, Timæus IM, Zaba B. 2013. Tools for demographic estimation. International Union for the Scientific Study of Population.
Nagle NN, Buttenfield BP, Leyk S, Spielman S. 2014. Dasymetric modeling and uncertainty. Annals of the Association of American Geographers 104:80–95.
National Bureau of Statistics. 2015. Population projections, South Sudan. From 2015 - 2020.
National Institute of Statistics. 2016. Projections démographiques et estimations des cibles prioritaires des différents programmes et interventions de santé. Ministère de la santé publique, Cameroon, June 2016. 144 pages. https://ins-cameroun.cm/en/statistique/projections-demographiques-et-estimations-des-cibles-prioritaires-des-differents-programmes-et-interventions-de-sante/.
Nieves JJ, Stevens FR, Gaughan AE, Linard C, Sorichetta A, Hornby G, Patel NN, Tatem AJ. 2017. Examining the correlates and drivers of human population distributions across low-and middle-income countries. Journal of the Royal Society interface 14:20170401.
Nilsen K, Tejedor-Garavito N, Leasure DR, Utazi CE, Ruktanonchai CW, Wigley AS, Dooley CA, Matthews Z, Tatem AJ. 2021. A review of geospatial methods for population estimation and their use in constructing reproductive, maternal, newborn, child and adolescent health service indicators. BMC health services research 21:1–10.
Nordstrand E, Frye C. 2014. World population estimate. doi:10.13140/RG.2.2.18213.14565.
Oak Ridge National Laboratory. 2018a. LandScan HD: Nigeria.
Oak Ridge National Laboratory. 2018b. LandScan HD: Nigeria (Oak Ridge National Laboratory).
Olorunfemi J, Fashagba I. 2021. Population census administration in Nigeria. Nigerian Politics:353–367.
OpenStreetMap contributors. 2018. Planet dump retrieved from https://planet.osm.org.
Paige J, Fuglstad G-A, Riebler A, Wakefield J. 2022. Spatial aggregation with respect to a population distribution: Impact on inference. Spatial Statistics 52:100714.
Palacios-Lopez D, Bachofer F, Esch T, Heldens W, Hirner A, Marconcini M, Sorichetta A, Zeidler J, Kuenzer C, Dech S. 2019. New perspectives for mapping global population distribution using world settlement footprint products. Sustainability 11:6056.
Palacios-Lopez D, Bachofer F, Esch T, Marconcini M, MacManus K, Sorichetta A, Zeidler J, Dech S, Tatem AJ, Reinartz P. 2021. High-resolution gridded population datasets: Exploring the capabilities of the world settlement footprint 2019 imperviousness layer for the African continent. Remote Sensing 13:1142.
Palacios-Lopez D, Esch T, MacManus K, Marconcini M, Sorichetta A, Yetman G, Zeidler J, Dech S, Tatem AJ, Reinartz P. 2022. Towards an improved large-scale gridded population dataset: A pan-European study on the integration of 3D settlement data into population modelling. Remote Sensing 14:325.
Pan American Health Organization. 2017. Health in the Americas+, 2017 edition: Summary: Regional outlook and country profiles. Pan American Health Organization.
Pandey PC, Koutsias N, Petropoulos GP, Srivastava PK, Ben Dor E. 2021. Land use/land cover in view of earth observation: Data sources, input dimensions, and classifiers—a review of the state of the art. Geocarto International 36:957–988.
Pateiro-López B, Rodríguez-Casal A. 2010. Generalizing the convex hull of a sample: The R package alphahull. Journal of Statistical Software 34:1–28.
Pebesma E. 2018c. Simple Features for R: Standardized Support for Spatial Vector Data. The R Journal 10:439–446. doi:10.32614/RJ-2018-009. https://doi.org/10.32614/RJ-2018-009.
Pebesma EJ. 2018a. Simple features for R: Standardized support for spatial vector data. R J. 10:439.
Pebesma EJ. 2018b. Simple features for R: Standardized support for spatial vector data. R J. 10:439.
Pebesma E, Bivand R, Racine E, Sumner M, Cook I, Keitt T, Lovelace R, Wickham H, Ooms J, Müller K, Pedersen TL, Baston D. 2020. sf: Simple features for R. https://CRAN.R-project.org/package=sf.
Pettit L. 1990. The conditional predictive ordinate for the normal distribution. Journal of the Royal Statistical Society: Series B (Methodological) 52:175–184.
Pezzulo C, Hornby GM, Sorichetta A, Gaughan AE, Linard C, Bird TJ, Kerr D, Lloyd CT, Tatem AJ. 2017b. Sub-national mapping of population pyramids and dependency ratios in africa and asia. Scientific data 4:1–15.
Pezzulo C, Hornby GM, Sorichetta A, Gaughan AE, Linard C, Bird TJ, Kerr D, Lloyd CT, Tatem AJ. 2017a. Sub-national mapping of population pyramids and dependency ratios in africa and asia. Scientific Data 4:170089.
Plummer M. 2003a. JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. In: Proceedings of the 3rd international workshop on distributed statistical computing. Vienna, Austria., 1–10. http://mcmc-jags.sourceforge.net/.
Plummer M. 2003b. JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. In: Proceedings of the 3rd international workshop on distributed statistical computing. Vienna, Austria, 1–10.
Plummer M. JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. In: Proceedings of the 3rd international workshop on distributed statistical computing. Vienna, Austria, 1–10.
Plummer M, Best N, Cowles K, Vines K. 2006. CODA: Convergence diagnosis and output analysis for MCMC. R news 6:7–11.
Qader S, Lefebvre V, Tatem A, Pape U, Himelein K, Ninneman A, Bengtsson L, Bird T. 2021. Semi-automatic mapping of pre-census enumeration areas and population sampling frames. Humanities and Social Sciences Communications 8:3.
Qader SH, Lefebvre V, Tatem AJ, Pape U, Jochem W, Himelein K, Ninneman A, Wolburg P, Nunez-Chaim G, Bengtsson L, Bird T. 2020. Using gridded population and quadtree sampling units to support survey sample design in low-income settings. International Journal of Health Geographics 19:10.
Qiu Y, Zhao X, Fan D, Li S, Zhao Y. 2022. Disaggregating population data for assessing progress of SDGs: Methods and applications. International Journal of Digital Earth 15:2–29.
R Core Team. 2013. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
R Core Team. 2020b. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
R Core Team. 2020a. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Raftery AE, Li N, Ševčíková H, Gerland P, Heilig GK. 2012. Bayesian probabilistic population projections for all countries. Proceedings of the National Academy of Sciences 109:13915–13921.
Raleigh C, Linke rew, Hegre H, Karlsen J. 2010. Introducing ACLED: An armed conflict location and event dataset. Journal of peace research 47:651–660.
Rao CR. 1992. Information and the accuracy attainable in the estimation of statistical parameters. In: Breakthroughs in statistics: Foundations and basic theory. Springer, 235–247.
Reguero BG, Losada IJ, Díaz-Simal P, Méndez FJ, Beck MW. 2015. Effects of climate change on exposure to coastal flooding in latin america and the caribbean. PLoS One 10:e0133409.
Robert CP, Roberts G. 2021. Rao–blackwellisation in the markov chain monte carlo era. International Statistical Review 89:237–249.
Robnik-Šikonja M. 2004. Improving random forests. In: European conference on machine learning. Springer, 359–370.
Royle JA, Dorazio RM. 2008. Hierarchical modeling and inference in ecology: The analysis of data from populations, metapopulations and communities. Elsevier.
Rue H, Held L. 2005. Gaussian markov random fields: Theory and applications. CRC press.
Rue H, Martino S, Chopin N. 2009. Approximate bayesian inference for latent gaussian models by using integrated nested laplace approximations. Journal of the Royal Statistical Society Series B: Statistical Methodology 71:319–392.
Ruktanonchai NW, Floyd JR, Lai S, Ruktanonchai CW, Sadilek A, Rente-Lourenco P, Ben X, Carioli A, Gwinn J, Steele JE, Prosper O, Schneider A, Oplinger A, Eastham P, Tatem AJ. 2020. Assessing the impact of coordinated COVID-19 exit strategies across europe. Science 369:1465–1470.
Sachs JD. 2012. From millennium development goals to sustainable development goals. The lancet 379:2206–2211.
Schumacher JV, Redmond RL, Hart MM, Jensen ME. Mapping patterns of human use and potential resource conflicts on public lands. In: Monitoring ecological condition in the western united states: Proceedings of the fourth symposium on the environmental monitoring and assessment program (EMAP), san franciso, CA, april 6–8, 1999. Springer, 127–137.
Sims K, Reith A, Bright E, Kaufman J, Pyle J, Epting J, Gonzales J, Adams D, Powell E, Urban M, Rose A. 2023. LandScan global 2022. doi:10.48690/1529167. landscan.ornl.gov.
Skinner C. 2018. Issues and challenges in census taking. Annual Review of Statistics and its Application 5:49–63.
Sorichetta A, Hornby GM, Stevens FR, Gaughan AE, Linard C, Tatem AJ. 2015a. High-resolution gridded population datasets for Latin America and the Caribbean in 2010, 2015, and 2020. Scientific Data 2:1–12. doi:10.1038/sdata.2015.45.
Sorichetta A, Hornby GM, Stevens FR, Gaughan AE, Linard C, Tatem AJ. 2015b. High-resolution gridded population datasets for latin america and the caribbean in 2010, 2015, and 2020. Scientific data 2:1–12.
Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A. 2002. Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society Series B: Statistical Methodology 64:583–639.
Stan Development Team. 2019a. Stan User’s Guide. https://mc-stan.org/docs/2_26/stan-users-guide.
Stan Development Team. 2019b. Stan Reference Manual. https://mc-stan.org/docs/2_26/reference-manual/increment-log-prob-section.html.
Stan Development Team. 2020. RStan: the R interface to Stan. R package version 2.19.3. http://mc-stan.org/.
Stathakis D, Baltas P. 2018. Seasonal population estimates based on night-time lights. Computers, Environment and Urban Systems 68:133–141.
Steele JE, Sundsøy PR, Pezzulo C, Alegana VA, Bird TJ, Blumenstock J, Bjelland J, Engø-Monsen K, Montjoye Y-AD, Iqbal AM, Hadiuzzaman KN, Lu X, Wetter E, Tatem AJ, Bengtsson L. 2017b. Mapping poverty using mobile phone and satellite data. Journal of The Royal Society Interface 14:20160690. doi:10.1098/rsif.2016.0690.
Steele JE, Sundsøy PR, Pezzulo C, Alegana VA, Bird TJ, Blumenstock J, Bjelland J, Engø-Monsen K, Montjoye Y-Ad, Iqbal AM, Hadiuzzaman KN, Lu X, Wetter E, Tatem AJ, Bengtsson L. 2017a. Mapping poverty using mobile phone and satellite data. Journal of The Royal Society Interface 14:20160690.
Stevens FR, Gaughan AE, Linard C, Tatem AJ. 2015c. Disaggregating census data for population mapping using random forests with remotely-sensed and ancillary data. PLOS ONE 10:e0107042. doi:10.1371/journal.pone.0107042.
Stevens FR, Gaughan AE, Linard C, Tatem AJ. 2015d. Disaggregating census data for population mapping using random forests with remotely-sensed and ancillary data. PloS one 10:e0107042.
Stevens FR, Gaughan AE, Linard C, Tatem AJ. 2015b. Disaggregating census data for population mapping using random forests with remotely-sensed and ancillary data. PloS one 10:e0107042.
Stevens FR, Gaughan AE, Linard C, Tatem AJ. 2015a. Disaggregating census data for population mapping using random forests with remotely-sensed and ancillary data. PLoS ONE 10:e0107042.
Stewart R, Urban M, Duchscherer S, Kaufman J, Morton A, Thakur G, Piburn J, Moehl J. 2016. A bayesian machine learning model for estimating building occupancy from open source data. Natural Hazards 81:1929–1956.
Sturrock HJ, Woolheater K, Bennett AF, Andrade-Pacheco R, Midekisa A. 2018. Predicting residential structures from open source remotely enumerated data using machine learning. PloS one 13:e0204399.
Tatem AJ. 2014. Mapping population and pathogen movements. International health 6:5–11.
Tatem AJ. 2017. WorldPop, open data for spatial demography. Scientific data 4:1–4.
Tatem A. 2022. Small area population denominators for improved disease surveillance and response. Epidemics 41:100641.
Tatem AJ, Noor AM, Von Hagen C, Di Gregorio A, Hay SI. 2007. High resolution population maps for low income nations: Combining land cover and census in east africa. PloS one 2:e1298.
Thomson DR, Rhoda DA, Tatem AJ, Castro MC. 2020. Gridded population survey sampling: A systematic scoping review of the field and strategic research agenda. International journal of health geographics 19:1–16.
Tiecke TG, Liu X, Zhang A, Gros A, Li N, Yetman G, Kilic T, Murray S, Blankespoor B, Prydz EB. 2017. Mapping the world population one building at a time. arXiv preprint arXiv:1712.05839.
Tobler WR. 1970. A computer movie simulating urban growth in the detroit region. Economic geography 46:234–240.
Tusting LS, Bisanzio D, Alabaster G, Cameron E, Cibulskis R, Davies M, Flaxman S, Gibson HS, Knudsen J, Mbogo C, others. 2019. Mapping changes in housing in sub-saharan africa from 2000 to 2015. Nature 568:391–394.
UNFPA. 2020a. The value of modelled population estimates for census planning and preparation. Technical guidance note. https://www.unfpa.org/resources/value-modelled-population-estimates-census-planning-and-preparation.
UNFPA. 2020b. The value of modelled population estimates for census planning and preparation. Technical guidance note. https://www.unfpa.org/resources/value-modelled-population-estimates-census-planning-and-preparation.
UNHCR. 2019. Regional intention survey of south sudanese refugees. https://microdata.unhcr.org/index.php/catalog/224.
UNHCR. 2020. Regional overview of the south sudanese refugee population: 2020 SOUTH SUDAN REGIONAL RRRP.
United Nations Department of Economic Social Affairs Population Division. 2022. World population prospects 2022: Summary of results. UN DESA/POP/2022/TR/NO. 3.
United Nations Office for the Coordination of Humanitarian Affairs. 2019. Natural disasters 2000-2019 -latin america and the caribbean. https://www.humanitarianresponse.info/en/operations/latin-america-and-caribbean/document/latin-america-and-caribbean-natural-disasaters-2000.
United Nations Population Fund (UNFPA). 2019. Hybrid census. Technical brief. https://www.unfpa.org/resources/new-methodology-hybrid-census-generate-spatially-disaggregated-population-estimates.
United Nations Population Fund (UNFPA). 2020. The value of modeled population estimates for census planning and preparation. Technical guidance note. https://www.unfpa.org/resources/value-modelled-population-estimates-census-planning-and-preparation.
United Nations Satellite Centre. 2020a. United nations institute for training and research (UNITAR). Satellite detected waters in nghe an province of viet nam as of 31 october 2020. https://unosat.org/products/2952.
United Nations Satellite Centre. 2020b. United nations institute for training and research (UNITAR). Satellite detected waters in thua thien hue province of viet nam as of 10 november 2020. https://unosat.org/products/2964.
Utazi CE, Thorley J, Alegana VA, Ferrari MJ, Takahashi S, Metcalf CJE, Lessler J, Cutts FT, Tatem AJ. 2019. Mapping vaccination coverage to explore the effects of delivery mechanisms and inform vaccination strategies. Nature communications 10:1–10. doi:10.1038/s41467-019-09611-1.
Utazi CE, Thorley J, Alegana VA, Ferrari MJ, Takahashi S, Metcalf CJE, Lessler J, Tatem AJ. 2018. High resolution age-structured mapping of childhood vaccination coverage in low and middle income countries. Vaccine 36:1583–1591.
Utazi CE, Wagai J, Pannell O, Cutts FT, Rhoda DA, Ferrari MJ, Dieng B, Oteri J, Danovaro-Holliday MC, Adeniran A, Tatem AJ. 2020. Geospatial variation in measles vaccine coverage through routine and campaign strategies in nigeria: Analysis of recent household surveys. Vaccine 38:3062–3071.
Wakefield J. 2007. Disease mapping and spatial regression with count data. Biostatistics 8:158–183.
Wallig M, Corporation M, Weston S, Tenenbaum D. 2020. doParallel: Foreach parallel adaptor for the ’parallel’ package. https://CRAN.R-project.org/package=doParallel.
Wardrop N, Jochem W, Bird T, Chamberlain H, Clarke D, Kerr D, Bengtsson L, Juran S, Seaman V, Tatem A. 2018c. Spatially disaggregated population estimates in the absence of national population and housing census data. Proceedings of the National Academy of Sciences 115:3529–3537.
Wardrop N, Jochem W, Bird T, Chamberlain H, Clarke D, Kerr D, Bengtsson L, Juran S, Seaman V, Tatem A. 2018b. Spatially disaggregated population estimates in the absence of national population and housing census data. Proceedings of the National Academy of Sciences 115:3529–3537.
Wardrop N, Jochem W, Bird T, Chamberlain H, Clarke D, Kerr D, Bengtsson L, Juran S, Seaman V, Tatem A. 2018d. Spatially disaggregated population estimates in the absence of national population and housing census data. Proceedings of the National Academy of Sciences 115:3529–3537.
Wardrop NA, Jochem WC, Bird TJ, Chamberlain HR, Clarke D, Kerr D, Bengtsson L, Juran S, Seaman V, Tatem AJ. 2018a. Spatially disaggregated population estimates in the absence of national population and housing census data. Proceedings of the National Academy of Sciences 115:3529–3537.
Watanabe S. 2013. A widely applicable bayesian information criterion. The Journal of Machine Learning Research 14:867–897.
Weber EM, Seaman VY, Stewart RN, Bird TJ, Tatem AJ, McKee JJ, Bhaduri BL, Moehl JJ, Reith AE. 2018a. Census-independent population mapping in northern nigeria. Remote sensing of environment 204:786–798.
Weber EM, Seaman VY, Stewart RN, Bird TJ, Tatem AJ, McKee JJ, Bhaduri BL, Moehl JJ, Reith AE. 2018b. Census-independent population mapping in northern nigeria. Remote sensing of environment 204:786–798.
Weber EM, Seaman VY, Stewart RN, Bird TJ, Tatem AJ, McKee JJ, Bhaduri BL, Moehl JJ, Reith AE. 2018c. Census-independent population mapping in northern nigeria. Remote Sensing of Environment 204:786–798. doi:10.1016/j.rse.2017.09.024. http://www.sciencedirect.com/science/article/pii/S0034425717304364.
Weiss DJ, Nelson A, Gibson H, Temperley W, Peedell S, Lieber A, Hancher M, Poyart E, Belchior S, Fullman N, others. 2018b. A global map of travel time to cities to assess inequalities in accessibility in 2015. Nature 553:333–336.
Weiss DJ, Nelson A, Gibson H, Temperley W, Peedell S, Lieber A, Hancher M, Poyart E, Belchior S, Fullman N, others. 2018a. A global map of travel time to cities to assess inequalities in accessibility in 2015. Nature 553:333336.
Wesolowski A, Qureshi T, Boni MF, Sundsøy PR, Johansson MA, Rasheed SB, Engø-Monsen K, Buckee CO. 2015. Impact of human mobility on the emergence of dengue epidemics in pakistan. Proceedings of the National Academy of Sciences 112:11887–11892.
Wickham H, François R, Henry L, Müller K, RStudio. 2020. Dplyr: A grammar of data manipulation. https://CRAN.R-project.org/package=dplyr.
Wigley AS, Tejedor-Garavito N, Alegana V, Carioli A, Ruktanonchai CW, Pezzulo C, Matthews Z, Tatem AJ, Nilsen K. 2020. Measuring the availability and geographical accessibility of maternal health services across sub-saharan africa. BMC Medicine 18:237.
Wilkin J, Biggs E, Tatem AJ. 2019. Measurement of social networks for innovation within community disaster resilience. Sustainability 11:1943.
Wilson R, Erbach-Schoenberg E, Albert M, Power D, Tudge S, Gonzalez M, Guthrie S, Chamberlain H, Brooks C, Hughes C, Pitonakova L, Buckee C, Lu X, Wetter E, Tatem A, Bengtsson L. 2016. Rapid and near real-time assessments of population displacement using mobile phone data following disasters: The 2015 nepal earthquake.
World Health Organization. 2014. WHO-UNICEF guidelines for comprehensive multi-year planning for immunization - update september 2013. https://apps.who.int/iris/bitstream/handle/10665/100618/WHO_IVB_14.01_eng.pdf.
World Health Organization. 2016. Global routine immunization strategies and practices (GRISP): A companion document to the global vaccine action plan (GVAP).
World Health Organization and the United Nations Children’s Fund. 2022. Primary health care measurement framework and indicators: Monitoring health systems through a primary health care lens. Geneva.
World Wildlife Fund. 2006. 3 arc-second GRID: Void-filled DEM. www.hydrosheds.org/downloads.
WorldPop. 2018a. Global high resolution population denominators project. 2018. https://www.worldpop.org/doi/10.5258/SOTON/WP00645.
WorldPop. 2018b. Global high resolution population denominators project. 2018. https://www.worldpop.org/doi/10.5258/SOTON/WP00645.
WorldPop. 2020b. Bottom-up gridded population estimates for the Kinshasa, Kongo-Central, Kwango, Kwilu, and Mai-Ndombe provinces in the Democratic Republic of the Congo, version 1.0. WorldPop, University of Southampton. doi:10.5258/SOTON/WP00658. https://wopr.worldpop.org/?COD/population/v1.0.
WorldPop. 2020a. Bottom-up gridded population estimates for Zambia, version 1.0. WorldPop, University of Southampton. doi:10.5258/SOTON/WP00662. https://wopr.worldpop.org/?ZMB/population/v1.0.
WorldPop Research Group, University of Southampton, Department of Geography and Geosciences, University of Louisville, Departement de Geographie, Universite de Namur, Center for International Earth Science Information Network (CIESIN), Columbia University. 2018. Global high resolution population denominators project - funded by the bill and melinda gates foundation (OPP1134076). doi:10.5258/SOTON/WP00645.
WorldPop Research Group, Department of Geography and Geosciences, University of Louisville, Departement de Geographie, Universite de Namur, Center for International Earth Science Information Network (CIESIN), Columbia University. 2018. Global high resolution population denominators project - funded by the bill and melinda gates foundation (OPP1134076). doi:10.5258/SOTON/WP00645.
WorldPop, CIESIN. 2018d. Geospatial covariate data layers: VIIRS night-time lights (2012-216), Brazil. WorldPop, University of Southampton. doi:10.5258/SOTON/WP00644. ftp://ftp.worldpop.org/GIS/Covariates/Global_2000_2020/BRA/VIIRS/.
WorldPop, CIESIN. 2018c. Administrative Areas: National Boundaries, Brazil. WorldPop, University of Southampton. doi:10.5258/SOTON/WP00651. ftp://ftp.worldpop.org/GIS/Mastergrid/Global_2000_2020/BRA/L0/.
WorldPop, CIESIN. 2018a. Global High Resolution Population Denominators Project - Funded by The Bill and Melinda Gates Foundation (OPP1134076). WorldPop, University of Southampton. doi:10.5258/SOTON/WP00651.
WorldPop, CIESIN. 2018b. Global High Resolution Population Denominators Project - Funded by The Bill and Melinda Gates Foundation (OPP1134076). WorldPop, University of Southampton. ftp://ftp.worldpop.org/GIS/Population/Global_2000_2020/CensusTables/.
WorldPop, Institut National de la Statistique et de la Démographie du Burkina Faso. 2020. Census-based gridded population estimates for burkina faso (2019), version 1.0. WorldPop, University of Southampton. doi:10.5258/SOTON/WP00687.
Yankey O, Utazi CE, Nnanatu CC, Gadiaga AN, Abbot T, Lazar AN, Tatem AJ. 2024. Disaggregating census data for population mapping using a bayesian additive regression tree model. Applied Geography 172:103416.
Zagatti GA, Gonzalez M, Avner P, Lozano-Gracia N, Brooks CJ, Albert M, Gray J, Antos SE, Burci P, Erbach-Schoenberg E zu, Tatem AJ, Wetter E, Bengtsson L. 2018. A trip to work: Estimation of origin and destination of commuting patterns in the main metropolitan regions of haiti using CDR. Development Engineering 3:133–165.

Tables

Weighted-precision Model

Note: The five scenarios shown are: (pop) simulated “true” population; (ru) random sample data with an unweighted model; (wu) weighted sample data with an unweighted model; (ww) weighted sample data with a weighted-precision model; (cw) combined sample data (weighted and random) with a weighted-precision model. The “lower” and “upper” columns show 95% credible intervals.

Table 7.1: Median \(\mu\) parameter: Summary statistics for the posterior distributions from the unweighted and weighted-precision models.
scenario  median  mean   lower  upper
pop       250.0   250.0  250.0  250.0
ru        254.0   254.0  248.4  259.3
wu        323.5   323.5  316.4  330.5
ww        254.7   254.7  249.4  259.9
cw        248.6   248.6  243.0  254.4

Table 7.3: Mean (\(\mu e^{0.5 \sigma^2}\)) parameter: Summary statistics for the posterior distributions from the unweighted and weighted-precision models.
scenario  median  mean   lower  upper
pop       283.3   283.3  283.3  283.3
ru        287.0   287.1  280.4  293.7
wu        365.6   365.7  357.4  374.4
ww        288.2   288.3  282.0  294.6
cw        285.0   285.0  278.5  292.0

Table 7.5: Standard deviation \(\sigma\) parameter: Summary statistics for the posterior distributions from the unweighted and weighted-precision models.
scenario  median  mean   lower  upper
pop       0.500   0.500  0.500  0.500
ru        0.495   0.495  0.480  0.511
wu        0.495   0.495  0.480  0.510
ww        0.498   0.498  0.482  0.514
cw        0.523   0.523  0.507  0.540

Table 7.7: Derived population totals \(T\): Summary statistics for the derived posterior distributions from the unweighted and weighted-precision models.
scenario  median       mean         lower        upper
pop       283,246,502  283,246,502  283,246,502  283,246,502
ru        287,007,075  287,051,257  280,441,789  293,578,137
wu        365,627,317  365,697,914  357,366,590  374,421,369
ww        288,247,168  288,283,242  281,977,499  294,695,720
cw        285,019,319  285,030,703  278,418,950  292,024,475

See Fig. 7.5 for density plots of posteriors from Tables 7.1, 7.3, and 7.5; and see Fig. 7.7 for a barplot of Table 7.7.
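
As a consistency check on the "pop" rows, the mean of a lognormal distribution follows from its median \(\mu\) and standard deviation \(\sigma\). The step from the mean to the total \(T\) assumes that \(T\) is the sum over the one million simulated locations described in the figure captions, so the expected total is only approximate:

```latex
\[
\mu e^{0.5\sigma^2} = 250 \times e^{0.5 \times 0.5^2} = 250 \times e^{0.125} \approx 283.3,
\qquad
\mathrm{E}[T] \approx 10^{6} \times 283.3 \approx 2.833 \times 10^{8}.
\]
```

The simulated total of 283,246,502 in Table 7.7 sits close to this expectation; the small gap is what one would expect from random variation in a single simulated population.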

Weighted-likelihood Model

Note: The five scenarios shown are: (pop) simulated “true” population; (ru) random sample data with an unweighted model; (wu) weighted sample data with an unweighted model; (ww) weighted sample data with a weighted-likelihood model; (cw) combined sample data (weighted and random) with a weighted-likelihood model. The “lower” and “upper” columns show 95% credible intervals.

Table 7.9: Median \(\mu\) parameter: Summary statistics for the posterior distributions from the unweighted and weighted-likelihood models.
scenario  median  mean   lower  upper
pop       250.0   250.0  250.0  250.0
ru        254.0   254.0  248.4  259.3
wu        323.5   323.5  316.4  330.5
ww        254.7   254.7  254.7  254.7
cw        248.5   248.5  248.5  248.5

Table 7.11: Mean (\(\mu e^{0.5 \sigma^2}\)) parameter: Summary statistics for the posterior distributions from the unweighted and weighted-likelihood models.
scenario  median  mean   lower  upper
pop       283.3   283.3  283.3  283.3
ru        287.0   287.1  280.4  293.7
wu        365.6   365.7  357.4  374.4
ww        286.1   286.1  286.1  286.1
cw        283.3   283.3  283.2  283.3

Table 7.13: Standard deviation \(\sigma\) parameter: Summary statistics for the posterior distributions from the unweighted and weighted-likelihood models.
scenario  median  mean   lower  upper
pop       0.500   0.500  0.500  0.500
ru        0.495   0.495  0.480  0.511
wu        0.495   0.495  0.480  0.510
ww        0.483   0.483  0.483  0.483
cw        0.512   0.512  0.512  0.512

Table 7.15: Derived population totals \(T\): Summary statistics for the derived posterior distributions from the unweighted and weighted-likelihood models.
scenario  median       mean         lower        upper
pop       283,246,502  283,246,502  283,246,502  283,246,502
ru        287,007,075  287,051,257  280,441,789  293,578,137
wu        365,627,317  365,697,914  357,366,590  374,421,369
ww        286,114,506  286,113,499  285,826,808  286,398,643
cw        283,260,492  283,258,895  282,950,784  283,558,666

See Fig. 7.4 for density plots of posteriors from Tables 7.9, 7.11, and 7.13; and see Fig. 7.6 for a barplot of Table 7.15.
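
One way to read the totals table is to compute, for each scenario, the relative bias of the posterior median and whether the 95% credible interval contains the simulated total. The short Python sketch below does this using the values printed in Table 7.15; the helper `summarise` and the variable names are illustrative and are not part of the supplementary scripts.

```python
# Median and 95% credible interval of the derived totals T, copied from Table 7.15.
TRUE_TOTAL = 283_246_502  # 'pop' row: simulated "true" total

totals = {
    # scenario: (median, lower, upper)
    "ru": (287_007_075, 280_441_789, 293_578_137),
    "wu": (365_627_317, 357_366_590, 374_421_369),
    "ww": (286_114_506, 285_826_808, 286_398_643),
    "cw": (283_260_492, 282_950_784, 283_558_666),
}


def summarise(median, lower, upper, truth=TRUE_TOTAL):
    """Return (relative bias of the median, whether the interval covers the truth)."""
    relative_bias = (median - truth) / truth
    covered = lower <= truth <= upper
    return relative_bias, covered


summary = {name: summarise(*row) for name, row in totals.items()}

for name, (bias, covered) in summary.items():
    print(f"{name}: relative bias = {bias:+.2%}, interval covers truth = {covered}")
```

With the values as printed, the "ww" interval in Table 7.15 lies entirely above the simulated total, whereas the corresponding weighted-precision interval in Table 7.7 (281,977,499 to 294,695,720) contains it.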

Figures

Figure 7.2: Posterior predicted distributions of population densities from an unweighted model and a weighted-precision model using a random sample, a population-weighted sample, and a combination sample. Dashed lines show distributions of population data from the sample of locations. Solid lines show distributions of model-based population estimates across the sample of locations. Only the weighted-precision model (Stan) results are shown here, but results were comparable with the weighted-likelihood model (Stan) and weighted-precision model (JAGS) (see supplementary files in Appendix A).

Figure 7.3: Posterior predicted distribution of population densities from weighted-likelihood, Stan weighted-precision, and JAGS weighted-precision models. The true population distribution includes one million locations and the weighted sample includes 2000 locations.

Figure 7.4: Parameter estimates (marginal posterior distributions) for the median, mean, and sigma from the weighted-likelihood model. The four scenarios are: (ru) random sample data with an unweighted model; (wu) weighted sample data with an unweighted model; (ww) weighted sample data with a weighted model; and (cw) combined sample data (weighted and random) with a weighted model. The vertical red line is the simulated ‘true’ parameter value.

Figure 7.5: Parameter estimates (marginal posterior distributions) for the median, mean, and sigma from the weighted-precision model. The four scenarios are: (ru) random sample data with an unweighted model; (wu) weighted sample data with an unweighted model; (ww) weighted sample data with a weighted model; and (cw) combined sample data (weighted and random) with a weighted model. The vertical red line is the simulated ‘true’ parameter value.

Figure 7.6: Posterior predicted population totals from a weighted-likelihood model. Population totals include one million locations from the full simulated population. The four scenarios are: (ru) random sample data with an unweighted model; (wu) weighted sample data with an unweighted model; (ww) weighted sample data with a weighted model; and (cw) combined sample data (weighted and random) with a weighted model. The horizontal line is the simulated ‘true’ total population size.

Figure 7.7: Posterior predicted population totals from a weighted-precision model. Population totals include one million locations from the full simulated population. The four scenarios are: (ru) random sample data with an unweighted model; (wu) weighted sample data with an unweighted model; (ww) weighted sample data with a weighted model; and (cw) combined sample data (weighted and random) with a weighted model. The horizontal line is the simulated ‘true’ total population size.

Appendix A: Supplementary Files

All code used to conduct the simulation analyses and produce the results is provided in a supplementary file available from http://doi.org/10.5258/SOTON/WP00706. The supplementary code can be used to replicate the results in this report and to explore different simulation settings and model designs.

The supplementary file leasure2021simulation_supplement.zip is a compressed zip archive that contains three folders: scripts, models, and output.

The output folder contains simulation results for various parameterizations not presented in the report. It has sub-folders named for the script and parameters that produced each result.

The scripts folder contains three R scripts:
1. minimum example.R - A minimal example of the simulation framework and weighted models.
2. weighted3ways.R - Evaluates all three weighted models using the same simulated sample data.
3. random + weighted.R - Compares unweighted and weighted models using random, weighted, and combined samples.

The models folder contains:
1. Unweighted model (Stan)
2. Weighted-likelihood model (Stan)
3. Weighted-precision model (Stan)
4. Weighted-precision model (JAGS)
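
The bias that the weighted models correct for can be illustrated without Stan or JAGS. The sketch below is not taken from the supplementary files: it is a stand-alone Python illustration that assumes lognormal population counts (median 250 and \(\sigma\) = 0.5, as in the "pop" rows of the tables), selection probability proportional to population size, and weights equal to \(1/N_i\). The function `median_estimate` is a hypothetical helper; it returns the value that maximizes a weighted lognormal log-likelihood with respect to the median, which is a point estimate rather than a full Bayesian fit.

```python
import math
import random

random.seed(42)

# Simulated "true" population of counts per location. 100,000 locations keeps
# the sketch fast; the chapter's simulation uses one million.
MEDIAN, SIGMA, N_LOCATIONS, N_SAMPLE = 250.0, 0.5, 100_000, 2_000
population = [
    random.lognormvariate(math.log(MEDIAN), SIGMA) for _ in range(N_LOCATIONS)
]

# Population-weighted sample: selection probability proportional to population
# size (drawn with replacement for simplicity, which is an assumption here).
sample = random.choices(population, weights=population, k=N_SAMPLE)


def median_estimate(counts, weights):
    """Weighted mean of log counts, back-transformed to the lognormal median.

    This is the maximizer of sum_i w_i * log p(N_i | median, sigma) over the median.
    """
    total_weight = sum(weights)
    mean_log = sum(w * math.log(n) for n, w in zip(counts, weights)) / total_weight
    return math.exp(mean_log)


# Unweighted estimate: every sampled location counts equally.
unweighted = median_estimate(sample, [1.0] * N_SAMPLE)

# Weighted estimate: assumed weights are the inverse of the relative
# selection probability, i.e. 1 / N_i.
weighted = median_estimate(sample, [1.0 / n for n in sample])

print(f"unweighted median estimate: {unweighted:.1f}")
print(f"weighted median estimate:   {weighted:.1f}")
```

Under these assumptions, size-biased sampling shifts the sampled log counts upward by \(\sigma^2\), so the unweighted estimate should fall near \(250 \times e^{0.25} \approx 321\), while the weighted estimate should fall near 250. This is broadly the pattern seen in the "wu" and "ww" rows of the tables.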

References

Boo G, Darin E, Leasure D, Dooley C, Chamberlain H, Lazar A, Tatem AJ. 2020b. Modelled gridded population estimates for the Kinshasa, Kongo-Central, Kwango, Kwilu, and Mai-Ndombe provinces in the Democratic Republic of the Congo 2018, version 2.0. WorldPop, University of Southampton. doi:10.5258/SOTON/WP00669. https://wopr.worldpop.org/?COD/population/v2.0.
Denwood MJ. 2016a. runjags: An R package providing interface utilities, model templates, parallel computing methods and additional distributions for MCMC models in JAGS. Journal of Statistical Software 71:1–25. doi:10.18637/jss.v071.i09.
Gelman A. 2007a. Struggles with survey weighting and regression modeling. Statistical Science 22:153–164. doi:10.1214/088342306000000691.
Gelman A, Rubin DB. 1992. Inference from iterative simulation using multiple sequences. Statistical Science 7:457–472. doi:10.1214/ss/1177011136.
Leasure DR, Jochem WC, Weber EM, Seaman V, Tatem AJ. 2020d. National population mapping from sparse survey data: A hierarchical Bayesian modeling framework to account for uncertainty. Proceedings of the National Academy of Sciences 201913050.
Leasure DR, Tatem AJ. 2020a. A Bayesian approach to produce 100 m gridded population estimates using census microdata and recent building footprints. WorldPop, University of Southampton. doi:10.5258/SOTON/WP00686.
Minnesota Population Center. 2019. Integrated Public Use Microdata Series, International: Version 7.2. Minneapolis, MN: IPUMS.
Plummer M. 2003a. JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. In: Proceedings of the 3rd international workshop on distributed statistical computing. Vienna, Austria., 1–10. http://mcmc-jags.sourceforge.net/.
R Core Team. 2020a. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Stan Development Team. 2019a. Stan User’s Guide. https://mc-stan.org/docs/2_26/stan-users-guide.
Stan Development Team. 2019b. Stan Reference Manual. https://mc-stan.org/docs/2_26/reference-manual/increment-log-prob-section.html.
Stan Development Team. 2020. RStan: the R interface to Stan. R package version 2.19.3. http://mc-stan.org/.
WorldPop. 2020b. Bottom-up gridded population estimates for the Kinshasa, Kongo-Central, Kwango, Kwilu, and Mai-Ndombe provinces in the Democratic Republic of the Congo, version 1.0. WorldPop, University of Southampton. doi:10.5258/SOTON/WP00658. https://wopr.worldpop.org/?COD/population/v1.0.
WorldPop. 2020a. Bottom-up gridded population estimates for Zambia, version 1.0. WorldPop, University of Southampton. doi:10.5258/SOTON/WP00662. https://wopr.worldpop.org/?ZMB/population/v1.0.