4 Settlement Data and Geospatial Covariates
4.1 Settlement Data
Settlement data plays a pivotal role in gridded population modelling because it describes the distribution and characteristics of human settlements, which are key inputs to population models (Nieves et al. 2017, Tiecke et al. 2017, Leasure et al. 2020e). Various types of settlement data are available for producing gridded population estimates, each offering different insights into the spatial and demographic patterns of human populations. Examples include building footprints (Chamberlain et al. 2023), land use/land cover (Pandey et al. 2021), built-up areas, and residential and non-residential classifications (Lloyd et al. 2017a).
Recently, the availability of satellite imagery with high spatial resolution, along with improvements in computing power and algorithms, has made it possible to extract building footprints from satellite imagery automatically (Gavankar & Ghosh 2018, Li et al. 2019, Liu et al. 2019). A building footprint is a polygon or set of polygons that defines the geometry of a given building, such as its shape, size, height, or perimeter. These building footprints are available globally, making them an important source of settlement information for gridded population estimation.
Although remote sensing and satellite imagery provide most of the available building footprints in a variety of formats and resolutions, National Statistical Offices (NSOs) can also provide detailed settlement information collected through national surveys and questionnaires, which may record building locations and other attributes such as the use and purpose of each building. In addition, volunteered geographic information (VGI), such as OpenStreetMap, provides a readily accessible alternative source of building data, generally derived through manual digitisation of imagery or by incorporating open building datasets. Together, these different types of settlement data help researchers at WorldPop to understand the dynamics of human populations, their spatial distribution, and how best to incorporate such settlement data into gridded population estimation.
Settlement data may come in a variety of spatial formats, such as a raster dataset or a polygon dataset, depending on the data source. A common spatial data format for settlement data is a polygon shapefile, which defines the perimeter or footprint extent of each building. Building footprints derived from satellite imagery can be converted to a high-resolution raster format, and building metrics such as building count, building area, building perimeter and building height can be calculated from the available footprints (Jochem & Tatem 2021).
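The conversion from footprint polygons to gridded building metrics can be illustrated with a minimal sketch. It assumes footprints are simple polygons in a projected coordinate system (units in metres) and assigns each building to the grid cell containing its vertex-average centroid. The function names, the 100 m cell size and the example coordinates are hypothetical choices for illustration, not the method used by WorldPop or by Jochem & Tatem (2021).

```python
import math
from collections import defaultdict


def polygon_area(ring):
    """Planar area of a simple polygon (shoelace formula)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(ring):
    """Sum of edge lengths, including the closing edge."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += math.hypot(x2 - x1, y2 - y1)
    return total


def polygon_centroid(ring):
    """Vertex-average centroid; a simplification used only to pick a cell."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def building_metrics(footprints, cell_size):
    """Aggregate count, area and perimeter per grid cell (col, row)."""
    metrics = defaultdict(lambda: {"count": 0, "area": 0.0, "perimeter": 0.0})
    for ring in footprints:
        cx, cy = polygon_centroid(ring)
        cell = (int(cx // cell_size), int(cy // cell_size))
        metrics[cell]["count"] += 1
        metrics[cell]["area"] += polygon_area(ring)
        metrics[cell]["perimeter"] += polygon_perimeter(ring)
    return dict(metrics)


# Hypothetical footprints: vertex lists in metres, first vertex not repeated.
footprints = [
    [(10, 10), (20, 10), (20, 20), (10, 20)],
    [(50, 50), (70, 50), (70, 60), (50, 60)],
    [(120, 30), (130, 30), (130, 40), (120, 40)],
]
metrics = building_metrics(footprints, cell_size=100)
```

In practice a geospatial library would handle projections, multipart polygons and buildings that straddle cell boundaries; the sketch only shows how per-cell metrics follow from the polygon geometry.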
4.2 Key Settlement Data Sources
Building Footprints – WorldPop have used building footprints from a variety of sources for population modelling. These include the following:
1. Maxar & Ecopia Building Footprint: This has been the settlement data source most frequently used by WorldPop for gridded population estimation. The dataset is the result of a partnership between Maxar Technologies, a satellite imagery company, and Ecopia AI, a geospatial data analytics company that specialises in extracting building footprints from remote sensing data. Through an initiative called Digitising Sub-Saharan Africa, Ecopia has extracted every building and road in 51 African countries and, with support from the Bill & Melinda Gates Foundation, has made the data available for use in humanitarian response efforts across the continent. More information can be found on the Digitising Sub-Saharan Africa initiative page. The dataset has also been made available to WorldPop and has been widely used in population modelling for many Sub-Saharan countries. The building footprints are provided as polygon shapefiles that define the geometry of each building. Version 1 of the Maxar-Ecopia building footprint dataset for Sub-Saharan Africa was released in 2020, and version 2 has since been released. The Maxar and Ecopia building footprints are, however, closed-source data, and users require a licence to access them.
2. Google Open Buildings: This building footprint dataset is produced by Google and is available at Google Open Buildings. It is open access and encompasses 1.8 billion building detections, spanning regions in Africa, South Asia, South-East Asia, Latin America, and the Caribbean (Google 2023). Version 1 of the dataset was released in 2021 and covers African countries only. Version 2 was released the same year, with coverage expanded to include South and South-East Asian countries. In 2023, Google released version 3, further broadening coverage to include Latin America and the Caribbean (Google 2023). The dataset is provided as a polygon shapefile but can be converted to raster format for processing at different spatial resolutions. The combination of open access and broad geographical coverage makes this dataset valuable both for in-depth analysis of settlement morphologies across countries and for population modelling, ultimately supporting evidence-based decision-making.
3. Global Microsoft building footprint: Microsoft's global building footprint dataset was released in 2022. The data can be accessed at GlobalMLBuildingFootprints and comes in GeoJSON format, which can be converted to other spatial formats such as a shapefile or a raster file. The dataset contains more than 1.2 billion buildings worldwide, covering regions such as Africa, the Caribbean, Europe, the Middle East, and South Asia (Microsoft 2022). However, it has limited spatial coverage in Latin American countries, as shown in these coverage maps.
4. World Settlement Footprint Layers: The World Settlement Footprint (WSF) is a joint project between the European Space Agency (ESA) and the German Aerospace Center (DLR), in collaboration with the Google Earth Engine team (Agency 2021). The WSF consists of several settlement products, including the World Settlement Footprint (Versions 2 and 3), World Settlement Footprint Evolution, World Settlement Footprint 3D, and the Global Urban Footprint (Versions 1 and 2). Versions 2 and 3 of the World Settlement Footprint are binary raster grids at 10-metre resolution, indicating the presence or absence of human settlements. Version 2 was generated in 2015, followed by an updated version (Version 3) in 2019. The World Settlement Footprint 3D (WSF 3D), produced in 2021, quantifies the average height, total volume and total area of buildings globally at 90-metre resolution. For temporal analysis, the World Settlement Footprint Evolution (WSF-Evolution) dataset outlines global settlement changes on a yearly basis from 1985 to 2015 at 30-metre resolution. The Global Urban Footprint, available in Versions 1 and 2, is a 12-metre-resolution binary raster layer distinguishing built-up areas from non-built-up surfaces. Version 1 of the Global Urban Footprint was released in 2014 and Version 2 in 2016 (German Aerospace Center (DLR), 2018). These settlement products offer complementary insights into global settlement patterns and serve as valuable data sources for population modelling. Importantly, the datasets are open access, allowing free data downloads for any country worldwide. For more detailed information on these settlement products, visit geoservice maps.
5. Global Human Settlement Layers: The Global Human Settlement Layers (GHSL) are produced by the European Commission Joint Research Centre (JRC) in collaboration with other agencies, such as the European Space Agency (ESA). GHSL is a comprehensive open-source settlement product encompassing a wide range of layers, such as built-up surface area, building height, and built-up volume. The layers are provided in raster format with global coverage, at resolutions that depend on the settlement product type, and can be downloaded for any country. Further information on these settlement products and how to download them can be found here: GHSL Layers. Florczyk et al. (2019) also provide a comprehensive description of these settlement products.
4.3 Use of Settlement Data in Population Modelling
Settlement data is used in modelling gridded population in a variety of ways:
Settlement data such as building count or settled area are key inputs in bottom-up gridded population estimation (see Settlement Map). For example, in a bottom-up approach where building count or settled area is used as input data, the predicted gridded population count is obtained as the product of the predicted population density and the building count or settled area. See chapter 5 for more details on bottom-up population modelling.
Settlement data is also used as a constraint layer in both top-down and bottom-up population modelling. When producing high-resolution population counts, predictions are restricted to locations where settlement information indicates the presence of buildings, which avoids allocating population to unsettled locations. Thus, settlement data serve as a proxy to extrapolate population totals for a given location.
Settlement data may be used as covariates to account for variability in population density in both bottom-up and top-down population modelling. Building footprint metrics such as building count, perimeter, area, and height, for example, are often included among the geospatial covariates used as model inputs.
Settlement data can also be classified hierarchically into settlement typologies such as rural, urban, cities, towns, and villages, providing broad zones of human population density and settlement morphology in a given location (Florczyk et al. 2019, Jochem et al. 2021a). These settlement classifications may be incorporated into bottom-up gridded population estimation methods as fixed or random effects to account for variability in population density attributed to the respective settlement classes.
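The first two uses listed above, the density-times-building-count product and the settlement constraint, can be sketched on a small grid. This is a minimal illustration under stated assumptions: the density values (people per building) are hypothetical numbers standing in for model predictions, and the function name is not taken from any WorldPop code.

```python
def predict_population(density, building_count):
    """Cell-wise population = predicted density x building count.

    Cells with no buildings are set to zero (settlement constraint),
    regardless of the density value, which may be missing there.
    """
    population = []
    for dens_row, count_row in zip(density, building_count):
        row = []
        for dens, count in zip(dens_row, count_row):
            row.append(dens * count if count > 0 else 0.0)
        population.append(row)
    return population


# Hypothetical 2 x 2 grid: predicted people per building and building counts.
# The top-right cell is unsettled, so its density is missing (NaN).
density = [[4.0, float("nan")], [3.0, 6.0]]
building_count = [[10, 0], [2, 5]]

population = predict_population(density, building_count)
total = sum(sum(row) for row in population)
```

The explicit check on the building count is what implements the constraint: without it, the missing density in the unsettled cell would propagate into the population total.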
4.4 Geospatial Covariates
Geospatial covariates are high-resolution raster layers used in geospatial or spatial statistical analysis to explain, model, or predict a particular phenomenon or spatial pattern. What distinguishes a geospatial covariate from any other dataset is its location attribute, which defines the exact position of each observation on the surface of the earth. Examples of geospatial covariates include land use and land cover data, settlement layers, environmental and climatic variables, natural features, and socioeconomic variables. These covariates can be obtained from a variety of sources, such as satellite imagery, aerial photography, census or survey data, and weather data.
Geospatial covariates play an important role in gridded population estimation, allowing us to model variations in population distribution and population density. A wide variety of geospatial covariates can be included in a gridded population estimation to inform the distribution of population patterns. Whether a given covariate is used depends on whether it is significantly correlated with variation in population distribution, whether it is available across the entire study location, and whether it can be mapped accurately as a high-resolution geospatial layer. Some covariates may not be available as gridded raster layers and will require pre-processing, using interpolation methods such as inverse distance weighting or kriging, to convert them into high-resolution geospatial layers for use in the modelling. The WorldPop Open Population Repository (WOPR) has a wide variety of geospatial covariates processed to a 100 m resolution that can readily be downloaded for population and other geospatial modelling work.
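As an illustration of the pre-processing step, the following is a minimal sketch of inverse distance weighting (IDW) that turns point observations into a gridded layer. The power parameter of 2, the grid layout and the sample values are assumptions made for the example; production workflows would normally rely on a dedicated geospatial library.

```python
import math


def idw(points, target, power=2.0):
    """Inverse-distance-weighted estimate at `target` from (x, y, value) points."""
    numerator = 0.0
    denominator = 0.0
    for x, y, value in points:
        distance = math.hypot(target[0] - x, target[1] - y)
        if distance == 0.0:
            return value  # target coincides with an observation
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


def idw_grid(points, x_min, y_min, n_cols, n_rows, cell_size, power=2.0):
    """Interpolate to cell centres; row 0 is the row starting at y_min."""
    grid = []
    for r in range(n_rows):
        y = y_min + (r + 0.5) * cell_size
        row = [
            idw(points, (x_min + (c + 0.5) * cell_size, y), power)
            for c in range(n_cols)
        ]
        grid.append(row)
    return grid


# Hypothetical point observations (x, y, value), coordinates in metres.
points = [(0.0, 0.0, 10.0), (100.0, 0.0, 20.0)]
midpoint = idw(points, (50.0, 0.0))
grid = idw_grid(points, x_min=0.0, y_min=0.0, n_cols=2, n_rows=2, cell_size=100.0)
```

Because IDW is a weighted average, every interpolated value lies between the minimum and maximum of the observations; kriging, by contrast, also models spatial autocorrelation and is not shown here.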
Covariates selected for population modelling may have different units of measurement. As such, they need to be standardised, for example by centring on the mean and scaling by the standard deviation, so that all covariates are on the same scale. Furthermore, the modelling process may involve a large number of candidate geospatial covariates, requiring statistical methods such as stepwise selection or pairwise correlation analysis to identify those with significant associations with the population distribution. Reducing the set of covariates matters because a large number of geospatial covariates imposes a computational cost when building the model and making predictions at unsampled locations.
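A minimal sketch of these two steps, z-score standardisation and a simple correlation-based screening, is given below. The thresholds (0.3 for association with the response, 0.8 for collinearity between covariates), the covariate names and all data values are hypothetical, and the screening rule is one simple option rather than the selection procedure used in any particular WorldPop model.

```python
import math


def standardise(values):
    """Z-scores: centre on the mean, scale by the sample standard deviation."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
    return [(v - mean) / sd for v in values]


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def select_covariates(covariates, response, min_abs_r=0.3, max_pairwise_r=0.8):
    """Rank covariates by |r| with the response, then keep those that are
    sufficiently associated and not collinear with an already kept covariate."""
    ranked = sorted(
        covariates,
        key=lambda name: abs(pearson(covariates[name], response)),
        reverse=True,
    )
    kept = []
    for name in ranked:
        if abs(pearson(covariates[name], response)) < min_abs_r:
            continue
        if all(
            abs(pearson(covariates[name], covariates[k])) <= max_pairwise_r
            for k in kept
        ):
            kept.append(name)
    return kept


# Hypothetical covariate values at five survey locations.
raw = {
    "building_count": [1, 2, 3, 4, 5],
    "building_area": [10, 21, 29, 41, 50],
    "distance_to_road": [5, 2, 4, 3, 1],
    "elevation": [3, 1, 2, 1, 3],
}
observed_density = [2.0, 2.9, 4.1, 5.0, 6.2]

scaled = {name: standardise(values) for name, values in raw.items()}
selected = select_covariates(scaled, observed_density)
```

With these example values, building area is dropped because it is nearly collinear with building count, and elevation is dropped for its weak association with the observed density. Standardising does not change the correlations, but it puts the retained covariates on a common scale for model fitting.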
Contribution
This chapter was written by Ortis Yankey