The foot package was developed by WorldPop at the University of Southampton (www.worldpop.org) to support geometric calculations and summaries of measures from building footprint polygons. This vignette demonstrates how users can extend the basic functionality of calculate_footstats and calculate_bigfoot by making and supplying their own functions to summarise footprint characteristics. For an introduction to the package, see vignette("footsteps").

Calculations with foot

As noted in the introductory vignettes, foot primarily uses calculate_footstats to calculate and summarise metrics. Internally, this function uses data.table to handle large sets of building footprints and summarise them efficiently. Each attribute to be summarised (the what argument) is paired with the names of the summary functions to apply to it (the how argument). This internal structure also allows user-defined functions to be specified.
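As a rough mental model only, pairing a what attribute with a how function name amounts to looking up the function by its name and applying it to that attribute within each zone. This is not foot's actual code, which uses data.table. The sketch below substitutes base R's match.fun and tapply, and the small table fp is hypothetical.

```r
# Hypothetical toy footprints table: one row per building, tagged with its zone
fp <- data.frame(
  zoneID = c(1, 1, 2, 2, 2),
  area   = c(10, 20, 30, 40, 50)
)

what <- "area"  # attribute to summarise
how  <- "mean"  # name of the summary function, given as a string

# Look up the function by name, then apply it to the attribute within each zone
summary_fun <- match.fun(how)
area_mean <- tapply(fp[[what]], fp$zoneID, summary_fun)
area_mean
```

Here area_mean holds 15 for zone 1 and 40 for zone 2, analogous to the area_mean column that calculate_footstats would return.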

Data preparation

To demonstrate custom functions, we first add two randomly generated attributes to the footprints.

library(foot)
data("kampala", package = "foot")

buildings <- kampala$buildings
adminzones <- kampala$adminZones

# add a random categorical variable (letters A to E)
buildings$category <- sample(LETTERS[1:5], size = nrow(buildings), replace = TRUE)
# add a random continuous variable (normal, mean 10, sd 2)
buildings$mult <- rnorm(nrow(buildings), mean = 10, sd = 2)

We can use any attributes of the footprints within calculate_footstats and calculate_bigfoot, not only the built-in morphology measures listed by list_fs(what='all').

# summarising a new data value
calculate_footstats(buildings,
                    adminzones,
                    what="mult", # new attribute to summarise
                    how="mean",
                    verbose=FALSE)
#>     zoneID mult_mean
#>  1:      1  9.947260
#>  2:      2 10.078251
#>  3:      3  9.952500
#>  4:      4  9.476538
#>  5:      5  9.873005
#>  6:      6 10.322830
#>  7:      7 10.454661
#>  8:      8  9.762317
#>  9:      9 10.013834
#> 10:     10 10.349565
#> 11:     11  9.913699
#> 12:     12  9.510073
#> 13:     13  9.942433
#> 14:     14 10.013817
#> 15:     15 10.069056
#> 16:     16 10.072072
#> 17:     17 10.386338
#> 18:     18  9.989056
#> 19:     19 10.016845
#> 20:     20 10.039145
#> 21:     21  9.885370
#> 22:     22 10.627935
#> 23:     23  9.637838
#> 24:     24  9.908850
#> 25:     25 10.615264
#> 26:     26 10.162614
#> 27:     27  9.889993
#> 28:     29  9.994113
#> 29:     30  9.920652
#> 30:     31 12.489383
#> 31:     32  9.976741
#> 32:     33  9.882411
#> 33:     34 10.018653
#>     zoneID mult_mean

Additional built-in functions

The internal foot functions are documented in ?fs_functions; however, they are intended to be called through foot's wrapper functions and are rarely used as standalone functions. One built-in summary function that is not applied by default is majority, which is designed to summarise categorical data. It is requested in the same way as any other summary function, by naming it in the how argument.

# get the majority category in each zone
calculate_footstats(buildings, 
                    adminzones, 
                    what="category", 
                    how="majority", 
                    verbose=FALSE)
#>     zoneID category_majority
#>  1:      1                 D
#>  2:      2                 E
#>  3:      3                 A
#>  4:      4                 B
#>  5:      5                 B
#>  6:      6                 A
#>  7:      7                 C
#>  8:      8                 A
#>  9:      9                 C
#> 10:     10                 D
#> 11:     11                 D
#> 12:     12                 C
#> 13:     13                 D
#> 14:     14                 B
#> 15:     15                 D
#> 16:     16                 B
#> 17:     17                 C
#> 18:     18                 A
#> 19:     19                 B
#> 20:     20                 B
#> 21:     21                 D
#> 22:     22                 E
#> 23:     23                 D
#> 24:     24                 C
#> 25:     25                 C
#> 26:     26                 C
#> 27:     27                 B
#> 28:     29                 C
#> 29:     30                 D
#> 30:     31                 B
#> 31:     32                 E
#> 32:     33                 A
#> 33:     34                 C
#>     zoneID category_majority

Using majority works the same way as supplying a user-defined function, which is demonstrated in the next section.
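For comparison, a majority-style summary could be written by hand as shown below. This is a hypothetical stand-in (my_majority), not foot's own majority implementation, and how foot breaks ties is not stated here; this sketch simply returns the first most frequent category in sort order.

```r
# Hypothetical stand-in for a majority summary (not foot's implementation):
# count each category and return the most frequent one
my_majority <- function(v) {
  counts <- table(v)
  # which.max returns the first maximum, so ties go to the
  # category that sorts first
  names(counts)[which.max(counts)]
}

my_majority(c("A", "B", "B", "C", "B", "A"))
```

Because it returns a single value for a group of footprints, a function like this fits the requirements for user-defined summaries.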

User-defined summary functions

Functions for use with foot are written with the same syntax as any other R function: they are created with function() and assigned to a name with <-, and they must be available in the environment where the foot functions are called. Internally, calculate_footstats applies them to the footprints grouped by zone index, so each function should return a single summary value for the group of footprints in a zone.
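One way to check the one-value-per-zone rule before handing a function to foot is to run it over toy groups with base R's vapply, which stops with an error when a function returns anything other than a single number. The function my_range and the data are hypothetical, and this check is an illustration rather than something foot does itself.

```r
# Hypothetical custom summary that returns one number for a group of values
my_range <- function(v) max(v) - min(v)

# Toy data: five footprint areas spread over two zones
areas <- c(12, 30, 18, 55, 41)
zone  <- c(1, 1, 1, 2, 2)

# vapply() with numeric(1) errors unless the function returns exactly
# one number per group, mirroring the one-value-per-zone requirement
by_zone <- vapply(split(areas, zone), my_range, numeric(1))
by_zone
```

This gives 18 for zone 1 and 14 for zone 2. Substituting sqrt, which returns one value per footprint rather than one per zone, makes the same vapply call fail.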

The function's name, given as a character string, is passed to foot through the how argument. The function's parameters can be named anything; the values they receive are the footprint attributes named in what.

The example below shows a simple function that calculates the sum of the square roots of the values. We apply it to ‘area’; because area is not already a column of the footprints, foot pre-calculates it automatically.

# simple function example 1
f1 <- function(v){
  units(v) <- NULL # ignore units in our function
  return(sum(sqrt(v)))
}

# applying custom summary function to area
calculate_footstats(buildings,
                    adminzones,
                    what="area", how="f1",
                    verbose=FALSE)
#>     zoneID    area_f1
#>  1:      1  365.00586
#>  2:      2 1827.32853
#>  3:      3  873.59117
#>  4:      4 1970.96031
#>  5:      5  869.86802
#>  6:      6  824.36335
#>  7:      7  486.75006
#>  8:      8 1197.14781
#>  9:      9 3055.03307
#> 10:     10  386.10669
#> 11:     11 1407.28115
#> 12:     12  295.43981
#> 13:     13   73.96388
#> 14:     14  731.89385
#> 15:     15 2019.18463
#> 16:     16 1156.94489
#> 17:     17 1136.03871
#> 18:     18 3480.35645
#> 19:     19 5918.73046
#> 20:     20 5274.46461
#> 21:     21 1000.86861
#> 22:     22  723.99592
#> 23:     23 3448.71743
#> 24:     24 3890.88401
#> 25:     25  436.64536
#> 26:     26 3434.11917
#> 27:     27 2288.60545
#> 28:     29 1914.86276
#> 29:     30  991.99424
#> 30:     31   11.94884
#> 31:     32 1917.43260
#> 32:     33  310.27950
#> 33:     34 9861.36237
#>     zoneID    area_f1

Although f1 was used here to summarise area, it can be applied to any continuous attribute. Like the built-in functions in foot, it can also be applied to several characteristics at once or combined with lists of other functions.
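Conceptually, applying one summary function to several attributes repeats the zonal summary once per attribute and names each result column after the attribute and the function. The base R sketch below imitates the area_f1 and perimeter_f1 columns on a hypothetical unit-free table. It is not how foot builds its output internally, and f1_plain is a hypothetical version of f1 without the units step.

```r
# Toy footprints table (hypothetical values, no units)
fp <- data.frame(
  zoneID    = c(1, 1, 2),
  area      = c(4, 9, 16),
  perimeter = c(1, 4, 9)
)

# Unit-free version of f1 for plain numbers
f1_plain <- function(v) sum(sqrt(v))

# Apply the same summary to each attribute, zone by zone
cols <- c("area", "perimeter")
res <- sapply(cols, function(col) tapply(fp[[col]], fp$zoneID, f1_plain))

# Name the columns after the attribute and the function, e.g. "area_f1"
colnames(res) <- paste0(colnames(res), "_f1")
res
```

Each row of res is a zone and each column an attribute/function pair, so zone 1 gets 5 for area_f1 and 3 for perimeter_f1.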

calculate_footstats(buildings,
                    adminzones,
                    what=list("area","perimeter"), how="f1",
                    verbose=FALSE)
#>     zoneID    area_f1 perimeter_f1
#>  1:      1  365.00586   183.488651
#>  2:      2 1827.32853  1048.879091
#>  3:      3  873.59117   426.068457
#>  4:      4 1970.96031   968.615609
#>  5:      5  869.86802   412.069737
#>  6:      6  824.36335   337.518569
#>  7:      7  486.75006   171.066602
#>  8:      8 1197.14781   515.464525
#>  9:      9 3055.03307  1911.775834
#> 10:     10  386.10669   226.579855
#> 11:     11 1407.28115   857.273283
#> 12:     12  295.43981   214.216882
#> 13:     13   73.96388    35.239420
#> 14:     14  731.89385   500.492421
#> 15:     15 2019.18463  1407.986177
#> 16:     16 1156.94489   835.883724
#> 17:     17 1136.03871   716.240827
#> 18:     18 3480.35645  2477.699320
#> 19:     19 5918.73046  3199.202606
#> 20:     20 5274.46461  2994.324208
#> 21:     21 1000.86861   552.056838
#> 22:     22  723.99592   400.124081
#> 23:     23 3448.71743  1548.754569
#> 24:     24 3890.88401  2167.419552
#> 25:     25  436.64536   207.129346
#> 26:     26 3434.11917  1360.146927
#> 27:     27 2288.60545  1300.818322
#> 28:     29 1914.86276  1117.788513
#> 29:     30  991.99424   569.928890
#> 30:     31   11.94884     6.938055
#> 31:     32 1917.43260  1042.133260
#> 32:     33  310.27950   155.175160
#> 33:     34 9861.36237  6005.424628
#>     zoneID    area_f1 perimeter_f1

Functions with multiple arguments

In some instances, a custom function may need to use two or more characteristics of the building footprints, whereas the built-in functions in foot are primarily designed to work with a single characteristic (e.g. area or perimeter).

While it may sometimes be quicker to pre-calculate the combined value as a new attribute, using a function can be advantageous, particularly in calculate_bigfoot, where smaller subsets of a large building footprint dataset are processed. To make sure multiple attributes are supplied together to the summary function, specify them in what using the special type fs_varlist, which creates a nested list in the internal processing so that the arguments stay together. Keep in mind that the arguments are passed to the summary function by position, not by name, so the order within fs_varlist must match the order of the function's parameters.
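To see why the order matters, the sketch below passes two attributes to a two-parameter function as an unnamed list with do.call, so they are matched to parameters by position. How foot forwards the values internally is not shown in this vignette; do.call, ratio, and the numbers here are illustrative assumptions.

```r
# Summary of two attributes: perimeter first, area second
ratio <- function(p, a) p / a

perimeter <- c(20, 40)
area      <- c(25, 100)

# Values supplied as an unnamed list are matched to parameters by position
in_order <- do.call(ratio, list(perimeter, area))  # 0.8 and 0.4
swapped  <- do.call(ratio, list(area, perimeter))  # 1.25 and 2.5: p and a reversed
```

The swapped call runs without any error yet silently computes area divided by perimeter, which is why a mismatch between the fs_varlist order and the parameter order can be easy to miss.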

Creating a Perimeter/Area ratio

An example of a custom function using two characteristics is the average perimeter-area ratio. We can compare it with the built-in compactness measure (fs_compact), which uses the Polsby-Popper metric.
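One difference between the two measures is scale. The standard Polsby-Popper score, 4πA/P², is dimensionless, while a perimeter-area ratio has units of 1/length and changes with building size. The sketch below uses the textbook Polsby-Popper formula; it is not taken from fs_compact's source, and polsby_popper and pa_ratio are hypothetical names.

```r
# Standard Polsby-Popper compactness: 4 * pi * area / perimeter^2
# (dimensionless; equals 1 for a circle)
polsby_popper <- function(p, a) 4 * pi * a / p^2
# Perimeter-area ratio (units of 1/length)
pa_ratio <- function(p, a) p / a

# Two squares with sides of 10 m and 20 m: same shape, different size
side <- c(10, 20)
p <- 4 * side
a <- side^2

polsby_popper(p, a)  # pi / 4 for both squares
pa_ratio(p, a)       # 0.4 and 0.2: halves as the side doubles
```

This is consistent with the output below, where compact_mean has no units but perimeter_area_pa is reported in [1/m].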

# average perimeter-area ratio
pa <- function(p, a){
  return(mean(p / a))
}

# used to summarise within zones
# note that fs_varlist is still within a list
calculate_footstats(buildings,
                    adminzones,
                    what=list(list("compact"), fs_varlist("perimeter","area")),
                    how=list(list("mean"), list("pa")),
                    verbose=TRUE
                   )
#> Selecting metrics 
#> Setting control values. 
#> Pre-calculating areas 
#> Pre-calculating perimeters 
#> Pre-calculating compactness 
#> Creating zonal index 
#> 
#> Calculating  2  metrics ... 
#>    compact mean  
#>    perimeter area pa  
#> Finished calculating metrics.
#>     zoneID compact_mean perimeter_area_pa
#>  1:      1    0.6018474   0.3661908 [1/m]
#>  2:      2    0.6604745   0.4291126 [1/m]
#>  3:      3    0.6154374   0.3848159 [1/m]
#>  4:      4    0.6300735   0.3979379 [1/m]
#>  5:      5    0.6080815   0.4014357 [1/m]
#>  6:      6    0.6283676   0.3119084 [1/m]
#>  7:      7    0.5472201   0.2687669 [1/m]
#>  8:      8    0.5541688   0.3042632 [1/m]
#>  9:      9    0.6141427   0.5018987 [1/m]
#> 10:     10    0.6247686   0.5062606 [1/m]
#> 11:     11    0.5803702   0.4672657 [1/m]
#> 12:     12    0.6620628   0.6440226 [1/m]
#> 13:     13    0.5655211   0.5569546 [1/m]
#> 14:     14    0.7154767   0.4860048 [1/m]
#> 15:     15    0.7279642   0.5282777 [1/m]
#> 16:     16    0.7013814   0.5653320 [1/m]
#> 17:     17    0.6548002   0.5064319 [1/m]
#> 18:     18    0.6900773   0.6217822 [1/m]
#> 19:     19    0.6716734   0.3503747 [1/m]
#> 20:     20    0.6841125   0.3815382 [1/m]
#> 21:     21    0.6957138   0.3679120 [1/m]
#> 22:     22    0.7145771   0.3238091 [1/m]
#> 23:     23    0.6178726   0.3983421 [1/m]
#> 24:     24    0.6712216   0.3692950 [1/m]
#> 25:     25    0.6332470   0.3589913 [1/m]
#> 26:     26    0.6146447   0.3022588 [1/m]
#> 27:     27    0.6862459   0.3886627 [1/m]
#> 28:     29    0.6646580   0.4497591 [1/m]
#> 29:     30    0.6477064   0.3973210 [1/m]
#> 30:     31    0.7743018   0.3371508 [1/m]
#> 31:     32    0.6386669   0.3898653 [1/m]
#> 32:     33    0.5873562   0.3436390 [1/m]
#> 33:     34    0.6441987   0.4537800 [1/m]
#>     zoneID compact_mean perimeter_area_pa

Accessing R objects other than the footprints

A more complicated scenario arises when a user-defined function needs data that is not an attribute of the footprints dataset. To access such data, first create a partial function (a generator that captures the data and returns a new summary function), then supply the generated function to the calculation function.

In the example below, a simple constant value is supplied to a summary function; however, the idea extends to any object in the R environment. The same approach is used in foot to calculate the nearest neighbour index, which draws on the spatial zones object as well as the footprints.

# external "data"
d1 <- 0.001

# This will NOT work because no value is supplied for argument 'd'
# f2 <- function(x, d){
#   return(sum(d * x))
# }
# 
# calculate_footstats(buildings, adminzones, what="area", how="f2", verbose=TRUE)

# Instead...
# example of creating a partial function
gen_f3 <- function(d){
  force(d) # required: evaluate `d` now so its current value is captured
  function(x){
    return(sum(d * x))
  }
}

# generate the function and initialise it with `d1` from above.
f3 <- gen_f3(d1)

# this now uses the generated function, and `d` is found
calculate_footstats(buildings, 
                    adminzones, 
                    what="area", 
                    how="f3", 
                    verbose=FALSE
                   )
#>     zoneID           area_f3
#>  1:      1   8.4925307 [m^2]
#>  2:      2  29.8921595 [m^2]
#>  3:      3  25.8441923 [m^2]
#>  4:      4  64.1223606 [m^2]
#>  5:      5  26.8496529 [m^2]
#>  6:      6  35.9289675 [m^2]
#>  7:      7  28.5153292 [m^2]
#>  8:      8  44.7039953 [m^2]
#>  9:      9  48.3619770 [m^2]
#> 10:     10   7.5938634 [m^2]
#> 11:     11  24.3211016 [m^2]
#> 12:     12   3.1237687 [m^2]
#> 13:     13   2.6583596 [m^2]
#> 14:     14   6.7422042 [m^2]
#> 15:     15  19.3132428 [m^2]
#> 16:     16   9.9446881 [m^2]
#> 17:     17  14.8188207 [m^2]
#> 18:     18  37.0611108 [m^2]
#> 19:     19  96.9233173 [m^2]
#> 20:     20  80.7164561 [m^2]
#> 21:     21  18.1444417 [m^2]
#> 22:     22  10.2427539 [m^2]
#> 23:     23 133.2688176 [m^2]
#> 24:     24  62.1124473 [m^2]
#> 25:     25  11.6953775 [m^2]
#> 26:     26 150.8832148 [m^2]
#> 27:     27  33.9686861 [m^2]
#> 28:     29  37.1740182 [m^2]
#> 29:     30  15.3446975 [m^2]
#> 30:     31   0.1427747 [m^2]
#> 31:     32  35.8933028 [m^2]
#> 32:     33   7.1492461 [m^2]
#> 33:     34 147.5949907 [m^2]
#>     zoneID           area_f3
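The force(d) line in gen_f3 matters because R evaluates function arguments lazily: without it, d remains an unevaluated promise until the generated function first runs, so a later change to the external object would leak into the summary. The hypothetical generators below (make_lazy, make_forced) show the difference using plain base R.

```r
# Generator without force(): `d` stays an unevaluated promise until first use
make_lazy <- function(d) function(x) sum(d * x)

# Generator with force(): `d` is evaluated when the function is created
make_forced <- function(d) {
  force(d)
  function(x) sum(d * x)
}

k <- 2
lazy_f   <- make_lazy(k)
forced_f <- make_forced(k)

k <- 100  # change the external value after generating the functions

lazy_f(1:3)    # 600: the promise is evaluated now and sees k = 100
forced_f(1:3)  # 12: uses the value of k captured at creation time
```

Forcing the argument makes the generated summary depend only on the value supplied when it was created, which is the behaviour wanted when f3 is later called inside calculate_footstats.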

This vignette has shown how to extend foot with user-defined summary functions. These functions can use one or more values from the footprints, or access other objects in the R environment. Although the examples used calculate_footstats, the same approaches can be used to create new gridded summary metrics with calculate_bigfoot.


sessionInfo()
#> R version 4.1.1 (2021-08-10)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
#>  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
#>  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] sf_1.0-0 foot_0.8
#> 
#> loaded via a namespace (and not attached):
#>  [1] tidyselect_1.1.0   xfun_0.27          bslib_0.2.4        purrr_0.3.4       
#>  [5] vctrs_0.3.8        generics_0.1.0     stars_0.5-3        htmltools_0.5.2   
#>  [9] s2_1.0.6           yaml_2.2.1         utf8_1.2.2         rlang_0.4.12      
#> [13] e1071_1.7-7        pkgdown_1.6.1      jquerylib_0.1.3    pillar_1.6.4      
#> [17] glue_1.4.2         DBI_1.1.1          wk_0.4.1           foreach_1.5.1     
#> [21] lifecycle_1.0.1    stringr_1.4.0      ragg_1.1.3         codetools_0.2-18  
#> [25] memoise_2.0.0      evaluate_0.14      knitr_1.36         fastmap_1.1.0     
#> [29] doParallel_1.0.16  parallel_4.1.1     class_7.3-19       fansi_0.5.0       
#> [33] Rcpp_1.0.7         KernSmooth_2.23-20 classInt_0.4-3     filelock_1.0.2    
#> [37] formatR_1.11       lwgeom_0.2-6       cachem_1.0.6       desc_1.4.0        
#> [41] jsonlite_1.7.2     abind_1.4-5        systemfonts_1.0.3  fs_1.5.0          
#> [45] textshaping_0.3.5  digest_0.6.28      stringi_1.7.5      dplyr_1.0.5       
#> [49] rprojroot_2.0.2    grid_4.1.1         tools_4.1.1        magrittr_2.0.1    
#> [53] sass_0.3.1         proxy_0.4-26       tibble_3.1.5       pkgconfig_2.0.3   
#> [57] crayon_1.4.2       ellipsis_0.3.2     data.table_1.14.2  rmarkdown_2.11    
#> [61] iterators_1.0.13   R6_2.5.1           units_0.7-2        compiler_4.1.1