The foot package was developed by WorldPop at the University of Southampton (www.worldpop.org) to support geometric calculations and summaries of measures from building footprint polygons. This vignette demonstrates how users can extend the basic functionality of calculate_footstats and calculate_bigfoot by making and supplying their own functions to summarise footprint characteristics. For an introduction to the package, see vignette("footsteps").
foot
As noted in the introductory vignettes, foot primarily uses calculate_footstats to calculate and summarise metrics. Internally this function uses data.table in order to handle large sets of building footprints and efficiently summarise them. The attributes to be summarised (what) are supplied to function names (how). These internal structures also allow for user-defined functions to be specified.
To demonstrate custom functions, we will first add some extra attribute “data” to the footprints for use in the examples.
data("kampala", package = "foot")
buildings <- kampala$buildings
adminzones <- kampala$adminZones
# add a random categorical variable
buildings$category <- sample(LETTERS[1:5], size = nrow(buildings), replace = TRUE)
# add a random continuous variable
buildings$mult <- sample(rnorm(nrow(buildings), mean = 10, sd = 2))
We can use any attributes of the footprints within calculate_footstats and calculate_bigfoot, not only the built-in morphology measures listed by list_fs(what='all').
# summarising a new data value
calculate_footstats(buildings,
adminzones,
what="mult", # new attribute to summarise
how="mean",
verbose=FALSE)
#> zoneID mult_mean
#> 1: 1 9.947260
#> 2: 2 10.078251
#> 3: 3 9.952500
#> 4: 4 9.476538
#> 5: 5 9.873005
#> 6: 6 10.322830
#> 7: 7 10.454661
#> 8: 8 9.762317
#> 9: 9 10.013834
#> 10: 10 10.349565
#> 11: 11 9.913699
#> 12: 12 9.510073
#> 13: 13 9.942433
#> 14: 14 10.013817
#> 15: 15 10.069056
#> 16: 16 10.072072
#> 17: 17 10.386338
#> 18: 18 9.989056
#> 19: 19 10.016845
#> 20: 20 10.039145
#> 21: 21 9.885370
#> 22: 22 10.627935
#> 23: 23 9.637838
#> 24: 24 9.908850
#> 25: 25 10.615264
#> 26: 26 10.162614
#> 27: 27 9.889993
#> 28: 29 9.994113
#> 29: 30 9.920652
#> 30: 31 12.489383
#> 31: 32 9.976741
#> 32: 33 9.882411
#> 33: 34 10.018653
#> zoneID mult_mean
The internal foot functions are documented in ?fs_functions; however, they are intended to be called through the wrapper functions of foot rather than used as standalone functions. One built-in summary function that is not applied by default is majority, which is designed to summarise categorical data. It is supplied in the same way as any other summary function, by naming it in the how argument.
# get the majority category in each zone
calculate_footstats(buildings,
adminzones,
what="category",
how="majority",
verbose=FALSE)
#> zoneID category_majority
#> 1: 1 D
#> 2: 2 E
#> 3: 3 A
#> 4: 4 B
#> 5: 5 B
#> 6: 6 A
#> 7: 7 C
#> 8: 8 A
#> 9: 9 C
#> 10: 10 D
#> 11: 11 D
#> 12: 12 C
#> 13: 13 D
#> 14: 14 B
#> 15: 15 D
#> 16: 16 B
#> 17: 17 C
#> 18: 18 A
#> 19: 19 B
#> 20: 20 B
#> 21: 21 D
#> 22: 22 E
#> 23: 23 D
#> 24: 24 C
#> 25: 25 C
#> 26: 26 C
#> 27: 27 B
#> 28: 29 C
#> 29: 30 D
#> 30: 31 B
#> 31: 32 E
#> 32: 33 A
#> 33: 34 C
#> zoneID category_majority
The majority function is used in much the same way as a user-defined function, which is demonstrated in the next section.
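To make the idea of a categorical summary concrete, here is a rough base R sketch of a majority-style function. It is an illustration only and not the implementation used in foot; the name majority_sketch, the toy vectors, and the tie-breaking rule (the first category in sorted order wins) are all assumptions made for this example.

```r
# Hypothetical majority-style summary: return the most frequent category.
# Ties are resolved in favour of the first category in sorted order,
# because table() sorts its names and which.max() returns the first maximum.
majority_sketch <- function(v) {
  counts <- table(v)
  names(counts)[which.max(counts)]
}

# Toy categories for six buildings in two zones.
cats <- c("A", "B", "B", "C", "B", "A")
zone <- c(1, 1, 1, 2, 2, 2)

# One summary value for the whole vector.
overall <- majority_sketch(cats)

# One summary value per zone, using tapply as a stand-in for zonal grouping.
by_zone <- tapply(cats, zone, majority_sketch)

print(overall)
print(by_zone)
```

Like any function passed to how, the sketch returns a single value for each group of footprints.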
Creating functions for use with foot follows the same procedures and syntax as for functions in R in general. They must be declared with <- function() and they must be available within the environment where the foot functions are being used. When the functions are used internally by calculate_footstats, they are applied to the footprints by zone index. They should therefore return a single summary value for the group of footprints in each zone.
The name of the function is what is passed to foot as an argument to how. The argument(s) to the custom function can be named anything, but they will typically be values present within the footprint attributes to be summarised.
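Before passing a function to how, it can help to confirm on toy data that it returns exactly one value per group. The sketch below uses base R's tapply as a stand-in for the zonal grouping. It does not reproduce how foot applies functions internally, and the names toy and range_ratio are made up for illustration.

```r
# Toy stand-in for footprint attributes: one row per building.
toy <- data.frame(
  zoneID = c(1, 1, 1, 2, 2),
  area   = c(10, 20, 30, 5, 15)
)

# Hypothetical summary function: ratio of the largest to the smallest value.
range_ratio <- function(v) {
  max(v) / min(v)
}

# Apply the function to each zone's values.
res <- tapply(toy$area, toy$zoneID, range_ratio)
print(res)

# A valid summary function yields exactly one value for every zone.
stopifnot(length(res) == length(unique(toy$zoneID)))
```

A function that returned a vector per group (for example, sqrt(v) without the surrounding sum) would fail this check, which is a hint that it is not suitable as a zonal summary.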
The example below shows a simple function that calculates the sum of the square root of the values. We will apply it to ‘area’, and foot will automatically pre-calculate this characteristic since it is not present in the column names of the footprints.
# simple function example 1
f1 <- function(v){
units(v) <- NULL # ignore units in our function
return(sum(sqrt(v)))
}
# applying custom summary function to area
calculate_footstats(buildings,
adminzones,
what="area", how="f1",
verbose=FALSE)
#> zoneID area_f1
#> 1: 1 365.00586
#> 2: 2 1827.32853
#> 3: 3 873.59117
#> 4: 4 1970.96031
#> 5: 5 869.86802
#> 6: 6 824.36335
#> 7: 7 486.75006
#> 8: 8 1197.14781
#> 9: 9 3055.03307
#> 10: 10 386.10669
#> 11: 11 1407.28115
#> 12: 12 295.43981
#> 13: 13 73.96388
#> 14: 14 731.89385
#> 15: 15 2019.18463
#> 16: 16 1156.94489
#> 17: 17 1136.03871
#> 18: 18 3480.35645
#> 19: 19 5918.73046
#> 20: 20 5274.46461
#> 21: 21 1000.86861
#> 22: 22 723.99592
#> 23: 23 3448.71743
#> 24: 24 3890.88401
#> 25: 25 436.64536
#> 26: 26 3434.11917
#> 27: 27 2288.60545
#> 28: 29 1914.86276
#> 29: 30 991.99424
#> 30: 31 11.94884
#> 31: 32 1917.43260
#> 32: 33 310.27950
#> 33: 34 9861.36237
#> zoneID area_f1
Although this function was only applied to area here, it can be used for any continuous value. It can also be applied to multiple characteristics or combined with other lists of functions, just like any built-in function in foot.
calculate_footstats(buildings,
adminzones,
what=list("area","perimeter"), how="f1",
verbose=FALSE)
#> Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1
#> WARNING: different compile-time and runtime versions for GEOS found:
#> Linked against: 3.9.0-CAPI-1.16.2 compiled against: 3.8.1-CAPI-1.13.3
#> It is probably a good idea to reinstall sf, and maybe rgeos and rgdal too
#> zoneID area_f1 perimeter_f1
#> 1: 1 365.00586 183.488651
#> 2: 2 1827.32853 1048.879091
#> 3: 3 873.59117 426.068457
#> 4: 4 1970.96031 968.615609
#> 5: 5 869.86802 412.069737
#> 6: 6 824.36335 337.518569
#> 7: 7 486.75006 171.066602
#> 8: 8 1197.14781 515.464525
#> 9: 9 3055.03307 1911.775834
#> 10: 10 386.10669 226.579855
#> 11: 11 1407.28115 857.273283
#> 12: 12 295.43981 214.216882
#> 13: 13 73.96388 35.239420
#> 14: 14 731.89385 500.492421
#> 15: 15 2019.18463 1407.986177
#> 16: 16 1156.94489 835.883724
#> 17: 17 1136.03871 716.240827
#> 18: 18 3480.35645 2477.699320
#> 19: 19 5918.73046 3199.202606
#> 20: 20 5274.46461 2994.324208
#> 21: 21 1000.86861 552.056838
#> 22: 22 723.99592 400.124081
#> 23: 23 3448.71743 1548.754569
#> 24: 24 3890.88401 2167.419552
#> 25: 25 436.64536 207.129346
#> 26: 26 3434.11917 1360.146927
#> 27: 27 2288.60545 1300.818322
#> 28: 29 1914.86276 1117.788513
#> 29: 30 991.99424 569.928890
#> 30: 31 11.94884 6.938055
#> 31: 32 1917.43260 1042.133260
#> 32: 33 310.27950 155.175160
#> 33: 34 9861.36237 6005.424628
#> zoneID area_f1 perimeter_f1
In some instances, a custom function may need to make use of two or more characteristics from within the building footprint datasets. The built-in functions in foot are primarily designed to work with a single value (e.g. area or perimeter).
While it may sometimes be quicker to pre-calculate the combination, it could be advantageous to use a function, particularly in calculate_bigfoot where smaller subsets of a large building footprint dataset are processed. To make sure multiple attributes are supplied to the summary function, the arguments in what should be specified using a special type (fs_varlist). The fs_varlist creates a nested list within the internal processing to keep the arguments together. Keep in mind that the arguments are passed to the summary function by position, not by name, so the order within fs_varlist must match the order of parameters that the function is expecting.
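The consequence of positional matching can be seen in a small base R sketch. Here do.call with an unnamed list stands in for the way arguments reach the summary function; this is an assumption for illustration and not the internal mechanism of foot. The function ratio_mean and the toy vectors are hypothetical, and the values are plain numerics without units.

```r
# Hypothetical two-argument summary: mean of the first argument over the second.
ratio_mean <- function(p, a) {
  mean(p / a)
}

# Toy values for three buildings.
perimeter <- c(40, 60, 80)
area      <- c(100, 200, 400)

# An unnamed list is matched to the parameters by position.
matched <- do.call(ratio_mean, list(perimeter, area))  # p = perimeter, a = area
swapped <- do.call(ratio_mean, list(area, perimeter))  # p = area, a = perimeter

# The two orderings give different answers and neither raises an error.
print(c(matched = matched, swapped = swapped))
```

Because a swapped order runs without complaint, it is worth checking the order of names in fs_varlist against the function signature whenever results look implausible.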
An example of a custom function using two characteristics is the average perimeter-area ratio. We can compare this to the built-in function which uses a Polsby-Popper metric (fs_compact).
# average perimeter-area ratio
pa <- function(p, a){
return(mean(p / a))
}
# used to summarise within zones
# note that fs_varlist is still within a list
calculate_footstats(buildings,
adminzones,
what=list(list("compact"), fs_varlist("perimeter","area")),
how=list(list("mean"), list("pa")),
verbose=TRUE
)
#> Selecting metrics
#> Setting control values.
#> Pre-calculating areas
#> Pre-calculating perimeters
#> Pre-calculating compactness
#> Creating zonal index
#>
#> Calculating 2 metrics ...
#> compact mean
#> perimeter area pa
#> Finished calculating metrics.
#> zoneID compact_mean perimeter_area_pa
#> 1: 1 0.6018474 0.3661908 [1/m]
#> 2: 2 0.6604745 0.4291126 [1/m]
#> 3: 3 0.6154374 0.3848159 [1/m]
#> 4: 4 0.6300735 0.3979379 [1/m]
#> 5: 5 0.6080815 0.4014357 [1/m]
#> 6: 6 0.6283676 0.3119084 [1/m]
#> 7: 7 0.5472201 0.2687669 [1/m]
#> 8: 8 0.5541688 0.3042632 [1/m]
#> 9: 9 0.6141427 0.5018987 [1/m]
#> 10: 10 0.6247686 0.5062606 [1/m]
#> 11: 11 0.5803702 0.4672657 [1/m]
#> 12: 12 0.6620628 0.6440226 [1/m]
#> 13: 13 0.5655211 0.5569546 [1/m]
#> 14: 14 0.7154767 0.4860048 [1/m]
#> 15: 15 0.7279642 0.5282777 [1/m]
#> 16: 16 0.7013814 0.5653320 [1/m]
#> 17: 17 0.6548002 0.5064319 [1/m]
#> 18: 18 0.6900773 0.6217822 [1/m]
#> 19: 19 0.6716734 0.3503747 [1/m]
#> 20: 20 0.6841125 0.3815382 [1/m]
#> 21: 21 0.6957138 0.3679120 [1/m]
#> 22: 22 0.7145771 0.3238091 [1/m]
#> 23: 23 0.6178726 0.3983421 [1/m]
#> 24: 24 0.6712216 0.3692950 [1/m]
#> 25: 25 0.6332470 0.3589913 [1/m]
#> 26: 26 0.6146447 0.3022588 [1/m]
#> 27: 27 0.6862459 0.3886627 [1/m]
#> 28: 29 0.6646580 0.4497591 [1/m]
#> 29: 30 0.6477064 0.3973210 [1/m]
#> 30: 31 0.7743018 0.3371508 [1/m]
#> 31: 32 0.6386669 0.3898653 [1/m]
#> 32: 33 0.5873562 0.3436390 [1/m]
#> 33: 34 0.6441987 0.4537800 [1/m]
#> zoneID compact_mean perimeter_area_pa
R objects other than the footprints
A more complicated scenario arises when a user-defined function needs to access data which is not an attribute of the footprints dataset. In order to access the non-footprint data, a partial function must be created first and then supplied to the calculation function.
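The partial-function pattern relies on R closures, and the call to force matters because R evaluates function arguments lazily. The base R sketch below, which does not use foot at all, contrasts a generator that forces its argument with one that does not. All names (make_forced, make_lazy, k1, k2) are hypothetical.

```r
# Generator that forces its argument: the value is captured immediately.
make_forced <- function(d) {
  force(d)
  function(x) sum(d * x)
}

# Generator without force(): `d` stays an unevaluated promise until first use.
make_lazy <- function(d) {
  function(x) sum(d * x)
}

k1 <- 0.001
k2 <- 0.001
f_forced <- make_forced(k1)
f_lazy   <- make_lazy(k2)

# Change the external objects after the functions were generated.
k1 <- 100
k2 <- 100

vals <- c(1000, 2000)
forced_result <- f_forced(vals)  # still uses 0.001
lazy_result   <- f_lazy(vals)    # picks up 100 on first evaluation

print(c(forced = forced_result, lazy = lazy_result))
```

The forced version keeps the value that was supplied when the function was generated, which is usually the behaviour wanted when the generated function is handed to another function for later use.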
In the example below, a simple constant value is supplied to a summary function; however, the idea extends to any object in the R environment. This process is how the nearest neighbour index is calculated in foot by drawing on the spatial zones object as well as the footprints.
# external "data"
d1 <- 0.001
# This will NOT work because argument 'd' is not found
# f2 <- function(x, d){
# return(sum(d * x))
# }
#
# calculate_footstats(buildings, adminzones, what="area", how="f2", verbose=T)
# Instead...
# example of creating a partial function
gen_f3 <- function(d){
force(d) # must include
function(x){
return(sum(d * x))
}
}
# generate the function and initialise it with `d1` from above.
f3 <- gen_f3(d1)
# this now uses the generated function, and `d` is found
calculate_footstats(buildings,
adminzones,
what="area",
how="f3",
verbose=FALSE
)
#> zoneID area_f3
#> 1: 1 8.4925307 [m^2]
#> 2: 2 29.8921595 [m^2]
#> 3: 3 25.8441923 [m^2]
#> 4: 4 64.1223606 [m^2]
#> 5: 5 26.8496529 [m^2]
#> 6: 6 35.9289675 [m^2]
#> 7: 7 28.5153292 [m^2]
#> 8: 8 44.7039953 [m^2]
#> 9: 9 48.3619770 [m^2]
#> 10: 10 7.5938634 [m^2]
#> 11: 11 24.3211016 [m^2]
#> 12: 12 3.1237687 [m^2]
#> 13: 13 2.6583596 [m^2]
#> 14: 14 6.7422042 [m^2]
#> 15: 15 19.3132428 [m^2]
#> 16: 16 9.9446881 [m^2]
#> 17: 17 14.8188207 [m^2]
#> 18: 18 37.0611108 [m^2]
#> 19: 19 96.9233173 [m^2]
#> 20: 20 80.7164561 [m^2]
#> 21: 21 18.1444417 [m^2]
#> 22: 22 10.2427539 [m^2]
#> 23: 23 133.2688176 [m^2]
#> 24: 24 62.1124473 [m^2]
#> 25: 25 11.6953775 [m^2]
#> 26: 26 150.8832148 [m^2]
#> 27: 27 33.9686861 [m^2]
#> 28: 29 37.1740182 [m^2]
#> 29: 30 15.3446975 [m^2]
#> 30: 31 0.1427747 [m^2]
#> 31: 32 35.8933028 [m^2]
#> 32: 33 7.1492461 [m^2]
#> 33: 34 147.5949907 [m^2]
#> zoneID area_f3
In this vignette, the foot package has been extended to incorporate user-defined functions. These functions can use one or more values from within the footprints, or even access other objects in the environment. While the examples used calculate_footstats, the same approaches can be used to create new gridded summary metrics with calculate_bigfoot.
sessionInfo()
#> R version 4.1.1 (2021-08-10)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
#>
#> locale:
#> [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
#> [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
#> [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] sf_1.0-0 foot_0.8
#>
#> loaded via a namespace (and not attached):
#> [1] tidyselect_1.1.0 xfun_0.27 bslib_0.2.4 purrr_0.3.4
#> [5] vctrs_0.3.8 generics_0.1.0 stars_0.5-3 htmltools_0.5.2
#> [9] s2_1.0.6 yaml_2.2.1 utf8_1.2.2 rlang_0.4.12
#> [13] e1071_1.7-7 pkgdown_1.6.1 jquerylib_0.1.3 pillar_1.6.4
#> [17] glue_1.4.2 DBI_1.1.1 wk_0.4.1 foreach_1.5.1
#> [21] lifecycle_1.0.1 stringr_1.4.0 ragg_1.1.3 codetools_0.2-18
#> [25] memoise_2.0.0 evaluate_0.14 knitr_1.36 fastmap_1.1.0
#> [29] doParallel_1.0.16 parallel_4.1.1 class_7.3-19 fansi_0.5.0
#> [33] Rcpp_1.0.7 KernSmooth_2.23-20 classInt_0.4-3 filelock_1.0.2
#> [37] formatR_1.11 lwgeom_0.2-6 cachem_1.0.6 desc_1.4.0
#> [41] jsonlite_1.7.2 abind_1.4-5 systemfonts_1.0.3 fs_1.5.0
#> [45] textshaping_0.3.5 digest_0.6.28 stringi_1.7.5 dplyr_1.0.5
#> [49] rprojroot_2.0.2 grid_4.1.1 tools_4.1.1 magrittr_2.0.1
#> [53] sass_0.3.1 proxy_0.4-26 tibble_3.1.5 pkgconfig_2.0.3
#> [57] crayon_1.4.2 ellipsis_0.3.2 data.table_1.14.2 rmarkdown_2.11
#> [61] iterators_1.0.13 R6_2.5.1 units_0.7-2 compiler_4.1.1