The foot package was developed by WorldPop at the University of Southampton (www.worldpop.org) to support geometric calculations and summaries of measures from building footprint polygons. This vignette demonstrates how users can extend the basic functionality of calculate_footstats and calculate_bigfoot by making and supplying their own functions to summarise footprint characteristics. For an introduction to the package, see vignette("footsteps").
foot
As noted in the introductory vignettes, foot primarily uses calculate_footstats to calculate and summarise metrics. Internally this function uses data.table to handle large sets of building footprints and summarise them efficiently. The attributes to be summarised (what) are passed to the functions named in how. These internal structures also allow user-defined functions to be specified.
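To make the what/how pairing concrete, the following is a minimal sketch of a grouped summary written directly with data.table. It is not the internal code of foot; the table fp, its columns, and the renaming step are assumptions made purely for illustration.

```r
library(data.table)

# toy stand-in for footprint attributes already tagged with a zone index
fp <- data.table(zoneID = c(1, 1, 2, 2, 2),
                 area   = c(10, 20, 30, 40, 50))

what <- "area"   # attribute to summarise
how  <- "mean"   # name of the summary function

# resolve the function from its name, then apply it to the column within each zone
fn  <- match.fun(how)
res <- fp[, lapply(.SD, fn), by = zoneID, .SDcols = what]

# label the result column as <what>_<how>
setnames(res, what, paste(what, how, sep = "_"))
res
```

Because the function is looked up by name, any function visible in the session could in principle take the place of mean in a sketch like this.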
To demonstrate using custom functions, we will first add some additional attribute “data” to the footprints which we will use.
data("kampala", package = "foot")
buildings <- kampala$buildings
adminzones <- kampala$adminZones
# add a random categorical variable
buildings$category <- sample(LETTERS[1:5], size = nrow(buildings), replace = TRUE)
# add a random continuous variable
buildings$mult <- rnorm(nrow(buildings), mean = 10, sd = 2)
We can use any attributes of the footprints within calculate_footstats and calculate_bigfoot, not only the built-in morphology measures listed by list_fs(what='all').
# summarising a new data value
calculate_footstats(buildings,
                    adminzones,
                    what = "mult", # new attribute to summarise
                    how = "mean",
                    verbose = FALSE)
#> zoneID mult_mean
#> 1: 1 9.947260
#> 2: 2 10.078251
#> 3: 3 9.952500
#> 4: 4 9.476538
#> 5: 5 9.873005
#> 6: 6 10.322830
#> 7: 7 10.454661
#> 8: 8 9.762317
#> 9: 9 10.013834
#> 10: 10 10.349565
#> 11: 11 9.913699
#> 12: 12 9.510073
#> 13: 13 9.942433
#> 14: 14 10.013817
#> 15: 15 10.069056
#> 16: 16 10.072072
#> 17: 17 10.386338
#> 18: 18 9.989056
#> 19: 19 10.016845
#> 20: 20 10.039145
#> 21: 21 9.885370
#> 22: 22 10.627935
#> 23: 23 9.637838
#> 24: 24 9.908850
#> 25: 25 10.615264
#> 26: 26 10.162614
#> 27: 27 9.889993
#> 28: 29 9.994113
#> 29: 30 9.920652
#> 30: 31 12.489383
#> 31: 32 9.976741
#> 32: 33 9.882411
#> 33: 34 10.018653
#> zoneID mult_mean
The internal foot functions are documented in ?fs_functions; however, these functions are intended to be used within the wrapper functions of foot and are rarely meant to be called as standalone functions. One built-in summary function, not applied by default, is majority. It is designed to summarise categorical data. Users can select it in the same way as any other summary, by naming it in the how argument.
# get the majority category in each zone
calculate_footstats(buildings,
                    adminzones,
                    what = "category",
                    how = "majority",
                    verbose = FALSE)
#> zoneID category_majority
#> 1: 1 D
#> 2: 2 E
#> 3: 3 A
#> 4: 4 B
#> 5: 5 B
#> 6: 6 A
#> 7: 7 C
#> 8: 8 A
#> 9: 9 C
#> 10: 10 D
#> 11: 11 D
#> 12: 12 C
#> 13: 13 D
#> 14: 14 B
#> 15: 15 D
#> 16: 16 B
#> 17: 17 C
#> 18: 18 A
#> 19: 19 B
#> 20: 20 B
#> 21: 21 D
#> 22: 22 E
#> 23: 23 D
#> 24: 24 C
#> 25: 25 C
#> 26: 26 C
#> 27: 27 B
#> 28: 29 C
#> 29: 30 D
#> 30: 31 B
#> 31: 32 E
#> 32: 33 A
#> 33: 34 C
#> zoneID category_majority
The majority function works in much the same way as a user-defined function, which is demonstrated in the next section.
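As a rough illustration of what a categorical summary of this kind might look like when written as an ordinary R function, consider the sketch below. It is not the package's implementation of majority; the helper name my_mode, the toy data, and the tie-breaking rule (first category in sorted order) are assumptions.

```r
# hypothetical helper: most frequent value in a vector
# ties are broken by taking the first category in table() order
my_mode <- function(v) {
  counts <- table(v)
  names(counts)[which.max(counts)]
}

# toy categorical attribute and zone index
category <- c("A", "B", "B", "C", "B", "A")
zone     <- c(1, 1, 1, 2, 2, 2)

# one summary value per zone
by_zone <- tapply(category, zone, my_mode)
by_zone
```

The key property is that the function collapses all values in a zone to a single result, which is what any summary function supplied through how is expected to do.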
Creating functions for use with foot follows the same procedures and syntax as for functions in R in general. They must be declared with <- function() and they must be available within the environment where foot functions are being used. When the functions are used internally by calculate_footstats, they are applied to footprints by zone index. Therefore they should return a single summary value for the group of footprints in each zone.
The name of the function is what is passed to foot as an argument to how. The argument(s) to the custom function can be named anything, but they will typically be values present within the footprint attributes to be summarised.
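Before handing a function to foot, it can help to check on plain vectors that it behaves as a per-zone summary. The sketch below does this in base R with a hypothetical function cv (coefficient of variation) and toy data; the use of match.fun and tapply only mimics the idea of name-based lookup and grouping and is not a description of how foot resolves or applies functions.

```r
# hypothetical custom summary: coefficient of variation
cv <- function(v) {
  sd(v) / mean(v)
}

# toy attribute values and the zone each one belongs to
values <- c(10, 12, 8, 20, 22, 18)
zone   <- c(1, 1, 1, 2, 2, 2)

# look the function up from its name, as a character string
fn <- match.fun("cv")

# a summary function should collapse its input to one value
stopifnot(length(fn(values)) == 1)

# apply it group by group
res <- tapply(values, zone, fn)
res
```

A function that returns more than one value per group would fail the length check here, which is a cheap way to catch mistakes before running a large calculation.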
The example below shows a simple function that calculates the sum of the square root of the values. We will apply it to ‘area’, and foot will automatically pre-calculate this characteristic since it is not present in the column names of the footprints.
# simple function example 1
f1 <- function(v){
  units(v) <- NULL # ignore units in our function
  return(sum(sqrt(v)))
}
# applying custom summary function to area
calculate_footstats(buildings,
                    adminzones,
                    what = "area", how = "f1",
                    verbose = FALSE)
#> zoneID area_f1
#> 1: 1 365.00586
#> 2: 2 1827.32853
#> 3: 3 873.59117
#> 4: 4 1970.96031
#> 5: 5 869.86802
#> 6: 6 824.36335
#> 7: 7 486.75006
#> 8: 8 1197.14781
#> 9: 9 3055.03307
#> 10: 10 386.10669
#> 11: 11 1407.28115
#> 12: 12 295.43981
#> 13: 13 73.96388
#> 14: 14 731.89385
#> 15: 15 2019.18463
#> 16: 16 1156.94489
#> 17: 17 1136.03871
#> 18: 18 3480.35645
#> 19: 19 5918.73046
#> 20: 20 5274.46461
#> 21: 21 1000.86861
#> 22: 22 723.99592
#> 23: 23 3448.71743
#> 24: 24 3890.88401
#> 25: 25 436.64536
#> 26: 26 3434.11917
#> 27: 27 2288.60545
#> 28: 29 1914.86276
#> 29: 30 991.99424
#> 30: 31 11.94884
#> 31: 32 1917.43260
#> 32: 33 310.27950
#> 33: 34 9861.36237
#> zoneID area_f1
Although this function was only used to process area, it can be used for any continuous value. It can also be used on multiple characteristics or combined with other lists of functions, just like any built-in function in foot.
calculate_footstats(buildings,
                    adminzones,
                    what = list("area", "perimeter"), how = "f1",
                    verbose = FALSE)
#> Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1
#> WARNING: different compile-time and runtime versions for GEOS found:
#> Linked against: 3.9.0-CAPI-1.16.2 compiled against: 3.8.1-CAPI-1.13.3
#> It is probably a good idea to reinstall sf, and maybe rgeos and rgdal too
#> zoneID area_f1 perimeter_f1
#> 1: 1 365.00586 183.488651
#> 2: 2 1827.32853 1048.879091
#> 3: 3 873.59117 426.068457
#> 4: 4 1970.96031 968.615609
#> 5: 5 869.86802 412.069737
#> 6: 6 824.36335 337.518569
#> 7: 7 486.75006 171.066602
#> 8: 8 1197.14781 515.464525
#> 9: 9 3055.03307 1911.775834
#> 10: 10 386.10669 226.579855
#> 11: 11 1407.28115 857.273283
#> 12: 12 295.43981 214.216882
#> 13: 13 73.96388 35.239420
#> 14: 14 731.89385 500.492421
#> 15: 15 2019.18463 1407.986177
#> 16: 16 1156.94489 835.883724
#> 17: 17 1136.03871 716.240827
#> 18: 18 3480.35645 2477.699320
#> 19: 19 5918.73046 3199.202606
#> 20: 20 5274.46461 2994.324208
#> 21: 21 1000.86861 552.056838
#> 22: 22 723.99592 400.124081
#> 23: 23 3448.71743 1548.754569
#> 24: 24 3890.88401 2167.419552
#> 25: 25 436.64536 207.129346
#> 26: 26 3434.11917 1360.146927
#> 27: 27 2288.60545 1300.818322
#> 28: 29 1914.86276 1117.788513
#> 29: 30 991.99424 569.928890
#> 30: 31 11.94884 6.938055
#> 31: 32 1917.43260 1042.133260
#> 32: 33 310.27950 155.175160
#> 33: 34 9861.36237 6005.424628
#> zoneID area_f1 perimeter_f1
In some instances, a custom function may need to make use of two or more characteristics from within the building footprint datasets. The built-in functions in foot are primarily designed to work with a single value (e.g. area or perimeter).
While it may sometimes be quicker to pre-calculate the combination, it could be advantageous to use a function, particularly in calculate_bigfoot where smaller subsets of a large building footprint dataset are processed. To make sure multiple attributes are supplied to the summary function, the arguments in what should be specified using a special type (fs_varlist). The fs_varlist creates a nested list within the internal processing to keep the arguments together. Keep in mind that the arguments are passed to the summary function by position, not by name, so the order within fs_varlist must match the order of parameters that the function is expecting.
An example of a custom function using two characteristics is the average perimeter-area ratio. We can compare this to the built-in function which uses a Polsby-Popper metric (fs_compact).
# average perimeter-area ratio
pa <- function(p, a){
  return(mean(p / a))
}
# used to summarise within zones
# note that fs_varlist is still within a list
calculate_footstats(buildings,
                    adminzones,
                    what = list(list("compact"), fs_varlist("perimeter", "area")),
                    how = list(list("mean"), list("pa")),
                    verbose = TRUE
                   )
#> Selecting metrics
#> Setting control values.
#> Pre-calculating areas
#> Pre-calculating perimeters
#> Pre-calculating compactness
#> Creating zonal index
#>
#> Calculating 2 metrics ...
#> compact mean
#> perimeter area pa
#> Finished calculating metrics.
#> zoneID compact_mean perimeter_area_pa
#> 1: 1 0.6018474 0.3661908 [1/m]
#> 2: 2 0.6604745 0.4291126 [1/m]
#> 3: 3 0.6154374 0.3848159 [1/m]
#> 4: 4 0.6300735 0.3979379 [1/m]
#> 5: 5 0.6080815 0.4014357 [1/m]
#> 6: 6 0.6283676 0.3119084 [1/m]
#> 7: 7 0.5472201 0.2687669 [1/m]
#> 8: 8 0.5541688 0.3042632 [1/m]
#> 9: 9 0.6141427 0.5018987 [1/m]
#> 10: 10 0.6247686 0.5062606 [1/m]
#> 11: 11 0.5803702 0.4672657 [1/m]
#> 12: 12 0.6620628 0.6440226 [1/m]
#> 13: 13 0.5655211 0.5569546 [1/m]
#> 14: 14 0.7154767 0.4860048 [1/m]
#> 15: 15 0.7279642 0.5282777 [1/m]
#> 16: 16 0.7013814 0.5653320 [1/m]
#> 17: 17 0.6548002 0.5064319 [1/m]
#> 18: 18 0.6900773 0.6217822 [1/m]
#> 19: 19 0.6716734 0.3503747 [1/m]
#> 20: 20 0.6841125 0.3815382 [1/m]
#> 21: 21 0.6957138 0.3679120 [1/m]
#> 22: 22 0.7145771 0.3238091 [1/m]
#> 23: 23 0.6178726 0.3983421 [1/m]
#> 24: 24 0.6712216 0.3692950 [1/m]
#> 25: 25 0.6332470 0.3589913 [1/m]
#> 26: 26 0.6146447 0.3022588 [1/m]
#> 27: 27 0.6862459 0.3886627 [1/m]
#> 28: 29 0.6646580 0.4497591 [1/m]
#> 29: 30 0.6477064 0.3973210 [1/m]
#> 30: 31 0.7743018 0.3371508 [1/m]
#> 31: 32 0.6386669 0.3898653 [1/m]
#> 32: 33 0.5873562 0.3436390 [1/m]
#> 33: 34 0.6441987 0.4537800 [1/m]
#> zoneID compact_mean perimeter_area_pa
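The importance of argument order can be seen in a small base R sketch. It does not use foot at all; the function ratio_mean and the toy vectors are hypothetical, and do.call with an unnamed list simply stands in for any mechanism that passes arguments by position.

```r
# hypothetical two-argument summary: mean of the first argument over the second
ratio_mean <- function(num, den) {
  mean(num / den)
}

# toy perimeter and area values for two footprints
perimeter <- c(40, 60)
area      <- c(100, 200)

# unnamed list elements are matched to parameters by position
right <- do.call(ratio_mean, list(perimeter, area))  # perimeter / area
wrong <- do.call(ratio_mean, list(area, perimeter))  # area / perimeter

c(right = right, wrong = wrong)
```

Both calls run without error, so a swapped order would not be flagged; it would silently produce a different metric. This is why the order given to fs_varlist needs to mirror the function's parameter order.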
R objects other than the footprints
A more complicated scenario arises when a user-defined function needs to access data that is not an attribute of the footprints dataset. To access the non-footprint data, a partial function must be created first and then supplied to the calculation function.
In the example below, a simple constant value is supplied to a summary function; however, the idea extends to any object in the R environment. This process is how the nearest neighbour index is calculated in foot, by drawing on the spatial zones object as well as the footprints.
# external "data"
d1 <- 0.001
# This will NOT work because argument 'd' is not found
# f2 <- function(x, d){
#   return(sum(d * x))
# }
#
# calculate_footstats(buildings, adminzones, what="area", how="f2", verbose=T)
# Instead...
# example of creating a partial function
gen_f3 <- function(d){
  force(d) # must include
  function(x){
    return(sum(d * x))
  }
}
# generate the function and initialise it with `d1` from above.
f3 <- gen_f3(d1)
# this now uses the generated function, and `d` is found
calculate_footstats(buildings,
                    adminzones,
                    what = "area",
                    how = "f3",
                    verbose = FALSE
                   )
#> zoneID area_f3
#> 1: 1 8.4925307 [m^2]
#> 2: 2 29.8921595 [m^2]
#> 3: 3 25.8441923 [m^2]
#> 4: 4 64.1223606 [m^2]
#> 5: 5 26.8496529 [m^2]
#> 6: 6 35.9289675 [m^2]
#> 7: 7 28.5153292 [m^2]
#> 8: 8 44.7039953 [m^2]
#> 9: 9 48.3619770 [m^2]
#> 10: 10 7.5938634 [m^2]
#> 11: 11 24.3211016 [m^2]
#> 12: 12 3.1237687 [m^2]
#> 13: 13 2.6583596 [m^2]
#> 14: 14 6.7422042 [m^2]
#> 15: 15 19.3132428 [m^2]
#> 16: 16 9.9446881 [m^2]
#> 17: 17 14.8188207 [m^2]
#> 18: 18 37.0611108 [m^2]
#> 19: 19 96.9233173 [m^2]
#> 20: 20 80.7164561 [m^2]
#> 21: 21 18.1444417 [m^2]
#> 22: 22 10.2427539 [m^2]
#> 23: 23 133.2688176 [m^2]
#> 24: 24 62.1124473 [m^2]
#> 25: 25 11.6953775 [m^2]
#> 26: 26 150.8832148 [m^2]
#> 27: 27 33.9686861 [m^2]
#> 28: 29 37.1740182 [m^2]
#> 29: 30 15.3446975 [m^2]
#> 30: 31 0.1427747 [m^2]
#> 31: 32 35.8933028 [m^2]
#> 32: 33 7.1492461 [m^2]
#> 33: 34 147.5949907 [m^2]
#> zoneID area_f3
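The role of force() in a function generator like this can be checked in isolation. The sketch below uses only base R; the generator gen_count_above, the variable min_area, and the toy values are hypothetical and are not part of foot.

```r
# hypothetical function factory: count values above an external threshold
gen_count_above <- function(threshold) {
  force(threshold)  # evaluate the argument now rather than lazily
  function(x) {
    sum(x > threshold)
  }
}

min_area <- 100
count_large <- gen_count_above(min_area)

# changing the original variable afterwards does not alter the captured value
min_area <- 1000
n_large <- count_large(c(50, 150, 200))
n_large

# the captured value lives in the generated function's own environment
captured <- get("threshold", envir = environment(count_large))
captured
```

Because the value is fixed when the function is generated, the one-argument function that results can be passed by name like any other summary function, and it carries the external data with it.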
In this vignette, the foot package has been extended to incorporate user-defined functions. These functions can use one or more values from within the footprints, or even access other objects in the environment. While the examples used calculate_footstats, the same approaches can be used to create new gridded summary metrics with calculate_bigfoot.
sessionInfo()
#> R version 4.1.1 (2021-08-10)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
#>
#> locale:
#> [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
#> [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
#> [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] sf_1.0-0 foot_0.8
#>
#> loaded via a namespace (and not attached):
#> [1] tidyselect_1.1.0 xfun_0.27 bslib_0.2.4 purrr_0.3.4
#> [5] vctrs_0.3.8 generics_0.1.0 stars_0.5-3 htmltools_0.5.2
#> [9] s2_1.0.6 yaml_2.2.1 utf8_1.2.2 rlang_0.4.12
#> [13] e1071_1.7-7 pkgdown_1.6.1 jquerylib_0.1.3 pillar_1.6.4
#> [17] glue_1.4.2 DBI_1.1.1 wk_0.4.1 foreach_1.5.1
#> [21] lifecycle_1.0.1 stringr_1.4.0 ragg_1.1.3 codetools_0.2-18
#> [25] memoise_2.0.0 evaluate_0.14 knitr_1.36 fastmap_1.1.0
#> [29] doParallel_1.0.16 parallel_4.1.1 class_7.3-19 fansi_0.5.0
#> [33] Rcpp_1.0.7 KernSmooth_2.23-20 classInt_0.4-3 filelock_1.0.2
#> [37] formatR_1.11 lwgeom_0.2-6 cachem_1.0.6 desc_1.4.0
#> [41] jsonlite_1.7.2 abind_1.4-5 systemfonts_1.0.3 fs_1.5.0
#> [45] textshaping_0.3.5 digest_0.6.28 stringi_1.7.5 dplyr_1.0.5
#> [49] rprojroot_2.0.2 grid_4.1.1 tools_4.1.1 magrittr_2.0.1
#> [53] sass_0.3.1 proxy_0.4-26 tibble_3.1.5 pkgconfig_2.0.3
#> [57] crayon_1.4.2 ellipsis_0.3.2 data.table_1.14.2 rmarkdown_2.11
#> [61] iterators_1.0.13 R6_2.5.1 units_0.7-2 compiler_4.1.1