workshop

Nieves Response

Nieves et al. used a random forest (RF) machine learning algorithm to predict population density globally and to identify which covariates are its best predictors. Random forest algorithms all work in much the same way, so I will describe the method in terms of the variables used in the study. Starting from a large set of geospatial covariates, the algorithm grows many decision trees. Each tree is trained on a bootstrap sample of the observations, which leaves roughly 1/3 of them out as an out-of-bag (OOB) data set that is later used to estimate error. Using the remaining 2/3 or so, each tree considers a randomly selected subset of the covariates at every split, and it keeps growing and splitting until it reaches the threshold set by the researcher. At the end of the growing process, the OOB error is estimated by running the OOB data through the finished trees and comparing the predictions with the observed values.

From there, the researchers have a random forest that reports which covariates are the most important. Using those statistics, the researchers can then assign population density according to the geospatial covariates that matter most in each area. Relatedly, a dasymetric population allocation distributes a population unevenly within an area based on the surrounding characteristics of each location, rather than spreading it evenly. When predicting population density, the researchers found that the most important covariates were urban/suburban extents (i.e., buildings), followed by urban/suburban proxies, and then transportation systems.
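To make two of these ideas concrete, here is a minimal sketch in plain Python. It is not the code used by Nieves et al. The function names `bootstrap_split` and `dasymetric_allocate`, the sample size, and the cell weights are all hypothetical and chosen only for illustration. The first function shows how sampling with replacement leaves about a third of the rows out of bag for one tree. The second shows how a census unit's head count could be split across grid cells in proportion to a weight layer, such as a predicted density.

```python
import random


def bootstrap_split(n_rows, rng):
    """Draw a bootstrap sample of row indices; return (in_bag, out_of_bag)."""
    # Sample n_rows indices with replacement, as is done for each tree.
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    drawn = set(in_bag)
    # Rows that were never drawn form the out-of-bag set for this tree.
    out_of_bag = [i for i in range(n_rows) if i not in drawn]
    return in_bag, out_of_bag


def dasymetric_allocate(unit_total, cell_weights):
    """Split one unit's head count across its cells in proportion to the weights."""
    weight_sum = sum(cell_weights)
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive value")
    return [unit_total * w / weight_sum for w in cell_weights]


rng = random.Random(0)  # fixed seed so the run is repeatable
in_bag, oob = bootstrap_split(1000, rng)
oob_share = len(oob) / 1000  # tends toward 1/e (about 0.37) for large samples

# Hypothetical weights for four grid cells inside one census unit of 1000 people.
weights = [8.0, 1.0, 0.5, 0.5]
cells = dasymetric_allocate(1000, weights)  # [800.0, 100.0, 50.0, 50.0]
```

The cell with the largest weight receives most of the people, while the unit's total is preserved, which is the "unequal distribution" that a dasymetric approach is meant to produce.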