What is it about?
Usual cross-validation (CV) techniques yield overly optimistic performance estimates, because they do not properly account for spatial autocorrelation. The typical train-validate-test subsets have to be restricted spatially so that predicted locations are always at least a certain minimum distance from any known data. The resulting performance measure is more realistic (and more pessimistic) than traditional estimates.

The performance is a function of the distance (the gap between the "known" and predicted locations), and the shape of this function divides the phenomena to be predicted into three categories:
1) globally predictable (e.g. ground height and annual average atmospheric pressure),
2) locally predictable (predictions are possible over a limited distance into an unknown zone, e.g. 2 km),
3) extrapolation (the prediction quality depends only on the distance).

By prediction we mean the classical machine learning problem where global features (p, X, _) and some local measurements (p0, X0, y0), with p0 a subset of p, are given, and the task is to predict (p, X, y). By extrapolation we mean the classical problem where only (p0, _, y0), with p0 a subset of p, is given, and (p, _, y) has to be estimated. Here p denotes the locations, X the features and y the target values. A sketch of the dead-zone idea is shown below.
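The following is a minimal sketch of the dead-zone idea under stated assumptions: hold out one location, drop all training points within a chosen radius of it, and observe how the cross-validated error grows with that radius. It is a simplified, buffered leave-one-out variant in the extrapolation setting (predicting from coordinates alone), not the exact SKCV procedure of the paper; the toy data, model choice and function names are illustrative assumptions.

```python
# Buffered leave-one-out as a simplified illustration of spatial CV with a
# dead zone. Assumes planar point data; all names and data are illustrative.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# Toy spatially autocorrelated target y observed at locations p (coords).
n = 300
coords = rng.uniform(0, 10_000, size=(n, 2))                    # metres
y = (np.sin(coords[:, 0] / 1_500)
     + np.cos(coords[:, 1] / 1_500)
     + rng.normal(0.0, 0.1, n))

def buffered_loo_mse(coords, y, model, radius):
    """Mean squared error when every test location is at least `radius`
    metres from all locations used for training (the dead zone)."""
    errors = []
    for i in range(len(y)):
        dist = np.linalg.norm(coords - coords[i], axis=1)
        train = dist >= radius          # drop points inside the dead zone
        train[i] = False                # never train on the test point itself
        if train.sum() < 5:
            continue                    # not enough data left to fit
        model.fit(coords[train], y[train])
        errors.append((model.predict(coords[[i]])[0] - y[i]) ** 2)
    return float(np.mean(errors))

# The error as a function of the gap shows how far into an unmeasured zone
# the model can still predict (locally predictable vs. pure extrapolation).
model = KNeighborsRegressor(n_neighbors=5)
for radius in (0.0, 500.0, 1_000.0, 2_000.0):
    mse = buffered_loo_mse(coords, y, model, radius)
    print(f"gap {radius:>6.0f} m -> cross-validated MSE {mse:.4f}")
```

With a zero gap this reduces to ordinary leave-one-out CV; increasing the gap reveals whether the phenomenon remains predictable away from the measurements or whether the error simply grows with distance.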
Why is it important?
The basic message is: please do cross-validation correctly for spatial problems. This is the first attempt at a systematic categorization of which phenomena one can attempt to predict, and which ones should simply be extrapolated in the proximity of the measurements.
Read the Original
This page is a summary of: Estimating the prediction performance of spatial models via spatial k-fold cross validation, International Journal of Geographical Information Science, July 2017, Taylor & Francis,
DOI: 10.1080/13658816.2017.1346255.