Matthias Katzfuss (Texas A&M University): Statistical inference for massive distributed spatial data using low-rank models

Due to rapid data growth, it is becoming increasingly infeasible to move massive datasets, and statistical analyses have to be carried out where the data reside. If several datasets stored in separate physical locations are all relevant to a given problem, the challenge is to obtain valid inference based on all data without moving the datasets. This distributed data problem frequently arises in the geophysical and environmental sciences, for example, when the variable of interest is measured by several satellite instruments. We show that for a very widely used class of spatial low-rank models, which contain a component that can be written as a linear combination of spatial basis functions, computationally feasible spatial inference and prediction for massive distributed data can be carried out exactly. The required number of floating-point operations is linear in the number of data points, while the required amount of communication does not depend on the data sizes at all. After discussing several extensions and special cases, we apply our methodology to carry out spatio-temporal particle filtering inference on total precipitable water measured by three different sensor systems.
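As a rough illustration of why the communication cost can be independent of the data sizes, consider a minimal sketch in Python (NumPy). It assumes a deliberately simplified model that is not taken from the talk: each dataset j satisfies y_j = B_j eta + noise, with a shared r-dimensional basis-function weight vector eta ~ N(0, K) and independent Gaussian noise with a known, sensor-specific variance. The Gaussian basis functions, the function names, and all numerical settings are hypothetical choices made for this example. Under these assumptions, each site only needs to send an r x r matrix and an r-vector, and the combined posterior for eta matches the one obtained from the pooled data.

```python
import numpy as np

def basis(locs, centers, scale=0.2):
    """Gaussian radial basis functions on [0, 1] (hypothetical choice)."""
    d = locs[:, None] - centers[None, :]
    return np.exp(-0.5 * (d / scale) ** 2)

def local_summaries(B, y, noise_var):
    """Computed where the data reside: cost is linear in the number of
    local observations, and the output size depends only on r."""
    return B.T @ B / noise_var, B.T @ y / noise_var

def combine(summaries, K):
    """Central node: add up the small summaries and form the exact
    Gaussian posterior of the basis weights eta."""
    R = sum(s[0] for s in summaries)
    g = sum(s[1] for s in summaries)
    precision = np.linalg.inv(K) + R
    cov = np.linalg.inv(precision)
    return cov @ g, cov

rng = np.random.default_rng(0)
centers = np.linspace(0.0, 1.0, 5)          # r = 5 basis functions
K = np.eye(len(centers))                    # assumed prior covariance of eta
eta_true = rng.multivariate_normal(np.zeros(len(centers)), K)

# Three simulated "sensors" with different sizes and noise variances.
sizes = [200, 300, 500]
noise_vars = [0.05, 0.10, 0.20]
datasets = []
for n, v in zip(sizes, noise_vars):
    locs = rng.uniform(0.0, 1.0, size=n)
    B = basis(locs, centers)
    y = B @ eta_true + rng.normal(0.0, np.sqrt(v), size=n)
    datasets.append((B, y, v))

# Distributed inference: only r x r and r-dimensional summaries are shared.
summaries = [local_summaries(B, y, v) for B, y, v in datasets]
mean_dist, cov_dist = combine(summaries, K)

# Reference: the same posterior computed after pooling all data in one place.
B_all = np.vstack([B for B, _, _ in datasets])
y_all = np.concatenate([y for _, y, _ in datasets])
w_all = np.concatenate([np.full(len(y), 1.0 / v) for _, y, v in datasets])
prec_pooled = np.linalg.inv(K) + B_all.T @ (w_all[:, None] * B_all)
mean_pooled = np.linalg.solve(prec_pooled, B_all.T @ (w_all * y_all))

print(np.allclose(mean_dist, mean_pooled))
```

In this sketch the per-site work grows linearly with the number of local observations, while the quantities exchanged have sizes r x r and r regardless of how many observations each site holds. Predictions at new locations would follow by applying the basis functions at those locations to the posterior mean and covariance of eta.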

Joint work with Dorit Hammerling
