Interpolate your spatial data

A free tool for interpolating missing spatial data using modern machine learning techniques.

Add geographic features
Add Twitter language features

Formatting Your Data


Download example data, which you can use to interpolate county-level personality (agreeableness, as estimated from Twitter data) from socio-demographics.

How to Use

Upload a .csv file in which each row is a single spatial unit (e.g., a U.S. county). The first column must be an ID for that entry (e.g., the county FIPS code), and the second column must be the variable you wish to interpolate. All remaining columns are used as features to train the Gaussian Process model used for interpolation.
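As a rough illustration of the layout described above, the following sketch reads a small CSV and separates rows with an observed outcome from rows to be interpolated. The column names, the values, and the convention that a missing outcome is an empty cell are assumptions made for this example; they are not taken from the tool's documentation.

```python
import csv
import io

# Hypothetical input: ID, outcome to interpolate, then feature columns.
# Assumption: a missing outcome is encoded as an empty cell.
EXAMPLE_CSV = """fips,agreeableness,median_income,pct_college
01001,0.12,58.3,26.1
01003,,61.7,31.9
01005,-0.40,34.2,11.6
01007,,45.9,10.4
"""

def split_rows(csv_text):
    """Return the header plus (observed, missing) lists of (id, outcome, features)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    observed, missing = [], []
    for row in reader:
        unit_id, outcome = row[0], row[1].strip()
        features = [float(value) for value in row[2:]]
        if outcome == "":
            missing.append((unit_id, None, features))
        else:
            observed.append((unit_id, float(outcome), features))
    return header, observed, missing

header, observed, missing = split_rows(EXAMPLE_CSV)
print(len(observed), "rows to train on;", len(missing), "rows to interpolate")
```

Keeping the ID as a string preserves leading zeros in codes such as FIPS, which a numeric parse would drop.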


This tool then trains a Gaussian Process model and interpolates the outcome for each row with missing data. It returns a CSV with the interpolated value (i.e., the mean of the predictive distribution) as well as the standard deviation of that distribution.
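To make "mean of the predictive distribution" and "standard deviation" concrete, here is a minimal NumPy sketch of exact Gaussian Process regression. It is not the tool's GPyTorch implementation: the zero mean, the RBF kernel, the fixed lengthscale and noise values, and the function names are all assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_predict(x_train, y_train, x_test, lengthscale=1.0, noise=1e-2):
    """Posterior mean and std of the latent function of a zero-mean GP."""
    k_tt = rbf_kernel(x_train, x_train, lengthscale) + noise * np.eye(len(x_train))
    k_st = rbf_kernel(x_test, x_train, lengthscale)
    k_ss = rbf_kernel(x_test, x_test, lengthscale)
    chol = np.linalg.cholesky(k_tt)
    # alpha = K^{-1} y, computed through the Cholesky factor for stability.
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_st @ alpha
    v = np.linalg.solve(chol, k_st.T)
    var = np.diag(k_ss) - (v**2).sum(axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))

# Toy one-feature example: four observed units, two units to interpolate.
x_train = np.array([[0.0], [1.0], [2.0], [4.0]])
y_train = np.sin(x_train[:, 0])
x_missing = np.array([[3.0], [10.0]])

mean, std = gp_predict(x_train, y_train, x_missing)
for m, s in zip(mean, std):
    print(f"interpolated value = {m:.3f}, standard deviation = {s:.3f}")
```

The unit at 10.0 lies far from every observed unit, so its standard deviation is larger than that of the unit at 3.0. This is why the standard deviation column is useful for judging how much to trust each interpolated value.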


The final optimized lengthscale parameter is added to the filename of the returned CSV. For example, a returned file named interpolations_ls3.296.csv means that the final lengthscale was 3.296.
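If you process many returned files, the lengthscale can be read back from the filename. The sketch below assumes the name always ends in `_ls<number>.csv`, a pattern inferred only from the single example above; the helper name is hypothetical.

```python
import re

def lengthscale_from_filename(filename):
    """Extract the lengthscale from a name like 'interpolations_ls3.296.csv'."""
    # Assumed pattern: '_ls', a decimal number, then the '.csv' extension.
    match = re.search(r"_ls(\d+(?:\.\d+)?)\.csv$", filename)
    if match is None:
        raise ValueError(f"no lengthscale found in {filename!r}")
    return float(match.group(1))

print(lengthscale_from_filename("interpolations_ls3.296.csv"))  # prints 3.296
```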


An overview of the full technical details can be found here.

Methodology

...
  • This tool presents principled methods to interpolate spatial data using formal Bayesian models, namely Gaussian Processes.
  • Powered by GPyTorch, a highly efficient and modular implementation of Gaussian Processes.
  • Using a modern machine learning framework, the GP hyperparameters are learned from the training data, as opposed to chosen a priori.
  • Full technical details are available here, or you can read our paper here.
  • Try interpolating U.S. county Agreeableness with this sample data. Note that this data set does not work with the Twitter language features, since this agreeableness measure is built from the same Twitter language features.
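As an illustration of learning a hyperparameter from the training data rather than choosing it a priori, the sketch below selects an RBF lengthscale by maximizing the GP log marginal likelihood. It swaps in a simple grid search over candidate values in place of gradient-based optimization, and the synthetic data, the fixed noise level, and the function name are assumptions made for this example, not details of the tool.

```python
import numpy as np

def log_marginal_likelihood(x, y, lengthscale, noise=1e-2):
    """Log marginal likelihood of a zero-mean GP with an RBF kernel."""
    sq_dist = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    k = np.exp(-0.5 * sq_dist / lengthscale**2) + noise * np.eye(len(x))
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))
    # Data-fit term, complexity term (half log-determinant), and constant.
    return (-0.5 * y @ alpha
            - np.log(np.diag(chol)).sum()
            - 0.5 * len(x) * np.log(2.0 * np.pi))

# Synthetic training data: a smooth signal plus a little noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 40)[:, None]
y = np.sin(x[:, 0]) + 0.1 * rng.standard_normal(40)

# Score each candidate lengthscale and keep the one the data support best.
candidates = np.logspace(-1.0, 1.5, 30)
scores = [log_marginal_likelihood(x, y, ls) for ls in candidates]
best = candidates[int(np.argmax(scores))]
print(f"selected lengthscale: {best:.3f}")
```

A lengthscale that is too short treats the outcome as nearly independent noise, while one that is too long forces an almost constant fit; the marginal likelihood penalizes both, so the selected value falls between the extremes.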

Citation

Please cite this tool as follows:
@article{giorgi2023filling,
  title={Filling in the white space: Spatial interpolation with Gaussian processes and social media data},
  author={Giorgi, Salvatore and Eichstaedt, Johannes C and Preo{\c{t}}iuc-Pietro, Daniel and Gardner, Jacob R and Schwartz, H Andrew and Ungar, Lyle H},
  journal={Current Research in Ecological and Social Psychology},
  volume={5},
  pages={100159},
  year={2023},
  publisher={Elsevier}
}

Giorgi, S., Eichstaedt, J. C., Preoţiuc-Pietro, D., Gardner, J. R., Schwartz, H. A., & Ungar, L. H. (2023). Filling in the white space: Spatial interpolation with Gaussian processes and social media data. Current Research in Ecological and Social Psychology, 5, 100159.

                                    
Giorgi, Salvatore, et al. "Filling in the white space: Spatial interpolation with Gaussian processes and social media data." Current Research in Ecological and Social Psychology 5 (2023): 100159.

                                    

Who we are

This resource is provided by Salvatore Giorgi, Johannes C. Eichstaedt, Jacob R. Gardner, H. Andrew Schwartz, and Lyle H. Ungar of the University of Pennsylvania, Stanford University, and Stony Brook University.


Please send comments, suggestions, and bug reports to sal.giorgi@sas.upenn.edu.