Authors:
Yuhao Jia, Zile Wu, Shengao Yi, Yifei Sun
Paper:
https://arxiv.org/abs/2408.08852
Introduction
Urban forecasting, which involves predicting economic indicators and human mobility, has traditionally relied on low-dimensional numeric data such as Point of Interest (POI) data, survey data, GPS records, demographic census, and geospatial features. Recent advancements have explored using high-dimensional information to capture the complexities of urban dynamics more effectively. This includes utilizing urban imagery for feature extraction and prediction, and encoding urban information into high-dimensional space representations.
While these studies have established frameworks for encoding urban information, there has been limited investigation into how to make the best use of this high-dimensional data for urban forecasting. To address this gap, the authors propose GeoTransformer, a novel structure that combines the Transformer architecture with a geospatial statistics prior. GeoTransformer employs a geospatial attention mechanism to incorporate extensive urban information and spatial dependencies into a unified predictive model.
Related Work
High Dimensional Input for Urban Forecasting
Urban forecasting leverages multiple data sources and advanced computational techniques to address challenges faced by cities. Traditional methods usually adopt low-dimensional numeric data for prediction. In recent years, researchers have explored various ways to leverage high-dimensional urban information. This includes using urban imagery for feature extraction and prediction, and encoding urban information into high-dimensional representations.
Approaches to Modeling Spatial Dependency
Spatial dependency has traditionally been addressed using feature engineering methods like k-nearest neighbors (KNN) and spatial statistical methods such as spatial error regression, spatial lag models, and geographically weighted regression (GWR). Recently, Graph Neural Networks (GNNs) have become widely adopted for capturing spatial dependencies, especially in spatiotemporal prediction. However, grid-based high-dimensional urban representations often lack clear graph-like structures, making graph construction challenging.
Economic and Mobility Indicators Prediction
Gross Domestic Product (GDP) is a critical indicator in urban studies, offering insight into the economic health, developmental patterns, and sustainability of cities. Estimating GDP from Nighttime Light (NTL) data has become a widely adopted approach. Mobility indicators such as ride-share demand are equally important for urban forecasting, with significant implications for transportation planning, economic activity, and population behavior.
Methodology
Encode the City: Mixing Operator
To fuse satellite imagery and sociodemographic information into a high-dimensional latent representation, the authors adopt a mixing operator. This operator uses a supervised autoencoder (SAE): the encoder maps the imagery into latent variables, and the decoder reconstructs the images from them. The resulting latent space fuses sociodemographic and satellite imagery information.
GeoTransformer: Geospatial-weighted Attention Mechanisms
GeoTransformer is designed to decode the urban latent representations for downstream tasks, including GDP and ride-share demand prediction. The method adopts a structure akin to the Transformer decoder, with a novel geospatial attention mechanism that captures spatial dependencies and conveys location information between city regions. The mechanism computes attention scores between city regions and then applies geospatial weighting to these scores.
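The idea of weighting attention scores by location can be sketched for a single query region and its neighbors. This is a minimal illustration, not the authors' implementation: the function name geospatial_attention is hypothetical, and it assumes that the spatial weight multiplies the scaled dot-product score before the softmax. The paper may combine the two differently.

```python
import math

def geospatial_attention(query, keys, values, spatial_weights):
    """Attention output for one query region over its neighboring regions.

    Assumption: each spatial weight multiplies the scaled dot-product score
    before the softmax; this is an illustrative choice, not the paper's
    confirmed formulation.
    """
    d = len(query)
    # Scaled dot-product score between the query and each neighbor's key.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    # Apply the geospatial weighting to the raw scores.
    weighted = [s * w for s, w in zip(scores, spatial_weights)]
    # Numerically stable softmax over the neighbors.
    peak = max(weighted)
    exps = [math.exp(x - peak) for x in weighted]
    total = sum(exps)
    attn = [e / total for e in exps]
    # Attention-weighted sum of the neighbors' value vectors.
    dim_v = len(values[0])
    output = [sum(a * v[j] for a, v in zip(attn, values))
              for j in range(dim_v)]
    return output, attn

# Two neighbors with identical keys; the first has the larger spatial weight
# (it is treated as closer), so it receives more attention.
output, attn = geospatial_attention(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [1.0, 0.0]],
    values=[[1.0, 0.0], [0.0, 1.0]],
    spatial_weights=[1.0, 0.5],
)
print(attn, output)
```

With identical keys, plain attention would split evenly between the two neighbors; the spatial weights alone shift the balance toward the closer one.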
The geospatial attention mechanism uses various spatial weighting options, including linear normalization, inverse distance weighting (IDW), and Gaussian weighting, to assign higher weights to regions closer to the query region.
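The three weighting options can be illustrated with commonly used textbook forms. The exact formulas and the parameters max_dist, power, eps, and sigma are assumptions made for this sketch; the paper's definitions may differ.

```python
import math

def linear_weight(dist, max_dist):
    """Linear normalization: 1 at the query region, falling to 0 at max_dist."""
    return max(0.0, 1.0 - dist / max_dist)

def idw_weight(dist, power=1.0, eps=1.0):
    """Inverse distance weighting; eps keeps the weight finite at dist = 0."""
    return 1.0 / (dist ** power + eps)

def gaussian_weight(dist, sigma=1.0):
    """Gaussian kernel with bandwidth sigma."""
    return math.exp(-(dist ** 2) / (2.0 * sigma ** 2))

# Compare how quickly each scheme decays with distance from the query region.
distances = [0.0, 1.0, 2.0, 4.0]
schemes = [
    ("linear", lambda d: linear_weight(d, max_dist=4.0)),
    ("idw", idw_weight),
    ("gaussian", gaussian_weight),
]
for name, fn in schemes:
    print(name, [round(fn(d), 3) for d in distances])
```

All three assign the largest weight at distance zero and decrease monotonically; they differ in how sharply distant regions are discounted.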
GDP and Ride-share Demand Modeling
For GDP modeling, the authors use global 1 km × 1 km gridded revised GDP data derived from nighttime light data. For ride-share demand modeling, they use taxi trip data from Chicago, taking pickup locations to represent ride-share demand. To align the spatial resolution of the taxi trip data with that of the remote sensing imagery, they compute an area-weighted average of ride-share demand.
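An area-weighted average of this kind can be sketched as follows, assuming that each image tile overlaps one or more source zones and that the overlap areas are already known. The function name and the input layout are hypothetical.

```python
def area_weighted_average(overlaps):
    """Average demand for one image tile.

    overlaps: list of (overlap_area, demand) pairs, one per source zone that
    intersects the tile (hypothetical input layout).
    """
    total_area = sum(area for area, _ in overlaps)
    if total_area == 0:
        return 0.0  # the tile intersects no zone with recorded demand
    return sum(area * demand for area, demand in overlaps) / total_area

# A tile covered 75% by a zone with demand 120 and 25% by a zone with demand 40.
tile_demand = area_weighted_average([(0.75, 120.0), (0.25, 40.0)])
print(tile_demand)  # 100.0
```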
Experiment
Experiment Setup
The authors report results for GDP and ride-share demand prediction and compare them with baseline methods. They also conduct ablation experiments to demonstrate the effectiveness of the geospatial attention component and of the model architecture design.
Data Preparation
The satellite imagery data is obtained from the National Agriculture Imagery Program (NAIP) for the Greater Chicago Area. The ride-share demand data is obtained from the City of Chicago’s Taxi Trips dataset. Sociodemographic data from the American Community Survey (ACS) is also incorporated.
Mixing Operator Training
The mixing operator is initialized with the weights of the VAE component from Stable Diffusion. Training spans 100 epochs, after which the latent representation of each region is inferred using the encoder part of the mixing operator.
Baselines
Several baseline models are trained for comparison, including linear regression, polynomial regression, Vision Transformer (ViT), and Graph Attention Network (GAT).
GeoTransformer Training
GeoTransformer is trained using three spatial weighting methods. The number of decoder layers is set to 4, and the number of attention heads is set to 16. Mean squared error (MSE) is employed as the loss function.
Evaluation Metrics
The authors use Mean Squared Error (MSE), Mean Absolute Error (MAE), and R-squared (R2) to evaluate the accuracy of the predictions.
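For reference, the three metrics follow their standard definitions and can be computed as below. This is a plain-Python sketch for illustration; the example values are made up and are not results from the paper.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up values purely to exercise the functions.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
print(mse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
```

Lower is better for MSE and MAE, while R2 approaches 1 as predictions explain more of the variance in the targets.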
Evaluation Results
GDP Prediction
GeoTransformer models surpass every baseline model in GDP prediction. The linear weighting approach exhibits the best predictive performance.
Ride-share Demand Prediction
GeoTransformer with IDW weighting outperforms the best baseline by 18% in R2 on the test set. The IDW model also exhibits the fastest convergence.
Qualitative Analysis
To investigate how the geospatial attention mechanism behaves, the authors visualize the attention maps of several query regions together with their neighboring value regions. The results support the effectiveness of the mechanism in capturing urban features relevant to GDP.
Ablation Study
Geospatial Attention Module
The authors conduct a controlled experiment by training an alternative version of the model without the geospatial weighting in its attention mechanism. The comparison shows that all three geospatial attention variants significantly improve GDP prediction accuracy.
Trainable Key Matrices
The authors validate the effectiveness of treating the key matrix in the attention mechanism as a set of trainable parameters. The linear weighting GeoTransformer with trainable K achieves higher prediction accuracy than its counterpart without trainable K.
k-Nearest Neighbors
The authors search for the optimal number of neighboring regions k for GDP and ride-share demand predictions. The model performs best when k is set to 49 or 81.
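Selecting the k neighboring regions for a query region can be sketched as a nearest-neighbor lookup over region centroids. The sketch assumes Euclidean distance between centroids and counts the query region itself among its neighbors; the function name is hypothetical and the paper's neighbor selection may differ.

```python
import math

def k_nearest_regions(query_xy, region_xy, k):
    """Return the indices of the k regions closest to the query centroid.

    Assumption: Euclidean distance between centroids; the query region itself
    is included if it appears in region_xy.
    """
    order = sorted(range(len(region_xy)),
                   key=lambda i: math.dist(query_xy, region_xy[i]))
    return order[:k]

# Hypothetical 5 x 5 grid of region centroids; pick the 9 nearest to the center.
grid = [(x, y) for x in range(5) for y in range(5)]
neighbors = k_nearest_regions((2, 2), grid, k=9)
print(sorted(grid[i] for i in neighbors))
```

On this regular grid, the nine nearest regions form the 3 × 3 window around the query. Larger values of k widen the spatial context that the attention mechanism can draw on.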
Conclusion and Discussion
The authors propose GeoTransformer to extract information from high-dimensional urban latent representations. The novel geospatial attention mechanism allows GeoTransformer to capture extensive urban information and its spatial dependency. The experimental results demonstrate the capacity of the method in urban forecasting tasks. An ablation study validates the effectiveness of the geospatial attention mechanism and the design of GeoTransformer. The self-attention module in Transformer holds potential for adapting to spatiotemporal sequence prediction tasks, meriting further exploration in subsequent research.