Authors:
Zili Liu, Hao Chen, Lei Bai, Wenyuan Li, Wanli Ouyang, Zhengxia Zou, Zhenwei Shi
Paper:
https://arxiv.org/abs/2408.10854
Introduction
In recent years, the increasing frequency and intensity of extreme weather events have highlighted the critical need for accurate and reliable weather forecasting. Traditional numerical weather prediction methods, along with rapidly advancing deep learning-based forecasting models, have enabled increasingly accurate global-scale weather predictions. However, due to computational resource constraints, the resolution of global-scale forecasts is limited to tens of kilometers to 100 km. Such coarse spatial resolution is insufficient for the refined forecasting needs of specific regions and related downstream tasks. As a result, using downscaling techniques to generate high-resolution weather forecasts for specific regions is essential for addressing the spatial resolution limitations of global forecast models.
Downscaling methods for meteorological variables can be categorized into dynamical downscaling and statistical downscaling. Dynamical downscaling methods, often referred to as Regional Climate Models (RCMs), use forecast results from Global Climate Models (GCMs) as boundary conditions and incorporate topography and other regional information to construct regional dynamical processes. However, these methods rely heavily on differential equations to describe dynamical processes, which often leads to significant biases and high computational cost. Statistical downscaling methods, on the other hand, directly learn the mapping from coarse-resolution to fine-resolution meteorological data using machine learning techniques.
In this paper, we introduce MambaDS, a novel model that pioneers the integration of the selective state space model into meteorological field downscaling. MambaDS makes better use of multivariable correlations and topography information, two challenges specific to the downscaling process, while retaining the advantages of Mamba in long-range dependency modeling and linear computational complexity.
Related Work
Deep Models in Meteorological Field Downscaling
Meteorological field downscaling can be broadly divided into dynamical downscaling and statistical downscaling based on modeling approaches. In the deep learning modeling paradigm, the downscaling process of meteorological fields is modeled as a nonlinear mapping from low-resolution meteorological grids to high-resolution grids, which is very similar to image super-resolution (SR). Therefore, the development of downscaling models has greatly drawn on the model structures of image SR.
CNN-based SR models are the earliest and most common in the application and research of downscaling methods. Representative work includes DeepSD, which utilizes the SRCNN to achieve the mapping from low-resolution to high-resolution meteorological fields. Subsequent improvement methods mostly utilize advanced super-resolution models like VDSR and encoder-decoder-based UNet to achieve downscaling for meteorological variables such as wind, temperature, and precipitation. Additionally, generative adversarial networks (GANs) have been widely used in downscaling to enhance the recovery of high-frequency texture information.
To further enhance the global context modeling capability of downscaling models, recent work has started using Transformer-based SR models. However, the quadratic computational complexity of the self-attention mechanism in Transformer models demands significant computational power and memory when handling high-resolution images. Some efficient Transformer variants alleviate this issue, but at the cost of losing the ability to model long-range dependencies.
Auxiliary Data for Meteorological Field Downscaling
Incorporating auxiliary information such as topography into the downscaling process is common in previous work. Early precipitation downscaling efforts utilized digital elevation model (DEM) data as auxiliary information. Recent work moves beyond such simple integration by designing specialized encoder structures that extract topography features at different scales and integrate them into the downscaling process. However, this approach increases the overall number of model parameters and the computational complexity.
Visual State Space Model
State space models represented by Mamba have gained significant attention and are widely applied in fields such as computer vision and natural language processing. Their linear complexity and ability to model long-range dependencies help alleviate the inherent issues of CNN and Transformer models. Although Mamba was initially proposed for sequence data, recent work has demonstrated its strong performance in the visual and image domains.
Research Methodology
Preliminaries: State Space Models
Original State Space Models (SSMs) are mathematical representations used to describe dynamic systems, extensively applied in control theory, economics, signal processing, and more. They employ a set of first-order differential or difference equations to capture the system’s dynamics. A common type of state space model is the Linear Time-Invariant (LTI) system, which uses a linear ordinary differential equation (ODE) to describe the mapping relationship from the stimulation to the response through a latent state.
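For reference, the LTI system described above is conventionally written as below. The symbol names are an assumption, since this summary does not give them: they follow the common S4/Mamba notation, with x(t) as the input (stimulation), h(t) as the latent state of size N, and y(t) as the output (response).

```latex
\begin{aligned}
h'(t) &= \mathbf{A}\,h(t) + \mathbf{B}\,x(t), \\
y(t)  &= \mathbf{C}\,h(t),
\end{aligned}
\qquad
\mathbf{A} \in \mathbb{R}^{N \times N},\;
\mathbf{B} \in \mathbb{R}^{N \times 1},\;
\mathbf{C} \in \mathbb{R}^{1 \times N}.
```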
To adapt the continuous differential equation to deep learning algorithms, a timescale parameter is used to discretize the continuous parameters under the Zero-Order Hold (ZOH) rule. The discretized equation can be expressed as a recurrent neural network (RNN), whose sequential structure poses challenges for parallel processing. Recent state space models address this: S4 rewrites the time-invariant recurrence as a parallelizable convolution (CNN form), while Mamba, whose parameters depend on the input, relies on a parallel scan instead.
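The discretization step and the two equivalent views of the discretized model can be illustrated with a small sketch. It assumes a diagonal state matrix with nonzero entries, a single input channel, and fixed (input-independent) parameters, which is the time-invariant case where the recurrent and convolutional forms coincide. The function names are hypothetical and this is not the paper's implementation.

```python
import math


def zoh_discretize(a, b, delta):
    """Zero-Order Hold discretization of a diagonal continuous SSM.

    a, b  : diagonal entries of A and entries of B (each a[i] must be nonzero)
    delta : timescale parameter
    Returns (a_bar, b_bar) with a_bar = exp(delta * a) and
    b_bar = (exp(delta * a) - 1) / a * b.
    """
    a_bar = [math.exp(delta * ai) for ai in a]
    b_bar = [(abi - 1.0) / ai * bi for abi, ai, bi in zip(a_bar, a, b)]
    return a_bar, b_bar


def ssm_recurrent(a_bar, b_bar, c, xs):
    """RNN view: h_k = a_bar * h_{k-1} + b_bar * x_k, y_k = c . h_k."""
    h = [0.0] * len(a_bar)
    ys = []
    for x in xs:
        h = [ab * hi + bb * x for ab, bb, hi in zip(a_bar, b_bar, h)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys


def ssm_convolutional(a_bar, b_bar, c, xs):
    """CNN view: y = x * K with kernel K[k] = c . (a_bar^k * b_bar)."""
    length = len(xs)
    kernel = [
        sum(ci * (ab ** k) * bb for ci, ab, bb in zip(c, a_bar, b_bar))
        for k in range(length)
    ]
    # Causal convolution: each output only sees current and past inputs.
    return [sum(kernel[j] * xs[t - j] for j in range(t + 1)) for t in range(length)]


# Illustrative parameters (arbitrary values, not taken from the paper).
a_bar, b_bar = zoh_discretize(a=[-1.0, -0.5], b=[1.0, 2.0], delta=0.1)
xs = [1.0, 0.5, -0.2, 0.0, 0.3]
y_rnn = ssm_recurrent(a_bar, b_bar, [0.5, -1.0], xs)
y_cnn = ssm_convolutional(a_bar, b_bar, [0.5, -1.0], xs)
```

Both functions produce the same sequence up to floating-point error. The convolutional kernel can be precomputed once, which is what makes training parallelizable when the parameters do not depend on the input.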
Overall Structure
Our proposed MambaDS consists of three main stages: a shallow feature extraction, a residual hierarchical Mamba-based encoder, and a topography-constrained reconstruction. Given the input low-resolution meteorological fields, we first utilize a convolutional layer to embed the input field to the desired dimension. Subsequently, deep features are extracted through multiple stacked Residual State Space Blocks (RSSBs), each containing several Multivariable Correlation-Enhanced Visual State Space Modules (MCE-VSSMs) and a final convolutional layer to refine extracted features. The final deep feature is combined with the input low-resolution field to be fed into the topography-constrained reconstruction to achieve a high-resolution meteorological field with additional topography information.
Multivariable Correlation-Enhanced VSS Module
Different meteorological variable fields are treated as channel dimensions of a natural image and stacked together. However, unlike similar spectral channel distributions in images like RGB, different meteorological variables typically exhibit completely different distribution characteristics. Therefore, directly applying image super-resolution methods to multivariable meteorological downscaling tasks is limited because it overlooks the different distribution characteristics of each variable. To enhance the modeling capability of correlations between different meteorological variables, we propose a Multivariable Correlation-Enhanced Visual State Space Module (MCE-VSSM).
5-Direction Selective Scan Module
The vanilla Mamba, designed for one-dimensional sequence data in natural language, uses only two directions (forward and backward scanning) to capture the correlation between tokens. However, for image-like 2D grid data with a non-causal nature, both temporal and spatial correlations need to be captured. Therefore, the improved method uses four-directional scanning to enhance the model’s spatiotemporal modeling capability. Additionally, we add a random scanning branch to further enhance the model’s ability to capture chaotic dynamic systems.
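To make the scanning idea concrete, the sketch below builds five flattening orders for an H x W grid: four deterministic directions plus one random permutation. The specific choice of row-major and column-major traversals (each forward and backward), the fixed seed, and all function names are assumptions for illustration. The summary does not specify how the paper's scan paths are constructed.

```python
import random


def scan_orders(height, width, seed=0):
    """Return five visiting orders, each a list of (row, col) grid positions."""
    row_major = [(r, c) for r in range(height) for c in range(width)]
    col_major = [(r, c) for c in range(width) for r in range(height)]
    shuffled = row_major[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the path reproducible
    return {
        "row_forward": row_major,
        "row_backward": row_major[::-1],
        "col_forward": col_major,
        "col_backward": col_major[::-1],
        "random": shuffled,
    }


def flatten(grid, order):
    """Turn a 2D grid into a 1D sequence following the given order."""
    return [grid[r][c] for r, c in order]


def unflatten(seq, order, height, width):
    """Scatter a scanned 1D sequence back to its 2D grid positions."""
    out = [[None] * width for _ in range(height)]
    for value, (r, c) in zip(seq, order):
        out[r][c] = value
    return out


grid = [[1, 2, 3],
        [4, 5, 6]]
orders = scan_orders(2, 3)
print(flatten(grid, orders["row_backward"]))  # [6, 5, 4, 3, 2, 1]
print(flatten(grid, orders["col_forward"]))   # [1, 4, 2, 5, 3, 6]
```

In a multi-direction scan module, each flattened sequence would be processed by its own sequence model, mapped back with `unflatten`, and the per-direction outputs merged, for example by summation.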
Efficient Topography-Constraint Layer
Topography is an important factor and prior information in the downscaling process of meteorological variables. Effectively utilizing high-resolution topographic data to guide the texture restoration of meteorological fields is key to enhancing downscaling performance. We propose an efficient topography constraint layer for better preserving the detailed textures of the topography and avoiding additional computational overhead. This layer imposes a hard constraint on the downscaled output through topography weighting at the final stage of the model.
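The summary does not give the formula of this layer. Purely as an illustration of what "topography weighting at the final stage" could look like, the sketch below rescales each pixel of a high-resolution prediction by a factor derived from normalized elevation. The multiplicative form, the `alpha` strength parameter, and the function name are all hypothetical and should not be read as the paper's design.

```python
def topography_weighted_output(pred, topo, alpha=0.1):
    """Reweight a high-resolution field by mean-centered, range-normalized topography.

    pred  : 2D list, downscaled field before the constraint
    topo  : 2D list of the same shape, high-resolution elevation
    alpha : hypothetical weighting strength (assumption, not from the paper)
    """
    flat = [t for row in topo for t in row]
    mean = sum(flat) / len(flat)
    span = (max(flat) - min(flat)) or 1.0  # avoid division by zero on flat terrain
    return [
        [p * (1.0 + alpha * (t - mean) / span) for p, t in zip(pred_row, topo_row)]
        for pred_row, topo_row in zip(pred, topo)
    ]


pred = [[10.0, 10.0], [10.0, 10.0]]
topo = [[0.0, 500.0], [1000.0, 1500.0]]
out = topography_weighted_output(pred, topo, alpha=0.2)
```

A weighting of this kind adds no learned parameters, which is in the spirit of the stated goal of avoiding additional computational overhead. On perfectly flat terrain it leaves the prediction unchanged.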
Experimental Design
Study Area
To fully validate the effectiveness of our proposed method, we selected two regions for our study: mainland China and the Continental United States (CONUS). These regions differ significantly in terrain and location, which helps to better demonstrate the method’s generalization capabilities.
Dataset Description
We use three different datasets to validate our method:
- ERA5 Reanalysis: A comprehensive reanalysis dataset produced by the European Centre for Medium-Range Weather Forecasts (ECMWF). It provides detailed climate data, combining past observations with models to produce consistent and accurate records of atmospheric conditions.
- NOAA High-Resolution Rapid Refresh (HRRR): A weather prediction model developed by the National Oceanic and Atmospheric Administration (NOAA). It provides high-frequency, short-term forecasts with a focus on the United States.
- Fengwu Forecast: An artificial intelligence medium-range forecast model developed by the Shanghai Artificial Intelligence Laboratory. It uses ERA5 reanalysis data as the background field and outputs global hourly forecasts for the next 10 days.
Experiment Setup
We design three types of experiments to evaluate our proposed method: (1) ERA5 reanalysis downscaling, (2) HRRR analysis downscaling, and (3) Fengwu forecast downscaling. The architecture details and training details are as follows:
- Architecture Details: MambaDS consists of three stages. In the shallow feature extraction stage, we use a convolutional layer to embed the channel dimension from the number of variables into the desired dimension. In the residual hierarchical Mamba-based encoder, we use four RSSBs with depths of {14, 1, 1, 1}. Finally, in the reconstruction stage, we upsample the extracted deep features with the pixel shuffle operation together with a convolutional layer.
- Training Details: We train MambaDS with the robust Charbonnier loss for 120 epochs, using the Adam optimizer on 4x NVIDIA A100-40G GPUs.
- Evaluation Metrics: To evaluate the performance of MambaDS, we employ four metrics: mean square error (MSE), mean absolute error (MAE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM).
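The training loss and three of the four metrics listed above have simple closed forms, sketched here on flattened fields. The `eps` value and the `data_range` argument are assumptions, since the summary does not state the settings used. SSIM is omitted because it requires windowed local statistics.

```python
import math


def charbonnier_loss(pred, target, eps=1e-3):
    """Mean of sqrt((p - t)^2 + eps^2): a smooth, outlier-robust variant of L1."""
    total = sum(math.sqrt((p - t) ** 2 + eps ** 2) for p, t in zip(pred, target))
    return total / len(pred)


def mse(pred, target):
    """Mean square error."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def mae(pred, target):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def psnr(pred, target, data_range):
    """Peak signal-to-noise ratio in dB for values spanning data_range."""
    err = mse(pred, target)
    if err == 0:
        return float("inf")
    return 10.0 * math.log10(data_range ** 2 / err)


pred = [1.0, 2.0, 3.0]
target = [1.0, 2.0, 5.0]
loss = charbonnier_loss(pred, target)
scores = (mse(pred, target), mae(pred, target), psnr(pred, target, data_range=4.0))
```

For errors much larger than `eps`, the Charbonnier loss behaves like MAE, while staying differentiable at zero error.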
Results and Analysis
ERA5 Reanalysis Downscaling
Downscaling the ERA5 reanalysis field is a simple baseline for assessing the performance of downscaling models. The results show that the proposed MambaDS method outperforms other methods across all metrics. The results also indicate that different downscaling methods effectively recover known downsampling processes.
HRRR Analysis Downscaling
Downscaling the HRRR analysis field requires learning a mapping from the low-resolution ERA5 reanalysis field to the high-resolution HRRR analysis field. The results show that errors in this experiment are significantly higher than in the ERA5 downscaling experiment, where the degradation process is known. Despite this, the MambaDS method still significantly outperforms other methods on all metrics.
Fengwu Forecast Downscaling
To further verify the effectiveness of the proposed method in a setting close to operational downscaling, we designed a downscaling task for the forecast field. The results show that MambaDS holds a stable advantage across all variables, and the advantage is most pronounced for surface pressure.
Ablation Studies
To further analyze the impact of the different components of MambaDS on model performance, we conducted ablation experiments. The results show that each component improves performance to a varying degree, and that combining all proposed components achieves the best downscaling performance.
Comparison of Different Topography Prior Integration Methods
In the downscaling of meteorological variables, topography data is important prior information. The results show that the proposed topography constraint layer matches or even exceeds the downscaling results of other integration methods without a significant increase in the number of parameters.
Overall Conclusion
In this paper, we pioneer the use of the selective state space model in meteorological field downscaling and propose a novel downscaling model, MambaDS. Compared to previous downscaling methods based on CNN and Transformer super-resolution models, MambaDS can model long-range dependencies while maintaining efficient linear computational complexity. We made specific designs and improvements tailored to the characteristics of the downscaling task, including the MCE-VSSM to enhance the modeling of correlations between different meteorological variables and an efficient topography constraint layer to guide the restoration of fine texture details in meteorological fields. Through extensive experimental comparisons across three different downscaling settings and two study areas, we have verified the effectiveness of our proposed model for multivariable near-surface meteorological field downscaling.