Models Modules
The Models modules provide a comprehensive suite of statistical and machine learning models for climate prediction, including linear models, EOF-based models, canonical correlation analysis (CCA), analog methods, and multi-model ensemble (MME) techniques. These models are designed to handle both deterministic and probabilistic forecasts, with support for hyperparameter tuning. Models are evaluated using cross-validation schemes.
The models modules are organized into several classes, each implementing a specific type of model:
Machine Learning Models: This includes linear models such as multiple linear regression, logistic regression and regularized models like ridge, lasso, elastic-net. Additionally, more advanced models are available, including support vector regression, random forests, XGBoost, and neural networks.
EOF and PCR Models: For dimensionality reduction and regression using principal components.
CCA Models: For identifying relationships between two multivariate datasets.
Analog Methods: For finding historical analogs to current conditions.
Multi-Model Ensemble (MME) Techniques: For combining predictions from multiple models.
Machine Learning Models
The available models are:
WAS_LinearRegression_Model:
Standard Multiple Linear Regression. * WAS_Ridge_Model: Ridge regression with L2 regularization. * WAS_Lasso_Model: Lasso regression with L1 regularization. * WAS_LassoLars_Model: Lasso regression using the LARS algorithm. * WAS_ElasticNet_Model: Elastic net regression combining L1 and L2 regularization. * WAS_LogisticRegression_Model: Logistic regression for classification. * WAS_SVR: Support vector regression. * WAS_PolynomialRegression: Polynomial regression. * WAS_PoissonRegression: Poisson regression. * WAS_RandomForest_XGBoost_ML_Stacking: Random forest and XGBoost regression with stacking. * WAS_MLP: Multi-Layer Perceptron regression. * WAS_RandomForest_XGBoost_Stacking_MLP: Random forest, XGBoost, and MLP regression with stacking. * WAS_Stacking_Ridge: Random forest, XGBoost, MLP, and Ridge regression with stacking.
Except for WAS_LogisticRegression_Model, each model class includes methods for:
compute_model: Training the model and making predictions.
compute_prob: Computing tercile probabilities for the predictions.
forecast: Making forecasts for new data.
EOF and PCR Models
The was_eof.py and was_pcr.py modules provide classes for EOF analysis and Principal Component Regression (PCR), with support for multiple EOF zones:
WAS_EOF: Performs EOF analysis with options for cosine latitude weighting, standardization, and L2 normalization.
WAS_PCR: Combines EOF analysis with a regression model for prediction, supporting multiple EOF zones.
WAS_EOF
Initialization
n_modes: Number of EOF modes to retain.
use_coslat: Apply cosine latitude weighting (default: True).
standardize: Standardize the input data (default: False).
opti_explained_variance: Target cumulative explained variance to determine modes.
L2norm: Normalize components and scores to have L2 norm (default: True).
Methods
fit: Fits the EOF model to the data, supporting multiple zones by applying EOF analysis to the entire dataset.
transform: Projects new data onto the EOF modes.
inverse_transform: Reconstructs data from principal components (PCs).
plot_EOF: Plots the EOF spatial patterns with explained variance.
WAS_PCR
Initialization
regression_model: The regression model (e.g., WAS_Ridge_Model) to use with PCs.
n_modes: Number of EOF modes to retain.
use_coslat: Apply cosine latitude weighting (default: True).
standardize: Standardize the input data (default: False).
opti_explained_variance: Target cumulative explained variance.
L2norm: Normalize EOF components and scores (default: True).
Methods
compute_model: Fits the EOF model, transforms data to PCs, and applies the regression model.
compute_prob: Computes tercile probabilities using the regression model.
forecast: Makes forecasts using EOF-transformed data.
Example Usage: Seasonal Forecasting Based on Observational Data
from wass2s import *
## Define the directory to save the data
dir_to_save_reanalysis = "/path/to/save_reanalysis"
dir_to_save_agroindicators = "/path/to/save_agroindicators"
## Define the climatology year range and the season
clim_year_start = 1991
clim_year_end = 2020
seas_reanalysis = ["01", "02", "03"]
seas_agroindicators = ["05", "06", "07"]
## Define the variables to download
variables = ["AGRO.PRCP"]
## Define the center and the predictor variables
center_variable = ["ERA5.SST"]:
## Define the extent for reanalysis
extent = [45, -180, -45, 180] # [North, West, South, East]
## Define the extent for Observation
extent_obs = [30, -25, 0, 30] # [North, West, South, East]
## Download the predictors and the predictand
downloader = WAS_Download()
## Download the predictors
downloader.WAS_Download_Reanalysis(
dir_to_save=dir_to_save_reanalysis,
center_variable=center_variable,
year_start=1991,
year_end=2025,
area=extent,
seas=seas_reanalysis,
force_download=False
)
## Download the predictand
downloader.WAS_Download_AgroIndicators(
dir_to_save=dir_to_save_agroindicators,
variables=["AGRO.PRCP"],
year_start=1991,
year_end=2024,
area=extent_obs,
seas=seas_agroindicators,
force_download=False
)
Case 1: Used SST index as a predictor
# Prepare predictand and predictors
predictand = prepare_predictand(dir_to_save_agroindicators, variables, year_start, year_end, seas_agroindicators, ds=False, daily=False)
# Prepare predictors
## Print available SST indices
print(list(sst_indices_name.keys()))
## Choose yours
sst_index_name = ['NINO34','TNA', 'TSA', 'DMI']
## Plot the SST index zone
plot_map([extent[1],extent[3],extent[2],extent[0]], sst_indices = sst_index_name, title="Index Zone",fig_size=(7,4))
## Compute the SST indices
predictors = compute_sst_indices(dir_to_save_reanalysis, sst_index_name, center_variable[0], year_start, year_end, seas_reanalysis)
## Compute variance inflation factor to see multicolinearity between predictors
vif_data = pd.DataFrame()
vif_data["feature"] = predictors.to_dataframe().columns
vif_data["VIF"] = [VIF(predictors.to_dataframe(), i) for i in range(predictors.to_dataframe().shape[1])]
## Print VIF values
print(vif_data)
## Set a threshold for VIF
vif_threshold = 5
# Remove features with VIF greater than the threshold
low_vif_predictors = vif_data[vif_data["VIF"] < vif_threshold]["feature"].tolist()
filtered_predictors = predictors[low_vif_predictors].to_array()
filtered_predictors = filtered_predictors.rename({"variable": "features"}).transpose('T', 'features')
# Initialize the model class
model = WAS_LinearRegression_Model(nb_cores=2, dist_method="lognormal")
# Assuming predictand follows a lognormal distribution. otherwise, normal, student-t or gamma are available. used dist_method="normal" or dist_method="t" or dist_method="gamma".
# Perform cross-validation
was_cv = WAS_Cross_Validator(n_splits=len(predictand.get_index("T")), nb_omit=2)
hindcast_det, hindcast_prob = was_cv.cross_validate(model, predictand, filtered_predictorsisel(T=slice(None,-1)), clim_year_start, clim_year_end)
# clim_year_start and clim_year_end are the years used to compute the climatology.
# Initialize the model class
model = WAS_Ridge_Model(n_clusters=6, alpha_range=np.logspace(-4, 0.1, 20), nb_cores = 2)
# Compute alpha parameters
alpha, clusters = model.compute_hyperparameters(predictand, filtered_predictors)
# Perform cross-validation
was_cv = WAS_Cross_Validator(n_splits=len(predictand.get_index("T")), nb_omit=2)
hindcast_det_Ridge, hindcast_prob_Ridge = was_cv.cross_validate(model, predictand, filtered_predictors.isel(T=slice(None,-1)), clim_year_start, clim_year_end, alpha=alpha)
# Make a forecast
forecast_det_Ridge, forecast_prob_Ridge = model.forecast(predictand, clim_year_start, clim_year_end, filtered_predictors.isel(T=slice(None,-1)), hindcast_det_Ridge, filtered_predictors.isel(T=[-1]), alpha=alpha, l1_ratio=l1_ratio)
Case 2: Used PCRs as a predictor
# Set your own zones ( zones not available in built-in)
# define zone as dict : {'zone_name_key': ('Explicit_Zone_name', lon_min, lon_max, lat_min, lat_max)}
zones_for_PCR = {'A': ('A', -150, 150, -45, 45)}
# Set number of modes
n_modes = 6
# ElasticNet hyperparameters range
alpha_range = np.logspace(-4, 0.1, 20)
l1_ratio_range = [0.5, 0.9999]
# Initialize the model class
model = WAS_PCR_Model(n_clusters=6, alpha_range=np.logspace(-4, 0.1, 20), nb_cores = 2)
plot_map([extent[1],extent[3],extent[2],extent[0]], sst_indices = zones_for_PCR, title="Predictors Area",fig_size=(8,6))
# Retrieve predictor data for the defined zone
predictor = retrieve_single_zone_for_PCR(dir_to_save_Reanalysis, zones_for_PCR, variables_reanalysis[0], year_start, year_end, season, clim_year_start, clim_year_end)
# Load WAS_EOF Class
eof_model = WAS_EOF(n_modes=n_modes, use_coslat=True, standardize=True)
# Load predictor, compute EOFs and retrieve component, scores and explained variances
s_eofs, s_pcs, s_expvar, _ = eof_model.fit(predictor, dim="T", clim_year_start=clim_year_start, clim_year_end=clim_year_end)
# Plot EOFs and explained variances
eof_model.plot_EOF(s_eofs, s_expvar)
# Perform Cross-validation with elastic-net
## Load class for model
regression_model = WAS_ElasticNet_Model(alpha_range = alpha_range, l1_ratio_range = l1_ratio_range, nb_cores = 2, dist_method="lognormal")
pcr_model = WAS_PCR(regression_model=regression_model, n_modes=n_modes, standardize=False)
## Compute alpha parameters
alpha, l1_ratio, clusters = regression_model.compute_hyperparameters(predictand, s_pcs.isel(T=slice(None,-1)).rename({"mode": "features"}).transpose('T', 'features'))
## Perform cross-validation
was_cv = WAS_Cross_Validator(n_splits=len(predictand.get_index("T")), nb_omit=2)
hindcast_det, hindcast_prob = was_cv.cross_validate(pcr_model, predictand, s_pcs.isel(T=slice(None,-1)).rename({"mode": "features"}).transpose('T', 'features'), clim_year_start, clim_year_end, alpha=alpha, l1_ratio=l1_ratio)
CCA Models
The was_cca.py module provides classes for Canonical Correlation Analysis (CCA):
WAS_CCA: Performs CCA to identify relationships between two multivariate datasets.
Initialization
n_modes: Number of CCA modes to retain.
n_pca_modes: Number of PCA modes to use for dimensionality reduction.
dist_method: distribution method for probability computations.
Methods
compute_model: Fits the CCA model and makes predictions.
compute_prob: Computes tercile probabilities for the predictions.
Example Usage: Recalibrating Seasonal Forecast Outputs from Global Climate Models (GCMs)
Analog Forecasting Methods
The was_analog.py module provides the WAS_Analog class for analog-based forecasting using various techniques to identify historical analogs to current conditions for prediction, particularly for seasonal rainfall forecasts using sea surface temperature (SST) data.
Initialization Parameters
dir_to_save(str): Directory path to save downloaded and processed data files.year_start(int): Starting year for historical data.year_forecast(int): Target forecast year.reanalysis_name(str): Reanalysis dataset name (e.g., “ERA5.SST” or “NOAA.SST”).model_name(str): Forecast model name (e.g., “ECMWF_51.SST”).method_analog(str, default=”som”): Analog method to use (“som”, “cor_based”, “pca_based”).best_prcp_models(list, optional): List of best precipitation models. Default is None.month_of_initialization(int, optional): Forecast initialization month. Default is None (uses current month).lead_time(list, optional): Lead times in months. Default is None (uses [1, 2, 3, 4, 5]).ensemble_mean(str, default=”mean”): Ensemble mean method (“mean” or “median”).clim_year_start(int, optional): Start year for climatology period.clim_year_end(int, optional): End year for climatology period.define_extent(tuple, optional): Bounding box as (lon_min, lon_max, lat_min, lat_max) for regional analysis.index_compute(list, optional): Climate indices to compute (e.g., [“NINO34”, “DMI”]).some_grid_size(tuple, default=(None, None)): SOM grid dimensions (rows, cols); None uses automatic sizing.some_learning_rate(float, default=0.5): Learning rate for SOM training.some_neighborhood_function(str, default=”gaussian”): Neighborhood function for SOM (“gaussian”, etc.).some_sigma(float, default=1.0): Initial neighborhood radius for SOM.dist_method(str, default=”gamma”): Probability method (“gamma”, “t”, “normal”, “lognormal”, “nonparam”).
Key Methods
download_sst_reanalysis(): Downloads and processes SST reanalysis data from the specified center for the given years and area.download_models(): Downloads seasonal forecast model data for the specified model, initialization month, and lead times.standardize_timeseries(): Standardizes time series data over a specified climatology period.calc_index(): Computes specified climate indices (e.g., NINO34, DMI) from SST data.compute_model(): Identifies historical analogs using the specified method and computes deterministic forecasts.compute_prob(): Calculates tercile probabilities (Below Normal, Near Normal, Above Normal) using the specified distribution method.forecast(): Generates deterministic and probabilistic forecasts for the target year, returning processed SST data, similar years, deterministic forecast, and probabilistic forecast.composite_plot(): Creates composite plots of forecast results, optionally including the predictor (SST) visualization.
Example Usage
Basic analog forecast setup:
from wass2s.was_analog import WAS_Analog
# Initialize analog model
analog_model = WAS_Analog(
dir_to_save="./s2s_data/analog",
year_start=1990,
year_forecast=2025,
reanalysis_name="NOAA.SST",
model_name="ECMWF_51.SST",
method_analog="som",
month_of_initialization=3,
clim_year_start=1991,
clim_year_end=2020,
define_extent=(-180, 180, -45, 45),
index_compute=["NINO34", "DMI"],
dist_method="gamma"
)
# Download and process data
sst_hist, sst_for = analog_model.download_and_process()
# Generate forecast
ddd, similar_years, forecast_det, forecast_prob = analog_model.forecast(
predictant=rainfall_data,
clim_year_start=1991,
clim_year_end=2020,
hindcast_det=hindcast_data,
forecast_year=2025
)
# Create composite plot
similar_years = analog_model.composite_plot(
predictant=rainfall_data,
clim_year_start=1991,
clim_year_end=2020,
hindcast_det=hindcast_data,
plot_predictor=True
)
Cross-Validation Example
from wass2s.was_analog import WAS_Cross_Validator
# Perform cross-validation
was_analog_cv = WAS_Cross_Validator(n_splits=len(rainfall.get_index("T")), nb_omit=2)
hindcast_analog_det, hindcast_analog_prob = was_analog_cv.cross_validate(
analog_model,
rainfall,
clim_year_start=1991,
clim_year_end=2020
)
# Generate forecast using cross-validated hindcast
ddd, similar_years, forecast_det, forecast_prob = analog_model.forecast(
predictant=rainfall,
clim_year_start=1991,
clim_year_end=2020,
hindcast_det=hindcast_analog_det,
forecast_year=2025
)
Note
Ensure WAS_Cross_Validator is correctly imported from the wass2s.was_analog module and that the rainfall variable is an xarray DataArray with appropriate dimensions (T, Y, X).