Models Modules -------------- The Models modules provide a comprehensive suite of statistical and machine learning models for climate prediction, including linear models, EOF-based models, canonical correlation analysis (CCA), analog methods, and multi-model ensemble (MME) techniques. These models are designed to handle both deterministic and probabilistic forecasts, with support for hyperparameter tuning. Models are evaluated using cross-validation schemes. The models modules are organized into several classes, each implementing a specific type of model: 1. **Machine Learning Models**: This includes linear models such as multiple linear regression, logistic regression and regularized models like ridge, lasso, elastic-net. Additionally, more advanced models are available, including support vector regression, random forests, XGBoost, and neural networks. 2. **EOF and PCR Models**: For dimensionality reduction and regression using principal components. 3. **CCA Models**: For identifying relationships between two multivariate datasets. 4. **Analog Methods**: For finding historical analogs to current conditions. 5. **Multi-Model Ensemble (MME) Techniques**: For combining predictions from multiple models. ============================================== Machine Learning Models ============================================== The available models are: * `WAS_LinearRegression_Model`: Standard Multiple Linear Regression. * `WAS_Ridge_Model`: Ridge regression with L2 regularization. * `WAS_Lasso_Model`: Lasso regression with L1 regularization. * `WAS_LassoLars_Model`: Lasso regression using the LARS algorithm. * `WAS_ElasticNet_Model`: Elastic net regression combining L1 and L2 regularization. * `WAS_LogisticRegression_Model`: Logistic regression for classification. * `WAS_SVR`: Support vector regression. * `WAS_PolynomialRegression`: Polynomial regression. * `WAS_PoissonRegression`: Poisson regression. * `WAS_RandomForest_XGBoost_ML_Stacking`: Random forest and XGBoost regression with stacking. * `WAS_MLP`: Multi-Layer Perceptron regression. * `WAS_RandomForest_XGBoost_Stacking_MLP`: Random forest, XGBoost, and MLP regression with stacking. * `WAS_Stacking_Ridge`: Random forest, XGBoost, MLP, and Ridge regression with stacking. Except for `WAS_LogisticRegression_Model`, each model class includes methods for: * `compute_model`: Training the model and making predictions. * `compute_prob`: Computing tercile probabilities for the predictions. * `forecast`: Making forecasts for new data. ============================================== EOF and PCR Models ============================================== The `was_eof.py` and `was_pcr.py` modules provide classes for EOF analysis and Principal Component Regression (PCR), with support for multiple EOF zones: - `WAS_EOF`: Performs EOF analysis with options for cosine latitude weighting, standardization, and L2 normalization. - `WAS_PCR`: Combines EOF analysis with a regression model for prediction, supporting multiple EOF zones. **WAS_EOF** **Initialization** - `n_modes`: Number of EOF modes to retain. - `use_coslat`: Apply cosine latitude weighting (default: True). - `standardize`: Standardize the input data (default: False). - `opti_explained_variance`: Target cumulative explained variance to determine modes. - `L2norm`: Normalize components and scores to have L2 norm (default: True). **Methods** - `fit`: Fits the EOF model to the data, supporting multiple zones by applying EOF analysis to the entire dataset. - `transform`: Projects new data onto the EOF modes. - `inverse_transform`: Reconstructs data from principal components (PCs). - `plot_EOF`: Plots the EOF spatial patterns with explained variance. **WAS_PCR** **Initialization** - `regression_model`: The regression model (e.g., `WAS_Ridge_Model`) to use with PCs. - `n_modes`: Number of EOF modes to retain. - `use_coslat`: Apply cosine latitude weighting (default: True). - `standardize`: Standardize the input data (default: False). - `opti_explained_variance`: Target cumulative explained variance. - `L2norm`: Normalize EOF components and scores (default: True). **Methods** - `compute_model`: Fits the EOF model, transforms data to PCs, and applies the regression model. - `compute_prob`: Computes tercile probabilities using the regression model. - `forecast`: Makes forecasts using EOF-transformed data. **Example Usage: Seasonal Forecasting Based on Observational Data** .. code-block:: python from wass2s import * ## Define the directory to save the data dir_to_save_reanalysis = "/path/to/save_reanalysis" dir_to_save_agroindicators = "/path/to/save_agroindicators" ## Define the climatology year range and the season clim_year_start = 1991 clim_year_end = 2020 seas_reanalysis = ["01", "02", "03"] seas_agroindicators = ["05", "06", "07"] ## Define the variables to download variables = ["AGRO.PRCP"] ## Define the center and the predictor variables center_variable = ["ERA5.SST"]: ## Define the extent for reanalysis extent = [45, -180, -45, 180] # [North, West, South, East] ## Define the extent for Observation extent_obs = [30, -25, 0, 30] # [North, West, South, East] ## Download the predictors and the predictand downloader = WAS_Download() ## Download the predictors downloader.WAS_Download_Reanalysis( dir_to_save=dir_to_save_reanalysis, center_variable=center_variable, year_start=1991, year_end=2025, area=extent, seas=seas_reanalysis, force_download=False ) ## Download the predictand downloader.WAS_Download_AgroIndicators( dir_to_save=dir_to_save_agroindicators, variables=["AGRO.PRCP"], year_start=1991, year_end=2024, area=extent_obs, seas=seas_agroindicators, force_download=False ) **Case 1: Used SST index as a predictor** .. code-block:: python # Prepare predictand and predictors predictand = prepare_predictand(dir_to_save_agroindicators, variables, year_start, year_end, seas_agroindicators, ds=False, daily=False) # Prepare predictors ## Print available SST indices print(list(sst_indices_name.keys())) ## Choose yours sst_index_name = ['NINO34','TNA', 'TSA', 'DMI'] ## Plot the SST index zone plot_map([extent[1],extent[3],extent[2],extent[0]], sst_indices = sst_index_name, title="Index Zone",fig_size=(7,4)) ## Compute the SST indices predictors = compute_sst_indices(dir_to_save_reanalysis, sst_index_name, center_variable[0], year_start, year_end, seas_reanalysis) ## Compute variance inflation factor to see multicolinearity between predictors vif_data = pd.DataFrame() vif_data["feature"] = predictors.to_dataframe().columns vif_data["VIF"] = [VIF(predictors.to_dataframe(), i) for i in range(predictors.to_dataframe().shape[1])] ## Print VIF values print(vif_data) ## Set a threshold for VIF vif_threshold = 5 # Remove features with VIF greater than the threshold low_vif_predictors = vif_data[vif_data["VIF"] < vif_threshold]["feature"].tolist() filtered_predictors = predictors[low_vif_predictors].to_array() filtered_predictors = filtered_predictors.rename({"variable": "features"}).transpose('T', 'features') # Initialize the model class model = WAS_LinearRegression_Model(nb_cores=2, dist_method="lognormal") # Assuming predictand follows a lognormal distribution. otherwise, normal, student-t or gamma are available. used dist_method="normal" or dist_method="t" or dist_method="gamma". # Perform cross-validation was_cv = WAS_Cross_Validator(n_splits=len(predictand.get_index("T")), nb_omit=2) hindcast_det, hindcast_prob = was_cv.cross_validate(model, predictand, filtered_predictorsisel(T=slice(None,-1)), clim_year_start, clim_year_end) # clim_year_start and clim_year_end are the years used to compute the climatology. # Initialize the model class model = WAS_Ridge_Model(n_clusters=6, alpha_range=np.logspace(-4, 0.1, 20), nb_cores = 2) # Compute alpha parameters alpha, clusters = model.compute_hyperparameters(predictand, filtered_predictors) # Perform cross-validation was_cv = WAS_Cross_Validator(n_splits=len(predictand.get_index("T")), nb_omit=2) hindcast_det_Ridge, hindcast_prob_Ridge = was_cv.cross_validate(model, predictand, filtered_predictors.isel(T=slice(None,-1)), clim_year_start, clim_year_end, alpha=alpha) # Make a forecast forecast_det_Ridge, forecast_prob_Ridge = model.forecast(predictand, clim_year_start, clim_year_end, filtered_predictors.isel(T=slice(None,-1)), hindcast_det_Ridge, filtered_predictors.isel(T=[-1]), alpha=alpha, l1_ratio=l1_ratio) **Case 2: Used PCRs as a predictor** .. code-block:: python # Set your own zones ( zones not available in built-in) # define zone as dict : {'zone_name_key': ('Explicit_Zone_name', lon_min, lon_max, lat_min, lat_max)} zones_for_PCR = {'A': ('A', -150, 150, -45, 45)} # Set number of modes n_modes = 6 # ElasticNet hyperparameters range alpha_range = np.logspace(-4, 0.1, 20) l1_ratio_range = [0.5, 0.9999] # Initialize the model class model = WAS_PCR_Model(n_clusters=6, alpha_range=np.logspace(-4, 0.1, 20), nb_cores = 2) plot_map([extent[1],extent[3],extent[2],extent[0]], sst_indices = zones_for_PCR, title="Predictors Area",fig_size=(8,6)) # Retrieve predictor data for the defined zone predictor = retrieve_single_zone_for_PCR(dir_to_save_Reanalysis, zones_for_PCR, variables_reanalysis[0], year_start, year_end, season, clim_year_start, clim_year_end) # Load WAS_EOF Class eof_model = WAS_EOF(n_modes=n_modes, use_coslat=True, standardize=True) # Load predictor, compute EOFs and retrieve component, scores and explained variances s_eofs, s_pcs, s_expvar, _ = eof_model.fit(predictor, dim="T", clim_year_start=clim_year_start, clim_year_end=clim_year_end) # Plot EOFs and explained variances eof_model.plot_EOF(s_eofs, s_expvar) # Perform Cross-validation with elastic-net ## Load class for model regression_model = WAS_ElasticNet_Model(alpha_range = alpha_range, l1_ratio_range = l1_ratio_range, nb_cores = 2, dist_method="lognormal") pcr_model = WAS_PCR(regression_model=regression_model, n_modes=n_modes, standardize=False) ## Compute alpha parameters alpha, l1_ratio, clusters = regression_model.compute_hyperparameters(predictand, s_pcs.isel(T=slice(None,-1)).rename({"mode": "features"}).transpose('T', 'features')) ## Perform cross-validation was_cv = WAS_Cross_Validator(n_splits=len(predictand.get_index("T")), nb_omit=2) hindcast_det, hindcast_prob = was_cv.cross_validate(pcr_model, predictand, s_pcs.isel(T=slice(None,-1)).rename({"mode": "features"}).transpose('T', 'features'), clim_year_start, clim_year_end, alpha=alpha, l1_ratio=l1_ratio) ============================================== CCA Models ============================================== The `was_cca.py` module provides classes for Canonical Correlation Analysis (CCA): - `WAS_CCA`: Performs CCA to identify relationships between two multivariate datasets. **Initialization** - `n_modes`: Number of CCA modes to retain. - `n_pca_modes`: Number of PCA modes to use for dimensionality reduction. - `dist_method`: distribution method for probability computations. **Methods** - `compute_model`: Fits the CCA model and makes predictions. - `compute_prob`: Computes tercile probabilities for the predictions. **Example Usage: Recalibrating Seasonal Forecast Outputs from Global Climate Models (GCMs)** .. code-block:: python from wass2s import * # Filter model names to identify precipitation-related models center_variable = ["ECMWF_51.PRCP"] # Specify the directory to save downloaded model data dir_to_save_model = "/path/to/save" # Define the month for model initialization (March) month_of_initialization = "03" # Define lead times corresponding to seasonal forecast targets (MJJ season in this case) lead_time = ["02", "03", "04"] # Define the hindcast period for model data (years 1993 to 2016) year_start_model = 1993 year_end_model = 2016 # Set the bounding box for the area of interest (latitude and longitude bounds) extent = [30, -25, 0, 30] # [Northern, Western, Southern and Eastern] # Define if you want to download forecast or hindcast year_forecast = None # Define if you want all members of ensemble or doing an ensemble mean ensemble_mean = "mean" # Specify whether to overwrite existing files when downloading data force_download = False # Define the climatology year range clim_year_start = 1993 clim_year_end = 2016 # Download the GCM data downloader = WAS_Download() # Download hindcast data downloader.WAS_Download_Models( dir_to_save=dir_to_save_model, center_variable=center_variable, month_of_initialization=month_of_initialization, lead_time=lead_time, year_start_hindcast=year_start_model, year_end_hindcast=year_end_model, extent=extent, year_forecast=year_forecast, ensemble_mean=ensemble_mean, force_download=force_download ) year_forecast = 2024 # Download forecast data downloader.WAS_Download_Models( dir_to_save=dir_to_save_model, center_variable=center_variable, month_of_initialization=month_of_initialization, lead_time=lead_time, year_start_forecast=year_start_model, year_end_forecast=year_end_model, extent=extent, year_forecast=year_forecast, ensemble_mean=ensemble_mean, force_download=force_download ) # Initialize CCA model was_cca = WAS_CCA(n_modes=3, n_pca_modes=10, dist_method="lognormal") # Define zone as dict : {'zone_name_key': ('Explicit_Zone_name', lon_min, lon_max, lat_min, lat_max)} defined_zone = {'A': ('A', -150, 150, -45, 45)} # Plot the zone plot_map([extent[1],extent[3],extent[2],extent[0]], sst_indices = defined_zone, title="Predictors Area",fig_size=(6,4)) # Retrieve predictor data for the defined zone center_variable_model = "ECMWF_51.PRCP" predictors = retrieve_single_zone_for_PCR(dir_to_save_model, defined_zone, center_variable_model, year_start, year_end, clim_year_start, clim_year_end, model=True, month_of_initialization=3, lead_time=1) predictor = predictors.isel(T=slice(None, -1)) predictor['T'] = predictand.sel(T=slice(str(year_start_model), str(year_end_model)))['T'] # Plot the CCA modes and scores was_cca.plot_cca_results(X=predictor, Y=predictand.sel(T=slice(str(year_start_model), str(year_end_model))), clim_year_start=clim_year_start, clim_year_end=clim_year_end) # Perform cross-validation for each model was_cv = WAS_Cross_Validator(n_splits=len(predictand.sel(T=slice(str(year_start_model), str(year_end_model))).get_index("T")), nb_omit=2) hindcast_det_cca, hindcast_prob_cca = was_cv.cross_validate(was_cca, predictand.sel(T=slice(str(year_start_model), str(year_end_model))), predictor, clim_year_start, clim_year_end) forecast_det_cca, forecast_prob_cca = was_cca.forecast(predictand.sel(T=slice(str(year_start_model),str(year_end_model))), clim_year_start, clim_year_end, predictor, hindcast_det_cca, predictor) ============================================= Analog Forecasting Methods ============================================= The `was_analog.py` module provides the `WAS_Analog` class for analog-based forecasting using various techniques to identify historical analogs to current conditions for prediction, particularly for seasonal rainfall forecasts using sea surface temperature (SST) data. **Initialization Parameters** - ``dir_to_save`` (str): Directory path to save downloaded and processed data files. - ``year_start`` (int): Starting year for historical data. - ``year_forecast`` (int): Target forecast year. - ``reanalysis_name`` (str): Reanalysis dataset name (e.g., "ERA5.SST" or "NOAA.SST"). - ``model_name`` (str): Forecast model name (e.g., "ECMWF_51.SST"). - ``method_analog`` (str, default="som"): Analog method to use ("som", "cor_based", "pca_based"). - ``best_prcp_models`` (list, optional): List of best precipitation models. Default is None. - ``month_of_initialization`` (int, optional): Forecast initialization month. Default is None (uses current month). - ``lead_time`` (list, optional): Lead times in months. Default is None (uses [1, 2, 3, 4, 5]). - ``ensemble_mean`` (str, default="mean"): Ensemble mean method ("mean" or "median"). - ``clim_year_start`` (int, optional): Start year for climatology period. - ``clim_year_end`` (int, optional): End year for climatology period. - ``define_extent`` (tuple, optional): Bounding box as (lon_min, lon_max, lat_min, lat_max) for regional analysis. - ``index_compute`` (list, optional): Climate indices to compute (e.g., ["NINO34", "DMI"]). - ``some_grid_size`` (tuple, default=(None, None)): SOM grid dimensions (rows, cols); None uses automatic sizing. - ``some_learning_rate`` (float, default=0.5): Learning rate for SOM training. - ``some_neighborhood_function`` (str, default="gaussian"): Neighborhood function for SOM ("gaussian", etc.). - ``some_sigma`` (float, default=1.0): Initial neighborhood radius for SOM. - ``dist_method`` (str, default="gamma"): Probability method ("gamma", "t", "normal", "lognormal", "nonparam"). **Key Methods** - ``download_sst_reanalysis()``: Downloads and processes SST reanalysis data from the specified center for the given years and area. - ``download_models()``: Downloads seasonal forecast model data for the specified model, initialization month, and lead times. - ``standardize_timeseries()``: Standardizes time series data over a specified climatology period. - ``calc_index()``: Computes specified climate indices (e.g., NINO34, DMI) from SST data. - ``compute_model()``: Identifies historical analogs using the specified method and computes deterministic forecasts. - ``compute_prob()``: Calculates tercile probabilities (Below Normal, Near Normal, Above Normal) using the specified distribution method. - ``forecast()``: Generates deterministic and probabilistic forecasts for the target year, returning processed SST data, similar years, deterministic forecast, and probabilistic forecast. - ``composite_plot()``: Creates composite plots of forecast results, optionally including the predictor (SST) visualization. **Example Usage** Basic analog forecast setup: .. code-block:: python from wass2s.was_analog import WAS_Analog # Initialize analog model analog_model = WAS_Analog( dir_to_save="./s2s_data/analog", year_start=1990, year_forecast=2025, reanalysis_name="NOAA.SST", model_name="ECMWF_51.SST", method_analog="som", month_of_initialization=3, clim_year_start=1991, clim_year_end=2020, define_extent=(-180, 180, -45, 45), index_compute=["NINO34", "DMI"], dist_method="gamma" ) # Download and process data sst_hist, sst_for = analog_model.download_and_process() # Generate forecast ddd, similar_years, forecast_det, forecast_prob = analog_model.forecast( predictant=rainfall_data, clim_year_start=1991, clim_year_end=2020, hindcast_det=hindcast_data, forecast_year=2025 ) # Create composite plot similar_years = analog_model.composite_plot( predictant=rainfall_data, clim_year_start=1991, clim_year_end=2020, hindcast_det=hindcast_data, plot_predictor=True ) **Cross-Validation Example** .. code-block:: python from wass2s.was_analog import WAS_Cross_Validator # Perform cross-validation was_analog_cv = WAS_Cross_Validator(n_splits=len(rainfall.get_index("T")), nb_omit=2) hindcast_analog_det, hindcast_analog_prob = was_analog_cv.cross_validate( analog_model, rainfall, clim_year_start=1991, clim_year_end=2020 ) # Generate forecast using cross-validated hindcast ddd, similar_years, forecast_det, forecast_prob = analog_model.forecast( predictant=rainfall, clim_year_start=1991, clim_year_end=2020, hindcast_det=hindcast_analog_det, forecast_year=2025 ) .. Note:: Ensure `WAS_Cross_Validator` is correctly imported from the `wass2s.was_analog` module and that the `rainfall` variable is an xarray DataArray with appropriate dimensions (T, Y, X).