This project provides a framework to develop, train, tune, and evaluate various deep learning models for probabilistic, multivariate wind forecasting. It is designed to work with diverse wind farm operational datasets and facilitate integration with control systems like the wind-hybrid-open-controller.
The framework supports multiple forecasting architectures and is built for execution on High-Performance Computing (HPC) clusters, leveraging Slurm for job management and Optuna for distributed hyperparameter optimization. While the examples use PostgreSQL, Optuna supports various backends (SQLite, MySQL, Journal Storage) via configuration.
The goal is to provide a flexible and scalable platform for experimenting with and deploying state-of-the-art wind forecasting models, particularly for ultra-short-term predictions relevant to wind farm control.
This framework utilizes a modern stack for deep learning and time series analysis with a modular, domain-driven architecture:

- Programming Language: Python (v3.12+)
- Deep Learning:
  - PyTorch: Primary tensor computation library.
  - PyTorch Lightning: Framework for structuring training, validation, testing, checkpointing, logging, multi-GPU/distributed training (DDP), and callbacks.
- Time Series Modeling:
  - GluonTS (Fork): Provides foundational components (`PyTorchLightningEstimator`, data structures, transformations). Note: This project uses a specific fork.
- Hyperparameter Optimization:
  - Optuna: Used for distributed hyperparameter tuning via configurable storage backends (PostgreSQL, SQLite, etc.), including pruning mechanisms.
- Distributed Computing & Scheduling:
  - Slurm: HPC workload manager for resource allocation and job execution via batch scripts (`.sh`).
- Experiment Tracking & Logging:
  - WandB (Weights & Biases): Used for logging metrics, parameters, and configurations.
  - Python `logging`: Standard library for application messages.
- Environment Management:
  - Conda / Mamba: Recommended for managing the Python environment.
- Data Handling:
  - Polars / Pandas: Efficient data manipulation.
  - Parquet: Recommended file format for storing processed time series data.
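As a minimal illustration of the standard-library `logging` usage mentioned above (the logger name and format string here are illustrative, not the project's actual configuration):

```python
import logging

# Configure the root logger once, at application startup (format is illustrative).
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
)

# Each module then obtains its own named logger for application messages.
logger = logging.getLogger("wind_forecasting.preprocessing")
logger.info("Loaded %d rows of SCADA data", 1_000_000)
```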
- Modular Design: Clean separation between core functionality, tuning-specific utilities, and cross-mode components.
- Domain-Driven Organization: Hyperparameter tuning is encapsulated in the `wind_forecasting.tuning` subpackage with clear APIs.
- Flexible Configuration: YAML-based configuration system supporting multiple modes (tune/train/test) with shared utilities.
- Scalable Infrastructure: Supports both local development and distributed HPC execution with minimal configuration changes.
```
wind-forecasting/
├── config/                # YAML configurations (training, preprocessing)
│   └── training/
├── wind_forecasting/      # Core application source code
│   ├── preprocessing/     # Data loading, processing, splitting (DataModule)
│   ├── run_scripts/       # Main execution scripts (run_model.py, testing.py, etc.)
│   │   └── tune_scripts/  # Example Slurm scripts for tuning
│   ├── tuning/            # Hyperparameter optimization subpackage
│   │   ├── core.py        # Main tune_model orchestration
│   │   ├── objective.py   # MLTuningObjective class
│   │   ├── scripts/       # Standalone tuning scripts
│   │   └── utils/         # Tuning-specific utilities
│   └── utils/             # General & cross-mode utilities
│       ├── optuna_*.py    # Optuna utilities (storage, config, params) used across modes
│       └── callbacks.py   # General PyTorch Lightning callbacks
├── logs/                  # Default directory for runtime outputs (Slurm, WandB, Checkpoints)
├── optuna/                # Default directory for Optuna storage artifacts (DB data, sockets)
├── examples/              # Example scripts (data download) & input configurations
│   └── inputs/            # Example configuration files & data directory
├── install_rc/            # Environment setup scripts & YAML files
├── .gitignore
├── .gitattributes
└── README.md              # This file
```
This framework is designed to be model-agnostic. Forecasting models are implemented externally in the pytorch-transformer-ts repository and integrated here. Currently supported models include:
- Informer
- Autoformer
- Spacetimeformer
- TACTiS-2
Refer to the pytorch-transformer-ts repository for detailed model implementations and architectures. New models following the GluonTS/PyTorch Lightning Estimator pattern can be added and configured via YAML.
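The model-agnostic selection described above (a model name on the command line mapped to an externally implemented estimator class) can be sketched as follows. The registry dict, class names, and `build_estimator` helper are illustrative stand-ins for this sketch, not the project's actual API:

```python
# Illustrative stand-ins for estimator classes that would normally be
# imported from the external pytorch-transformer-ts repository.
class InformerEstimator: ...
class AutoformerEstimator: ...

# A name -> class registry lets a CLI flag like `--model informer` and the
# YAML `model.<model_name>` section select the implementation.
MODEL_REGISTRY = {
    "informer": InformerEstimator,
    "autoformer": AutoformerEstimator,
}

def build_estimator(model_name: str, model_config: dict):
    """Look up the estimator class and its model-specific config section."""
    try:
        estimator_cls = MODEL_REGISTRY[model_name]
    except KeyError:
        raise ValueError(
            f"Unknown model '{model_name}'. Available: {sorted(MODEL_REGISTRY)}"
        ) from None
    # In a real framework the config section would carry estimator kwargs.
    return estimator_cls, model_config.get(model_name, {})
```

Adding a new model then amounts to registering one more class and adding its YAML section.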
The `install_rc/` directory provides scripts to help create the necessary Python environment using Conda or Mamba.

- Navigate to the directory:

  ```bash
  cd install_rc
  ```

- Run the installation script:

  ```bash
  ./install.sh
  ```

  This script uses the provided `.yaml` files (e.g., `wind_forecasting_cuda.yaml`) to create a Conda environment with the required dependencies.
Note: On HPC environments, necessary system modules (CUDA, compilers, etc.) should be loaded before activating the Conda environment, typically within the Slurm job script.
A detailed list of dependencies can be found in the environment YAML files within install_rc/. Key requirements include:
- Python 3.12+
- PyTorch 2.x
- PyTorch Lightning 2.x
- Optuna
- GluonTS (from the specified fork)
- WandB
- Polars
- NumPy, Pandas
- PyYAML
- ... (TODO Add other dependencies)
To test the framework, you can download and prepare the public SMARTEOLE dataset from NREL's FLASC repository.
- Run the download script:

  ```bash
  python examples/download_flasc_data.py
  ```

  This downloads the data into `examples/inputs/SMARTEOLE-WFC-open-dataset/`.

- Use this data path in your preprocessing configuration.
The typical workflow involves these stages:
- Write a preprocessing configuration file similar to `wind-forecasting/examples/inputs/preprocessing_inputs_flasc.yaml`.
- Run preprocessing on a local machine with

  ```bash
  python preprocessing_main.py --config examples/inputs/preprocessing_inputs_flasc.yaml --reload_data --preprocess_data --regenerate_filters --multiprocessor cf --verbose
  ```

  or on an HPC by running `wind-forecasting/wind_forecasting/preprocessing/load_data.sh`, followed by `wind-forecasting/wind_forecasting/preprocessing/preprocess_data.sh`.
- Write a training configuration file similar to `wind-forecasting/examples/inputs/training_inputs_kestrel_flasc.yaml`.
- Run

  ```bash
  python wind-forecasting/wind_forecasting/run_scripts/load_data.py --config wind-forecasting/examples/inputs/training_inputs_aoifemac_flasc.yaml --reload
  ```

  or on an HPC by running `wind-forecasting/wind_forecasting/run_scripts/load_data_kestrel.sh`, to resample the data as needed, categorize the variables, and generate train/test/val splits.
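The train/test/val split step above can be illustrated with a minimal, dependency-free sketch. The contiguous-chronological strategy and the 70/15/15 fractions are assumptions for illustration; the framework's actual splitting lives in its preprocessing/DataModule code:

```python
def chronological_split(series, train_frac=0.7, val_frac=0.15):
    """Split an ordered time series into contiguous train/val/test blocks.

    Shuffling is deliberately avoided: for forecasting, validation and
    test data must come strictly after the training period to prevent
    temporal leakage.
    """
    n = len(series)
    train_end = int(n * train_frac)
    val_end = train_end + int(n * val_frac)
    return series[:train_end], series[train_end:val_end], series[val_end:]
```

For example, splitting a 100-point series with the defaults yields contiguous blocks of 70, 15, and 15 points, in time order.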
The framework includes a comprehensive hyperparameter tuning system using Optuna for distributed optimization. The tuning functionality is organized in the wind_forecasting/tuning/ subpackage for maintainability and modularity.
- Tune an ML model on a local machine with

  ```bash
  python wind-forecasting/wind_forecasting/run_scripts/run_model.py --config wind-forecasting/examples/inputs/training_inputs_aoifemac_flasc.yaml --mode tune --model informer
  ```

  or on an HPC by running `wind-forecasting/wind_forecasting/run_scripts/tune_model.sh`.
- Tune a statistical model on a local machine with

  ```bash
  python wind-hybrid-open-controller/whoc/wind_forecast/tuning.py --model_config wind_forecasting/examples/inputs/training_inputs_aoifemac_flasc.yaml --data_config wind_forecasting/examples/inputs/preprocessing_inputs_flasc.yaml --model svr --study_name svr_tuning --restart_tuning
  ```

  or on an HPC by running `wind-hybrid-open-controller/whoc/wind_forecast/run_tuning.sh [model] [number of models to tune]`.
- Train an ML model on a local machine with

  ```bash
  python wind-forecasting/wind_forecasting/run_scripts/run_model.py --config wind-forecasting/examples/inputs/training_inputs_aoifemac_flasc.yaml --mode train --model informer --use_tuned_parameters
  ```

  or on an HPC by running `wind-forecasting/wind_forecasting/run_scripts/train_model_kestrel.sh`.
- Test an ML model on a local machine with

  ```bash
  python wind-forecasting/wind_forecasting/run_scripts/run_model.py --config wind-forecasting/examples/inputs/training_inputs_aoifemac_flasc.yaml --mode test --model informer --checkpoint latest
  ```

  or on an HPC by running `wind-forecasting/wind_forecasting/run_scripts/test_model.sh`.
- Make predictions at given controller sampling intervals for a given SCADA dataset and prediction horizon, compute the accuracy score, and plot the results with

  ```bash
  python wind-hybrid-open-controller/whoc/wind_forecast/WindForecast.py --model_config wind_forecasting/examples/inputs/training_inputs_aoifemac_flasc.yaml --data_config wind_forecasting/examples/inputs/preprocessing_inputs_flasc.yaml --model informer
  ```
- Write a WHOC configuration file similar to `wind-hybrid-open-controller/examples/hercules_input_001.yaml`. Run a case study of a yaw controller with a trained model with

  ```bash
  python wind-hybrid-open-controller/whoc/case_studies/run_case_studies.py 15 -rs -rrs --verbose -ps -rps -ras -st auto -ns 3 -m cf -sd wind-hybrid-open-controller/examples/floris_case_studies -mcnf wind_forecasting/examples/inputs/training_inputs_aoifemac_flasc.yaml -dcnf wind_forecasting/examples/inputs/preprocessing_inputs_flasc.yaml -wcnf wind-hybrid-open-controller/examples/hercules_input_001.yaml -wf scada
  ```

  You can fine-tune parameters for a suite of cases by editing the dictionary `case_studies["baseline_controllers_preview_flasc"]` in `wind-hybrid-open-controller/whoc/case_studies/initialize_case_studies.py`, and you can edit the common default parameters in the WHOC configuration file.
- TODO add HPC version
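Probabilistic forecasts like those above are commonly scored per quantile with the pinball (quantile) loss. This is a generic stdlib sketch of that metric, not necessarily the accuracy score `WindForecast.py` computes:

```python
def pinball_loss(y_true, y_pred, q):
    """Average pinball loss of quantile-q predictions against observations.

    Under-prediction is weighted by q and over-prediction by (1 - q), so
    the loss is minimized by the true q-th quantile of the target — which
    is what makes it a proper score for probabilistic forecasts.
    """
    total = 0.0
    for obs, pred in zip(y_true, y_pred):
        diff = obs - pred
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)
```

Averaging this loss over a grid of quantile levels gives an approximation of the CRPS often reported for probabilistic wind forecasts.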
Primary configuration is via YAML files in `config/training/`.

- Example: `config/training/training_inputs_juan_flasc.yaml`
- Sections: `experiment`, `logging`, `optuna`, `dataset`, `model` (with nested `<model_name>` keys), `callbacks`, `trainer`.
- Supports basic variable substitution (e.g., `${logging.optuna_dir}`).
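The `${logging.optuna_dir}`-style substitution can be sketched with a small stdlib resolver. This is illustrative only; the framework's actual resolver may differ in syntax and edge-case handling:

```python
import re

_VAR = re.compile(r"\$\{([^}]+)\}")

def resolve(value: str, config: dict) -> str:
    """Replace ${a.b.c} references with values looked up in a nested dict."""
    def lookup(match):
        node = config
        for key in match.group(1).split("."):
            node = node[key]  # raises KeyError for unknown references
        return str(node)
    return _VAR.sub(lookup, value)
```

For example, `resolve("${logging.optuna_dir}/journal", {"logging": {"optuna_dir": "/scratch/optuna"}})` returns `"/scratch/optuna/journal"`.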
- Clone the repository and set up the Jupyter notebook collaboration as described in the setup section.
- Download the required data using the script in `examples`, or use your own data.
- Set up the appropriate environment (CUDA or ROCm) using the scripts in the `install_rc` folder.
- Preprocess the data using the script in the `wind_forecasting/preprocessing` folder.
- Train and evaluate models using the scripts in the `wind_forecasting/models` directory.
- For running jobs on HPC environments, use the Slurm scripts provided in the `rc_jobs` folder.
- Data Preprocessing Configuration YAML
- ML-Model Configuration YAML
- WHOC Configuration YAML
- Command Line Arguments for `wind-forecasting/wind_forecasting/preprocessing/preprocessing_main.py`, `wind-forecasting/wind_forecasting/run_scripts/load_data.py`, `wind-forecasting/wind_forecasting/run_scripts/run_model.py`, `wind-hybrid-open-controller/whoc/wind_forecast/tuning.py`, and `wind-hybrid-open-controller/whoc/case_studies/run_case_studies.py`.
- WHOC Case Study Suite in the `case_studies` dictionary defined at the top of `wind-hybrid-open-controller/whoc/case_studies/initialize_case_studies.py`.
- Configure: Create/edit the preprocessing YAML (e.g., `examples/inputs/preprocessing_inputs_flasc.yaml`).
- Run: Execute `wind_forecasting/preprocessing/preprocessing_main.py` with appropriate flags, or use the HPC scripts.

  Local Machine:

  ```bash
  python preprocessing_main.py --config examples/inputs/preprocessing_inputs_flasc.yaml --reload_data --preprocess_data --regenerate_filters --multiprocessor cf --verbose
  ```

  HPC System:

  ```bash
  # First load the data
  ./wind_forecasting/preprocessing/load_data.sh
  # Then preprocess the data
  ./wind_forecasting/preprocessing/preprocess_data.sh
  ```

- Data Loading: After preprocessing, load and prepare the data for model training.

  Local Machine:

  ```bash
  python wind_forecasting/run_scripts/load_data.py --config examples/inputs/training_inputs_flasc.yaml --reload
  ```

  HPC System:

  ```bash
  ./wind_forecasting/run_scripts/load_data_kestrel.sh
  ```

The framework's modular tuning system supports distributed hyperparameter optimization with a PostgreSQL backend and comprehensive monitoring.
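The configurable storage backend boils down to the URL string Optuna accepts as its `storage` argument. Below is a hedged sketch of building such a URL with plain string formatting; the framework's actual helpers live in `utils/optuna_*.py` and may differ:

```python
def storage_url(backend: str, **kw) -> str:
    """Build a SQLAlchemy-style URL accepted by optuna.create_study(storage=...)."""
    if backend == "sqlite":
        # A single local file; fine for laptops, limited for many parallel workers.
        return f"sqlite:///{kw['path']}"
    if backend == "postgresql":
        # A server backend that supports many concurrent HPC tuning workers.
        return (f"postgresql://{kw['user']}:{kw['password']}"
                f"@{kw['host']}:{kw.get('port', 5432)}/{kw['database']}")
    raise ValueError(f"Unsupported backend: {backend}")
```

Every Slurm worker pointing at the same PostgreSQL URL joins the same study, which is what makes the distributed tuning work.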
- Configure: Edit the training YAML (`config/training/`) with Optuna settings.
- Submit Job: Modify and submit the Slurm script (e.g., `tune_model_storm.sh`), ensuring the correct `--model <model_name>` is targeted:

  ```bash
  sbatch wind_forecasting/run_scripts/tune_scripts/tune_model_storm.sh
  ```

- Monitor: Use `squeue`, Slurm logs, WandB, and the Optuna dashboard.
Local Machine:

```bash
python wind_forecasting/run_scripts/run_model.py --config examples/inputs/training_inputs_flasc.yaml --mode tune --model informer
```

HPC System:

```bash
# Use the provided tuning script
./wind_forecasting/run_scripts/tune_model.sh
```

- Configure: Edit the training YAML. Set `use_tuned_parameters: true` (optional), and high `limit_train_batches` and `max_epochs` values.
- Run (or use an HPC script):

  ```bash
  python wind_forecasting/run_scripts/run_model.py \
      --config config/training/training_inputs_*.yaml \
      --mode train \
      --model <model_name> \
      [--use_tuned_parameters] \
      [--checkpoint <path | 'best' | 'latest'>]  # To resume
  ```
Local Machine:

```bash
python wind_forecasting/run_scripts/run_model.py --config examples/inputs/training_inputs_flasc.yaml --mode train --model informer --use_tuned_parameters
```

HPC System:

```bash
# Use the provided training script
./wind_forecasting/run_scripts/train_model_kestrel.sh
```

- Configure: Ensure the training YAML points to the correct dataset config.
- Run (or use an HPC script):

  ```bash
  python wind_forecasting/run_scripts/run_model.py \
      --config config/training/training_inputs_*.yaml \
      --mode test \
      --model <model_name> \
      --checkpoint <path | 'best' | 'latest'>
  ```
Local Machine:

```bash
python wind_forecasting/run_scripts/run_model.py --config examples/inputs/training_inputs_flasc.yaml --mode test --model informer --checkpoint latest
```

HPC System:

```bash
# Use the provided testing script
./wind_forecasting/run_scripts/test_model.sh
```

- Tune a statistical model on a local machine with

  ```bash
  python wind-hybrid-open-controller/whoc/wind_forecast/tuning.py --model_config wind_forecasting/examples/inputs/training_inputs_aoifemac_flasc.yaml --data_config wind_forecasting/examples/inputs/preprocessing_inputs_flasc.yaml --model svr --study_name svr_tuning --restart_tuning
  ```

  or on an HPC by running `wind-hybrid-open-controller/whoc/wind_forecast/run_tuning_kestrel.sh [model] [model_config]`.
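The `--checkpoint <path | 'best' | 'latest'>` convenience used in the train and test commands can be sketched with a stdlib resolver. The flat `*.ckpt` layout and the val-loss-in-filename convention are assumptions for illustration, not the framework's actual behavior:

```python
import re
from pathlib import Path

def resolve_checkpoint(ckpt_dir: str, spec: str) -> Path:
    """Resolve 'latest', 'best', or an explicit path to a checkpoint file."""
    if spec not in ("latest", "best"):
        return Path(spec)  # an explicit path was given on the command line
    ckpts = list(Path(ckpt_dir).glob("*.ckpt"))
    if not ckpts:
        raise FileNotFoundError(f"No checkpoints in {ckpt_dir}")
    if spec == "latest":
        # Most recently modified checkpoint wins.
        return max(ckpts, key=lambda p: p.stat().st_mtime)
    # 'best': assume filenames embed validation loss, e.g. epoch3-val_loss0.42.ckpt
    def val_loss(p):
        m = re.search(r"val_loss([\d.]+)", p.name)
        return float(m.group(1).rstrip(".")) if m else float("inf")
    return min(ckpts, key=val_loss)
```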
Contributions are welcome! Please follow standard Git practices (fork, branch, pull request).
- Authors and developers of the integrated forecasting models and underlying libraries (PyTorch, Lightning, GluonTS, Optuna, WandB, etc.).
- Compute resources provided by the University of Oldenburg HPC group, University of Colorado Boulder, and NREL.
- TACTiS: Drouin, A., Marcotte, É., & Chapados, N. (2022). TACTiS: Transformer-Attentional Copulas for Time Series. ICML. (Link)
- TACTiS-2: Ashok, A., Marcotte, É., Zantedeschi, V., Chapados, N., & Drouin, A. (2024). TACTiS-2: Better, Faster, Simpler Attentional Copulas for Multivariate Time Series. ICLR. (arXiv)
- Informer: Zhou, H., Zhang, S., Peng, J., Zhang, S., Li, J., Xiong, H., & Zhang, W. (2021). Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. AAAI. (arXiv)
- Autoformer: Wu, H., Xu, J., Wang, J., & Long, M. (2021). Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting. NeurIPS. (arXiv)
- Spacetimeformer: Grigsby, J., Wang, Z., & Qi, Y. (2021). Long-Range Transformers for Dynamic Spatiotemporal Forecasting. (arXiv)
- GluonTS: Alexandrov, A., et al. (2020). GluonTS: Probabilistic Time Series Modeling in Python. JMLR. (Link)
- PyTorch Lightning: (Link)
- Optuna: Akiba, T., Sano, S., Yanase, T., Ohta, T., & Koyama, M. (2019). Optuna: A Next-generation Hyperparameter Optimization Framework. KDD. (Link)
- WandB: (Link)
- Related Repositories:
  - `pytorch-transformer-ts` (Model Implementations)
  - `gluonts` (Fork)
  - `wind-hybrid-open-controller` (Downstream Application)