This is our solution to the Enefit Kaggle challenge: forecasting energy production and consumption for prosumers (consumers who also generate solar energy). The goal is to reduce energy imbalance costs by more accurately predicting volatile prosumer behavior.
We implement and compare three architectures:
- LSTM Encoder-Decoder with temporal attention (~500K params)
- Specialized Transformer using AdaLN-Zero with time-wise, contract-wise, and county-wise attention heads (3.2M params)
- Hybrid Transformer-LSTM combining a transformer encoder with relative attention and a non-autoregressive LSTM decoder (188K params)
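To make the temporal-attention idea in the first model concrete, here is a minimal NumPy sketch of dot-product attention over encoder outputs. This is illustrative only: the actual model is a learned ~500K-parameter network, and the function name and dot-product scoring are assumptions, not the exact implementation.

```python
import numpy as np

def temporal_attention(dec_state, enc_states):
    """Dot-product temporal attention (illustrative sketch).

    dec_state:  (d,)   current decoder hidden state
    enc_states: (T, d) encoder outputs over the input time window
    Returns the context vector (d,) and attention weights (T,).
    """
    scores = enc_states @ dec_state      # similarity of each time step to the query
    scores -= scores.max()               # numerical stability before softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    context = weights @ enc_states       # convex combination of encoder states
    return context, weights
```

At each decoding step the context vector is typically concatenated with the decoder state before producing the forecast, letting the model focus on the most relevant past hours.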
| Model | Avg MAE | Production MAE | Consumption MAE |
|---|---|---|---|
| LSTM Encoder-Decoder | 95.01 | 128.57 | 61.45 |
| Specialized Transformer | 64.86 | 72.24 | 57.45 |
| Hybrid Transformer-LSTM | 81.35 | 98.67 | 64.03 |
| Ensemble | 63.50 | — | — |
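The ensemble row presumably combines the per-model forecasts; a minimal sketch of a (weighted) averaging ensemble and the MAE metric, assuming a simple mean unless weights are supplied:

```python
import numpy as np

def ensemble_predict(preds, weights=None):
    """Combine per-model forecasts of shape (n_models, n_samples).

    With no weights, this is a plain average; the actual ensemble
    weighting used in the solution is an assumption here.
    """
    preds = np.asarray(preds, dtype=float)
    if weights is None:
        weights = np.full(len(preds), 1.0 / len(preds))
    return np.asarray(weights, dtype=float) @ preds

def mae(y_true, y_pred):
    """Mean absolute error, the competition metric."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))
```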
The feature engineering pipeline includes convex interpolation of weather grid data by Estonian county, z-score normalization, and lag features for temporal context. The dataset spans geographical coordinates, meteorological variables (solar radiation, temperature, snowfall), energy pricing, and photovoltaic capacity records.
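The pipeline steps above can be sketched roughly as follows. The function names, the nonnegative-weight check, and the specific lag horizons are illustrative assumptions, not the exact implementation:

```python
import numpy as np
import pandas as pd

def convex_interpolate(grid_values, weights):
    """Convex combination of weather-grid point values for one county.

    `weights` (e.g., derived from distances to the county centroid --
    an assumption here) must be nonnegative and sum to 1.
    """
    w = np.asarray(weights, dtype=float)
    assert (w >= 0).all() and abs(w.sum() - 1.0) < 1e-9
    return float(w @ np.asarray(grid_values, dtype=float))

def zscore(series):
    """Z-score normalization: subtract the mean, divide by the std."""
    return (series - series.mean()) / series.std()

def add_lag_features(df, col, lags=(24, 48, 168)):
    """Append lag features; hourly data assumed, so 24/48/168 steps
    correspond to 1 day, 2 days, and 1 week of temporal context."""
    out = df.copy()
    for lag in lags:
        out[f"{col}_lag_{lag}"] = out[col].shift(lag)
    return out
```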
The specialized transformer consistently delivered the best standalone performance, showing that a carefully designed transformer can outperform classical recurrent models even on structured tabular time series data.
