Publication details
Bibliographic description
Spatio-temporal PM2.5 forecasting using machine learning and low-cost sensors: an urban perspective / Mateusz ZARĘBA, Szymon Cogiel, Tomasz DANEK // Engineering Proceedings [Electronic document]. - Electronic journal ; ISSN 2673-4591. - 2025. - vol. 101, iss. 1, art. no. 6, pp. 1–11. - System requirements: Adobe Reader. - Bibliography: pp. 10–11, abstract. - Available online since: 2025-07-25. - 11th International Conference on Time Series and Forecasting: Canaria, Spain, 16–18 July 2025
Authors (3)
Keywords
Bibliometric data
| BaDAP ID | 161567 |
|---|---|
| Date added to BaDAP | 2025-08-25 |
| Source text | URL |
| DOI | 10.3390/engproc2025101006 |
| Publication year | 2025 |
| Publication type | conference paper in a journal |
| Open access | |
| Creative Commons | |
| Journal/series | Engineering Proceedings |
Abstract
This study analyzes air pollution time-series big data to assess stationarity, seasonal patterns, and the performance of machine learning models in forecasting PM2.5 concentrations. Fifty-two low-cost sensors (LCS) were deployed across the city of Krakow (Poland) and its surroundings, collecting hourly air quality data and generating nearly 20,000 observations per month. The network captured both spatial and temporal variability. The Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test confirmed trend-based non-stationarity, which was addressed through differencing, revealing distinct daily and 12 h cycles linked to traffic and temperature variations. Additive seasonal decomposition produced time-inconsistent residuals, leading to the adoption of multiplicative decomposition, which better captured pollution outliers associated with agricultural burning. Three machine learning models, namely Ridge Regression, XGBoost, and LSTM (Long Short-Term Memory) neural networks, were evaluated under conditions of high spatial and temporal variability (winter) and low variability (summer). Ridge Regression performed best, achieving the highest R² (0.97 in winter, 0.93 in summer) and the lowest mean squared errors. XGBoost showed strong predictive capabilities but tended to overestimate moderate pollution events, while LSTM systematically underestimated PM2.5 levels in December. The residual analysis confirmed that Ridge Regression provided the most stable predictions, capturing extreme pollution episodes effectively, whereas XGBoost exhibited larger outliers. The study demonstrates the potential of low-cost sensor networks and machine learning in urban air quality forecasting focused on rare smog episodes (RSEs).
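The abstract does not state which features or hyperparameters the authors used, so the following is only a minimal sketch of how a ridge regression forecast on an hourly series might be set up. It assumes lagged values of the series as the only predictors, a synthetic signal with 24 h and 12 h cycles standing in for real PM2.5 data, and an illustrative penalty of `alpha = 1.0`. All function names (`make_lagged`, `solve`, `ridge_fit`, `predict`, `r2_score`) are hypothetical and not taken from the paper.

```python
import math
import random


def make_lagged(series, n_lags):
    """Build (X, y): each row of X holds the n_lags values preceding y."""
    X = [series[t - n_lags:t] for t in range(n_lags, len(series))]
    y = series[n_lags:]
    return X, y


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - s) / M[i][i]
    return w


def ridge_fit(X, y, alpha):
    """Closed-form ridge on centered data: (X^T X + alpha I) w = X^T y."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    A = [[sum(r[i] * r[j] for r in Xc) + (alpha if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(r[i] * t for r, t in zip(Xc, yc)) for i in range(p)]
    w = solve(A, b)
    # The intercept is recovered from the means, so it is not penalized.
    intercept = y_mean - sum(wi * mi for wi, mi in zip(w, x_mean))
    return w, intercept


def predict(X, w, intercept):
    return [intercept + sum(wi * xi for wi, xi in zip(w, row)) for row in X]


def r2_score(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic hourly series (assumption): 30 days, 24 h and 12 h cycles, noise.
random.seed(0)
series = [30 + 10 * math.sin(2 * math.pi * t / 24)
          + 4 * math.sin(2 * math.pi * t / 12)
          + random.gauss(0, 1) for t in range(24 * 30)]

X, y = make_lagged(series, n_lags=24)
split = int(0.8 * len(y))  # chronological split, no shuffling
w, intercept = ridge_fit(X[:split], y[:split], alpha=1.0)
y_hat = predict(X[split:], w, intercept)
r2 = r2_score(y[split:], y_hat)
print(f"hold-out R^2 on synthetic data: {r2:.3f}")
```

The split is chronological so that the model is scored only on hours that follow the training period. The score printed here describes the synthetic signal and is not comparable to the values reported in the abstract.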