Publication details

Bibliographic description

Automating air pollution map analysis with multi-modal AI and visual context engineering / Szymon COGIEL, Mateusz ZARĘBA, Tomasz DANEK, Filip Arnaut // Atmosphere [Electronic document]. — Electronic journal ; ISSN 2073-4433. — 2026 vol. 17 iss. 1 art. no. 2, pp. 1-14. — System requirements: Adobe Reader. — Bibliogr. pp. 12-14, Abstr. — Published online: 2025-12-19

Authors (4)

Keywords

spatiotemporal air pollution mapping; visual context engineering; geospatial deep learning; multimodal large language models; automated map understanding; explainable environmental AI; generative artificial intelligence; AI-driven ecological monitoring; PM2.5 sensor networks; environmental data interpretation

Bibliometric data

BaDAP ID: 165587
Added to BaDAP: 2026-02-03
Source text: URL
DOI: 10.3390/atmos17010002
Publication year: 2026
Publication type: journal article
Open access: yes
License: Creative Commons
Journal/series: Atmosphere

Abstract

The increasing volume of data from IoT sensors has made manual inspection time-consuming and prone to bias, particularly for spatiotemporal air pollution maps. While rule-based methods are adequate for simple datasets or individual maps, they are insufficient for interpreting multi-year time series data with hourly timestamps, which require both domain-specific expertise and significant time investment. This limitation is especially critical in environmental monitoring, where analyzing long-term spatiotemporal PM2.5 maps derived from 52 low-cost sensors remains labor-intensive and susceptible to human error. This study investigates the potential of generative artificial intelligence, specifically multi-modal large language models (MLLMs), for interpreting spatiotemporal PM2.5 maps. Both open-source models (Janus-Pro and LLaVA-1.5) and commercial large language models (GPT-4o and Gemini 2.5 Pro) were evaluated. The initial results showed limited performance, highlighting the difficulty of extracting meaningful information directly from raw sensor-derived maps. To address this, a visual context engineering framework was introduced, comprising systematic optimization of colormaps, normalization of intensity ranges, and refinement of map layers and legends to improve clarity and interpretability for AI models. Evaluation using the GEval metric demonstrated that visual context engineering increased interpretation accuracy (defined as the detection of PM2.5 spatial extrema) by over 32.3% (relative improvement). These findings provide strong evidence that tailored visual preprocessing enables MLLMs to effectively interpret complex environmental time series data, representing a novel approach that bridges data-driven modeling with ecological monitoring and offers a scalable solution for automated, reliable, and reproducible analysis of high-resolution air quality datasets.