Publication details

Bibliographic description

Enhancing full-reference image quality assessment via dual-attention fusion of global–local features and CLIP-based text prior / Mengyao Huang, Yi Zhang, Damon M. Chandler, Mikołaj LESZCZUK, Mylène C. Q. Farias // Journal on Image and Video Processing [Electronic document]. - Electronic journal; ISSN 3091-454X. - 2026 - vol. 2026, art. no. 6, pp. 1–26. - System requirements: Adobe Reader. - Bibliography pp. 24–26, abstract. - Available online since: 2026-03-11

Authors (5)

Huang Mengyao
Zhang Yi
Chandler Damon M.
Leszczuk Mikołaj (AGH)
Farias Mylène C. Q.

Keywords

perceptual quality assessment; contrastive language-image pre-training; human visual system; full reference; feature fusion

Bibliometric data

BaDAP ID	167765
Date added to BaDAP	2026-06-02
Source text	URL
DOI	10.1186/s13640-026-00687-6
Publication year	2026
Publication type	journal article
Open access
Creative Commons
Journal/series	EURASIP Journal on Image and Video Processing

Abstract

Image quality assessment (IQA) is a field that focuses on evaluating the quality of images, playing a crucial role in various image processing and/or computer vision applications. Traditional full-reference (FR) IQA algorithms struggle with an accurate perceptual quality evaluation due in part to their reliance on handcrafted features and simple mathematical functions to calculate the elementwise distance between the reference and distorted images. Although deep-learning-based FR IQA methods have shown advantages in providing a certain degree of tolerance to texture resampling, their performances are still limited by the redundant model parameters and ineffective quality-aware feature extraction/representation. To address this issue, in this paper, we propose a multi-modal dual-attention FR IQA algorithm based on combining a global-and-local image structure analysis with text information interpreted by the widely used large language model. Specifically, the proposed multi-modal dual-attention network consists of four modules. First, a global-and-local feature extraction module was employed to extract the quality-aware features from the reference and distorted images, which were then realigned along the spatial and channel dimensions by a feature fusion module. To take into account both the channel and spatial attentions and thus increase the model capacity in representing long-range dependencies among different image areas, a feature enhancement module was designed to encode the spatial information along two directions, based on which the direction-aware attention maps with position information were generated. Finally, the text prior knowledge interpreted by the contrastive language-image pre-training (CLIP) model was embedded to assist the attention-based prediction module for quality estimation. Experimental results on four benchmark datasets demonstrate the effectiveness of our model as compared with other state-of-the-art FR IQA methods. 
The code is available at https://vinelab.jp/m2da/.
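
Illustrative note: the abstract describes a feature enhancement module that encodes spatial information along two directions and generates direction-aware attention maps. The NumPy sketch below shows one simple way such a step could look. It is not taken from the authors' code. The function name `direction_aware_attention`, the channel-mixing matrices `w_h` and `w_w`, and the choice of average pooling with sigmoid gating are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))


def direction_aware_attention(x, w_h, w_w):
    """Gate a feature map with row-wise and column-wise attention.

    x   : array of shape (C, H, W), a single feature map.
    w_h : array of shape (C, C), hypothetical channel-mixing weights
          for the height direction.
    w_w : array of shape (C, C), hypothetical channel-mixing weights
          for the width direction.
    """
    # Pool along one spatial axis at a time so that position along the
    # other axis is preserved.
    pooled_h = x.mean(axis=2)  # (C, H): one descriptor per row
    pooled_w = x.mean(axis=1)  # (C, W): one descriptor per column

    # Mix channels, then squash to (0, 1) to obtain attention maps.
    attn_h = sigmoid(w_h @ pooled_h)  # (C, H)
    attn_w = sigmoid(w_w @ pooled_w)  # (C, W)

    # Broadcast both maps back over the full (C, H, W) grid.
    return x * attn_h[:, :, None] * attn_w[:, None, :]


# Usage with random data; the shapes are arbitrary.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 6, 5))

# With all-zero weights every gate equals sigmoid(0) = 0.5,
# so the output is the input scaled by 0.5 * 0.5.
w_zero = np.zeros((4, 4))
out = direction_aware_attention(x, w_zero, w_zero)

# With random weights the gates differ per row and per column.
w_rand = rng.standard_normal((4, 4))
out_rand = direction_aware_attention(x, w_rand, w_rand)
```

Because each gate lies between 0 and 1, the output never exceeds the input in magnitude. A trained model would learn the mixing weights rather than draw them at random.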