Publication details
Bibliographic description
Teaching AI to read consumer law: evaluating modern language models for semantic search : [YouTube presentation] / Piotr POTIOPA // In: Proceedings of the 46th International Business Information Management Association Computer Science Conference (IBIMA) [Electronic document] : research in advancements in generative AI, quantum computing and computer security : 26–27 November 2025, Ronda, Spain. — Windows version. — Text data. — [Spain] : International Business Information Management Association (IBIMA), [2025]. — (Proceedings of the... International Business Information Management Association Conference ; ISSN 2767-9640). — e-ISBN: 979-8-9867719-9-1. — Duration: 2 min. — Abstract available at https://ibima.org/accepted-paper/teaching-ai-to-read-consumer-law-evaluating-modern-language-models-for-semantic-search/
Author
Keywords
Bibliometric data
| Field | Value |
|---|---|
| BaDAP ID | 165307 |
| Date added to BaDAP | 2026-01-13 |
| Source text | URL |
| Publication year | 2025 |
| Publication type | conference proceedings (authored) |
| Open access | |
| Conference | International Business Information Management 2025 |
| Journal/series | Proceedings of the... International Business Information Management Association Conference |
Abstract
This paper compares four information retrieval methods: three semantic models (SBERT, E5-small, DistilRoBERTa-PL) and the classical lexical approach (BM25). The study is based on a corpus of Polish consumer law documents. The semantic models were used for embedding-based indexing and retrieval with FAISS, while the lexical approach used Elasticsearch with BM25. A question-answering (QA) system was built in two variants: (1) Semantic QA, which retrieves legal text fragments by meaning using sentence embeddings; and (2) Lexical QA, which uses traditional keyword-based retrieval. Both variants were evaluated on a custom set of 200 consumer law questions using standard IR metrics: precision@k, recall@k, Mean Reciprocal Rank (MRR), and Normalized Discounted Cumulative Gain (NDCG). The results show that the semantic methods outperform BM25 in ranking relevance (e.g., MRR@5 of 0.9194 for SBERT vs 0.5400 for BM25), especially for queries that require semantic understanding, with only a slight drop in absolute precision.
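The record does not include the paper's evaluation code. The following is only a minimal Python sketch of the four metrics named above, using common textbook definitions. It assumes binary relevance judgements (a retrieved fragment either answers the question or it does not), a precision@k that divides by k, and a reciprocal rank that is cut off at k, as in MRR@5. The function names and the three toy queries are hypothetical and do not reproduce the paper's data or results.

```python
import math
from typing import Callable, List, Sequence, Set, Tuple

# One evaluated query: (ranked list of retrieved doc IDs, set of relevant doc IDs).
Run = Tuple[Sequence[str], Set[str]]


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k positions occupied by relevant docs (divides by k)."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant docs that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)


def reciprocal_rank_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """1 / rank of the first relevant doc within the top k, else 0."""
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-gain NDCG: DCG of the ranking divided by DCG of an ideal ranking."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def mean_over_queries(
    metric: Callable[[Sequence[str], Set[str], int], float], runs: List[Run], k: int
) -> float:
    """Average a per-query metric over all queries (MRR is the mean of reciprocal ranks)."""
    return sum(metric(ranked, relevant, k) for ranked, relevant in runs) / len(runs)


# Hypothetical toy data: three questions with their top-3 retrieved fragments.
runs: List[Run] = [
    (["d3", "d1", "d7"], {"d1"}),        # first relevant hit at rank 2
    (["d2", "d5", "d9"], {"d2", "d9"}),  # two relevant hits, at ranks 1 and 3
    (["d4", "d6", "d8"], {"d0"}),        # relevant fragment not retrieved
]

for name, fn in [
    ("P@3", precision_at_k),
    ("R@3", recall_at_k),
    ("MRR@3", reciprocal_rank_at_k),
    ("NDCG@3", ndcg_at_k),
]:
    print(f"{name}: {mean_over_queries(fn, runs, 3):.4f}")
```

On this toy data, the first query contributes only 0.5 to MRR because its first relevant fragment sits at rank 2. This is why MRR rewards a retriever that places the correct legal fragment first, the kind of behaviour that the reported gap between SBERT and BM25 reflects.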