TY - JOUR
T1 - Beyond Templates and BERT: Headword-centric parsing for semantic question answering in non-English financial domains
AU - Al Qundus, Jamal
AU - Al-Shargabi, Bassam
A2 - Graff-Guerrero, Mario
N1 - Copyright: © 2026 Al Qundus, Al-Shargabi. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
PY - 2026/5/4
Y1 - 2026/5/4
N2 - Despite recent advances, semantic question-answering (QA) systems struggle with linguistic variability, particularly in non-English domains such as German finance. This work presents INAGQA, a novel QA system that addresses this gap through headword-centric parsing, combining syntactic chunking with knowledge graph embeddings to resolve question ambiguity. The main innovations are as follows. First, a hybrid disambiguation algorithm that achieves 0.91 F1 on German financial queries, validated on 2,100 expert-annotated questions. Second, domain-optimized shallow parsing with customizable grammar rules that reduces relation-linking errors by 35% for compound nouns (e.g., Eigenkapitalrendite). Third, seamless knowledge integration that prioritizes user-curated data and achieves a 2.1 s average response time in a case study with financial analysts. Our experiments show that INAGQA outperforms BERT-KGQA (F1: 0.83) and template-based systems (F1: 0.79) while handling temporal/quantitative variants (e.g., When vs. Where was X founded?) with 98% accuracy. The system’s editable outputs align with the Corporate Smart Insights frameworks, offering practical value for SMEs. The work thus contributes to Information Systems (IS) research by proposing headword extraction as a replicable IS artifact for non-English QA and by demonstrating language-sensitive design principles applicable to healthcare and legal domains.
AB - Despite recent advances, semantic question-answering (QA) systems struggle with linguistic variability, particularly in non-English domains such as German finance. This work presents INAGQA, a novel QA system that addresses this gap through headword-centric parsing, combining syntactic chunking with knowledge graph embeddings to resolve question ambiguity. The main innovations are as follows. First, a hybrid disambiguation algorithm that achieves 0.91 F1 on German financial queries, validated on 2,100 expert-annotated questions. Second, domain-optimized shallow parsing with customizable grammar rules that reduces relation-linking errors by 35% for compound nouns (e.g., Eigenkapitalrendite). Third, seamless knowledge integration that prioritizes user-curated data and achieves a 2.1 s average response time in a case study with financial analysts. Our experiments show that INAGQA outperforms BERT-KGQA (F1: 0.83) and template-based systems (F1: 0.79) while handling temporal/quantitative variants (e.g., When vs. Where was X founded?) with 98% accuracy. The system’s editable outputs align with the Corporate Smart Insights frameworks, offering practical value for SMEs. The work thus contributes to Information Systems (IS) research by proposing headword extraction as a replicable IS artifact for non-English QA and by demonstrating language-sensitive design principles applicable to healthcare and legal domains.
KW - Algorithms
KW - Humans
KW - Language
KW - Semantics
U2 - 10.1371/journal.pone.0347261
DO - 10.1371/journal.pone.0347261
M3 - Article
C2 - 42081576
SN - 1932-6203
VL - 21
JO - PLoS ONE
JF - PLoS ONE
IS - 5
ER -