Large Language Models for Spreadsheets: Benchmarking Progress and Evaluating Performance with FLARE

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Large Language Models (LLMs) have demonstrated significant capabilities across various
domains; however, their effectiveness in spreadsheet-related tasks remains underexplored. This study
introduces a foundation for a comprehensive benchmark framework to evaluate the performance of
leading LLMs in executing spreadsheet functions, formula generation, and data manipulation tasks.
The benchmark encompasses tasks ranging from basic formula creation to complex, real-world
spreadsheet scenarios. Our findings reveal that while LLMs exhibit proficiency in straightforward
tasks, they often falter in complex, multi-step operations, frequently producing plausible yet incorrect
outputs. These results underscore the limitations of current LLMs in handling spreadsheet tasks that
require precise logical reasoning and highlight the need for integrating symbolic reasoning
capabilities into LLM architectures. To support this, we introduce FLARE (Formula Logic, Auditing,
Reasoning and Evaluation), a new benchmark for eval
Original language: English
Title of host publication: Proceedings of the EuSpRIG 2025 Conference "Spreadsheet Productivity & Risks"
Editors: Simon Thorne, Grenville J. Croll
Publisher: European Spreadsheet Risks Interest Group
ISBN (Print): 9781905404605
Status: Published - 1 Jul 2025
Event: EuSpRIG 2025 Conference Spreadsheet Productivity & Risks - University of Greenwich, London, United Kingdom
Duration: 3 Jul 2025 → 4 Jul 2025

Conference

Conference: EuSpRIG 2025 Conference Spreadsheet Productivity & Risks
Country/Territory: United Kingdom
City: London
Period: 3/07/25 → 4/07/25
