Abstract
Large Language Models (LLMs) have demonstrated significant capabilities across various
domains; however, their effectiveness in spreadsheet-related tasks remains underexplored. This study
introduces a foundation for a comprehensive benchmark framework to evaluate the performance of
leading LLMs in executing spreadsheet functions, generating formulas, and manipulating data.
The benchmark encompasses tasks ranging from basic formula creation to complex, real-world
spreadsheet scenarios. Our findings reveal that while LLMs are proficient in straightforward
tasks, they often falter in complex, multi-step operations, frequently producing plausible yet incorrect
outputs. These results underscore the limitations of current LLMs in handling spreadsheet tasks that
require precise logical reasoning and highlight the need for integrating symbolic reasoning
capabilities into LLM architectures. To support this, we introduce FLARE (Formula Logic, Auditing,
Reasoning and Evaluation), a new benchmark for eval
| Original language | English |
|---|---|
| Title | Proceedings of the EuSpRIG 2025 Conference "Spreadsheet Productivity & Risks" |
| Editors | Simon Thorne, Grenville J. Croll |
| Publisher | European Spreadsheet Risks Interest Group |
| ISBN (Print) | 9781905404605 |
| Status | Published - 1 Jul 2025 |
| Event | EuSpRIG 2025 Conference Spreadsheet Productivity & Risks - University of Greenwich, London, United Kingdom. Duration: 3 Jul 2025 → 4 Jul 2025 |
Conference
| Conference | EuSpRIG 2025 Conference Spreadsheet Productivity & Risks |
|---|---|
| Country/Territory | United Kingdom |
| City | London |
| Period | 3/07/25 → 4/07/25 |