Abstract
Large Language Models (LLMs) have demonstrated significant capabilities across various
domains; however, their effectiveness in spreadsheet-related tasks remains underexplored. This study
introduces a foundation for a comprehensive benchmark framework to evaluate the performance of
leading LLMs in executing spreadsheet functions, formula generation, and data manipulation tasks.
The benchmark encompasses tasks ranging from basic formula creation to complex, real-world
spreadsheet scenarios. Our findings reveal that while LLMs exhibit proficiency in straightforward
tasks, they often falter in complex, multi-step operations, frequently producing plausible yet incorrect
outputs. These results underscore the limitations of current LLMs in handling spreadsheet tasks that
require precise logical reasoning and highlight the need for integrating symbolic reasoning
capabilities into LLM architectures. To support this, we introduce FLARE (Formula Logic, Auditing,
Reasoning and Evaluation) a new benchmark for eval
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the EuSpRIG 2025 Conference "Spreadsheet Productivity & Risks" |
| Editors | Simon Thorne, Grenville J. Croll |
| Publisher | European Spreadsheet Risks Interest Group |
| ISBN (Print) | 9781905404605 |
| Publication status | Published - 1 Jul 2025 |
| Event | EuSpRIG 2025 Conference Spreadsheet Productivity & Risks - University of Greenwich, London, United Kingdom. Duration: 3 Jul 2025 → 4 Jul 2025 |
Conference
| Conference | EuSpRIG 2025 Conference Spreadsheet Productivity & Risks |
|---|---|
| Country/Territory | United Kingdom |
| City | London |
| Period | 3/07/25 → 4/07/25 |