Abstract
Large language models (LLMs) are increasingly explored as general-purpose reasoners, particularly in agentic contexts. However, their outputs remain prone to mathematical and logical errors. This is especially challenging in open-ended tasks, where unstructured outputs lack explicit ground truth and may contain subtle inconsistencies. To address this issue, we propose Logic-Enhanced Language Model Agents (LELMA), a framework that integrates LLMs with formal logic to enable validation and refinement of natural language reasoning. LELMA comprises three components (an LLM-Reasoner, an LLM-Translator, and a Solver) and employs autoformalization to translate reasoning into logic representations, which are then used to assess logical validity. Using game-theoretic scenarios such as the Prisoner’s Dilemma as testbeds, we highlight the limitations of both less capable (Gemini 1.0 Pro) and advanced (GPT-4o) models in generating logically sound reasoning. LELMA achieves high accuracy in error detection and improves reasoning correctness via self-refinement, particularly in GPT-4o. The study also highlights challenges in autoformalization accuracy and in the evaluation of inherently ambiguous open-ended reasoning tasks.
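The validate-and-refine loop the abstract describes can be sketched in miniature. The code below is a hypothetical illustration, not the paper's implementation: it stands in for the LLM-Translator's output with hand-formalized propositional clauses, plays the Solver role with a brute-force satisfiability check, and flags an inconsistency that would trigger a refinement request. All names and the clause encoding are assumptions for illustration.

```python
from itertools import product

def satisfiable(clauses, variables):
    """Brute-force SAT check over CNF clauses.
    Each clause is a set of literals; a literal is (name, polarity)."""
    for assignment in product([False, True], repeat=len(variables)):
        model = dict(zip(variables, assignment))
        if all(any(model[name] == polarity for name, polarity in clause)
               for clause in clauses):
            return True
    return False

def validate_reasoning(formalized_claims):
    """Solver step: return True if the formalized claims are jointly
    consistent, False if the reasoning contains a contradiction."""
    variables = sorted({name for clause in formalized_claims
                        for name, _ in clause})
    return satisfiable(formalized_claims, variables)

# Toy stand-ins for autoformalized Prisoner's Dilemma reasoning.
# A coherent argument: "defection dominates" and "I cooperate anyway"
# are jointly satisfiable (no logical contradiction, even if suboptimal).
coherent = [{("defect_dominates", True)}, {("cooperate", True)}]
# An incoherent argument asserts both a claim and its negation.
incoherent = [{("defect_dominates", True)}, {("defect_dominates", False)}]
```

In the full framework, a failed check would be fed back to the LLM-Reasoner as a self-refinement prompt; here the solver verdict is simply the boolean return value.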
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 2025 IEEE 37th International Conference on Tools with Artificial Intelligence |
| Publisher | IEEE Computer Society |
| Pages | 838-842 |
| Number of pages | 5 |
| ISBN (Electronic) | 979-8-3315-4919-0 |
| ISBN (Print) | 979-8-3315-4920-6 |
| DOIs | |
| Publication status | Published - 15 Dec 2025 |
Keywords
- cs.AI
- cs.CL
- cs.GT
- cs.LO