
    ReTool: A Tool-Augmented Reinforcement Learning Framework for Optimizing LLM Reasoning with Computational Tools

    April 21, 2025

    Reinforcement learning (RL) is a powerful technique for enhancing the reasoning capabilities of LLMs, enabling them to develop and refine long Chain-of-Thought (CoT) reasoning. Models like OpenAI o1 and DeepSeek-R1 perform well on text-based reasoning tasks, but they face limitations on tasks that require precise numerical calculation or symbolic manipulation, such as geometric reasoning, complex computation, or equation solving. Recent research has explored prompting and supervised fine-tuning to equip LLMs with tool-use capabilities, but these methods are constrained by their reliance on imitating curated data distributions. This often results in poor generalization beyond seen patterns and an inability to determine when and how to invoke external tools.

    Recent advancements in LLMs show progress toward human-like metacognition through CoT prompting. Research has evolved from train-time scaling to test-time scaling, which allocates additional computational resources during inference to generate intermediate reasoning steps. Techniques like stepwise preference optimization, Monte Carlo Tree Search, and RL have improved multi-step mathematical reasoning, as evidenced by models like OpenAI o1 and DeepSeek-R1. In addition to CoT, Program-of-Thought reasoning integrates external computational tools such as Python interpreters to simplify complex reasoning steps. Tool-integrated reasoning, in turn, was initially introduced to help LLMs solve computationally intensive problems through programming strategies.

    Researchers from ByteDance Seed have proposed ReTool, an RL framework built around a code interpreter (CI) and designed for math problem-solving tasks. It enhances long-form reasoning with tool-integrated learning through two key features. First, it enables dynamic interleaving of real-time code execution within natural language reasoning. Second, it implements an automated RL technique that allows policy rollouts with multi-turn real-time code execution, teaching the model when and how to invoke tools based on outcome feedback. ReTool employs a systematic training framework that begins with synthetic cold-start data generation, producing code-augmented long-form reasoning traces for fine-tuning base models.
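To make the interleaving idea concrete, here is a minimal sketch of a rollout loop that alternates between generated text and code execution. It is not ReTool's implementation: the `<code>` and `<interpreter>` tag names, the `run_code` and `rollout` helpers, and the `scripted_policy` stand-in for an LLM are all assumptions made for illustration, and the plain `exec()` call stands in for what would need to be an isolated sandbox.

```python
import contextlib
import io
import re
from typing import Callable

# Hypothetical tag name, chosen for this sketch only.
CODE_RE = re.compile(r"<code>(.*?)</code>", re.DOTALL)


def run_code(source: str) -> str:
    """Execute a snippet and return what it printed, or the error text.

    A real system would use an isolated sandbox; plain exec() is unsafe
    and is used here only to keep the sketch self-contained.
    """
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(source, {})
    except Exception as exc:  # errors are fed back to the model as text
        return f"{type(exc).__name__}: {exc}"
    return buffer.getvalue().strip()


def rollout(generate: Callable[[str], str], prompt: str, max_turns: int = 4) -> str:
    """Alternate between text generation and code execution."""
    context = prompt
    for _ in range(max_turns):
        segment = generate(context)
        context += segment
        match = CODE_RE.search(segment)
        if match is None:  # no tool call, so the model has finished
            break
        result = run_code(match.group(1))
        # Append the interpreter output so the next turn can read it.
        context += f"\n<interpreter>{result}</interpreter>\n"
    return context


def scripted_policy(context: str) -> str:
    """Stand-in for an LLM: calls the tool once, then answers."""
    if "<interpreter>" not in context:
        return "Let me compute this.\n<code>print(sum(range(1, 101)))</code>"
    value = re.findall(r"<interpreter>(.*?)</interpreter>", context, re.DOTALL)[-1]
    return f"The answer is {value}."


trace = rollout(scripted_policy, "What is 1 + 2 + ... + 100?\n")
print(trace)
```

In an RL setting, a trace like this would be scored on its final answer, which is one way to read the article's "outcome feedback"; the scoring step is left out of the sketch.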

    ReTool consists of two primary stages: cold-start supervised fine-tuning, followed by RL with interleaved code execution rollouts. The data pipeline begins by collecting high-quality mathematical reasoning data from diverse sources, including open-source datasets like OpenThoughts. A dual-verification approach that combines human expert curation with DeepSeek-R1 evaluation filters out invalid data. From this foundation, code-integrated reasoning data is constructed automatically. Training uses the VeRL framework with PPO as the RL method. The maximum sequence length is set to 16384 tokens, with a mini-batch size of 512 and a KL coefficient of 0.0, using Qwen2.5-32B-Instruct as the main backbone.
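The reported settings and the dual-verification filter can be summarized in a short sketch. The values come from the paragraph above, but the field names, the `ReToolRunSettings` class, and the `dual_verify` helper are hypothetical and do not reflect VeRL's actual configuration schema. The rule that a sample is kept only when both checks pass is also an assumption about how the two verifiers are combined.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class ReToolRunSettings:
    """Values quoted in the article; the field names are hypothetical."""

    backbone: str = "Qwen2.5-32B-Instruct"
    rl_algorithm: str = "PPO"
    max_sequence_length: int = 16384  # tokens
    mini_batch_size: int = 512
    kl_coefficient: float = 0.0


def dual_verify(
    samples: Iterable[str],
    human_ok: Callable[[str], bool],
    model_ok: Callable[[str], bool],
) -> List[str]:
    """Keep a sample only if both verifiers accept it (assumed rule)."""
    return [s for s in samples if human_ok(s) and model_ok(s)]


settings = ReToolRunSettings()

# Toy verifiers standing in for expert curation and a model-based check.
candidates = ["2 + 2 = 4", "2 + 2 = 5", ""]
kept = dual_verify(
    candidates,
    human_ok=lambda s: bool(s),           # reject empty samples
    model_ok=lambda s: s.endswith("= 4"),  # reject the wrong answer
)
print(settings)
print(kept)
```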

    ReTool enables the LLM to use the code interpreter flexibly during the RL stage, leading to substantial performance improvements. ReTool (Qwen2.5-32B-Instruct) achieves accuracies of 67.0% on AIME2024 and 49.3% on AIME2025 with only 400 training steps. This outperforms the text-based RL baseline (Qwen2.5-32B-Instruct), which attains 40.0% and 36.7% on the respective benchmarks despite using over 1000 training steps. Moreover, on AIME2024, ReTool (Qwen2.5-32B-Instruct) surpasses the competitive baseline s1-32B by 10.3 percentage points. Similarly, on AIME2025, it achieves an 11.4-point gain over OpenAI's o1-preview. When combined with a more advanced backbone, ReTool (DeepSeek-R1-Distill-Qwen-32B) further improves performance, scoring 72.5% on AIME2024 and 54.3% on AIME2025.
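The arithmetic behind these comparisons can be checked with a few lines of Python. The accuracies are the ones quoted above; the helper name `point_gap` is hypothetical, and the baseline scores for s1-32B and o1-preview are derived here under the assumption that the reported gains are absolute percentage-point differences.

```python
# Accuracies (in %) quoted in the article.
retool = {"AIME2024": 67.0, "AIME2025": 49.3}
text_rl_baseline = {"AIME2024": 40.0, "AIME2025": 36.7}


def point_gap(a: float, b: float) -> float:
    """Difference in percentage points, rounded to one decimal place."""
    return round(a - b, 1)


# Gain of ReTool over the text-based RL baseline on each benchmark.
gaps = {name: point_gap(retool[name], text_rl_baseline[name]) for name in retool}

# Baseline scores implied by the reported gains (assumes point differences).
implied_s1_32b_2024 = point_gap(retool["AIME2024"], 10.3)
implied_o1_preview_2025 = point_gap(retool["AIME2025"], 11.4)

print(gaps)                     # {'AIME2024': 27.0, 'AIME2025': 12.6}
print(implied_s1_32b_2024)      # 56.7
print(implied_o1_preview_2025)  # 37.9
```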

    In conclusion, the researchers introduce ReTool, a novel RL framework that enables LLMs to improve their own mathematical reasoning through effective use of a code interpreter. Experiments on AIME2024 and AIME2025 show that ReTool achieves higher accuracy than conventional text-based RL approaches and converges in significantly fewer training steps. Through careful data curation and a specialized tool-using pipeline, ReTool enables models to develop complex computational intervention strategies, paving the way for more efficient and powerful tool-augmented reasoning in LLMs. The results suggest that tool-integrated RL is a promising direction for advancing mathematical reasoning in LLMs on tasks requiring precise computation and symbolic manipulation.



    The post ReTool: A Tool-Augmented Reinforcement Learning Framework for Optimizing LLM Reasoning with Computational Tools appeared first on MarkTechPost.
