    How Do LLMs Really Reason? A Framework to Separate Logic from Knowledge

    June 11, 2025

    Unpacking Reasoning in Modern LLMs: Why Final Answers Aren’t Enough

    Recent advancements in reasoning-focused LLMs like OpenAI’s o1 and o3 and DeepSeek-R1 have led to notable improvements on complex tasks. However, the step-by-step reasoning behind these models remains unclear. Most evaluations focus on final-answer accuracy, which hides the reasoning process and doesn’t reveal how models combine knowledge and logic. Some earlier methods attempt to measure reasoning by comparing answers to the original question, but this approach is flawed since models often rely on prior deductions or internal knowledge. Domains such as math and medicine differ in their reasoning needs, highlighting the importance of developing better, domain-aware evaluation methods for building trustworthy AI.

    The Shortcomings of Final-Answer Evaluations in Math and Medicine

    Recent LLMs have made impressive strides in reasoning tasks, especially in math and medicine, thanks to better training data and reward strategies. However, most of this progress focuses on boosting final answer accuracy rather than understanding how the model reasons step-by-step. Past work has flagged factual errors in reasoning chains or measured similarity between reasoning steps and the original question. But such similarity doesn’t guarantee logical soundness or factual correctness, since LLMs often draw on internal knowledge or earlier reasoning.
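
    To make the similarity problem concrete, here is a minimal sketch in Python. It uses bag-of-words cosine similarity as a stand-in, which is an assumption for illustration and not the measure used in the prior work mentioned above. The question and the two reasoning steps are invented toy examples. A step that merely restates the question scores high, while the step that actually does the arithmetic scores zero.

```python
import math
import re
from collections import Counter


def bow(text: str) -> Counter:
    """Lowercased bag-of-words counts (illustrative tokenizer, not from the paper)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy inputs, invented for this sketch.
question = "What is the sum of the first three odd numbers?"
restating_step = "We need the sum of the first three odd numbers."
useful_step = "1 + 3 + 5 = 9"

sim_restating = cosine(bow(question), bow(restating_step))
sim_useful = cosine(bow(question), bow(useful_step))

print(f"restating step similarity: {sim_restating:.2f}")
print(f"useful step similarity:    {sim_useful:.2f}")
```

    In this toy case the restating step shares most of its words with the question yet adds nothing, whereas the useful step shares none. Surface similarity to the question is therefore a weak proxy for whether a step is sound or informative.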

    A New Framework for Separating Knowledge and Logic in LLM Reasoning

    Researchers from UC Santa Cruz, Stanford, and Tongji University go beyond final-answer evaluation by breaking down LLM reasoning into two key parts: factual knowledge and logical steps. They introduce a detailed framework that utilizes two metrics: the Knowledge Index (KI) for factual accuracy and Information Gain (InfoGain) for reasoning quality. Their analysis of Qwen models across math and medical tasks reveals that reasoning skills don’t easily transfer between domains. While supervised fine-tuning improves accuracy, it often harms reasoning depth. Reinforcement learning, however, helps refine reasoning by removing irrelevant information. This work highlights the importance of evaluating and training LLMs more thoughtfully.

    Assessing Reasoning with Qwen2.5-7B and DeepSeek-R1 Models

    The researchers evaluate reasoning in LLMs by analyzing Qwen2.5-7B and its DeepSeek-R1-distilled version, trained with SFT and RL. Using tasks from both math and medical domains, they decompose responses into logical steps and assess them using two key metrics: Information Gain (how much uncertainty is reduced with each reasoning step) and Knowledge Index (how factually accurate each step is, verified against expert sources). While InfoGain tracks the informativeness of each step, KI checks whether the knowledge aligns with real-world facts. This approach reveals how models reason and where they may falter in accuracy or logic.
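
    The two metrics described above can be sketched roughly as follows. This is a simplified illustration and not the authors' implementation. It assumes each step already carries an uncertainty value (for example, the model's negative log-likelihood of the correct answer given the steps so far) and a factual-correctness verdict from some verifier. The names Step, uncertainty_after, and knowledge_correct are hypothetical, and the numbers are placeholders rather than results from the study.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    text: str
    # Hypothetical field: uncertainty about the correct answer after this step.
    uncertainty_after: float
    # Hypothetical field: did a verifier judge this step's facts to be correct?
    knowledge_correct: bool


def info_gains(initial_uncertainty: float, steps: List[Step]) -> List[float]:
    """Uncertainty reduction contributed by each step, in order."""
    gains = []
    previous = initial_uncertainty
    for step in steps:
        gains.append(previous - step.uncertainty_after)
        previous = step.uncertainty_after
    return gains


def mean_info_gain(initial_uncertainty: float, steps: List[Step]) -> float:
    """Average per-step uncertainty reduction (0.0 for an empty chain)."""
    gains = info_gains(initial_uncertainty, steps)
    return sum(gains) / len(gains) if gains else 0.0


def knowledge_index(steps: List[Step]) -> float:
    """Fraction of steps whose factual content was judged correct."""
    if not steps:
        return 0.0
    return sum(step.knowledge_correct for step in steps) / len(steps)


# Placeholder values for illustration only.
example_steps = [
    Step("Recall the relevant fact.", uncertainty_after=3.0, knowledge_correct=True),
    Step("Restate the question.", uncertainty_after=3.0, knowledge_correct=False),
    Step("Apply the fact to reach the answer.", uncertainty_after=1.0, knowledge_correct=True),
]

print(info_gains(4.0, example_steps))      # per-step gains
print(mean_info_gain(4.0, example_steps))  # average gain
print(knowledge_index(example_steps))      # share of factually correct steps
```

    Keeping the two scores separate is the point of the design: a step can be factually correct yet reduce no uncertainty, or it can move the model toward an answer while resting on a wrong fact.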

    Supervised Fine-Tuning vs. Reinforcement Learning in Domain-Specific Tasks

    The study evaluates two variants of Qwen2.5-7B, Qwen-Base and the distilled Qwen-R1, on medical tasks. Results show that Qwen-Base consistently outperforms Qwen-R1 in accuracy, knowledge retention, and reasoning, especially after SFT and RL. The distilled model likely struggles because its prior training focused on math and code, resulting in a domain mismatch. Interestingly, SFT enhances medical knowledge more effectively than RL, although it may slightly compromise reasoning efficiency. RL, on the other hand, improves both reasoning and knowledge when applied post-SFT. Medical benchmarks tend to rely more on factual knowledge than abstract reasoning, unlike math-focused tasks.

    Conclusion: Toward More Interpretable and Trustworthy LLMs

    In conclusion, the study introduces a framework that separates knowledge from reasoning to better evaluate how LLMs think, particularly in high-stakes areas like medicine and math. Using Qwen models trained with SFT and RL, the researchers found that while SFT improves factual accuracy, which is essential in medicine, it often weakens reasoning. RL, however, enhances reasoning by trimming out incorrect information. The framework could be extended to fields such as law or finance, where structured thinking is crucial. Overall, this approach helps clarify how LLMs make decisions and suggests ways to tailor their training for specific domains.


    Check out the Paper, Code and Project Page. All credit for this research goes to the researchers of this project.

    The post How Do LLMs Really Reason? A Framework to Separate Logic from Knowledge appeared first on MarkTechPost.
