
    Microsoft AI Released Phi-4-Reasoning: A 14B Parameter Open-Weight Reasoning Model that Achieves Strong Performance on Complex Reasoning Tasks

    May 1, 2025

    Despite notable advancements in large language models (LLMs), effective performance on reasoning-intensive tasks—such as mathematical problem solving, algorithmic planning, or coding—remains constrained by model size, training methodology, and inference-time capabilities. Models that perform well on general NLP benchmarks often lack the ability to construct multi-step reasoning chains or reflect on intermediate problem-solving states. Furthermore, while scaling up model size can improve reasoning capacity, it introduces prohibitive computational and deployment costs, especially for applied use in education, engineering, and decision-support systems.

    Microsoft Releases Phi-4 Reasoning Model Suite

    Microsoft recently introduced the Phi-4 reasoning family, consisting of three models: Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning. Phi-4-reasoning and Phi-4-reasoning-plus are derived from the Phi-4 base (14B parameters), while Phi-4-mini-reasoning is a smaller model built on Phi-4-mini. The models are specifically trained to handle complex reasoning tasks in mathematics, scientific domains, and software-related problem solving. Each variant addresses different trade-offs between computational efficiency and output precision. Phi-4-reasoning is optimized via supervised fine-tuning, while Phi-4-reasoning-plus extends this with outcome-based reinforcement learning, particularly targeting improved performance in high-variance tasks such as competition-level mathematics.

    The open-weight models were released with transparent training details and evaluation logs, including benchmark design, and are hosted on Hugging Face for reproducibility and public access.

    Technical Composition and Methodological Advances

    The Phi-4-reasoning models build upon the Phi-4 architecture with targeted improvements to model behavior and training regime. Key methodological decisions include:

    • Structured Supervised Fine-Tuning (SFT): Over 1.4M prompts were curated with a focus on “boundary” cases—problems at the edge of Phi-4’s baseline capabilities. Prompts were sourced and filtered to emphasize multi-step reasoning rather than factual recall, and responses were synthetically generated using o3-mini in high-reasoning mode.
    • Chain-of-Thought Format: To facilitate structured reasoning, models were trained to generate output using explicit <think> tags, encouraging separation between reasoning traces and final answers.
    • Extended Context Handling: The RoPE base frequency was modified to support a 32K token context window, allowing for deeper solution traces, particularly relevant in multi-turn or long-form question formats.
    • Reinforcement Learning (Phi-4-reasoning-plus): Using Group Relative Policy Optimization (GRPO), Phi-4-reasoning-plus was further refined on a small curated set of ∼6,400 math-focused problems. A reward function was crafted to favor correct, concise, and well-structured outputs, while penalizing verbosity, repetition, and format violations.
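
The chain-of-thought format described above means downstream code usually has to separate the reasoning trace from the final answer. A minimal sketch of that split, assuming the output contains at most one `<think>...</think>` block followed by the answer; the function and type names are hypothetical and not part of any Microsoft tooling:

```python
import re
from typing import NamedTuple


class ParsedOutput(NamedTuple):
    reasoning: str  # text found inside the <think> block ("" if none)
    answer: str     # text that follows the closing </think> tag


# DOTALL lets "." match newlines, since reasoning traces span many lines.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> ParsedOutput:
    """Split a model response into its reasoning trace and final answer."""
    match = _THINK_RE.search(text)
    if match is None:
        # No tagged reasoning: treat the whole response as the answer.
        return ParsedOutput(reasoning="", answer=text.strip())
    return ParsedOutput(
        reasoning=match.group(1).strip(),
        answer=text[match.end():].strip(),
    )


sample = "<think>2 + 2 is 4, then double it to get 8.</think>\nThe answer is 8."
parsed = split_reasoning(sample)
print(parsed.reasoning)
print(parsed.answer)
```

Keeping the trace and the answer in separate fields makes it straightforward to log the reasoning while showing users only the final answer.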

    This data-centric and format-aware training regime supports better inference-time utilization and model generalization across domains, including unseen symbolic reasoning problems.
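
The GRPO step mentioned in the list above scores each sampled response relative to the other responses drawn for the same prompt, rather than against a learned value function. A minimal sketch of that group-relative normalization, assuming rewards have already been computed; the reward values are placeholders, the helper name is hypothetical, and the use of the population standard deviation is an assumption, since implementations differ on this detail:

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against the mean and spread of its own group.

    `rewards` holds one scalar reward per response sampled for a single prompt.
    `eps` avoids division by zero when every response gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; some code uses sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Placeholder rewards for four sampled answers to one prompt (1.0 = correct).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Responses that beat their group average receive a positive advantage and are reinforced, while below-average ones are pushed down. When every response in a group earns the same reward, all advantages are zero and the group contributes no learning signal.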

    Evaluation and Comparative Performance

    Across a broad range of reasoning benchmarks, Phi-4-reasoning and Phi-4-reasoning-plus deliver competitive results relative to significantly larger open-weight models.

    Phi-4-reasoning-plus not only performs strongly on domain-specific evaluations but also generalizes well to planning and combinatorial problems such as TSP and 3SAT, despite no explicit training in these areas. Performance gains were also observed in instruction-following (IFEval) and long-context QA (FlenQA), suggesting the chain-of-thought formulation improves broader model utility.

    Importantly, Microsoft reports full variance distributions across 50+ generation runs for sensitive datasets like AIME 2025, revealing that Phi-4-reasoning-plus matches or exceeds the performance consistency of models like o3-mini, while its accuracy distribution remains largely disjoint from that of baselines like DeepSeek-R1-Distill.

    Conclusion and Implications

    The Phi-4 reasoning models represent a methodologically rigorous effort to advance small model capabilities in structured reasoning. By combining data-centric training, architectural tuning, and minimal but well-targeted reinforcement learning, Microsoft demonstrates that 14B-scale models can match or outperform much larger systems in tasks requiring multi-step inference and generalization.

    The models’ open-weight availability and transparent benchmarking set a precedent for future development in small LLMs, particularly for applied domains where interpretability, cost, and reliability are paramount. Future work is expected to extend the reasoning capabilities into additional STEM fields, improve decoding strategies, and explore scalable reinforcement learning on longer horizons.



    The post Microsoft AI Released Phi-4-Reasoning: A 14B Parameter Open-Weight Reasoning Model that Achieves Strong Performance on Complex Reasoning Tasks appeared first on MarkTechPost.
