    Advancing Vision-Language Reward Models: Challenges, Benchmarks, and the Role of Process-Supervised Learning

    April 3, 2025

    Process-supervised reward models (PRMs) offer fine-grained, step-wise feedback on model responses, aiding in selecting effective reasoning paths for complex tasks. Unlike output reward models (ORMs), which evaluate responses based on final outputs, PRMs provide detailed assessments at each step, making them particularly valuable for reasoning-intensive applications. While PRMs have been extensively studied in language tasks, their application in multimodal settings remains largely unexplored. Most vision-language reward models still rely on the ORM approach, highlighting the need for further research into how PRMs can enhance multimodal learning and reasoning.
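
    To make the ORM/PRM distinction concrete, the sketch below contrasts the two scoring schemes on a multi-step response. It is a minimal illustration, assuming only that some scoring function is available for a final answer or an individual step; the function names are hypothetical and not drawn from the paper.

    ```python
    # Minimal sketch (not the paper's implementation): contrasting how an
    # outcome reward model (ORM) and a process-supervised reward model (PRM)
    # score a multi-step response. `score_final` and `score_step` stand in
    # for whatever scoring head the reward model exposes.

    from typing import Callable, List


    def orm_score(steps: List[str], score_final: Callable[[str], float]) -> float:
        """ORM: judge only the final answer, ignoring intermediate steps."""
        return score_final(steps[-1])


    def prm_score(steps: List[str], score_step: Callable[[str], float]) -> List[float]:
        """PRM: return one score per reasoning step, enabling step-wise feedback."""
        return [score_step(step) for step in steps]


    if __name__ == "__main__":
        # Toy scorer: rewards longer, more detailed text (purely illustrative).
        toy = lambda text: min(len(text) / 50.0, 1.0)
        steps = ["Identify the objects in the image.",
                 "Count the relevant regions.",
                 "Final answer: 4."]
        print("ORM score:", orm_score(steps, toy))
        print("PRM scores:", prm_score(steps, toy))
    ```

    The practical difference is that a PRM returns a score per step, so a search or sampling procedure can prune a reasoning path as soon as an early step looks weak, rather than waiting for the final answer.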

    Existing reward benchmarks primarily focus on text-based models, with only a few designed specifically for PRMs. In the vision-language domain, evaluation methods generally assess broad model capabilities, including knowledge, reasoning, fairness, and safety. VL-RewardBench is the first benchmark to incorporate reinforcement-learning preference data for knowledge-intensive vision-language tasks. Multimodal RewardBench expands the evaluation criteria beyond standard visual question answering (VQA), covering six key areas (correctness, preference, knowledge, reasoning, safety, and VQA) through expert annotations. Together, these benchmarks provide a foundation for developing more effective reward models for multimodal learning.

    Researchers from UC Santa Cruz, UT Dallas, and Amazon Research benchmarked VLLMs as ORMs and PRMs across multiple tasks, revealing that neither consistently outperforms the other. To address evaluation gaps, they introduced VILBENCH, a benchmark requiring step-wise reward feedback, where GPT-4o with Chain-of-Thought achieved only 27.3% accuracy. Additionally, they collected 73.6K vision-language reward samples using an enhanced tree-search algorithm, training a 3B PRM that improved evaluation accuracy by 3.3%. Their study provides insights into vision-language reward modeling and highlights challenges in multimodal step-wise evaluation.

    VLLMs are increasingly effective across a range of tasks, particularly under test-time scaling. Seven models were benchmarked with the LLM-as-a-judge approach to analyze their step-wise critique abilities on five vision-language datasets. A Best-of-N (BoN) setting was used, in which VLLMs scored candidate responses generated by GPT-4o. Key findings reveal that ORMs outperform PRMs in most cases, with real-world tasks being the exception. Additionally, stronger VLLMs do not always make better reward models, and a hybrid of ORM and PRM scoring is optimal. Moreover, VLLMs benefit more from text-heavy tasks than from visually dominant ones, underscoring the need for specialized vision-language reward models.
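
    As a rough illustration of the Best-of-N setting described above, the following sketch selects the highest-scoring candidate response under a scalar reward function. The `reward(question, response)` interface is an assumption for illustration, not the paper's API.

    ```python
    # Minimal Best-of-N (BoN) sketch: score N candidate responses with a
    # reward model and keep the best one. The reward signature is hypothetical.

    from typing import Callable, List


    def best_of_n(question: str,
                  candidates: List[str],
                  reward: Callable[[str, str], float]) -> str:
        """Score each candidate response and return the highest-reward one."""
        scored = [(reward(question, c), c) for c in candidates]
        return max(scored, key=lambda pair: pair[0])[1]


    if __name__ == "__main__":
        # Toy reward favoring responses that mention the question's key terms.
        toy_reward = lambda q, r: float(sum(w in r.lower() for w in q.lower().split()))
        q = "How many birds are in the picture?"
        responses = ["There are three birds.",
                     "It is a sunny day.",
                     "Birds: 3 in the picture."]
        print(best_of_n(q, responses, toy_reward))
    ```

    In the study's setup, an ORM would supply one score per candidate, while a PRM would first score each reasoning step and then aggregate those scores into the candidate's overall reward.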

    To assess the effectiveness of ViLPRM, the trained vision-language PRM, experiments were conducted on VILBENCH using different reward models and solution samplers. The study compared performance across multiple VLLMs, including Qwen2.5-VL-3B, InternVL-2.5-8B, GPT-4o, and o1. Results show that PRMs generally outperform ORMs, improving accuracy by 1.4%, although o1’s responses showed minimal difference because of their limited detail. ViLPRM surpassed other PRMs, including URSA, by 0.9%, demonstrating more consistent response selection. The findings also suggest that existing VLLMs are not robust enough as reward models, highlighting the need for specialized vision-language PRMs that perform well beyond math reasoning tasks.

    In conclusion, vision-language PRMs perform well when reasoning steps are clearly segmented, as in structured tasks like mathematics. In tasks with unclear step divisions, however, PRMs can reduce accuracy, particularly in visually dominant cases. Prioritizing key steps rather than weighting all steps equally improves performance. Current multimodal reward models also struggle with generalization: PRMs trained on specific domains often fail in others. Incorporating more diverse data sources and adaptive reward mechanisms into training is therefore crucial. The introduction of ViLReward-73K improves PRM accuracy by 3.3%, but further advances in step segmentation and evaluation frameworks are needed for robust multimodal reward models.
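
    The point about prioritizing key steps can be illustrated with a small sketch: aggregating per-step PRM scores with non-uniform weights instead of a plain average. The specific weighting scheme below is hypothetical and only meant to show the mechanism.

    ```python
    # Sketch of aggregating per-step PRM scores: uniform mean vs. weighted mean.
    # The example weights (emphasizing later steps) are illustrative, not the paper's.

    from typing import List


    def uniform_aggregate(step_scores: List[float]) -> float:
        """Treat every step equally: simple mean of step rewards."""
        return sum(step_scores) / len(step_scores)


    def weighted_aggregate(step_scores: List[float], weights: List[float]) -> float:
        """Emphasize key steps by weighting their rewards more heavily."""
        total = sum(weights)
        return sum(s * w for s, w in zip(step_scores, weights)) / total


    if __name__ == "__main__":
        scores = [0.9, 0.4, 0.95]   # per-step PRM scores
        weights = [0.2, 0.3, 0.5]   # e.g. later, answer-bearing steps matter more
        print("uniform:", uniform_aggregate(scores))
        print("weighted:", weighted_aggregate(scores, weights))
    ```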


    Check out the Paper. All credit for this research goes to the researchers of this project.
