Internal Coherence Maximization (ICM): A Label-Free, Unsupervised Training Framework for LLMs

Post-training methods for pre-trained language models (LMs) depend on human supervision through demonstrations or preference feedback to specify desired behaviors. However, this approach faces critical limitations as tasks and model behaviors become very complex. Human supervision is unreliable in these scenarios as LMs learn to mimic mistakes in demonstrations or exploit inherent flaws in feedback systems. The core challenge lies in training LMs for tasks that exceed human capability in reliability in demonstrations or evaluations. Recent research has identified diverse failure modes, including reward-hacking of human-designed supervision signals or real humans themselves.

Limitations of Human Supervision in LLM Post-Training

Researchers have explored several approaches to scale beyond human supervision. One standard method utilizes high-quality verifiable rewards, such as matching model outputs with ground-truth solutions in mathematical domains. Despite evidence that pre-trained base models have strong latent capabilities for downstream tasks, with post-training adding minimal improvements, effective elicitation remains challenging. The Contrast Consistent Search (CCS) method is an unsupervised elicitation approach that uses logical consistency to find latent knowledge without supervision. However, CCS underperforms supervised approaches and often fails to identify knowledge due to other prominent features satisfying consistency properties.

Introducing Internal Coherence Maximization (ICM)

Researchers from Anthropic, Schmidt Sciences, Independent, Constellation, New York University, and George Washington University have proposed Internal Coherence Maximization (ICM), which fine-tunes pre-trained models on their own generated labels without using any provided labels. ICM solves this by searching for label sets that are both logically consistent and mutually predictable according to the pre-trained model. Since optimal label set identification remains computationally infeasible, ICM uses a simulated annealing-inspired search algorithm to approximate the maximum objective. Moreover, this method matches the performance of training on golden labels on TruthfulQA and GSM8K, and outperforms training on crowdsourced human labels on Alpaca.

How the ICM Algorithm Works

The ICM algorithm follows an iterative three-step process: (a) the system samples a new unlabeled example from the dataset for potential inclusion, (b) it determines the optimal label for this example while simultaneously resolving any logical inconsistencies, and (c) the algorithm evaluates whether to accept this new labeled example based on the scoring function. ICM is evaluated across three datasets: TruthfulQA for truthfulness assessment, GSM8K-verification for mathematical correctness, and Alpaca for helpfulness and harmlessness. Researchers used four baselines in their experiments: Zero-shot, Zero-shot (Chat), Golden Label, and Human Label. Moreover, Experiments used two open-weight models, Llama 3.1 8B and 70B, and two proprietary models: Claude 3 Haiku and Claude 3.5 Haiku.

Benchmark Performance and Model Comparisons

In superhuman capability elicitation tasks, ICM matches golden supervision accuracy at 80%, outperforming the estimated human accuracy of 60%. Using ICM-generated reward models, researchers successfully trained an assistant chatbot without human supervision. The unsupervised reward model achieves 75.0% accuracy on RewardBench, compared to 72.2% for human-supervised alternatives trained on production data. Moreover, using both the unsupervised and human-supervised RM, two policies are trained with RL to create helpful, harmless, and honest assistants. The policy trained with the unsupervised RM achieves a 60% win rate. However, these policies still lag behind the publicly released Claude 3.5 Haiku, which achieves 92% win rates.

Conclusion and Future Outlook

This paper introduces Internal Coherence Maximization (ICM), an advancement in unsupervised LM for fine-tuning pre-trained models on self-generated labels. The method consistently matches golden supervision performance and surpasses crowdsourced human supervision across GSM8K-verification, TruthfulQA, and Alpaca reward modeling tasks. However, ICM’s limitations include dependency on concept salience within pre-trained models and ineffectiveness with long inputs due to context window constraints. As LMs advance beyond human evaluation capabilities, ICM offers promising alternatives to traditional RLHF, ensuring model alignment with human intent without human supervision boundaries.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter.

The post Internal Coherence Maximization (ICM): A Label-Free, Unsupervised Training Framework for LLMs appeared first on MarkTechPost.

Source: Read MoreÂ

CodeSOD: A Unique Way to Primary Key

BrowserStack launches Figma plugin for detecting accessibility issues in design phase

Parasoft brings agentic AI to service virtualization in latest release

Node.js vs. Python for Backend: 7 Reasons C-Level Leaders Choose Node.js Talent

The best CRM software with email marketing in 2025: Expert tested and reviewed

This multi-port car charger can power 4 gadgets at once – and it’s surprisingly cheap

I’m a wearables editor and here are the 7 Pixel Watch 4 rumors I’m most curious about

8 ways I quickly leveled up my Linux skills – and you can too

The Intersection of Agile and Accessibility – A Series on Designing for Everyone

The Intersection of Agile and Accessibility – A Series on Designing for Everyone

Zero Trust & Cybersecurity Mesh: Your Org’s Survival Guide

Execute Ping Commands and Get Back Structured Data in PHP

A Tomb Raider composer has been jailed — His legacy overshadowed by $75k+ in loan fraud

A Tomb Raider composer has been jailed — His legacy overshadowed by $75k+ in loan fraud

“I don’t think I changed his mind” — NVIDIA CEO comments on H20 AI GPU sales resuming in China following a meeting with President Trump

Galaxy Z Fold 7 review: Six years later — Samsung finally cracks the foldable code

Internal Coherence Maximization (ICM): A Label-Free, Unsupervised Training Framework for LLMs

Limitations of Human Supervision in LLM Post-Training

Introducing Internal Coherence Maximization (ICM)

How the ICM Algorithm Works

Benchmark Performance and Model Comparisons

Conclusion and Future Outlook

How to Evaluate Jailbreak Methods: A Case Study with the StrongREJECT Benchmark

Boolformer: Symbolic Regression of Logic Functions with Transformers

Oshin OS is an Arch Linux distribution

CVE-2025-5479 – Sony XAV-AX8500 Bluetooth AVCTP Protocol Heap-based Buffer Overflow Remote Code Execution

Report shows overinflated opinion of infrastructure automation excellence

CVE-2025-2817 – Mozilla Firefox System File Privilege Escalation

CVE-2025-40628 – DomainsPRO SQL Injection

CVE-2025-5734 – TOTOLINK X15 HTTP POST Request Handler Buffer Overflow

The Ampere Porting Advisor Tutorial

CVE-2025-3621 – ProTNS ActADUR Remote Code Inclusion and Command Injection

Internal Coherence Maximization (ICM): A Label-Free, Unsupervised Training Framework for LLMs

Limitations of Human Supervision in LLM Post-Training

Introducing Internal Coherence Maximization (ICM)

How the ICM Algorithm Works

Benchmark Performance and Model Comparisons

Conclusion and Future Outlook

Related Posts