    Tiny Models, Big Reasoning Gains: USC Researchers Introduce Tina for Cost-Effective Reinforcement Learning with LoRA

    April 28, 2025

Achieving strong multi-step reasoning in language models (LMs) remains a major challenge, despite notable progress in general task performance. Such reasoning is crucial for complex problem-solving domains such as scientific research and strategic planning. Traditionally, enhancing reasoning skills involves supervised fine-tuning (SFT), where models learn by imitating step-by-step reasoning demonstrations from more advanced models, such as o1. While effective, this method depends heavily on the availability of high-quality reasoning traces, which are costly to obtain and risk promoting shallow mimicry over genuine logical exploration. Reinforcement learning (RL) offers an alternative by enabling models to learn directly from reward signals, encouraging broader reasoning exploration. However, RL approaches are often resource-heavy and complex, raising the question of how to build reasoning-capable models cost-effectively.

    Following the release of strong models like o1-preview, several open-source efforts such as STILL, Sky-T1, SimpleRL, PRIME, and DeepScaleR have explored efficient strategies to replicate or surpass o1’s reasoning capabilities. Techniques include lightweight imitation learning, scalable instruction tuning, and simplified RL methods. Meanwhile, newer innovations, such as Group Relative Policy Optimization (GRPO), enhance RL training efficiency by eliminating the need for separate value networks, as seen in models like DeepSeek-R1. To further lower training costs, researchers are also investigating Low-Rank Adaptation (LoRA) methods, which update only a small subset of model parameters, maintaining modularity while preserving reasoning ability. This approach enables efficient fine-tuning without the computational demands of full-parameter updates.
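The key idea behind GRPO can be sketched in a few lines. This is an illustrative, simplified view of the group-relative advantage computation (not the authors' implementation): for each prompt, a group of completions is sampled and scored with a reward function, and each reward is normalized against the group's own mean and standard deviation — which is why no separate value network is needed.

```python
import statistics

def grpo_advantages(group_rewards):
    """Map a group of scalar rewards (one per sampled completion for the
    same prompt) to group-relative advantages, as in GRPO-style training."""
    mean = statistics.mean(group_rewards)
    std = statistics.pstdev(group_rewards) or 1.0  # guard against zero std
    return [(r - mean) / std for r in group_rewards]

# e.g. a verifiable-answer reward: 1 if the completion's final answer is
# correct, 0 otherwise, for a group of four samples.
rewards = [1.0, 0.0, 0.0, 1.0]
print(grpo_advantages(rewards))  # [1.0, -1.0, -1.0, 1.0]
```

Correct completions get positive advantages and incorrect ones negative, purely relative to their own group — the normalization replaces the learned baseline a value network would otherwise provide.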

    Researchers from the University of Southern California introduce Tina, a family of compact reasoning models that achieve strong performance with minimal cost. Using RL enhanced by LoRA on a 1.5B parameter base model, Tina models outperform or match state-of-the-art models at a fraction of the computational expense. Their best model improves reasoning performance by over 20% and achieves 43.33% Pass@1 on AIME24, with a post-training cost of just $9. By leveraging LoRA’s efficiency to adapt reasoning formats while preserving base knowledge, Tina highlights a highly accessible, cost-effective approach, with all resources fully open-sourced.
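For readers unfamiliar with the headline metric, here is a minimal sketch of Pass@1 as it is conventionally defined (an assumption about the standard metric, not the authors' evaluation code): one sampled answer per problem, scored as the fraction of problems whose single sample is correct.

```python
def pass_at_1(results):
    """results: list of booleans, one sampled attempt per problem.
    Returns the fraction of problems solved on the first try."""
    return sum(results) / len(results)

# AIME has 30 problems; solving 13 of 30 yields the 43.33% figure cited above.
print(round(100 * pass_at_1([True] * 13 + [False] * 17), 2))  # 43.33
```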

Tina is a family of tiny reasoning models built by post-training the DeepSeek-R1-Distill-Qwen-1.5B model using LoRA during reinforcement learning with a GRPO-style approach. The framework emphasizes minimalism: tiny models, small parameter updates, and a low hardware and budget footprint. Tina models were trained on public datasets, replicating the setups of models like STILL-3, DeepScaleR, and Open-RS. Training leveraged the OpenR1 codebase, minimal hyperparameter tuning, and just two NVIDIA L40S GPUs (occasionally RTX 6000 Ada GPUs). Training and evaluation costs were low, averaging well under $100 per experiment, making Tina a highly accessible platform for reasoning research.
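The cost savings follow directly from LoRA's parameter arithmetic. A hedged sketch with illustrative numbers (the hidden size and rank below are assumptions, not the exact Tina configuration): instead of updating a full d_out x d_in weight matrix, LoRA trains two thin matrices B (d_out x r) and A (r x d_in) and adds their scaled product (alpha / r) * B @ A to the frozen weight, so the trainable parameter count scales with the rank r rather than with the matrix dimensions.

```python
def lora_trainable_params(d_out, d_in, r):
    """Trainable parameters LoRA adds for one d_out x d_in weight matrix:
    B is d_out x r and A is r x d_in; the original weight stays frozen."""
    return d_out * r + r * d_in

d_out = d_in = 1536  # a plausible hidden size for a ~1.5B model (assumed)
full = d_out * d_in
lora = lora_trainable_params(d_out, d_in, r=32)
print(full, lora, f"{lora / full:.1%}")  # LoRA trains ~4% of this layer
```

Multiplied across every adapted layer, this is why a full RL post-training run can fit on two commodity GPUs and a single-digit dollar budget.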

    To ensure fair comparisons, the authors reevaluated baseline reasoning models using a consistent setup with the LightEval framework and vLLM engine, thereby eliminating variations introduced by previous studies. Six reasoning benchmarks, including AIME 24/25, AMC 23, MATH 500, GPQA, and Minerva, were utilized. They then evaluated Tina models—small, LoRA-trained versions of baseline models—showing that Tina models often outperformed their full-parameter counterparts despite using minimal training (19–57% of an epoch). Further ablation studies revealed that smaller, high-quality datasets, appropriate learning rates, moderate LoRA ranks, and careful choice of RL algorithm significantly impacted performance, confirming the efficiency and robustness of their LoRA-based reasoning approach.

In conclusion, Tina is a series of lightweight reasoning models that achieve strong performance using minimal computational resources. By applying LoRA during RL on a 1.5B-parameter base model, the researchers achieve reasoning abilities competitive with larger state-of-the-art models at a post-training cost of just $9. Tina models show over a 20% improvement in reasoning and 43.33% Pass@1 accuracy on AIME24. While showcasing impressive cost-performance efficiency, limitations remain, including the smaller model scale, limited diversity in reasoning tasks, and minimal hyperparameter tuning. All code, logs, and model checkpoints are open-sourced to promote accessible research and further exploration.


Check out the Paper and GitHub Page.


    The post Tiny Models, Big Reasoning Gains: USC Researchers Introduce Tina for Cost-Effective Reinforcement Learning with LoRA appeared first on MarkTechPost.
