    NVIDIA AI Releases Introduce UltraLong-8B: A Series of Ultra-Long Context Language Models Designed to Process Extensive Sequences of Text (up to 1M, 2M, and 4M tokens)

    April 13, 2025

    Large language models (LLMs) have shown remarkable performance across diverse text and multimodal tasks. However, many applications, such as document and video understanding, in-context learning, and inference-time scaling, demand the ability to process and reason over long sequences of tokens. The limited context window of LLMs poses a significant challenge in these situations: critical information spread across lengthy documents or videos can fall outside the fixed context window and be overlooked. This limitation creates a need for models that can efficiently handle ultra-long contexts without sacrificing performance on standard tasks.

    Existing context extension strategies for long-context language models fall into three categories: exact attention methods, approximate attention methods, and approaches that incorporate additional modules. Methods like Position Interpolation, NTK-aware scaling, Dynamic NTK, YaRN, and CLEX extend the usable context by redesigning the position embeddings. Recent advancements include models like GPT-4o, Gemini, and Claude that support context windows of hundreds of thousands of tokens, but their closed-source nature limits reproducibility. Open-source efforts like ProLong use NTK-aware scaling but require expensive computation, while Gradient uses continued pretraining that degrades standard task performance.
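
    Position Interpolation and NTK-aware scaling, mentioned above, both rework the rotary position embedding (RoPE) frequencies so that a model trained on a short window can attend over a longer one. The following minimal sketch is illustrative rather than taken from any of the cited papers; the function names and the exact NTK base adjustment are assumptions.

```python
import torch

def rope_angles(positions: torch.Tensor, dim: int, base: float = 10000.0) -> torch.Tensor:
    """Standard RoPE angles: one rotation frequency per pair of embedding dimensions."""
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    return torch.outer(positions.float(), inv_freq)  # shape (seq_len, dim // 2)

def position_interpolation_angles(positions, dim, train_len, target_len):
    """Position Interpolation: squeeze target positions back into the trained range."""
    scale = train_len / target_len                    # < 1 when extending the context
    return rope_angles(positions.float() * scale, dim)

def ntk_aware_angles(positions, dim, train_len, target_len, base=10000.0):
    """NTK-aware scaling: keep positions, enlarge the rotary base instead."""
    s = target_len / train_len
    new_base = base * s ** (dim / (dim - 2))          # common NTK-aware base adjustment
    return rope_angles(positions, dim, new_base)

# Example: extend a model trained on 4K tokens to an 8K window
pos = torch.arange(8192)
print(position_interpolation_angles(pos, 128, train_len=4096, target_len=8192).shape)
print(ntk_aware_angles(pos, 128, train_len=4096, target_len=8192).shape)
```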

    Researchers from UIUC and NVIDIA have proposed an efficient training recipe for building ultra-long context LLMs from aligned instruct models, pushing context lengths from 128K to 1M, 2M, and 4M tokens. The method uses efficient continued pretraining to extend the context window and instruction tuning to preserve instruction-following and reasoning abilities. The resulting UltraLong-8B model achieves state-of-the-art performance across diverse long-context benchmarks while remaining competitive on standard benchmarks, showing balanced improvements on long- and short-context tasks. The research also provides an in-depth analysis of key design choices, highlighting the impact of scaling strategies and data composition.

    The proposed method consists of two key stages: continued pretraining and instruction tuning. Together, these stages enable effective processing of ultra-long inputs while maintaining strong performance across tasks. For context extension, a YaRN-based scaling approach is adopted with fixed hyperparameters α = 1 and β = 4, rather than NTK-aware scaling strategies. The scale factors are computed from the target context length, and larger scaling factors are applied to the RoPE embeddings to accommodate extended sequences and mitigate performance degradation at the maximum lengths. For instruction-tuning data, the researchers subsample high-quality SFT datasets spanning general, mathematics, and code domains, and use GPT-4o and GPT-4o-mini to refine responses and perform rigorous data decontamination.
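
    To make the scaling step concrete, here is a minimal sketch of how a YaRN-style adjustment of the RoPE inverse frequencies could look with the fixed hyperparameters reported above (α = 1, β = 4). The intuition is that low-frequency dimensions, which complete fewer than α rotations over the original window, are fully interpolated by the scale factor, while high-frequency dimensions (more than β rotations) keep their original frequencies, with a linear ramp in between. This is an illustrative reading of YaRN, not the authors' code: the exact ramp formulation, variable names, and the attention-temperature term YaRN also defines are simplified or omitted.

```python
import math
import torch

def yarn_inv_freq(dim: int,
                  orig_ctx: int,
                  target_ctx: int,
                  base: float = 10000.0,
                  alpha: float = 1.0,
                  beta: float = 4.0) -> torch.Tensor:
    s = target_ctx / orig_ctx                               # scale factor from target length
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    # rotations each dimension completes over the original context window
    rotations = orig_ctx * inv_freq / (2 * math.pi)
    # ramp: 0 -> fully interpolate (divide by s), 1 -> keep the original frequency
    ramp = ((rotations - alpha) / (beta - alpha)).clamp(0.0, 1.0)
    return inv_freq / s * (1 - ramp) + inv_freq * ramp

# Example: extend a 128K-token model to a 1M-token window (scale factor ~8x)
inv_freq = yarn_inv_freq(dim=128, orig_ctx=131072, target_ctx=1048576)
print(inv_freq.shape)  # torch.Size([64])
```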

    The proposed models show superior long-context retrieval in the Needle-in-a-Haystack passkey retrieval test. Baseline models like Llama-3-8B-Instruct-Gradient-1048k pass the test, while Llama-3.1-8B-Instruct and Llama-3-8B-ProLong-512k-Instruct show errors. In contrast, the UltraLong models achieve 100% accuracy across all input lengths and depths, demonstrating strong retrieval capability. The UltraLong models also achieve the highest average scores on RULER for inputs up to 512K and 1M tokens, the highest F1 scores on LV-Eval within 128K and 256K token lengths, and the best performance on InfiniteBench. Moreover, they maintain strong performance across general, math, and code domains, with average scores of 62.47, 61.06, and 60.95, exceeding the base model's 61.45.
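
    For readers unfamiliar with the Needle-in-a-Haystack passkey test referenced above, the sketch below shows the general shape of such a probe: a random passkey is buried at a chosen depth inside filler text and the model is asked to repeat it. The filler sentence, prompt wording, and scoring here are illustrative assumptions, not the benchmark's exact protocol.

```python
import random

FILLER = "The grass is green. The sky is blue. The sun is yellow. Here we go. "

def build_passkey_prompt(context_tokens: int, depth: float, chars_per_token: int = 4):
    """Hide a passkey at a relative depth (0.0 = start, 1.0 = end) inside filler text."""
    passkey = str(random.randint(10000, 99999))
    needle = f" The pass key is {passkey}. Remember it. "
    haystack_chars = context_tokens * chars_per_token
    haystack = (FILLER * (haystack_chars // len(FILLER) + 1))[:haystack_chars]
    pos = int(len(haystack) * depth)
    prompt = (haystack[:pos] + needle + haystack[pos:]
              + "\nWhat is the pass key? The pass key is")
    return prompt, passkey

def is_correct(model_output: str, passkey: str) -> bool:
    """Score a run as correct if the passkey appears verbatim in the model output."""
    return passkey in model_output

prompt, key = build_passkey_prompt(context_tokens=1_000_000, depth=0.5)
print(len(prompt), key)
```

    Sweeping context_tokens over the target lengths and depth over [0, 1] yields the length-by-depth accuracy grid that such retrieval tests typically report.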

    This research introduces an efficient and systematic training recipe for ultra-long context language models, extending context windows to 1M, 2M, and 4M tokens while maintaining competitive performance on standard benchmarks. The approach combines efficient continued pretraining with instruction tuning to enhance both long-context understanding and instruction-following capabilities. However, the instruction tuning stage relies only on SFT over instruction datasets, without exploring reinforcement learning or preference optimization, and the work does not address safety alignment. Future research directions include integrating safety alignment mechanisms and exploring advanced tuning strategies to further enhance performance and trustworthiness.


    Check out the Paper and Model on Hugging Face. All credit for this research goes to the researchers of this project.

    Source: MarkTechPost