Close Menu
    DevStackTipsDevStackTips
    • Home
    • News & Updates
      1. Tech & Work
      2. View All

      CodeSOD: A Unique Way to Primary Key

      July 22, 2025

      BrowserStack launches Figma plugin for detecting accessibility issues in design phase

      July 22, 2025

      Parasoft brings agentic AI to service virtualization in latest release

      July 22, 2025

      Node.js vs. Python for Backend: 7 Reasons C-Level Leaders Choose Node.js Talent

      July 21, 2025

      The best CRM software with email marketing in 2025: Expert tested and reviewed

      July 22, 2025

      This multi-port car charger can power 4 gadgets at once – and it’s surprisingly cheap

      July 22, 2025

      I’m a wearables editor and here are the 7 Pixel Watch 4 rumors I’m most curious about

      July 22, 2025

      8 ways I quickly leveled up my Linux skills – and you can too

      July 22, 2025
    • Development
      1. Algorithms & Data Structures
      2. Artificial Intelligence
      3. Back-End Development
      4. Databases
      5. Front-End Development
      6. Libraries & Frameworks
      7. Machine Learning
      8. Security
      9. Software Engineering
      10. Tools & IDEs
      11. Web Design
      12. Web Development
      13. Web Security
      14. Programming Languages
        • PHP
        • JavaScript
      Featured

      The Intersection of Agile and Accessibility – A Series on Designing for Everyone

      July 22, 2025
      Recent

      The Intersection of Agile and Accessibility – A Series on Designing for Everyone

      July 22, 2025

      Zero Trust & Cybersecurity Mesh: Your Org’s Survival Guide

      July 22, 2025

      Execute Ping Commands and Get Back Structured Data in PHP

      July 22, 2025
    • Operating Systems
      1. Windows
      2. Linux
      3. macOS
      Featured

      A Tomb Raider composer has been jailed — His legacy overshadowed by $75k+ in loan fraud

      July 22, 2025
      Recent

      A Tomb Raider composer has been jailed — His legacy overshadowed by $75k+ in loan fraud

      July 22, 2025

      “I don’t think I changed his mind” — NVIDIA CEO comments on H20 AI GPU sales resuming in China following a meeting with President Trump

      July 22, 2025

      Galaxy Z Fold 7 review: Six years later — Samsung finally cracks the foldable code

      July 22, 2025
    • Learning Resources
      • Books
      • Cheatsheets
      • Tutorials & Guides
    Home»Development»Machine Learning»Meta AI Introduces Token-Shuffle: A Simple AI Approach to Reducing Image Tokens in Transformers

    Meta AI Introduces Token-Shuffle: A Simple AI Approach to Reducing Image Tokens in Transformers

    April 26, 2025

    Autoregressive (AR) models have made significant advances in language generation and are increasingly explored for image synthesis. However, scaling AR models to high-resolution images remains a persistent challenge. Unlike text, where relatively few tokens are required, high-resolution images necessitate thousands of tokens, leading to quadratic growth in computational cost. As a result, most AR-based multimodal models are constrained to low or medium resolutions, limiting their utility for detailed image generation. While diffusion models have shown strong performance at high resolutions, they come with their own limitations, including complex sampling procedures and slower inference. Addressing the token efficiency bottleneck in AR models remains an important open problem for enabling scalable and practical high-resolution image synthesis.

    Meta AI Introduces Token-Shuffle

    Meta AI introduces Token-Shuffle, a method designed to reduce the number of image tokens processed by Transformers without altering the fundamental next-token prediction reach. The key insight underpinning Token-Shuffle is the recognition of dimensional redundancy in visual vocabularies used by multimodal large language models (MLLMs). Visual tokens, typically derived from vector quantization (VQ) models, occupy high-dimensional spaces but carry a lower intrinsic information density compared to text tokens. Token-Shuffle exploits this by merging spatially local visual tokens along the channel dimension before Transformer processing and subsequently restoring the original spatial structure after inference. This token fusion mechanism allows AR models to handle higher resolutions with significantly reduced computational cost while maintaining visual fidelity.

    Technical Details and Benefits

    Token-Shuffle consists of two operations: token-shuffle and token-unshuffle. During input preparation, spatially neighboring tokens are merged using an MLP to form a compressed token that preserves essential local information. For a shuffle window size sss, the number of tokens is reduced by a factor of s2s^2s2, leading to a substantial reduction in Transformer FLOPs. After the Transformer layers, the token-unshuffle operation reconstructs the original spatial arrangement, again assisted by lightweight MLPs.

    By compressing token sequences during Transformer computation, Token-Shuffle enables the efficient generation of high-resolution images, including those at 2048×2048 resolution. Importantly, this approach does not require modifications to the Transformer architecture itself, nor does it introduce auxiliary loss functions or pretraining of additional encoders.

    Furthermore, the method integrates a classifier-free guidance (CFG) scheduler specifically adapted for autoregressive generation. Rather than applying a fixed guidance scale across all tokens, the scheduler progressively adjusts guidance strength, minimizing early token artifacts and improving text-image alignment.

    Results and Empirical Insights

    Token-Shuffle was evaluated on two major benchmarks: GenAI-Bench and GenEval. On GenAI-Bench, using a 2.7B parameter LLaMA-based model, Token-Shuffle achieved a VQAScore of 0.77 on “hard” prompts, outperforming other autoregressive models such as LlamaGen by a margin of +0.18 and diffusion models like LDM by +0.15. In the GenEval benchmark, it attained an overall score of 0.62, setting a new baseline for AR models operating in the discrete token regime.

    Large-scale human evaluation further supported these findings. Compared to LlamaGen, Lumina-mGPT, and diffusion baselines, Token-Shuffle showed improved alignment with textual prompts, reduced visual flaws, and higher subjective image quality in most cases. However, minor degradation in logical consistency was observed relative to diffusion models, suggesting avenues for further refinement.

    In terms of visual quality, Token-Shuffle demonstrated the capability to produce detailed and coherent 1024×1024 and 2048×2048 images. Ablation studies revealed that smaller shuffle window sizes (e.g., 2×2) offered the best trade-off between computational efficiency and output quality. Larger window sizes provided additional speedups but introduced minor losses in fine-grained detail.

    Conclusion

    Token-Shuffle presents a straightforward and effective method to address the scalability limitations of autoregressive image generation. By leveraging the inherent redundancy in visual vocabularies, it achieves substantial reductions in computational cost while preserving, and in some cases improving, generation quality. The method remains fully compatible with existing next-token prediction frameworks, making it easy to integrate into standard AR-based multimodal systems.

    The results demonstrate that Token-Shuffle can push AR models beyond prior resolution limits, making high-fidelity, high-resolution generation more practical and accessible. As research continues to advance scalable multimodal generation, Token-Shuffle provides a promising foundation for efficient, unified models capable of handling text and image modalities at large scales.


    Check out the Paper. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. Don’t Forget to join our 90k+ ML SubReddit.

    🔥 [Register Now] miniCON Virtual Conference on AGENTIC AI: FREE REGISTRATION + Certificate of Attendance + 4 Hour Short Event (May 21, 9 am- 1 pm PST) + Hands on Workshop

    The post Meta AI Introduces Token-Shuffle: A Simple AI Approach to Reducing Image Tokens in Transformers appeared first on MarkTechPost.

    Source: Read More 

    Facebook Twitter Reddit Email Copy Link
    Previous ArticleDoris is a modern data warehouse for real-time analytics
    Next Article AgentA/B: A Scalable AI System Using LLM Agents that Simulate Real User Behavior to Transform Traditional A/B Testing on Live Web Platforms

    Related Posts

    Machine Learning

    How to Evaluate Jailbreak Methods: A Case Study with the StrongREJECT Benchmark

    July 22, 2025
    Machine Learning

    Boolformer: Symbolic Regression of Logic Functions with Transformers

    July 22, 2025
    Leave A Reply Cancel Reply

    For security, use of Google's reCAPTCHA service is required which is subject to the Google Privacy Policy and Terms of Use.

    Continue Reading

    Hiring an AI Developer? Here’s What You Need to Know Before You Start

    Web Development

    CVE-2025-24270 – Apple macOS Network Information Leakage Vulnerability

    Common Vulnerabilities and Exposures (CVEs)

    DistroWatch Weekly, Issue 1123

    News & Updates

    CVE-2025-4261 – GAIR-NLP Factool Code Injection Vulnerability

    Common Vulnerabilities and Exposures (CVEs)

    Highlights

    CVE-2025-7431 – WordPress Knowledge Base Stored Cross-Site Scripting Vulnerability

    July 18, 2025

    CVE ID : CVE-2025-7431

    Published : July 18, 2025, 2:15 a.m. | 3 hours, 59 minutes ago

    Description : The Knowledge Base plugin for WordPress is vulnerable to Stored Cross-Site Scripting via the plugin slug setting in all versions up to, and including, 2.3.1 due to insufficient input sanitization and output escaping. This makes it possible for authenticated attackers, with administrator-level access, to inject arbitrary web scripts in pages that will execute whenever a user accesses an injected page. This only affects multi-site installations and installations where unfiltered_html has been disabled.

    Severity: 4.4 | MEDIUM

    Visit the link for more details, such as CVSS details, affected products, timeline, and more…

    CVE-2025-27197 – Adobe Lightroom Out-of-Bounds Write Arbitrary Code Execution Vulnerability

    May 13, 2025
    Rilasciato OpenSSL 3.5: Importanti aggiornamenti crittografici e novità per la sicurezza

    Rilasciato OpenSSL 3.5: Importanti aggiornamenti crittografici e novità per la sicurezza

    April 9, 2025

    8 Comic-Inspired Snippets Powered by CSS & JavaScript

    April 13, 2025
    © DevStackTips 2025. All rights reserved.
    • Contact
    • Privacy Policy

    Type above and press Enter to search. Press Esc to cancel.