    Samsung Researchers Introduced ANSE (Active Noise Selection for Generation): A Model-Aware Framework for Improving Text-to-Video Diffusion Models through Attention-Based Uncertainty Estimation

    May 29, 2025

    Video generation models have become a core technology for creating dynamic content by transforming text prompts into high-quality video sequences. Diffusion models, in particular, have established themselves as a leading approach for this task. These models work by starting from random noise and iteratively refining it into realistic video frames. Text-to-video (T2V) models extend this capability by incorporating temporal elements and aligning generated content with textual prompts, producing videos that are both visually compelling and semantically accurate. Despite advancements in architecture design, such as latent diffusion models and motion-aware attention modules, a significant challenge remains: ensuring consistent, high-quality video generation across different runs, particularly when the only change is the initial random noise seed. This challenge has highlighted the need for smarter, model-aware noise selection strategies to avoid unpredictable outputs and wasted computational resources.

    The core problem lies in how diffusion models initialize their generation process from Gaussian noise. The specific noise seed used can drastically impact the final video quality, temporal coherence, and prompt fidelity. For example, the same text prompt might generate entirely different videos depending on the random noise seed. Current approaches often attempt to address this problem by using handcrafted noise priors or frequency-based adjustments. Methods like FreeInit and FreqPrior apply external filtering techniques, while others like PYoCo introduce structured noise patterns. However, these methods rely on assumptions that may not hold across different datasets or models, require multiple full sampling passes (resulting in high computational costs), and fail to leverage the model’s internal attention signals, which could indicate which seeds are most promising for generation. As a result, there is a need for a more principled, model-aware method that can guide noise selection without incurring heavy computational penalties or relying on handcrafted priors.

    The research team from Samsung Research introduced ANSE (Active Noise Selection for Generation), a model-aware noise selection framework for video diffusion models. ANSE addresses the noise selection problem by using internal model signals, specifically attention-based uncertainty estimates, to guide noise seed selection. At the core of ANSE is BANSA (Bayesian Active Noise Selection via Attention), a novel acquisition function that quantifies the consistency and confidence of the model’s attention maps under stochastic perturbations. The research team designed BANSA to operate efficiently during inference by approximating its calculations through Bernoulli-masked attention sampling, which introduces randomness directly into the attention computation without requiring multiple full forward passes. This stochastic method enables the model to estimate the stability of its attention behavior across different noise seeds and select those that promote more confident and coherent attention patterns, which are empirically linked to improved video quality.
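    To make the idea of Bernoulli-masked attention sampling concrete, here is a minimal pure-Python sketch for a single attention row. It is not the authors' implementation: the function names, the `keep_prob` value of 0.8, and the choice to mask the logits before the softmax are all assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bernoulli_masked_attention(scores, keep_prob, rng):
    """Return one stochastic attention row (hypothetical helper).

    Each key is kept with probability keep_prob. Dropped keys get zero
    weight and the softmax is taken over the surviving logits only.
    Masking logits rather than probabilities is an assumption here.
    """
    keep = [rng.random() < keep_prob for _ in scores]
    if not any(keep):
        # Keep at least one key so the row remains a valid distribution.
        keep[rng.randrange(len(scores))] = True
    kept_probs = iter(softmax([s for s, k in zip(scores, keep) if k]))
    return [next(kept_probs) if k else 0.0 for k in keep]


def sample_attention_rows(scores, num_samples, keep_prob=0.8, seed=0):
    """Draw num_samples (K in the article) masked rows from one set of logits."""
    rng = random.Random(seed)
    return [bernoulli_masked_attention(scores, keep_prob, rng)
            for _ in range(num_samples)]


if __name__ == "__main__":
    for row in sample_attention_rows([2.0, 1.0, 0.5, -1.0], num_samples=3):
        print([round(p, 3) for p in row])
```

    Each sampled row is a slightly different view of the same attention logits; how much those views disagree is the signal that an uncertainty score can then summarize.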

    BANSA works by evaluating entropy in the attention maps generated at specific layers during the early denoising steps. The researchers found that layer 14 for the CogVideoX-2B model and layer 19 for the CogVideoX-5B model correlated sufficiently (above a 0.7 threshold) with the full-layer uncertainty estimate, so scoring a single layer significantly reduces computational overhead. The BANSA score is computed by comparing the average entropy of individual attention maps to the entropy of their mean, where a lower BANSA score indicates higher confidence and consistency in attention patterns. This score is used to rank candidate noise seeds from a pool of 10 (M = 10), each evaluated using 10 stochastic forward passes (K = 10). The noise seed with the lowest BANSA score is then used to generate the final video, achieving improved quality without requiring model retraining or external priors.
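    The scoring and ranking step described above can be sketched as follows. This is an illustration under stated assumptions rather than the paper's code: a single attention row stands in for a full attention map, the score is written as the entropy of the mean minus the mean of the entropies (a sign convention chosen here so that identical samples score zero and disagreement raises the score), and `bansa_style_score` and `select_seed` are hypothetical names.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def bansa_style_score(attention_samples):
    """Entropy of the mean row minus the mean of the per-row entropies.

    attention_samples holds K stochastic attention rows for one noise seed.
    The sign convention is an assumption of this sketch.
    """
    k = len(attention_samples)
    width = len(attention_samples[0])
    mean_row = [sum(row[j] for row in attention_samples) / k
                for j in range(width)]
    mean_entropy = sum(entropy(row) for row in attention_samples) / k
    return entropy(mean_row) - mean_entropy


def select_seed(samples_by_seed):
    """Return the seed whose attention samples give the lowest score."""
    return min(samples_by_seed,
               key=lambda seed: bansa_style_score(samples_by_seed[seed]))


if __name__ == "__main__":
    # Toy data: seed 42 yields stable attention, seed 11 does not.
    stable = [[0.7, 0.2, 0.1]] * 3
    unstable = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]]
    print(select_seed({11: unstable, 42: stable}))
```

    In the setup the article describes, the dictionary would hold M = 10 candidate seeds with K = 10 samples each, taken from one chosen layer at an early denoising step.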

    On the CogVideoX-2B model, the total VBench score improved from 81.03 to 81.66 (+0.63), with a +0.48 gain in quality score and +1.23 gain in semantic alignment. On the larger CogVideoX-5B model, ANSE increased the total VBench score from 81.52 to 81.71 (+0.25), with a +0.17 gain in quality and +0.60 gain in semantic alignment. Notably, these improvements came with only an 8.68% increase in inference time for CogVideoX-2B and 13.78% for CogVideoX-5B. In contrast, prior methods, such as FreeInit and FreqPrior, required a 200% increase in inference time, making ANSE significantly more efficient. Qualitative evaluations further highlighted the benefits, showing that ANSE improved visual clarity, semantic consistency, and motion portrayal. For example, videos of “a koala playing the piano” and “a zebra running” showed more natural, anatomically correct motion under ANSE, while in prompts like “exploding,” ANSE-generated videos captured dynamic transitions more effectively.
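    To compare the reported overheads on a common scale, the percentage increases can be converted into runtime multipliers relative to plain sampling. The short sketch below uses only the percentages quoted above; the helper name `relative_runtime` is hypothetical.

```python
def relative_runtime(overhead_percent):
    """Convert a percentage increase in inference time into a multiplier."""
    return 1.0 + overhead_percent / 100.0


# Percentages as reported in the article.
reported_overheads = {
    "ANSE, CogVideoX-2B": 8.68,
    "ANSE, CogVideoX-5B": 13.78,
    "FreeInit / FreqPrior": 200.0,
}

if __name__ == "__main__":
    for name, percent in reported_overheads.items():
        print(f"{name}: {relative_runtime(percent):.4f}x baseline")
```

    Read this way, a 200% increase means roughly three times the baseline runtime, while ANSE stays within about 1.09x to 1.14x.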

    The research also explored different acquisition functions, comparing BANSA against random noise selection and entropy-based methods. BANSA using Bernoulli-masked attention achieved the highest total scores (81.66 for CogVideoX-2B), outperforming both random (81.03) and entropy-based methods (81.13). The study also found that increasing the number of stochastic forward passes (K) improved performance up to K = 10, beyond which the gains plateaued. Similarly, performance saturated at a noise pool size (M) of 10. A control experiment where the model intentionally selected seeds with the highest BANSA scores resulted in degraded video quality, confirming that lower BANSA scores correlate with better generation outcomes.

    While ANSE improves noise selection, it does not modify the generation process itself, meaning that some low-BANSA seeds can still result in suboptimal videos. The team acknowledged this limitation and suggested that BANSA is best viewed as a practical surrogate for more computationally intensive methods, such as per-seed sampling with post-hoc filtering. They also proposed that future work could integrate information-theoretic refinements or active learning strategies to enhance the quality of generation further.

    Several key takeaways from the research include:

    • ANSE improves total VBench scores for video generation: from 81.03 to 81.66 on CogVideoX-2B and from 81.52 to 81.71 on CogVideoX-5B.
    • Quality and semantic alignment gains are +0.48 and +1.23 for CogVideoX-2B, and +0.17 and +0.60 for CogVideoX-5B, respectively.
    • Inference time increases are modest: +8.68% for CogVideoX-2B and +13.78% for CogVideoX-5B.
    • BANSA scores derived from Bernoulli-masked attention outperform random and entropy-based methods for noise selection.
    • The layer selection strategy reduces computational load by computing uncertainty at layers 14 and 19 for CogVideoX-2B and CogVideoX-5B, respectively.
    • ANSE achieves efficiency by avoiding multiple full sampling passes, in contrast to methods like FreeInit, which require 200% more inference time.
    • A control experiment that deliberately selected the highest-BANSA seeds degraded video quality, supporting low BANSA scores as a seed selection criterion, although individual low-BANSA seeds can still produce suboptimal videos.

    In conclusion, the research tackled the challenge of unpredictable video generation in diffusion models by introducing a model-aware noise selection framework that leverages internal attention signals. By quantifying uncertainty through BANSA and selecting noise seeds that minimize this uncertainty, the researchers provided a principled, efficient method for improving video quality and semantic alignment in text-to-video models. ANSE’s design, which combines attention-based uncertainty estimation with computational efficiency, enables it to scale across different model sizes without incurring significant runtime costs, providing a practical solution for enhancing video generation in T2V systems.


    Check out the Paper and Project Page. All credit for this research goes to the researchers of this project.

    The post Samsung Researchers Introduced ANSE (Active Noise Selection for Generation): A Model-Aware Framework for Improving Text-to-Video Diffusion Models through Attention-Based Uncertainty Estimation appeared first on MarkTechPost.
