Close Menu
    DevStackTipsDevStackTips
    • Home
    • News & Updates
      1. Tech & Work
      2. View All

      CodeSOD: A Unique Way to Primary Key

      July 22, 2025

      BrowserStack launches Figma plugin for detecting accessibility issues in design phase

      July 22, 2025

      Parasoft brings agentic AI to service virtualization in latest release

      July 22, 2025

      Node.js vs. Python for Backend: 7 Reasons C-Level Leaders Choose Node.js Talent

      July 21, 2025

      The best CRM software with email marketing in 2025: Expert tested and reviewed

      July 22, 2025

      This multi-port car charger can power 4 gadgets at once – and it’s surprisingly cheap

      July 22, 2025

      I’m a wearables editor and here are the 7 Pixel Watch 4 rumors I’m most curious about

      July 22, 2025

      8 ways I quickly leveled up my Linux skills – and you can too

      July 22, 2025
    • Development
      1. Algorithms & Data Structures
      2. Artificial Intelligence
      3. Back-End Development
      4. Databases
      5. Front-End Development
      6. Libraries & Frameworks
      7. Machine Learning
      8. Security
      9. Software Engineering
      10. Tools & IDEs
      11. Web Design
      12. Web Development
      13. Web Security
      14. Programming Languages
        • PHP
        • JavaScript
      Featured

      The Intersection of Agile and Accessibility – A Series on Designing for Everyone

      July 22, 2025
      Recent

      The Intersection of Agile and Accessibility – A Series on Designing for Everyone

      July 22, 2025

      Zero Trust & Cybersecurity Mesh: Your Org’s Survival Guide

      July 22, 2025

      Execute Ping Commands and Get Back Structured Data in PHP

      July 22, 2025
    • Operating Systems
      1. Windows
      2. Linux
      3. macOS
      Featured

      A Tomb Raider composer has been jailed — His legacy overshadowed by $75k+ in loan fraud

      July 22, 2025
      Recent

      A Tomb Raider composer has been jailed — His legacy overshadowed by $75k+ in loan fraud

      July 22, 2025

      “I don’t think I changed his mind” — NVIDIA CEO comments on H20 AI GPU sales resuming in China following a meeting with President Trump

      July 22, 2025

      Galaxy Z Fold 7 review: Six years later — Samsung finally cracks the foldable code

      July 22, 2025
    • Learning Resources
      • Books
      • Cheatsheets
      • Tutorials & Guides
    Home»Development»Machine Learning»Salesforce AI Introduce BingoGuard: An LLM-based Moderation System Designed to Predict both Binary Safety Labels and Severity Levels

    Salesforce AI Introduce BingoGuard: An LLM-based Moderation System Designed to Predict both Binary Safety Labels and Severity Levels

    April 2, 2025

    The advancement of large language models (LLMs) has significantly influenced interactive technologies, presenting both benefits and challenges. One prominent issue arising from these models is their potential to generate harmful content. Traditional moderation systems, typically employing binary classifications (safe vs. unsafe), lack the necessary granularity to distinguish varying levels of harmfulness effectively. This limitation can lead to either excessively restrictive moderation, diminishing user interaction, or inadequate filtering, which could expose users to harmful content.

    Salesforce AI introduces BingoGuard, an LLM-based moderation system designed to address the inadequacies of binary classification by predicting both binary safety labels and detailed severity levels. BingoGuard utilizes a structured taxonomy, categorizing potentially harmful content into eleven specific areas, including violent crime, sexual content, profanity, privacy invasion, and weapon-related content. Each category incorporates five clearly defined severity levels ranging from benign (level 0) to extreme risk (level 4). This structure enables platforms to calibrate their moderation settings precisely according to their specific safety guidelines, ensuring appropriate content management across varying severity contexts.

    From a technical perspective, BingoGuard employs a “generate-then-filter” methodology to assemble its comprehensive training dataset, BingoGuardTrain, consisting of 54,897 entries spanning multiple severity levels and content styles. This framework initially generates responses tailored to different severity tiers, subsequently filtering these outputs to ensure alignment with defined quality and relevance standards. Specialized LLMs undergo individual fine-tuning processes for each severity tier, using carefully selected and expertly audited seed datasets. This fine-tuning guarantees that generated outputs adhere closely to predefined severity rubrics. The resultant moderation model, BingoGuard-8B, leverages this meticulously curated dataset, enabling precise differentiation among various degrees of harmful content. Consequently, moderation accuracy and flexibility are significantly enhanced.

    Empirical evaluations of BingoGuard indicate strong performance. Testing against BingoGuardTest, an expert-labeled dataset comprising 988 examples, revealed that BingoGuard-8B achieves higher detection accuracy than leading moderation models such as WildGuard and ShieldGemma, with improvements of up to 4.3%. Notably, BingoGuard demonstrates superior accuracy in identifying lower-severity content (levels 1 and 2), traditionally difficult for binary classification systems. Additionally, in-depth analyses uncovered a relatively weak correlation between predicted “unsafe” probabilities and the actual severity level, underscoring the necessity of explicitly incorporating severity distinctions. These findings illustrate fundamental gaps in current moderation methods that primarily rely on binary classifications.

    In conclusion, BingoGuard enhances the precision and effectiveness of AI-driven content moderation by integrating detailed severity assessments alongside binary safety evaluations. This approach allows platforms to handle moderation with greater accuracy and sensitivity, minimizing the risks associated with both overly cautious and insufficient moderation strategies. Salesforce’s BingoGuard thus provides an improved framework for addressing the complexities of content moderation within increasingly sophisticated AI-generated interactions.


    Check out the Paper. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 85k+ ML SubReddit.

    🔥 [Register Now] miniCON Virtual Conference on OPEN SOURCE AI: FREE REGISTRATION + Certificate of Attendance + 3 Hour Short Event (April 12, 9 am- 12 pm PST) + Hands on Workshop [Sponsored]

    The post Salesforce AI Introduce BingoGuard: An LLM-based Moderation System Designed to Predict both Binary Safety Labels and Severity Levels appeared first on MarkTechPost.

    Source: Read More 

    Facebook Twitter Reddit Email Copy Link
    Previous ArticleDull Desktop? Install ‘Picture of the Day’ App on Ubuntu
    Next Article Enhancing Strategic Decision-Making in Gomoku Using Large Language Models and Reinforcement Learning

    Related Posts

    Machine Learning

    How to Evaluate Jailbreak Methods: A Case Study with the StrongREJECT Benchmark

    July 22, 2025
    Machine Learning

    Boolformer: Symbolic Regression of Logic Functions with Transformers

    July 22, 2025
    Leave A Reply Cancel Reply

    For security, use of Google's reCAPTCHA service is required which is subject to the Google Privacy Policy and Terms of Use.

    Continue Reading

    7 new features coming with the July 2025 Security Update for Windows 11

    News & Updates

    CVE-2025-24206 – Apple Local Network Authentication Bypass

    Common Vulnerabilities and Exposures (CVEs)

    What is Solarium? Everything we know about Apple’s biggest UI overhaul in a decade

    News & Updates

    5 ways to avoid spyware disguised as legit apps – before it’s too late

    News & Updates

    Highlights

    CVE-2025-6534 – Xxyopen Novel-Plus File Handler Remote Resource Identification Vulnerability

    June 23, 2025

    CVE ID : CVE-2025-6534

    Published : June 24, 2025, 1:15 a.m. | 46 minutes ago

    Description : A vulnerability, which was classified as problematic, was found in xxyopen/201206030 novel-plus up to 5.1.3. This affects the function remove of the file novel-admin/src/main/java/com/java2nb/common/controller/FileController.java of the component File Handler. The manipulation leads to improper control of resource identifiers. It is possible to initiate the attack remotely. The complexity of an attack is rather high. The exploitability is told to be difficult. The exploit has been disclosed to the public and may be used. The vendor was contacted early about this disclosure but did not respond in any way.

    Severity: 4.2 | MEDIUM

    Visit the link for more details, such as CVSS details, affected products, timeline, and more…

    CVE-2022-24067 – Apache Struts Deserialization Vulnerability

    May 28, 2025

    CVE-2025-37981 – SmartPQi Linux Kernel Memory Corruption Vulnerability

    May 20, 2025

    AI in Healthcare: Transforming Diagnosis, Treatment & Patient Care Excellence🧠

    May 20, 2025
    © DevStackTips 2025. All rights reserved.
    • Contact
    • Privacy Policy

    Type above and press Enter to search. Press Esc to cancel.