    Garbage in, garbage out: The importance of data quality when training AI models

    June 2, 2025

    As companies move to implement AI in some form or another, data is king. Without quality data to train on, a model likely won’t deliver the results people are looking for, and the investment made in training it won’t pay off as intended.

    “If you’re training your AI model on poor quality data, you’re likely to get bad results,” explained Robert Stanley, senior director of special projects at Melissa. 

    According to Stanley, there are a number of data quality best practices to stick to when it comes to training data. “You need to have data that is of good quality, which means it’s properly typed, it’s fielded correctly, it’s deduplicated, and it’s rich. It’s accurate, complete and augmented or well-defined with lots of useful metadata, so that there’s context for the AI model to work off of,” he said. 

    If the training data does not meet those standards, it’s likely that the outputs of the AI model won’t be reliable, Stanley explained. For instance, if data has the wrong fields, then the model might start giving strange and unexpected outputs. “It thinks it’s giving you a noun, but it’s really a verb. Or it thinks it’s giving you a number, but it’s really a string because it’s fielded incorrectly,” he said. 
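The checks Stanley lists (properly typed, correctly fielded, deduplicated) can be sketched in a few lines of Python. This is a minimal illustration rather than Melissa's method: the `SCHEMA` fields, the sample rows, and the helper names are all assumptions made up for the example.

```python
# Hypothetical schema: field name -> expected Python type (an assumption for illustration).
SCHEMA = {"name": str, "postal_code": str, "age": int}


def type_errors(record: dict) -> list[str]:
    """Return the names of fields that are missing or have the wrong type."""
    return [
        field
        for field, expected in SCHEMA.items()
        if not isinstance(record.get(field), expected)
    ]


def normalize_key(record: dict) -> tuple:
    """Build a case- and whitespace-insensitive key used to spot duplicates."""
    return tuple(str(record.get(f, "")).strip().lower() for f in SCHEMA)


def clean(records: list[dict]) -> tuple[list[dict], list[tuple[dict, list[str]]]]:
    """Split records into (kept, rejected); rejected rows carry their bad fields."""
    kept, rejected, seen = [], [], set()
    for record in records:
        errors = type_errors(record)
        if errors:
            rejected.append((record, errors))
            continue
        key = normalize_key(record)
        if key in seen:  # duplicate after normalization: skip it
            continue
        seen.add(key)
        kept.append(record)
    return kept, rejected


# Made-up sample rows: one clean, one duplicate, one mis-typed.
rows = [
    {"name": "Pat Example", "postal_code": "00000", "age": 36},
    {"name": "pat example ", "postal_code": "00000", "age": 36},  # duplicate of row 1
    {"name": "Sam Sample", "postal_code": "11111", "age": "41"},  # age is a string
]
kept, rejected = clean(rows)
```

Here the mis-typed row is set aside with the name of the offending field instead of being silently coerced, which mirrors the "it thinks it's a number, but it's really a string" problem described above.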

    It’s also important to ensure that you have the right kind of data that is appropriate to the model you are trying to build, whether that be business data or contact data or health care data. 

    “I would just sort of be going down these data quality steps that would be recommended before you even start your AI project,” he said. Melissa’s “Gold Standard” for any business-critical data is to use data that comes from at least three different sources and is dynamically updated.

    According to Stanley, large language models (LLMs) tend to want to please their users, which sometimes means giving answers that look compelling but are actually incorrect.

    This is why the data quality process doesn’t stop after training; it’s important to continue testing the model’s outputs to ensure that its responses are what you’d expect to see. 

    “You can ask questions of the model and then check the answers by comparing it back to the reference data and making sure it’s matching your expectations, like they’re not mixing up names and addresses or anything like that,” Stanley explained.
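The post-training check Stanley describes, asking questions and comparing the answers to reference data, might look roughly like the sketch below. Everything here is hypothetical: `fake_model` stands in for whatever model call you actually use, and the reference questions and answers are invented for illustration.

```python
from typing import Callable


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences do not count."""
    return " ".join(text.lower().split())


def find_mismatches(
    ask_model: Callable[[str], str],
    reference: dict[str, str],
) -> list[tuple[str, str, str]]:
    """Ask each reference question and return (question, expected, actual) for misses."""
    mismatches = []
    for question, expected in reference.items():
        actual = ask_model(question)
        if normalize(actual) != normalize(expected):
            mismatches.append((question, expected, actual))
    return mismatches


# Stand-in for a real model call: it returns a name where an address is expected.
def fake_model(question: str) -> str:
    canned = {
        "Who is the contact for account 1?": "Pat Example",
        "What is the address for account 1?": "Pat Example",  # wrong: a name, not an address
    }
    return canned[question]


# Invented reference data: question -> expected answer.
reference = {
    "Who is the contact for account 1?": "Pat Example",
    "What is the address for account 1?": "1 Sample Street",
}
mismatches = find_mismatches(fake_model, reference)
```

Exact matching after normalization is the simplest possible comparison; free-form model answers would usually need a looser match, but the loop structure stays the same.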

    For instance, Melissa has curated reference datasets covering geographic, business, identification, and other domains. Its informatics division uses ontological reasoning, built on formal semantic technologies, to compare AI results against the results expected from real-world models.

    The post Garbage in, garbage out: The importance of data quality when training AI models appeared first on SD Times.
