Close Menu
    DevStackTipsDevStackTips
    • Home
    • News & Updates
      1. Tech & Work
      2. View All

      CodeSOD: A Unique Way to Primary Key

      July 22, 2025

      BrowserStack launches Figma plugin for detecting accessibility issues in design phase

      July 22, 2025

      Parasoft brings agentic AI to service virtualization in latest release

      July 22, 2025

      Node.js vs. Python for Backend: 7 Reasons C-Level Leaders Choose Node.js Talent

      July 21, 2025

      The best CRM software with email marketing in 2025: Expert tested and reviewed

      July 22, 2025

      This multi-port car charger can power 4 gadgets at once – and it’s surprisingly cheap

      July 22, 2025

      I’m a wearables editor and here are the 7 Pixel Watch 4 rumors I’m most curious about

      July 22, 2025

      8 ways I quickly leveled up my Linux skills – and you can too

      July 22, 2025
    • Development
      1. Algorithms & Data Structures
      2. Artificial Intelligence
      3. Back-End Development
      4. Databases
      5. Front-End Development
      6. Libraries & Frameworks
      7. Machine Learning
      8. Security
      9. Software Engineering
      10. Tools & IDEs
      11. Web Design
      12. Web Development
      13. Web Security
      14. Programming Languages
        • PHP
        • JavaScript
      Featured

      The Intersection of Agile and Accessibility – A Series on Designing for Everyone

      July 22, 2025
      Recent

      The Intersection of Agile and Accessibility – A Series on Designing for Everyone

      July 22, 2025

      Zero Trust & Cybersecurity Mesh: Your Org’s Survival Guide

      July 22, 2025

      Execute Ping Commands and Get Back Structured Data in PHP

      July 22, 2025
    • Operating Systems
      1. Windows
      2. Linux
      3. macOS
      Featured

      A Tomb Raider composer has been jailed — His legacy overshadowed by $75k+ in loan fraud

      July 22, 2025
      Recent

      A Tomb Raider composer has been jailed — His legacy overshadowed by $75k+ in loan fraud

      July 22, 2025

      “I don’t think I changed his mind” — NVIDIA CEO comments on H20 AI GPU sales resuming in China following a meeting with President Trump

      July 22, 2025

      Galaxy Z Fold 7 review: Six years later — Samsung finally cracks the foldable code

      July 22, 2025
    • Learning Resources
      • Books
      • Cheatsheets
      • Tutorials & Guides
    Home»Development»Machine Learning»Google Researchers Advance Diagnostic AI: AMIE Now Matches or Outperforms Primary Care Physicians Using Multimodal Reasoning with Gemini 2.0 Flash

    Google Researchers Advance Diagnostic AI: AMIE Now Matches or Outperforms Primary Care Physicians Using Multimodal Reasoning with Gemini 2.0 Flash

    May 4, 2025

    LLMs have shown impressive promise in conducting diagnostic conversations, particularly through text-based interactions. However, their evaluation and application have largely ignored the multimodal nature of real-world clinical settings, especially in remote care delivery, where images, lab reports, and other medical data are routinely shared through messaging platforms. While systems like the Articulate Medical Intelligence Explorer (AMIE) have matched or surpassed primary care physicians in text-only consultations, this format falls short of reflecting telemedicine environments. Multimodal communication is essential in modern care, as patients often share photographs, documents, and other visual artifacts that cannot be fully conveyed through text alone. Limiting AI systems to textual inputs risks omitting critical clinical information, increasing diagnostic errors, and creating accessibility barriers for patients with lower health or digital literacy. Despite the widespread use of multimedia messaging apps in global healthcare, there has been little research into how LLMs can reason over such diverse data during diagnostic interactions.

    Research in diagnostic conversational agents began with rule-based systems like MYCIN, but recent developments have focused on LLMs capable of emulating clinical reasoning. While multimodal AI systems, such as vision-language models, have demonstrated success in radiology and dermatology, integrating these capabilities into conversational diagnostics remains challenging. Effective AI-based diagnostic tools must handle the complexity of multimodal reasoning and uncertainty-driven information gathering, a step beyond merely answering isolated questions. Evaluation frameworks like OSCEs and platforms such as AgentClinic provide useful starting points, yet tailored metrics are still needed to assess performance in multimodal diagnostic contexts. Moreover, while messaging apps are increasingly used in low-resource settings for sharing clinical data, concerns about data privacy, integration with formal health systems, and policy compliance persist. 

    Google DeepMind and Google Research have enhanced the AMIE with multimodal capabilities for improved conversational diagnosis and management. Using Gemini 2.0 Flash, AMIE employs a state-aware dialogue framework that adapts conversation flow based on patient state and diagnostic uncertainty, allowing strategic, structured history-taking with multimodal inputs like skin images, ECGs, and documents. AMIE outperformed or matched primary care physicians in a randomized OSCE-style study with 105 scenarios and 25 patient actors across 29 of 32 clinical metrics and 7 of 9 multimodal-specific criteria, demonstrating strong diagnostic accuracy, reasoning, communication, and empathy. 

    The study enhances the AMIE diagnostic system by incorporating multimodal perception and a state-aware dialogue framework that guides conversations through phases of history taking, diagnosis, and follow-up. Gemini 2.0 Flash powers the system and dynamically adapts based on evolving patient data, including text, images, and clinical documents. A structured patient profile and differential diagnosis are updated throughout the interaction, with targeted questions and multimodal data requests guiding clinical reasoning. Evaluation includes automated perception tests on isolated artifacts, simulated dialogues rated by auto-evaluators, and expert OSCE-style assessments, ensuring robust diagnostic performance and clinical realism. 

    The results show that the multimodal AMIE system performs at par or better than primary care physicians (PCPs) across multiple clinical tasks in simulated text-chat consultations. In OSCE-style assessments, AMIE consistently outperformed PCPs in diagnostic accuracy, especially when interpreting multimodal data such as images and clinical documents. It also demonstrated greater robustness when image quality was poor and showed fewer hallucinations. Patient actors rated AMIE’s communication skills highly, including empathy and trust. Automated evaluations confirmed that AMIE’s advanced reasoning framework, built on the Gemini 2.0 Flash model, significantly improved diagnosis and conversation quality, validating its design and effectiveness in real-world clinical scenarios. 

    In conclusion, the study advances conversational diagnostic AI by enhancing AMIE to integrate multimodal reasoning within patient dialogues. Using a novel state-aware inference-time strategy with Gemini 2.0 Flash, AMIE can interpret and reason about medical artifacts like images or ECGs in real-time clinical conversations. Evaluated through a multimodal OSCE framework, AMIE outperformed or matched primary care physicians in diagnostic accuracy, empathy, and artifact interpretation, even in complex cases. Despite limitations tied to chat-based interfaces and the need for real-world testing, these findings highlight AMIE’s potential as a robust, context-aware diagnostic assistant for future telehealth applications. 


    Check out the Paper and Technical details. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. Don’t Forget to join our 90k+ ML SubReddit. For Promotion and Partnerships, please talk us.

    🔥 [Register Now] miniCON Virtual Conference on AGENTIC AI: FREE REGISTRATION + Certificate of Attendance + 4 Hour Short Event (May 21, 9 am- 1 pm PST) + Hands on Workshop

    The post Google Researchers Advance Diagnostic AI: AMIE Now Matches or Outperforms Primary Care Physicians Using Multimodal Reasoning with Gemini 2.0 Flash appeared first on MarkTechPost.

    Source: Read More 

    Facebook Twitter Reddit Email Copy Link
    Previous ArticleBuilding AI Agents Using Agno’s Multi-Agent Teaming Framework for Comprehensive Market Analysis and Risk Reporting
    Next Article InfHow: Learn how to do anything

    Related Posts

    Machine Learning

    How to Evaluate Jailbreak Methods: A Case Study with the StrongREJECT Benchmark

    July 22, 2025
    Machine Learning

    Boolformer: Symbolic Regression of Logic Functions with Transformers

    July 22, 2025
    Leave A Reply Cancel Reply

    For security, use of Google's reCAPTCHA service is required which is subject to the Google Privacy Policy and Terms of Use.

    Continue Reading

    CVE-2025-45607 – Itranswarp Authentication Bypass Vulnerability

    Common Vulnerabilities and Exposures (CVEs)

    CVE-2025-7465 – Tenda FH1201 HTTP POST Request Handler Buffer Overflow

    Common Vulnerabilities and Exposures (CVEs)

    Google’s AI Tool Big Sleep Uncovered Critical SQLite 0-Day Vulnerability and Blocks Active Exploitation

    Security

    CVE-2024-13915 – Ulefone and Krüger&Matz Android Smartphones Factory Reset Service Remote Code Execution Vulnerability

    Common Vulnerabilities and Exposures (CVEs)

    Highlights

    CVE-2025-49278 – Blogty PHP RFI Vulnerability

    June 9, 2025

    CVE ID : CVE-2025-49278

    Published : June 9, 2025, 4:15 p.m. | 25 minutes ago

    Description : Improper Control of Filename for Include/Require Statement in PHP Program (‘PHP Remote File Inclusion’) vulnerability in Unfoldwp Blogty allows PHP Local File Inclusion. This issue affects Blogty: from n/a through 1.0.11.

    Severity: 8.1 | HIGH

    Visit the link for more details, such as CVSS details, affected products, timeline, and more…

    CVE-2025-47677 – GT3themes Photo Gallery Stored Cross-site Scripting

    May 7, 2025

    CVE-2025-2932 – JKDEVKIT WordPress Arbitrary File Deletion Vulnerability

    July 3, 2025

    CVE-2025-5907 – TOTOLINK EX1200T HTTP POST Request Handler Buffer Overflow Vulnerability

    June 9, 2025
    © DevStackTips 2025. All rights reserved.
    • Contact
    • Privacy Policy

    Type above and press Enter to search. Press Esc to cancel.