Machine Learning

SpeakStream: Streaming Text-to-Speech with Interleaved Data

May 29, 2025

With the increasing integration of speech front-ends and large language models (LLMs), there is a need to explore architectures that…

Machine Learning

National University of Singapore Researchers Introduce Dimple: A Discrete Diffusion Multimodal Language Model for Efficient and Controllable Text Generation

May 29, 2025

In recent months, there has been growing interest in applying diffusion models—originally designed for continuous data, such as images—to natural…

Machine Learning

This AI Paper Introduces WEB-SHEPHERD: A Process Reward Model for Web Agents with 40K Dataset and 10× Cost Efficiency

May 29, 2025

Web navigation focuses on teaching machines how to interact with websites to perform tasks such as searching for information, shopping,…

Interleaved Reasoning for Large Language Models via Reinforcement Learning

May 28, 2025

Long chain-of-thought (CoT) significantly enhances large language models’ (LLM) reasoning capabilities. However, the extensive reasoning traces lead to inefficiencies and…

Foundation Model Hidden Representations for Heart Rate Estimation from Auscultation

May 28, 2025

Auscultation, particularly heart sound, is a non-invasive technique that provides essential vital sign information. Recently, self-supervised acoustic representation foundation…

Machine Learning

LLMs Can Now Reason Beyond Language: Researchers Introduce Soft Thinking to Replace Discrete Tokens with Continuous Concept Embeddings

May 28, 2025

Human reasoning naturally operates through abstract, non-verbal concepts rather than strictly relying on discrete linguistic tokens. However, current LLMs are…

A Coding Implementation to Build an Interactive Transcript and PDF Analysis with Lyzr Chatbot Framework

May 28, 2025

In this tutorial, we introduce a streamlined approach for extracting, processing, and analyzing YouTube video transcripts using Lyzr, an advanced…

How Rufus doubled their inference speed and handled Prime Day traffic with AWS AI chips and parallel decoding

May 28, 2025

Large language models (LLMs) have revolutionized the way we interact with technology, but their widespread adoption has been blocked by…

Machine Learning

Tailoring foundation models for your business needs: A comprehensive guide to RAG, fine-tuning, and hybrid approaches

May 28, 2025

Foundation models (FMs) have revolutionized AI capabilities, but adopting them for specific business needs can be challenging. Organizations often struggle…

Machine Learning

Building a multimodal RAG based application using Amazon Bedrock Data Automation and Amazon Bedrock Knowledge Bases

May 28, 2025

Organizations today deal with vast amounts of unstructured data in various formats including documents, images, audio files, and video files.…

Machine Learning

Gemma 3 27B model now available on Amazon Bedrock Marketplace and Amazon SageMaker JumpStart

May 28, 2025

We are excited to announce the availability of Gemma 3 27B Instruct models through Amazon Bedrock Marketplace and Amazon SageMaker…

Machine Learning

A generative AI prototype with Amazon Bedrock transforms life sciences and the genome analysis process

May 28, 2025

It takes biopharma companies over 10 years, at a cost of over $2 billion and with a failure rate of…

Machine Learning

Part 3: Building an AI-powered assistant for investment research with multi-agent collaboration in Amazon Bedrock and Amazon Bedrock Data Automation

May 28, 2025

In the financial services industry, analysts need to switch between structured data (such as time-series pricing information), unstructured text (such…

Machine Learning

Incorrect Answers Improve Math Reasoning? Reinforcement Learning with Verifiable Rewards (RLVR) Surprises with Qwen2.5-Math

May 28, 2025

In natural language processing (NLP), RL methods, such as reinforcement learning from human feedback (RLHF), have been utilized to enhance…

This AI Paper Introduces MMaDA: A Unified Multimodal Diffusion Model for Textual Reasoning, Visual Understanding, and Image Generation

May 28, 2025

Diffusion models, known for their success in generating high-quality images, are now being explored as a foundation for handling diverse…

CLIP-UP: A Simple and Efficient Mixture-of-Experts CLIP Training Recipe with Sparse Upcycling

May 27, 2025

Mixture-of-Experts (MoE) models are crucial for scaling model capacity while controlling inference costs. While integrating MoE into multimodal models like…

Machine Learning

GuardianGamer scales family-safe cloud gaming with AWS

May 27, 2025

This blog post is co-written with Heidi Vogel Brockmann and Ronald Brockmann from GuardianGamer. Millions of families face a common…

Machine Learning

Meta AI Introduces Multi-SpatialMLLM: A Multi-Frame Spatial Understanding with Multi-modal Large Language Models

May 27, 2025

Multi-modal large language models (MLLMs) have shown great progress as versatile AI assistants capable of handling diverse visual tasks. However,…

Machine Learning

New Amazon Bedrock Data Automation capabilities streamline video and audio analysis

May 27, 2025

Organizations across a wide range of industries are struggling to process massive amounts of unstructured video and audio content to…

A Step-by-Step Coding Implementation of an Agent2Agent Framework for Collaborative and Critique-Driven AI Problem Solving with Consensus-Building

May 27, 2025

In this tutorial, we implement the Agent2Agent collaborative framework built atop Google’s Gemini models. The guide walks through the creation…