Monitor agents built on Amazon Bedrock with Datadog LLM Observability

This post was co-written with Mohammad Jama, Yun Kim, and Barry Eom from Datadog.

The emergence of generative AI agents in recent years has transformed the AI landscape, driven by advances in large language models (LLMs) and natural language processing (NLP). The focus is shifting from simple AI assistants to Agentic AI systems that can think, iterate, and take actions to solve complex tasks. These Agentic AI systems may use multiple agents, interact with tools both within and outside organizational boundaries to make decisions, and connect with knowledge sources to learn about processes. While these autonomous systems help organizations improve workplace productivity, streamline business workflows, and transform research and more, they introduce additional operational requirements. To ensure reliability, performance, and responsible AI use, teams need observability solutions purpose-built for tracking agent behavior, coordination, and execution flow.

The multi-agent collaboration capabilities of Amazon Bedrock Agents make it straightforward and fast to build these systems. Developers can configure a set of coordinated agents that break down complex user requests into multiple steps, call internal APIs, access knowledge bases, and maintain contextual conversations, all without managing the logic themselves.

To scale Agentic AI systems, organizations need robust observability solutions that ensure reliability, performance, and responsible use of AI technology.

Datadog LLM Observability helps teams operate production-grade LLM applications with confidence by monitoring performance, quality, and security issues—such as latency spikes, hallucinations, tool selection, or prompt injection attempts. With full visibility into model behavior and application context, developers can identify, troubleshoot, and resolve issues faster.

We’re excited to announce a new integration between Datadog LLM Observability and Amazon Bedrock Agents that helps monitor agentic applications built on Amazon Bedrock. Beyond tracking the overall health of agentic applications, developers can track step-by-step agent executions across complex workflows and monitor foundation model calls, tool invocations, and knowledge base interactions.

In this post, we’ll explore how Datadog’s LLM Observability provides the visibility and control needed to successfully monitor, operate, and debug production-grade agentic applications built on Amazon Bedrock Agents.

Solution Overview

Datadog’s integration with Amazon Bedrock Agents offers comprehensive observability tailored for agentic Generative AI applications that programmatically invoke agents by using the InvokeAgent API. This integration captures detailed telemetry from each agent execution, enabling teams to monitor, troubleshoot, and optimize their LLM applications effectively.
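As a rough illustration of the kind of call this integration traces, the following sketch invokes an agent through the InvokeAgent API with boto3 and joins the streamed response into one string. The helper names invoke_agent_text and run_example, the placeholder agent IDs, and the sample prompt are hypothetical and not part of the original post. The client is passed in as a parameter so the helper can be exercised without AWS credentials.

```python
import uuid


def invoke_agent_text(client, agent_id, agent_alias_id, prompt, session_id=None):
    """Call InvokeAgent and join the streamed chunks into a single string.

    `client` is expected to be a boto3 "bedrock-agent-runtime" client
    (or any object exposing a compatible invoke_agent method).
    """
    response = client.invoke_agent(
        agentId=agent_id,
        agentAliasId=agent_alias_id,
        # Reuse a session ID to keep conversational context across calls.
        sessionId=session_id or str(uuid.uuid4()),
        inputText=prompt,
    )
    parts = []
    # "completion" is an event stream; response text arrives as bytes
    # inside "chunk" events. Other event types are skipped here.
    for event in response["completion"]:
        chunk = event.get("chunk")
        if chunk and "bytes" in chunk:
            parts.append(chunk["bytes"].decode("utf-8"))
    return "".join(parts)


def run_example():
    """Hypothetical usage; needs ddtrace, boto3, AWS credentials, and a deployed agent."""
    from ddtrace.llmobs import LLMObs

    LLMObs.enable()  # enable tracing before the client is created

    import boto3

    client = boto3.client("bedrock-agent-runtime")
    # AGENT_ID and AGENT_ALIAS_ID are placeholders for your own agent.
    print(invoke_agent_text(client, "AGENT_ID", "AGENT_ALIAS_ID", "Suggest a weekend trip"))
```

Calling run_example() requires a deployed agent and configured Datadog credentials. The helper itself only depends on the shape of the InvokeAgent response.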

Optimize Performance and Control Costs

As teams scale their agentic applications, each agent interaction—whether it’s retrieving knowledge, invoking tools, or calling models—can impact latency and cost. Without visibility into how these resources are used, it’s difficult to pinpoint inefficiencies or control spend as workflows grow more complex. For applications built on Bedrock Agents, Datadog automatically captures and provides:

Latency monitoring: Track the time taken for each step and overall execution to identify bottlenecks
Error rate tracking: Observe the frequency and types of errors encountered to improve reliability and debug issues
Token usage analysis: Monitor the number of tokens consumed during processing to manage costs
Tool invocation details: Gain insights into external API calls made by agents, such as Lambda functions or knowledge base queries

LLM Observability dashboard displaying key performance indicators, usage trends, and topic distribution for an AI-powered support chatbot.

This LLM Observability dashboard presents a detailed overview of an AI-powered support chatbot’s performance and usage patterns.

Monitor Complex Agentic Workflows

Agents can perform specific tasks, invoke tools, access knowledge bases, and maintain contextual conversations. By capturing detailed telemetry from Amazon Bedrock Agents, Datadog gives teams comprehensive visibility into agent workflows so they can monitor, troubleshoot, and optimize their LLM applications. This includes:

End-to-end execution visibility: Visualize each operation of an agent’s workflow, from pre-processing through post-processing, including orchestration and guardrail evaluations
Efficient troubleshooting: Debug with detailed execution insights to quickly pinpoint failure points and understand error contexts

Travel agent bot trace details displaying bedrock runtime invocation, model calls, and location suggestion tool execution.

This LLM Observability trace details the execution of a travel agent bot using Amazon Bedrock.

Evaluate output, tool selection, and overall quality

In agentic applications, it’s not enough to know that a task completed; you also need to know how well it was completed. For example, are generated summaries accurate and on-topic? Are user-facing answers clear, helpful, and free of harmful content? Did an agent select the right tool? Without visibility into these questions, silent failures can slip through and undercut intended outcomes, such as reducing handoffs to human agents or automating repetitive decisions.

Datadog LLM Observability helps teams assess the quality and safety of their LLM applications by evaluating the inputs and outputs of model calls—both at the root level and within nested steps of a workflow. With this integration, you can:

Run built-in evaluations: Detect quality, safety, and security issues such as prompt injection, off-topic completions, or toxic content with Datadog LLM Observability Evaluations
Submit custom evaluations: Visualize domain-specific quality metrics, such as whether an output matched expected formats or adhered to policy guidelines
Monitor guardrails: Inspect when and why content filters are triggered during execution

These insights appear directly alongside latency, cost, and trace data—helping teams identify not just how an agent behaved, but whether it produced the right result.
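To make the custom evaluation idea concrete, here is a minimal sketch of a domain-specific check for whether an output matched an expected format. It tests that an agent's output is a JSON object containing certain keys. The function name evaluate_json_format, the "format_check" label, and the result field names are illustrative assumptions, not Datadog's API. Sending the result to Datadog would go through the evaluation submission call described in the LLM Observability Python SDK Reference.

```python
import json


def evaluate_json_format(output_text, required_keys):
    """Classify an agent output by whether it is a JSON object with the required keys.

    Returns a plain dict describing a categorical evaluation. The field
    names are illustrative, not a Datadog payload format.
    """
    result = {"label": "format_check", "metric_type": "categorical"}
    try:
        parsed = json.loads(output_text)
    except (TypeError, ValueError):
        # Not parseable as JSON at all (json.JSONDecodeError is a ValueError).
        result["value"] = "not_json_object"
        return result
    if not isinstance(parsed, dict):
        # Valid JSON, but a list, string, or number rather than an object.
        result["value"] = "not_json_object"
        return result
    missing = [key for key in required_keys if key not in parsed]
    result["value"] = "missing_keys" if missing else "pass"
    return result
```

A categorical value is used here because it groups cleanly when charted. A numeric score would suit graded checks such as similarity to a reference answer.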

How to get started

Datadog Bedrock Agent Observability is initially available for Python applications, with additional language support on the roadmap. Tracing Bedrock Agent invocations is handled by integrating Datadog’s ddtrace library into your application.

Prerequisites

An AWS account with Bedrock access enabled.
A Python-based application using Amazon Bedrock. If needed, see the examples in amazon-bedrock-samples.
A Datadog account and API key.

Instrumentation takes just a few steps; consult the latest LLM Observability Python SDK Reference for full details. In most cases, only two lines are required to add ddtrace to your application:

from ddtrace.llmobs import LLMObs
LLMObs.enable()

The ddtrace library can be configured using environment variables or at runtime by passing values to the enable function. Consult the SDK reference above and the setup documentation for more details and customization options.
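The two configuration routes can be combined. The sketch below reads Datadog environment variables and passes them to LLMObs.enable() at runtime. The helper names llmobs_settings and enable_llm_observability, the fallback application name "bedrock-agent-demo", and the choice to default to agentless mode are assumptions made for illustration. Check the SDK reference for the options that apply to your setup.

```python
import os


def llmobs_settings(environ=None):
    """Build keyword arguments for LLMObs.enable() from environment variables."""
    environ = os.environ if environ is None else environ
    settings = {
        # Name under which traces are grouped; the fallback is a hypothetical example.
        "ml_app": environ.get("DD_LLMOBS_ML_APP", "bedrock-agent-demo"),
        "site": environ.get("DD_SITE", "datadoghq.com"),
        # Assumption: send data directly to Datadog unless explicitly disabled.
        "agentless_enabled": environ.get(
            "DD_LLMOBS_AGENTLESS_ENABLED", "1"
        ).lower() in ("1", "true"),
    }
    api_key = environ.get("DD_API_KEY")
    if api_key:
        settings["api_key"] = api_key
    return settings


def enable_llm_observability(environ=None):
    """Enable LLM Observability with the collected settings (requires ddtrace)."""
    from ddtrace.llmobs import LLMObs

    LLMObs.enable(**llmobs_settings(environ))
```

Keeping the settings in a separate function makes them easy to log or unit test. If the environment variables are already set, calling LLMObs.enable() with no arguments, as shown above, is sufficient.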

Finally, be sure to stop or remove any applications when you are finished to manage costs.

Conclusion

Datadog is an AWS Specialization Partner and AWS Marketplace Seller that has been building integrations with AWS services for over a decade, amassing a growing catalog of 100+ integrations. This new Amazon Bedrock Agents integration builds upon Datadog’s strong track record of AWS partnership success. For organizations looking to implement generative AI solutions, this capability provides essential observability tools to ensure their agentic AI applications built on Amazon Bedrock Agents perform optimally and deliver business value.

To get started, see Datadog LLM Observability.

To learn more about how Datadog integrates with Amazon AI/ML services, see Monitor Amazon Bedrock with Datadog and Monitoring Amazon SageMaker with Datadog.

If you don’t already have a Datadog account, you can sign up for a free 14-day trial today.

About the authors

Nina Chen is a Customer Solutions Manager at AWS specializing in leading software companies to leverage the power of the AWS cloud to accelerate their product innovation and growth. With over 4 years of experience working in the strategic Independent Software Vendor (ISV) vertical, Nina enjoys guiding ISV partners through their cloud transformation journeys, helping them optimize their cloud infrastructure, driving product innovation, and delivering exceptional customer experiences.

Sujatha Kuppuraju is a Principal Solutions Architect at AWS, specializing in Cloud and Generative AI Security. She collaborates with software companies’ leadership teams to architect secure, scalable solutions on AWS and guide strategic product development. Leveraging her expertise in cloud architecture and emerging technologies, Sujatha helps organizations optimize offerings, maintain robust security, and bring innovative products to market in an evolving tech landscape.

Jason Mimick is a Partner Solutions Architect at AWS supporting top customers and working closely with product, engineering, marketing, and sales teams daily. Jason focuses on enabling product development and sales success for partners and customers across all industries.

Mohammad Jama is a Product Marketing Manager at Datadog. He leads go-to-market for Datadog’s AWS integrations, working closely with product, marketing, and sales to help companies observe and secure their hybrid and AWS environments.

Yun Kim is a software engineer on Datadog’s LLM Observability team, where he specializes in developing client-side SDKs and integrations. He is excited about the development of trustworthy, transparent Generative AI models and frameworks.

Barry Eom is a Product Manager at Datadog, where he has launched and leads the development of AI/ML and LLM Observability solutions. He is passionate about enabling teams to create and productionize ethical and humane technologies.
