
    Build scalable containerized RAG-based generative AI applications in AWS using Amazon EKS with Amazon Bedrock

    May 13, 2025

    Generative artificial intelligence (AI) applications are commonly built using a technique called Retrieval Augmented Generation (RAG) that provides foundation models (FMs) access to additional data they didn’t have during training. This data is used to enrich the generative AI prompt to deliver more context-specific and accurate responses without continuously retraining the FM, while also improving transparency and minimizing hallucinations.

    In this post, we demonstrate a solution using Amazon Elastic Kubernetes Service (EKS) with Amazon Bedrock to build scalable and containerized RAG solutions for your generative AI applications on AWS while bringing your unstructured user file data to Amazon Bedrock in a straightforward, fast, and secure way.

    Amazon EKS provides a scalable, secure, and cost-efficient environment for building RAG applications with Amazon Bedrock and also enables efficient deployment and monitoring of AI-driven workloads while leveraging Bedrock’s FMs for inference. It enhances performance with optimized compute instances, auto-scales GPU workloads while reducing costs via Amazon EC2 Spot Instances and AWS Fargate, and provides enterprise-grade security through native AWS mechanisms such as Amazon VPC networking and AWS IAM.

    Our solution uses Amazon S3 as the source of unstructured data and populates an Amazon OpenSearch Serverless vector database via the use of Amazon Bedrock Knowledge Bases with the user’s existing files and folders and associated metadata. This enables a RAG scenario with Amazon Bedrock by enriching the generative AI prompt using Amazon Bedrock APIs with your company-specific data retrieved from the OpenSearch Serverless vector database.

    Solution overview

    The solution uses Amazon EKS managed node groups to automate the provisioning and lifecycle management of nodes (Amazon EC2 instances) for the Amazon EKS Kubernetes cluster. Every managed node in the cluster is provisioned as part of an Amazon EC2 Auto Scaling group that’s managed for you by EKS.

    The EKS cluster consists of a Kubernetes deployment that runs across two Availability Zones for high availability, where each node hosts multiple replicas of the Bedrock RAG container image that is registered in and pulled from Amazon Elastic Container Registry (ECR). This setup makes sure that resources are used efficiently, scaling up or down based on demand. The Horizontal Pod Autoscaler (HPA) is set up to further scale the number of pods in our deployment based on their CPU utilization.
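
    Once the solution is deployed, you can inspect the autoscaling behavior described above from the command line. The sketch below is illustrative only: the HPA name is a placeholder, and the actual resource names are defined in the repository's Kubernetes manifests.

      # List the HPA created by the manifests and compare current vs. target CPU utilization
      kubectl get hpa
      kubectl describe hpa <hpa-name>

      # Confirm the replicas are spread across nodes in both Availability Zones
      kubectl get pods -o wide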

    The RAG Retrieval Application container uses Bedrock Knowledge Bases APIs and Anthropic’s Claude 3.5 Sonnet LLM hosted on Bedrock to implement a RAG workflow. The solution provides the end user with a scalable endpoint to access the RAG workflow using a Kubernetes service that is fronted by an Amazon Application Load Balancer (ALB) provisioned via an EKS ingress controller.

    The RAG Retrieval Application container orchestrated by EKS enables RAG with Amazon Bedrock by enriching the generative AI prompt received from the ALB endpoint with data retrieved from an OpenSearch Serverless index that is synced via Bedrock Knowledge Bases from your company-specific data uploaded to Amazon S3.
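
    To illustrate the same RAG workflow the container implements, the following sketch calls the Bedrock Knowledge Bases RetrieveAndGenerate API directly from the AWS CLI. It assumes the knowledge base created later in this post already exists; the knowledge base ID is a placeholder, and the model ARN uses the Claude 3.5 Sonnet model ID current at the time of writing, so confirm the exact ID available in your Region.

      # Retrieve company-specific context from the knowledge base and generate an answer with Claude 3.5 Sonnet
      aws bedrock-agent-runtime retrieve-and-generate \
        --region <aws_region> \
        --input '{"text": "What is a bedrock knowledgebase?"}' \
        --retrieve-and-generate-configuration '{
          "type": "KNOWLEDGE_BASE",
          "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "<Knowledge Base ID>",
            "modelArn": "arn:aws:bedrock:<aws_region>::foundation-model/anthropic.claude-3-5-sonnet-20240620-v1:0"
          }
        }'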

    The following architecture diagram illustrates the various components of our solution.

    Prerequisites

    Complete the following prerequisites:

    1. Ensure model access in Amazon Bedrock. In this solution, we use Anthropic’s Claude 3.5 Sonnet on Amazon Bedrock (a quick CLI availability check is sketched after this list).
    2. Install the AWS Command Line Interface (AWS CLI).
    3. Install Docker.
    4. Install kubectl.
    5. Install Terraform.
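
    As a quick sanity check for the first prerequisite, the sketch below lists the Claude 3.5 Sonnet model IDs visible in your Region. Listing only shows availability, not whether access has been granted to your account; the definitive check is an actual invocation of the model.

      # List Claude 3.5 Sonnet model IDs available in this Region
      aws bedrock list-foundation-models \
        --region <aws_region> \
        --query "modelSummaries[?contains(modelId, 'claude-3-5-sonnet')].modelId"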

    Deploy the solution

    The solution is available for download on the GitHub repo. Cloning the repository and using the Terraform template will provision the components with their required configurations:

    1. Clone the Git repository:
      sudo yum install -y unzip
      git clone https://github.com/aws-samples/genai-bedrock-serverless.git
      cd eksbedrock/terraform
    2. From the terraform folder, deploy the solution using Terraform:
      terraform init
      terraform apply -auto-approve
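
    After the apply completes, a quick way to confirm that the cluster and its managed node group were created is sketched below; the cluster name eksbedrock matches the name used in the kubeconfig step that follows, and terraform output only prints something if the template declares outputs.

      terraform output
      aws eks list-clusters --region <aws_region>
      aws eks list-nodegroups --cluster-name eksbedrock --region <aws_region>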

    Configure EKS

    1. Authenticate with Amazon ECR, pull the container image, update your kubeconfig, and configure a secret for the ECR registry:
      aws ecr get-login-password --region <aws_region> | docker login \
        --username AWS \
        --password-stdin <your account id>.dkr.ecr.<aws_region>.amazonaws.com/bedrockragrepo

      docker pull <your account id>.dkr.ecr.<aws_region>.amazonaws.com/bedrockragrepo:latest

      aws eks update-kubeconfig \
        --region <aws_region> \
        --name eksbedrock

      kubectl create secret docker-registry ecr-secret \
        --docker-server=<your account id>.dkr.ecr.<aws_region>.amazonaws.com \
        --docker-username=AWS \
        --docker-password=$(aws ecr get-login-password --region <aws_region>)
    2. Navigate to the kubernetes/ingress folder:
      • Make sure that the AWS_Region variable in the bedrockragconfigmap.yaml file points to your AWS region.
      • Replace the image URI in line 20 of the bedrockragdeployment.yaml file with the image URI of your bedrockrag image from your ECR repository.
    3. Provision the EKS deployment, service, and ingress:
      cd ..
      kubectl apply -f ingress/
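
    The following verification sketch checks that your kubeconfig points at the cluster, that the ecr-secret image pull secret exists, and that the deployment, service, and ALB ingress created by the manifests are up; resource names other than ecr-secret are defined in the repository's manifests.

      kubectl config current-context
      kubectl get nodes
      kubectl get secret ecr-secret
      kubectl get deployments,services,ingress
      kubectl get pods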

    Create a knowledge base and upload data

    To create a knowledge base and upload data, follow these steps:

    1. Create an S3 bucket and upload your data into the bucket. For this post, we uploaded two files, the Amazon Bedrock User Guide and the Amazon FSx for ONTAP User Guide, into our S3 bucket.
    2. Create an Amazon Bedrock knowledge base. Follow the steps in the Amazon Bedrock documentation to create a knowledge base. Accept all the defaults, including the Quick create a new vector store option in Step 7 of the instructions, which creates an Amazon OpenSearch Serverless vector search collection as the vector store for your knowledge base.
      1. In Step 5c of the instructions to create a knowledge base, provide the S3 URI of the location containing the files to use as the data source for the knowledge base.
      2. Once the knowledge base is provisioned, obtain the Knowledge Base ID from the Bedrock Knowledge Bases console for your newly created knowledge base.
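
    If you prefer the CLI to the console, the following sketch looks up the knowledge base ID and its data source, and starts an ingestion job to sync the S3 content into the OpenSearch Serverless index (the console's Sync button accomplishes the same thing); the IDs shown are placeholders.

      aws bedrock-agent list-knowledge-bases --region <aws_region>
      aws bedrock-agent list-data-sources --knowledge-base-id <Knowledge Base ID> --region <aws_region>
      aws bedrock-agent start-ingestion-job \
        --knowledge-base-id <Knowledge Base ID> \
        --data-source-id <Data Source ID> \
        --region <aws_region>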

    Query using the Application Load Balancer

    You can query the model directly using the API front end provided by the AWS ALB provisioned by the Kubernetes (EKS) Ingress Controller. Navigate to the AWS ALB console and obtain the DNS name for your ALB to use as your API:

    curl -X POST "<ALB DNS name>/query" \
      -H "Content-Type: application/json" \
      -d '{"prompt": "What is a bedrock knowledgebase?", "kbId": "<Knowledge Base ID>"}'
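
    If you prefer not to use the ALB console, a hedged alternative is to read the ALB hostname from the ingress status; the ingress name and namespace depend on the repository's manifests, so the jsonpath below simply takes the first ingress in the current namespace.

      ALB_DNS=$(kubectl get ingress -o jsonpath='{.items[0].status.loadBalancer.ingress[0].hostname}')

      curl -X POST "$ALB_DNS/query" \
        -H "Content-Type: application/json" \
        -d '{"prompt": "What is a bedrock knowledgebase?", "kbId": "<Knowledge Base ID>"}'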

    Cleanup

    To avoid recurring charges, clean up your account after trying the solution:

    1. From the terraform folder, destroy the resources provisioned by the Terraform template for the solution:
      terraform apply --destroy 
    2. Delete the Amazon Bedrock knowledge base. From the Amazon Bedrock console, select the knowledge base you created in this solution, select Delete, and follow the steps to delete the knowledge base.
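
    The S3 bucket you created manually for the knowledge base data source is not managed by the Terraform template, so if you no longer need the data, remove the bucket separately; the bucket name below is a placeholder.

      aws s3 rb s3://<your-bucket-name> --force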

    Conclusion

    In this post, we demonstrated a solution that uses Amazon EKS with Amazon Bedrock and provides you with a framework to build your own containerized, automated, scalable, and highly available RAG-based generative AI applications on AWS. Using Amazon S3 and Amazon Bedrock Knowledge Bases, our solution automates bringing your unstructured user file data to Amazon Bedrock within the containerized framework. You can use the approach demonstrated in this solution to automate and containerize your AI-driven workloads while using Amazon Bedrock FMs for inference with built-in efficient deployment, scalability, and availability from a Kubernetes-based containerized deployment.

    For more information about how to get started building with Amazon Bedrock and EKS for RAG scenarios, refer to the following resources:

    • Amazon Bedrock Workshop GitHub repo
    • Amazon EKS Workshop
    • Build RAG-based generative AI applications in AWS using Amazon Bedrock and Amazon FSx for NetApp ONTAP

    About the Authors

    Kanishk Mahajan is Principal, Solutions Architecture at AWS. He leads cloud transformation and solution architecture for AWS customers and partners. Kanishk specializes in containers, cloud operations, migrations and modernizations, AI/ML, resilience, and security and compliance. He is a Technical Field Community (TFC) member in each of those domains at AWS.

    Sandeep Batchu is a Senior Security Architect at Amazon Web Services, with extensive experience in software engineering, solutions architecture, and cybersecurity. Passionate about bridging business outcomes with technological innovation, Sandeep guides customers through their cloud journey, helping them design and implement secure, scalable, flexible, and resilient cloud architectures.
