
    Build a just-in-time knowledge base with Amazon Bedrock

    July 7, 2025

    Software as a service (SaaS) companies managing multiple tenants face a critical challenge: efficiently extracting meaningful insights from vast document collections while controlling costs. Traditional approaches often lead to unnecessary spending on unused storage and processing resources, impacting both operational efficiency and profitability. Organizations need solutions that intelligently scale processing and storage resources based on actual tenant usage patterns while maintaining data isolation.

    Traditional Retrieval Augmented Generation (RAG) systems consume valuable resources by ingesting and maintaining embeddings for documents that might never be queried, resulting in unnecessary storage costs and reduced system efficiency. Systems designed to handle large numbers of small to mid-sized tenants can exceed cost structure and infrastructure limits, or might need silo-style deployments to keep each tenant’s information and usage separate. Adding to this complexity, many projects are transitory in nature, with work completed intermittently, leaving data occupying space in knowledge base systems that could be used by other active tenants.

    To address these challenges, this post presents a just-in-time knowledge base solution that reduces unused consumption through intelligent document processing. The solution processes documents only when needed and automatically removes unused resources, so organizations can scale their document repositories without proportionally increasing infrastructure costs.

    With a multi-tenant architecture and configurable limits per tenant, service providers can offer tiered pricing models while maintaining strict data isolation, making it ideal for SaaS applications serving multiple clients with varying needs. Automatic document expiration through Time-to-Live (TTL) keeps the system lean and focused on relevant content, while refreshing the TTL for frequently accessed documents maintains optimal performance for the information that matters. This architecture also makes it possible to limit the number of files each tenant can ingest at a given time and the rate at which tenants can query a set of files.

    The solution uses serverless technologies to alleviate operational overhead and provide automatic scaling, so teams can focus on business logic rather than infrastructure management. By organizing documents into groups with metadata-based filtering, the system enables contextual querying that delivers more relevant results while maintaining security boundaries between tenants. The architecture’s flexibility supports customization of tenant configurations, query rates, and document retention policies, making it adaptable to evolving business requirements without significant rearchitecting.

    Solution overview

    This architecture combines several AWS services to create a cost-effective, multi-tenant knowledge base solution that processes documents on demand. The key components include:

    • Vector-based knowledge base – Uses Amazon Bedrock and Amazon OpenSearch Serverless for efficient document processing and querying
    • On-demand document ingestion – Implements just-in-time processing using the Amazon Bedrock CUSTOM data source type
    • TTL management – Provides automatic cleanup of unused documents using the TTL feature in Amazon DynamoDB
    • Multi-tenant isolation – Enforces secure data separation between users and organizations with configurable resource limits

    The solution enables granular control through metadata-based filtering at the user, tenant, and file level. The DynamoDB TTL tracking system supports tiered pricing structures, where tenants can pay for different TTL durations, document ingestion limits, and query rates.
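    The per-tenant limits are driven by configuration. As a hypothetical sketch (the sample repository defines the real shape; apart from FilesTTLHours, which appears in the ingestion code later in this post, the field names here are assumptions), a TENANTS environment value and the find_tenant helper used during ingestion might look like this:

```python
import json

# One possible shape for the TENANTS configuration that the Lambda functions
# read from their environment. Field names other than FilesTTLHours are
# illustrative, not taken from the sample project.
TENANTS_JSON = json.dumps({
    "Tenants": [
        {"Id": "tenant-free", "Tier": "free",
         "FilesTTLHours": 168, "MaxFiles": 5, "QueriesPerMinute": 5},
        {"Id": "tenant-std", "Tier": "standard",
         "FilesTTLHours": 720, "MaxFiles": 100, "QueriesPerMinute": 10},
    ]
})

def find_tenant(tenant_id, tenants):
    """Return the configuration entry for tenant_id, or raise if unknown."""
    for tenant in tenants:
        if tenant["Id"] == tenant_id:
            return tenant
    raise KeyError(f"Unknown tenant: {tenant_id}")
```

    Storing the tier table in an environment variable keeps the lookup free of extra network calls on every ingestion request; larger deployments might move it to a parameter store or DynamoDB instead.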

    The following diagram illustrates the key components and workflow of the solution.

    Multi-tier AWS serverless architecture diagram showcasing data flow and integration of various AWS services

    The workflow consists of the following steps:

    1. The user logs in to the system, which attaches a tenant ID to the current user for calls to the Amazon Bedrock knowledge base. This authentication step is crucial because it establishes the security context and makes sure subsequent interactions are properly associated with the correct tenant. The tenant ID becomes the foundational piece of metadata that enables proper multi-tenant isolation and resource management throughout the entire workflow.
    2. After authentication, the user creates a project that will serve as a container for the files they want to query. This project creation step establishes the organizational structure needed to manage related documents together. The system generates appropriate metadata and creates the necessary database entries to track the project’s association with the specific tenant, enabling proper access control and resource management at the project level.
    3. With a project established, the user can begin uploading files. The system manages this process by generating pre-signed URLs for secure file upload. As files are uploaded, they are stored in Amazon Simple Storage Service (Amazon S3), and the system automatically creates entries in DynamoDB that associate each file with both the project and the tenant. This three-way relationship (file-project-tenant) is essential for maintaining proper data isolation and enabling efficient querying later.
    4. When a user requests to create a chat with a knowledge base for a specific project, the system begins ingesting the project files using the CUSTOM data source. This is where the just-in-time processing begins. During ingestion, the system applies a TTL value based on the tenant’s tier-specific TTL interval. The TTL makes sure project files remain available during the chat session while setting up the framework for automatic cleanup later. This step represents the core of the on-demand processing strategy, because files are only processed when they are needed.
    5. Each chat session actively updates the TTL for the project files being used. This dynamic TTL management makes sure frequently accessed files remain in the knowledge base while allowing rarely used files to expire naturally. The system continually refreshes the TTL values based on actual usage, creating an efficient balance between resource availability and cost optimization. This approach maintains optimal performance for actively used content while helping to prevent resource waste on unused documents.
    6. After the chat session ends and the TTL value expires, the system automatically removes files from the knowledge base. This cleanup process is triggered by Amazon DynamoDB Streams monitoring TTL expiration events, which activate an AWS Lambda function to remove the expired documents. This final step reduces the load on the underlying OpenSearch Serverless cluster and optimizes system resources, making sure the knowledge base remains lean and efficient.
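    The TTL refresh in step 5 amounts to a single DynamoDB update per file record. The function below is a minimal sketch rather than the sample project's code; the attribute names mirror the ingestion snippet shown later, and the table handle is passed in (in practice, a boto3 DynamoDB Table resource):

```python
import time

def refresh_ttl(table, file_ids, ttl_hours, now=None):
    """Extend the TTL of each file record so actively used documents
    stay in the knowledge base. `table` is expected to behave like a
    boto3 DynamoDB Table resource (an assumption for this sketch)."""
    now = int(time.time()) if now is None else now
    new_ttl = now + ttl_hours * 3600
    for file_id in file_ids:
        # Alias the attribute name since "ttl" may collide with reserved words
        table.update_item(
            Key={"id": file_id},
            UpdateExpression="SET #ttl = :ttl",
            ExpressionAttributeNames={"#ttl": "ttl"},
            ExpressionAttributeValues={":ttl": new_ttl},
        )
    return new_ttl
```

    Because the refresh runs on every chat interaction, frequently used projects never expire, while idle projects age out on their own.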

    Prerequisites

    You need the following prerequisites before you can proceed with the solution. For this post, we use the us-east-1 AWS Region.

    • An active AWS account with permissions to create resources in us-east-1
    • The AWS Command Line Interface (AWS CLI) installed
    • The AWS Cloud Development Kit (AWS CDK) installed
    • Git installed to clone the repository

    Deploy the solution

    Complete the following steps to deploy the solution:

    1. Download the AWS CDK project from the GitHub repo.
    2. Install the project dependencies:
      npm run install:all
    3. Deploy the solution:
      npm run deploy
    4. Create a user and log in to the system after validating your email.

    Validate the knowledge base and run a query

    Before allowing users to chat with their documents, the system performs the following steps:

    • Performs a validation check to determine if documents need to be ingested. This process happens transparently to the user and includes checking document status in DynamoDB and the knowledge base.
    • Validates that the required documents are successfully ingested and properly indexed before allowing queries.
    • Returns both the AI-generated answers and relevant citations to source documents, maintaining traceability and empowering users to verify the accuracy of responses.
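    The validation check in the first step can be sketched as a pure function over the DynamoDB records for a project's files. The field names mirror the ingestion code shown later in this post; the status values are assumptions for this sketch:

```python
import time

def files_needing_ingestion(file_records, now=None):
    """Given DynamoDB records for a project's files, return the IDs that
    must be (re)ingested before chatting: records whose TTL has already
    expired or whose documentStatus is not 'ready' (status values assumed)."""
    now = int(time.time()) if now is None else now
    stale = []
    for record in file_records:
        expired = record.get("ttl", 0) <= now
        not_ready = record.get("documentStatus") != "ready"
        if expired or not_ready:
            stale.append(record["id"])
    return stale
```

    Files returned by this check would be re-ingested with a fresh TTL before the chat session is allowed to query them.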

    The following screenshot illustrates an example of chatting with the documents.

    AWS Just In Time Knowledge Base interface displaying project files and AI-powered question-answering feature

    Looking at the following example method for file ingestion, note how file information is stored in DynamoDB with a TTL value for automatic expiration. The ingest knowledge base documents call includes essential metadata (user ID, tenant ID, and project), enabling precise filtering of this tenant’s files in subsequent operations.

    # Ingesting files with tenant-specific TTL values
    # Imports and client setup are shown here for context; the sample project
    # initializes these once at module load. The environment variable names
    # for the table, knowledge base, and data source are illustrative.
    import json
    import os
    import time
    import uuid

    import boto3

    bedrock_agent = boto3.client('bedrock-agent')
    knowledge_base_files_table = boto3.resource('dynamodb').Table(os.environ['FILES_TABLE'])
    KNOWLEDGE_BASE_ID = os.environ['KNOWLEDGE_BASE_ID']
    DATA_SOURCE_ID = os.environ['DATA_SOURCE_ID']

    def ingest_files(user_id, tenant_id, project_id, files):
        # Get tenant configuration and calculate the TTL for this tenant's tier
        tenants = json.loads(os.environ.get('TENANTS'))['Tenants']
        tenant = find_tenant(tenant_id, tenants)
        ttl = int(time.time()) + (int(tenant['FilesTTLHours']) * 3600)
        
        # For each file, create a record with TTL and start ingestion
        for file in files:
            file_id = file['id']
            s3_key = file.get('s3Key')
            bucket = file.get('bucket')
            
            # Create a record in the knowledge base files table with TTL
            knowledge_base_files_table.put_item(
                Item={
                    'id': file_id,
                    'userId': user_id,
                    'tenantId': tenant_id,
                    'projectId': project_id,
                    'documentStatus': 'ready',
                    'createdAt': int(time.time()),
                    'ttl': ttl  # TTL value for automatic expiration
                }
            )
            
            # Start the ingestion job with tenant, user, and project metadata for filtering
            bedrock_agent.ingest_knowledge_base_documents(
                knowledgeBaseId=KNOWLEDGE_BASE_ID,
                dataSourceId=DATA_SOURCE_ID,
                clientToken=str(uuid.uuid4()),
                documents=[
                    {
                        'content': {
                            'dataSourceType': 'CUSTOM',
                            'custom': {
                                'customDocumentIdentifier': {
                                    'id': file_id
                                },
                                's3Location': {
                                    'uri': f"s3://{bucket}/{s3_key}"
                                },
                                'sourceType': 'S3_LOCATION'
                            }
                        },
                        'metadata': {
                            'type': 'IN_LINE_ATTRIBUTE',
                            'inlineAttributes': [
                                {'key': 'userId', 'value': {'stringValue': user_id, 'type': 'STRING'}},
                                {'key': 'tenantId', 'value': {'stringValue': tenant_id, 'type': 'STRING'}},
                                {'key': 'projectId', 'value': {'stringValue': project_id, 'type': 'STRING'}},
                                {'key': 'fileId', 'value': {'stringValue': file_id, 'type': 'STRING'}}
                            ]
                        }
                    }
                ]
            )

    During a query, you can use the associated metadata to construct parameters that make sure you only retrieve files belonging to this specific tenant. For example:

        filter_expression = {
            "andAll": [
                {
                    "equals": {
                        "key": "tenantId",
                        "value": tenant_id
                    }
                },
                {
                    "equals": {
                        "key": "projectId",
                        "value": project_id
                    }
                },
                {
                    "in": {
                        "key": "fileId",
                        "value": file_ids
                    }
                }
            ]
        }
    
        # Create base parameters for the API call
        retrieve_params = {
            'input': {
                'text': query
            },
            'retrieveAndGenerateConfiguration': {
                'type': 'KNOWLEDGE_BASE',
                'knowledgeBaseConfiguration': {
                    'knowledgeBaseId': knowledge_base_id,
                    'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/amazon.nova-pro-v1:0',
                    'retrievalConfiguration': {
                        'vectorSearchConfiguration': {
                            'numberOfResults': limit,
                            'filter': filter_expression
                        }
                    }
                }
            }
        }
        response = bedrock_agent_runtime.retrieve_and_generate(**retrieve_params)

    Manage the document lifecycle with TTL

    To further optimize resource usage and costs, you can implement an intelligent document lifecycle management system using the DynamoDB Time-to-Live (TTL) feature. This consists of the following steps:

    1. When a document is ingested into the knowledge base, a record is created with a configurable TTL value.
    2. This TTL is refreshed when the document is accessed.
    3. DynamoDB Streams with specific filters for TTL expiration events is used to trigger a cleanup Lambda function.
    4. The Lambda function removes expired documents from the knowledge base.

    See the following code:

    # Lambda function triggered by DynamoDB Streams when TTL expires items
    # Imports and client setup are shown here for context; knowledge_base_id
    # and data_source_id come from the function's configuration in the
    # sample project.
    import uuid

    import boto3

    bedrock_agent = boto3.client('bedrock-agent')

    def lambda_handler(event, context):
        """
        This function is triggered by DynamoDB Streams when TTL expires items.
        It removes expired documents from the knowledge base.
        """
        
        # Process each record in the event
        for record in event.get('Records', []):
            # Check if this is a TTL expiration event (REMOVE event from DynamoDB Stream)
            if record.get('eventName') == 'REMOVE':
                # Check if this is a TTL expiration
                user_identity = record.get('userIdentity', {})
                if user_identity.get('type') == 'Service' and user_identity.get('principalId') == 'dynamodb.amazonaws.com':
                    # Extract the file ID from the record's keys
                    keys = record.get('dynamodb', {}).get('Keys', {})
                    file_id = keys.get('id', {}).get('S')
                    
                    # Delete the document from the knowledge base
                    bedrock_agent.delete_knowledge_base_documents(
                        clientToken=str(uuid.uuid4()),
                        knowledgeBaseId=knowledge_base_id,
                        dataSourceId=data_source_id,
                        documentIdentifiers=[
                            {
                                'custom': {
                                    'id': file_id
                                },
                                'dataSourceType': 'CUSTOM'
                            }
                        ]
                    )

    Multi-tenant isolation with tiered service levels

    Our architecture enables sophisticated multi-tenant isolation with tiered service levels:

    • Tenant-specific document filtering – Each query includes user, tenant, and file-specific filters, allowing the system to reduce the number of documents being queried.
    • Configurable TTL values – Different tenant tiers can have different TTL configurations. For example:
      • Free tier: 5 documents ingested with a 7-day TTL and 5 queries per minute.
      • Standard tier: 100 documents ingested with a 30-day TTL and 10 queries per minute.
      • Premium tier: 1,000 documents ingested with a 90-day TTL and 50 queries per minute.
      • You can configure additional limits, such as total queries per month or total ingested files per day or month.
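    Per-tier query rates can be enforced with a sliding-window counter keyed by tenant. The in-memory class below is a minimal sketch of the logic only; in the deployed solution, a shared store such as DynamoDB would need to back the counters so that limits hold across Lambda invocations:

```python
import time
from collections import defaultdict, deque

class QueryRateLimiter:
    """Sliding-window limiter for per-tenant queries per minute (sketch)."""

    def __init__(self, queries_per_minute):
        self.limit = queries_per_minute
        self.windows = defaultdict(deque)  # tenant_id -> timestamps

    def allow(self, tenant_id, now=None):
        """Return True and record the query if the tenant is under its limit."""
        now = time.time() if now is None else now
        window = self.windows[tenant_id]
        # Drop timestamps that fell out of the 60-second window
        while window and now - window[0] >= 60:
            window.popleft()
        if len(window) >= self.limit:
            return False
        window.append(now)
        return True
```

    The same pattern extends to monthly query caps or daily ingestion caps by widening the window and persisting the counters.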

    Clean up

    To clean up the resources created in this post, run the following command from the same location where you performed the deploy step:

    npm run destroy

    Conclusion

    The just-in-time knowledge base architecture presented in this post transforms document management across multiple tenants by processing documents only when queried, reducing the unused consumption of traditional RAG systems. This serverless implementation uses Amazon Bedrock, OpenSearch Serverless, and the DynamoDB TTL feature to create a lean system with intelligent document lifecycle management, configurable tenant limits, and strict data isolation, which is essential for SaaS providers offering tiered pricing models.

    This solution directly addresses cost structure and infrastructure limitations of traditional systems, particularly for deployments handling numerous small to mid-sized tenants with transitory projects. This architecture combines on-demand document processing with automated lifecycle management, delivering a cost-effective, scalable resource that empowers organizations to focus on extracting insights rather than managing infrastructure, while maintaining security boundaries between tenants.

    Ready to implement this architecture? The full sample code is available in the GitHub repository.


    About the author

    Steven Warwick is a Senior Solutions Architect at AWS, where he leads customer engagements to drive successful cloud adoption and specializes in SaaS architectures and Generative AI solutions. He produces educational content including blog posts and sample code to help customers implement best practices, and has led programs on GenAI topics for solution architects. Steven brings decades of technology experience to his role, helping customers with architectural reviews, cost optimization, and proof-of-concept development.
