
    How to Build a Powerful and Intelligent Question-Answering System by Using Tavily Search API, Chroma, Google Gemini LLMs, and the LangChain Framework

    May 18, 2025

    In this tutorial, we demonstrate how to build a powerful and intelligent question-answering system by combining the strengths of the Tavily Search API, Chroma, Google Gemini LLMs, and the LangChain framework. The pipeline leverages real-time web search through Tavily, semantic document caching with a Chroma vector store, and contextual response generation through a Gemini model. These tools are integrated via LangChain's modular components, such as RunnableLambda, ChatPromptTemplate, ConversationBufferMemory, and GoogleGenerativeAIEmbeddings. The system goes beyond simple Q&A by introducing a hybrid retrieval mechanism that checks for cached embeddings before invoking a fresh web search. Retrieved documents are formatted, summarized, and passed through a structured LLM prompt, with attention to source attribution, user history, and confidence scoring. Features such as advanced prompt engineering, sentiment and entity analysis, and dynamic vector-store updates make this pipeline suitable for advanced use cases like research assistance, domain-specific summarization, and intelligent agents.

    !pip install -qU langchain-community tavily-python langchain-google-genai streamlit matplotlib pandas tiktoken chromadb langchain_core pydantic langchain

    We install and upgrade a comprehensive set of libraries required to build an advanced AI search assistant. It includes tools for retrieval (tavily-python, chromadb), LLM integration (langchain-google-genai, langchain), data handling (pandas, pydantic), visualization (matplotlib, streamlit), and tokenization (tiktoken). These components form the core foundation for constructing a real-time, context-aware QA system.

    import os
    import getpass
    import pandas as pd
    import matplotlib.pyplot as plt
    import numpy as np
    import json
    import time
    from typing import List, Dict, Any, Optional
    from datetime import datetime

    We import essential Python libraries used throughout the notebook. It includes standard libraries for environment variables, secure input, time tracking, and data types (os, getpass, time, typing, datetime). Additionally, it brings in core data science tools like pandas, matplotlib, and numpy for data handling, visualization, and numerical computations, as well as json for parsing structured data.

    if "TAVILY_API_KEY" not in os.environ:
        os.environ["TAVILY_API_KEY"] = getpass.getpass("Enter Tavily API key: ")
       
    if "GOOGLE_API_KEY" not in os.environ:
        os.environ["GOOGLE_API_KEY"] = getpass.getpass("Enter Google API key: ")
    
    
    import logging
    logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
    logger = logging.getLogger(__name__)

    We securely initialize API keys for Tavily and Google Gemini by prompting users only if they’re not already set in the environment, ensuring safe and repeatable access to external services. It also configures a standardized logging setup using Python’s logging module, which helps monitor execution flow and capture debug or error messages throughout the notebook.

    from langchain_community.retrievers import TavilySearchAPIRetriever
    from langchain_community.vectorstores import Chroma
    from langchain_core.documents import Document
    from langchain_core.output_parsers import StrOutputParser, JsonOutputParser
    from langchain_core.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate
    from langchain_core.runnables import RunnablePassthrough, RunnableLambda
    from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain.chains.summarize import load_summarize_chain
    from langchain.memory import ConversationBufferMemory

    We import key components from the LangChain ecosystem and its integrations. It brings in the TavilySearchAPIRetriever for real-time web search, Chroma for vector storage, and GoogleGenerativeAI modules for chat and embedding models. Core LangChain modules like ChatPromptTemplate, RunnableLambda, ConversationBufferMemory, and output parsers enable flexible prompt construction, memory handling, and pipeline execution.

    class SearchQueryError(Exception):
        """Exception raised for errors in the search query."""
        pass
    
    
    def format_docs(docs):
        formatted_content = []
        for i, doc in enumerate(docs):
            metadata = doc.metadata
            source = metadata.get('source', 'Unknown source')
            title = metadata.get('title', 'Untitled')
            score = metadata.get('score', 0)
           
            formatted_content.append(
                f"Document {i+1} [Score: {score:.2f}]:\n"
                f"Title: {title}\n"
                f"Source: {source}\n"
                f"Content: {doc.page_content}\n"
            )

        return "\n\n".join(formatted_content)

    We define two essential components for search and document handling. The SearchQueryError class creates a custom exception to manage invalid or failed search queries gracefully. The format_docs function processes a list of retrieved documents by extracting metadata such as title, source, and relevance score and formatting them into a clean, readable string.
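
    To make the output format concrete, here is a minimal sketch using two hand-built Document objects as stand-ins for real Tavily results (the contents, titles, and URLs below are placeholders):

    # Quick sanity check for format_docs with placeholder documents.
    sample_docs = [
        Document(
            page_content="Breath of the Wild launched alongside the Nintendo Switch.",
            metadata={"source": "https://example.com/botw", "title": "BotW overview", "score": 0.92},
        ),
        Document(
            page_content="The game received widespread critical acclaim.",
            metadata={"source": "https://example.com/reviews", "title": "Review roundup", "score": 0.87},
        ),
    ]

    # Each document is rendered with its score, title, source, and content on separate lines.
    print(format_docs(sample_docs))

    Running this prints the two documents as numbered, newline-separated blocks, which is the same context string later passed into the LLM prompt.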

    class SearchResultsParser:
        def parse(self, text):
            try:
                if isinstance(text, str):
                    import re
                    import json
                    json_match = re.search(r'{.*}', text, re.DOTALL)
                    if json_match:
                        json_str = json_match.group(0)
                        return json.loads(json_str)
                    return {"answer": text, "sources": [], "confidence": 0.5}
                elif hasattr(text, 'content'):
                    return {"answer": text.content, "sources": [], "confidence": 0.5}
                else:
                    return {"answer": str(text), "sources": [], "confidence": 0.5}
            except Exception as e:
                logger.warning(f"Failed to parse JSON: {e}")
                return {"answer": str(text), "sources": [], "confidence": 0.5}

    The SearchResultsParser class provides a robust method for extracting structured information from LLM responses. It attempts to parse a JSON-like string from the model output, falling back to a plain-text response format if parsing fails. It gracefully handles string outputs and message objects, ensuring consistent downstream processing. In case of errors, it logs a warning and returns a fallback response containing the raw answer, empty sources, and a default confidence score, enhancing the system’s fault tolerance.
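
    As a quick illustration of the parser's branches (the inputs below are made up and not real Gemini outputs), a JSON-bearing string is parsed into a dictionary while plain text falls back to the default structure:

    parser = SearchResultsParser()

    # A response that embeds JSON is extracted and parsed into a dict.
    print(parser.parse('Sure: {"answer": "March 2017", "sources": ["doc 1"], "confidence": 0.9}'))

    # A plain-text response falls back to the default answer/sources/confidence structure.
    print(parser.parse("Breath of the Wild was released in March 2017."))

    The first call returns the embedded JSON object; the second returns the raw text wrapped with empty sources and a 0.5 confidence score.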

    class EnhancedTavilyRetriever:
        def __init__(self, api_key=None, max_results=5, search_depth="advanced", include_domains=None, exclude_domains=None):
            self.api_key = api_key
            self.max_results = max_results
            self.search_depth = search_depth
            self.include_domains = include_domains or []
            self.exclude_domains = exclude_domains or []
            self.retriever = self._create_retriever()
            self.previous_searches = []
           
        def _create_retriever(self):
            try:
                return TavilySearchAPIRetriever(
                    api_key=self.api_key,
                    k=self.max_results,
                    search_depth=self.search_depth,
                    include_domains=self.include_domains,
                    exclude_domains=self.exclude_domains
                )
            except Exception as e:
                logger.error(f"Failed to create Tavily retriever: {e}")
                raise
       
        def invoke(self, query, **kwargs):
            if not query or not query.strip():
                raise SearchQueryError("Empty search query")
           
            try:
                start_time = time.time()
                results = self.retriever.invoke(query, **kwargs)
                end_time = time.time()
               
                search_record = {
                    "timestamp": datetime.now().isoformat(),
                    "query": query,
                    "num_results": len(results),
                    "response_time": end_time - start_time
                }
                self.previous_searches.append(search_record)
               
                return results
            except Exception as e:
                logger.error(f"Search failed: {e}")
                raise SearchQueryError(f"Failed to perform search: {str(e)}")
       
        def get_search_history(self):
            return self.previous_searches

    The EnhancedTavilyRetriever class is a custom wrapper around the TavilySearchAPIRetriever, adding greater flexibility, control, and traceability to search operations. It supports advanced features like limiting search depth, domain inclusion/exclusion filters, and configurable result counts. The invoke method performs web searches and tracks each query’s metadata (timestamp, response time, and result count), storing it for later analysis.

    class SearchCache:
        def __init__(self):
            self.embedding_function = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
            self.vector_store = None
            self.text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
           
        def add_documents(self, documents):
            if not documents:
                return
           
            try:
                if self.vector_store is None:
                    self.vector_store = Chroma.from_documents(
                        documents=documents,
                        embedding=self.embedding_function
                    )
                else:
                    self.vector_store.add_documents(documents)
            except Exception as e:
                logger.error(f"Failed to add documents to cache: {e}")
       
        def search(self, query, k=3):
            if self.vector_store is None:
                return []
           
            try:
                return self.vector_store.similarity_search(query, k=k)
            except Exception as e:
                logger.error(f"Vector search failed: {e}")
                return []

    The SearchCache class implements a semantic caching layer that stores and retrieves documents using vector embeddings for efficient similarity search. It uses GoogleGenerativeAIEmbeddings to convert documents into dense vectors and stores them in a Chroma vector database. The add_documents method initializes or updates the vector store, while the search method enables fast retrieval of the most relevant cached documents based on semantic similarity. This reduces redundant API calls and improves response times for repeated or related queries, serving as a lightweight hybrid memory layer in the AI assistant pipeline.
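
    A minimal usage sketch of the cache with placeholder documents (it assumes GOOGLE_API_KEY is already set, since constructing the cache instantiates the embedding model and add_documents calls the embeddings API):

    demo_cache = SearchCache()

    # Cache two placeholder documents, then retrieve the closest match semantically.
    demo_cache.add_documents([
        Document(page_content="Chroma stores embeddings and supports similarity search.",
                 metadata={"source": "https://example.com/chroma", "title": "Chroma notes"}),
        Document(page_content="Tavily returns fresh web results for a search query.",
                 metadata={"source": "https://example.com/tavily", "title": "Tavily notes"}),
    ])

    for doc in demo_cache.search("How does vector similarity search work?", k=1):
        print(doc.page_content)

    In the full pipeline, the same add-then-search pattern is driven by real Tavily results rather than hand-written documents.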

    search_cache = SearchCache()
    enhanced_retriever = EnhancedTavilyRetriever(max_results=5)
    memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
    
    
    system_template = """You are a research assistant that provides accurate answers based on the search results provided.
    Follow these guidelines:
    1. Only use the context provided to answer the question
    2. If the context doesn't contain the answer, say "I don't have sufficient information to answer this question."
    3. Cite your sources by referencing the document numbers
    4. Don't make up information
    5. Keep the answer concise but complete
    
    
    Context: {context}
    Chat History: {chat_history}
    """
    
    
    system_message = SystemMessagePromptTemplate.from_template(system_template)
    human_template = "Question: {question}"
    human_message = HumanMessagePromptTemplate.from_template(human_template)
    
    
    prompt = ChatPromptTemplate.from_messages([system_message, human_message])
    

    We initialize the core components of the AI assistant: a semantic SearchCache, the EnhancedTavilyRetriever for web-based querying, and a ConversationBufferMemory to retain chat history across turns. It also defines a structured prompt using ChatPromptTemplate, guiding the LLM to act as a research assistant. The prompt enforces strict rules for factual accuracy, context usage, source citation, and concise answering, ensuring reliable and grounded responses.

    def get_llm(model_name="gemini-2.0-flash-lite", temperature=0.2, response_mode="json"):
        try:
            return ChatGoogleGenerativeAI(
                model=model_name,
                temperature=temperature,
                convert_system_message_to_human=True,
                top_p=0.95,
                top_k=40,
                max_output_tokens=2048
            )
        except Exception as e:
            logger.error(f"Failed to initialize LLM: {e}")
            raise
    
    
    output_parser = SearchResultsParser()
    

    We define the get_llm function, which initializes a Google Gemini language model with configurable parameters such as model name, temperature, and decoding settings (e.g., top_p, top_k, and max tokens). It ensures robustness with error handling for failed model initialization. An instance of SearchResultsParser is also created to standardize and structure the LLM’s raw responses, enabling consistent downstream processing of answers and metadata.

    def plot_search_metrics(search_history):
        if not search_history:
            print("No search history available")
            return
       
        df = pd.DataFrame(search_history)
       
        plt.figure(figsize=(12, 6))
        plt.subplot(1, 2, 1)
        plt.plot(range(len(df)), df['response_time'], marker='o')
        plt.title('Search Response Times')
        plt.xlabel('Search Index')
        plt.ylabel('Time (seconds)')
        plt.grid(True)
       
        plt.subplot(1, 2, 2)
        plt.bar(range(len(df)), df['num_results'])
        plt.title('Number of Results per Search')
        plt.xlabel('Search Index')
        plt.ylabel('Number of Results')
        plt.grid(True)
       
        plt.tight_layout()
        plt.show()
    

    The plot_search_metrics function visualizes performance trends from past queries using Matplotlib. It converts the search history into a DataFrame and renders two subplots: one showing response time per search and the other displaying the number of results returned. This aids in analyzing the system’s efficiency and search quality over time, helping developers fine-tune the retriever or identify bottlenecks in real-world usage.

    def retrieve_with_fallback(query):
        cached_results = search_cache.search(query)
       
        if cached_results:
            logger.info(f"Retrieved {len(cached_results)} documents from cache")
            return cached_results
       
        logger.info("No cache hit, performing web search")
        search_results = enhanced_retriever.invoke(query)
       
        search_cache.add_documents(search_results)
       
        return search_results
    
    
    def summarize_documents(documents, query):
        llm = get_llm(temperature=0)
       
        summarize_prompt = ChatPromptTemplate.from_template(
            """Create a concise summary of the following documents related to this query: {query}
           
            {documents}
           
            Provide a comprehensive summary that addresses the key points relevant to the query.
            """
        )
       
        chain = (
            {"documents": lambda docs: format_docs(docs), "query": lambda _: query}
            | summarize_prompt
            | llm
            | StrOutputParser()
        )
       
        return chain.invoke(documents)

    These two functions enhance the assistant’s intelligence and efficiency. The retrieve_with_fallback function implements a hybrid retrieval mechanism: it first attempts to fetch semantically relevant documents from the local Chroma cache and, if unsuccessful, falls back to a real-time Tavily web search, caching the new results for future use. Meanwhile, summarize_documents leverages a Gemini LLM to generate concise summaries from retrieved documents, guided by a structured prompt that ensures relevance to the query. Together, they enable low-latency, informative, and context-aware responses.
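
    A simple way to see the fallback in action (illustrative only; it performs a real Tavily search and embedding calls, so valid API keys are required) is to issue the same query twice and watch the log messages:

    demo_query = "latest LangChain release notes"

    # First call: no cached vectors yet, so it logs "No cache hit, performing web search".
    first_pass = retrieve_with_fallback(demo_query)

    # Second call: the documents were cached above, so it logs a cache retrieval instead.
    second_pass = retrieve_with_fallback(demo_query)

    print(len(first_pass), len(second_pass))

    The second call should also be noticeably faster, since it is served from the local Chroma store instead of the web.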

    def advanced_chain(query_engine="enhanced", model="gemini-1.5-pro", include_history=True):
        llm = get_llm(model_name=model)
       
        if query_engine == "enhanced":
            retriever = lambda query: retrieve_with_fallback(query)
        else:
            retriever = enhanced_retriever.invoke
       
        def chain_with_history(input_dict):
            query = input_dict["question"]
            chat_history = memory.load_memory_variables({})["chat_history"] if include_history else []
           
            docs = retriever(query)
           
            context = format_docs(docs)
           
            result = prompt.invoke({
                "context": context,
                "question": query,
                "chat_history": chat_history
            })
           
            # Generate the answer first, then store the model's response (not the prompt) in memory.
            response = llm.invoke(result)
            memory.save_context({"input": query}, {"output": response.content})

            return response
       
        return RunnableLambda(chain_with_history) | StrOutputParser()

    The advanced_chain function defines a modular, end-to-end reasoning workflow for answering user queries using cached or real-time search. It initializes the specified Gemini model, selects the retrieval strategy (cached fallback or direct search), constructs a response pipeline incorporating chat history (if enabled), formats documents into context, and prompts the LLM using a system-guided template. The chain also logs the interaction in memory and returns the final answer, parsed into clean text. This design enables flexible experimentation with models and retrieval strategies while maintaining conversation coherence.

    qa_chain = advanced_chain()
    
    
    def analyze_query(query):
        llm = get_llm(temperature=0)
       
        analysis_prompt = ChatPromptTemplate.from_template(
            """Analyze the following query and provide:
            1. Main topic
            2. Sentiment (positive, negative, neutral)
            3. Key entities mentioned
            4. Query type (factual, opinion, how-to, etc.)
           
            Query: {query}
           
            Return the analysis in JSON format with the following structure:
            {{
                "topic": "main topic",
                "sentiment": "sentiment",
                "entities": ["entity1", "entity2"],
                "type": "query type"
            }}
            """
        )
       
        # SearchResultsParser is not a Runnable, so convert the LLM output to a string first
        # and wrap the parse method in a RunnableLambda so it can be piped into the chain.
        chain = analysis_prompt | llm | StrOutputParser() | RunnableLambda(output_parser.parse)
       
        return chain.invoke({"query": query})
    
    
    print("Advanced Tavily-Gemini Implementation")
    print("="*50)
    
    
    query = "what year was breath of the wild released and what was its reception?"
    print(f"Query: {query}")

    We initialize the final components of the intelligent assistant. qa_chain is the assembled reasoning pipeline ready to process user queries using retrieval, memory, and Gemini-based response generation. The analyze_query function performs a lightweight semantic analysis on a query, extracting the main topic, sentiment, entities, and query type using the Gemini model and a structured JSON prompt. The example query, about Breath of the Wild’s release and reception, showcases how the assistant is triggered and prepared for full-stack inference and semantic interpretation. The printed heading marks the start of interactive execution.

    try:
        print("\nSearching for answer...")
        answer = qa_chain.invoke({"question": query})
        print("\nAnswer:")
        print(answer)
       
        print("\nAnalyzing query...")
        try:
            query_analysis = analyze_query(query)
            print("\nQuery Analysis:")
            print(json.dumps(query_analysis, indent=2))
        except Exception as e:
            print(f"Query analysis error (non-critical): {e}")
    except Exception as e:
        print(f"Error in search: {e}")
    
    
    history = enhanced_retriever.get_search_history()
    print("\nSearch History:")
    for i, h in enumerate(history):
        print(f"{i+1}. Query: {h['query']} - Results: {h['num_results']} - Time: {h['response_time']:.2f}s")
    
    
    print("\nAdvanced search with domain filtering:")
    specialized_retriever = EnhancedTavilyRetriever(
        max_results=3,
        search_depth="advanced",
        include_domains=["nintendo.com", "zelda.com"],
        exclude_domains=["reddit.com", "twitter.com"]
    )
    
    
    try:
        specialized_results = specialized_retriever.invoke("breath of the wild sales")
        print(f"Found {len(specialized_results)} specialized results")
       
        summary = summarize_documents(specialized_results, "breath of the wild sales")
        print("\nSummary of specialized results:")
        print(summary)
    except Exception as e:
        print(f"Error in specialized search: {e}")
    
    
    print("\nSearch Metrics:")
    plot_search_metrics(history)
    

    We demonstrate the complete pipeline in action. It performs a search using the qa_chain, displays the generated answer, and then analyzes the query for sentiment, topic, entities, and type. It also retrieves and prints each query’s search history, response time, and result count. Also, it runs a domain-filtered search focused on Nintendo-related sites, summarizes the results, and visualizes search performance using plot_search_metrics, offering a comprehensive view of the assistant’s capabilities in real-time use.

    In conclusion, following this tutorial gives users a comprehensive blueprint for creating a highly capable, context-aware, and scalable RAG system that bridges real-time web intelligence with conversational AI. The Tavily Search API lets users directly pull fresh and relevant content from the web. The Gemini LLM adds robust reasoning and summarization capabilities, while LangChain’s abstraction layer allows seamless orchestration between memory, embeddings, and model outputs. The implementation includes advanced features such as domain-specific filtering, query analysis (sentiment, topic, and entity extraction), and fallback strategies using a semantic vector cache built with Chroma and GoogleGenerativeAIEmbeddings. Also, structured logging, error handling, and analytics dashboards provide transparency and diagnostics for real-world deployment.


    Check out the Colab Notebook. All credit for this research goes to the researchers of this project.
