
    A Code Implementation to Use Ollama through Google Colab and Building a Local RAG Pipeline on Using DeepSeek-R1 1.5B through Ollama, LangChain, FAISS, and ChromaDB for Q&A

    April 8, 2025

    In this tutorial, we build a working Retrieval-Augmented Generation (RAG) pipeline from open-source tools that run on Google Colab. First, we look at how to set up Ollama and serve models from within Colab. We then combine the DeepSeek-R1 1.5B language model served through Ollama, LangChain for orchestration, and the ChromaDB vector store so that users can ask questions about the content of PDFs they upload. By pairing a locally served language model with retrieval of passages from those PDFs, the pipeline offers a private, low-cost alternative to cloud-hosted question-answering services.

    !pip install colab-xterm
    %load_ext colabxterm

    We use the colab-xterm extension to get terminal access inside the Colab environment. After installing it with !pip install colab-xterm and loading it with %load_ext colabxterm, users can open an interactive terminal window inside Colab, which makes it easier to run commands such as ollama serve or to monitor local processes.

    %xterm

    The %xterm magic command, available once the colabxterm extension is loaded, launches an interactive terminal window within the Colab notebook interface. Users can execute shell commands in it in real time, just as in a regular terminal, which is especially useful for running background services like ollama serve, managing files, or debugging system-level operations without leaving the notebook.

    In the terminal, we install Ollama using curl https://ollama.ai/install.sh | sh.

    Then, we start the Ollama server using ollama serve.

    Finally, we download the DeepSeek-R1 1.5B model locally through Ollama (for example with ollama pull deepseek-r1:1.5b) so that it can be used to build the RAG pipeline.
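
    Before moving on, it can help to confirm from a notebook cell that the server started in the terminal is actually reachable. The sketch below is not part of the original notebook: it assumes Ollama is listening on its default address, http://localhost:11434, and the helper names ollama_is_up and list_local_models are made up for illustration.

```python
import http.client
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama's default host and port


def ollama_is_up(base_url: str = OLLAMA_URL, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers with status 200 at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # Connection errors and timeouts are OSError subclasses;
        # ValueError covers malformed URLs.
        return False


def list_local_models(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> list[str]:
    """Return the model tags reported by Ollama's /api/tags endpoint."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        payload = json.load(resp)
    return [model["name"] for model in payload.get("models", [])]


if __name__ == "__main__":
    if ollama_is_up():
        print("Models available:", list_local_models())
    else:
        print("Ollama is not reachable; run `ollama serve` in the %xterm terminal first.")
```

    If the model list does not include deepseek-r1:1.5b, the download step above has probably not finished yet.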

    !pip install langchain langchain-community langchain-ollama sentence-transformers chromadb faiss-cpu

    To set up the core components of the RAG pipeline, we install langchain, langchain-community, langchain-ollama, sentence-transformers, chromadb, and faiss-cpu. These packages provide the document processing, embedding, vector storage, and retrieval functionality needed for a modular local RAG system; langchain-ollama supplies the OllamaLLM class imported in the next step. Note that faiss-cpu is installed here, but the remaining steps store vectors in Chroma only.
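
    A quick way to catch a failed or incomplete install is to check that each import name resolves before running the rest of the notebook. This is a minimal sketch rather than part of the original tutorial; the list of module names is inferred from the imports used in later cells, and missing_modules is a hypothetical helper.

```python
from importlib.util import find_spec

# Import names differ from pip names in a few cases (e.g. faiss-cpu -> faiss).
REQUIRED_MODULES = [
    "langchain",
    "langchain_community",
    "langchain_ollama",
    "sentence_transformers",
    "chromadb",
    "faiss",
    "pypdf",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found by this interpreter."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing modules:", missing if missing else "none")
```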

    # Document loading and chunking
    from langchain_community.document_loaders import PyPDFLoader
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    # Embeddings and vector store
    from langchain_community.vectorstores import Chroma
    from langchain_community.embeddings import HuggingFaceEmbeddings
    # LLM access (via the langchain-ollama package) and the retrieval chain
    from langchain_ollama.llms import OllamaLLM
    from langchain.chains import RetrievalQA
    from langchain_core.prompts import ChatPromptTemplate
    # Colab file upload and OS utilities
    from google.colab import files
    import os

    We import key modules from the langchain-community and langchain-ollama libraries to handle PDF loading, text splitting, embedding generation, vector storage with Chroma, and LLM integration via Ollama. We also import Colab’s file upload utility and a prompt template class, which together cover the flow from document ingestion to query answering with a locally hosted model.

    print("Please upload your PDF file...")
    uploaded = files.upload()
    
    
    file_path = list(uploaded.keys())[0]
    print(f"File '{file_path}' successfully uploaded.")
    
    
    if not file_path.lower().endswith('.pdf'):
        print("Warning: Uploaded file is not a PDF. This may cause issues.")

    To let users supply their own knowledge sources, we prompt for a PDF upload using google.colab.files.upload(). The code takes the first uploaded file, confirms the upload, and prints a warning if the file name does not end in .pdf. Note that this check only warns; it does not stop a non-PDF file from being passed to the later steps.
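
    The extension check can be made stricter by inspecting the file contents. The sketch below is an illustrative addition, not the author's code: it relies on files.upload() returning a dictionary that maps each file name to its raw bytes, and on PDF files starting with the marker %PDF-. The helper names looks_like_pdf and first_pdf are hypothetical.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Heuristic check: a PDF file begins with the marker %PDF-."""
    # Tolerate a little leading whitespace; only inspect the first kilobyte.
    return data[:1024].lstrip().startswith(b"%PDF-")


def first_pdf(uploaded: dict[str, bytes]) -> str:
    """Return the first uploaded file name whose contents look like a PDF."""
    for name, data in uploaded.items():
        if looks_like_pdf(data):
            return name
    raise ValueError("None of the uploaded files looks like a PDF.")


# In the notebook this could replace the extension check:
#   file_path = first_pdf(uploaded)
```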

    !pip install pypdf
    import pypdf
    loader = PyPDFLoader(file_path)
    documents = loader.load()
    print(f"Successfully loaded {len(documents)} pages from PDF")

    To extract content from the uploaded PDF, we install the pypdf library and use PyPDFLoader from LangChain to load the document. The loader returns one LangChain Document per page, which is the input expected by downstream steps such as text splitting and embedding.

    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200
    )
    chunks = text_splitter.split_documents(documents)
    print(f"Split documents into {len(chunks)} chunks")

    The loaded PDF is split into manageable chunks using RecursiveCharacterTextSplitter, with each chunk limited to 1000 characters and a 200-character overlap between neighboring chunks. The overlap helps preserve context that would otherwise be cut at a chunk boundary, which improves the relevance of retrieved passages during question answering.
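
    To see what chunk_size and chunk_overlap mean in practice, here is a deliberately simplified sliding-window splitter in plain Python. It is only a sketch of the overlap idea: RecursiveCharacterTextSplitter additionally tries to break on separators such as paragraphs and sentences, so its chunk boundaries will differ. The function name chunk_text is made up for this example.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size windows where consecutive windows share chunk_overlap characters."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("Require chunk_size > 0 and 0 <= chunk_overlap < chunk_size.")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


if __name__ == "__main__":
    sample = "".join(str(i % 10) for i in range(2500))
    pieces = chunk_text(sample)
    print([len(p) for p in pieces])
```

    With the tutorial's settings, each new chunk starts 800 characters after the previous one, so the last 200 characters of one chunk reappear at the start of the next.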

    embeddings = HuggingFaceEmbeddings(
        model_name="all-MiniLM-L6-v2",
        model_kwargs={'device': 'cpu'}
    )
    
    
    persist_directory = "./chroma_db"
    
    
    vectorstore = Chroma.from_documents(
        documents=chunks,
        embedding=embeddings,
        persist_directory=persist_directory
    )
    
    
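    # Recent Chroma versions (0.4+) write to persist_directory automatically;
    # this explicit call mainly matters on older versions and may log a deprecation warning.
    vectorstore.persist()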
    print(f"Vector store created and persisted to {persist_directory}")

    The text chunks are embedded with the all-MiniLM-L6-v2 model from sentence-transformers, running on CPU, so that they can be searched by semantic similarity. The embeddings are stored in a ChromaDB vector store persisted to ./chroma_db, so the index can be reloaded from disk instead of being rebuilt.
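
    Conceptually, similarity-based retrieval ranks stored vectors by how close they are to the query vector. The toy example below illustrates this with cosine similarity on two-dimensional vectors. It is a sketch under simplifying assumptions: real embeddings from all-MiniLM-L6-v2 have 384 dimensions, and Chroma uses its own index and distance settings rather than this brute-force loop. The function names are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k=3):
    """Return the indices of the k stored vectors most similar to query_vec, best first."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    doc_vecs = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0], [-1.0, 0.0]]
    print(top_k([0.9, 0.1], doc_vecs, k=2))
```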

    llm = OllamaLLM(model="deepseek-r1:1.5b")
    retriever = vectorstore.as_retriever(
        search_type="similarity",
        search_kwargs={"k": 3}  
    )
    
    
    qa_chain = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",  
        retriever=retriever,
        return_source_documents=True  
    )
    
    
    print("RAG pipeline created successfully!")

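    The RAG pipeline is finalized by connecting the local DeepSeek-R1 model (via OllamaLLM) with the Chroma-based retriever. For each query, the retriever returns the 3 chunks most similar to the question, and LangChain’s RetrievalQA chain with the “stuff” strategy places all of them into a single prompt from which the model generates a context-aware answer. Setting return_source_documents=True keeps the retrieved chunks in the result so they can be shown alongside the answer.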
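
    The “stuff” strategy amounts to concatenating the retrieved chunks into one context block and asking the model to answer from it. The sketch below shows that idea in plain Python. The prompt wording is a hypothetical template written for this illustration, not the text LangChain actually uses, and build_stuff_prompt is not a LangChain function.

```python
def build_stuff_prompt(question: str, chunks: list[str]) -> str:
    """Place every retrieved chunk into a single prompt, followed by the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question. "
        "If the answer is not in the context, say that you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


if __name__ == "__main__":
    demo_chunks = ["Chunk one text.", "Chunk two text.", "Chunk three text."]
    print(build_stuff_prompt("What is the main topic?", demo_chunks))
```

    Because every chunk goes into the same prompt, the total length grows with k and chunk_size, which is worth keeping in mind with a small model's limited context window.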

    def query_rag(question):
        # Run retrieval and generation; invoke() replaces the deprecated direct call.
        result = qa_chain.invoke({"query": question})

        print("\nQuestion:", question)
        print("\nAnswer:", result["result"])

        # Show a short preview of each retrieved chunk for traceability.
        print("\nSources:")
        for i, doc in enumerate(result["source_documents"]):
            print(f"Source {i+1}:\n{doc.page_content[:200]}...\n")

        return result


    question = "What is the main topic of this document?"
    result = query_rag(question)
    

    To test the RAG pipeline, the query_rag function takes a user question, retrieves relevant context through the retriever, and generates an answer with the LLM. It also prints the first 200 characters of each retrieved source chunk, which makes it possible to trace the model’s response back to the document.
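
    One practical detail when printing the answer: DeepSeek-R1 models commonly emit their intermediate reasoning between <think> and </think> tags before the final answer, so the answer text may contain that block. Assuming the output follows this convention, a small helper can remove it before display. This is an optional sketch, and strip_reasoning is a hypothetical name, not part of the original notebook.

```python
import re

# Matches a <think>...</think> block, including any newlines inside it.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def strip_reasoning(answer: str) -> str:
    """Remove <think>...</think> blocks and surrounding whitespace from a model answer."""
    return _THINK_BLOCK.sub("", answer).strip()


if __name__ == "__main__":
    raw = "<think>\nThe user asks about the topic.\n</think>\n\nThe document is about RAG."
    print(strip_reasoning(raw))
```

    In the notebook, this could be applied as strip_reasoning(result["result"]) before printing.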

    In conclusion, this tutorial combined model serving through Ollama, retrieval with ChromaDB, orchestration with LangChain, and the reasoning abilities of DeepSeek-R1 to build a lightweight RAG system that runs on Google Colab’s free tier. The solution lets users ask questions grounded in the content of the documents they upload, with answers generated by a locally served LLM. This architecture provides a foundation for building customizable, privacy-friendly AI assistants without paying for hosted LLM services.



    The post A Code Implementation to Use Ollama through Google Colab and Building a Local RAG Pipeline on Using DeepSeek-R1 1.5B through Ollama, LangChain, FAISS, and ChromaDB for Q&A appeared first on MarkTechPost.
