Category: Claude Code Workflows & Patterns

  • I Indexed 468 Files into a Local Vector Database. Now My Laptop Answers Questions About My Business.

    The experiment started as a practical problem. I had 468 files scattered across four years of client work: SOPs, contracts, project notes, email threads, campaign reports, Notion exports. Finding anything specific meant either remembering which folder it lived in or running a search that returned 200 results and hoping the right one was near the top.

    I spent a weekend building a local vector database with Ollama, ChromaDB, and a small embedding model. The result: I can now ask my laptop a question about my business and get the answer in three seconds, cited back to the specific document.

    What a Vector Database Actually Does

    A traditional search index finds documents that contain the words you typed. A vector database finds documents that mean what you meant. You can ask “what were the concerns about the insurance claim workflow” and it will surface a document that never uses those exact words but describes the same concept.

    The mechanism: each document chunk gets converted into a numerical vector that represents its meaning. Documents with similar meanings cluster together. When you ask a question, it also becomes a vector, and the database returns the nearest documents.

    Run it locally and none of your data leaves your machine.
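
    The nearest-neighbor step can be sketched in plain Python. These are toy 3-dimensional vectors with made-up numbers standing in for real 768-dimensional embeddings, and the filenames are invented for illustration:

    ```python
    import math

    def cosine_similarity(a, b):
        # 1.0 means the vectors point the same way; near 0 means unrelated.
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(x * x for x in b))
        return dot / (norm_a * norm_b)

    # Toy "embeddings": two chunks about insurance work, one about logo design.
    docs = {
        "claims_workflow.md": [0.9, 0.1, 0.2],
        "policy_review.md": [0.7, 0.3, 0.1],
        "logo_brief.md": [0.1, 0.9, 0.8],
    }
    query = [0.85, 0.15, 0.25]  # "concerns about the insurance claim workflow"

    ranked = sorted(docs, key=lambda name: cosine_similarity(query, docs[name]),
                    reverse=True)
    print(ranked[0])  # claims_workflow.md lands closest to the query
    ```

    The insurance-related chunks cluster near the query vector while the logo brief sits far away, which is the whole trick: similarity of meaning becomes proximity in vector space.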

    The Stack

    • Ollama — runs LLMs locally, used for both embedding and answer generation
    • ChromaDB — the vector database, runs as a local process, persists between sessions
    • nomic-embed-text — the embedding model via Ollama, fast and accurate for English business documents
    • llama3.2 — the answer model, reads retrieved chunks and generates the response
    • Python 3.11 — glue

    Total cost: $0. Runs on a MacBook Pro M2 with 16GB RAM. No API keys, no cloud services.

    Building the Index

    Install the dependencies:

    pip install chromadb ollama
    ollama pull nomic-embed-text
    ollama pull llama3.2

    The indexing script walks a directory, reads each file, splits it into overlapping chunks, embeds each chunk, and stores it in ChromaDB:

    import os
    import chromadb
    import ollama
    from pathlib import Path
    
    client = chromadb.PersistentClient(path="./my_business_db")
    collection = client.get_or_create_collection("business_docs")
    
    def chunk_text(text, chunk_size=500, overlap=50):
        words = text.split()
        chunks = []
        for i in range(0, len(words), chunk_size - overlap):
            chunk = " ".join(words[i:i + chunk_size])
            chunks.append(chunk)
        return chunks
    
    def index_directory(directory_path):
        files_indexed = 0
        for filepath in Path(directory_path).rglob("*"):
            if filepath.suffix.lower() in [".txt", ".md", ".csv", ".json"]:
                try:
                    text = filepath.read_text(encoding="utf-8", errors="ignore")
                    chunks = chunk_text(text)
                    for i, chunk in enumerate(chunks):
                        embedding = ollama.embeddings(
                            model="nomic-embed-text", prompt=chunk
                        )["embedding"]
                        collection.add(
                            # Full path in the id, not filepath.name, so two
                            # files with the same name in different folders
                            # do not overwrite each other's chunks.
                            ids=[f"{filepath}_{i}"],
                            embeddings=[embedding],
                            documents=[chunk],
                            metadatas=[{"source": str(filepath), "chunk": i}]
                        )
                    files_indexed += 1
                except Exception as e:
                    print(f"Skipped {filepath.name}: {e}")
        return files_indexed
    
    count = index_directory("./my_documents")
    print(f"Indexed {count} files")

    468 files took about 22 minutes on an M2. You run this once — queries are instant after that.
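
    A good chunk of that time is overhead: the script calls collection.add once per chunk, which means one database round-trip per chunk. A per-file batching sketch (build_batch and index_file are my names, not part of the original script; chunk_text is copied from above so the snippet stands alone):

    ```python
    from pathlib import Path

    def chunk_text(text, chunk_size=500, overlap=50):
        # Same chunker as the indexing script above.
        words = text.split()
        return [" ".join(words[i:i + chunk_size])
                for i in range(0, len(words), chunk_size - overlap)]

    def build_batch(chunks, source):
        # Pure helper: one ids/documents/metadatas batch per file, keyed by
        # the full path so same-named files in different folders cannot collide.
        return {
            "ids": [f"{source}::{i}" for i in range(len(chunks))],
            "documents": list(chunks),
            "metadatas": [{"source": source, "chunk": i}
                          for i in range(len(chunks))],
        }

    def index_file(collection, filepath):
        import ollama  # deferred so the pure helpers above import without it
        text = Path(filepath).read_text(encoding="utf-8", errors="ignore")
        chunks = chunk_text(text)
        embeddings = [
            ollama.embeddings(model="nomic-embed-text", prompt=c)["embedding"]
            for c in chunks
        ]
        # One add() per file instead of one per chunk cuts database round-trips.
        collection.add(embeddings=embeddings, **build_batch(chunks, str(filepath)))
    ```

    The embedding calls still dominate the runtime, but batching the writes keeps ChromaDB out of the inner loop.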

    Querying the Database

    def ask(question, n_results=5):
        question_embedding = ollama.embeddings(
            model="nomic-embed-text", prompt=question
        )["embedding"]
    
        results = collection.query(
            query_embeddings=[question_embedding],
            n_results=n_results
        )
    
        # Stitch the retrieved chunks into one context block
        context = "\n\n---\n\n".join(results["documents"][0])
        sources = [m["source"] for m in results["metadatas"][0]]
    
        prompt = (
            "Answer the question using only the context provided. "
            "If the answer is not in the context, say so clearly.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}\n\nAnswer:"
        )
    
        response = ollama.generate(model="llama3.2", prompt=prompt)
        print(response["response"])
        print("\nSources:")
        for source in set(sources):
            print(f"  - {source}")
    
    ask("What did we decide about the content strategy in Q4?")

    What Works Well

    • Decision retrieval: “What did we decide about X?” surfaces meeting notes and SOPs instantly
    • Policy lookup: “What is our process for Y?” finds the relevant SOP section without knowing the filename
    • Client history: “What did [client] say about their budget?” finds the email or note
    • Template finding: Finds files you cannot remember naming

    Where it struggles: questions that require reasoning across many documents at once, and questions whose answer genuinely is not in your files. The prompt's instruction to say so when the context lacks the answer is what keeps the model from inventing one.
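
    For the low-relevance case, ChromaDB's query results also include a distances list alongside the documents (lower means closer; squared L2 by default), so you can drop weak matches before they ever reach the prompt. A sketch; the 1.0 cutoff is an arbitrary starting point you would tune against your own data, not a recommendation from this setup:

    ```python
    def filter_relevant(documents, distances, max_distance=1.0):
        # Keep only chunks whose distance to the query is under the cutoff,
        # so marginal matches never pad out the context the model sees.
        return [doc for doc, dist in zip(documents, distances)
                if dist <= max_distance]

    # Usage inside ask(), after collection.query(...):
    #   docs = filter_relevant(results["documents"][0], results["distances"][0])
    #   if not docs:
    #       print("Nothing relevant indexed for that question.")
    ```

    An empty result after filtering is itself a useful signal: the answer probably is not in your files, and the model never gets the chance to guess.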

    Integrating with Claude Code

    Once the local database is running, Claude Code can query it as part of a larger workflow. Point Claude Code at a directory that includes your query script and give it access to ask questions about your business context before making architectural decisions, content strategies, or client recommendations.

    This is the pattern that makes local RAG genuinely powerful for agency and solo operator work: your institutional knowledge becomes queryable context for every AI-assisted task.
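
    One way to wire that up is a tiny CLI wrapper Claude Code can shell out to. Everything here is hypothetical naming: query.py, the --results flag, and the rag_query module (wherever you saved the ask() function above) are my inventions, not an established interface:

    ```python
    #!/usr/bin/env python3
    # query.py -- minimal command-line front door to the local database,
    # so any tool that can run a shell command can ask a question.
    import argparse

    def build_parser():
        parser = argparse.ArgumentParser(
            description="Ask the local business knowledge base a question."
        )
        parser.add_argument("question", help="natural-language question")
        parser.add_argument("--results", type=int, default=5,
                            help="number of chunks to retrieve")
        return parser

    def main():
        from rag_query import ask  # the ask() function from the query script
        args = build_parser().parse_args()
        ask(args.question, n_results=args.results)

    # Invoked from the shell, e.g.:
    #   python query.py "What did we decide about Q4 content?" --results 3
    ```

    With that in place, a CLAUDE.md note along the lines of "run python query.py '<question>' to check business context" is enough for Claude Code to consult the database on its own.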

    Handling PDFs and Word Documents

    pip install pypdf python-docx
    
    from pypdf import PdfReader
    from docx import Document
    
    def read_pdf(filepath):
        reader = PdfReader(filepath)
        # extract_text() can come back empty for scanned pages; the "or"
        # keeps the join from choking on a page with no extractable text
        return " ".join(page.extract_text() or "" for page in reader.pages)
    
    def read_docx(filepath):
        doc = Document(filepath)
        return " ".join(p.text for p in doc.paragraphs)

    Bottom Line

    If you have more than a few hundred business documents and spend time hunting for things you know exist somewhere, a local vector database is worth the weekend. The stack is free, the data stays local, and the experience of asking your laptop a business question and getting a sourced answer is immediately useful in a way most AI demos are not.