The experiment started as a practical problem. I had 468 files scattered across four years of client work: SOPs, contracts, project notes, email threads, campaign reports, Notion exports. Finding anything specific meant either remembering which folder it lived in or running a search that returned 200 results and hoping the right one was near the top.
I spent a weekend building a local vector database with Ollama, ChromaDB, and a small embedding model. The result: I can now ask my laptop a question about my business and get the answer in three seconds, cited back to the specific document.
What a Vector Database Actually Does
A traditional search index finds documents that contain the words you typed. A vector database finds documents that mean what you meant. You can ask “what were the concerns about the insurance claim workflow” and it will surface a document that never uses those exact words but describes the same concept.
The mechanism: each document chunk gets converted into a numerical vector that represents its meaning. Documents with similar meanings cluster together. When you ask a question, it also becomes a vector, and the database returns the nearest documents.
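To make this concrete, here is a minimal sketch of the geometry, using the same embedding model the stack below relies on (assumes Ollama is running and nomic-embed-text is pulled):

    import ollama

    def embed(text):
        return ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"]

    def cosine(a, b):
        # Cosine similarity: near 1.0 means same direction, near 0 means unrelated
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(x * x for x in b) ** 0.5
        return dot / (na * nb)

    a = embed("concerns about the insurance claim workflow")
    b = embed("the team flagged risks in how claims get processed")
    c = embed("quarterly social media engagement metrics")
    print(cosine(a, b))  # higher: same concept, different words
    print(cosine(a, c))  # lower: different topic

Nearest-neighbor search over those vectors is exactly what the database below does at scale.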
Run it locally and none of your data leaves your machine.
The Stack
- Ollama — runs LLMs locally, used for both embedding and answer generation
- ChromaDB — the vector database, runs as a local process, persists between sessions
- nomic-embed-text — the embedding model via Ollama, fast and accurate for English business documents
- llama3.2 — the answer model, reads retrieved chunks and generates the response
- Python 3.11 — glue
Total cost: $0. Runs on a MacBook Pro M2 with 16GB RAM. No API keys, no cloud services.
Building the Index
Install the dependencies:
pip install chromadb ollama
ollama pull nomic-embed-text
ollama pull llama3.2
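Before indexing anything, a quick smoke test confirms both models respond (nomic-embed-text should return a 768-dimensional vector):

    import ollama

    # The embedding model returns one vector (768 dimensions for nomic-embed-text)
    print(len(ollama.embeddings(model="nomic-embed-text", prompt="hello")["embedding"]))

    # The answer model returns a short completion
    print(ollama.generate(model="llama3.2", prompt="Reply with one word: ok")["response"])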
The indexing script walks a directory, reads each file, splits it into overlapping chunks, embeds each chunk, and stores it in ChromaDB:
    import chromadb
    import ollama
    from pathlib import Path

    # Persistent client: the index is stored in ./my_business_db and survives restarts
    client = chromadb.PersistentClient(path="./my_business_db")
    collection = client.get_or_create_collection("business_docs")

    def chunk_text(text, chunk_size=500, overlap=50):
        # Word-based chunks; the 50-word overlap keeps sentences that
        # straddle a boundary retrievable from either side
        words = text.split()
        chunks = []
        for i in range(0, len(words), chunk_size - overlap):
            chunks.append(" ".join(words[i:i + chunk_size]))
        return chunks

    def index_directory(directory_path):
        files_indexed = 0
        for filepath in Path(directory_path).rglob("*"):
            if filepath.suffix in [".txt", ".md", ".csv", ".json"]:
                try:
                    text = filepath.read_text(encoding="utf-8", errors="ignore")
                    chunks = chunk_text(text)
                    for i, chunk in enumerate(chunks):
                        # One embedding call per chunk
                        embedding = ollama.embeddings(
                            model="nomic-embed-text", prompt=chunk
                        )["embedding"]
                        collection.add(
                            # Full path in the ID avoids collisions between
                            # same-named files in different folders
                            ids=[f"{filepath}_{i}"],
                            embeddings=[embedding],
                            documents=[chunk],
                            metadatas=[{"source": str(filepath), "chunk": i}],
                        )
                    files_indexed += 1
                except Exception as e:
                    print(f"Skipped {filepath.name}: {e}")
        return files_indexed

    count = index_directory("./my_documents")
    print(f"Indexed {count} files")
Indexing 468 files took about 22 minutes on an M2. That cost is paid once; after that, each query comes back in a few seconds.
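Adding documents later doesn't require a full rebuild. ChromaDB's upsert overwrites entries whose IDs already exist, so a small helper (a sketch, reusing the ID scheme above) re-indexes a single changed file:

    def index_file(filepath):
        # upsert() replaces chunks with matching IDs, so re-running this on
        # an edited file overwrites stale chunks instead of duplicating them
        text = filepath.read_text(encoding="utf-8", errors="ignore")
        for i, chunk in enumerate(chunk_text(text)):
            embedding = ollama.embeddings(
                model="nomic-embed-text", prompt=chunk
            )["embedding"]
            collection.upsert(
                ids=[f"{filepath}_{i}"],
                embeddings=[embedding],
                documents=[chunk],
                metadatas=[{"source": str(filepath), "chunk": i}],
            )

One caveat: if a file shrinks, chunks past its new end stay in the collection; delete the file's old IDs first if that matters.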
Querying the Database
    def ask(question, n_results=5):
        # Embed the question with the same model used for the documents
        question_embedding = ollama.embeddings(
            model="nomic-embed-text", prompt=question
        )["embedding"]
        results = collection.query(
            query_embeddings=[question_embedding],
            n_results=n_results
        )
        # query() returns one result list per query embedding; we sent one
        context = "\n---\n".join(results["documents"][0])
        sources = [m["source"] for m in results["metadatas"][0]]
        prompt = (
            "Answer the question using only the context provided. "
            "If the answer is not in the context, say so clearly.\n\n"
            f"Context:\n{context}\n\n"
            f"Question: {question}\n\n"
            "Answer:"
        )
        response = ollama.generate(model="llama3.2", prompt=prompt)
        print(response["response"])
        print("\nSources:")
        for source in set(sources):
            print(f"  - {source}")

    ask("What did we decide about the content strategy in Q4?")
What Works Well
- Decision retrieval: “What did we decide about X?” surfaces meeting notes and SOPs instantly
- Policy lookup: “What is our process for Y?” finds the relevant SOP section without knowing the filename
- Client history: “What did [client] say about their budget?” finds the email or note
- Template finding: surfaces files you can't remember naming
Where it struggles: questions that require reasoning across many documents at once, and questions whose answer genuinely is not in your files. For the second case, the prompt's instruction to say so when the answer is missing from the context is what keeps the model from improvising; always include it.
Integrating with Claude Code
Once the local database is running, Claude Code can query it as part of a larger workflow. Point Claude Code at a directory that includes your query script, and it can pull your business context before making architecture decisions, drafting content strategies, or writing client recommendations.
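One simple way to wire that up (a sketch; the file names are assumptions, not part of the stack above) is a small CLI wrapper that any tool able to run a shell command can call:

    # ask.py -- hypothetical CLI wrapper: python ask.py "your question"
    import sys
    from query import ask  # assumes the ask() function above lives in query.py

    if __name__ == "__main__":
        if len(sys.argv) < 2:
            sys.exit('usage: python ask.py "your question"')
        ask(" ".join(sys.argv[1:]))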
This is the pattern that makes local RAG genuinely powerful for agency and solo operator work: your institutional knowledge becomes queryable context for every AI-assisted task.
Handling PDFs and Word Documents
pip install pypdf python-docx
    from pypdf import PdfReader
    from docx import Document

    def read_pdf(filepath):
        # Join the extracted text of every page; pages with no extractable
        # text (scans without OCR) contribute an empty string
        reader = PdfReader(filepath)
        return " ".join(page.extract_text() or "" for page in reader.pages)

    def read_docx(filepath):
        # Word documents store text per paragraph
        doc = Document(filepath)
        return " ".join(p.text for p in doc.paragraphs)
Bottom Line
If you have more than a few hundred business documents and spend time hunting for things you know exist somewhere, a local vector database is worth the weekend. The stack is free, the data stays local, and the experience of asking your laptop a business question and getting a sourced answer is immediately useful in a way most AI demos are not.
