Category: Local AI & Automation

Building autonomous AI systems that run locally. Zero cloud cost, full data control, infinite scale.

  • How We Built a Free AI Agent Army With Ollama and Claude

    The Zero-Cloud-Cost AI Stack

    Enterprise AI costs are spiraling. GPT-4 API calls at scale run hundreds or thousands of dollars per month. Cloud-hosted AI services charge per query, per token, per minute. For a marketing operation managing 23 WordPress sites, the conventional AI approach would cost more than the human team it’s supposed to augment.

    We took a different path. Our AI agent army runs primarily on local hardware – a standard Windows laptop running Ollama for model inference, with Claude API calls reserved for tasks that genuinely require frontier-model reasoning. Total monthly cloud AI cost: under $100. Total local cost: the electricity to keep the laptop running.

    What Each Agent Does

    The Content Analyst: Runs on Llama 3.1 locally. Scans WordPress sites, extracts post inventories, identifies content gaps, and generates topic prioritization lists. This agent handles the intelligence audit work that kicks off every content sprint.

    The Draft Generator: Uses Claude for initial article drafts because the reasoning quality difference matters for long-form content. Each article costs approximately $0.15-0.30 in API calls. For 50 articles per month, that’s under $15 total.

    The SEO Optimizer: Runs locally on Mistral. Analyzes each draft against SEO best practices, generates meta descriptions, suggests heading structures, and recommends internal link targets. The optimization pass adds zero cloud cost.

    The Schema Generator: Runs locally. Reads article content and generates appropriate JSON-LD schema markup – Article, FAQPage, HowTo, or Speakable as needed. Pure local compute.
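As a concrete illustration, here is a minimal Python sketch of the kind of Article JSON-LD the Schema Generator emits. The helper name and field values are illustrative, not the production script:

```python
import json

def build_article_schema(title, description, author, url):
    """Assemble a minimal Article JSON-LD block from extracted fields."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": title,
        "description": description,
        "author": {"@type": "Person", "name": author},
        "mainEntityOfPage": url,
    }

schema = build_article_schema(
    "Local AI for Marketing Ops",
    "How local models cut cloud AI costs.",
    "Jane Doe",
    "https://example.com/local-ai",
)
# Wrap in the script tag WordPress expects in the post head
markup = '<script type="application/ld+json">' + json.dumps(schema) + "</script>"
```

The same function shape extends to FAQPage, HowTo, and Speakable by swapping the `@type` and fields.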

    The Publisher: Orchestrates the final step – formatting content for WordPress, assigning taxonomy, setting featured images, and publishing via the REST API proxy. This agent is more automation than AI, but it closes the loop from ideation to live post.

    The Monitor: Runs scheduled checks on site health – broken links, missing metadata, orphan pages, and schema errors. Generates weekly reports for each site. Local execution on a schedule.

    Why Local Models Work for Marketing Operations

    The marketing AI use case is different from the general-purpose chatbot use case. We don’t need the model to be conversational, creative, or handle unexpected queries. We need it to follow a protocol consistently: analyze this data, apply these rules, generate this output format.

    Local models excel at protocol-driven tasks. Llama 3.1 at 8B parameters handles content analysis, keyword extraction, and gap identification with the same quality as cloud APIs. Mistral handles SEO rule application and meta generation flawlessly. The only tasks where we notice a quality drop with local models are nuanced long-form writing and complex strategic reasoning – which is exactly where Claude earns its API cost.

    The performance tradeoff is minimal. Local inference on a modern laptop takes 5-15 seconds for a typical analysis task. Cloud API calls take 3-8 seconds including network latency. For batch operations where we’re processing 50-100 items, the difference is negligible.

    The PowerShell Orchestration Layer

    The agents don’t run independently – they’re orchestrated through PowerShell scripts that manage the workflow sequence. A typical content sprint runs like this:

    1. Content Analyst scans target site and generates topic list.
    2. Human reviews and approves topics.
    3. Draft Generator creates articles from approved topics.
    4. SEO Optimizer runs optimization pass on each draft.
    5. Schema Generator adds structured data.
    6. Publisher pushes to WordPress as drafts.
    7. Human reviews drafts and approves for publication.

    The entire pipeline is triggered by a single PowerShell command. Human intervention happens at two checkpoints: topic approval and draft review. Everything else is automated.
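The orchestration itself is PowerShell, but the control flow is simple enough to sketch in Python. The stage names and stub functions below are illustrative, with the two human checkpoints modeled as an `approve` callback:

```python
def run_sprint(site, stages, approve):
    """Run the content sprint: each stage transforms the work items,
    pausing for human approval at the two checkpoints."""
    items = stages["analyze"](site)                       # 1. topic list
    items = [t for t in items if approve("topics", t)]    # 2. checkpoint one
    drafts = [stages["draft"](t) for t in items]          # 3. drafts
    drafts = [stages["optimize"](d) for d in drafts]      # 4. SEO pass
    drafts = [stages["schema"](d) for d in drafts]        # 5. structured data
    posted = [stages["publish"](d) for d in drafts]       # 6. push as drafts
    return [p for p in posted if approve("drafts", p)]    # 7. checkpoint two

# Stub stages to show the flow end to end
stages = {
    "analyze": lambda site: ["topic-a", "topic-b"],
    "draft": lambda t: f"draft({t})",
    "optimize": lambda d: f"seo({d})",
    "schema": lambda d: f"jsonld({d})",
    "publish": lambda d: f"posted({d})",
}
result = run_sprint("example-site", stages, lambda gate, item: True)
```

Swapping the lambdas for real agent calls is all the production version adds; the checkpoint callback is where a human says yes or no.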

    Frequently Asked Questions

    What hardware do you need to run local AI models?

    A laptop with 16GB RAM can run 7B-8B parameter models comfortably. For 13B+ models, 32GB RAM helps. No dedicated GPU is required for our use case – CPU inference is fast enough for batch processing where real-time responsiveness isn’t critical.

    How does Ollama compare to cloud APIs for content tasks?

    For structured tasks like SEO analysis, meta generation, and schema creation, Ollama with Llama or Mistral produces equivalent results to cloud APIs. For creative writing and complex reasoning, cloud models like Claude still have a meaningful edge.

    Can you run this on Mac or Linux?

    Ollama runs on Mac, Linux, and Windows. Our automation layer uses PowerShell (Windows), but the same logic works in Bash or Python on any platform. The WordPress API proxy runs on Google Cloud and is platform-independent.

    Is it difficult to set up?

    Ollama installs in one command. Downloading a model is one command. The complexity is in building the automation scripts that connect the agents to your WordPress workflow – that’s where the development investment goes. Once built, the system runs with minimal maintenance.

    Build Your Own Agent Army

    The cost barrier to AI-powered marketing operations is effectively zero. Local models handle the majority of tasks, cloud APIs fill the gaps for under $100/month, and the automation layer is built on free, open-source tools. The only real investment is time – learning the tools and building the workflows. The ROI makes it one of the best investments a marketing operation can make.

  • I Built 7 Autonomous AI Agents on a Windows Laptop. They Run While I Sleep.

    The Night Shift That Never Calls In Sick

    Every night at 2 AM, while I’m asleep, seven AI agents wake up on my laptop and go to work. One generates content briefs. One indexes every file I created that day. One scans 23 websites for SEO changes. One processes meeting transcripts. One digests emails. One monitors site uptime. One writes news articles for seven industry verticals.

    By the time I open my laptop at 7 AM, the work is done. Briefs are written. Indexes are updated. Drift is detected. Transcripts are summarized. Total cloud cost: zero. Total API cost: zero. Everything runs on Ollama with local models.

    The Fleet

    I call them droids because that’s what they are – autonomous units with specific missions that execute without supervision. Each one is a PowerShell script scheduled as a Windows Task. No Docker. No Kubernetes. No cloud functions. Just scripts, a schedule, and a 16GB laptop running Ollama.

    SM-01: Site Monitor. Runs hourly. Pings all 23 managed WordPress sites, measures response time, logs to CSV. If a site goes down, a Windows balloon notification fires. Takes 30 seconds. I know about downtime before any client does.

    NB-02: Nightly Brief Generator. Runs at 2 AM. Reads a topic queue – 15 default topics across all client sites – and generates structured JSON content briefs using Llama 3.2 at 3 billion parameters. Processes 5 briefs per night. By Friday, the week’s content is planned.

    AI-03: Auto-Indexer. Runs at 3 AM. Scans every text file across my working directories. Generates 768-dimension vector embeddings using nomic-embed-text. Updates a local vector index. Currently tracking 468 files. Incremental runs take 2 minutes. Full reindex takes 15.
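AI-03's incremental speed comes from only re-embedding files whose content changed. A hedged Python sketch of that logic – the `embed` callback stands in for the nomic-embed-text call via Ollama, and the function names are mine, not the droid's actual code:

```python
import hashlib

def files_needing_reindex(paths, index):
    """Return only files whose content hash changed since the last run --
    this is what keeps incremental runs to a couple of minutes."""
    stale = []
    for path in paths:
        digest = hashlib.sha256(open(path, "rb").read()).hexdigest()
        if index.get(path) != digest:
            stale.append((path, digest))
    return stale

def reindex(paths, index, embed):
    """embed(text) -> vector; with Ollama this would be something like
    ollama.embeddings(model='nomic-embed-text', prompt=text)."""
    vectors = {}
    for path, digest in files_needing_reindex(paths, index):
        with open(path, encoding="utf-8", errors="ignore") as fh:
            vectors[path] = embed(fh.read())
        index[path] = digest
    return vectors
```

A full reindex is just this function with an empty index; an incremental run skips every unchanged hash.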

    MP-04: Meeting Processor. Runs at 6 AM. Scans for Gemini transcript files from the previous day. Extracts summary, key decisions, action items, follow-ups, and notable quotes via Ollama. I never re-read a transcript – the processor pulls out what matters.

    ED-05: Email Digest. Runs at 6:30 AM. Categorizes emails by priority and generates a morning digest. Flags anything that needs immediate attention. Pairs with Gmail MCP in Cowork for full coverage across 4 email accounts.

    SD-06: SEO Drift Detector. Runs at 7 AM. Checks all 23 WordPress sites for changes in title tags, meta descriptions, H1 tags, canonical URLs, and HTTP status codes. Compares against a saved baseline. If someone – a client, a plugin, a hacker – changes SEO-critical elements, I know within 24 hours.
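SD-06's core is a field-by-field comparison against the saved baseline. A minimal sketch, with illustrative site data:

```python
def detect_drift(baseline, current):
    """Compare SEO-critical fields for one site against the saved baseline.
    Returns a list of (field, old, new) tuples for anything that changed."""
    fields = ["title", "meta_description", "h1", "canonical", "status_code"]
    return [(f, baseline.get(f), current.get(f))
            for f in fields if baseline.get(f) != current.get(f)]

baseline = {"title": "Acme Restoration", "h1": "Acme Restoration",
            "canonical": "https://example.com/", "status_code": 200}
current = dict(baseline, title="Acme Restoration | Cheap SEO Deals")
changes = detect_drift(baseline, current)  # flags the changed title
```

Anything in `changes` – whether a client, a plugin, or a hacker caused it – lands in the morning report.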

    NR-07: News Reporter. Runs at 5 AM. Scans Google News RSS for 7 industry verticals – restoration, luxury lending, cold storage, comedy, automotive training, healthcare, ESG. Generates news beat articles via Ollama. 42 seconds per article, about 1,700 characters each. Raw material for client newsletters and social content.

    Why Local Beats Cloud for This

    The obvious question: why not run these in the cloud? Three reasons.

    Cost. Seven agents running daily on cloud infrastructure – even serverless – would cost hundreds of dollars a month in compute, storage, and API calls. On my laptop, the cost is the electricity to keep it plugged in overnight.

    Privacy. These agents process client data, email content, meeting transcripts, and SEO baselines. Running locally means none of that data leaves my machine. No third-party processing agreements. No data residency concerns. No breach surface.

    Speed of iteration. When I want to change how the brief generator works, I edit a PowerShell script and save it. No deployment pipeline. No CI/CD. No container builds. The change takes effect on the next scheduled run. I’ve iterated on these agents dozens of times in the past week – each iteration took under 60 seconds.

    The Compounding Effect

    The real power isn’t any single agent – it’s how they feed each other. The auto-indexer picks up briefs generated by the brief generator. The meeting processor extracts topics that feed into the brief queue. The SEO drift detector catches changes that trigger content refresh priorities. The news reporter surfaces industry developments that inform content strategy.

    After 30 days, the compound knowledge base is substantial. After 90 days, it’s a competitive advantage that no competitor can buy off the shelf.

    Frequently Asked Questions

    What specs does your laptop need?

    16GB RAM minimum for running Llama 3.2 at 3B parameters. I run on a standard Windows 11 machine – no GPU, no special hardware. The 8B parameter models work too but are slower. For the vector indexer, you need about 1GB of free disk per 1,000 indexed files.

    Why PowerShell instead of Python?

    Windows Task Scheduler runs PowerShell natively. No virtual environments, no dependency management, no conda headaches. PowerShell talks to COM objects (Outlook), REST APIs (WordPress), and the file system equally well. For a Windows-native automation stack, it’s the pragmatic choice.

    How reliable is Ollama for production tasks?

    For structured, protocol-driven tasks – very reliable. The models follow formatting instructions consistently when the prompt is specific. For creative or nuanced work, quality varies. I use local models for extraction and analysis, cloud models for creative generation. Match the model to the task.

    Can I replicate this setup?

    Every script is under 200 lines of PowerShell. The Ollama setup is one install command and one model pull. The Windows Task Scheduler configuration takes 5 minutes per task. Total setup time for all seven agents: under 2 hours if you know what you’re building.

    The Future Runs on Your Machine

    The narrative that AI requires cloud infrastructure and enterprise budgets is wrong. Seven autonomous agents. One laptop. Zero cloud cost. The work gets done while I sleep. If you’re paying monthly fees for automations that could run on hardware you already own, you’re subsidizing someone else’s margins.

  • The VIP Email Monitor: How AI Watches My Inbox for the Signals That Matter

    The Problem With Email Is Not Volume — It’s Blindness

    Everyone talks about inbox zero. Nobody talks about inbox blindness — the moment a critical email from a key client sits buried under 47 newsletters and you don’t see it for six hours.

    I run operations across multiple businesses. Restoration companies, marketing clients, content platforms, SaaS builds. My inbox processes hundreds of messages a day. The important ones — a client escalation, a partner proposal, a payment confirmation — get lost in the noise. Not because I’m disorganized. Because email was never designed to prioritize by context.

    So I built something that does. A local AI agent that watches my inbox, reads every new message, scores it against a VIP list and urgency rubric, and pushes the ones that matter to a Slack channel — instantly. No cloud AI. No third-party service reading my mail. Just a Python script, the Gmail API, and a local Ollama model running on my laptop.

    How the VIP Email Monitor Actually Works

    The architecture is deliberately simple. Complexity is where personal automation goes to die.

    A Python script polls the Gmail API every 90 seconds. When it finds new messages, it extracts the sender, subject, first 500 characters of body, and any attachment metadata. That package gets sent to Llama 3.2 3B running locally via Ollama with a structured prompt that asks three questions:

    First: Is this sender on the VIP list? The list is a simple JSON file — client names, key partners, financial institutions, anyone whose email I cannot afford to miss.

    Second: What is the urgency score, 1 through 10? The model evaluates based on language signals — words like “urgent,” “deadline,” “payment,” “issue,” “immediately” push the score up.

    Third: What category does this fall into — client communication, financial, operational, or noise?

    If the urgency score hits 7 or above, or the sender is on the VIP list regardless of score, the agent fires a formatted Slack message to a dedicated channel. The message includes sender, subject, urgency score, category, and a direct link to open the email in Gmail.
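The alert rule and the Slack push are the simplest parts of the system. A hedged sketch of both — the payload is standard Slack incoming-webhook JSON, and the function names are illustrative:

```python
import json
import urllib.request

def should_alert(is_vip, urgency):
    """VIP senders alert regardless of score; everyone else needs urgency >= 7."""
    return is_vip or urgency >= 7

def notify_slack(webhook_url, sender, subject, urgency, category, gmail_link):
    """One POST to a Slack incoming webhook -- no bot framework required."""
    payload = {"text": (f"*{sender}* | {subject}\n"
                        f"urgency {urgency}/10 | {category} | <{gmail_link}|open>")}
    req = urllib.request.Request(
        webhook_url, data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req)
```

Everything below the threshold is logged and ignored; the Slack channel only ever shows messages that passed `should_alert`.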

    Why Local AI Instead of a Cloud Service

    I could use GPT-4 or Claude’s API for this. The quality of the scoring would be marginally better. But the tradeoffs kill it for email monitoring.

    Latency matters. A cloud API call adds 1-3 seconds per message. When you’re processing a batch of 15 new emails, that’s 15-45 seconds of waiting. Ollama on a decent machine returns in under 400 milliseconds per message. The entire batch processes before a cloud call finishes one.

    Cost matters at scale. Processing 200+ emails per day through GPT-4 would add a meaningful monthly API bill just for email triage. Ollama costs nothing beyond the electricity to run my laptop.

    Privacy is non-negotiable. These are client emails. Financial communications. Business-sensitive content. Sending that to a third-party API — even one with strong privacy policies — introduces a data handling dimension I don’t need. Running locally means the email content never leaves my machine.

    The VIP List Is the Secret Weapon

    The model scoring is useful. But the VIP list is what makes this system actually change my behavior.

    I maintain a JSON file with roughly 40 entries. Each entry has a name, email domain, priority tier (1-3), and a context note. Tier 1 is “interrupt me no matter what” — active clients with open projects, my accountant during tax season, key partners. Tier 2 is “surface within the hour” — prospects in active conversations, vendors with pending deliverables. Tier 3 is “batch at end of day” — industry contacts, networking follow-ups.

    The agent checks every incoming email against this list before it even hits the AI model. A Tier 1 match bypasses the scoring entirely and goes straight to Slack. This means even if the email says something benign like “sounds good, thanks” — if it’s from an active client, I see it immediately.
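A minimal sketch of that tier lookup, with invented list entries (the real file has about 40):

```python
VIP_LIST = [  # illustrative entries, not the real list
    {"name": "Acme Corp", "domain": "acme.com", "tier": 1,
     "note": "active client, open project"},
    {"name": "Tax accountant", "domain": "cpafirm.com", "tier": 2,
     "note": "surface within the hour"},
]

def vip_tier(sender_email, vip_list=VIP_LIST):
    """Match on the sender's domain; return the tier or None.
    A Tier 1 hit bypasses AI scoring and goes straight to Slack."""
    domain = sender_email.rsplit("@", 1)[-1].lower()
    for entry in vip_list:
        if domain == entry["domain"]:
            return entry["tier"]
    return None
```

Because the lookup runs before the model, a Tier 1 “sounds good, thanks” still surfaces instantly and costs zero inference.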

    I update the list weekly. Takes two minutes. The ROI on those two minutes is enormous.

    What I Learned After 30 Days of Running This

    The first week was noisy. The urgency scoring was too aggressive — flagging marketing emails with “limited time” language as high-urgency. I tuned the prompt to weight sender reputation more heavily than body language, and the false positive rate dropped from about 30% to under 5%.

    The real surprise was behavioral. I stopped checking email compulsively. When you know an AI agent is watching and will interrupt you for anything that matters, the anxiety of “what am I missing” disappears. I went from checking email 20+ times a day to checking it twice — morning and afternoon — and letting the agent handle the real-time layer.

    Over 30 days, the monitor processed approximately 4,200 emails. It flagged 340 as requiring attention (about 8%). Of those, roughly 290 were accurate flags. The 50 false positives were mostly automated system notifications from client platforms that used urgent-sounding language.

    The monitor caught three genuinely time-sensitive situations I would have missed — a client payment issue on a Friday evening, a partner changing meeting times with two hours notice, and a hosting provider sending a maintenance window warning that affected a live site.

    The Technical Stack in Plain English

    For anyone who wants to build something similar, here’s exactly what’s running:

    Gmail API with OAuth2 authentication and a service account. Polls every 90 seconds using the messages.list endpoint with a query filter for messages newer than the last check timestamp. This is free tier — the Gmail API quota is 1 billion quota units per day, far more than polling will ever consume.

    Ollama running Llama 3.2 3B locally. This model is small enough to run on a laptop with 8GB RAM but smart enough to understand email context, urgency language, and sender patterns. Response time averages 350ms per email.

    Slack Incoming Webhook for notifications. Dead simple — one POST request with a JSON payload. No bot framework, no Slack app approval process. Just a webhook URL pointed at a private channel.

    Python 3.11 with minimal dependencies — google-auth, google-api-python-client, requests, and the ollama Python package. The entire script is under 300 lines.

    The whole thing runs as a background process on my Windows laptop. If the laptop sleeps, it catches up on wake. No cloud server, no monthly bill, no infrastructure to maintain.

    Frequently Asked Questions

    Can this work with Outlook instead of Gmail?

    Yes, but the API integration is different. Microsoft Graph API replaces the Gmail API, and the authentication uses Azure AD app registration instead of Google OAuth. The AI scoring and Slack notification layers remain identical. The swap takes about 2 hours of development work.

    What happens when the laptop is off or sleeping?

    The agent tracks the last-processed message timestamp. When it wakes up, it pulls all messages since that timestamp and processes the backlog. Typically catches up within 30 seconds of waking. For true 24/7 coverage, you’d move this to a low-cost VPS, but I haven’t needed to.

    Does this replace email filters and labels?

    No — it layers on top of them. Gmail filters still handle the mechanical sorting (newsletters to a folder, receipts auto-labeled). The AI monitor handles the judgment calls that filters can’t make — “is this email from a new address actually important based on what it says?”

    How accurate is a 3B parameter model for this task?

    For email triage, surprisingly accurate — north of 94% after prompt tuning. Email is a constrained domain. The model doesn’t need to be creative or handle edge cases in reasoning. It needs to read short text, match patterns, and output a score. A 3B model handles that well within its capability.

    What’s the total setup time from zero?

    If you already have Ollama installed and a Gmail account, about 90 minutes to get the first version running. Another hour to tune the prompt and build your VIP list. Two and a half hours total to go from nothing to a working email monitor.

    The Bigger Picture

    This email monitor is one of seven autonomous agents I run locally. It’s the one people ask about most because email is universal pain. But the principle underneath it applies everywhere: don’t build AI that replaces your judgment — build AI that protects your attention.

    The VIP Email Monitor doesn’t decide what to do about important emails. It decides what deserves my eyes. That distinction is everything. The most expensive thing in my business isn’t software or tools or even time. It’s the six hours a critical email sat unread because it landed between a Costco receipt and a LinkedIn notification.

    That doesn’t happen anymore.

  • SSH Was Broken. So I Rebooted a VM From an API and Let a Script Do the Work.

    The Moment Everything Stops

    It’s 11 PM on a Wednesday. I’m deploying a WordPress optimization batch across a 5-site cluster running on a single GCP Compute Engine VM. Midway through site three, the SSH connection drops. Not a timeout — a hard refusal. Connection refused. Port 22.

    I try again. Same result. I try from a different terminal. Same. I check the GCP Console — the VM shows as running. CPU is at 4%. Memory is fine. The machine is alive but unreachable. SSH is dead and it’s not coming back without intervention.

    Most people would stop here, file a support ticket, and go to bed. I didn’t have that luxury. I had three more sites to process and a client deadline in the morning. So I did what any reasonable person with API access and a grudge would do — I built a workaround in real time.

    Why SSH Dies on GCP VMs (And Why It’s More Common Than You Think)

    SSH failures on Compute Engine instances are not rare. The common causes include firewall rule changes that block port 22, the SSH daemon crashing after a bad package update, disk space filling up (which prevents SSH from writing session files), and metadata server issues that break OS Login or key propagation.

    In my case, the culprit was disk space. The optimization scripts had been writing temporary files and logs. The 20GB boot disk — which seemed generous when I provisioned it — had filled to 98%. The SSH daemon couldn’t create a new session file, so it refused all connections. The VM was fine. The services were running. But the front door was locked from the inside.

    This is a pattern I’ve seen across dozens of GCP deployments: the VM isn’t down, it’s just unreachable. And the solution isn’t to wait for SSH to magically recover. It’s to have a plan that doesn’t depend on SSH at all.

    The GCP API Workaround: Reboot With a Startup Script

    GCP Compute Engine exposes a full REST API that lets you manage VMs without ever touching SSH. The key operations: stop an instance, update its metadata (including startup scripts), and start it again. All authenticated via service account or OAuth token.

    Here’s the approach I used that Wednesday night:

    Step 1: Stop the VM via API. A simple POST to compute.instances.stop. This is a clean shutdown — it sends ACPI shutdown to the guest OS, waits for confirmation, then reports the instance as TERMINATED. Takes about 30-60 seconds.

    Step 2: Inject a startup script via metadata. GCP lets you set a startup-script metadata key on any instance. Whatever script you put there runs automatically when the instance boots. I wrote a bash script that does three things: cleans up temp files to free disk space, restarts the SSH daemon, and then resumes the WordPress optimization batch from where it left off.

    Step 3: Start the VM. POST to compute.instances.start. The VM boots, runs the startup script, frees the disk space, restarts SSHD, and picks up the work. Total downtime: under 3 minutes.

    No SSH required at any point. No support ticket. No waiting until morning.
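The three steps collapse into one recovery function. This is a sketch under assumptions: the client object is injected and assumed to expose stop/set_metadata/start, roughly as the google-cloud-compute InstancesClient does, and the cleanup script paths are illustrative:

```python
def recover_instance(client, project, zone, name, startup_script):
    """Stop the VM, inject a startup script via metadata, start it again.
    No SSH involved at any point."""
    client.stop(project=project, zone=zone, instance=name)
    client.set_metadata(project=project, zone=zone, instance=name,
                        items={"startup-script": startup_script})
    client.start(project=project, zone=zone, instance=name)

# Illustrative cleanup script -- paths and service names are assumptions
CLEANUP = """#!/bin/bash
rm -rf /tmp/* /var/tmp/*        # free the disk that locked out sshd
systemctl restart sshd          # bring SSH back
/opt/batch/resume.sh            # pick the optimization batch back up
"""
```

Injecting the client also makes the sequence trivially testable without touching a real project.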

    The Self-Healing Script I Built That Night

    After solving the immediate crisis, I turned the workaround into a permanent tool. A Python script that does the following:

    Health check: Every 5 minutes, attempt an SSH connection to the VM. If it fails twice consecutively, trigger the recovery sequence. This uses the paramiko library for SSH and the google-cloud-compute library for the API calls.

    Recovery sequence: Stop the instance, wait for TERMINATED status, set a cleanup startup script in metadata, start the instance, wait for RUNNING status, verify SSH access returns within 120 seconds. If SSH still fails after reboot, escalate to Slack with full diagnostic output.

    Resume logic: The startup script checks for a resume.json file on the persistent disk. This file tracks which sites have been processed and which operation was in progress when the failure occurred. On boot, the script reads this file and picks up from the exact point of failure — not from the beginning of the batch.
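The resume pattern is worth showing concretely. A minimal Python sketch, not the actual 180-line script: progress is written after every item, so a rerun skips completed work and a clean finish removes the state file:

```python
import json
import os

def process_batch(sites, work, state_path="resume.json"):
    """Process sites in order, recording progress after each one so a
    crash mid-batch resumes at the failed item, not at item 1."""
    done = []
    if os.path.exists(state_path):
        with open(state_path) as fh:
            done = json.load(fh)["done"]
    for site in sites:
        if site in done:
            continue                      # already processed before the crash
        work(site)
        done.append(site)
        with open(state_path, "w") as fh:
            json.dump({"done": done}, fh)
    os.remove(state_path)                 # clean finish: nothing to resume
    return done
```

If `work` raises on site three of five, the next run starts at site three.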

    The entire recovery script is 180 lines of Python. It runs as a background process on my local machine, watching the VM like a lifeguard watches a pool.

    IAP Tunneling: The Backup Access Method

    After this incident, I also set up Identity-Aware Proxy (IAP) TCP tunneling as a permanent backup access method. IAP tunneling lets you SSH into a VM through Google’s infrastructure, bypassing standard firewall rules and port 22 entirely.

    The command is simple: gcloud compute ssh instance-name --tunnel-through-iap. It works even when port 22 is blocked, because the traffic routes through Google’s IAP service on port 443. The VM doesn’t need a public IP address, and you don’t need any firewall rules allowing SSH.

    I should have set this up on day one. It’s now part of my standard VM provisioning checklist — every Compute Engine instance gets IAP tunneling configured before anything else. The extra 5 minutes of setup would have saved me the Wednesday night adventure entirely.

    Lessons That Apply Beyond GCP

    Never depend on a single access method. SSH is not a guarantee. It’s a service running on a Linux machine, and services fail. Always have a second path — IAP tunneling on GCP, Serial Console on AWS, Bastion hosts, or API-based management. If your only way into a server is SSH, you will eventually be locked out at the worst possible time.

    Disk space kills more deployments than bad code. I’ve seen this pattern at companies of every size. Nobody monitors disk space on VMs that “aren’t doing much.” Then a log file grows, or temp files accumulate, and suddenly the machine is functionally dead even though every dashboard says it’s healthy. Set an 80% disk alert on every VM you provision. It takes 30 seconds and prevents hours of debugging.
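Checking that threshold locally is a one-liner with the standard library. The `usage` parameter is an invented hook to make the check testable; by default it reads the real filesystem:

```python
import shutil

def disk_alert(path="/", threshold=0.80, usage=None):
    """Return True when disk usage crosses the threshold.
    `usage` is an injectable (total, used, free) tuple for testing."""
    total, used, _free = usage or shutil.disk_usage(path)
    return used / total >= threshold
```

Cloud Monitoring does the same thing server-side; this version is for the machines nobody bothered to wire up.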

    Startup scripts are the most underused feature in cloud computing. Every major cloud provider supports them — GCP metadata startup scripts, AWS EC2 user data, Azure custom script extensions. They turn a reboot into a deployment. If your recovery plan is “SSH in and run commands,” your recovery plan fails exactly when you need it most. If your recovery plan is “reboot and let the startup script handle it,” you can recover from anything, from anywhere, including your phone.

    Build resume logic into every batch process. If a script processes 10 items and fails on item 7, restarting should begin at item 7, not item 1. This is trivial to implement — write progress to a JSON file after each step — but most people don’t do it until they’ve lost work to a mid-batch failure. I now build resume logic into every automation by default.

    Frequently Asked Questions

    Can I use the GCP API to manage VMs without the gcloud CLI?

    Yes. The Compute Engine REST API is fully documented and works with any HTTP client. You authenticate with an OAuth2 token or service account key, then make standard REST calls. The gcloud CLI is a convenience wrapper — everything it does, the API can do directly. I use Python with the google-cloud-compute library for programmatic access.

    How do I prevent disk space issues on GCP VMs?

    Three steps: set up Cloud Monitoring alerts at 80% disk usage, add a cron job that cleans temp directories weekly, and size your boot disk with 50% headroom beyond what you think you need. A 30GB disk costs pennies more than 20GB and prevents the most common cause of mysterious SSH failures.

    Is IAP tunneling slower than standard SSH?

    Marginally. IAP adds about 50-100ms of latency because traffic routes through Google’s proxy infrastructure. For interactive terminal work, you won’t notice the difference. For bulk file transfers, use gcloud compute scp with the --tunnel-through-iap flag and expect about 10-15% slower throughput compared to direct SSH.

    What if the VM won’t stop via the API?

    If instances.stop hangs for more than 90 seconds, use instances.reset instead. This is a hard reset — equivalent to pulling the power cord. It’s not graceful, but it works when the OS is unresponsive. The startup script still runs on reboot, so your recovery logic kicks in either way.

    The Real Takeaway

    The Wednesday night SSH failure cost me about 45 minutes, including building the workaround. If it had happened before I understood the GCP API, it would have cost me a full day and a missed deadline. The difference isn’t talent or experience — it’s having built systems that assume failure and recover automatically.

    Every server will become unreachable. Every batch process will fail mid-run. Every disk will fill up. The question isn’t whether these things happen. It’s whether your systems are built to handle them without you being the single point of failure at 11 PM on a Wednesday.

  • SM-01: How One Agent Monitors 23 Websites Every Hour Without Me

    The Worst Way to Find Out Your Site Is Down

    A client calls. Their site has been returning a 503 error for four hours. You check – they are right. The hosting provider had a blip, the site went down, and nobody noticed because nobody was watching. Four hours of lost traffic, lost leads, and lost trust.

    This happened to me once. It never happened again, because I built SM-01.

    SM-01 is the first agent in my autonomous fleet. It runs every 60 minutes via Windows Task Scheduler, checks 23 websites across my client portfolio, and reports to Slack only when it finds a problem. No dashboard to check. No email digest to read. Silence means everything is fine. A Slack message means something needs attention.

    What SM-01 Checks

    HTTP status: Is the site returning 200? A 503, 502, or 500 triggers an immediate red alert. A 301 or 302 redirect chain triggers a yellow alert – the site works but something changed.

    Response time: How long does the homepage take to respond? Baseline is established over 30 days of monitoring. If response time exceeds 2x the baseline, a yellow alert fires. If it exceeds 5x, red alert. Slow sites lose rankings and visitors before they fully go down – response time degradation is an early warning.

    SSL certificate expiration: SM-01 checks the SSL certificate expiry date on every pass. If a certificate expires within 14 days, yellow alert. Within 3 days, red alert. Expired, critical alert. An expired SSL certificate turns your site into a browser warning page and kills organic traffic instantly.

    Content integrity: The agent checks for the presence of specific strings on each homepage – the site name, a key heading, or a footer element. If these strings disappear, it means the homepage content changed unexpectedly – possibly a defacement, a bad deploy, or a theme crash. This catches the subtle failures that return a 200 status code but serve broken content.
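The status and certificate rules above reduce to a few small functions. The alert thresholds follow the section exactly; `cert_days_left` is a hedged sketch of certificate inspection with the standard `ssl` and `socket` libraries:

```python
import socket
import ssl
import time

def classify_status(code):
    """Map an HTTP status code to SM-01's alert levels."""
    if code in (500, 502, 503):
        return "red"
    if code in (301, 302):
        return "yellow"    # site works but something changed
    return "ok" if code == 200 else "red"

def cert_days_left(host, port=443, timeout=10):
    """Days until the site's TLS certificate expires."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as s:
            not_after = s.getpeercert()["notAfter"]
    return (ssl.cert_time_to_seconds(not_after) - time.time()) / 86400

def cert_alert(days):
    """Expiry thresholds: 14 days yellow, 3 days red, expired critical."""
    if days <= 0:
        return "critical"
    if days <= 3:
        return "red"
    if days <= 14:
        return "yellow"
    return "ok"
```

The content-integrity check is simpler still: fetch the homepage and assert the expected strings are present in the body.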

    The Architecture Is Deliberately Boring

    SM-01 is a Python script. It uses the requests library for HTTP checks, the ssl and socket libraries for certificate inspection, and a Slack webhook for alerts. No monitoring platform. No subscription. No agent framework. Under 250 lines of code.

    The site list is a JSON file with 23 entries. Each entry has the URL, expected status code, content check string, and baseline response time. Adding a new site takes 30 seconds – add an entry to the JSON file.

    Results are stored in a local SQLite database for trend analysis. I can query historical uptime, average response time, and alert frequency for any site over any time period. The database is 12MB after six months of hourly checks across 23 sites.
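A trend query against that database is a few lines of SQL. The checks table layout below is an assumption for illustration, not the published schema:

```python
import sqlite3

# Illustrative table layout; the real schema isn't published.
SCHEMA = """CREATE TABLE IF NOT EXISTS checks (
    site TEXT, checked_at TEXT, status INTEGER,
    response_seconds REAL, alert_level TEXT)"""

def uptime_percent(db, site, since):
    """Percentage of checks since `since` that were not red alerts."""
    row = db.execute(
        """SELECT 100.0 * SUM(alert_level != 'red') / COUNT(*)
           FROM checks WHERE site = ? AND checked_at >= ?""",
        (site, since)).fetchone()
    return row[0]
```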

    What Six Months of Data Revealed

    Across 23 sites monitored hourly for six months, SM-01 recorded 99.7% average uptime. The 0.3% downtime was concentrated in three sites on shared hosting – every other site on dedicated or managed hosting had 99.99%+ uptime.

    SSL certificate alerts saved two near-misses where auto-renewal failed silently. Without SM-01, those certificates would have expired and the sites would have shown browser security warnings until someone manually noticed and renewed.

    Response time trending caught one hosting degradation issue three weeks before it became a visible problem. A site’s response time crept from 400ms baseline to 900ms over 10 days. SM-01 flagged it at the 800ms mark. Investigation revealed a database table that needed optimization. Fixed in 20 minutes, before any traffic impact.

    Frequently Asked Questions

    Why not use UptimeRobot or Pingdom?

    I have. They work well for basic uptime monitoring. SM-01 adds content integrity checking, custom response time baselines per site, and integration with my existing Slack alert ecosystem. The biggest advantage is cost at scale – monitoring 23 sites on UptimeRobot Pro requires a paid monthly plan. SM-01 costs nothing.

    Does hourly checking miss short outages?

    Yes – an outage lasting 30 minutes between checks would be missed. For critical production sites, you could reduce the interval to 5 minutes. I chose hourly because my sites are content sites, not e-commerce or SaaS platforms where minutes of downtime have direct revenue impact. The monitoring frequency should match the cost of missed downtime.

    How do you handle false positives from network issues?

    SM-01 requires two consecutive failed checks before alerting. A single timeout or error is logged but not reported. This eliminates the vast majority of false positives from transient network blips or temporary DNS issues. If both the hourly check and the immediate recheck 60 seconds later fail, the alert fires.
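The two-strike rule is a small wrapper around any check function (a sketch; check_fn is any callable returning an alert level):

```python
import time

def confirmed_failure(check_fn, recheck_delay=60):
    """Fire an alert only when two consecutive checks fail."""
    if check_fn() != "red":
        return False             # first check passed: nothing to report
    time.sleep(recheck_delay)    # log the failure, wait, then recheck
    return check_fn() == "red"   # both failed: the alert is real
```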

    Monitoring Is Not Optional

    Every website you manage is a promise to a client. That promise includes being available when their customers look for them. SM-01 is how I keep that promise without manually checking 23 URLs every day. It is the simplest agent in my fleet and arguably the most important.

  • NB-02: The Nightly Brief That Tells Me What Happened Across Seven Businesses While I Was Living My Life

    The Morning Ritual That Replaced Checking 12 Apps

    My old morning routine: open Slack, scan 8 channels. Open Notion, check the task board. Open Gmail, triage the inbox. Open Google Analytics for each client site. Open the WordPress dashboard for any site that published overnight. Check the GCP console for VM health. That is 45 minutes of context-gathering before I do anything productive.

    NB-02 replaced all of it with a single Slack message that arrives at 6 AM every morning.

    The Nightly Brief Generator is the second agent in my fleet. It runs at 5:45 AM via scheduled task, aggregates activity from the previous 24 hours across every system I operate, and produces a structured briefing that takes 3 minutes to read. By the time I finish my coffee, I know exactly what happened, what needs attention, and what I should work on first.

    What the Nightly Brief Contains

    Agent Activity Summary: Which agents ran, how many times, success/failure counts. If SM-01 flagged a site issue overnight, it shows here. If the VIP Email Monitor caught an urgent message at 2 AM, it shows here. If SD-06 detected ranking drift on a client site, it shows here. One section, all agent activity, color-coded by severity.

    Content Published: Any articles published or scheduled across all 18 WordPress sites in the last 24 hours. Title, site, status, word count. This matters because automated publishing pipelines sometimes run overnight, and I need to know what went live without manually checking each site.

    Tasks Created: New tasks in the Notion database, grouped by source. Tasks from MP-04 meeting processing, tasks from agent alerts, tasks manually created by me or team members. The brief shows the count and highlights any marked as urgent.

    Overdue Items: Any task past its due date. This is the accountability section. It is uncomfortable by design. If something was due yesterday and is not done, it appears in bold in my morning brief. No hiding from missed deadlines.

    Infrastructure Health: Quick status of the GCP VMs, the WP proxy, and any scheduled tasks. Green/yellow/red indicators. If everything is green, this section is one line. If something is yellow or red, it expands with diagnostic details.

    How NB-02 Aggregates Data

    The agent pulls from four sources via API:

    Slack API: Reads messages posted to agent-specific channels in the last 24 hours. Counts alerts by type and severity. Extracts any unresolved red alerts that need morning attention.

    Notion API: Queries the Tasks Database for items created or modified in the last 24 hours. Queries the Content Database for recently published entries. Checks for overdue tasks.

    WordPress REST API: Quick status check on each managed site – is the REST API responding? Any posts published in the last 24 hours? This runs through the WP proxy and takes about 30 seconds for all 18 sites.

    GCP Monitoring: Instance status for the knowledge cluster VM and any Cloud Run services. Uses the Compute Engine API to check instance state and basic health metrics.

    The aggregation script runs in Python, collects data from all sources into a structured object, then formats it as a Slack message using Block Kit for clean formatting with sections, dividers, and color-coded indicators. Total runtime: under 2 minutes.
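The Slack formatting step might look roughly like this, using only the stdlib and an incoming webhook; the section structure and function names are illustrative:

```python
import json
from urllib import request

def brief_blocks(sections):
    """Assemble brief sections into Slack Block Kit blocks with dividers."""
    blocks = [{"type": "header",
               "text": {"type": "plain_text", "text": "Nightly Brief"}}]
    for title, body in sections:
        blocks.append({"type": "section",
                       "text": {"type": "mrkdwn", "text": f"*{title}*\n{body}"}})
        blocks.append({"type": "divider"})
    return blocks

def post_brief(webhook_url, blocks):
    """POST the formatted brief to a Slack incoming webhook."""
    req = request.Request(webhook_url,
                          data=json.dumps({"blocks": blocks}).encode(),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return resp.status
```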

    The Behavioral Impact

    The nightly brief changed how I start every day. Instead of reactive context-gathering across multiple apps, I start with a complete picture and move directly into action. The first 45 minutes of my day shifted from information archaeology to execution.

    More importantly, the brief gives me confidence in my systems. When six agents are running autonomously overnight, processing emails, monitoring sites, tracking rankings, and generating content, you need a single point of verification that everything worked. NB-02 is that verification. If the morning brief arrives and everything is green, I know with certainty that my operations ran correctly while I slept.

    On the days when something is yellow or red, I know immediately and can address it before it impacts clients or deadlines. The alternative – discovering a problem at 2 PM when a client asks why their site is slow – is the scenario NB-02 was built to prevent.

    Frequently Asked Questions

    Can the nightly brief be customized per day of the week?

    Yes. Monday briefs include a weekly summary rollup in addition to the overnight report. Friday briefs include a weekend preparation section flagging anything that might need attention over the weekend. The template is configurable per day.

    What happens if NB-02 itself fails to run?

    If the brief does not arrive by 6:15 AM, that absence is itself the alert. I have a simple phone alarm at 6:15 that I dismiss only after reading the brief. If the brief is not there, I know the scheduled task failed and check the system. The absence of expected output is a signal.

    How long did it take to build?

    The first version took about 4 hours – API connections, data aggregation, Slack formatting. I have iterated on the format about 10 times over three months based on what information I actually use versus what I skip. The current version is tight – everything in the brief earns its place.

    Start Your Day With Certainty

    The nightly brief is the simplest concept in my agent fleet and the one with the most immediate quality-of-life impact. It replaces anxiety with data, replaces app-hopping with a single read, and gives you the operational confidence to start building instead of checking. If you build one agent, build this one first.

  • I Deployed a Client-Facing Chatbot on Vertex AI for Less Than a Penny Per Conversation

    The Client Asked for a Chatbot. I Built Them an Employee.

    A restoration client wanted a website chatbot. Their brief was simple: answer common questions about services, capture lead information, and route urgent inquiries to their dispatch team. The expectation was a monthly-subscription SaaS widget with canned responses.

    I built them something better. A custom chatbot running on Google Vertex AI via Cloud Run, trained on their specific service pages, pricing guidelines, and service area boundaries. It handles natural language questions, qualifies leads by asking the right follow-up questions, and routes urgent water damage calls directly to dispatch with full context. Cost per conversation: $0.002. That is two-tenths of a penny.

    At 500 conversations per month, the total AI cost is about $1. Add Cloud Run hosting for the container, which scales to zero when idle, and the total infrastructure cost remains a small fraction of the monthly-subscription SaaS product it replaces – and it performs significantly better because it actually understands the business.

    The Architecture

    The chatbot runs on three components:

    Vertex AI (Gemini model): Handles the conversational intelligence. The model receives a system prompt loaded with the client’s service information, pricing ranges, service area (Houston metro), and qualification criteria. It responds conversationally, asks clarifying questions when needed, and structures lead information for capture.

    Cloud Run container: A lightweight Python FastAPI application that serves as the API endpoint. The WordPress site calls this endpoint via JavaScript when a visitor interacts with the chat widget. The container handles session management, conversation history, and the Vertex AI API calls. It scales to zero when not in use, so idle hours cost nothing.
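The FastAPI application itself isn't published, but the session-management piece it describes can be sketched in plain Python (class and method names are hypothetical):

```python
from collections import defaultdict

class SessionStore:
    """In-memory per-visitor conversation history for a single container.
    A production version would also expire idle sessions."""

    def __init__(self, max_turns=20):
        self.max_turns = max_turns
        self._history = defaultdict(list)

    def add(self, session_id, role, text):
        """Record one turn, keeping only the most recent max_turns."""
        turns = self._history[session_id]
        turns.append({"role": role, "text": text})
        del turns[:-self.max_turns]

    def context(self, session_id):
        """Messages to send to the model alongside the system prompt."""
        return list(self._history[session_id])
```

Each chat request would append the visitor's message, pass context() plus the system prompt to the Vertex AI call, and append the model's reply.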

    WordPress integration: A simple JavaScript widget on the client site that renders the chat interface and communicates with the Cloud Run endpoint. No WordPress plugin required. The widget is 40 lines of JavaScript that creates a chat bubble, handles user input, and displays responses.

    Why Vertex AI Instead of OpenAI

    Cost: Gemini 1.5 Flash on Vertex AI costs significantly less per token than GPT-4 or GPT-3.5. For a chatbot handling short conversational exchanges, the per-conversation cost difference is dramatic.

    Data residency: Vertex AI runs on GCP infrastructure where I already have my project. Data stays within the Google Cloud ecosystem I control. No third-party API means the conversation data, which includes client contact information, stays within my GCP project boundaries.

    Scale-to-zero: Cloud Run only charges when processing requests. During overnight hours when nobody is chatting, the cost is literally zero. OpenAI’s API has the same pay-per-use model, but coupling it with Cloud Run for the hosting layer gives me full control over the deployment.

    The System Prompt That Makes It Work

    The chatbot’s intelligence comes entirely from its system prompt. No fine-tuning. No RAG pipeline. No vector database. Just a well-structured system prompt that contains the client’s service descriptions, pricing ranges (not exact quotes), service area zip codes, qualification questions, and escalation triggers.

    The prompt includes explicit instructions for lead qualification. When someone describes a water damage situation, the chatbot asks: When did the damage occur? Is it an active leak or standing water? What is the approximate affected area? Is this a residential or commercial property? Do you have insurance? These questions mirror what the dispatch team asks on phone calls.

    When the qualification criteria indicate an emergency (active leak, less than 24 hours, standing water), the chatbot provides the dispatch phone number prominently and offers to notify the team. Non-emergency inquiries get scheduled callback options.
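In production these triggers live in the system prompt, but restated as deterministic code for illustration (field names and the dispatch number are placeholder assumptions):

```python
def is_emergency(answers):
    """Apply the escalation triggers: active leak, standing water,
    or damage less than 24 hours old."""
    return bool(answers.get("active_leak")
                or answers.get("standing_water")
                or answers.get("hours_since_damage", float("inf")) < 24)

def route_inquiry(answers, dispatch_number="555-0100"):  # placeholder number
    """Emergencies get the dispatch line; everything else gets a callback."""
    if is_emergency(answers):
        return f"Emergency: call dispatch now at {dispatch_number}"
    return "Non-emergency: offer scheduled callback options"
```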

    Results After 90 Days

    The chatbot handled 1,400 conversations in its first 90 days. Of those, 340 were qualified leads (24% conversion rate from chat to lead). Of the qualified leads, 89 became paying customers.

    The previous chatbot solution (a SaaS widget with canned response trees) had a 6% chat-to-lead conversion rate. The AI chatbot quadrupled it because it can actually understand what someone is describing and respond helpfully rather than forcing them through a rigid decision tree.

    Total AI cost for the 90 days: roughly $3 (1,400 conversations at $0.002 each), plus minimal Cloud Run hosting. Total value of the 89 customers: several hundred thousand dollars in restoration work. The ROI is not a percentage – it is a category error to even calculate it.

    Frequently Asked Questions

    Can the chatbot handle multiple languages?

    Yes. Gemini handles multilingual conversations natively. The Houston market has a significant Spanish-speaking population, and the chatbot responds in Spanish when addressed in Spanish without any additional configuration. This alone increased lead capture from a demographic the client was previously underserving.

    What happens when the chatbot cannot answer a question?

    The system prompt includes a graceful fallback: if the question is outside the defined scope, the chatbot acknowledges the limitation and offers to connect the visitor with a human team member via phone or scheduled callback. It never fabricates information about pricing or services.

    How hard is this to set up for a new client?

    About 3 hours. Create the Cloud Run service from the template, customize the system prompt with the client’s information, deploy, and add the JavaScript widget to their WordPress site. The infrastructure is templated – the customization is entirely in the system prompt content.

    The Bigger Point

    AI chatbots do not need to be expensive SaaS products with monthly subscriptions. The underlying technology – language models accessible via API – costs fractions of a penny per interaction. The value is in the deployment architecture and the domain-specific knowledge you embed in the system prompt. Own the infrastructure, own the intelligence, and the cost drops to near zero while the quality exceeds anything a canned-response widget can deliver.

  • One Saturday Night I Built 7 AI Agents, Made a G-Funk Album, and Realized This Is the Future

    Saturday, 9 PM. The Agents Are Running. The Music Is Playing.

    It is a Saturday night in March. On one screen, SM-01 is running its hourly health check across 23 websites. The VIP Email Monitor caught an urgent message from a client at 7 PM and routed it to Slack before I finished dinner. The SEO Drift Detector flagged two pages on a lending site that slipped 4 positions this week – already queued for Monday refresh.

    On the other screen, I am making music. Not listening to music. Making it. On Producer.ai, I just finished a track called Evergreen Grit: Tahoma’s Reign – heavy West Coast rap with cinematic volcanic rumbles about the raw power of Mt. Rainier. Before that, I made a Bohemian Noir-Chanson piece called The Duty to Mitigate. Before that, a Liquid Drum and Bass remix of an industrial synthwave track.

    Both screens are running AI. One is running my businesses. The other is running my creativity. And the line between the two has completely disappeared.

    The Catalog Nobody Expected

    I have a growing catalog on Producer.ai that would confuse anyone who tries to categorize it. Bayou Noir-Folk Jingles. Smokey Jazz Lounge instrumentals. Pacific Northwest G-Funk. Jazzgrass Friendship Duets. Chaotic Screamo. Luxury Deep House. Kyoto Whisper Pop. Lo-fi Lobster Beats. A cinematic orchestral post-rock piece. Soulful scat jazz.

    These are not random experiments. Each one started with an idea, a mood, a reference point. Producer.ai is an AI music agent – you describe what you want in natural language and it generates full tracks. But the quality depends entirely on the specificity and creativity of your input. Saying “make a rock song” gets you generic garbage. Saying “heavy aggressive West Coast rap with cinematic volcanic rumbles, focus on the raw power of Mt. Rainier, distorted 808s, ominous cinematic strings, and a fierce commanding vocal delivery” – that gets you something that actually moves you.

    The same principle applies to every AI tool I use. Specificity is the multiplier. Vague inputs produce vague outputs. Precise, creative, contextual inputs produce results that surprise you with how good they are.

    What Music and Business Automation Have in Common

    The creative process on Producer.ai mirrors the operational process on Cowork mode in ways that are not obvious until you do both in the same evening.

    Iteration is the product. Grey Water Transit started as a somber cello solo. Then I remixed it into a moody atmospheric rap track with boom-bap percussion. Then a grittier version with distorted 808s. Then an underground edit with lo-fi aesthetic and heavy room reverb. Four versions, each building on the last, each finding something the previous version missed. That is exactly how I build AI agents – the first version works, the second version works better, the fifth version works automatically.

    Constraints produce creativity. Producer.ai works within the constraints of its model. Cowork mode works within the constraints of available tools and APIs. In both cases, the constraints force creative problem-solving. When SSH broke on my GCP VM, I could not just SSH harder. I had to find the API workaround. When a music prompt does not produce the right feel, you cannot force it. You reframe the description, change the genre tags, adjust the mood language. Constraint is not the enemy of creativity. It is the engine.

    The best results come from combining domains. Active Prevention started as an industrial EBM track. Then I added cinematic sweep. Then rhythmic focus. Then a liquid DnB remix. The final version combines industrial, cinematic, and dance music in a way no single genre could achieve. My best business automations work the same way – the content swarm architecture combines SEO, persona targeting, and AI generation in a way that none of those disciplines could achieve alone.

    This Is Not a Side Project. This Is the Point.

    Most people separate work and creativity into different categories. Work is the thing you optimize. Creativity is the thing you do when work is done. AI is collapsing that boundary.

    On a Saturday night, I can run business operations that used to require a team of specialists AND make a G-Funk album AND write articles about both AND publish them to a WordPress site AND log everything to Notion. Not because I am working harder. Because the tools have caught up to how creative people actually think – in bursts, across domains, following energy rather than schedules.

    The seven AI agents running on my laptop are not replacing my creativity. They are protecting my creative time by handling the operational overhead that used to consume it. When SM-01 monitors my sites, I do not have to. When NB-02 compiles my morning brief, I do not have to. When MP-04 processes my meeting transcripts, I do not have to. Every minute those agents save is a minute I can spend making music, writing, building, or simply thinking.

    The Tracks That Tell the Story

    If you want to hear what AI-assisted creativity sounds like, the catalog is on Producer.ai under the profile Tygart. Some highlights:

    The Duty to Mitigate – Bohemian Noir-Chanson with dusty nylon-string guitar and gravelly vocals. Named after an insurance concept I was writing about that day. Work bled into art.

    Evergreen Grit: Tahoma’s Reign – Heavy aggressive rap with volcanic rumbles. Made after a long session optimizing Pacific Northwest client sites. The geography got into the music.

    Active Prevention – Industrial synthwave that went through five remixes including a liquid DnB version. Started as background music for a coding session. Became its own project.

    Grey Water Transit – Cinematic orchestral rap that evolved from a cello solo through four increasingly gritty remixes. The iteration process is the creative process.

    Frequently Asked Questions

    What is Producer.ai exactly?

    It is an AI music generation platform where you describe what you want in natural language and it creates full audio tracks. You can remix, iterate, change genres, add effects, and build a catalog. Think of it as Midjourney for music – the quality depends entirely on how well you can describe what you hear in your head.

    Do you use the music professionally?

    Some tracks become background audio for client video projects and social media content. Others are purely personal creative output. The line is intentionally blurry. When you can generate professional-quality audio in minutes, the distinction between professional asset and personal expression stops mattering.

    How does making music make you better at business automation?

    Both require the same core skill: translating a vision into specific instructions that a machine can execute. Prompt engineering for music and prompt engineering for business operations use identical cognitive muscles. The person who can describe Bohemian Noir-Chanson with dusty nylon-string guitar to a music AI can also describe a content swarm architecture with persona differentiation to a business AI. Specificity transfers.

    The Future Is Not Work-Life Balance. It Is Work-Life Integration.

    Saturday night used to be the time I stopped working. Now it is the time I do my most interesting work – the kind that crosses boundaries between operations and creativity, between business and art, between discipline and play. The AI handles the mechanical layer. I handle the vision. And the result is a life where building a business and making a G-Funk album are not competing priorities. They are the same Saturday night.

  • The Agency That Runs on AI: What Tygart Media Actually Looks Like in 2026

    The Org Chart Has One Name and Seven Agents

    Tygart Media does not have employees. It has systems. The agency manages 18 WordPress sites across industries including luxury lending, restoration services, cold storage logistics, interior design, comedy, automotive training, and technology. It produces hundreds of SEO-optimized articles per month. It monitors keyword rankings daily. It tracks site uptime hourly. It processes meeting transcripts automatically. It generates nightly operational briefs.

    One person runs all of it. Not by working 80-hour weeks. By building infrastructure that works autonomously.

    This is not a hypothetical future state. This is what the agency looks like right now, in March 2026. And the operational details are more interesting than the headline.

    The Infrastructure Stack

    AI Partner: Claude in Cowork mode, running 387+ sessions since December 2025. This is the primary operating interface – a sandboxed Linux environment with bash execution, file access, API connections, and 60+ custom skills.

    Autonomous Agents: Seven local Python agents running on a Windows laptop: SM-01 (site monitor), NB-02 (nightly brief), AI-03 (auto-indexer), MP-04 (meeting processor), ED-05 (email digest), SD-06 (SEO drift detector), NR-07 (news reporter). Each runs on a schedule via Windows Task Scheduler.

    WordPress Management: 18 sites connected through a Cloud Run proxy that routes REST API calls to avoid IP blocking. One GCP publisher service for the SiteGround-hosted site that blocks all proxy traffic. Full credential registry as a skill file.

    Cloud Infrastructure: GCP project with Compute Engine VMs running a 5-site WordPress knowledge cluster, Cloud Run services for the WP proxy and 247RS publisher, and Vertex AI for client-facing chatbot deployments.

    Knowledge Layer: Notion as the operating system with six core databases. Local vector database (ChromaDB + Ollama) indexing 468 files for semantic search. Slack as the real-time alert surface.

    Content Production: Content intelligence audits, adaptive variant pipelines producing persona-targeted articles, full SEO/AEO/GEO optimization on every piece, and batch publishing via REST API.

    Monthly cost: a Claude Pro subscription, plus GCP infrastructure, DataForSEO, and domain registrations and hosting (varies by client). The total operational infrastructure cost is modest and fixed.

    What the Daily Operation Actually Looks Like

    6:00 AM: NB-02 delivers the nightly brief to Slack. I read it with coffee. 3 minutes to know the state of everything.

    6:15 AM: Check for any red alerts from overnight agent activity. Most days there are none. Handle any urgent items.

    7:00 AM: Open Cowork mode. Load the day’s priority from Notion. Start the first working session – usually content production or site optimization.

    Morning sessions: Two to three Cowork sessions handling client deliverables. Content batches, SEO audits, site optimizations. Each session triggers skills that automate 80% of the execution.

    Midday: Client calls and meetings. MP-04 processes every transcript and routes action items to Notion automatically.

    Afternoon sessions: Infrastructure work, skill building, agent improvements. This is the investment time – building systems that make tomorrow more efficient than today.

    Evening: Agents continue running. SM-01 checks sites every hour. The VIP Email Monitor watches for urgent messages. SD-06 is tracking rankings. I am either building, thinking, or on Producer.ai making music. The systems do not need me to be present.

    The Numbers That Matter

    Content velocity: 400+ articles published across 18 sites in three months. At market rates, that volume represents a substantial sum in content production value.

    Site monitoring: 23 sites checked hourly, 99.7% average uptime tracked, 2 SSL near-misses caught before expiration.

    SEO coverage: 200+ keywords tracked daily across all sites. Drift detected and addressed before traffic impact on every flagged instance.

    Client chatbot: 1,400 conversations handled, 24% lead conversion rate, at an infrastructure cost of fractions of a penny per conversation.

    Meeting processing: 91% action item extraction accuracy. Zero commitments lost since MP-04 deployment.

    Total infrastructure cost: modest and fixed, covering everything. No employees. No freelancer invoices. No expensive SaaS subscriptions.

    What This Means for the Industry

    The traditional agency model requires hiring specialists: content writers, SEO analysts, web developers, project managers, account managers. Each hire adds salary, benefits, management overhead, and communication complexity. A 10-person agency serving 18 clients has significant operational overhead just coordinating between team members.

    The AI-native agency model replaces coordination with automation. Skills encode operational knowledge that would otherwise live in employees’ heads. Agents handle monitoring and processing that would otherwise require dedicated staff. The Notion command center replaces the project management overhead of keeping everyone aligned.

    This does not mean agencies should fire everyone and buy AI subscriptions. It means the economics of what one person can manage have changed fundamentally. The ceiling used to be 3-5 clients for a solo operator. With the right infrastructure, it is 18+ sites across multiple industries – and growing.

    Frequently Asked Questions

    Is this sustainable long-term or does it require constant maintenance?

    The system requires about 5 hours per week of maintenance – updating skills, tuning agent thresholds, fixing occasional API failures, and improving workflows. This is investment time that reduces future maintenance. The system gets more stable and capable every month, not less.

    What happens if Claude or Cowork mode has an outage?

    The autonomous agents run locally and are independent of Claude. They continue monitoring, alerting, and processing regardless. Content production pauses until Cowork mode returns, but operational infrastructure stays live. The architecture avoids single points of failure by design.

    Can other agencies replicate this?

    The infrastructure is replicable. The skills are transferable. The agent architectures are documented. What takes time is building the specific operational knowledge for your client portfolio – the credentials, workflows, content standards, and quality gates specific to each business. That is a 3-6 month investment. But once built, it compounds indefinitely.

    The Only Moat Is Velocity

    Every tool I use is available to everyone. Claude, Ollama, GCP, Notion, WordPress REST API – none of this is proprietary. The advantage is not in the tools. It is in having built the system while others are still debating whether to try AI. By the time competitors build their first skill, I will have 200. By the time they deploy their first agent, mine will have six months of operational data informing their decisions. The moat is not technology. The moat is accumulated operational velocity. And it compounds every single day.

  • I Built an AI Email Concierge That Replies to My Inbox While I Sleep

    The Email Problem Nobody Solves

    Every productivity guru tells you to batch your email. Check it twice a day. Use filters. The advice is fine for people with 20 emails a day. When you run seven businesses, your inbox is not a communication tool. It is an intake system for opportunities, obligations, and emergencies arriving 24 hours a day.

    I needed something different. Not an email filter. Not a canned autoresponder. An AI concierge that reads every incoming email, understands who sent it, knows the context of our relationship, and responds intelligently — as itself, not pretending to be me. A digital colleague that handles the front door while I focus on the work behind it.

    So I built one. It runs every 15 minutes via a scheduled task. It uses the Gmail API with OAuth2 for full read/send access. Claude handles classification and response generation. And it has been live since March 21, 2026, autonomously handling business communications across active client relationships.

    The Classification Engine

    Every incoming email gets classified into one of five categories before any action is taken:

    BUSINESS — Known contacts from active relationships. These people have opted into the AI workflow by emailing my address. The agent responds as itself — Claude, my AI business partner — not pretending to be me. It can answer marketing questions, discuss project scope, share relevant insights, and move conversations forward.

    COLD_OUTREACH — Unknown people with personalized pitches. This triggers the reverse funnel. More on that below.

    NEWSLETTER — Mass marketing, subscriptions, promotions. Ignored entirely.

    NOTIFICATION — System alerts from banks, hosting providers, domain registrars. Ignored unless flagged by the VIP monitor.

    UNKNOWN — Anything that does not fit cleanly. Flagged for manual review. The agent never guesses on ambiguous messages.
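Claude performs the actual classification; the deterministic routing that follows it can be sketched like this (action names are illustrative):

```python
ACTIONS = {
    "BUSINESS":      "respond",          # known contact: engage as the AI
    "COLD_OUTREACH": "reverse_funnel",   # engage warmly, with purpose
    "NEWSLETTER":    "ignore",
    "NOTIFICATION":  "ignore",           # unless the VIP monitor flags it
    "UNKNOWN":       "manual_review",    # never guess on ambiguous mail
}

def route_email(category):
    """Map a model-assigned category to the concierge's next action.
    Anything unrecognized falls back to manual review."""
    return ACTIONS.get(category, ACTIONS["UNKNOWN"])
```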

    The Reverse Funnel

    Traditional cold outreach response: ignore it or send a template. Both waste the opportunity. The reverse funnel does something counterintuitive — it engages cold outreach warmly, but with a strategic purpose.

    When someone cold-emails me, the agent responds conversationally. It asks what they are working on. It learns about their business. It delivers genuine value — marketing insights, AI implementation ideas, strategic suggestions. Over the course of 2-3 exchanges, the relationship reverses. The person who was trying to sell me something is now receiving free consulting. And the natural close becomes: “I actually help businesses with exactly this. Want to hop on a call?”

    The person who cold-emailed to sell me SEO services is now a potential client for my agency. The funnel reversed. And the AI handled the entire nurture sequence.

    Surge Mode: 3-Minute Response When It Matters

    The standard scan runs every 15 minutes. But when the agent detects a new reply from an active conversation, it activates surge mode — a temporary 3-minute monitoring cycle focused exclusively on that contact.

    When a key contact replies, the system creates a dedicated rapid-response task that checks for follow-up messages every 3 minutes. After one hour of inactivity, surge mode automatically disables itself. During that hour, the contact experiences near-real-time conversation with the AI.

    This solves the biggest problem with scheduled email agents: the 15-minute gap feels robotic when someone is in an active back-and-forth. Surge mode makes the conversation feel natural and responsive while still being fully autonomous.
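The timing rules can be modeled as a pure function over timestamps. This is a simplified sketch of the logic, not the real scheduler:

```python
from datetime import datetime, timedelta

SURGE_INTERVAL = timedelta(minutes=3)
STANDARD_INTERVAL = timedelta(minutes=15)
SURGE_WINDOW = timedelta(hours=1)

def next_check(now, last_reply_at):
    """Use the 3-minute cycle while the contact replied within the last hour."""
    if last_reply_at is not None and now - last_reply_at < SURGE_WINDOW:
        return now + SURGE_INTERVAL    # surge mode active
    return now + STANDARD_INTERVAL     # surge expired: back to 15 minutes

now = datetime(2026, 3, 21, 12, 0)
print(next_check(now, now - timedelta(minutes=10)))  # reply 10 min ago → 12:03
print(next_check(now, now - timedelta(hours=2)))     # reply 2 h ago → 12:15
```

Because the decision depends only on the last reply timestamp, surge mode disables itself with no explicit teardown step once an hour passes without activity.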

    The Work Order Builder

    When contacts express interest in a project — a website, a content campaign, an SEO audit — the agent does not just say “let me have Will call you.” It becomes a consultant.

    Through back-and-forth email conversation, the agent asks clarifying questions about goals, audience, features, timeline, and existing branding. It assembles a rough scope document through natural dialogue. When the prospect is ready for pricing, the agent escalates to me with the full context packaged in Notion — not a vague “someone is interested” note, but a structured work order ready for pricing and proposal.

    The AI handles the consultative selling. I handle closing and pricing. The division is clean and plays to each party’s strengths.
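One way to model that scope-gathering loop (the field names are assumptions, and the real system writes the finished order into Notion):

```python
# Hypothetical scope fields; the real work order schema lives in Notion.
REQUIRED_FIELDS = ["goals", "audience", "features", "timeline", "branding"]

def next_question(work_order):
    """Return the first scope field still missing, or None when complete."""
    for field in REQUIRED_FIELDS:
        if not work_order.get(field):
            return field
    return None

def ready_to_escalate(work_order):
    """Escalate to a human only once the scope is fully assembled."""
    return next_question(work_order) is None

wo = {"goals": "lead gen site", "audience": "local contractors"}
print(next_question(wo))      # → 'features'
wo.update(features="booking form", timeline="6 weeks", branding="existing logo")
print(ready_to_escalate(wo))  # → True
```

The agent asks about whichever field `next_question` returns, so the conversation stays natural while still converging on a complete, priceable scope.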

    Per-Contact Knowledge Base

    Every person the concierge communicates with gets a profile in a dedicated Notion database. Each profile contains background information, active requests, completed deliverables, a research queue, and an interaction log.

    Before composing any response, the agent reads the contact’s profile. This means the AI remembers previous conversations, knows what has been promised, and never asks a question that was already answered. The contact experiences continuity — not the stateless amnesia of typical AI interactions.

    The research queue is particularly powerful. Between scan cycles, the agent investigates flagged items so it can bring something new to the next exchange. If a contact mentioned interest in drone technology, the agent researches drone applications in their industry and weaves those insights into the next reply.
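Before composing a reply, the profile can be folded into the model’s context. A simplified sketch, with a hypothetical profile shape:

```python
# Hypothetical contact profile; the real data lives in a Notion database.
def build_context(profile):
    """Summarize a profile so the model never re-asks an answered question."""
    lines = [
        f"Contact: {profile['name']}",
        f"Background: {profile['background']}",
    ]
    if profile.get("active_requests"):
        lines.append("Open requests: " + "; ".join(profile["active_requests"]))
    if profile.get("research_notes"):
        lines.append("Fresh research: " + "; ".join(profile["research_notes"]))
    return "\n".join(lines)

profile = {
    "name": "Jane Doe",
    "background": "Runs a drone-services startup",
    "active_requests": ["SEO audit quote"],
    "research_notes": ["drone LiDAR use in construction surveying"],
}
context = build_context(profile)
```

Prepending this summary to every generation call is what turns a stateless model into a colleague with memory.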

    Frequently Asked Questions

    Does the agent pretend to be you?

    No. It identifies itself as Claude, my AI business partner. Contacts know they are communicating with AI. This transparency is deliberate — it positions the AI capability as a feature of working with the agency, not a deception.

    What happens when the agent does not know the answer?

    It escalates. Pricing questions, contract details, legal matters, proprietary data, and anything the agent is uncertain about get routed to me with full context. The agent explicitly tells the contact it will check with me and follow up.

    How do you prevent the agent from sharing confidential client information?

    The knowledge base includes scenario-based responses that use generic descriptions instead of client names. The agent discusses capabilities using anonymized examples. A protected entity list prevents any real client name from appearing in email responses.
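That guard can be as simple as a final redaction pass over every drafted reply. A sketch with a hypothetical entity list:

```python
import re

# Hypothetical protected entity list; real client names never ship in replies.
PROTECTED = {
    "Acme Plumbing": "a local home-services client",
    "Borealis Dental": "a healthcare client",
}

def redact(draft):
    """Swap any real client name for its generic description before sending."""
    for name, generic in PROTECTED.items():
        draft = re.sub(re.escape(name), generic, draft, flags=re.IGNORECASE)
    return draft

print(redact("We grew organic traffic 40% for Acme Plumbing last quarter."))
# → We grew organic traffic 40% for a local home-services client last quarter.
```

Running the filter as the last step before the Gmail send call means the protection holds even if the model slips a name into a draft.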

    The Shift This Represents

    The email concierge is not a chatbot bolted onto Gmail. It is the first layer of an AI-native client relationship system. The agent qualifies leads, nurtures contacts, builds work orders, maintains relationship context, and escalates intelligently. It does in 15-minute cycles what a business development rep does in an 8-hour day — except it runs at midnight on a Saturday too.