Category: Restoration Intelligence

The definitive resource for restoration company operators — business operations, marketing, estimating, AI, and growth strategy.

  • Restoration Company Marketing in 2026: LSA vs Google Ads vs SEO — Real CAC Numbers

    Restoration company marketing is one of the most expensive paid-search categories in the United States. “Water damage restoration” keywords routinely clear $60–$85 per click in competitive markets, with reported outlier bids running well over $200 in metros like New York, Houston, and South Florida. Industry tracking has flagged some emergency-restoration terms breaking $500 per click in specific moments. Meanwhile, the average home-services lead via Google Local Service Ads (LSA) is roughly $53 — but water damage restoration sits at the premium end, with reported LSA cost-per-lead ranges of approximately $80–$180 depending on market.

    If you run a $3M–$15M restoration company, this is the single biggest line item that nobody on your team is staring at correctly. Owners hear “marketing” and think website. The real fight in 2026 is channel allocation: how much should you spend on LSA, how much on Google Search Ads, and how much on owned SEO — and at what point does each one stop scaling? Here is the honest breakdown a $5M owner needs before their next marketing budget meeting.

    The three channels that actually matter

    For commercial water and fire restoration in 2026, three channels do the heavy lifting: Google Local Service Ads (the LSA “Google Guaranteed” boxes at the very top of the SERP), Google Search Ads (the paid text ads below LSA), and organic SEO (the map pack plus blue links). Everything else — Yelp, Angi, HomeAdvisor, Facebook, programmatic display, lead-broker buys — is either supplemental, declining, or actively cannibalizing your margin. The first decision is choosing where the bulk of your new-customer budget goes among those three.

    Local Service Ads (LSA) — the default starting point in 2026

    LSA owns the top of the phone screen, period. For emergency-driven categories like water damage and mold, that placement matters more than anything else. Reported 2026 cost-per-lead for water damage restoration through LSA generally falls in the $80–$180 range, with some markets reporting averages closer to $100 in stable competitive conditions. On a $6,000 average ticket, even a $150 LSA lead at a 25–35% close rate produces a customer acquisition cost (CAC) of roughly $430–$600 — which is workable on jobs that gross $1,800–$2,400.
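
    The CAC arithmetic in that last sentence can be sketched directly. The CPL, close rates, and ticket figures are the article's; the helper function is illustrative:

```python
# CAC = cost per lead / close rate: what you pay in leads per booked job.
# Figures ($150 CPL, 25-35% close) are from the article above.

def cac(cost_per_lead: float, close_rate: float) -> float:
    """Customer acquisition cost per booked job."""
    return cost_per_lead / close_rate

print(round(cac(150, 0.35)))  # 429 -- the low end of the band
print(round(cac(150, 0.25)))  # 600 -- the high end
```

    The same two-line function works for any channel, which is why the article keeps insisting you evaluate CAC rather than the CPC or CPL the dashboard shows you.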

    The catch: Google removed credits for “job type not serviced” and “geo not serviced” leads in 2025, meaning every junk lead now hits your card with no recourse. You have to dispute leads inside Google’s dispute window and you have to answer your phone in under 30 seconds. LSA also weights reviews more heavily than any other channel — a 4.6 average will visibly underperform a 4.9 in the same zip code. If your review velocity is under 8 per month, fix that before you scale LSA spend.

    Google Search Ads — the diminishing-returns layer

    Below LSA, traditional Google Search Ads remain expensive and uneven. Reported 2026 average CPC for water damage restoration keywords falls into bands: bottom-of-funnel emergency keywords like “emergency water damage [city]” run $60–$85; less-direct terms like “water damage cleanup near me” run $40–$65; awareness-stage keywords like “what to do after a flood” run $20–$40. The trap is that close rates on Search Ads have been compressing for three reasons: LSA is taking the highest-intent clicks, AI Overviews are stealing informational queries, and click fraud from competitor bots remains nontrivial.

    For most restoration owners, Search Ads should be a defense-and-coverage play, not a primary growth channel. Bid on your own brand name to keep TPA programs and franchise competitors from arbitraging your traffic. Bid on the keywords LSA does not cover well (commercial, mold remediation, biohazard, contents pack-out). Cap monthly spend. Watch the CAC, not the CPC.

    SEO — the compounding asset that owners under-invest in

    Owned SEO — Google Business Profile plus a real content engine on the company website — is where the math eventually breaks in your favor. Multiple cross-industry benchmarks in 2025–2026 put the cost-per-lead delta between SEO and paid search at roughly 4x–6x lower for SEO once a site is mature (typically 12–18 months in). One widely cited cross-industry benchmark places SEO CPL near $31 versus paid search closer to $181. Restoration-specific tracking from agencies serving the category has reported organic CPL well under $50 in established markets after 18+ months of investment, while paid CPL stays in the $150+ band.

    The painful truth: SEO's marginal cost per lead is essentially zero, but you cannot start it in January and expect leads in March. The owners who win SEO in restoration started 24 months ago, publish 6–12 useful pieces a month, and have a Google Business Profile with 500+ reviews and weekly post activity. If you have not started, your starting line is today — not next quarter.

    The honest allocation for a $5M restoration company in 2026

    A defensible 2026 marketing budget for a $5M residential and small-commercial restoration company, assuming 60% TPA-fed and 40% self-generated, looks roughly like this on the self-gen side:

    • LSA: 45–55% of self-gen ad spend. Highest immediate ROI. Cap by service area until close rate clears 30%.
    • Google Search Ads: 15–20%. Brand defense plus commercial, mold, and biohazard keywords LSA underweights.
    • SEO and Google Business Profile: 25–35%. This is content, on-site technical work, review-generation systems, and GBP weekly posts. Treat it as an asset, not a cost.
    • Everything else (Yelp, Angi, Nextdoor, paid social): under 5% combined, and only with tracked phone numbers per channel.
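
    Applied to a concrete number, the allocation above splits a budget like this. The percentages are from the list; the $20K monthly self-gen budget and the midpoint choices are illustrative assumptions:

```python
# Dollar split of a hypothetical $20K/month self-generated ad budget,
# using the midpoint of each recommended range from the article.

monthly_self_gen_budget = 20_000  # hypothetical figure

allocation = {
    "LSA": 0.50,               # midpoint of 45-55%
    "Google Search Ads": 0.175,  # midpoint of 15-20%
    "SEO + GBP": 0.30,         # midpoint of 25-35%
    "Everything else": 0.025,  # under 5% combined
}

for channel, share in allocation.items():
    print(f"{channel}: ${monthly_self_gen_budget * share:,.0f}")
```

    Note the midpoints sum to 100%; if you push LSA to the top of its range, the extra share should come out of Search Ads or the "everything else" bucket, not out of SEO.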

    If your current mix is 80%+ LSA and 0% SEO, you are renting your customer pipeline from Google at a rate that will keep rising. If your current mix is 80%+ SEO and 0% LSA, you are leaving the highest-intent emergency calls on the table for competitors who will outbid you for them.

    What to measure, not what to chase

    CPC, CPL, and CAC are not the same number. Restoration owners chase CPC because Google Ads dashboards make it visible. The metric that should sit on your monitor is blended CAC by channel, calculated quarterly: total channel spend divided by booked jobs from that channel. Track three more numbers next to it — close rate from lead to booked job, average ticket size by channel, and lifetime value adjustments for repeat and referral. A $180 LSA lead with a 35% close on a $7,000 average ticket is a different business than a $40 organic lead with a 12% close on a $2,200 average ticket — even though the organic lead's CPL looks better on paper.
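
    That comparison is worth working out explicitly. A minimal sketch using the two hypothetical channels from the paragraph above (function names are our own):

```python
# Blended CAC as the article defines it, plus revenue returned per
# marketing dollar for the two example channels in the text.

def blended_cac(channel_spend: float, booked_jobs: int) -> float:
    """Quarterly blended CAC: total channel spend / booked jobs."""
    return channel_spend / booked_jobs

def revenue_per_marketing_dollar(cpl: float, close_rate: float,
                                 avg_ticket: float) -> float:
    job_cac = cpl / close_rate      # cost to acquire one booked job
    return avg_ticket / job_cac     # revenue returned per $1 spent

lsa = revenue_per_marketing_dollar(180, 0.35, 7_000)
organic = revenue_per_marketing_dollar(40, 0.12, 2_200)
print(f"LSA: {lsa:.1f}x, organic: {organic:.1f}x")  # LSA: 13.6x, organic: 6.6x
```

    The "cheap" $40 lead returns less than half the revenue per marketing dollar of the "expensive" $180 lead in this example, which is the whole point of watching CAC instead of CPL.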

    Bottom line

    In 2026, LSA pays the bills, Search Ads defends the perimeter, and SEO is the only channel that compounds. The restoration owners who will be writing larger checks to their estimators in 2028 are the ones who fund all three this year — and the ones who refuse to pay $150 for a water damage lead because “that’s expensive” will keep watching franchise competitors and out-of-town aggregators win the calls that finance their own retirement. The expensive lead is the one you didn’t bid on at 2 a.m. when the house was actively flooding.

    Frequently Asked Questions

    What is a good cost per lead for a water damage restoration company in 2026?

    Reported 2026 ranges put water damage LSA cost-per-lead at roughly $80–$180, with some stable markets averaging closer to $100. Google Search Ads CPL is generally higher and more volatile. Organic SEO CPL trends well under $50 in mature programs after 12–18 months. Evaluate against your average job size and close rate, not against a flat industry number.

    Are Google Local Service Ads still worth it for restoration companies?

    Yes, for emergency categories LSA remains the most cost-efficient paid channel in 2026 because of its top-of-screen placement and pay-per-lead structure. The caveats: Google removed credit for off-service-area and wrong-job-type leads, review velocity matters more than ever, and you have to answer the phone in under 30 seconds to keep ranking.

    How long until SEO produces restoration leads?

    Plan on 9–12 months for a Google Business Profile and review-driven program to generate meaningful local-pack volume, and 12–18 months for content-driven organic leads to show up in any volume. Owners who treat SEO as a 6-month sprint nearly always abandon it 30 days before it would have started working.

    Should I use a marketing agency or build in-house?

    Under $3M revenue, hire one credible local agency to run LSA plus GBP, and keep SEO in-house with a part-time writer. From $3M–$10M, split LSA/Search Ads with an agency and bring SEO content in-house under a marketing coordinator. Above $10M, build the function internally with a director-level hire — at that size your marketing spend funds a salary and the data needs to live on your side of the firewall.

  • What Restoration Companies Actually Sell For in 2026 (And What Kills the Deal at Close)

    Every restoration owner over fifty has the same question stuck in the back of their head: what is this thing actually worth? The honest answer in 2026 is somewhere between 2.3x SDE and 7x EBITDA — and the spread between those two numbers is not luck. It is the difference between a company a buyer wants and a company a buyer tolerates.

    Here is what is happening in the market right now, what private equity is paying, and what kills the deal at the eleventh hour.

    The 2026 Multiple Spread

    Restoration M&A in 2026 sorts cleanly into three tiers. The cutoffs matter — they are not cosmetic.

    Tier 1 — Sub-$2M revenue shops. Owner-operator businesses with one or two trucks, dependent on the founder for sales and crew leadership. These transact on Seller’s Discretionary Earnings (SDE), not EBITDA. Typical multiples: 2.3x to 3.0x SDE. The buyer is usually another restoration owner, a search-fund operator, or an industry veteran on their second act. There is no PE in this tier. The owner doing the work IS the asset, and that is exactly the problem.

    Tier 2 — $2M to $5M revenue shops. The PE feeder zone. These get bought by platforms like BluSky, First Onsite, Belfor, ATI, and Code Red as bolt-on acquisitions. Multiples: 3.0x to 3.5x SDE, or 4x to 5x EBITDA if the company is clean enough to have real EBITDA at all. Purchase prices land between $900K and $2.5M. This is the sweet spot for industry roll-ups — large enough to have a real second-in-command, small enough to absorb without indigestion.

    Tier 3 — $10M+ revenue, $2M+ EBITDA platforms. Now you are talking to PE directly, not through a strategic. Multiples: 5x to 7x EBITDA, occasionally higher for the right footprint. BluSky has announced 13 acquisitions in the last six years under Kohlberg & Company and Partners Group ownership. American Restoration rolled up 8 brands before exiting to Morgan Stanley. HighGround did 13 deals in five years before selling to Knox Lane. The playbook is well-documented. PE has put more than $6 billion into the space since 2018.

    What Buyers Actually Pay For

    The multiple is a function of risk, not affection. Sophisticated buyers pay up for five things, in roughly this order:

    1. Insurance carrier preferred-vendor status. If you are on the panel for State Farm, Allstate, USAA, Liberty Mutual, or any TPA program — Contractor Connection, Alacrity, Code Blue — that contract is the asset. It is also the hardest thing to replicate. Buyers will pay a premium for it because they cannot buy it any other way except by buying you.

    2. Mitigation-heavy revenue mix. Water mitigation runs gross margins around 70-80%. Reconstruction often runs 10% or less. A company that is 65% mitigation and 35% reconstruction is worth materially more than the same revenue split inverted. Buyers will pull your job-cost reports line by line during diligence to confirm the mix is real and not just an artifact of how you categorize revenue.
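
    The mix math makes the point concrete. The margin bands are the article's; using the 75% midpoint for mitigation is our assumption:

```python
# Blended gross margin for a given mitigation/reconstruction mix,
# using the article's bands: mitigation ~70-80% GM, reconstruction ~10%.

MITIGATION_GM = 0.75  # assumed midpoint of the 70-80% band
RECON_GM = 0.10

def blended_gross_margin(mitigation_share: float) -> float:
    recon_share = 1 - mitigation_share
    return mitigation_share * MITIGATION_GM + recon_share * RECON_GM

print(f"{blended_gross_margin(0.65):.1%}")  # 65/35 mitigation-heavy mix
print(f"{blended_gross_margin(0.35):.1%}")  # same revenue, mix inverted
```

    Under these assumptions the mitigation-heavy mix grosses roughly 52% versus roughly 33% inverted: same top line, very different company, which is exactly what a buyer's diligence is trying to confirm.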

    3. Management depth below the founder. If you can take a two-week vacation and revenue does not blink, your multiple goes up by half a turn. If the phones stop ringing the moment you leave, you are selling a job, not a business. Hire a real general manager 18 months before you list.

    4. CAT exposure under 20%. Catastrophic event revenue is lumpy and cannot be modeled. If 40% of your last three years came from one hurricane season, buyers will discount that revenue heavily — sometimes valuing CAT-driven dollars at half the multiple of recurring carrier work. Diversify your revenue base before going to market.

    5. Clean books with a Quality of Earnings opinion. Every PE-backed deal includes a QoE — an outside accounting firm that re-audits your trailing twelve months and normalizes EBITDA. If your books are run on a personal-finance app and your CPA does taxes once a year, expect the QoE to find $200K-$500K of EBITDA adjustments that go against you. Spend $40K on a CFO-for-hire and a real GAAP P&L two years before sale.

    What Kills the Deal

    Roughly 30-40% of restoration LOIs do not close, and almost always for reasons the seller could have prevented.

    The biggest deal-killer is customer concentration. If one TPA program represents more than 35% of revenue, buyers panic. They have seen what happens when Contractor Connection decides to rebid a region — entire $8M revenue lines disappear in a quarter. Diversify before you list.

    The second is uncollected aged receivables. Restoration AR over 90 days is not an asset, it is a write-down waiting to happen. Buyers will deduct uncollected AR from purchase price dollar-for-dollar. Aggressively collect or write off everything before you go to market.

    The third is licensing and certification gaps. IICRC, state contractor licenses, mold remediation certifications by state — buyers run a full compliance audit. A single expired contractor license in a key state can cost $50K-$150K at close.

    The fourth is founder dependency on first-call relationships. If the property manager calls you personally when there is a flood — not a dispatch number, not a sales rep — buyers will require an earnout structure that makes you stay another three to five years. Most owners hate earnouts because they convert sale price into deferred contingent comp. Build the dispatch infrastructure before you list, and you keep the cash up front.

    The Honest Bottom Line

    If you are a $3M revenue restoration company today and you want a clean exit at a real multiple, you have an 18-to-24 month preparation window. Use it to get the books on accrual, hire a GM, diversify off any single TPA, build mitigation revenue past 60% of mix, and get every certification current.

    Do that, and a $3M shop running 18% EBITDA margins ($540K) sells at 4.5x to a strategic — about $2.4M cash at close. Skip it, and the same company sells at 2.6x SDE — closer to $1.4M, often with a punishing earnout attached.
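
    The arithmetic behind those two outcomes, worked out. Revenue, margin, and multiples are the article's; treating SDE as roughly equal to EBITDA here is an assumption implied by the article's $1.4M figure:

```python
# The two exit scenarios for a $3M shop at an 18% EBITDA margin.

revenue = 3_000_000
ebitda = revenue * 0.18          # $540K

prepared_exit = ebitda * 4.5     # clean books, GM hired, diversified
unprepared_exit = ebitda * 2.6   # 2.6x SDE, assuming SDE ~= EBITDA

print(f"Prepared:   ${prepared_exit:,.0f}")    # $2,430,000
print(f"Unprepared: ${unprepared_exit:,.0f}")  # ~$1,404,000
print(f"Delta:      ${prepared_exit - unprepared_exit:,.0f}")
```

    The delta is just over $1M, which is the "one million dollars" of the closing paragraph.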

    The difference is one million dollars. The work to capture it is roughly nine months of operator focus. That is the highest-ROI work an exiting restoration owner can do.

  • Snowflake’s $200M Claude Partnership and India’s Glasswing Gap: Two Enterprise Stories That Matter

    Two partnership and policy stories from the Anthropic desk that haven’t been covered here yet, both with meaningful implications for how Claude reaches enterprise users and how governments are thinking about AI security risk.

    Part 1: Snowflake’s $200M Partnership — 12,600 Enterprise Customers as Distribution

    In December 2025, Anthropic and Snowflake announced a multi-year, $200M partnership making Claude models available to Snowflake’s 12,600+ enterprise customers across all three major clouds. The partnership makes Claude the AI layer inside Snowflake’s data platform for a client base concentrated in financial services, healthcare, and life sciences — the three regulated verticals where Anthropic has been most deliberately building.

    The specific products:

    • Snowflake Intelligence — powered by Claude Sonnet 4.6, providing conversational data analysis directly within the Snowflake environment
    • Snowflake Cortex AI Functions — supporting Claude Opus 4.5 and newer models for structured AI functions across the Snowflake data warehouse

    Source: anthropic.com/news/snowflake-anthropic-expanded-partnership

    The number that matters most here isn’t $200M — it’s 12,600. That’s the customer count Snowflake brings as a distribution channel. These are enterprise organizations that have already made a procurement decision to standardize on Snowflake for data infrastructure. Embedding Claude inside that infrastructure means Claude becomes the AI system those organizations reach for when they need to query, analyze, or reason about their own data — without requiring a separate AI platform procurement decision.

    This is the distribution model that makes enterprise AI market share move: not direct sales to 12,600 enterprises, but a single partnership that makes Claude the default AI layer inside infrastructure those enterprises already use. Snowflake customers in financial services can run Claude-powered compliance analysis on their own Snowflake data. Healthcare organizations can run Claude-powered analysis on patient data that stays within their existing Snowflake security perimeter.

    The regulated-industry focus is deliberate. Financial services, healthcare, and life sciences are the verticals where data governance requirements are strictest — and where the ability to run AI on your own data, within your own security perimeter, without moving that data to an external AI service, is the deciding factor in procurement. Snowflake’s existing data residency and compliance infrastructure makes that possible in a way that a direct Anthropic API call often doesn’t.

    Part 2: India’s RBI Warning + The Glasswing Gap

    In late April 2026, India’s Finance Ministry and Reserve Bank of India convened meetings on cybersecurity preparedness specifically referencing Claude Mythos risk. Finance Minister Nirmala Sitharaman met with bank executives at North Block to advise pre-emptive hardening. The RBI began consulting with global regulators. CERT-In, major telcos, and fintechs ran parallel risk assessments.

    Source: Business Standard, April 27, 2026 — business-standard.com

    The structural issue underneath the news: Project Glasswing — Anthropic’s defensive cybersecurity consortium that provides early access to Mythos for defensive purposes — named the following founding partners: AWS, Apple, Cisco, CrowdStrike, Google, JPMorgan Chase, Microsoft, and Nvidia. Zero Indian firms. India is Anthropic’s second-largest market globally. Its government is actively warning its financial sector about Mythos risk. And no Indian organization is in the defender consortium that gets early access to the model and the defensive research that goes with it.

    This is not a small gap. The Mozilla Firefox result (271 vulnerabilities in a month, including 20-year-old bugs) demonstrated what Mythos can do in a real production codebase. If that capability is available to offensive actors — or if non-partner organizations don’t have the same early visibility into what Mythos can find — organizations outside the Glasswing partner network are in a different risk position than those inside it.

    The Tension This Creates

    Anthropic’s distribution into India is accelerating. Cognizant deployed Claude across 350,000 employees. Razorpay built its Agent Studio on the Claude Agent SDK and wired UPI rails through Claude as an authorized payment agent with NPCI. Air India, CRED, and Swiggy are named enterprise customers. India is Anthropic’s second-largest market.

    Meanwhile: India’s government is warning its financial sector about the offensive potential of Claude Mythos, no Indian firm is in the Glasswing defender consortium, and INR-denominated pricing (with 18% GST) makes the effective Pro subscription cost approximately ₹2,240/month for Indian users — a meaningful friction point for the market Anthropic is describing as its #2 global market.

    The distribution is running faster than the partnership infrastructure is opening. Either Project Glasswing expands to include Indian financial institutions and cybersecurity organizations, or India builds its own parallel defensive capacity, or the gap becomes a structural political fact in Anthropic’s India relationship.

    India’s government isn’t opposed to Claude. It’s actively adopting it across both public and private sector. The RBI/Finance Ministry meetings were framed as hardening preparation, not restriction. But the asymmetry — India as top-2 market, zero Indian firms in the defender consortium — is conspicuous enough that it will eventually require a response.

    Frequently Asked Questions

    What does the Snowflake-Anthropic partnership include?

    A multi-year, $200M agreement announced December 2025, making Claude models available to Snowflake’s 12,600+ enterprise customers. Snowflake Intelligence launched powered by Claude Sonnet 4.6 for conversational data analysis (model at time of partnership announcement; verify current model with Snowflake). Snowflake Cortex AI Functions supports Opus 4.5 and newer models. The focus is regulated industries: financial services, healthcare, and life sciences.

    What is Project Glasswing?

    Project Glasswing is Anthropic’s invitation-only defensive cybersecurity program that provides early access to Claude Mythos Preview for organizations working to defend critical infrastructure. Named founding partners include AWS, Apple, Cisco, CrowdStrike, Google, JPMorgan Chase, Microsoft, and Nvidia. Access is invitation-only with no self-serve sign-up. No Indian organizations are currently named as Glasswing partners.

    Why is India’s government warning about Claude Mythos if India is Anthropic’s second-largest market?

    The Indian government’s meetings (RBI, Finance Ministry, CERT-In) were framed as defensive preparation, not restriction. The concern is that Mythos-tier capability could be used offensively against Indian financial infrastructure — a legitimate risk that applies regardless of Anthropic’s commercial relationship with India. The tension is that organizations inside Project Glasswing get early access to defensive research while India’s financial sector, with no Glasswing presence, does not.

  • Cowork Routines and Windows Computer Use: What’s New and How We’re Using Both

    Two Cowork capabilities that haven’t been written about here yet, despite being live since late April: Cowork Routines (always-on scheduled tasks that run when your laptop is closed) and Windows computer use (Claude operating your Windows desktop directly from within Cowork). Both shipped in the April 28–30 window alongside the Claude GA release. Both materially change what Cowork is.

    Cowork Routines: The Laptop Can Be Closed

    The original Cowork model required your laptop to be open and the Cowork desktop app to be running. Useful — but bounded by your hardware being available and powered on. Cowork Routines changes that.

    Routines are cloud-hosted scheduled tasks that execute on Anthropic’s infrastructure regardless of your local hardware state. They run on a schedule you define. They execute when your laptop is off, sleeping, or in your bag on a plane. The task runs, the output lands where you configured it to land, and when you open the laptop you find the work done.

    The practical scope of what runs well as a Routine:

    • Daily briefings: Pull sources, synthesize, write to Notion or email — delivered before you open your laptop each morning
    • Monitoring tasks: Check a source on a schedule, flag anomalies, log findings
    • Content pipeline steps: Recurring publication tasks, social scheduling prep, site audit runs
    • Report generation: Weekly status documents assembled from live data sources
    • Notification triggers: Watch a condition, fire an action when it’s met

    We run our own Claude Newspaper Desk — a daily briefing that checks Anthropic’s news, release notes, GitHub releases, and external coverage, then writes a structured briefing to Notion before we start the day. That’s a Routine. The briefing that generated this article was produced by a Routine running on a schedule, not by someone manually triggering a task.

    The architectural decision that makes Routines significant: the task reads its instructions from a Notion desk spec page at runtime, not from a baked-in prompt. Change the Notion spec, change what the Routine does — without touching the scheduled task itself. The shim file that triggers the Routine is thin by design; the intelligence lives in Notion.
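
    As an illustration of that pattern (not Cowork's actual implementation), here is a minimal sketch of a task that reads its instructions from a Notion page at runtime, via Notion's public REST API. The `NOTION_TOKEN` variable, function names, and parsing logic are our own assumptions:

```python
# Fetch a spec page's blocks from Notion at runtime and flatten the
# paragraph blocks into the instruction text the task will follow.
import json
import os
import urllib.request

NOTION_CHILDREN_URL = "https://api.notion.com/v1/blocks/{page_id}/children"

def fetch_spec_blocks(page_id: str) -> list:
    """Pull the spec page's block list from the Notion API."""
    req = urllib.request.Request(
        NOTION_CHILDREN_URL.format(page_id=page_id),
        headers={
            "Authorization": f"Bearer {os.environ['NOTION_TOKEN']}",
            "Notion-Version": "2022-06-28",  # a published API version
        },
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["results"]

def blocks_to_instructions(blocks: list) -> str:
    """Flatten paragraph blocks into prompt text, skipping other types."""
    lines = []
    for block in blocks:
        if block.get("type") == "paragraph":
            rich = block["paragraph"]["rich_text"]
            lines.append("".join(part["plain_text"] for part in rich))
    return "\n".join(lines)
```

    Under this pattern, editing the Notion page changes what the next scheduled run does; the shim that triggers the task only needs the page ID.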

    Windows Computer Use: Claude Operates Your Desktop

    Computer use in Claude — the ability for Claude to navigate desktop interfaces, click through UI, fill forms, and verify results — was previously available primarily in research preview and on macOS. The April 2026 Cowork release brought computer use to Windows as a generally available capability within the Cowork desktop app.

    What this means in practice: Claude can open a native Windows application, navigate its interface, perform a sequence of actions, and hand the result back — without you needing to automate it through code or build an API integration. If there’s a tool that only has a Windows UI and no API, Claude can use the Windows UI directly.

    The current state of computer use is honest about its scope. It’s good at:

    • Navigating well-structured desktop applications with clear UI hierarchies
    • Form completion across multiple-step workflows
    • Data extraction from desktop tools that don’t export well
    • Verification steps that require visual confirmation

    It’s slower than direct API integrations when those exist. For tools with APIs, use the API. Computer use is the path when no API exists or when building a proper integration would cost more than the task is worth.

    The combination of Routines + Windows computer use means a scheduled task can now include a step that operates a Windows desktop application — unattended, while your laptop is running in the background. That’s a meaningfully different capability than what Cowork shipped with originally.

    How We’re Using Both

    Our Cowork architecture as of May 2026:

    • Cowork as execution layer — always-on laptop running scheduled tasks
    • Notion as control plane — desk specs, task queues, logs, and credential storage
    • GCP Cloud Run as action layer — WordPress publishing, API calls, content pipeline steps
    • Claude Code Routines as cloud fallback — tasks that need to run independent of local hardware

    Routines handle the tasks where continuous availability matters more than local context: briefings, monitoring, scheduled publishing. Cowork handles the tasks where rich local context matters: multi-step sessions with file access, browser navigation, and tools that live on the local machine.

    The practical division: if the task needs to run at 3am when the laptop is sleeping, it’s a Routine. If the task needs to interact with local files, a browser session, or a Windows app, it’s Cowork.

    The Non-Developer Angle

    Neither of these capabilities requires you to be a developer to use. Routines are configured through the Cowork interface with natural language task descriptions and a schedule. Computer use activates through the same conversational interface you’re already using.

    The architecture underneath is sophisticated. The interface isn’t. You describe what you want done and when, and the system figures out the implementation. This is the progression that makes these capabilities meaningful for operations teams, executive assistants, knowledge workers, and small business owners — not just engineers building agent pipelines.

    Singapore’s Foreign Minister Balakrishnan built his own version of this on a Raspberry Pi. The point isn’t to build your own — it’s that the underlying architecture (persistent memory, scheduled tasks, multi-channel input) is now accessible at multiple layers of sophistication, from DIY open source to fully managed product.

    Frequently Asked Questions

    What are Cowork Routines?

    Cowork Routines are cloud-hosted scheduled tasks that run on Anthropic’s infrastructure regardless of whether your local Cowork laptop is on or available. They execute on a schedule you define — daily, weekly, or at specific times — and can perform any task Cowork handles: briefings, monitoring, content pipeline steps, report generation, and notification triggers. Each Routine reads its instructions from a Notion desk spec at runtime.

    Does Windows computer use require coding to set up?

    No. Computer use in Cowork activates through the standard conversational interface. You describe what you want Claude to do in the application, and Claude navigates the Windows desktop UI directly. No scripting, automation code, or API integration is required — though API integrations are faster when they exist. Computer use is the path for tools with no accessible API.

    What’s the difference between Cowork and Cowork Routines?

    Cowork runs on your local machine and requires the desktop app to be open and active. Routines run on cloud infrastructure and execute regardless of local hardware state. The practical division: tasks that need to run unattended on a schedule go to Routines; tasks that need local context, file access, or desktop UI interaction go to Cowork. Both read task instructions from Notion desk spec pages at runtime.

    Is Cowork available on both Mac and Windows?

    Yes. Cowork and computer use are available on both macOS and Windows as of the April 2026 general availability release. The Windows release also established PowerShell as the default shell (previously Git Bash was required), reducing a friction point for enterprise Windows shops.

  • Harvard FAS Replaces ChatGPT Edu With Claude: What the Switch Signals

    Harvard’s Faculty of Arts and Sciences will provide Claude access to all affiliates — students, faculty, staff, and researchers — and will discontinue ChatGPT Edu after June 2026. Continuing ChatGPT Edu access will require “administrative and budgetary approval.” Harvard FAS also holds a Google Gemini institutional agreement. The story was reported by The Harvard Crimson on April 28, 2026.

    This is the cleanest institutional AI platform switch on record. Harvard FAS covers roughly 20,000 affiliates. The administrative approval language around ChatGPT Edu continuation is the detail that tells you this isn’t additive — it’s a replacement.

    What Actually Happened

    Harvard FAS is not abandoning all AI tools. It’s rotating its primary institutional AI platform from ChatGPT Edu to Claude. The Gemini institutional agreement stays. What’s changing is which AI system gets the default institutional license, the frictionless path, the one that “just works” for every affiliate without requiring a separate approval process.

    That framing matters. When an institution of Harvard FAS’s size structures access so that one platform requires administrative approval to continue while another is provided automatically to all affiliates, the default is the decision. The approval requirement for ChatGPT Edu isn’t a ban — it’s a friction tax that most users won’t bother to pay.

    Why Institutions Switch AI Platforms

    The Harvard Crimson’s reporting framed the switch as “platform rotation based on capability” — not a permanent commitment to any single AI provider. That framing is worth taking seriously. Academic institutions making technology decisions at this scale move deliberately, and the stated rationale (capability) suggests the evaluation was substantive.

    The specific capabilities that tend to drive academic platform decisions:

    • Long-form document handling: Claude’s 1M token context window (on Opus 4.7 and Sonnet 4.6) is directly useful for academic work — reading full papers, dissertations, and research datasets in a single session
    • Research synthesis: Multi-document reasoning across large corpora without chunking
    • Writing quality: Academic writing and editing assistance where tone and precision matter
    • Institutional trust signals: Claude’s Constitutional AI approach and Anthropic’s safety positioning have become differentiators in institutional procurement conversations

    We don’t have Harvard FAS’s internal evaluation criteria. What we know is that after running a ChatGPT Edu institutional agreement, they evaluated their options and chose to route default access to Claude.

    What This Signals for Enterprise Platform Switching

    Harvard FAS is a useful case study because academic institutions make AI procurement decisions in a way that resembles enterprise decisions more than consumer decisions: budget approval processes, IT security review, institutional liability considerations, and the need for a platform that works across a wildly diverse user base — from first-year undergraduates to Nobel laureates.

    The platform switching question — “can our organization move from one AI platform to another?” — has been theoretical for most of the last two years. Harvard FAS running this switch makes it concrete. The institutional machinery for moving 20,000 users from one AI platform to another exists and has been executed.

    For enterprise teams evaluating whether to consolidate on Claude or maintain a multi-platform approach: the Harvard FAS switch is evidence that the transition is operationally feasible at institutional scale, and that institutions with high capability and safety requirements are making this choice.

    The Competitive Context

    Claude now holds institutional agreements at major universities. ChatGPT Edu launched as OpenAI’s play for this exact market. The Harvard FAS switch doesn’t mean OpenAI is losing the education market — it means the competition for institutional default status is real and Claude is winning some of those decisions on capability grounds.

    Anthropic’s enterprise market share, cited in its April 2026 Partner Network announcement, had grown from 24% to 40% since the Claude 4 generation launched. Harvard FAS is one data point in that trend.

    Our Take

    We track institutional AI adoption because it signals where the capability and trust thresholds are in the market. When an institution like Harvard FAS — which has the internal expertise to evaluate these platforms seriously — runs a full procurement process and routes its default institutional license to Claude, that’s a substantive signal about where the models stand.

    The “administrative approval required to continue ChatGPT Edu” language is the tell. That’s not a ban. It’s the institutional equivalent of making one option the path of least resistance and the other a deliberate choice. For 20,000 people with actual work to do, the default wins.

    Frequently Asked Questions

    Did Harvard ban ChatGPT?

    No. Harvard FAS is discontinuing its ChatGPT Edu institutional agreement after June 2026. Continuing access will require administrative and budgetary approval — meaning it’s available but no longer the frictionless default. Harvard FAS is also maintaining its Google Gemini institutional agreement. Claude is becoming the new institutional default, not an exclusive platform.

    How many people does the Harvard FAS Claude agreement cover?

    Harvard FAS covers all affiliates — students, faculty, staff, and researchers within the Faculty of Arts and Sciences — roughly 20,000 people. FAS is one of Harvard’s largest schools, covering undergraduate education and most of Harvard’s graduate programs in arts, sciences, and humanities.

    Why did Harvard FAS switch from ChatGPT to Claude?

    The Harvard Crimson reported the switch was framed as “platform rotation based on capability” — not a permanent commitment to any single provider. Anthropic hasn’t published the specific evaluation criteria Harvard FAS used. What’s on record is that after running a ChatGPT Edu institutional agreement, FAS evaluated its options and chose to route default access to Claude.

    Does Harvard’s decision affect other universities?

    Institutional decisions at the Harvard level typically influence procurement conversations at peer institutions — not through imitation but because evaluation committees at other universities use visible peer decisions as data points in their own capability and risk assessments. The Harvard FAS switch makes Claude a more credible institutional option for other universities running similar evaluations.

  • Singapore’s Foreign Minister Built His Own Claude AI Second Brain — And Published the Blueprint

    On April 21, 2026, Singapore’s Foreign Minister Dr Vivian Balakrishnan published the architecture of his personal AI assistant on GitHub. He called it NanoClaw — “a second brain for a diplomat.” It runs on a Raspberry Pi 5. It costs roughly $80 in hardware and $5–20 a month in API fees. It connects to his WhatsApp, Gmail, and voice notes. It drafts speeches, runs scheduled briefings, and — unlike every standard chatbot — gets smarter over time because it maintains a structured knowledge graph that persists across sessions.

    His summary: “It answers every question, researches topics, provides daily updates, drafts speeches and condenses information. It has become invaluable — I don’t dare switch it off.”

    A sitting cabinet minister of a G20-adjacent nation just open-sourced his personal AI second brain on GitHub. That’s worth slowing down to look at.

    What NanoClaw Actually Is

    NanoClaw is built on four open-source components running on a Raspberry Pi 5:

    • NanoClaw (agent framework, built by developer Gavriel Cohen, 28k+ GitHub stars) — orchestrates Claude agents in isolated Docker containers. Each chat group gets its own sandboxed container.
    • Mnemon — the knowledge graph layer. Extracts discrete facts, insights, and style preferences from raw documents and conversations into a structured, retrievable graph database. Each entry is a self-contained statement, not a raw text chunk.
    • OneCLI — credential proxy.
    • Karpathy’s LLM Wiki pattern — the memory architecture that lets the system synthesize knowledge rather than just retrieve it.

    WhatsApp integration runs through Baileys, an open-source implementation of the WhatsApp Web protocol — no commercial API required. Voice notes are transcribed locally via Whisper.

    The full architecture is published at: gist.github.com/VivianBalakrishnan/a7d4eec3833baee4971a0ee54b08f322

    The Architecture Detail That Matters Most

    Standard chatbots are stateless. Each session starts from zero. The standard workaround is RAG — retrieval-augmented generation, which pulls chunks of raw text from a document store when they seem relevant. Balakrishnan’s system does something different. Mnemon’s Extract function pulls discrete facts and insights from raw documents into a graph database. Each entry is a self-contained, retrievable statement — not a text chunk.
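    The chunks-versus-statements distinction can be made concrete with a toy sketch. This is not Mnemon’s actual code — the function names and the crude first-word “subject” heuristic are illustrative stand-ins (the real system uses a model for extraction) — but it shows why a graph of self-contained statements is queryable in a way raw chunks are not:

```python
def chunk_text(doc: str, size: int = 40) -> list[str]:
    """RAG-style storage: opaque fixed-size slices of the raw document.
    A slice can split a sentence mid-thought."""
    return [doc[i:i + size] for i in range(0, len(doc), size)]


def extract_facts(doc: str) -> dict[str, list[str]]:
    """Graph-style storage: one self-contained statement per entry,
    keyed by subject so it can be retrieved without re-reading the doc.
    (First-word subject detection is a toy heuristic, not Mnemon's.)"""
    graph: dict[str, list[str]] = {}
    for sentence in doc.split(". "):
        sentence = sentence.strip().rstrip(".")
        if not sentence:
            continue
        subject = sentence.split()[0]
        graph.setdefault(subject, []).append(sentence)
    return graph


note = "NanoClaw runs on a Raspberry Pi 5. Mnemon stores facts as a graph."
chunks = chunk_text(note)
graph = extract_facts(note)

print(chunks[0])        # a 40-char slice that cuts off mid-sentence
print(graph["Mnemon"])  # a whole, self-contained statement
```

    Retrieval against the graph answers “what do we know about Mnemon?” directly; retrieval against the chunks can only return slices that happen to mention it.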

    This is the same distinction that Anthropic’s Dreaming feature (announced May 6 for Managed Agents) is built on: the difference between storing raw experience and synthesizing it into structured knowledge. A system that synthesizes what it learns compounds in usefulness over time. One that just accumulates raw text doesn’t.

    Balakrishnan acknowledged this in a reply on his GitHub gist: “Local models will not give you the big context needed for digesting the memory graph, but will be good enough for querying it. You may want to use a bigger model that works well with a 128K token context at the very least.” He chose Claude specifically for the reasoning capability on the memory graph.

    He Built It With Claude Code, Not Traditional Coding

    This detail matters. Balakrishnan confirmed on X that he never used an IDE. Claude Code made all edits. His description of his own process: “No ‘vibe coding’. All I did was ‘tool assembly’ to create a utility that worked in my domain.”

    Tool assembly. That’s an important distinction. He didn’t write code — he assembled existing open-source tools using Claude as the implementation layer. A trained ophthalmologist and career diplomat, with no traditional software development background, built and deployed a production AI system running on commodity hardware by composing tools through Claude Code.

    His framing at the 17th Asia-Pacific Programme for Senior National Security Officers, the day he published NanoClaw: “AI agents have crossed a threshold I did not expect so soon. Not just impressive demos — but practical tools for daily use.” The audience was senior national security officials from across the Asia-Pacific region.

    Why This Is the Cowork Story in Miniature

    We run our own version of this — Claude operating scheduled tasks, content pipelines, and research workflows on our behalf through Cowork. The architecture Balakrishnan published is recognizably the same value proposition: persistent memory, multi-channel input, scheduled tasks, a system that improves over time.

    His total cost: ~$80 hardware, $5–20/month API. That’s a DIY Cowork running on a credit-card-sized computer on a diplomat’s desk in Singapore. The point isn’t that the price is better or worse than any specific product — it’s that the primitives are now accessible enough that a non-developer can assemble them into a working production system.

    His own thesis on why he published it: “Sharing the blueprint boosts the edge — the specific composition will be obsolete in months, but the builder’s ability to compose the right pieces is the durable advantage.” That’s as clean a statement of the AI-literacy case as we’ve seen from anyone, let alone a sitting foreign minister.

    The Broader Signal

    Singapore continues to be the most Claude-dense environment we track. The same week Balakrishnan published NanoClaw, a Claude Code meetup at Grab HQ drew 1,291 registrants. GIC (Singapore’s sovereign wealth fund) is a co-investor in Anthropic’s infrastructure JV. The country has institutional capital, developer community density, and now a sitting cabinet minister publishing working Claude architecture on GitHub. That triangle is unusual.

    Balakrishnan’s quote from the CNBC Converge Live fireside the day after publishing NanoClaw: “The diplomat who learns to work with AI will have a meaningful edge. I think that edge is now.” He wasn’t talking about chatbots. He was talking about a system running on his desk, integrated into his actual workflows, that he personally built and that he personally depends on.

    That’s a different kind of AI adoption signal than a press release about an enterprise partnership.

    Frequently Asked Questions

    What is NanoClaw?

    NanoClaw is an open-source Claude-powered personal AI assistant framework built by developer Gavriel Cohen. Singapore’s Foreign Minister Dr Vivian Balakrishnan published his own NanoClaw implementation on April 21, 2026 — a self-hosted assistant running on a Raspberry Pi 5 that connects to WhatsApp, Gmail, and voice notes, runs scheduled tasks, and maintains a persistent knowledge graph that grows smarter over time.

    How much does NanoClaw cost to run?

    Balakrishnan’s setup uses approximately $80 in hardware (Raspberry Pi 5) and roughly $5–20 per month in Anthropic API fees depending on usage volume. The software components (NanoClaw, Mnemon, OneCLI, Whisper, Baileys) are all open source. The full architecture is published at gist.github.com/VivianBalakrishnan/a7d4eec3833baee4971a0ee54b08f322.

    Did Vivian Balakrishnan write the code himself?

    He described his process as “tool assembly” rather than traditional coding — composing existing open-source components using Claude Code to handle implementation. He confirmed on X that he never used an IDE and that Claude Code made all edits. He has no traditional software development background; he’s a trained ophthalmologist and career diplomat.

    How is NanoClaw’s memory different from standard chatbot memory?

    Standard chatbots are stateless — each session starts from zero. NanoClaw uses Mnemon, a knowledge graph that extracts discrete facts and insights from conversations and documents into structured, retrievable entries. The system synthesizes knowledge rather than just storing raw text, meaning it compounds in usefulness over time rather than simply accumulating history.

  • Code with Claude London (May 19) and Tokyo (June 10): What to Know and Watch For

    Anthropic’s Code with Claude conference went global this spring. After the San Francisco event on May 6, London is next on May 19 — followed by Tokyo on June 10. Both are free to attend in person (applications closed; selected by lottery in April) or via livestream from anywhere in the world. If you’re a developer building on Claude and didn’t get an in-person seat, the livestream is worth blocking time for. Here’s what we know about both events and why the Tokyo date in particular is worth paying attention to.

    Quick Reference

    What Code with Claude Is

    Code with Claude is Anthropic’s annual developer conference — a full day of hands-on technical workshops, live capability demos, and 1:1 office hours with the engineers who build Claude. It’s structured specifically for developers and founders who are building with the API, not for people who want marketing keynotes. The SF event on May 6 featured three parallel tracks: Research (direct access to Anthropic researchers on current and future model capabilities), Claude Platform (production agent deployment on Anthropic infrastructure), and Claude Code (running Claude Code at scale — long-horizon tasks, multi-repo work, parallel agents).

    Confirmed speakers across the series: Ami Vora (CPO at Anthropic), Boris Cherny (Head of Claude Code), and Angela Jiang (Product Lead for the Claude API and SDKs). Partner presentations from GitHub, Vercel, and Datadog were part of the SF agenda and are likely to carry into London and Tokyo.

    The Extended day format — May 20 for London, June 11 for Tokyo — is a separate event focused on independent developers and early-stage founders: builder deep-dives, laptops-open workshops from Anthropic’s Applied AI team.

    What Came Out of San Francisco (May 6)

    London and Tokyo attendees will be walking in with context from what Anthropic announced in SF. The major developments from May 6:

    • Managed Agents public beta: Multiagent Orchestration and Outcomes moved to public beta. Multiple SF sessions were dedicated to Managed Agents, including “Get to Production 10x Faster with Claude Managed Agents” and a hands-on “Build a Production-Ready Agent” workshop.
    • Dreaming (developer preview): Agents that review and reorganize their own session history between runs. Harvey (legal AI) reported roughly a 6× task completion rate increase after implementing it.
    • SpaceX compute expansion: Doubled rate limits for Pro, Max, Team, and Enterprise; 1,500% input token increase and 900% output token increase for Tier 1 API customers; peak-hours throttling eliminated for Pro and Max.
    • Claude Code v2.1.133: Subagent skill discovery fix (was silently broken), worktree base ref control, effort-level hooks.

    London and Tokyo events will likely build on these — demonstrating Managed Agents and Claude Code in production contexts with the partner companies that attended SF.

    London — May 19, 2026

    London is Anthropic’s first Code with Claude event in Europe. The practical significance: for developers building in European markets, this is the first opportunity to engage directly with Anthropic’s engineering team rather than attending via livestream from across the Atlantic.

    For teams working in regulated European industries — financial services, healthcare, legal — the Claude Platform and Research tracks are the most relevant. Anthropic’s Finance Agents suite (Moody’s integration, financial analysis and compliance tooling) and Claude Security Beta are recent launches that will likely feature in the sessions, given the financial services concentration in London.

    The London timezone (BST, UTC+1) makes the livestream accessible for much of Europe, Africa, and the Middle East without the early-morning constraint the SF event imposed. Register at claude.com/code-with-claude/london.

    What to Watch For at London

    • Enterprise deployment patterns — London’s enterprise tech community is distinct from SF’s startup-heavy audience
    • EU AI Act compliance framing — Anthropic’s approach to regulated market deployment
    • MCP ecosystem sessions — the Model Context Protocol is increasingly central to how Claude connects to enterprise data sources
    • Any Claude Code enterprise adoption data — the JetBrains 2026 developer survey showed significant Claude Code growth year-over-year; London sessions may provide more context

    Tokyo — June 10, 2026

    The Tokyo date is the strategically interesting one. Anthropic chose Japan as its first Asia-Pacific Code with Claude location at a moment when it has already made several Japan-specific moves: the NEC enterprise partnership (April 2026) and active engagement with Japan’s developer community. This is Anthropic positioning before competitors have fully embedded in the Japanese enterprise AI market.

    Japan’s enterprise AI adoption pattern is different from the US. Large enterprises dominate, procurement cycles are longer, and partnerships with established technology companies (like NEC) carry more weight than direct developer adoption alone. Tokyo’s Code with Claude is as much about signaling enterprise commitment as it is about developer community building.

    The Tokyo event is also relevant to Southeast Asia broadly — developers across the Asia-Pacific region can attend via livestream at a timezone that doesn’t require a middle-of-the-night session.

    What to Watch For at Tokyo

    • NEC partnership details — the most concrete Japan enterprise deployment announced so far
    • Asia-Pacific pricing or access updates — Anthropic’s pricing in USD creates friction in markets like India and Japan where USD conversion plus local taxes creates meaningful access barriers
    • Localization and multilingual Claude capability demos — Claude’s multilingual support is strong on paper; Tokyo is where it gets demonstrated to an audience that can evaluate it critically
    • Any announcement of a dedicated Japan or APAC infrastructure presence

    How to Attend Remotely

    Both events are fully livestreamed at no cost. The livestream covers all three conference tracks. Recordings are published to Anthropic’s YouTube channel (the “Code w/ Claude Developer Conference” playlist) within 7–10 days of each event. If you’re watching recorded sessions rather than live, the Claude Code track tends to have the highest density of immediately applicable technical content.

    For the London event: sessions run BST (UTC+1). For Tokyo: JST (UTC+9). Anthropic hasn’t published detailed schedules for London or Tokyo publicly yet — check claude.com/code-with-claude for updates as each event approaches.

    Our Take

    We watched the SF event closely and tracked what came out of it. The Managed Agents announcements were the most developer-relevant; the SpaceX rate limit news was the most immediately practical for anyone hitting API ceilings. Both London and Tokyo will be building on that foundation with an audience that has had two more weeks to actually use what Anthropic shipped in SF.

    The office hours format is underrated. Getting 30 minutes with Boris Cherny’s team on a specific Claude Code workflow problem is worth more than three conference talks. If you’re attending in person or have specific implementation questions, that’s the format to prioritize.

    For us, Tokyo is the event to watch for signals about where Anthropic’s international enterprise push is actually headed. The NEC partnership gave them a credible anchor. Code with Claude Tokyo is where they build on it.

    Frequently Asked Questions

    Is Code with Claude London free to attend?

    Yes. Both in-person attendance and virtual livestream are free. In-person applications closed in April with selection by lottery. Livestream registration remains open at claude.com/code-with-claude/london.

    Will Code with Claude Tokyo sessions be recorded?

    Yes. All sessions from all three cities are published to Anthropic’s YouTube channel within approximately 7–10 days of each event. The “Code w/ Claude Developer Conference” playlist on Anthropic’s YouTube channel is the official home for recordings.

    What tracks are available at London and Tokyo?

    Based on the SF event structure, three parallel tracks: Research (model capabilities and direction), Claude Platform (production agent deployment), and Claude Code (scaling Claude Code in real engineering workflows). Specific session details for London and Tokyo haven’t been fully published; check claude.com/code-with-claude for the agenda as each event approaches.

    What is the Extended day format?

    The Extended day (May 20 for London, June 11 for Tokyo) is a separate event focused specifically on independent developers and early-stage founders — builder stories, hands-on workshops from Anthropic’s Applied AI team, and a more informal format than the main conference day.

    Is Code with Claude relevant if I’m not using Claude Code specifically?

    Yes. The Claude Platform track covers Managed Agents, MCP integrations, and production deployment patterns that apply to any team using the Claude API — not just Claude Code users. The Research track covers model capabilities and roadmap direction relevant to anyone building on Claude.

  • How Mozilla Used Claude Mythos to Find 271 Firefox Vulnerabilities — Including a 20-Year-Old Bug

    On May 7, 2026, Mozilla’s engineering team published the technical account of what happened when they ran Claude Mythos Preview against the Firefox codebase. The headline numbers — 271 vulnerabilities found, 423 total security bugs fixed in April — had already circulated. What the Mozilla Hacks post added was the methodology: how they actually built the pipeline, what Mythos found that human reviewers and fuzzers had missed for decades, and a candid account of what AI-assisted security research looks like in production.

    This is that story, with the details that matter.

    Source

    All technical details in this article are sourced from Mozilla’s own engineering post: Behind the Scenes Hardening Firefox with Claude Mythos Preview, published May 7, 2026, by Mozilla engineers Brian Grinstead, Christian Holler, and Frederik Braun.

    The Numbers in Context

    Mozilla’s security team was fixing roughly 20 to 30 security bugs in Firefox per month throughout 2025. That number jumped to 423 in April 2026 — a roughly 20× increase in a single month. Of those 423 total fixes, 271 were attributed to Claude Mythos Preview. The remaining bugs came from external reports (41), other internal pipeline work using different models, and traditional fuzzing.

    The 271 Mythos-found bugs broke down by severity as follows, from the Mozilla advisory:

    • 180 rated sec-high — vulnerabilities triggerable with normal user behavior, like visiting a web page
    • 80 rated sec-moderate — would be sec-high except they require unusual steps from the victim
    • 11 rated sec-low — annoying but low harm risk (safe crashes, etc.)

    Mozilla also directly credited 3 separate CVEs to Anthropic’s Frontier Red team (CVE-2026-6746, CVE-2026-6757, CVE-2026-6758) — bugs Anthropic had submitted to Mozilla a couple of months earlier, before the harness work began.

    What Claude Mythos Found That Everything Else Missed

    The most striking finding from Mozilla’s report isn’t the volume — it’s the age and complexity of what Mythos surfaced. Mozilla published a sample of the bug reports. Two entries stand out:

    A 20-Year-Old XSLT Bug (Bug 2025977)

    Mythos identified a bug in Firefox’s XSLT implementation where reentrant key() calls cause a hash table rehash that frees its backing store while a raw entry pointer is still in use. The bug had been sitting in the codebase for 20 years, undetected by fuzzing and manual review. Mozilla noted this was one of several sec-high issues involving XSLT they fixed in the same release.

    A 15-Year-Old HTML Legend Element Bug (Bug 2024437)

    Mythos triggered a bug in the <legend> element by orchestrating edge cases across distant parts of the browser — including recursion stack depth limits, expando properties, and cycle collection. The bug had existed for 15 years. Mozilla’s description of the finding: “meticulous orchestration of edge cases across distant parts of the browser.” This is the kind of bug that requires reasoning about how subsystems interact at a systems level — not pattern-matching on known vulnerability types.

    Sandbox Escape Bugs That Human Reviewers Had Missed

    Several of the 271 bugs were sandbox escapes — vulnerabilities that, when chained with other exploits, could allow an attacker to break out of Firefox’s sandboxed content process into the privileged parent process. Mozilla noted these are “notoriously difficult to find with fuzzing.” Mythos found multiple. It also attempted prototype pollution attacks on hardened subsystems — and found nothing exploitable there, confirming that Mozilla’s earlier architectural changes had worked.

    How the Agentic Harness Actually Works

    Mozilla’s engineers are explicit about the mechanism that changed everything: it’s not the model alone. It’s the combination of a capable model with an agentic harness that can generate and run reproducible test cases.

    Earlier attempts at AI-assisted security review using GPT-4 and Claude Sonnet 3.5 produced too many false positives to be practical. The shift came when the harness could do something the earlier systems couldn’t: create a test case, run it, observe the result, and confirm whether the hypothesized bug was real before reporting it. Static analysis produces noise. An agent that can execute code to verify its findings produces signal.
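    The control flow Mozilla describes — hypothesize, build a test case, execute it, report only what reproduces — can be sketched in a few lines. Everything here is a stand-in: `propose_findings` substitutes for the model call, `run_testcase` for sandboxed execution, and the file path is illustrative; only the shape of the loop reflects the description above:

```python
from dataclasses import dataclass


@dataclass
class Finding:
    file: str
    hypothesis: str
    testcase: str  # a reproducible input the agent generated


def propose_findings(target_file: str) -> list[Finding]:
    """Stand-in for the model proposing candidate bugs with test cases."""
    return [
        Finding(target_file, "use-after-free on rehash", "crash-input-1"),
        Finding(target_file, "spurious overflow guess", "benign-input-2"),
    ]


def run_testcase(testcase: str) -> bool:
    """Stand-in for executing the test case in a sandbox; True only
    if the input actually reproduces a crash."""
    return testcase.startswith("crash")


def confirmed_findings(target_file: str) -> list[Finding]:
    """Report only findings whose test case reproduces the bug.
    This execution step is what turns noise into signal."""
    return [f for f in propose_findings(target_file)
            if run_testcase(f.testcase)]


bugs = confirmed_findings("dom/xslt/sample.cpp")  # hypothetical path
print([b.hypothesis for b in bugs])
```

    The false positive that plagued the earlier GPT-4 and Sonnet 3.5 attempts is filtered at `run_testcase`: a hypothesis that cannot produce a reproducing input never reaches a human.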

    The pipeline Mozilla built, in their own description:

    1. Parallelized jobs run across multiple ephemeral VMs, each tasked with hunting bugs in a specific target file
    2. Findings are written back to a central bucket
    3. A discovery subsystem deduplicates against known issues, tracks bugs, triages them, classifies by severity, and manages patches through the release process
    4. Over 100 engineers contributed code to get patches out the door
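    Step 3, the deduplication pass, is the piece that keeps parallel VMs from re-filing the same bug. A minimal sketch, with an assumed signature scheme (a real pipeline would key on normalized crash stacks, not strings):

```python
import hashlib

known_signatures: set[str] = set()


def signature(file: str, crash_site: str) -> str:
    """Stable key for a finding, so reruns and parallel jobs
    don't re-file a bug that is already tracked."""
    return hashlib.sha256(f"{file}:{crash_site}".encode()).hexdigest()[:16]


def triage(findings: list[tuple[str, str, str]]) -> list[dict]:
    """Keep only first-seen findings and carry the severity class forward."""
    fresh = []
    for file, crash_site, severity in findings:
        sig = signature(file, crash_site)
        if sig in known_signatures:
            continue  # duplicate of a tracked bug
        known_signatures.add(sig)
        fresh.append({"sig": sig, "file": file, "severity": severity})
    return fresh


batch = [
    ("a.cpp", "free@rehash", "sec-high"),
    ("a.cpp", "free@rehash", "sec-high"),   # duplicate from a parallel VM
    ("b.cpp", "oob@parse", "sec-moderate"),
]
fresh = triage(batch)
print(len(fresh))  # the duplicate collapses
```

    With hundreds of ephemeral VMs writing to one bucket, this dedup boundary is what makes 271 reported bugs mean 271 distinct bugs.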

    Mozilla started this pipeline with Claude Opus 4.6 on sandbox escape hunting. When Mythos became available, they swapped it in. Their assessment of the upgrade: “model upgrades increase the effectiveness of the entire pipeline: the system gets simultaneously better at finding potential bugs, creating proof-of-concept test cases to demonstrate them, and articulating their pathology and impact.”

    What Mythos Couldn’t Break

    Mozilla’s engineers made a point of documenting what Mythos tried and failed to do. Specifically: it repeatedly attempted prototype pollution attacks — a class of sandbox escape that human researchers had used successfully in the past — and was blocked by architectural changes Mozilla had made. The hardened subsystems held.

    Mozilla’s take on this: “Observing such direct payoff from previous hardening work was even more rewarding than finding and fixing more bugs.” This is actually the more important message for security teams: defensive architecture works, and AI analysis now provides the empirical test of whether it does.

    What This Means for the Software Security Ecosystem

    Mozilla’s engineers closed their post with a direct recommendation: anyone building software can start using an agentic harness with a modern model today. Their advice on approach is practical — start with simple prompting, observe what the model produces, iterate. The inner loop they describe is: “there is a bug in this part of the code, please find it and build a testcase.”

    The implications are real for any organization that maintains a codebase:

    • The asymmetry is reversing. For years, offensive AI (cheap to prompt, cheap to deploy) had the advantage over defensive security (slow, expensive human review). An agentic harness that can verify its own findings changes that balance. Mozilla’s engineers describe the current moment as one where “defenders finally have a chance to win, decisively.”
    • Old code is newly exposed. Finding 15- and 20-year-old bugs in a heavily reviewed browser like Firefox suggests that large, mature codebases contain latent vulnerabilities that fuzzing and human review have consistently missed. If that’s true of Firefox, it’s true of most production software.
    • The pipeline is the work. Mozilla’s engineers are clear that the model is a component, not the product. Building the triage, deduplication, patch management, and release integration around the model is what made this work at scale. The pipeline required significant iteration and tight feedback loops with the engineers who were fielding the bugs.

    Claude Mythos Preview: Access and Context

    Claude Mythos Preview is not a generally available model. It’s offered through Project Glasswing as an invitation-only research preview for defensive cybersecurity workflows, specifically for organizations working on critical infrastructure. Pricing from Anthropic’s docs: $25 input / $125 output per million tokens. Mozilla’s access was part of this program.

    The generally available Claude models as of May 2026 (verified from Anthropic’s official documentation):

    • Claude Opus 4.7 (claude-opus-4-7) — flagship, 1M context window
    • Claude Sonnet 4.6 (claude-sonnet-4-6) — balanced speed/intelligence, 1M context window
    • Claude Haiku 4.5 (claude-haiku-4-5-20251001) — fastest, 200K context window

    Mozilla’s earlier pipeline work used Claude Opus 4.6 before Mythos was available and still found significant vulnerabilities. The pipeline architecture is available to any team; Mythos-tier capability is not.

    Our Take

    We’ve been tracking the Mythos story since the Project Glasswing announcement in April. The Mozilla post is the first time a production engineering team has published the full technical account of what AI-assisted security research looks like from the inside — not benchmarks, not Anthropic’s own claims, but Mozilla’s own engineers describing what they built, what it found, and what it couldn’t crack.

    The 20-year-old XSLT bug is the one that cuts through the noise. Firefox is one of the most security-reviewed browser codebases in existence. Thousands of professional security researchers, internal teams, and academic researchers have looked at this code. An AI model running in an agentic harness found a two-decade-old bug with a reproducible test case in what Mozilla described as a pipeline that “required significant iteration.” That’s not a benchmark number — it’s a deployed result from a production security team.

    The question for any organization that ships software is no longer whether this class of tooling will become standard. It's how fast it becomes standard, and whether your team is ahead of or behind that curve when it does.

    Frequently Asked Questions

    What is Claude Mythos Preview?

    Claude Mythos Preview is Anthropic’s most capable AI model, offered exclusively through Project Glasswing as an invitation-only research preview for defensive cybersecurity workflows. It’s not publicly available. Pricing is $25 per million input tokens and $125 per million output tokens. Mozilla, along with other critical infrastructure partners, received access as part of this program.

    How many Firefox vulnerabilities did Claude Mythos find?

    Claude Mythos Preview found 271 security vulnerabilities in Firefox that were fixed in Firefox 150 (April 21, 2026) and subsequent point releases. Of those, 180 were rated sec-high, 80 sec-moderate, and 11 sec-low. Total security bugs fixed across all of April 2026 was 423, including externally reported bugs and bugs found by other internal methods.

    What is the agentic harness Mozilla built?

    Mozilla built a custom pipeline on top of their existing fuzzing infrastructure. It runs model-powered agents in parallel across ephemeral VMs, each tasked with finding bugs in a specific file or subsystem. Agents generate reproducible proof-of-concept test cases to verify bugs before reporting them — eliminating the false positive problem that made earlier AI security review impractical. Findings are piped into a deduplication and triage system integrated with Mozilla’s normal patch management and release process.
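    To make the verify-then-dedup shape of that pipeline concrete, here is a minimal sketch of the control loop. Everything in it is illustrative: Mozilla has not published their code, so `Finding`, `verify`, and `triage` are hypothetical stand-ins, and the ephemeral-VM execution step is reduced to a stub.

    ```python
    import hashlib
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Finding:
        subsystem: str
        description: str
        poc: str  # reproducible test case that triggers the bug

    def verify(finding: Finding) -> bool:
        """Stand-in for running the PoC in an ephemeral VM. A real harness
        keeps only findings whose PoC actually reproduces the crash."""
        return bool(finding.poc.strip())

    def dedup_key(finding: Finding) -> str:
        # Reports from parallel agents that hit the same subsystem with the
        # same PoC are treated as duplicates of one underlying bug.
        return hashlib.sha256(f"{finding.subsystem}:{finding.poc}".encode()).hexdigest()

    def triage(raw_findings: list[Finding]) -> list[Finding]:
        """Verify first, then deduplicate: unreproducible reports are
        dropped before they ever reach a human triager."""
        seen, kept = set(), []
        for f in raw_findings:
            if not verify(f):
                continue  # no reproducing PoC: discard, don't triage
            key = dedup_key(f)
            if key not in seen:
                seen.add(key)
                kept.append(f)
        return kept
    ```

    The design point this illustrates is the one Mozilla emphasizes: requiring a reproducible test case before a finding enters triage is what eliminates the false-positive flood, and deduplication is what makes running many agents in parallel affordable for the humans downstream.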

    Can other organizations use this approach?

    Yes, with the publicly available models. Mozilla’s engineers explicitly recommend that any software team start using an agentic harness with a modern model now. You don’t need Mythos access to start — Claude Opus 4.7 and Sonnet 4.6 are publicly available via the Anthropic API. The pipeline architecture is the work; the model upgrade is a component swap.

    What’s the difference between what Claude found and what fuzzing finds?

    Traditional fuzzing generates random or semi-random inputs to trigger crashes. It’s effective at finding memory corruption bugs triggered by malformed data, but poor at finding bugs that require complex reasoning about how distant subsystems interact. The 15-year-old HTML legend element bug and 20-year-old XSLT bug that Mythos found both required reasoning about multi-subsystem interactions that fuzzing consistently missed. AI analysis and fuzzing are complementary; Mozilla runs both.

  • The Water Damage Supplement Playbook: 8 Xactimate Line Items Adjusters Routinely Miss

    The Water Damage Supplement Playbook: 8 Xactimate Line Items Adjusters Routinely Miss

    Every adjuster who writes a water damage scope knows they’re leaving money out. This isn’t incompetence — it’s strategy. Carriers train adjusters to write lean estimates with the expectation that contractors who know what they’re doing will supplement back. If you’re accepting first-offer scopes without supplementing, you’re subsidizing their process.

    Here’s what you’re leaving on the table — and how to get it back.

    What Adjusters Leave Out (And Why)

    Eight line items show up missing on water damage estimates so often they should be considered structural omissions, not oversights.

    1. Equipment Monitoring Time (EQ Hours)

    Every piece of drying equipment you deploy needs to be set up, monitored daily, and removed. This is billed under EQ (equipment) hours in Xactimate — distinct from the equipment daily rental rate itself. Adjusters routinely include the air mover or dehumidifier line but strip the EQ monitoring hours. On a standard 3-day residential water loss with 6 pieces of equipment, this can represent $800–$1,200 in omitted labor (approximate, varies by region and Xactimate price list). It’s legitimate labor time. Submit it every job.
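    The arithmetic behind that range is worth making explicit. All figures below are hypothetical; unit times and labor rates vary by region and by the Xactimate price list in effect.

    ```python
    # Illustrative EQ-hours math for a standard residential water loss.
    pieces = 6                 # air movers + dehumidifiers deployed
    drying_days = 3
    setup_hrs = 0.5            # per piece: placement and startup
    removal_hrs = 0.5          # per piece: breakdown and pickup
    monitor_hrs_per_day = 0.5  # per piece: daily readings and adjustment
    labor_rate = 65.0          # $/hr, illustrative tech rate

    eq_hours = pieces * (setup_hrs + removal_hrs + monitor_hrs_per_day * drying_days)
    omitted_labor = eq_hours * labor_rate  # 15.0 hours -> $975.00
    ```

    Fifteen hours of legitimate, documented labor on a three-day dry sits squarely in the $800–$1,200 range the scope is silently missing.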

    2. Contents Manipulation (FCC)

    If you moved furniture to set equipment or protect contents — and you did, because wet carpet under a couch is a mold claim waiting to happen — you can bill for it. The FCC line item covers furniture manipulation. Adjusters frequently zero it out claiming “no significant contents.” Document with photos. Bill it anyway. The IICRC S500 supports moving contents as part of professional mitigation protocol.

    3. Antimicrobial Treatment

    Antimicrobial application is standard protocol on Category 2 or Category 3 losses. Some adjusters skip it on Cat 2 jobs claiming the loss "wasn't contaminated enough." That's not a defensible position. Cite your IICRC S500 obligation: the standard of care requires antimicrobial treatment on these losses, and your estimate should reflect it, every time.

    4. Structural Drying Labor (WTR STRC)

    This is separate from equipment rental. Structural drying labor — the time spent monitoring moisture readings, adjusting equipment placement, logging psychrometric data — is billable under the WTR STRC line in Xactimate. It gets omitted constantly. If you’re running a 4-day dry with daily monitoring visits, that’s real labor time that belongs in the scope. Don’t bundle it into your equipment rate. Break it out.

    5. Controlled Demolition — Broken Out by Material

    Any time you remove material to facilitate drying — baseboard, drywall, flooring — document each demolition activity with its own line item. Adjusters often bundle multiple demolition activities under a single generic line at the lower rate. Don’t let them. Break it out: WTR DWL for drywall removal, WTRFC variants for flooring type (C for carpet, T for tile, W for wood). Each code carries its own unit rate. Bundled scopes always favor the carrier’s math, not yours.

    6. Overhead & Profit (O&P)

    The single most fought-over line item in all of Xactimate. O&P is the 10% overhead + 10% profit markup that general contractors are entitled to charge when coordinating multiple subcontractors. Carriers deny it by claiming the job "doesn't involve three or more trades," a threshold they invented; it appears nowhere in Xactimate's published pricing guide documentation.

    The counter: build a trades list into your scope narrative. List every trade involved — mitigation crew, licensed plumber, drywall contractor, flooring installer, painter. Four trades is four trades. Include a one-page scope narrative that names them. This prevents the denial before it starts. And if your overhead is genuinely higher than 10%, carry your overhead calculation to the negotiation. The “10 and 10” standard is an industry habit, not a contractual ceiling.
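    For clarity, here is the "10 and 10" math on an illustrative scope. The numbers are hypothetical, and conventions vary by carrier: the common application puts both percentages on the base subtotal rather than compounding one on the other.

    ```python
    # Illustrative O&P calculation on a hypothetical scope subtotal.
    subtotal = 12_000.00          # RCV scope subtotal, before O&P
    overhead = subtotal * 0.10    # the first "10"
    profit = subtotal * 0.10      # the second "10": conventionally both
                                  # applied to the base, not compounded
    op_total = overhead + profit           # 2,400.00
    estimate_total = subtotal + op_total   # 14,400.00
    ```

    On a $12,000 scope, conceding O&P means conceding $2,400 of markup you are entitled to for performing GC coordination functions.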

    7. Drying Documentation & Psychrometric Reporting

    Daily moisture logs, psychrometric readings, equipment placement diagrams — this is billable work that simultaneously protects you legally and demonstrates professional standard of care. Some carriers will pay for drying documentation as a discrete line item. Others will fight it. Submit it regardless. The documentation cost is real whether they pay it or not, and if you ever face a bad-faith claim, that paper trail is worth far more than the line item rate.

    8. Code Upgrade Items

    If your jurisdiction requires anything beyond like-for-like replacement — updated electrical to code, fire blocking on structural penetrations, cement board substrate under tile in wet areas — those upgrades are billable line items. They’re also frequently omitted from adjuster scopes. Pull your local code requirements for every material type you’re replacing and include the upgrade lines with code citations in your narrative. “Local code requires X per section Y” is a hard argument to deny.

    How to Win the O&P Fight

    When an adjuster denies O&P citing insufficient trades: don’t argue the threshold. Argue the standard.

    Send back a scope narrative page that lists explicitly: mitigation contractor, structural drying crew, licensed plumber, licensed electrician (if any wiring was involved), drywall contractor, flooring contractor, painter. That's six to seven trades on a typical Category 2 bathroom loss. Documented. Named. The general contractor coordinating them is entitled to O&P.

    If they push back a second time, pull the insured’s policy language. General contractor services — scheduling, coordination, quality control, project warranty — exist whether the carrier likes it or not. If you’re managing subcontractors, you’re performing GC functions. GCs charge O&P. That’s what it’s for.

    The Supplement Submission Process That Actually Gets Paid

    Fast-tracked supplements share one trait: they’re submitted in Xactimate format, not PDF invoices. A clean Xactimate supplement typically gets reviewed in 2–3 weeks. A PDF invoice can sit 6–8 weeks — and gets denied at a higher rate because adjusters can’t reconcile it against their own scope line by line.

    When submitting a supplement, include a clear cover narrative: what changed from the original scope, why, and what code or standard supports it. Mark every supplemental line item clearly — “Supplemental Item — Not in OA Scope” — so the reviewer can locate additions instantly. Attach photo documentation for any line item likely to be disputed. Submit through the carrier’s supplement portal if one exists.

    One more thing: track your supplement approval rates by carrier. If one carrier denies your antimicrobial supplements at 60% and another approves 90%, adjust your initial scope narrative accordingly for that carrier. They’re not all operating from the same playbook.
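    That tracking doesn't need anything fancier than a log and a tally. The sketch below is illustrative, with a hypothetical supplement log; any spreadsheet or job-management export in the same shape works.

    ```python
    from collections import defaultdict

    # Hypothetical supplement log: (carrier, line_item, approved)
    log = [
        ("Carrier A", "antimicrobial", False),
        ("Carrier A", "antimicrobial", True),
        ("Carrier A", "antimicrobial", False),
        ("Carrier B", "antimicrobial", True),
        ("Carrier B", "antimicrobial", True),
        ("Carrier A", "EQ hours", True),
    ]

    def approval_rates(entries):
        """Approval rate per (carrier, line item) pair."""
        tally = defaultdict(lambda: [0, 0])  # [approved, submitted]
        for carrier, item, approved in entries:
            tally[(carrier, item)][1] += 1
            tally[(carrier, item)][0] += int(approved)
        return {key: approved / submitted
                for key, (approved, submitted) in tally.items()}
    ```

    If the output shows one carrier approving antimicrobial supplements at a third of the rate of another, that's a signal to front-load the S500 citation in your initial scope narrative for that carrier rather than fighting it at the supplement stage.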

    Bottom Line

    Carriers write lean because they can. Most contractors either don’t supplement or supplement poorly — PDF invoices, vague narratives, no photo documentation. That’s why the strategy works for them.

    If you’re running water damage jobs at $8,000–$15,000 in average ticket size and not supplementing, you’re leaving somewhere between $1,200 and $4,000 per job on the table — a rough estimate based on common first-offer gap percentages in the industry. Across 50 jobs a year, that’s $60,000 to $200,000 in annual revenue. Not found money. Your money.

    The line items are in Xactimate. The standard of care is established by IICRC. The adjuster expects you to push back. The only question is whether your scope is specific enough to win.

  • The Tolerance Premise

    Article 38 ended with a question that usually gets asked in the wrong register: whether aggregate ownership — someone being accountable for the gap no individual node can see — is achievable above a certain scale.

    The honest answer is: probably not. And the more interesting question is what you build once you’ve accepted that.

    Most organizational design assumes the answer is better process. Better visibility, better cadence, better escalation paths. Hire a coordinator. Build a dashboard. Add a meeting where the distributed parts report to a center that holds.

    What that design is still doing, structurally, is pursuing coherence. The meeting is the coherence mechanism. The dashboard is the coherence mechanism. The gap is treated as a problem with a process solution, and the process is built to close it.

    But there’s a design premise on the other side of that question — one that almost nobody builds toward intentionally, because it sounds like giving up. The premise is: distributed incoherence is not a problem to solve. It is the permanent condition of any system operating at real complexity. The task is not eliminating the gap. The task is making the gap legible, bounded, and visible to the right eyes at the right time.

    Call this the tolerance premise. Not tolerance in the passive sense — not ignoring the gap — but designed, deliberate tolerance with structure. The difference between an organization that drifts silently into incoherence and one that holds distributed nodes in deliberate, bounded divergence is not whether gaps exist. It’s whether the gaps are visible, named, and bounded before they compound.


    What the Tolerance Premise Requires

    Three things the tolerance premise requires that coherence pursuit doesn’t.

    Local legibility. Each node has to be able to report its own state honestly — not relative to the aggregate, which it can’t see, but in absolute terms. Am I stalled, moving, or blocked? Am I running the same instructions I was running six weeks ago? The discipline is not performance relative to the plan. It’s accurate self-reporting relative to the last known state. Most systems optimize local nodes for output, not for honest state representation. The tolerance premise inverts this: the most valuable thing a node can do is tell the truth about itself, because the aggregate can only be seen if the inputs are accurate. A node that reports green when it’s yellow is not a performance problem — it’s an epistemic problem, and epistemic problems aggregate faster than process problems.

    Aggregate surfacing. Something has to look across nodes — not to own the gap, but to name it. This is the function that’s almost universally missing. Not a manager, not a meeting, not a weekly review that summarizes what the nodes already reported. Something that reads the pattern across honest local reports and says: here is where drift has accumulated. Here is the shape of the distributed incoherence you are currently running with. This function cannot be inside any node, because every node’s context is bounded by its own view. It has to be orthogonal to execution — not above it, not managing it, but adjacent to it with a wider aperture. The weekly briefing that can see nineteen sites healthy and one down is doing aggregate surfacing. What it cannot do is close the gap it names. That’s the distinction: surfacing is not owning.

    Bounded drift. Tolerance without limits is not a design — it’s an abdication. The tolerance premise requires specifying, in advance, how much drift is acceptable before the aggregate requires a reset. Not a goal to eliminate drift, but a maximum. Beyond this distance, the distributed configuration has to be brought into view and reoriented. The timing is not a calendar event. It’s a threshold condition. The bounded-drift rule fires when the condition is met, not when someone gets around to looking. Items in flight beyond a certain number of days get reviewed — not because anyone scheduled a review, but because the threshold was crossed. That’s a different instrument than a due date. A due date is a coherence mechanism. A threshold is a tolerance mechanism.
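    The difference between the two instruments is easy to state in code. A minimal sketch, with an illustrative threshold and hypothetical item names:

    ```python
    from datetime import date

    DRIFT_TOLERANCE_DAYS = 21  # chosen in advance; the number is illustrative

    def fires_review(items_in_flight, today):
        """A threshold condition, not a calendar event: the rule fires the
        moment any item crosses the tolerance, whether or not a review was
        ever scheduled. items_in_flight is a list of (name, start_date)."""
        return [name for name, started in items_in_flight
                if (today - started).days > DRIFT_TOLERANCE_DAYS]
    ```

    A due date asks "is it the appointed day yet?" This asks "has anything drifted past what we decided we could tolerate?" The second question gets answered every time you look, not once per quarter.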


    The Ecological Analog

    The closest working analog for this is not organizational. It’s ecological.

    A forest doesn’t achieve coherence. Every tree is pursuing its own local optimization — light, water, soil, root competition — with no central coordinator. The aggregate is neither coherent nor chaotic. It’s something else: distributed local optimization with seasonal rebalancing. The rebalancing isn’t managed. It’s structural. Winter is the bounded-drift reset. Fire is the bounded-drift reset. The organism that can’t survive the reset was already running outside tolerance, whether or not anyone noticed.

    What would “seasonal rebalancing” mean for an AI-augmented operation?

    Not a quarterly review. Reviews are coherence mechanisms — they gather the distributed parts and try to realign them to a center. A seasonal reset in the ecological sense would be more disruptive and more structural: a periodic moment where the whole configuration is visible at once, where whatever is outside tolerance doesn’t get optimized — it gets composted, and the freed attention becomes the resource for the next cycle.

    Most organizations cannot build this because the cultural cost of composting living work is too high. The project that’s been in flight for eight weeks has people behind it. Ending it looks like failure. The forest does not feel bad about the dead branch. The operator who has to tell a team that a project is being composted — not killed for cause, just outside tolerance — is doing something the forest does automatically and humans find almost impossible to do cleanly.

    The composting problem is not a process problem. It’s a grief problem. And the tolerance premise doesn’t solve it. It just makes the moment of composting structurally necessary rather than politically optional.


    What Leadership Becomes

    Here is the uncomfortable version of the tolerance premise.

    If aggregate ownership is impossible above a certain scale, and the design solution is legible bounded incoherence rather than coherence pursuit, then the function of leadership in that system changes. The leader is no longer the person who closes the gap. They are the person who decides how much gap is acceptable — and who runs the bounded-drift reset when the threshold is crossed.

    That’s a different job. Not better or worse. Different.

    The briefing system that can look across distributed nodes and name the gap is not doing leadership’s job. It’s doing the aggregate-surfacing job — providing the honest read that leadership can’t get from inside any single node. What it cannot do is choose the tolerance threshold, decide when the reset fires, or do the composting. Those require judgment about what the operation can sustain and what it is trying to become. Judgment like that requires something that has skin in the game.

    Most people who are building AI-augmented operations are still designing for coherence and then being surprised when the gap persists. They build better dashboards, more sophisticated briefing cadences, finer-grained status tracking. All of this is useful. None of it changes the structural fact that the gap between distributed nodes is not a visibility problem — it’s an ownership problem, and visibility doesn’t create owners. It just makes ownerlessness more obvious.

    The tolerance premise is what you build when you’ve stopped pretending that better visibility will, eventually, produce the coherence it’s been promising.


    The question isn’t whether your system is coherent. It’s whether you know what shape your incoherence has taken — and whether you chose it, or it chose you.