What are Notion Custom Agents?

Notion Custom Agents are AI teammates that handle repetitive tasks autonomously, triggered by schedules, database changes, or webhooks. They launched in February 2026 and are available as an add-on for Business and Enterprise plans.

What is Notion Workers?

Notion Workers is a hosted cloud runtime for custom TypeScript code that powers database sync, agent tools, and webhook triggers. Launched May 13, 2026. Free during beta through August 10, 2026.

What AI model does Notion use?

Notion runs on Anthropic's Claude — Claude Opus 4.7 as of January 2026. Unlike Microsoft Copilot (OpenAI GPT) and Google Workspace (Gemini), Notion's Claude integration emphasizes reliable, safe agentic behavior for workflows with write access to business databases.

How is Notion different from Microsoft Copilot and Google Workspace AI?

Notion is database-first. Every piece of information is structured, typed, and queryable data — not documents. Notion agents run precise queries against your actual organizational data rather than inferring structure from prose, making them more reliable for business data operations.

What is Google Workspace Studio?

Google Workspace Studio is Google's no-code AI agent builder, launched to all Workspace domains on March 19, 2026. Users describe what they want in plain language and Gemini builds the agent — no coding required.

What is the latest Google Gemini model in 2026?

As of mid-2026, Gemini 3.1 Pro (released February 19, 2026) is Google's most capable model. Gemini 2.5 Flash is the default model for most Workspace use cases, balancing speed and cost.

What is Google Agentspace?

Google Agentspace (unified into the Gemini Enterprise Agent Platform at Cloud Next 2026) combines Gemini reasoning, Google search, and enterprise data to give employees AI agents that understand their organization's specific knowledge.

Do small businesses have access to Google AI agent features?

Yes. Workspace Studio and Gemini features are included in Business Standard and higher Workspace tiers. Most of the agent infrastructure is arriving in existing plans, not as separate enterprise-only products.

Is Microsoft building an everything app like WeChat?

Microsoft hasn't announced a single everything app product, but Copilot, Microsoft Graph, LinkedIn integration, Agent 365, and Bing web cards together suggest a unified AI-powered dashboard is the strategic direction.

Why did Western super apps fail where WeChat succeeded?

U.S. data privacy regulations, antitrust scrutiny, platform fragmentation, and entrenched single-purpose apps all prevented a WeChat-style super app in the West. AI changes the equation by connecting apps without needing to own them.

How does LinkedIn data connect to Microsoft Copilot?

Microsoft Graph links LinkedIn professional data — profiles, company updates, career changes — into Copilot's intelligence layer, giving enterprise users LinkedIn-informed context in sales briefings, meeting prep, and professional queries.

What can small businesses do today to prepare for AI-unified platforms?

Connect your tools via APIs now, optimize your LinkedIn presence for AI entity recognition, publish structured authoritative content, and build automation stacks that produce clean data outputs.

What does the Snowflake-Anthropic partnership include?

A multi-year, $200M agreement making Claude models available to Snowflake's 12,600+ enterprise customers. Snowflake Intelligence uses Claude Sonnet 4.6. Snowflake Cortex AI Functions supports Opus 4.5 and newer. Focus on financial services, healthcare, and life sciences.

What is Project Glasswing?

Anthropic's invitation-only defensive cybersecurity program providing early access to Claude Mythos Preview. Named partners include AWS, Apple, Cisco, CrowdStrike, Google, JPMorgan Chase, Microsoft, and Nvidia. No Indian organizations are currently named as partners.

Why is India's government warning about Mythos if India is Anthropic's second-largest market?

The Indian government meetings were framed as defensive preparation, not restriction. The concern is Mythos-tier capability used offensively against Indian financial infrastructure. India's absence from Project Glasswing means its financial sector lacks early access to the defensive research that Glasswing partners receive.

What are Cowork Routines?

Cowork Routines are cloud-hosted scheduled tasks that run on Anthropic's infrastructure regardless of local hardware state. They execute on a schedule — daily, weekly, or at specific times — and read their instructions from Notion desk specs at runtime.

Does Windows computer use require coding to set up?

No. Computer use activates through the standard conversational Cowork interface. You describe what you want done and Claude navigates the Windows UI directly. No scripting or API integration required.

What is the difference between Cowork and Cowork Routines?

Cowork runs on your local machine and requires the app to be active. Routines run on cloud infrastructure unattended. Tasks needing a schedule go to Routines; tasks needing local context or desktop UI go to Cowork.

Did Harvard ban ChatGPT?

No. Harvard FAS is discontinuing its ChatGPT Edu institutional agreement after June 2026 and requiring administrative approval for continued access. Claude is becoming the new institutional default. Harvard is also maintaining its Google Gemini agreement.

Why did Harvard FAS switch from ChatGPT to Claude?

The Harvard Crimson reported the switch was framed as platform rotation based on capability. Specific evaluation criteria were not published. After running a ChatGPT Edu agreement, FAS evaluated options and chose to route default access to Claude.

Does Harvard's decision affect other universities?

Institutional decisions at the Harvard level typically influence procurement conversations at peer institutions, where evaluation committees use visible peer decisions as data points in their own capability and risk assessments.

What is NanoClaw?

NanoClaw is an open-source Claude-powered personal AI assistant framework. Singapore's Foreign Minister published his own NanoClaw implementation on April 21, 2026: a self-hosted assistant on a Raspberry Pi 5 with WhatsApp, Gmail, voice notes, scheduled tasks, and a persistent knowledge graph.

How much does NanoClaw cost to run?

Approximately $80 in hardware (Raspberry Pi 5) and $5-20 per month in Anthropic API fees. All software components are open source.

How is NanoClaw's memory different from standard chatbot memory?

NanoClaw uses Mnemon, a knowledge graph that extracts discrete facts and insights into structured entries rather than storing raw text. It synthesizes knowledge, compounding in usefulness over time.

Is Code with Claude London free to attend?

Yes. Both in-person attendance and virtual livestream are free. In-person applications closed in April with selection by lottery. Livestream registration remains open at claude.com/code-with-claude/london.

Will Code with Claude Tokyo sessions be recorded?

Yes. All sessions are published to Anthropic's YouTube channel within approximately 7-10 days of each event on the Code w/ Claude Developer Conference playlist.

What tracks are available at London and Tokyo?

Based on the SF event structure: Research (model capabilities), Claude Platform (production agent deployment), and Claude Code (scaling Claude Code in real engineering workflows). Check claude.com/code-with-claude for full agendas as each event approaches.

What is the Extended day format?

The Extended day (May 20 for London, June 11 for Tokyo) is a separate event for independent developers and early-stage founders, with builder stories, hands-on workshops, and an informal format.

What is Claude Mythos Preview?

Claude Mythos Preview is Anthropic's most capable AI model, offered exclusively through Project Glasswing as an invitation-only research preview for defensive cybersecurity workflows. It's not publicly available. Mozilla received access as part of this program.

How many Firefox vulnerabilities did Claude Mythos find?

Claude Mythos Preview found 271 security vulnerabilities in Firefox, which were fixed in Firefox 150 and subsequent point releases. Of those, 180 were rated sec-high, 80 sec-moderate, and 11 sec-low. Across all sources, April 2026 security fixes totaled 423.

Can other organizations use this agentic security approach?

Yes, with publicly available models. Mozilla's engineers recommend any software team start using an agentic harness now. Claude Opus 4.7 and Sonnet 4.6 are available via the Anthropic API. The pipeline architecture is the real work; Mythos is a component upgrade within that pipeline.

What is the difference between what Claude found and what fuzzing finds?

Traditional fuzzing is poor at finding bugs requiring complex multi-subsystem reasoning. The 15-year-old HTML legend bug and 20-year-old XSLT bug Mythos found both required reasoning about distant subsystem interactions that fuzzing consistently missed for over a decade.

Category: Industry News & Commentary

Notion’s Database-First Bet: Why the Everything App Might Be Built on a Spreadsheet, Not a Document
Last refreshed: May 15, 2026

See also: our full breakdown of the May 13, 2026 platform launch, Notion Developer Platform Launch (May 13, 2026). For the operating doctrine the launch reinforces, see The Three-Legged Stack.

Microsoft is stitching together an everything app from acquisitions. Google is trying to unify a native stack it keeps fragmenting. Notion is doing something different — and arguably more interesting. It’s building the everything app from the database up, and it just made its most important move yet.

Definition: The Database-First Everything App
An AI-powered workspace where every piece of information (tasks, projects, docs, contacts, data) lives in a structured, queryable database, and agents can read, write, reason over, and act on that data autonomously. The database isn’t the backend. It’s the interface.

Yesterday Changed Everything for Notion

On May 13, 2026 — yesterday — Notion shipped version 3.5 and announced their full Developer Platform in a livestreamed product event. The tech press covered it as an AI agent story. They weren’t wrong, but they missed the bigger frame.

Notion didn’t just add agents. They introduced a new primitive called Workers — a hosted runtime for custom code that lets teams extend Notion without running their own servers. Database sync, agent tools, and webhook triggers all run through Workers. They launched the External Agents API, allowing any agent — ones you built, or ones from Claude, Codex, Decagon, and other partners — to work natively inside your Notion workspace. And they opened a developer platform that lets teams connect AI agents, external data sources, and custom code directly into their workspace.

Taken individually, these are nice product updates. Taken together, they’re an orchestration play. Notion is positioning itself not as a note-taker with AI features bolted on, but as the hub where people, agents, and data collaborate across every tool a team uses.

The Database Advantage Nobody Else Has

Here’s the thing that separates Notion from every other everything-app candidate — including Microsoft and Google.

Both Microsoft 365 and Google Workspace are document-first platforms. Their fundamental unit of work is a file: a Word document, a Google Doc, a PowerPoint, a Sheet. Files are great for humans to read. They’re terrible for AI to reason over at scale. You can’t ask an AI agent to “find every project where the status is blocked and the deadline is this week” across a folder of Word documents and get a reliable answer.

Notion’s fundamental unit is a database. Every page can be a database row. Every property is structured, queryable, filterable data. When Notion AI looks at your workspace, it doesn’t see a pile of documents — it sees a relational knowledge graph. Tasks have statuses. Projects have owners and deadlines. Contacts have properties. Everything is connected, typed, and queryable.

That’s not a feature difference. That’s an architectural difference. And it’s why Notion’s agents can do things that Copilot and Gemini agents fundamentally struggle with: operate reliably on your actual organizational data, not summaries of your documents.

The Agent Timeline: Faster Than Anyone Expected

Notion’s agent rollout has moved at a pace that’s easy to underestimate if you haven’t been watching closely. Here’s the actual timeline:
- September 18, 2025: Notion 3.0 (Agents). First AI agents launch. Autonomous data analysis and task automation. The starting gun.
- January 20, 2026: Notion 3.2. Mobile AI, new model support, people directory. Agents go everywhere, not just desktop.
- February 24, 2026: Notion 3.3 (Custom Agents). Users can build their own agents from scratch. Over 21,000 custom agents were built in the first free trial period alone, and Notion reported 2,800 agents running 24/7 internally.
- March 2026: Workers introduced in alpha, a TypeScript-based framework for agents to talk to any service with an API. The coding layer for power users.
- April 14, 2026: Notion 3.4. Calendar and inbox connectors. Notion AI can now schedule meetings and draft emails from inside your workspace.
- May 5, 2026: Custom Agent admin controls for enterprise, including workspace-level credit limits, governance tools, and compliance features.
- May 13, 2026: Notion 3.5 (Developer Platform). External Agents API, Workers out of alpha, database sync with no servers, full developer ecosystem launched.

That’s about eight months from first agent launch to full developer platform. For context, Microsoft spent years building Azure OpenAI integration before Copilot reached feature parity with what Notion shipped in less than a year.

What the Notion Everything App Actually Looks Like Today

This isn’t theoretical. Here’s what a team running on Notion can configure right now:
- Your project data, always current. Databases synced from Slack, Google Drive, GitHub, Jira, Microsoft Teams, Salesforce, and Box — all flowing into Notion databases in real time, powered by Workers. No manual updates. No stale spreadsheets.
- Agents watching your work. Custom agents triggered by database changes, schedules, or webhooks — compiling status updates, flagging blocked tasks, escalating overdue items, answering team FAQs.
- Your inbox and calendar inside your workspace. Connect Gmail or Outlook and your calendar; Notion AI can schedule meetings and draft emails without leaving the tool your work already lives in.
- External agents working in your context. Claude, Codex, Decagon, and agents you’ve built yourself all connect through the External Agents API and operate against your Notion databases with full context. Not generic AI. AI that knows your specific data.
- Plan Mode for complex operations. Before an agent makes large changes to your databases or pages, it stops, asks clarifying questions, and builds a plan for your approval. This is the governance layer that makes AI trustworthy in a business context.
- Your institutional knowledge, always accessible. Every decision, every project history, every team document: structured and queryable by agents that can synthesize across your entire knowledge base on demand.

The Model Behind It: Claude Opus 4.7

Unlike Microsoft (Copilot runs on GPT-4o and Azure OpenAI) and Google (Gemini family), Notion is built on Anthropic’s Claude. As of the January 2026 update, Notion runs Claude Opus 4.7 — Anthropic’s most capable model at the time of release — for its AI features and agent reasoning.

This is a strategic choice worth examining. Claude is specifically designed with a focus on reliability, honesty, and safe behavior in agentic contexts — qualities that matter enormously when an AI agent has write access to your company’s databases. Anthropic’s Constitutional AI training approach was built for exactly the kind of autonomous, long-running agent work that Notion is deploying.

The Notion + Claude combination isn’t just a vendor relationship. It’s an architectural alignment: a database-first workspace built on a model specifically designed for trustworthy agentic behavior. That’s a more coherent stack than either Microsoft or Google has assembled, where the AI model and the productivity platform were developed independently and integrated later.

Why “Database First” Beats “Document First” for the Everything App

Let’s make this concrete with a comparison most teams will recognize.

Ask Microsoft Copilot: “Which of our client projects are behind schedule this quarter?” Copilot will search your emails, scan your SharePoint documents, and produce a reasonable summary — but it’s reading prose, inferring structure, and hoping the documents are up to date. The answer is a best-effort synthesis, not a query result.

Ask a Notion agent the same question: it runs a database filter. Status = Behind. Quarter = Q2 2026. It returns an exact list in under a second, with links to every project, the responsible person, and the last update — because that data is structured. The agent didn’t infer anything. It read typed data.
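
The contrast can be sketched in a few lines of Python. This is an illustration of the idea, not Notion’s API: the `Project` record, its field names, and the sample rows are all hypothetical. It only shows why a filter over typed fields returns an exact result instead of an inferred one.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Project:
    # Hypothetical typed row; real Notion properties will differ.
    name: str
    owner: str
    status: str        # e.g. "On Track", "Behind", "Blocked"
    quarter: str       # e.g. "Q2 2026"
    last_update: date


def behind_schedule(rows: List[Project], quarter: str) -> List[Project]:
    """Exact-match filter: Status = Behind AND Quarter = <quarter>."""
    return [r for r in rows if r.status == "Behind" and r.quarter == quarter]


# Illustrative sample data only.
PROJECTS = [
    Project("Site relaunch", "Ana", "Behind", "Q2 2026", date(2026, 5, 11)),
    Project("CRM migration", "Raj", "On Track", "Q2 2026", date(2026, 5, 12)),
    Project("Brand refresh", "Lee", "Behind", "Q1 2026", date(2026, 3, 28)),
]

if __name__ == "__main__":
    for p in behind_schedule(PROJECTS, "Q2 2026"):
        print(p.name, p.owner, p.last_update.isoformat())
```

Nothing in the filter depends on how a status was phrased in prose. A row either matches both conditions or it does not, which is the property the comparison above relies on.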

That’s the difference between AI that helps you find things and AI that actually knows things. Notion’s database architecture is what makes the second kind possible at scale, without hallucination, without retrieval errors, without the AI making up a project that doesn’t exist.

The Honest Weakness: The 30-Second Wall

Here’s what you only learn by actually building inside the alpha — and we did.

Notion Workers runs in a 30-second sandbox with 128MB of memory. Each Worker is created through the Notion control panel, taking 3–5 minutes to spin up. The network is limited to an approved domain allowlist. Storage is ephemeral — nothing persists between runs. These aren’t theoretical constraints. They’re the real walls you hit when you try to move serious automation workloads into Notion.

We were in the Workers alpha. We built Workers. We set up custom agents. And we stress-tested the sandbox deliberately — forcing failures to find the exact break points, then running production workloads at 60% of the known ceiling as a stability rule. That’s the only honest way to operate inside a system this constrained: know where it breaks before you depend on it.
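
As a worked example of that 60% rule: 60% of the 30-second limit is 18 seconds, and 60% of 128MB is 76.8MB. A minimal sketch of the check follows, assuming a job’s runtime and memory have already been estimated from test runs. The function names are hypothetical; only the two limits and the 60% factor come from the text above.

```python
SANDBOX_TIMEOUT_S = 30.0    # wall-clock limit quoted above
SANDBOX_MEMORY_MB = 128.0   # memory limit quoted above
SAFETY_FACTOR = 0.60        # the "60% of the known ceiling" rule


def operating_budget(ceiling: float, factor: float = SAFETY_FACTOR) -> float:
    """Return the production budget for a measured hard limit."""
    return ceiling * factor


def fits_in_worker(est_seconds: float, est_memory_mb: float) -> bool:
    """True if an estimated job stays inside both budgets."""
    return (
        est_seconds <= operating_budget(SANDBOX_TIMEOUT_S)
        and est_memory_mb <= operating_budget(SANDBOX_MEMORY_MB)
    )


if __name__ == "__main__":
    print(fits_in_worker(12.0, 64.0))   # a light API call or database update
    print(fits_in_worker(240.0, 64.0))  # a multi-minute batch job
```

Any job that fails this check is a candidate for an external execution environment, not for the sandbox.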

What we found changed our architecture. Heavy automations — multi-site WordPress SEO optimization passes across 18 sites, content pipelines, image generation, WP-CLI batch operations — couldn’t live inside Notion Workers. They’re multi-minute jobs, not 30-second jobs. Moving them to Notion would have meant engineering workarounds that added complexity without adding reliability.

So instead of moving Cowork automations into Notion as we originally planned, we moved them to Google Cloud Run. The notion-deep-extractor (crawls the workspace, extracts structured knowledge, logs to the Second Brain database — runs 3x daily) and the notion-maintenance bundle (archive sweeper, stale work detector, content guardian — runs daily at 6am UTC) all live on Cloud Run now, with Cowork scheduled tasks paused. The 18-site WordPress optimizer running Tuesday? Cloud Run. Not Notion.

This isn’t a knock on Notion. It’s an architectural reality that every builder needs to understand before they commit workloads. The right pattern — the one we’re now using and that Notion’s own documentation points toward — is Notion Workers as the trigger layer, Cloud Run as the execution layer. A Worker fires a signed POST to a Cloud Run endpoint, returns immediately (well under 30 seconds), Cloud Run runs the heavy job, then writes results back to a Notion database via the Public API. You get Notion as the orchestration and visibility layer without hitting the sandbox wall.
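
The signing step in that hand-off can be sketched as follows. The text does not say which signing scheme is used, so this assumes HMAC-SHA256 over a timestamp plus the request body with a shared secret; the payload shape, the five-minute replay window, and the function names are all hypothetical. It is written in Python to show the receiving side on Cloud Run, and a Worker would perform the equivalent signing in TypeScript.

```python
import hashlib
import hmac
import json
import time
from typing import Optional

MAX_SKEW_S = 300  # assumed replay window, not taken from the source


def sign(secret: bytes, body: bytes, timestamp: int) -> str:
    """HMAC-SHA256 over '<timestamp>.<body>', hex encoded."""
    message = str(timestamp).encode() + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, timestamp: int, signature: str,
           now: Optional[int] = None) -> bool:
    """Reject stale requests, then compare signatures in constant time."""
    current = int(time.time()) if now is None else now
    if abs(current - timestamp) > MAX_SKEW_S:
        return False
    expected = sign(secret, body, timestamp)
    return hmac.compare_digest(expected, signature)


if __name__ == "__main__":
    secret = b"demo-secret"  # placeholder; load from a secret store in practice
    body = json.dumps({"job": "wp-optimizer", "sites": 18}).encode()
    ts = int(time.time())
    sig = sign(secret, body, ts)          # trigger side
    print(verify(secret, body, ts, sig))  # execution side
```

Including the timestamp in the signed message means a captured request cannot simply be replayed later, which matters when the endpoint kicks off long, expensive jobs.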

That hybrid is genuinely powerful. But it requires infrastructure that most small teams don’t have. If you don’t have a Cloud Run setup, a service account, and the deployment knowledge to wire this together, the 30-second limit will stop you cold on anything more complex than a lightweight API call or a database update.

Notion doesn’t own email. It connects to Gmail and Outlook. It doesn’t own a calendar — it integrates with yours. It doesn’t have a mobile OS or browser. Those gaps matter less than the sandbox constraint does for real production workloads. The everything app story is real — but the execution layer has hard limits that require a hybrid architecture to work around, at least until Workers matures beyond its current beta constraints.

Who Should Be Paying Attention Right Now

If you’re an agency, a service business, a content operation, or any knowledge-work team that already uses Notion — or has been considering it — the May 13 Developer Platform announcement changes your calculus significantly.

Custom Agents are available as an add-on for Business and Enterprise plans. Workers are free during the current beta period (billing starts August 11, 2026). The External Agents API is open now. This is the window to build before your competitors do.

The teams that spend the next 90 days wiring up their Notion databases, building their first custom agents, and connecting their external data sources will have a compounding advantage that’s very hard to replicate in 2027. The institutional knowledge that feeds these agents — the project histories, the SOPs, the client databases — takes time to build. Starting now is the only strategy that works.

The Bigger Picture: A Series on Who Wins the Everything App

This is the third article in an emerging pattern I’ve been thinking through: who actually builds the everything app, and what does their path look like?

Microsoft is building it through acquisitions and Copilot, stitching together LinkedIn, Azure, and the M365 suite. Google already owns the native stack — Gmail, Drive, Search, Android — and is trying to unify it through Gemini Enterprise and Workspace Studio after years of product fragmentation. Notion is building it from the database up, betting that structured data plus open agents beats document-first platforms with AI bolted on.

None of them has won yet. All three bets are live. The winner won’t be the company with the most features — it’ll be the one that earns enough trust to become the single place where your work actually lives.

Notion’s database-first architecture is the most interesting bet of the three. It’s also the most fragile — dependent on integrations, constrained by not owning the OS or the inbox, limited by whatever Anthropic does with Claude pricing and capabilities. But if it works, it works in a way the others can’t easily copy. You can’t retrofit a database architecture onto a document platform. You have to start over.

Microsoft and Google aren’t starting over. Notion never had to.

Frequently Asked Questions

What are Notion Custom Agents?

Notion Custom Agents are AI teammates that handle repetitive tasks autonomously — answering FAQs, compiling status updates, automating workflows — triggered by schedules, database changes, or webhooks. They launched in February 2026 (Notion 3.3) and are available as an add-on for Business and Enterprise plans. Over 21,000 were built during the free trial period alone.

What is Notion Workers?

Notion Workers is a hosted cloud runtime for custom TypeScript code, introduced in alpha in March 2026 and fully launched with the Developer Platform on May 13, 2026. It powers database sync, agent tools, and webhook triggers — letting teams extend Notion to connect any service with an API, without running their own servers. Workers are free during the beta period through August 10, 2026.

What AI model does Notion use?

Notion runs on Anthropic’s Claude — specifically Claude Opus 4.7 as of the January 2026 update. This is different from Microsoft Copilot (which uses OpenAI’s GPT models) and Google Workspace (which uses the Gemini family). Notion’s choice of Claude reflects an emphasis on reliable, safe agentic behavior for workflows that have write access to business databases.

What is the Notion External Agents API?

The External Agents API, launched with Notion 3.5 on May 13, 2026, lets teams bring any AI agent — including ones built internally or from partners like Claude, Codex, and Decagon — directly into their Notion workspace. These external agents can read and write to Notion databases with full context about the team’s data.

How is Notion different from Microsoft Copilot and Google Workspace AI?

Notion is database-first. Every piece of information in Notion is structured, typed, and queryable data — not documents. This means Notion agents can run precise database queries against your actual organizational data rather than inferring structure from prose documents. For teams that need AI to reliably operate on business data (not just search and summarize), this architectural difference is significant.

What are the real limitations of Notion Workers in the alpha?

Notion Workers runs in a 30-second sandbox with 128MB of memory and ephemeral storage. Network access is limited to an approved domain allowlist. Workers are created via the Notion control panel (3–5 minutes each). Long-running jobs (content pipelines, multi-site operations, image generation) won’t fit. The recommended pattern for serious workloads is Notion Workers as the trigger layer firing a signed POST to an external execution environment (like Google Cloud Run), with results written back to Notion databases via the Public API.

Google Already Has the Everything App. The Question Is Whether They’ll Actually Build It.
May 14, 2026

Microsoft gets credit for the “everything app” conversation because of Copilot’s marketing reach. But Google has quietly assembled something more complete, more native, and arguably more dangerous to every other productivity platform on earth, and most people haven’t connected the dots yet.

Definition: Google’s “Everything Stack”
The convergence of Google Workspace, Agentspace, Workspace Studio, NotebookLM, Google Search, Gmail, Calendar, Drive, Maps, Android, and the Gemini model family into a single AI-unified operating environment, where agents connect your data, automate your work, and surface what matters, without switching apps.

Google Didn’t Need to Acquire Its Way Here

Microsoft’s path to the everything app runs through acquisitions: LinkedIn ($26.2B), GitHub ($7.5B), Activision ($68.7B), and years of stitching Azure, Teams, and Bing into a coherent story. It’s impressive. It’s also fundamentally a construction project — building a unified platform out of parts that weren’t designed to work together.

Google already owns the pieces natively. Gmail. Google Calendar. Google Drive. Google Docs, Sheets, and Slides. Google Search. Google Maps. Android. Chrome. YouTube. These aren’t acquisitions bolted onto a platform — they’re the platform. Over three billion people use Google Workspace tools. That install base isn’t a future bet; it’s the present reality.

The question was never whether Google had the ingredients. The question was whether they’d ever actually bake the cake. In 2026, they finally are.

What Google Just Shipped: The Pieces Coming Together

At Google Cloud Next 2026, Google made moves that deserve more attention than they got.

Workspace Studio launched to all Google Workspace domains on March 19, 2026. It’s the place to create, manage, and share AI agents that automate work across Workspace — no coding required. An end user can describe what they want in plain language (“every Friday, ping me to update my tracker”) and Gemini builds the agent. That’s not a developer feature. That’s a feature for your office manager, your sales coordinator, your operations lead.

Workspace Intelligence is the connective tissue underneath. It’s a secure, dynamic system that understands the semantic relationships between your Docs, Slides, Gmail threads, active projects, collaborators, and your organization’s institutional knowledge — all in real time. Not indexed. Not cached. Live.

Google Agentspace (now absorbed into the unified Gemini Enterprise Agent Platform as of Cloud Next 2026) brings together Gemini’s reasoning, Google-quality search, and enterprise data regardless of where it lives. Agents can connect to Google Drive, NotebookLM, and Google Group Chats and become an expert on a specific topic — delivering daily briefings, status updates, and research synthesis without anyone digging through months of documents.

NotebookLM — Google’s AI research and synthesis tool — is now available as an out-of-the-box agent in Agentspace for enterprise users, with podcast-style audio summaries, enhanced privacy controls, and direct integration into the agent ecosystem. It’s the knowledge layer sitting on top of everything else.

The AI Control Center, announced in May 2026 in the Admin console, gives IT and enterprise organizations visibility and governance over every agent and AI interaction touching Workspace data. For regulated industries, this is the feature that unlocks the whole stack.

The Model Reality: Get This Right Before You Strategize

Any honest conversation about Google’s AI strategy has to be anchored in what the models actually are — because the capabilities are moving fast and the marketing often lags the reality.

As of mid-2026, Google’s current model family looks like this:
- Gemini 3.1 Pro: Released February 19, 2026. The most capable model in the family. Scores 77.1% on ARC-AGI-2. Optimized for complex multi-step agentic workflows. This is the model powering the high-stakes enterprise use cases.
- Gemini 2.5 Pro: The previous flagship, announced at Google I/O 2025. Still widely deployed in Vertex AI for enterprise. Excellent reasoning, very long context window.
- Gemini 2.5 Flash: The speed/cost-efficiency model. Default model in the Gemini app. Generally available in Google AI Studio and Vertex AI. This is what most Workspace automation runs on day-to-day.
- Gemini 2.5 Flash-Lite: The lightest, cheapest tier. For high-volume, low-complexity tasks like classification, routing, and summarization at scale.

The architecture matters for strategy: Gemini 3.1 Pro handles reasoning-heavy agent tasks (complex research, multi-step decisions, agentic workflows), while Flash handles the volume work (daily digests, routine automation, quick lookups). The tiered model family is what makes an everything-app architecture economically viable: you don’t run your email summarizer on your most expensive model.
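
The tiering logic is simple enough to sketch. The version below is a hypothetical router, not anything Google ships: the task categories are invented for illustration, and the tier values are the display names from the list above, not verified API model identifiers.

```python
from enum import Enum


class Tier(Enum):
    # Display names from the list above, not verified API model IDs.
    PRO = "Gemini 3.1 Pro"
    FLASH = "Gemini 2.5 Flash"
    FLASH_LITE = "Gemini 2.5 Flash-Lite"


# Hypothetical task categories mapped to the tiers described above.
ROUTES = {
    "research": Tier.PRO,
    "multi_step_decision": Tier.PRO,
    "daily_digest": Tier.FLASH,
    "quick_lookup": Tier.FLASH,
    "classification": Tier.FLASH_LITE,
    "routing": Tier.FLASH_LITE,
}


def pick_tier(task_kind: str) -> Tier:
    """Route a task to a tier; unknown kinds default to the mid-cost tier."""
    return ROUTES.get(task_kind, Tier.FLASH)


if __name__ == "__main__":
    for kind in ("research", "daily_digest", "classification"):
        print(kind, "->", pick_tier(kind).value)
```

Defaulting unknown work to the middle tier is a design choice in this sketch: it avoids silently sending unclassified volume to the most expensive model.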

What Google’s Everything Page Actually Looks Like Today

Here’s what’s possible right now — not as a concept, but as actual configured Workspace behavior:
- Your Gmail digest: Gemini in Gmail surfaces key threads, drafts replies, and flags action items before you open your inbox
- Your Calendar intelligence: Meeting briefs pulled from your Drive documents, recent email threads with attendees, and relevant Docs, surfaced automatically before each event
- Your Drive knowledge: NotebookLM agents synthesizing your team’s documents, project histories, and institutional knowledge into on-demand briefings
- Your automation outputs: Workspace Studio agents running on schedule, pinging updates, moving data between Sheets and Docs, reporting on triggers
- Your search layer: Google Search and Workspace Intelligence working together to answer business questions against both your internal data and the public web
- Your news and signals: Gemini Enterprise surfacing industry news, competitor moves, and relevant content as part of a unified daily briefing

The difference between this and the Microsoft vision is subtle but important: Google’s version requires almost no new infrastructure for most organizations. If you’re already on Google Workspace (and three billion people are), the agent layer sits on top of what you already use. The friction is configuration, not adoption.

The Tension: Google’s Biggest Competitor Is Google’s Own Fragmentation

Here’s where the opinion part comes in, because the facts alone don’t tell the whole story.

Google has a well-documented history of building extraordinary tools and then failing to unify them. Google+. Google Wave. Inbox by Gmail. Allo. Hangouts. The graveyard of Google products that almost became the everything app is long and sobering. The pattern is consistent: build something brilliant, run it in parallel with five other things, confuse the market, and eventually kill it.

The 2026 rebranding — consolidating Vertex AI and Agentspace into the Gemini Enterprise Agent Platform — is either the sign that Google has finally learned its lesson about fragmentation, or it’s another reorganization that will look different again in 18 months. The cynical read is that Google Cloud Next announcements have promised unification before.

The optimistic read — and I lean toward this one — is that the Gemini model family gives Google something it never had before: a single coherent AI backbone that every product can be rebuilt around. When your search, your email, your documents, your agents, and your developer platform all run on the same model family with the same context and the same API surface, unification becomes an engineering problem rather than a product vision problem. Engineering problems get solved.

The A2A Protocol: The Move Nobody Talked About Enough

One of the quieter announcements at Cloud Next 2026 was the Agent-to-Agent (A2A) protocol — Google’s open standard for allowing AI agents to communicate with each other across platforms and vendors. This is strategically significant in a way that’s easy to miss.

If A2A gains adoption, the everything page doesn’t have to be Google’s proprietary walled garden. Your Workspace agents could communicate with agents from other platforms — your CRM, your project management tool, your industry-specific software. Google becomes the orchestration layer rather than the only layer. That’s a smarter long-term play than trying to own everything, and it sidesteps the antitrust concern that the Microsoft everything-app vision runs into head-on.

What This Means for SMBs and Content Creators Right Now

If you’re a small business running on Google Workspace — and most are — the everything-app infrastructure is closer than you think, and cheaper than you assume.

Workspace Studio is included in Business Standard and above. Gemini in Gmail and Docs is rolling out across plans. NotebookLM Business is available as an add-on. The agent layer is not a future enterprise-only feature — it’s arriving in the same tools you’re already paying for.

The businesses that will win the next three years are the ones that start treating their Google Workspace as an agent platform right now — connecting their data, building their automations, and training their teams to work alongside AI rather than around it.

The everything page isn’t a product launch you wait for. It’s a configuration decision you make today.

Google vs. Microsoft: Who Wins the Everything App Race?

Honest answer: it’s not a race with one winner. The enterprise world will bifurcate along existing tool allegiances. Microsoft 365 shops will get their everything page through Copilot and Agent 365. Google Workspace shops will get theirs through Gemini Enterprise and Workspace Studio. The cold-start problem — who do you trust with all your connected data — will be solved by whoever already has your accounts.

What’s different about Google’s position is the consumer crossover. Microsoft dominates enterprise desktops but has a much thinner consumer footprint in mobile and search. Google lives on both sides: the same Gemini that runs your enterprise agent also runs in your personal Gmail, your Android phone, your Google search bar. The everything page, for Google users, won’t feel like a new product. It’ll feel like the thing you already use, finally doing what you always wished it would.

That’s a powerful distribution advantage. And it’s one Microsoft, for all its enterprise strength, can’t easily replicate.

Frequently Asked Questions

What is Google Workspace Studio?

Google Workspace Studio is Google’s no-code AI agent builder, launched to all Workspace domains on March 19, 2026. It lets any user create, manage, and share AI agents that automate work across Gmail, Docs, Sheets, Drive, and other Workspace apps — without writing code. Users describe what they want in plain language and Gemini builds the agent.

What is Google Agentspace?

Google Agentspace (now unified into the Gemini Enterprise Agent Platform as of Cloud Next 2026) is Google’s enterprise AI agent environment. It combines Gemini’s reasoning, Google-quality search, and enterprise data across Drive, NotebookLM, and Group Chats to give employees AI agents that understand their organization’s specific knowledge.

What is the latest Google Gemini model in 2026?

As of mid-2026, Gemini 3.1 Pro (released February 19, 2026) is Google’s most capable model, scoring 77.1% on ARC-AGI-2 and optimized for complex agentic workflows. Gemini 2.5 Flash is the default model for most consumer and business Workspace use cases, balancing speed and cost efficiency.

What is Google’s A2A protocol?

Agent-to-Agent (A2A) is Google’s open standard for AI agents to communicate across platforms and vendors, announced at Cloud Next 2026. It allows Workspace agents to interoperate with agents from other tools and platforms, positioning Google as an orchestration layer rather than a closed ecosystem.

Do small businesses have access to Google’s AI agent features?

Yes. Workspace Studio and Gemini features are included in Business Standard and higher tiers. NotebookLM Business is available as an add-on. Most of the agent infrastructure is arriving in existing Workspace plans, not as separate enterprise-only products.

May 14, 2026

Microsoft’s Everything App: Is Copilot Building the Unified AI Dashboard Nobody Asked For (But Everyone Needs)?

What if every email, calendar event, LinkedIn notification, health metric, automation log, and business dashboard you care about lived on one page — organized by AI, updated in real time, and actually useful? That’s not a fever dream. It may already be Microsoft’s plan. And if it isn’t, someone needs to build it fast.

Definition: The “Everything App” is a unified AI-powered platform that aggregates professional data, communications, scheduling, automation outputs, and personal metrics into a single intelligent interface, personalized per user and powered by connected APIs.

The Observation That Started This

A few days ago I noticed something odd: LinkedIn posts I was publishing were reformatting into blocks of plain text instead of keeping their intended structure. My own agents couldn’t scrape LinkedIn the way I wanted them to. Anti-AI friction was everywhere on the platform.

Then it hit me: Microsoft owns LinkedIn. Microsoft owns Bing. Microsoft is betting billions on Copilot. What if the formatting weirdness, the scraping blocks, and the structured data changes aren’t bugs? What if they’re features of a beta program for AI information ingestion?

Think about it differently. Imagine a Bing page — or a Copilot interface — that pulls in curated LinkedIn posts, your email threads, your calendar, your business process updates, your health watch data, your cloud automations, and your news feed. All of it, organized the way you think about your day. That’s not a stretch. That might be exactly where this is heading.

Microsoft Is Already Building the Pieces

Let’s be clear about what Microsoft has actually shipped and announced, because the pieces of this puzzle are already on the table.

Microsoft 365 Copilot Wave 3 launched in early 2026 alongside Microsoft 365 E7: The Frontier Suite (generally available May 1, 2026). It combines productivity, identity, Copilot AI, and Agent 365 — a control plane for governing and scaling AI agents across an organization. The Agent 365 dashboard shows connections between agents, people, and data in real time. That’s not a search box. That’s an operational view of your entire professional world.

Microsoft Graph is the connective tissue. It links LinkedIn professional data — profiles, company updates, job changes, content signals — directly into Copilot’s intelligence layer. When enterprise users ask Copilot about industry experts or companies, LinkedIn data feeds the answer. The integration is deeper than most people realize, and it’s been quietly expanding since Microsoft acquired LinkedIn for $26.2 billion in 2016.

Bing web cards in Copilot Chat now deliver rich, expandable information cards for weather, stocks, sports, news, and more. It’s a small feature on paper. But it signals the visual direction: Copilot as a personalized front page, not a search box.

The new Agenda view in Windows — announced at Ignite 2025 — shows a chronological list of upcoming events unified with Calendar, surfaced directly in the Notification Center. Microsoft is literally building a unified daily view into the operating system itself.

Why the Western Super App Never Happened — Until Now

WeChat has over 1.3 billion monthly active users and handles messaging, payments, e-commerce, government services, and mini-programs all in one place. Western companies have been trying and failing to replicate that for a decade.

The reasons for failure are real: U.S. data privacy law, antitrust scrutiny, platform fragmentation, and deeply entrenched single-purpose apps (Slack for chat, Stripe for payments, Google Calendar for scheduling) made the super app strategy a dead end in the West.

But AI changes the calculus. The old super app required you to rebuild every vertical inside one app. The new super app just needs one AI brain that can use everything outside it. You don’t need to own payments — you need Copilot to understand your Stripe data. You don’t need to own scheduling — you need Copilot to read your Google Calendar and act on it.

As one analysis of the U.S. super app window put it: “The old super app was ‘one app with everything inside.’ The next super app might be ‘one AI brain that can use everything outside.’” Between 2025 and 2027, the U.S. enters what some analysts call its Super App window — a convergence of AI interfaces, behavioral compression, and digital sovereignty that’s distinctly Western in character.

Microsoft is the only Western company with the asset stack to pull this off: an OS (Windows), a browser (Edge), a search engine (Bing), a professional network (LinkedIn), a productivity suite (Microsoft 365), a developer platform (GitHub + Azure), and now a unified AI layer (Copilot) stitching it all together.

What the “Everything Page” Actually Looks Like

Here’s the vision, stated plainly:
- Your news — curated by AI based on your industry, interests, and saved searches
- Your LinkedIn feed — surfaced selectively, not chronologically, based on what actually matters to your business goals
- Your email digest — key threads, action items, follow-ups, flagged by AI before you even open your inbox
- Your calendar — not just events, but prep briefs for each meeting pulled from your email, CRM, and LinkedIn history
- Your automation outputs — Cloud Run jobs, Zapier logs, agent reports, anything your background systems are doing
- Your health signals — fitness watch data, sleep scores, recovery metrics — not in a separate app, but contextualizing your day
- Your business metrics — revenue, leads, content performance, wherever your data lives
All of it on one page. All of it updated in real time. All of it organized by an AI that knows what you consider signal versus noise.

That’s not sci-fi. The APIs for all of that exist today. The AI to synthesize it exists today. The missing piece is the will to build the page — and a platform with enough trust and install base to make it stick.

The LinkedIn Angle Nobody Is Talking About

Here’s where my original observation gets more interesting. Microsoft has spent years sitting on one of the richest professional datasets on earth and doing relatively little with it compared to what’s possible. LinkedIn has 1 billion+ members, decades of career graph data, company relationship maps, content engagement signals — and it feeds directly into Microsoft Graph.

Now that Copilot is deeply embedded in enterprise environments, LinkedIn data isn’t just a social feature — it’s a professional intelligence layer. When your Copilot brief for a sales call surfaces that your prospect just changed jobs, posted about a pain point, or follows a competitor — that’s LinkedIn data flowing through Microsoft Graph into your daily workflow.

The scraping friction I noticed? It makes more sense when you consider that Microsoft may be actively working to make LinkedIn data more valuable inside its own ecosystem rather than letting third-party agents extract it freely. They’re not blocking AI — they’re channeling it through Copilot.

The Risk: Nobody Wants One Company Holding All of This

It would be dishonest not to acknowledge the obvious counterargument: this is a massive concentration of data and influence in one company’s hands.

The reason WeChat works in China is partly cultural and partly because the regulatory environment permits it. U.S. antitrust law, GDPR-aligned state privacy rules, and growing public skepticism about big tech data practices all push against a single unified everything app.

Microsoft’s bet is that enterprise trust — built through compliance features, security architecture, and the corporate IT relationship — gives them the permission that consumer platforms like Meta or X never earned. It’s a reasonable bet. It’s also one that regulators will watch closely.

If Microsoft Doesn’t Build It, Someone Will

The technology is not the bottleneck. Any serious developer with access to the right APIs could build a personal everything page today. Connect your Gmail, your LinkedIn (to the extent the API allows), your calendar, your fitness data, your cloud automation logs, and your analytics tools. Build a UI that surfaces what matters. Add an AI layer to summarize and prioritize.
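The build described above reduces to three steps: pull from each source, merge, and surface what matters. The sketch below shows that shape under loud assumptions: the two connectors are hard-coded stand-ins for real API clients, and the `priority` score is a placeholder for whatever an AI layer would assign. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Item:
    source: str      # e.g. "email", "calendar", "fitness"
    title: str
    priority: float  # 0.0 to 1.0; placeholder for an AI-assigned score


# Each connector is a zero-argument callable returning items. These two are
# hard-coded stand-ins for real API clients, which this sketch does not include.
def email_connector() -> list[Item]:
    return [
        Item("email", "Contract redline from client", 0.9),
        Item("email", "Newsletter", 0.1),
    ]


def calendar_connector() -> list[Item]:
    return [Item("calendar", "10:00 sales call", 0.7)]


def build_page(connectors: Iterable[Callable[[], list[Item]]],
               limit: int = 5) -> list[Item]:
    """Pull from every connector, then keep the highest-priority items."""
    items = [item for connector in connectors for item in connector()]
    return sorted(items, key=lambda i: i.priority, reverse=True)[:limit]


page = build_page([email_connector, calendar_connector], limit=2)
for item in page:
    print(f"[{item.source}] {item.title}")
```

Keeping connectors behind a single callable signature means adding a new source is one function, which is the part of the problem that really is easy. The hard parts named in the next paragraph are not in the code at all.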

The bottleneck is distribution, trust, and the cold-start problem — nobody wants to connect all their accounts to something they’ve never heard of. That’s why Microsoft wins this race if they choose to run it. They already have the accounts. They already have the trust relationships. Copilot is already installed in hundreds of millions of enterprise seats.

But if they don’t move fast enough, or if they build it only for enterprise and ignore the small business and creator class — that’s an opening. A focused, privacy-first, SMB-oriented everything page, built on open APIs, with no data lock-in? That’s a product worth building.

What This Means for Your Content and AI Strategy Right Now

Whether or not Microsoft delivers the everything app in the next 18 months, the direction of travel is clear. Professional information is consolidating around AI interfaces. LinkedIn content is increasingly flowing into Copilot’s intelligence layer. Bing-based AI answers are pulling from structured, authoritative content.

For businesses and content creators, that means:
- Your LinkedIn presence is now AI training data. What you post, how you structure it, and what entities you’re associated with affects how Copilot describes you to enterprise users asking about your industry.
- Your website content needs to be AI-readable. Structured data, clear entity signals, authoritative citations — these are no longer optional for AI search visibility.
- Your automation stack is a competitive advantage. The businesses that have already connected their tools via APIs will be first in line when the everything page actually ships.
The everything app isn’t coming. It’s arriving in pieces, quietly, through products you already use. The question is whether you’re positioned when the pieces snap together.
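For the "AI-readable" point, one common form of structured data is a schema.org JSON-LD block embedded in the page. The sketch below generates one; every value in it is a hypothetical placeholder, and it shows only a handful of the properties schema.org defines for `Article`.

```python
import json

# Hypothetical values: swap in your own headline, author, publisher and date.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Example headline",
    "datePublished": "2026-05-14",
    "author": {"@type": "Person", "name": "Example Author"},
    "publisher": {"@type": "Organization", "name": "Example Publisher"},
}

# The serialized string is what goes inside a script tag of type
# application/ld+json in the page head.
json_ld = json.dumps(article, indent=2)
print(json_ld)
```

Naming the author and publisher as typed entities, instead of leaving them as loose text, is what gives a crawler the "clear entity signals" mentioned above.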

Frequently Asked Questions

Is Microsoft building an “everything app” like WeChat?

Microsoft hasn’t announced a single “everything app” product, but the pieces — Copilot, Microsoft Graph, LinkedIn data integration, Agent 365, and Bing web cards — suggest a unified AI-powered dashboard is the strategic direction. Whether it arrives as one product or an ecosystem of connected tools remains to be seen.

Why did Western super apps fail where WeChat succeeded?

U.S. data privacy regulations, antitrust scrutiny, platform fragmentation, and deeply entrenched single-purpose apps all prevented a WeChat-style super app from emerging in the West. AI changes the equation by enabling one system to connect and synthesize data across many separate apps without needing to own them.

How does LinkedIn data connect to Microsoft Copilot?

Microsoft Graph links LinkedIn’s professional data — profiles, company updates, career changes, content signals — directly into Copilot’s intelligence layer. Enterprise Copilot users receive LinkedIn-informed context in sales briefings, meeting prep, and professional research queries.

What is Microsoft 365 E7 and what does it include?

Microsoft 365 E7 (The Frontier Suite, GA May 1, 2026) combines Microsoft 365 E5 for secure productivity, Entra Suite for identity and access, Microsoft 365 Copilot for AI-in-workflow, and Agent 365 as the control plane to govern and scale AI agents across an organization.

What can small businesses do today to prepare for AI-unified platforms?

Connect your tools via APIs now, optimize your LinkedIn presence for AI entity recognition, publish structured authoritative content for AI search visibility, and build automation stacks that produce clean data outputs — these investments compound in value as AI platforms consolidate professional information.

May 14, 2026

Snowflake’s $200M Claude Partnership and India’s Glasswing Gap: Two Enterprise Stories That Matter
Last refreshed: May 15, 2026

Two partnership and policy stories from the Anthropic desk that haven’t been covered here yet, both with meaningful implications for how Claude reaches enterprise users and how governments are thinking about AI security risk.

Part 1: Snowflake’s $200M Partnership — 12,600 Enterprise Customers as Distribution

In December 2025, Anthropic and Snowflake announced a multi-year, $200M partnership making Claude models available to Snowflake’s 12,600+ enterprise customers across all three major clouds. The partnership makes Claude the AI layer inside Snowflake’s data platform for a client base concentrated in financial services, healthcare, and life sciences — the three regulated verticals where Anthropic has been most deliberately building.

The specific products:
- Snowflake Intelligence — powered by Claude Sonnet 4.6, providing conversational data analysis directly within the Snowflake environment
- Snowflake Cortex AI Functions — supporting Claude Opus 4.5 and newer models for structured AI functions across the Snowflake data warehouse
Source: anthropic.com/news/snowflake-anthropic-expanded-partnership

The number that matters most here isn’t $200M — it’s 12,600. That’s the customer count Snowflake brings as a distribution channel. These are enterprise organizations that have already made a procurement decision to standardize on Snowflake for data infrastructure. Embedding Claude inside that infrastructure means Claude becomes the AI system those organizations reach for when they need to query, analyze, or reason about their own data — without requiring a separate AI platform procurement decision.

This is the distribution model that makes enterprise AI market share move: not direct sales to 12,600 enterprises, but a single partnership that makes Claude the default AI layer inside infrastructure those enterprises already use. Snowflake customers in financial services can run Claude-powered compliance analysis on their own Snowflake data. Healthcare organizations can run Claude-powered analysis on patient data that stays within their existing Snowflake security perimeter.

The regulated-industry focus is deliberate. Financial services, healthcare, and life sciences are the verticals where data governance requirements are strictest — and where the ability to run AI on your own data, within your own security perimeter, without moving that data to an external AI service, is the deciding factor in procurement. Snowflake’s existing data residency and compliance infrastructure makes that possible in a way that a direct Anthropic API call often doesn’t.

Part 2: India’s RBI Warning + The Glasswing Gap

In late April 2026, India’s Finance Ministry and Reserve Bank of India convened meetings on cybersecurity preparedness specifically referencing Claude Mythos risk. Finance Minister Nirmala Sitharaman met with bank executives at North Block to advise pre-emptive hardening. The RBI began consulting with global regulators. CERT-In, major telcos, and fintechs ran parallel risk assessments.

Source: Business Standard, April 27, 2026 — business-standard.com

The structural issue underneath the news: Project Glasswing — Anthropic’s defensive cybersecurity consortium that provides early access to Mythos for defensive purposes — named the following founding partners: AWS, Apple, Cisco, CrowdStrike, Google, JPMorgan Chase, Microsoft, and Nvidia. Zero Indian firms. India is Anthropic’s second-largest market globally. Its government is actively warning its financial sector about Mythos risk. And no Indian organization is in the defender consortium that gets early access to the model and the defensive research that goes with it.

This is not a small gap. The Mozilla Firefox result (271 vulnerabilities in a month, including 20-year-old bugs) demonstrated what Mythos can do in a real production codebase. If that capability is available to offensive actors — or if non-partner organizations don’t have the same early visibility into what Mythos can find — organizations outside the Glasswing partner network are in a different risk position than those inside it.

The Tension This Creates

Anthropic’s distribution into India is accelerating. Cognizant deployed Claude across 350,000 employees. Razorpay built its Agent Studio on the Claude Agent SDK and wired UPI rails through Claude as an authorized payment agent with NPCI. Air India, CRED, and Swiggy are named enterprise customers. India is Anthropic’s second-largest market.

Meanwhile: India’s government is warning its financial sector about the offensive potential of Claude Mythos, no Indian firm is in the Glasswing defender consortium, and INR-denominated pricing (with 18% GST) makes the effective Pro subscription cost approximately ₹2,240/month for Indian users — a meaningful friction point for the market Anthropic is describing as its #2 global market.
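The ₹2,240 figure can be unpacked with a quick back-of-envelope calculation. The sketch below assumes the quoted amount is GST-inclusive at a flat 18% and backs out the pre-tax price. The base price itself is not stated here, so treat the result as derived from the two numbers above, not as official pricing.

```python
GST_RATE = 0.18           # rate given in the text
PRICE_WITH_GST = 2240.00  # approximate monthly figure from the text, in INR

# Assumption: the quoted figure is the GST-inclusive price at a flat 18%.
pre_tax = PRICE_WITH_GST / (1 + GST_RATE)
gst_amount = PRICE_WITH_GST - pre_tax

print(f"pre-tax: INR {pre_tax:,.2f}")
print(f"GST:     INR {gst_amount:,.2f}")
```

Under that assumption, roughly ₹1,898 of the monthly charge is the subscription and roughly ₹342 is tax.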

The distribution is running faster than the partnership infrastructure is opening. Either Project Glasswing expands to include Indian financial institutions and cybersecurity organizations, or India builds its own parallel defensive capacity, or the gap becomes a structural political fact in Anthropic’s India relationship.

India’s government isn’t opposed to Claude. It’s actively adopting it across both public and private sector. The RBI/Finance Ministry meetings were framed as hardening preparation, not restriction. But the asymmetry — India as top-2 market, zero Indian firms in the defender consortium — is conspicuous enough that it will eventually require a response.

Frequently Asked Questions

What does the Snowflake-Anthropic partnership include?

A multi-year, $200M agreement announced December 2025, making Claude models available to Snowflake’s 12,600+ enterprise customers. Snowflake Intelligence launched powered by Claude Sonnet 4.6 for conversational data analysis (model at time of partnership announcement; verify current model with Snowflake). Snowflake Cortex AI Functions supports Opus 4.5 and newer models. The focus is regulated industries: financial services, healthcare, and life sciences.

What is Project Glasswing?

Project Glasswing is Anthropic’s invitation-only defensive cybersecurity program that provides early access to Claude Mythos Preview for organizations working to defend critical infrastructure. Named founding partners include AWS, Apple, Cisco, CrowdStrike, Google, JPMorgan Chase, Microsoft, and Nvidia. Access is invitation-only with no self-serve sign-up. No Indian organizations are currently named as Glasswing partners.

Why is India’s government warning about Claude Mythos if India is Anthropic’s second-largest market?

The Indian government’s meetings (RBI, Finance Ministry, CERT-In) were framed as defensive preparation, not restriction. The concern is that Mythos-tier capability could be used offensively against Indian financial infrastructure — a legitimate risk that applies regardless of Anthropic’s commercial relationship with India. The tension is that organizations inside Project Glasswing get early access to defensive research while India’s financial sector, with no Glasswing presence, does not.

May 9, 2026

Cowork Routines and Windows Computer Use: What’s New and How We’re Using Both
Last refreshed: May 15, 2026

Two Cowork capabilities that haven’t been written about here yet, despite being live since late April: Cowork Routines (always-on scheduled tasks that run when your laptop is closed) and Windows computer use (Claude operating your Windows desktop directly from within Cowork). Both shipped in the April 28–30 window alongside the Cowork general availability release. Both materially change what Cowork is.

Cowork Routines: The Laptop Can Be Closed

The original Cowork model required your laptop to be open and the Cowork desktop app to be running. Useful — but bounded by your hardware being available and powered on. Cowork Routines changes that.

Routines are cloud-hosted scheduled tasks that execute on Anthropic’s infrastructure regardless of your local hardware state. They run on a schedule you define. They execute when your laptop is off, sleeping, or in your bag on a plane. The task runs, the output lands where you configured it to land, and when you open the laptop you find the work done.

The practical scope of what runs well as a Routine:
- Daily briefings: Pull sources, synthesize, write to Notion or email — delivered before you open your laptop each morning
- Monitoring tasks: Check a source on a schedule, flag anomalies, log findings
- Content pipeline steps: Recurring publication tasks, social scheduling prep, site audit runs
- Report generation: Weekly status documents assembled from live data sources
- Notification triggers: Watch a condition, fire an action when it’s met
We run our own Claude Newspaper Desk — a daily briefing that checks Anthropic’s news, release notes, GitHub releases, and external coverage, then writes a structured briefing to Notion before we start the day. That’s a Routine. The briefing that generated this article was produced by a Routine running on a schedule, not by someone manually triggering a task.

The architectural decision that makes Routines significant: the task reads its instructions from a Notion desk spec page at runtime, not from a baked-in prompt. Change the Notion spec, change what the Routine does — without touching the scheduled task itself. The shim file that triggers the Routine is thin by design; the intelligence lives in Notion.
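The thin-shim pattern is easy to show in miniature. In the sketch below the Notion page is replaced by an in-memory dictionary so the example runs offline; `fetch_spec`, `run_routine`, and the page id are hypothetical names, and the real fetch over the Notion API is deliberately left out.

```python
from typing import Callable

# Stand-in for a Notion page store. A real shim would fetch the page over the
# Notion API; that call is left out so the sketch stays runnable offline.
SPEC_PAGES = {"newspaper-desk": "Check release notes. Write a briefing."}


def fetch_spec(page_id: str) -> str:
    """Hypothetical fetcher: returns the current text of a desk spec page."""
    return SPEC_PAGES[page_id]


def run_routine(page_id: str, fetch: Callable[[str], str] = fetch_spec) -> str:
    """The shim holds no instructions of its own; it loads them on every run."""
    instructions = fetch(page_id)
    # A real Routine would hand the instructions to the agent at this point.
    return f"RUN: {instructions}"


first = run_routine("newspaper-desk")

# Editing the spec changes the next run without touching run_routine.
SPEC_PAGES["newspaper-desk"] = "Check release notes and GitHub. Write a briefing."
second = run_routine("newspaper-desk")

print(first)
print(second)
```

Because the fetcher is passed in as a parameter, the same shim can be pointed at a test fixture or a different spec store without changing the scheduled task.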

Windows Computer Use: Claude Operates Your Desktop

Computer use in Claude — the ability for Claude to navigate desktop interfaces, click through UI, fill forms, and verify results — was previously available primarily in research preview and on macOS. The April 2026 Cowork release brought computer use to Windows as a generally available capability within the Cowork desktop app.

What this means in practice: Claude can open a native Windows application, navigate its interface, perform a sequence of actions, and hand the result back — without you needing to automate it through code or build an API integration. If there’s a tool that only has a Windows UI and no API, Claude can use the Windows UI directly.

To be honest about the current scope of computer use, it’s good at:
- Navigating well-structured desktop applications with clear UI hierarchies
- Form completion across multiple-step workflows
- Data extraction from desktop tools that don’t export well
- Verification steps that require visual confirmation
It’s slower than direct API integrations when those exist. For tools with APIs, use the API. Computer use is the path when no API exists or when the integration cost exceeds the value of doing it properly.

The combination of Routines + Windows computer use means a scheduled task can now include a step that operates a Windows desktop application — unattended, while your laptop is running in the background. That’s a meaningfully different capability than what Cowork shipped with originally.

How We’re Using Both

Our Cowork architecture as of May 2026:
- Cowork as execution layer — always-on laptop running scheduled tasks
- Notion as control plane — desk specs, task queues, logs, and credential storage
- GCP Cloud Run as action layer — WordPress publishing, API calls, content pipeline steps
- Claude Code Routines as cloud fallback — tasks that need to run independent of local hardware
Routines handle the tasks where continuous availability matters more than local context: briefings, monitoring, scheduled publishing. Cowork handles the tasks where rich local context matters: multi-step sessions with file access, browser navigation, and tools that live on the local machine.

The practical division: if the task needs to run at 3am when the laptop is sleeping, it’s a Routine. If the task needs to interact with local files, a browser session, or a Windows app, it’s Cowork.
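Both rules of thumb in this post, Routine versus Cowork and API versus computer use, fit in a few lines. This is a sketch of how we think about the split, not a feature of either product; `TaskNeeds` and both functions are hypothetical, and the "integration cost exceeds the value" case is left out because it is a judgment call and not a boolean.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskNeeds:
    local_files: bool = False
    browser_session: bool = False
    windows_app: bool = False
    has_api: bool = True  # whether the target tool exposes an API


def choose_runner(needs: TaskNeeds) -> str:
    """Return "cowork" when the task depends on the local machine, else "routine"."""
    if needs.local_files or needs.browser_session or needs.windows_app:
        return "cowork"
    return "routine"


def choose_integration(needs: TaskNeeds) -> str:
    """Prefer the API when one exists; fall back to computer use otherwise."""
    return "api" if needs.has_api else "computer_use"


morning_briefing = TaskNeeds()
legacy_export = TaskNeeds(windows_app=True, has_api=False)

print(choose_runner(morning_briefing), choose_integration(morning_briefing))
print(choose_runner(legacy_export), choose_integration(legacy_export))
```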

The Non-Developer Angle

Neither of these capabilities requires you to be a developer to use. Routines are configured through the Cowork interface with natural language task descriptions and a schedule. Computer use activates through the same conversational interface you’re already using.

The architecture underneath is sophisticated. The interface isn’t. You describe what you want done and when, and the system figures out the implementation. This is the progression that makes these capabilities meaningful for operations teams, executive assistants, knowledge workers, and small business owners — not just engineers building agent pipelines.

Singapore’s Foreign Minister Balakrishnan built his own version of this on a Raspberry Pi. The point isn’t to build your own — it’s that the underlying architecture (persistent memory, scheduled tasks, multi-channel input) is now accessible at multiple layers of sophistication, from DIY open source to fully managed product.

Frequently Asked Questions

What are Cowork Routines?

Cowork Routines are cloud-hosted scheduled tasks that run on Anthropic’s infrastructure regardless of whether your local Cowork laptop is on or available. They execute on a schedule you define (daily, weekly, or at specific times) and suit tasks that do not depend on the local machine: briefings, monitoring, content pipeline steps, report generation, and notification triggers. In our setup, each Routine reads its instructions from a Notion desk spec at runtime.

Does Windows computer use require coding to set up?

No. Computer use in Cowork activates through the standard conversational interface. You describe what you want Claude to do in the application, and Claude navigates the Windows desktop UI directly. No scripting, automation code, or API integration is required — though API integrations are faster when they exist. Computer use is the path for tools with no accessible API.

What’s the difference between Cowork and Cowork Routines?

Cowork runs on your local machine and requires the desktop app to be open and active. Routines run on cloud infrastructure and execute regardless of local hardware state. The practical division: tasks that need to run unattended on a schedule go to Routines; tasks that need local context, file access, or desktop UI interaction go to Cowork. Both read task instructions from Notion desk spec pages at runtime.

Is Cowork available on both Mac and Windows?

Yes. Cowork and computer use are available on both macOS and Windows as of the April 2026 general availability release. The Windows release also established PowerShell as the default shell (previously Git Bash was required), reducing a friction point for enterprise Windows shops.

May 9, 2026
Harvard FAS Replaces ChatGPT Edu With Claude: What the Switch Signals
Last refreshed: May 15, 2026

Harvard’s Faculty of Arts and Sciences will provide Claude access to all affiliates — students, faculty, staff, and researchers — and will discontinue ChatGPT Edu after June 2026. Continuing ChatGPT Edu access will require “administrative and budgetary approval.” Harvard FAS also holds a Google Gemini institutional agreement. The story was reported by The Harvard Crimson on April 28, 2026.

This is the cleanest institutional AI platform switch on record so far. Harvard FAS covers roughly 20,000 affiliates. The administrative approval language around ChatGPT Edu continuation is the detail that tells you this isn’t additive; it’s a replacement.

What Actually Happened

Harvard FAS is not abandoning all AI tools. It’s rotating its primary institutional AI platform from ChatGPT Edu to Claude. The Gemini institutional agreement stays. What’s changing is which AI system gets the default institutional license, the frictionless path, the one that “just works” for every affiliate without requiring a separate approval process.

That framing matters. When an institution of Harvard FAS’s size structures access so that one platform requires administrative approval to continue while another is provided automatically to all affiliates, the default is the decision. The approval requirement for ChatGPT Edu isn’t a ban — it’s a friction tax that most users won’t bother to pay.

Why Institutions Switch AI Platforms

The Harvard Crimson’s reporting framed the switch as “platform rotation based on capability” — not a permanent commitment to any single AI provider. That framing is worth taking seriously. Academic institutions making technology decisions at this scale move deliberately, and the stated rationale (capability) suggests the evaluation was substantive.

The specific capabilities that tend to drive academic platform decisions:
- Long-form document handling: Claude’s 1M token context window (on Opus 4.7 and Sonnet 4.6) is directly useful for academic work — reading full papers, dissertations, and research datasets in a single session
- Research synthesis: Multi-document reasoning across large corpora without chunking
- Writing quality: Academic writing and editing assistance where tone and precision matter
- Institutional trust signals: Claude’s Constitutional AI approach and Anthropic’s safety positioning have become differentiators in institutional procurement conversations

We don’t have Harvard FAS’s internal evaluation criteria. What we know is that after running a ChatGPT Edu institutional agreement, they evaluated their options and chose to route default access to Claude.

What This Signals for Enterprise Platform Switching

Harvard FAS is a useful case study because academic institutions make AI procurement decisions in a way that resembles enterprise decisions more than consumer decisions: budget approval processes, IT security review, institutional liability considerations, and the need for a platform that works across a wildly diverse user base — from first-year undergraduates to Nobel laureates.

The platform switching question — “can our organization move from one AI platform to another?” — has been theoretical for most of the last two years. Harvard FAS running this switch makes it concrete. The institutional machinery for moving 20,000 users from one AI platform to another exists and has been executed.

For enterprise teams evaluating whether to consolidate on Claude or maintain a multi-platform approach: the Harvard FAS switch is evidence that the transition is operationally feasible at institutional scale, and that institutions with high capability and safety requirements are making this choice.

The Competitive Context

Claude now holds institutional agreements at major universities. ChatGPT Edu launched as OpenAI’s play for this exact market. The Harvard FAS switch doesn’t mean OpenAI is losing the education market — it means the competition for institutional default status is real and Claude is winning some of those decisions on capability grounds.

Anthropic’s enterprise market share, cited in its April 2026 Partner Network announcement, had grown from 24% to 40% since the Claude 4 generation launched. Harvard FAS is one data point in that trend.

Our Take

We track institutional AI adoption because it signals where the capability and trust thresholds are in the market. When an institution like Harvard FAS — which has the internal expertise to evaluate these platforms seriously — runs a full procurement process and routes its default institutional license to Claude, that’s a substantive signal about where the models stand.

The “administrative approval required to continue ChatGPT Edu” language is the tell. That’s not a ban. It’s the institutional equivalent of making one option the path of least resistance and the other a deliberate choice. For 20,000 people with actual work to do, the default wins.

Frequently Asked Questions

Did Harvard ban ChatGPT?

No. Harvard FAS is discontinuing its ChatGPT Edu institutional agreement after June 2026. Continuing access will require administrative and budgetary approval — meaning it’s available but no longer the frictionless default. Harvard FAS is also maintaining its Google Gemini institutional agreement. Claude is becoming the new institutional default, not an exclusive platform.

How many people does the Harvard FAS Claude agreement cover?

Harvard FAS covers all affiliates — students, faculty, staff, and researchers within the Faculty of Arts and Sciences. Exact affiliate count varies, but FAS is one of Harvard’s largest schools, covering undergraduate education and most of Harvard’s graduate programs in arts, sciences, and humanities.

Why did Harvard FAS switch from ChatGPT to Claude?

The Harvard Crimson reported the switch was framed as “platform rotation based on capability,” not a permanent commitment to any single provider. The specific evaluation criteria Harvard FAS used haven’t been published. What’s on record is that after running a ChatGPT Edu institutional agreement, FAS evaluated its options and chose to route default access to Claude.

Does Harvard’s decision affect other universities?

Institutional decisions at the Harvard level typically influence procurement conversations at peer institutions — not through imitation but because evaluation committees at other universities use visible peer decisions as data points in their own capability and risk assessments. The Harvard FAS switch makes Claude a more credible institutional option for other universities running similar evaluations.

May 9, 2026
Singapore’s Foreign Minister Built His Own Claude AI Second Brain — And Published the Blueprint
Last refreshed: May 15, 2026

On April 21, 2026, Singapore’s Foreign Minister Dr Vivian Balakrishnan published the architecture of his personal AI assistant on GitHub. He called it NanoClaw — “a second brain for a diplomat.” It runs on a Raspberry Pi 5. It costs roughly $80 in hardware and $5–20 a month in API fees. It connects to his WhatsApp, Gmail, and voice notes. It drafts speeches, runs scheduled briefings, and — unlike every standard chatbot — gets smarter over time because it maintains a structured knowledge graph that persists across sessions.

His summary: “It answers every question, researches topics, provides daily updates, drafts speeches and condenses information. It has become invaluable — I don’t dare switch it off.”

A sitting cabinet minister of a G20-adjacent nation just open-sourced his personal AI second brain on GitHub. That’s worth slowing down to look at.

What NanoClaw Actually Is

Balakrishnan’s setup combines four open-source components running on a Raspberry Pi 5:
- NanoClaw (agent framework, built by developer Gavriel Cohen, 28k+ GitHub stars) — orchestrates Claude agents in isolated Docker containers. Each chat group gets its own sandboxed container.
- Mnemon — the knowledge graph layer. Extracts discrete facts, insights, and style preferences from raw documents and conversations into a structured, retrievable graph database. Each entry is a self-contained statement, not a raw text chunk.
- OneCLI — credential proxy.
- Karpathy’s LLM Wiki pattern — the memory architecture that lets the system synthesize knowledge rather than just retrieve it.

WhatsApp integration runs through Baileys, an open-source implementation of the WhatsApp Web protocol, so no commercial API is required. Voice notes are transcribed locally via Whisper.

The full architecture is published at: gist.github.com/VivianBalakrishnan/a7d4eec3833baee4971a0ee54b08f322

The Architecture Detail That Matters Most

Standard chatbots are stateless. Each session starts from zero. The standard workaround is RAG — retrieval-augmented generation, which pulls chunks of raw text from a document store when they seem relevant. Balakrishnan’s system does something different. Mnemon’s Extract function pulls discrete facts and insights from raw documents into a graph database. Each entry is a self-contained, retrievable statement — not a text chunk.
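To make the chunk-versus-fact distinction concrete, here is a minimal Python sketch. It is not Mnemon's actual schema or API: the `Fact` and `FactGraph` classes, the `extract` function, and the sentence-splitting rule are all hypothetical stand-ins, and a real system would use a model rather than string splitting to do the extraction.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Fact:
    """One self-contained statement, as opposed to a raw text chunk."""
    subject: str
    statement: str
    source: str


@dataclass
class FactGraph:
    """Toy in-memory store keyed by subject (hypothetical, not Mnemon's schema)."""
    by_subject: dict[str, list[Fact]] = field(default_factory=dict)

    def add(self, fact: Fact) -> None:
        facts = self.by_subject.setdefault(fact.subject, [])
        if fact not in facts:  # skip exact duplicates so repeated ingests add nothing
            facts.append(fact)

    def about(self, subject: str) -> list[str]:
        return [f.statement for f in self.by_subject.get(subject, [])]


def extract(document: str, subject: str, source: str) -> list[Fact]:
    """Stand-in for model-driven extraction: one fact per non-empty sentence."""
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    return [Fact(subject, s + ".", source) for s in sentences]


graph = FactGraph()
note = "The summit moved to Thursday. The venue is unchanged."
for _ in range(2):  # ingest the same note twice; the second pass is a no-op
    for fact in extract(note, subject="summit", source="voice-note-001"):
        graph.add(fact)

print(graph.about("summit"))
```

Each stored entry can be read on its own, which is the property the article attributes to the graph approach. A chunk store would instead return the whole note, or a slice of it, and leave the separation of facts to query time.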

This is the same distinction that Anthropic’s Dreaming feature (announced May 6 for Managed Agents) is built on: the difference between storing raw experience and synthesizing it into structured knowledge. A system that synthesizes what it learns compounds in usefulness over time. One that just accumulates raw text doesn’t.

Balakrishnan acknowledged this in a reply on his GitHub gist: “Local models will not give you the big context needed for digesting the memory graph, but will be good enough for querying it. You may want to use a bigger model that works well with a 128K token context at the very least.” He chose Claude specifically for the reasoning capability on the memory graph.

He Built It With Claude Code, Not Traditional Coding

This detail matters. Balakrishnan confirmed on X that he never used an IDE. Claude Code made all edits. His description of his own process: “No ‘vibe coding’. All I did was ‘tool assembly’ to create a utility that worked in my domain.”

Tool assembly. That’s an important distinction. He didn’t write code — he assembled existing open-source tools using Claude as the implementation layer. A trained ophthalmologist and career diplomat, with no traditional software development background, built and deployed a production AI system running on commodity hardware by composing tools through Claude Code.

His framing at the 17th Asia-Pacific Programme for Senior National Security Officers, the day he published NanoClaw: “AI agents have crossed a threshold I did not expect so soon. Not just impressive demos — but practical tools for daily use.” The audience was senior national security officials from across the Asia-Pacific region.

Why This Is the Cowork Story in Miniature

We run our own version of this — Claude operating scheduled tasks, content pipelines, and research workflows on our behalf through Cowork. The architecture Balakrishnan published is recognizably the same value proposition: persistent memory, multi-channel input, scheduled tasks, a system that improves over time.

His total cost: ~$80 hardware, $5–20/month API. That’s a DIY Cowork running on a credit-card-sized computer on a diplomat’s desk in Singapore. The point isn’t that the price is better or worse than any specific product — it’s that the primitives are now accessible enough that a non-developer can assemble them into a working production system.
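For a rough sense of scale, the quoted figures can be turned into a first-year total. This is a sketch that uses only the approximate numbers above; the function name is illustrative, and it ignores electricity, storage, and any other running costs.

```python
HARDWARE_USD = 80            # approximate Raspberry Pi 5 hardware cost quoted above
API_USD_PER_MONTH = (5, 20)  # low and high monthly API estimates quoted above


def first_year_cost(hardware: int, monthly: tuple[int, int]) -> tuple[int, int]:
    """Return the (low, high) total for hardware plus twelve months of API fees."""
    low, high = monthly
    return hardware + 12 * low, hardware + 12 * high


print(first_year_cost(HARDWARE_USD, API_USD_PER_MONTH))  # (140, 320)
```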

His own thesis on why he published it: “Sharing the blueprint boosts the edge — the specific composition will be obsolete in months, but the builder’s ability to compose the right pieces is the durable advantage.” That’s as clean a statement of the AI-literacy case as we’ve seen from anyone, let alone a sitting foreign minister.

The Broader Signal

Singapore continues to be the most Claude-dense environment we track. The same week Balakrishnan published NanoClaw, a Claude Code meetup at Grab HQ drew 1,291 registrants. GIC (Singapore’s sovereign wealth fund) is a co-investor in Anthropic’s infrastructure JV. The country has institutional capital, developer community density, and now a sitting cabinet minister publishing working Claude architecture on GitHub. That triangle is unusual.

Balakrishnan’s quote from the CNBC Converge Live fireside the day after publishing NanoClaw: “The diplomat who learns to work with AI will have a meaningful edge. I think that edge is now.” He wasn’t talking about chatbots. He was talking about a system running on his desk, integrated into his actual workflows, that he personally built and that he personally depends on.

That’s a different kind of AI adoption signal than a press release about an enterprise partnership.

Frequently Asked Questions

What is NanoClaw?

NanoClaw is an open-source Claude-powered personal AI assistant framework built by developer Gavriel Cohen. Singapore’s Foreign Minister Dr Vivian Balakrishnan published his own NanoClaw implementation on April 21, 2026 — a self-hosted assistant running on a Raspberry Pi 5 that connects to WhatsApp, Gmail, and voice notes, runs scheduled tasks, and maintains a persistent knowledge graph that grows smarter over time.

How much does NanoClaw cost to run?

Balakrishnan’s setup uses approximately $80 in hardware (Raspberry Pi 5) and roughly $5–20 per month in Anthropic API fees depending on usage volume. The software components (NanoClaw, Mnemon, OneCLI, Whisper, Baileys) are all open source. The full architecture is published at gist.github.com/VivianBalakrishnan/a7d4eec3833baee4971a0ee54b08f322.

Did Vivian Balakrishnan write the code himself?

He described his process as “tool assembly” rather than traditional coding — composing existing open-source components using Claude Code to handle implementation. He confirmed on X that he never used an IDE and that Claude Code made all edits. He has no traditional software development background; he’s a trained ophthalmologist and career diplomat.

How is NanoClaw’s memory different from standard chatbot memory?

Standard chatbots are stateless — each session starts from zero. NanoClaw uses Mnemon, a knowledge graph that extracts discrete facts and insights from conversations and documents into structured, retrievable entries. The system synthesizes knowledge rather than just storing raw text, meaning it compounds in usefulness over time rather than simply accumulating history.

May 9, 2026
Code with Claude London (May 19) and Tokyo (June 10): What to Know and Watch For
Last refreshed: May 15, 2026

Anthropic’s Code with Claude conference went global this spring. After the San Francisco event on May 6, London is next on May 19 — followed by Tokyo on June 10. Both are free to attend in person (applications closed; selected by lottery in April) or via livestream from anywhere in the world. If you’re a developer building on Claude and didn’t get an in-person seat, the livestream is worth blocking time for. Here’s what we know about both events and why the Tokyo date in particular is worth paying attention to.

Quick Reference
- London: May 19, 2026 (Extended day May 20) — claude.com/code-with-claude/london
- Tokyo: June 10, 2026 (Extended day June 11) — claude.com/code-with-claude/tokyo
- Livestream: Free, all three cities, registration open at claude.com/code-with-claude
- Recordings: Published to Anthropic’s YouTube channel within 7–10 days after each event

What Code with Claude Is

Code with Claude is Anthropic’s annual developer conference — a full day of hands-on technical workshops, live capability demos, and 1:1 office hours with the engineers who build Claude. It’s structured specifically for developers and founders who are building with the API, not for people who want marketing keynotes. The SF event on May 6 featured three parallel tracks: Research (direct access to Anthropic researchers on current and future model capabilities), Claude Platform (production agent deployment on Anthropic infrastructure), and Claude Code (running Claude Code at scale — long-horizon tasks, multi-repo work, parallel agents).

Confirmed speakers across the series: Ami Vora (CPO at Anthropic), Boris Cherny (Head of Claude Code), and Angela Jiang (Product Lead for the Claude API and SDKs). Partner presentations from GitHub, Vercel, and Datadog were part of the SF agenda and are likely to carry into London and Tokyo.

The Extended day format — May 20 for London, June 11 for Tokyo — is a separate event focused on independent developers and early-stage founders: builder deep-dives, laptops-open workshops from Anthropic’s Applied AI team.

What Came Out of San Francisco (May 6)

London and Tokyo attendees will be walking in with context from what Anthropic announced in SF. The major developments from May 6:
- Managed Agents public beta: Multiagent Orchestration and Outcomes moved to public beta. Multiple SF sessions were dedicated to Managed Agents, including “Get to Production 10x Faster with Claude Managed Agents” and a hands-on “Build a Production-Ready Agent” workshop.
- Dreaming (developer preview): Agents that review and reorganize their own session history between runs. Harvey (legal AI) reported roughly a 6× task completion rate increase after implementing it.
- SpaceX compute expansion: Doubled rate limits for Pro, Max, Team, and Enterprise; 1,500% input token increase and 900% output token increase for Tier 1 API customers; peak-hours throttling eliminated for Pro and Max.
- Claude Code v2.1.133: Subagent skill discovery fix (was silently broken), worktree base ref control, effort-level hooks.

London and Tokyo events will likely build on these, demonstrating Managed Agents and Claude Code in production contexts with the partner companies that attended SF.

London — May 19, 2026

London is Anthropic’s first Code with Claude event in Europe. The practical significance: for developers building in European markets, this is the first opportunity to engage directly with Anthropic’s engineering team rather than attending via livestream from across the Atlantic.

For teams working in regulated European industries — financial services, healthcare, legal — the Claude Platform and Research tracks are the most relevant. Anthropic’s Finance Agents suite (Moody’s integration, financial analysis and compliance tooling) and Claude Security Beta are recent launches that will likely feature in the sessions, given the financial services concentration in London.

The London timezone (BST, UTC+1) makes the livestream accessible for much of Europe, Africa, and Middle East without the early-morning constraint that the SF event imposed. Register at claude.com/code-with-claude/london.

What to Watch For at London
- Enterprise deployment patterns — London’s enterprise tech community is distinct from SF’s startup-heavy audience
- EU AI Act compliance framing — Anthropic’s approach to regulated market deployment
- MCP ecosystem sessions — the Model Context Protocol is increasingly central to how Claude connects to enterprise data sources
- Any Claude Code enterprise adoption data — the JetBrains 2026 developer survey showed significant Claude Code growth year-over-year; London sessions may provide more context
Tokyo — June 10, 2026

The Tokyo date is the strategically interesting one. Anthropic chose Japan as its first Asia-Pacific Code with Claude location at a moment when it has already made several Japan-specific moves: the NEC enterprise partnership (April 2026) and active engagement with Japan’s developer community. This is Anthropic positioning before competitors have fully embedded in the Japanese enterprise AI market.

Japan’s enterprise AI adoption pattern is different from the US. Large enterprises dominate, procurement cycles are longer, and partnerships with established technology companies (like NEC) carry more weight than direct developer adoption alone. Tokyo’s Code with Claude is as much about signaling enterprise commitment as it is about developer community building.

The Tokyo event is also relevant to Southeast Asia broadly — developers across the Asia-Pacific region can attend via livestream at a timezone that doesn’t require a middle-of-the-night session.

What to Watch For at Tokyo
- NEC partnership details — the most concrete Japan enterprise deployment announced so far
- Asia-Pacific pricing or access updates: Anthropic prices in USD, and in markets like India and Japan the currency conversion plus local taxes create meaningful access barriers
- Localization and multilingual Claude capability demos — Claude’s multilingual support is strong on paper; Tokyo is where it gets demonstrated to an audience that can evaluate it critically
- Any announcement of a dedicated Japan or APAC infrastructure presence

How to Attend Remotely

Both events are fully livestreamed at no cost. The livestream covers all three conference tracks. Recordings are published to Anthropic’s YouTube channel (the “Code w/ Claude Developer Conference” playlist) within 7–10 days of each event. If you’re watching recorded sessions rather than live, the Claude Code track tends to have the highest density of immediately applicable technical content.

For the London event: sessions run BST (UTC+1). For Tokyo: JST (UTC+9). Anthropic hasn’t published detailed schedules for London or Tokyo publicly yet — check claude.com/code-with-claude for updates as each event approaches.
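If you are planning around the livestream, a small conversion helper can translate event-local times into your own offset. The sketch below uses the fixed UTC offsets stated above; the 09:00 start times are hypothetical placeholders, since the detailed schedules had not been published, and the viewer offsets are examples only.

```python
from datetime import datetime, timedelta, timezone

BST = timezone(timedelta(hours=1), "BST")  # London in May (UTC+1), as stated above
JST = timezone(timedelta(hours=9), "JST")  # Tokyo (UTC+9), as stated above


def to_viewer_time(event_local: datetime, viewer_utc_offset_hours: float) -> datetime:
    """Convert a timezone-aware event timestamp to a viewer's fixed UTC offset."""
    viewer_tz = timezone(timedelta(hours=viewer_utc_offset_hours))
    return event_local.astimezone(viewer_tz)


# Hypothetical 09:00 starts; replace with the published session times.
london_start = datetime(2026, 5, 19, 9, 0, tzinfo=BST)
tokyo_start = datetime(2026, 6, 10, 9, 0, tzinfo=JST)

# A viewer in Singapore (UTC+8) and one in New York on daylight time (UTC-4).
print(to_viewer_time(london_start, 8).strftime("%Y-%m-%d %H:%M"))   # 2026-05-19 16:00
print(to_viewer_time(tokyo_start, -4).strftime("%Y-%m-%d %H:%M"))   # 2026-06-09 20:00
```

Fixed offsets keep the example dependency-free. For recurring use, a named-zone database handles daylight saving changes that fixed offsets do not.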

Our Take

We watched the SF event closely and tracked what came out of it. The Managed Agents announcements were the most developer-relevant; the SpaceX rate limit news was the most immediately practical for anyone hitting API ceilings. Both London and Tokyo will be building on that foundation with an audience that has had two more weeks to actually use what Anthropic shipped in SF.

The office hours format is underrated. Getting 30 minutes with Boris Cherny’s team on a specific Claude Code workflow problem is worth more than three conference talks. If you’re attending in person or have specific implementation questions, that’s the format to prioritize.

For us, Tokyo is the event to watch for signals about where Anthropic’s international enterprise push is actually headed. The NEC partnership gave them a credible anchor. Code with Claude Tokyo is where they build on it.

Frequently Asked Questions

Is Code with Claude London free to attend?

Yes. Both in-person attendance and virtual livestream are free. In-person applications closed in April with selection by lottery. Livestream registration remains open at claude.com/code-with-claude/london.

Will Code with Claude Tokyo sessions be recorded?

Yes. All sessions from all three cities are published to Anthropic’s YouTube channel within approximately 7–10 days of each event. The “Code w/ Claude Developer Conference” playlist on Anthropic’s YouTube channel is the official home for recordings.

What tracks are available at London and Tokyo?

Based on the SF event structure, three parallel tracks: Research (model capabilities and direction), Claude Platform (production agent deployment), and Claude Code (scaling Claude Code in real engineering workflows). Specific session details for London and Tokyo haven’t been fully published; check claude.com/code-with-claude for the agenda as each event approaches.

What is the Extended day format?

The Extended day (May 20 for London, June 11 for Tokyo) is a separate event focused specifically on independent developers and early-stage founders — builder stories, hands-on workshops from Anthropic’s Applied AI team, and a more informal format than the main conference day.

Is Code with Claude relevant if I’m not using Claude Code specifically?

Yes. The Claude Platform track covers Managed Agents, MCP integrations, and production deployment patterns that apply to any team using the Claude API — not just Claude Code users. The Research track covers model capabilities and roadmap direction relevant to anyone building on Claude.

May 9, 2026
How Mozilla Used Claude Mythos to Find 271 Firefox Vulnerabilities — Including a 20-Year-Old Bug
Last refreshed: May 15, 2026

On May 7, 2026, Mozilla’s engineering team published the technical account of what happened when they ran Claude Mythos Preview against the Firefox codebase. The headline numbers — 271 vulnerabilities found, 423 total security bugs fixed in April — had already circulated. What the Mozilla Hacks post added was the methodology: how they actually built the pipeline, what Mythos found that human reviewers and fuzzers had missed for decades, and a candid account of what AI-assisted security research looks like in production.

This is that story, with the details that matter.

Source

All technical details in this article are sourced from Mozilla’s own engineering post: Behind the Scenes Hardening Firefox with Claude Mythos Preview, published May 7, 2026, by Mozilla engineers Brian Grinstead, Christian Holler, and Frederik Braun.

The Numbers in Context

Mozilla’s security team was fixing roughly 20 to 30 security bugs in Firefox per month throughout 2025. That number jumped to 423 in April 2026, roughly 14 to 21 times the earlier monthly rate. Of those 423 total fixes, 271 were attributed to Claude Mythos Preview. The remaining bugs came from external reports (41), other internal pipeline work using different models, and traditional fuzzing.

The 271 Mythos-found bugs broke down by severity as follows, from the Mozilla advisory:
- 180 rated sec-high — vulnerabilities triggerable with normal user behavior, like visiting a web page
- 80 rated sec-moderate — would be sec-high except they require unusual steps from the victim
- 11 rated sec-low — annoying but low harm risk (safe crashes, etc.)

Mozilla also directly credited 3 separate CVEs to Anthropic’s Frontier Red Team (CVE-2026-6746, CVE-2026-6757, CVE-2026-6758). Anthropic had submitted these bugs to Mozilla a couple of months earlier, before the harness work began.
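The counts quoted in this section can be cross-checked with a few lines of arithmetic. This sketch uses only the figures given above; the variable names are illustrative.

```python
severity_counts = {"sec-high": 180, "sec-moderate": 80, "sec-low": 11}
mythos_total = sum(severity_counts.values())

april_fixes = 423
baseline_low, baseline_high = 20, 30  # approximate monthly fixes through 2025

# Ratio of April fixes to the earlier monthly range.
fold_min = april_fixes / baseline_high
fold_max = april_fixes / baseline_low

# Share of April fixes attributed to Mythos.
mythos_share = mythos_total / april_fixes

print(mythos_total)                        # 271
print(int(fold_min), int(fold_max))        # 14 21
print(round(mythos_share * 100))           # 64
```

The severity breakdown sums to the 271 total, and Mythos accounts for a little under two thirds of the April fixes.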

What Claude Mythos Found That Everything Else Missed

The most striking finding from Mozilla’s report isn’t the volume — it’s the age and complexity of what Mythos surfaced. Mozilla published a sample of the bug reports. Two entries stand out:

A 20-Year-Old XSLT Bug (Bug 2025977)

Mythos identified a bug in Firefox’s XSLT implementation where reentrant key() calls cause a hash table rehash that frees its backing store while a raw entry pointer is still in use. The bug had been sitting in the codebase for 20 years, undetected by fuzzing and manual review. Mozilla noted this was one of several sec-high issues involving XSLT they fixed in the same release.

A 15-Year-Old HTML Legend Element Bug (Bug 2024437)

Mythos triggered a bug in the <legend> element by orchestrating edge cases across distant parts of the browser — including recursion stack depth limits, expando properties, and cycle collection. The bug had existed for 15 years. Mozilla’s description of the finding: “meticulous orchestration of edge cases across distant parts of the browser.” This is the kind of bug that requires reasoning about how subsystems interact at a systems level — not pattern-matching on known vulnerability types.

Sandbox Escape Bugs That Human Reviewers Had Missed

Several of the 271 bugs were sandbox escapes — vulnerabilities that, when chained with other exploits, could allow an attacker to break out of Firefox’s sandboxed content process into the privileged parent process. Mozilla noted these are “notoriously difficult to find with fuzzing.” Mythos found multiple. It also attempted prototype pollution attacks on hardened subsystems — and found nothing exploitable there, confirming that Mozilla’s earlier architectural changes had worked.

How the Agentic Harness Actually Works

Mozilla’s engineers are explicit about the mechanism that changed everything: it’s not the model alone. It’s the combination of a capable model with an agentic harness that can generate and run reproducible test cases.

Earlier attempts at AI-assisted security review using GPT-4 and Claude 3.5 Sonnet produced too many false positives to be practical. The shift came when the harness could do something the earlier systems couldn’t: create a test case, run it, observe the result, and confirm whether the hypothesized bug was real before reporting it. Static analysis produces noise. An agent that can execute code to verify its findings produces signal.
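The verify-before-report loop can be sketched in a few lines of Python. This is not Mozilla's harness: the `Hypothesis` class, the function names, and the rule that a non-zero exit code counts as a reproduced failure are all assumptions made for illustration. A real browser harness would run a build under sanitizers and inspect the crash report instead.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Hypothesis:
    description: str
    testcase_source: str  # code written to demonstrate the suspected bug


def reproduces(h: Hypothesis, timeout_s: float = 10.0) -> bool:
    """Run the test case in a child process; non-zero exit means it reproduced."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "testcase.py"
        path.write_text(h.testcase_source)
        try:
            result = subprocess.run(
                [sys.executable, str(path)], capture_output=True, timeout=timeout_s
            )
        except subprocess.TimeoutExpired:
            return False  # this sketch treats hangs as unconfirmed
        return result.returncode != 0


def confirmed_findings(hypotheses: list[Hypothesis]) -> list[Hypothesis]:
    """Keep only hypotheses whose test case actually fails when executed."""
    return [h for h in hypotheses if reproduces(h)]


candidates = [
    Hypothesis("crash on empty input", "int('')"),          # raises, so it reproduces
    Hypothesis("suspected arithmetic bug", "assert 2 + 2 == 4"),  # passes: false alarm
]
for finding in confirmed_findings(candidates):
    print(finding.description)  # crash on empty input
```

The filter is the point: a hypothesis that cannot be demonstrated by running code never reaches a human reviewer.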

The pipeline Mozilla built, in their own description:
1. Parallelized jobs run across multiple ephemeral VMs, each tasked with hunting bugs in a specific target file
2. Findings are written back to a central bucket
3. A discovery subsystem deduplicates against known issues, tracks bugs, triages them, classifies by severity, and manages patches through the release process
4. Over 100 engineers contributed code to get patches out the door

Mozilla started this pipeline with Claude Opus 4.6 on sandbox escape hunting. When Mythos became available, they swapped it in. Their assessment of the upgrade: “model upgrades increase the effectiveness of the entire pipeline: the system gets simultaneously better at finding potential bugs, creating proof-of-concept test cases to demonstrate them, and articulating their pathology and impact.”
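The first three pipeline steps can be sketched as a fan-out, collect, and deduplicate flow. This is a minimal illustration under stated assumptions, not Mozilla's code: threads stand in for ephemeral VMs, a list stands in for the central bucket, the `hunt` function returns canned results, and the file names and signatures are invented. Patch management (step 4) is not modeled.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

SEVERITY_ORDER = ["sec-high", "sec-moderate", "sec-low"]


@dataclass(frozen=True)
class Finding:
    target_file: str
    signature: str  # hypothetical stable identifier used for deduplication
    severity: str


def hunt(target_file: str) -> list[Finding]:
    """Placeholder for one job hunting bugs in one target file."""
    canned = {
        "module_a.cpp": [Finding("module_a.cpp", "uaf-rehash", "sec-high")],
        "module_b.cpp": [
            Finding("module_b.cpp", "uaf-rehash", "sec-high"),  # same bug, second path
            Finding("module_b.cpp", "safe-crash", "sec-low"),
        ],
        "module_c.cpp": [Finding("module_c.cpp", "known-leak", "sec-moderate")],
    }
    return canned.get(target_file, [])


def run_pipeline(targets: list[str], known: set[str]) -> list[Finding]:
    # Steps 1 and 2: run jobs in parallel and collect findings in one place.
    with ThreadPoolExecutor(max_workers=4) as pool:
        bucket = [f for batch in pool.map(hunt, targets) for f in batch]
    # Step 3: drop known or repeated signatures, then order by severity.
    seen = set(known)
    unique = []
    for f in bucket:
        if f.signature not in seen:
            seen.add(f.signature)
            unique.append(f)
    return sorted(unique, key=lambda f: SEVERITY_ORDER.index(f.severity))


targets = ["module_a.cpp", "module_b.cpp", "module_c.cpp"]
report = run_pipeline(targets, known={"known-leak"})
print([(f.target_file, f.signature, f.severity) for f in report])
```

Four raw findings go in and two come out: one is a duplicate of another job's result and one matches an already tracked issue.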

What Mythos Couldn’t Break

Mozilla’s engineers made a point of documenting what Mythos tried and failed to do. Specifically: it repeatedly attempted prototype pollution attacks — a class of sandbox escape that human researchers had used successfully in the past — and was blocked by architectural changes Mozilla had made. The hardened subsystems held.

Mozilla’s take on this: “Observing such direct payoff from previous hardening work was even more rewarding than finding and fixing more bugs.” This is actually the more important message for security teams: defensive architecture works, and AI analysis now provides the empirical test of whether it does.

What This Means for the Software Security Ecosystem

Mozilla’s engineers closed their post with a direct recommendation: anyone building software can start using an agentic harness with a modern model today. Their advice on approach is practical — start with simple prompting, observe what the model produces, iterate. The inner loop they describe is: “there is a bug in this part of the code, please find it and build a testcase.”

The implications are real for any organization that maintains a codebase:
- The asymmetry is reversing. For years, offensive AI (cheap to prompt, cheap to deploy) had the advantage over defensive security (slow, expensive human review). An agentic harness that can verify its own findings changes that balance. Mozilla’s engineers describe the current moment as one where “defenders finally have a chance to win, decisively.”
- Old code is newly exposed. Bugs that survived 15 and 20 years in a heavily reviewed browser like Firefox suggest that large, mature codebases contain latent vulnerabilities that fuzzing and human review have consistently missed. If that’s true of Firefox, it’s true of most production software.
- The pipeline is the work. Mozilla’s engineers are clear that the model is a component, not the product. Building the triage, deduplication, patch management, and release integration around the model is what made this work at scale. The pipeline required significant iteration and tight feedback loops with the engineers who were fielding the bugs.

Claude Mythos Preview: Access and Context

Claude Mythos Preview is not a generally available model. It’s offered through Project Glasswing as an invitation-only research preview for defensive cybersecurity workflows, specifically for organizations working on critical infrastructure. Pricing from Anthropic’s docs: $25 input / $125 output per million tokens. Mozilla’s access was part of this program.
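
To make the quoted pricing concrete, here is a rough cost calculation. The per-million-token rates are the ones stated above; the token counts in the example are hypothetical and chosen only to illustrate the arithmetic.

```python
# Rates quoted above, in US dollars per million tokens.
INPUT_USD_PER_MTOK = 25.0
OUTPUT_USD_PER_MTOK = 125.0


def run_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of one run at the quoted rates."""
    return (
        input_tokens / 1_000_000 * INPUT_USD_PER_MTOK
        + output_tokens / 1_000_000 * OUTPUT_USD_PER_MTOK
    )


# Hypothetical agent run: 2M tokens read, 100K tokens written.
example = run_cost_usd(input_tokens=2_000_000, output_tokens=100_000)
print(f"${example:.2f}")  # $62.50
```

Output tokens cost five times as much as input tokens at these rates, so a harness that reads a lot of code but writes short reports is dominated by the input side of the bill.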

The generally available Claude models as of May 2026 (verified from Anthropic’s official documentation):
- Claude Opus 4.7 (claude-opus-4-7) — flagship, 1M context window
- Claude Sonnet 4.6 (claude-sonnet-4-6) — balanced speed/intelligence, 1M context window
- Claude Haiku 4.5 (claude-haiku-4-5-20251001) — fastest, 200K context window

Mozilla’s earlier pipeline work used Claude Opus 4.6 before Mythos was available and still found significant vulnerabilities. The pipeline architecture is available to any team; Mythos-tier capability is not.

Our Take

We’ve been tracking the Mythos story since the Project Glasswing announcement in April. The Mozilla post is the first time a production engineering team has published the full technical account of what AI-assisted security research looks like from the inside — not benchmarks, not Anthropic’s own claims, but Mozilla’s own engineers describing what they built, what it found, and what it couldn’t crack.

The 20-year-old XSLT bug is the one that cuts through the noise. Firefox is one of the most security-reviewed browser codebases in existence. Thousands of professional security researchers, internal teams, and academic researchers have looked at this code. An AI model running in an agentic harness found a two-decade-old bug with a reproducible test case in what Mozilla described as a pipeline that “required significant iteration.” That’s not a benchmark number — it’s a deployed result from a production security team.

The question for any organization that ships software is no longer whether this class of tooling will become standard. It’s how fast that happens, and whether your team is ahead of the curve or behind it when it does.

Frequently Asked Questions

What is Claude Mythos Preview?

Claude Mythos Preview is Anthropic’s most capable AI model, offered exclusively through Project Glasswing as an invitation-only research preview for defensive cybersecurity workflows. It’s not publicly available. Pricing is $25 per million input tokens and $125 per million output tokens. Mozilla, along with other critical infrastructure partners, received access as part of this program.

How many Firefox vulnerabilities did Claude Mythos find?

Claude Mythos Preview found 271 security vulnerabilities in Firefox that were fixed in Firefox 150 (April 21, 2026) and subsequent point releases. Of those, 180 were rated sec-high, 80 sec-moderate, and 11 sec-low. In total, 423 security bugs were fixed across April 2026, a figure that also includes externally reported bugs and bugs found by other internal methods.

What is the agentic harness Mozilla built?

Mozilla built a custom pipeline on top of their existing fuzzing infrastructure. It runs model-powered agents in parallel across ephemeral VMs, each tasked with finding bugs in a specific file or subsystem. Agents generate reproducible proof-of-concept test cases to verify bugs before reporting them — eliminating the false positive problem that made earlier AI security review impractical. Findings are piped into a deduplication and triage system integrated with Mozilla’s normal patch management and release process.
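
The shape of such a pipeline can be sketched in a few lines. This is an illustrative outline, not Mozilla’s code: `propose_bug`, `reproduces`, and the `Finding` fields are hypothetical stand-ins for a model call, a sandboxed test run, and a bug record, and a thread pool stands in for the ephemeral VMs. What it shows is the ordering: agents run in parallel per target, a finding is kept only if its test case reproduces, and duplicates are dropped before triage.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Finding:
    target: str      # file or subsystem the agent was pointed at
    signature: str   # stable key used for deduplication (hypothetical)
    testcase: str    # proof-of-concept input that should trigger the bug


def run_pipeline(
    targets: Iterable[str],
    propose_bug: Callable[[str], Optional[Finding]],
    reproduces: Callable[[Finding], bool],
    known_signatures: set[str],
    max_workers: int = 4,
) -> list[Finding]:
    """Run one agent per target, keep verified findings, drop duplicates."""
    # Each target gets its own agent call; results come back in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        candidates = list(pool.map(propose_bug, targets))

    seen = set(known_signatures)
    verified: list[Finding] = []
    for finding in candidates:
        if finding is None:
            continue  # the agent found nothing in this target
        if finding.signature in seen:
            continue  # already known, or already reported in this batch
        if not reproduces(finding):
            continue  # unverified claims never reach triage
        seen.add(finding.signature)
        verified.append(finding)
    return verified
```

The verification step is the design choice that matters here: a claimed bug with no reproducing test case is discarded instead of being handed to a human, which is how the false positive problem is kept out of the triage queue.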

Can other organizations use this approach?

Yes, with the publicly available models. Mozilla’s engineers explicitly recommend that any software team start using an agentic harness with a modern model now. You don’t need Mythos access to start — Claude Opus 4.7 and Sonnet 4.6 are publicly available via the Anthropic API. The pipeline architecture is the work; the model upgrade is a component swap.

What’s the difference between what Claude found and what fuzzing finds?

Traditional fuzzing generates random or semi-random inputs to trigger crashes. It’s effective at finding memory corruption bugs triggered by malformed data, but poor at finding bugs that require complex reasoning about how distant subsystems interact. The 15-year-old HTML legend element bug and 20-year-old XSLT bug that Mythos found both required reasoning about multi-subsystem interactions that fuzzing consistently missed. AI analysis and fuzzing are complementary; Mozilla runs both.

May 9, 2026
The Tolerance Premise

Article 38 ended with a question that usually gets asked in the wrong register: whether aggregate ownership — someone being accountable for the gap no individual node can see — is achievable above a certain scale.

The honest answer is: probably not. And the more interesting question is what you build once you’ve accepted that.

Most organizational design assumes the answer is better process. Better visibility, better cadence, better escalation paths. Hire a coordinator. Build a dashboard. Add a meeting where the distributed parts report to a center that holds.

What that design is still doing, structurally, is pursuing coherence. The meeting is the coherence mechanism. The dashboard is the coherence mechanism. The gap is treated as a problem with a process solution, and the process is built to close it.

But there’s a design premise on the other side of that question — one that almost nobody builds toward intentionally, because it sounds like giving up. The premise is: distributed incoherence is not a problem to solve. It is the permanent condition of any system operating at real complexity. The task is not eliminating the gap. The task is making the gap legible, bounded, and visible to the right eyes at the right time.

Call this the tolerance premise. Not tolerance in the passive sense — not ignoring the gap — but designed, deliberate tolerance with structure. The difference between an organization that drifts silently into incoherence and one that holds distributed nodes in deliberate, bounded divergence is not whether gaps exist. It’s whether the gaps are visible, named, and bounded before they compound.

What the Tolerance Premise Requires

The tolerance premise requires three things that coherence pursuit doesn’t.

Local legibility. Each node has to be able to report its own state honestly — not relative to the aggregate, which it can’t see, but in absolute terms. Am I stalled, moving, or blocked? Am I running the same instructions I was running six weeks ago? The discipline is not performance relative to the plan. It’s accurate self-reporting relative to the last known state. Most systems optimize local nodes for output, not for honest state representation. The tolerance premise inverts this: the most valuable thing a node can do is tell the truth about itself, because the aggregate can only be seen if the inputs are accurate. A node that reports green when it’s yellow is not a performance problem — it’s an epistemic problem, and epistemic problems aggregate faster than process problems.

Aggregate surfacing. Something has to look across nodes — not to own the gap, but to name it. This is the function that’s almost universally missing. Not a manager, not a meeting, not a weekly review that summarizes what the nodes already reported. Something that reads the pattern across honest local reports and says: here is where drift has accumulated. Here is the shape of the distributed incoherence you are currently running with. This function cannot be inside any node, because every node’s context is bounded by its own view. It has to be orthogonal to execution — not above it, not managing it, but adjacent to it with a wider aperture. The weekly briefing that can see nineteen sites healthy and one down is doing aggregate surfacing. What it cannot do is close the gap it names. That’s the distinction: surfacing is not owning.

Bounded drift. Tolerance without limits is not a design — it’s an abdication. The tolerance premise requires specifying, in advance, how much drift is acceptable before the aggregate requires a reset. Not a goal to eliminate drift, but a maximum. Beyond this distance, the distributed configuration has to be brought into view and reoriented. The timing is not a calendar event. It’s a threshold condition. The bounded-drift rule fires when the condition is met, not when someone gets around to looking. Items in flight beyond a certain number of days get reviewed — not because anyone scheduled a review, but because the threshold was crossed. That’s a different instrument than a due date. A due date is a coherence mechanism. A threshold is a tolerance mechanism.
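
The difference between a due date and a threshold can be shown with a small check. This is a minimal sketch under assumed names: `WorkItem`, `started_on`, and the 30-day limit are illustrative and not part of any system described here. The rule is evaluated against elapsed time in flight, so it fires whenever the condition is met, not on a scheduled date.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class WorkItem:
    name: str
    started_on: date   # when the item went into flight
    done: bool = False


def outside_tolerance(items, today, max_days_in_flight=30):
    """Return unfinished items whose time in flight exceeds the limit."""
    return [
        item for item in items
        if not item.done and (today - item.started_on).days > max_days_in_flight
    ]


items = [
    WorkItem("site migration", date(2026, 3, 1)),
    WorkItem("briefing rewrite", date(2026, 4, 28)),
    WorkItem("archive cleanup", date(2026, 2, 10), done=True),
]
flagged = outside_tolerance(items, today=date(2026, 5, 9))
print([item.name for item in flagged])  # ['site migration']
```

Nothing in the check knows when the work was supposed to finish. It only knows how long the item has been drifting, which is the quantity the tolerance premise asks you to bound.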

The Ecological Analog

The closest working analog for this is not organizational. It’s ecological.

A forest doesn’t achieve coherence. Every tree is pursuing its own local optimization — light, water, soil, root competition — with no central coordinator. The aggregate is neither coherent nor chaotic. It’s something else: distributed local optimization with seasonal rebalancing. The rebalancing isn’t managed. It’s structural. Winter is the bounded-drift reset. Fire is the bounded-drift reset. The organism that can’t survive the reset was already running outside tolerance, whether or not anyone noticed.

What would “seasonal rebalancing” mean for an AI-augmented operation?

Not a quarterly review. Reviews are coherence mechanisms — they gather the distributed parts and try to realign them to a center. A seasonal reset in the ecological sense would be more disruptive and more structural: a periodic moment where the whole configuration is visible at once, where whatever is outside tolerance doesn’t get optimized — it gets composted, and the freed attention becomes the resource for the next cycle.

Most organizations cannot build this because the cultural cost of composting living work is too high. The project that’s been in flight for eight weeks has people behind it. Ending it looks like failure. The forest does not feel bad about the dead branch. The operator who has to tell a team that a project is being composted — not killed for cause, just outside tolerance — is doing something the forest does automatically and humans find almost impossible to do cleanly.

The composting problem is not a process problem. It’s a grief problem. And the tolerance premise doesn’t solve it. It just makes the moment of composting structurally necessary rather than politically optional.

What Leadership Becomes

Here is the uncomfortable version of the tolerance premise.

If aggregate ownership is impossible above a certain scale, and the design solution is legible bounded incoherence rather than coherence pursuit, then the function of leadership in that system changes. The leader is no longer the person who closes the gap. They are the person who decides how much gap is acceptable — and who runs the bounded-drift reset when the threshold is crossed.

That’s a different job. Not better or worse. Different.

The briefing system that can look across distributed nodes and name the gap is not doing leadership’s job. It’s doing the aggregate-surfacing job — providing the honest read that leadership can’t get from inside any single node. What it cannot do is choose the tolerance threshold, decide when the reset fires, or do the composting. Those require judgment about what the operation can sustain and what it is trying to become. Judgment like that requires something that has skin in the game.

Most people who are building AI-augmented operations are still designing for coherence and then being surprised when the gap persists. They build better dashboards, more sophisticated briefing cadences, finer-grained status tracking. All of this is useful. None of it changes the structural fact that the gap between distributed nodes is not a visibility problem — it’s an ownership problem, and visibility doesn’t create owners. It just makes ownerlessness more obvious.

The tolerance premise is what you build when you’ve stopped pretending that better visibility will, eventually, produce the coherence it’s been promising.

The question isn’t whether your system is coherent. It’s whether you know what shape your incoherence has taken — and whether you chose it, or it chose you.

May 9, 2026

Category: Industry News & Commentary

Yesterday Changed Everything for Notion

The Database Advantage Nobody Else Has

The Agent Timeline: Faster Than Anyone Expected

What the Notion Everything App Actually Looks Like Today

The Model Behind It: Claude Opus 4.7

Why “Database First” Beats “Document First” for the Everything App

The Honest Weakness: The 30-Second Wall

Who Should Be Paying Attention Right Now

The Bigger Picture: A Series on Who Wins the Everything App

Frequently Asked Questions

What are Notion Custom Agents?

What is Notion Workers?

What AI model does Notion use?

What is the Notion External Agents API?

How is Notion different from Microsoft Copilot and Google Workspace AI?

What are the real limitations of Notion Workers in the alpha?

Google Didn’t Need to Acquire Its Way Here

What Google Just Shipped: The Pieces Coming Together

The Model Reality: Get This Right Before You Strategize

What Google’s Everything Page Actually Looks Like Today

The Tension: Google’s Biggest Competitor Is Google’s Own Fragmentation

The A2A Protocol: The Move Nobody Talked About Enough

What This Means for SMBs and Content Creators Right Now

Google vs. Microsoft: Who Wins the Everything App Race?

Frequently Asked Questions

What is Google Workspace Studio?

What is Google Agentspace?

What is the latest Google Gemini model in 2026?

What is Google’s A2A protocol?

Do small businesses have access to Google’s AI agent features?

The Observation That Started This

Microsoft Is Already Building the Pieces

Why the Western Super App Never Happened — Until Now

What the “Everything Page” Actually Looks Like

The LinkedIn Angle Nobody Is Talking About

The Risk: Nobody Wants One Company Holding All of This

If Microsoft Doesn’t Build It, Someone Will

What This Means for Your Content and AI Strategy Right Now

Frequently Asked Questions

Is Microsoft building an “everything app” like WeChat?

Why did Western super apps fail where WeChat succeeded?

How does LinkedIn data connect to Microsoft Copilot?

What is Microsoft 365 E7 and what does it include?

What can small businesses do today to prepare for AI-unified platforms?

Part 1: Snowflake’s $200M Partnership — 12,600 Enterprise Customers as Distribution

Part 2: India’s RBI Warning + The Glasswing Gap

The Tension This Creates

Frequently Asked Questions

What does the Snowflake-Anthropic partnership include?

What is Project Glasswing?

Why is India’s government warning about Claude Mythos if India is Anthropic’s second-largest market?

Cowork Routines: The Laptop Can Be Closed

Windows Computer Use: Claude Operates Your Desktop

How We’re Using Both

The Non-Developer Angle

Frequently Asked Questions

What are Cowork Routines?

Does Windows computer use require coding to set up?

What’s the difference between Cowork and Cowork Routines?

Is Cowork available on both Mac and Windows?

What Actually Happened

Why Institutions Switch AI Platforms

What This Signals for Enterprise Platform Switching

The Competitive Context

Our Take

Frequently Asked Questions

Did Harvard ban ChatGPT?

How many people does the Harvard FAS Claude agreement cover?

Why did Harvard FAS switch from ChatGPT to Claude?

Does Harvard’s decision affect other universities?

What NanoClaw Actually Is

The Architecture Detail That Matters Most

He Built It With Claude Code, Not Traditional Coding

Why This Is the Cowork Story in Miniature

The Broader Signal

Frequently Asked Questions

What is NanoClaw?

How much does NanoClaw cost to run?

Did Vivian Balakrishnan write the code himself?