Author: Will Tygart

  • Microsoft Copilot Compliance for Regulated Industries: Finance, Healthcare, and Legal (2026)

    Microsoft Copilot compliance for regulated industries requires governance controls that exceed the standard enterprise deployment model. Financial services firms face SEC and FINRA recordkeeping requirements that extend to AI interactions. Healthcare organizations must ensure Copilot does not surface protected health information in violation of HIPAA. Legal departments must prevent Copilot from crossing ethical walls between client matters. Each industry has distinct compliance obligations, and deploying Copilot without addressing them creates regulatory exposure.

    This guide provides industry-specific compliance frameworks for the three sectors with the highest Copilot adoption rates and the strictest regulatory requirements: financial services, healthcare, and legal.

    Microsoft’s Compliance Certifications for Copilot

    Microsoft 365 Copilot inherits the compliance certifications of the broader Microsoft 365 platform, and in 2025 achieved its own dedicated certification: ISO/IEC 42001:2023 for AI management systems, with zero non-conformities. This certification covers the AI-specific governance practices Microsoft applies to Copilot, including data handling, model training boundaries, and interaction monitoring.

    Key certifications relevant to regulated deployments:

    • ISO/IEC 42001:2023 — AI management system (Copilot-specific, zero non-conformities)
    • SOC 2 Type II — Security, availability, processing integrity, confidentiality, and privacy
    • ISO 27001/27018 — Information security and cloud privacy
    • HIPAA BAA — Business Associate Agreement available for healthcare customers
    • FedRAMP High — Authorization for US government cloud deployments
    • PCI DSS — Payment card industry data security (infrastructure level)

    These certifications establish baseline compliance, but they do not eliminate the need for organization-specific controls. Certification means Microsoft’s infrastructure and processes meet the standard — your organization’s configuration and usage patterns are your responsibility.

    Financial Services: Deploying Copilot Under SEC, FINRA, and MiFID II

    Financial services leads all industries in Copilot adoption at 71%. Major deployments include Barclays (100,000 seats), UBS (50,000 seats), and Lloyds Banking Group (30,000 seats with 93% daily active usage). These firms have invested heavily in governance frameworks that satisfy regulatory requirements while capturing productivity benefits.

    Recordkeeping Requirements

    SEC Rule 17a-4 and FINRA Rule 4511 require broker-dealers to retain business communications for specified periods. When a financial advisor uses Copilot to draft client communications, analyze portfolio performance, or summarize market research, those Copilot interactions become business records subject to retention.

    Configuration requirements:

    • Enable Purview retention policies for Copilot interactions with a minimum 6-year retention period
    • Configure legal hold capabilities for Copilot data to support regulatory examinations
    • Ensure Copilot interactions are included in the firm’s eDiscovery workflows
    • Implement Communication Compliance policies that mirror existing surveillance for email and chat

    Information Barriers and Chinese Walls

    Investment banks and multi-service financial firms maintain information barriers (Chinese walls) between departments that have access to material non-public information (MNPI). Copilot must respect these barriers — an analyst in the M&A advisory team cannot receive Copilot responses that reference information from the trading desk.

    Microsoft 365 Information Barriers can be configured to restrict Copilot’s grounding scope by department or group membership. However, these barriers must be tested specifically for Copilot, because the AI’s cross-document aggregation capability may surface connections between seemingly unrelated documents that cross barrier boundaries.

    Financial Services DLP Template

    Deploy DLP policies that detect: account numbers, SWIFT codes, wire transfer instructions, insider trading keywords, earnings previews, M&A codenames, and client personal financial information. Block Copilot responses containing more than two financial identifiers. Alert compliance on any Copilot interaction that references restricted-list securities.

    Healthcare: HIPAA Compliance and Copilot

    Healthcare presents unique Copilot compliance challenges because the regulatory framework — HIPAA — was written decades before AI assistants existed. The Privacy Rule and Security Rule establish requirements for protected health information (PHI) that must be interpreted for the Copilot context.

    Is Microsoft 365 Copilot HIPAA Compliant?

    Microsoft offers a HIPAA Business Associate Agreement (BAA) that covers Microsoft 365 services, including Copilot. However, the BAA covers Microsoft’s obligations as a technology provider. The covered entity (hospital, clinic, health plan) remains responsible for configuring Copilot in a manner that prevents unauthorized PHI disclosure.

    Copilot becomes a HIPAA compliance risk when:

    • A user in a non-clinical department (marketing, finance) asks Copilot a question and receives a response grounded in clinical documents they technically have access to
    • Copilot aggregates fragments from multiple patient records into a response that creates a more complete PHI record than any individual source
    • Copilot is used on unmanaged personal devices where PHI could be exposed outside the organization’s security perimeter

    Healthcare-Specific Configuration

    Deploy sensitivity labels specifically for PHI: Patient Records (Highly Confidential), Clinical Notes (Confidential), De-identified Research Data (Internal). Configure autolabeling to detect PHI combinations — patient name plus any of: diagnosis, medication, lab result, insurance ID, or date of service.

    Use Restricted SharePoint Search to exclude clinical document repositories from Copilot’s grounding scope for non-clinical users. Enable Copilot only on managed devices enrolled in Microsoft Intune with health data encryption policies enforced.

    Copilot Health: The 2026 Clinical Expansion

    Microsoft launched Copilot Health in March 2026, extending Copilot capabilities specifically for clinical workflows. Copilot Health operates under additional technical controls — it processes clinical data within a more restricted boundary than general Copilot and includes healthcare-specific guardrails for PHI handling. Organizations evaluating Copilot Health should treat it as a separate deployment with its own governance framework, not an extension of the general Copilot rollout.

    Legal: Ethical Walls and Privilege Protection

    Law firms and corporate legal departments face two Copilot compliance challenges that other industries do not: maintaining ethical walls between client matters and protecting attorney-client privilege in AI interactions.

    Matter-Level Isolation

    Legal ethics rules require that information from one client matter is not accessible to attorneys working on adverse or unrelated matters. When a law firm deploys Copilot, the AI must not surface documents from Matter A in responses to attorneys assigned only to Matter B.

    Implementation approach: structure SharePoint sites by matter with explicit permission boundaries. Configure Copilot access at the matter-site level so the AI’s grounding scope is limited to documents within the requesting attorney’s assigned matters. Validate this isolation through adversarial testing — have attorneys deliberately query for information from matters they are not assigned to.

    Privilege Protection

    Attorney-client privileged communications included in Copilot’s grounding could inadvertently appear in responses to non-privileged users. The risk is compounded because privilege is contextual — the same document may be privileged in one context and not in another.

    Mitigation: apply sensitivity labels that identify privileged documents and configure DLP policies that flag Copilot responses containing privilege markers (“attorney-client privileged,” “legal advice,” “work product”) when accessed by non-legal personnel.

    Legal Industry Case Study: Loyens & Loeff

    Loyens & Loeff, a Benelux law firm, deployed Copilot to their entire organization and achieved a 94% active user rate with over 1 million prompts in six months. Their success was built on matter-level SharePoint isolation, comprehensive sensitivity labeling, and an internal training program that emphasized responsible Copilot usage for legal professionals.

    Cross-Industry Compliance Considerations

    EU and UK Regulatory Scrutiny

    The Dutch government conducted a data protection impact assessment on Microsoft 365 Copilot, raising concerns about data processing transparency and user consent. Organizations deploying Copilot in EU/UK jurisdictions should conduct their own Data Protection Impact Assessments under GDPR Article 35, particularly if Copilot processes employee personal data or customer information.

    Data Residency

    Copilot processes data within the Microsoft 365 tenant’s geographic boundary. For organizations with data residency requirements — EU data staying in EU data centers, for example — verify that your tenant’s data location settings align with Copilot’s processing locations. Microsoft’s EU Data Boundary commitment covers Copilot interactions for EU tenants.

    Frequently Asked Questions

    Is Microsoft Copilot HIPAA compliant?

    Microsoft offers a HIPAA Business Associate Agreement covering Copilot. However, the covered entity remains responsible for configuring Copilot to prevent unauthorized PHI disclosure. This requires sensitivity labels for clinical data, Restricted SharePoint Search for clinical repositories, DLP policies for PHI patterns, and device-level controls through Intune.

    What compliance certifications does Copilot have?

    Microsoft 365 Copilot has achieved ISO/IEC 42001:2023 (AI management) with zero non-conformities, and inherits SOC 2 Type II, ISO 27001, HIPAA BAA eligibility, FedRAMP High, and PCI DSS certifications from the Microsoft 365 platform.

    How do financial services firms deploy Copilot compliantly?

    Financial services firms deploy Copilot with SEC/FINRA-compliant retention policies (minimum 6-year), information barriers that prevent cross-department MNPI leakage, Communication Compliance surveillance, and financial-specific DLP policies. Barclays, UBS, and Lloyds have deployed 100K, 50K, and 30K seats respectively under these controls.

    Can law firms use Copilot without breaking attorney-client privilege?

    Yes, with proper configuration. Law firms must implement matter-level SharePoint isolation, apply sensitivity labels to privileged documents, configure DLP to flag privilege markers in Copilot responses to non-legal users, and validate isolation through adversarial testing. Loyens & Loeff achieved 94% active usage with these controls.

    Does Copilot comply with GDPR and EU data residency requirements?

    Copilot processes data within the tenant’s geographic boundary. Microsoft’s EU Data Boundary commitment covers Copilot interactions for EU tenants. Organizations should conduct GDPR Article 35 Data Protection Impact Assessments before deployment, particularly if Copilot processes employee personal data.



  • 73% of Enterprises Find Data Exposure After Deploying Copilot — Here’s the Pre-Deployment Security Checklist

    Copilot data exposure occurs when Microsoft 365 Copilot surfaces sensitive documents, emails, or data to users who were never intended to see that information. The root cause is not a flaw in Copilot itself — Copilot faithfully respects existing access permissions. The problem is that most organizations have accumulated years of permission sprawl, overshared folders, and misconfigured access controls that were invisible until an AI started actively surfacing content based on those permissions.

    According to Microsoft’s internal assessments, 73% of enterprises discover critical data exposure risks within the first 90 days of Copilot deployment. This checklist exists to find and fix those risks before Copilot amplifies them.

    Understanding the Oversharing Problem

    Every organization accumulates permission debt over time. A SharePoint site created for a project team five years ago still grants access to employees who left that team. A OneDrive folder shared with “Everyone except external users” contains documents that should be restricted to a specific department. An email distribution group used for a one-time announcement still has membership that includes contractors.

    Before Copilot, this permission debt was largely invisible. Users rarely browsed through every SharePoint site they had access to. The information was technically accessible but practically obscured by the sheer volume of content across the tenant.

    Copilot changes this equation. When a user asks a question, Copilot searches across every piece of content that user can access — every SharePoint site, every OneDrive folder, every email, every Teams message. Content that was buried in a forgotten folder is now one natural language query away from appearing in a Copilot response.

    The Pre-Deployment Security Checklist

    Phase 1: Permission Audit (Week 1-2)

    1. Audit SharePoint site collection permissions. Generate a permissions report for every site collection in your tenant. Identify sites where “Everyone” or “Everyone except external users” has been granted access. These are the highest-risk targets because Copilot will surface their content to any employee.

    2. Review OneDrive sharing links. OneDrive files shared via “Anyone with the link” or “People in your organization” links are accessible to Copilot for every user who matches that sharing scope. Run a sharing link audit using the SharePoint Admin Center or Microsoft Graph API to identify over-shared personal files.

    3. Evaluate Microsoft 365 Group memberships. Every M365 Group grants access to a shared mailbox, SharePoint site, and Teams channel. Review group memberships for accuracy, focusing on groups created more than 12 months ago where membership may have drifted from the intended audience.

    4. Check guest and external user access. External users with SharePoint or Teams access create a data boundary risk. If Copilot is enabled for external users (which it should not be by default), they could surface internal content through AI-assisted queries. Verify that guest access policies exclude Copilot.

    5. Identify stale content with active permissions. Documents and sites that have not been modified in 12+ months but still have broad access represent unnecessary exposure surface. These are prime candidates for permission reduction or archival.

    Phase 2: Classification Deployment (Week 2-3)

    6. Deploy sensitivity labels across the tenant. At minimum, implement a four-tier label taxonomy: Public, Internal, Confidential, and Highly Confidential. Each label must have Copilot-relevant protections — at the Highly Confidential tier, content should be excluded from Copilot grounding entirely.

    7. Configure autolabeling policies. Manual labeling alone will not achieve sufficient coverage before Copilot deployment. Configure Microsoft Purview autolabeling to detect and label documents containing sensitive information types automatically. Prioritize financial data, personal identifiers, and health information.

    8. Measure label coverage. Track the percentage of documents across SharePoint and OneDrive that have sensitivity labels applied. Target 80% coverage before enabling Copilot for production users. Use Purview Data Classification dashboards to monitor coverage progress.

    9. Enable label inheritance for new documents. Configure sensitivity label policies so that new documents created from labeled templates or in labeled containers automatically inherit the parent sensitivity level. This prevents coverage gaps from growing over time.

    Phase 3: Copilot-Specific Controls (Week 3-4)

    10. Configure Restricted SharePoint Search. If your label coverage is below 80% or if specific site collections contain regulated data, enable Restricted SharePoint Search to limit which sites Copilot can access for grounding. Start with a curated allow-list and expand as governance matures.

    11. Set up Purview audit logging for Copilot. Enable Purview Audit (Premium recommended) and verify that Copilot interaction events are being captured. These logs record every prompt, response, and document reference — essential for compliance monitoring and incident investigation.

    12. Deploy Communication Compliance for Copilot. Create Communication Compliance policies that monitor Copilot interactions for sensitive information patterns. Configure review workflows so flagged interactions are investigated by appropriate compliance personnel.

    13. Configure Conditional Access for Copilot. Restrict Copilot access to managed, compliant devices through Microsoft Entra Conditional Access policies. Copilot should not be accessible from personal devices or unmanaged endpoints where data loss controls cannot be enforced.

    14. Disable Copilot for service accounts and shared mailboxes. Service accounts and shared mailboxes often have broader access than individual users. Exclude these accounts from Copilot licensing to prevent the AI from operating with elevated permissions.

    Phase 4: Pilot and Validate (Week 4-5)

    15. Select a pilot group of 50-100 users. Choose users from a department with moderate data sensitivity — not the most sensitive (finance, legal, HR) and not the least sensitive (marketing, general admin). The pilot should be representative of typical Copilot usage patterns.

    16. Run adversarial testing. During the pilot, have security team members deliberately test Copilot’s boundaries: ask for salary information, request documents from other departments, query for unreleased product details. Document every case where Copilot surfaces content that should be restricted.

    17. Review pilot audit logs weekly. Analyze Copilot interaction logs from the pilot group for unexpected access patterns, high-sensitivity document references, and DLP policy matches. Use findings to refine policies before broader deployment.

    18. Conduct user awareness training. Pilot users should understand that Copilot can surface content from across the organization based on their permissions. Train them to recognize when Copilot shows information they should not be seeing and how to report it.

    Phase 5: Post-Deployment Monitoring

    19. Establish a monthly governance review. After Copilot is in production, conduct monthly reviews of: DLP policy match rates, Communication Compliance findings, permission change requests driven by Copilot exposure, and user feedback on unexpected content surfacing.

    20. Create an incident response playbook. Document the specific workflow for when Copilot surfaces sensitive data to an unauthorized user: detection, containment (disable Copilot for affected user), investigation (trace source documents and permissions), remediation (fix the access gap), and notification (regulatory reporting if required).

    Priority Order: What to Fix First

    If you cannot complete the entire checklist before Copilot deployment, prioritize in this order:

    1. Enable Restricted SharePoint Search to limit Copilot’s scope (immediate risk reduction)
    2. Audit and fix “Everyone” permissions on SharePoint sites (highest exposure vector)
    3. Deploy sensitivity labels on your most sensitive site collections (targeted protection)
    4. Configure Purview audit logging (visibility and compliance)
    5. Set up Communication Compliance monitoring (detection capability)

    Frequently Asked Questions

    What percentage of enterprises find data exposure after deploying Copilot?

    According to Microsoft’s internal assessments, 73% of enterprises discover critical data exposure risks within the first 90 days of deploying Microsoft 365 Copilot. The exposure comes from pre-existing permission sprawl that Copilot amplifies, not from flaws in Copilot itself.

    How do I secure Microsoft Copilot before deployment?

    Secure Copilot before deployment by completing a five-phase checklist: audit SharePoint and OneDrive permissions, deploy sensitivity labels with autolabeling, configure Restricted SharePoint Search and Purview audit logging, run a controlled pilot with adversarial testing, and establish ongoing governance reviews.

    Does Copilot break data permissions?

    No. Copilot strictly respects existing Microsoft 365 permissions. If a user can access a document through SharePoint or OneDrive, Copilot can surface that document’s content. The risk is that existing permissions are often broader than intended — Copilot makes this visible by actively surfacing content that was previously buried.

    What is the fastest way to reduce Copilot data exposure risk?

    The fastest risk reduction is enabling Restricted SharePoint Search, which limits which SharePoint site collections Copilot can access for grounding its responses. This can be configured in minutes through the SharePoint Admin Center and immediately restricts Copilot’s data scope.

    How long should a Copilot security pilot last?

    A Copilot security pilot should run for a minimum of 4-6 weeks with 50-100 users. This provides enough interaction data to identify permission gaps, test DLP policies, and validate that governance controls are functioning before broader deployment.



  • Copilot DLP Policies: The CISO’s Configuration Guide

    Copilot DLP policies are Data Loss Prevention rules configured in Microsoft Purview that specifically monitor and control how Microsoft 365 Copilot interacts with sensitive data. Unlike traditional DLP that tracks file movement across endpoints and email, Copilot DLP must address a fundamentally different threat model: an AI assistant that aggregates fragments from dozens of documents into a single response, potentially combining information in ways that exceed the sensitivity of any individual source.

    This guide walks CISOs and security teams through the complete configuration process for Copilot DLP, from understanding why traditional approaches fall short to deploying prompt-level enforcement and Communication Compliance monitoring.

    Why Traditional DLP Fails for Copilot

    Traditional DLP was designed for a world where data moves in predictable patterns: a user downloads a file, attaches it to an email, or shares it externally. DLP policies intercept these movements and enforce rules. The data stays in recognizable containers — files, messages, uploads — that DLP can inspect.

    Copilot breaks this model. When a user asks Copilot to “summarize the key financial terms from our recent client negotiations,” the AI does not move a file. Instead, it reads across every document, email, and Teams message the user has access to, extracts relevant fragments, and synthesizes them into a new response. That response may contain salary figures from HR documents, deal terms from legal contracts, and revenue projections from finance spreadsheets — none of which were individually flagged by traditional DLP because no file was moved.

    The aggregation problem is the core challenge. A Copilot response can be more sensitive than any of its source documents individually, because it combines information that was intentionally siloed across different departments and access boundaries.

    The Three Layers of Copilot DLP

    Effective Copilot data protection requires three enforcement layers working together. No single layer is sufficient.

    Layer 1: Endpoint DLP (Pre-Copilot)

    Endpoint DLP remains the first line of defense. Before Copilot ever processes a query, endpoint DLP policies should already be controlling how sensitive files are accessed, modified, and shared on managed devices. This layer prevents sensitive content from being in locations where Copilot can access it in the first place.

    Key endpoint DLP configurations for Copilot readiness:

    • Block copy-to-clipboard for documents with Highly Confidential sensitivity labels
    • Restrict printing and screen capture for regulated content
    • Audit access to sensitive file locations that Copilot could reference
    • Configure sensitivity label inheritance so new documents created from sensitive sources carry the parent label

    Layer 2: Communication DLP (Copilot Interactions)

    Microsoft Purview Communication Compliance extends to Copilot interactions. This layer monitors what Copilot says in its responses and flags interactions that contain sensitive information patterns.

    Configuration steps for Communication Compliance with Copilot:

    1. Navigate to Microsoft Purview Compliance Portal → Communication Compliance
    2. Create a new policy selecting “Microsoft 365 Copilot” as the monitored channel
    3. Define detection conditions using sensitive information types (SSN, credit card, health records)
    4. Configure the review workflow — assign compliance reviewers who will investigate flagged interactions
    5. Set severity levels: informational for low-risk matches, high for regulated data types
    6. Enable automated alerts to the security operations team for critical matches

    Layer 3: Prompt-Level DLP (2026 Addition)

    Prompt-level DLP evaluates the user’s input to Copilot — not just the response. This is the newest enforcement layer, introduced in 2026, and it addresses a gap that the first two layers could not cover: users deliberately or inadvertently requesting sensitive information through carefully constructed prompts.

    Prompt-level DLP can detect and block queries such as:

    • Requests for employee compensation data across departments
    • Queries that attempt to access content outside the user’s normal working scope
    • Prompts that reference specific regulated data categories (patient health information, student education records)
    • Patterns indicating prompt engineering attempts to bypass content restrictions

    Configuring Sensitive Information Types for Copilot

    Microsoft Purview includes over 300 built-in sensitive information types (SITs), but effective Copilot DLP requires selecting and customizing the right set for your organization. The most impactful SITs for Copilot governance fall into four categories:

    Financial data: Credit card numbers, bank account numbers, SWIFT codes, ABA routing numbers. These appear frequently in Copilot responses when users query across financial documents and emails.

    Personal identifiers: Social Security numbers, passport numbers, driver’s license numbers, national ID numbers. Copilot can inadvertently surface these from HR documents, benefits enrollment forms, and employee communications.

    Health information: ICD-10 codes, drug names in clinical context, patient identifiers. Critical for healthcare organizations or any company with employee health programs.

    Custom SITs: Create organization-specific patterns for internal project codenames, unreleased product names, M&A target company names, and other proprietary identifiers that standard SITs will not catch.

    Restricted SharePoint Search: The Nuclear Option

    Restricted SharePoint Search (RSS) is the most powerful — and most blunt — Copilot control available. When enabled, RSS limits Copilot’s grounding to only the SharePoint site collections you explicitly allow. Everything else is invisible to Copilot regardless of user permissions.

    RSS is appropriate when:

    • Your sensitivity label coverage is below 80% and you cannot wait for full deployment
    • Specific site collections contain regulated data that must never appear in Copilot responses
    • You are in the initial deployment phase and want to limit Copilot’s scope while building confidence

    RSS configuration:

    1. Access the SharePoint Admin Center → Settings → Restricted SharePoint Search
    2. Enable the feature and add site collections to the allowed list
    3. Copilot will only ground responses using content from allowed sites
    4. Review and expand the allowed list quarterly as governance matures

    DLP Policy Templates for Regulated Industries

    Financial Services Template

    Monitor for: credit card numbers, bank account numbers, financial statement fragments, insider trading keywords, material non-public information patterns. Block: Copilot responses containing more than 2 financial identifiers in a single response. Alert: compliance team on any Copilot interaction referencing M&A codenames or unreleased earnings data.

    Healthcare Template

    Monitor for: patient names with medical record numbers, ICD-10 codes, drug prescriptions, PHI combinations (name + diagnosis + date). Block: any Copilot response containing a complete PHI record as defined by HIPAA. Alert: privacy officer on any Copilot interaction in clinical departments that references patient data.

    Legal Template

    Monitor for: attorney-client privilege markers, litigation hold references, settlement amounts, opposing counsel communications. Block: Copilot from synthesizing across matters that should be ethically walled. Alert: general counsel on any Copilot interaction that crosses matter boundaries.

    Testing and Deployment Workflow

    Never deploy Copilot DLP policies directly to enforcement mode. The recommended workflow:

    1. Week 1-2: Deploy all policies in audit-only mode. Copilot continues to function normally, but every policy match is logged
    2. Week 3: Review audit logs. Identify false positives and adjust detection thresholds
    3. Week 4: Conduct tabletop exercise with sample Copilot interactions that should trigger each policy
    4. Week 5: Move low-risk policies (informational alerts) to enforcement mode
    5. Week 6: Move high-risk policies (blocking rules) to enforcement mode with override justification required
    6. Ongoing: Monthly policy review cycle. Adjust as Copilot capabilities expand and new sensitive data patterns emerge

    Measuring DLP Effectiveness for Copilot

    Track these metrics monthly to assess whether your Copilot DLP policies are working:

    • Policy match rate: Number of Copilot interactions flagged per 1,000 total interactions. Baseline this in audit mode, then track post-enforcement
    • False positive rate: Percentage of flagged interactions that reviewers classify as non-issues. Target below 15%
    • Sensitive data exposure incidents: Confirmed cases where Copilot surfaced protected data to unauthorized users. Target zero
    • Mean time to investigation: Average time from DLP alert to completed compliance review
    • User override rate: Percentage of blocked interactions where users request and receive an override. High rates suggest policies are too aggressive

    Frequently Asked Questions

    How do I configure DLP for Microsoft Copilot?

    Configure Copilot DLP through Microsoft Purview Compliance Portal using three layers: endpoint DLP for file-level controls, Communication Compliance for monitoring Copilot responses, and prompt-level DLP for evaluating user queries. Start in audit-only mode for 30 days before enforcing blocking rules.

    What is prompt-level DLP for Copilot?

    Prompt-level DLP, introduced in 2026, evaluates what users type into Copilot before the AI processes the query. It can detect and block requests for sensitive information categories, attempts to access data outside normal working scope, and prompt patterns that indicate bypass attempts.

    Can Copilot bypass DLP policies?

    Copilot itself cannot bypass DLP policies when properly configured. However, the aggregation problem means Copilot can combine non-sensitive fragments into sensitive responses. This is why all three DLP layers — endpoint, communication, and prompt-level — are necessary for comprehensive protection.

    What sensitive information types should I monitor for Copilot?

    Prioritize financial identifiers (credit cards, account numbers), personal identifiers (SSN, passport), health information (PHI, clinical data), and custom patterns for your organization’s proprietary data. Microsoft Purview includes over 300 built-in sensitive information types that can be applied to Copilot DLP policies.

    How long should I test Copilot DLP policies before enforcement?

    Run Copilot DLP policies in audit-only mode for a minimum of 30 days. During this period, review all policy matches, adjust detection thresholds to reduce false positives below 15%, and conduct a tabletop exercise before moving to enforcement mode.



  • The Complete Microsoft 365 Copilot Governance Framework for Enterprise IT (2026)

    Microsoft 365 Copilot governance is the structured set of policies, controls, and processes that determine how your organization deploys, monitors, and secures Copilot across the Microsoft 365 ecosystem. Without a deliberate governance framework, enterprises routinely discover that Copilot surfaces sensitive data employees were never meant to see — a problem that affects 73% of organizations within the first 90 days of deployment, according to Microsoft’s own internal assessments.

    This guide provides a complete, actionable governance framework built around five control domains. It is designed for CISOs, IT administrators, GRC professionals, and managed service providers who need to move beyond Microsoft’s reference documentation into practical implementation.

    Why Copilot Governance Cannot Wait

    Microsoft 365 Copilot operates on a simple principle: it can access anything the user can access. That means every misconfigured SharePoint permission, every overshared OneDrive folder, and every stale document with outdated access controls becomes a potential data exposure vector the moment Copilot is enabled. The AI does not break your permissions — it amplifies whatever permission state already exists.

    For regulated industries — financial services, healthcare, legal, and government — this creates immediate compliance risk. Barclays deployed Copilot to 100,000 seats. UBS rolled it out to 50,000. Lloyds Banking Group reports 93% daily active usage among their 30,000 Copilot users. Each of these deployments required governance frameworks that went far beyond what Microsoft provides out of the box.

    The Five Control Domains of Copilot Governance

    Effective Copilot governance operates across five interconnected domains. Weakness in any single domain creates risk that cascades across the others. The framework below addresses each domain in the order they should be implemented.

    Domain 1: Data Classification and Sensitivity Labels

    Classification is the foundation. Before enabling Copilot for any user group, your organization must have a functioning sensitivity label taxonomy applied across SharePoint, OneDrive, Exchange, and Teams. Microsoft Purview Information Protection provides the tooling, but the taxonomy itself must reflect your organization’s actual data categories.

    The minimum viable label set for Copilot governance includes four tiers: Public, Internal, Confidential, and Highly Confidential. Each tier requires specific Copilot interaction policies — for example, Highly Confidential documents should be excluded from Copilot grounding entirely through Restricted SharePoint Search configuration.

    Autolabeling policies accelerate coverage. Configure Purview autolabeling to detect sensitive information types — Social Security numbers, credit card numbers, health records, financial account data — and automatically apply the appropriate sensitivity label. Organizations that implement autolabeling before Copilot deployment reduce their sensitive data exposure surface by up to 89% within the first 60 days.

    Domain 2: Policy Design and DLP

    Data Loss Prevention policies for Copilot require a fundamentally different approach than traditional DLP. Traditional DLP monitors file movement — downloads, email attachments, external sharing. Copilot DLP must monitor AI interactions, because Copilot can aggregate fragments from dozens of documents into a single response that contains more combined sensitivity than any individual source document.

    Microsoft introduced prompt-level DLP in 2026, adding a third enforcement layer alongside endpoint DLP and communication DLP. Prompt-level DLP evaluates what users ask Copilot and what Copilot returns, flagging interactions that request or expose protected information types.

    The policy design sequence:

    1. Map your sensitive information types to DLP policy templates
    2. Configure Microsoft Purview DLP policies with Copilot-specific conditions
    3. Enable Communication Compliance for Copilot interaction monitoring
    4. Set up Restricted SharePoint Search to exclude sensitive site collections from Copilot grounding
    5. Test policies in audit-only mode for 30 days before enforcement

    Domain 3: Identity and Access Controls

    Copilot governance inherits your identity posture. If your Azure Active Directory (now Microsoft Entra ID) has overly permissive group memberships, nested security groups with unintended access inheritance, or guest accounts with broad SharePoint access, Copilot will surface content through all of those vectors.

    The governance framework requires a pre-deployment identity audit that specifically evaluates access from Copilot’s perspective: not just who should have access, but what Copilot would surface to each user based on their current effective permissions. Microsoft’s Data Security Posture Management for AI tools can automate portions of this assessment.

    Key identity controls for Copilot:

    • Implement Conditional Access policies that restrict Copilot to managed, compliant devices
    • Review and trim overprivileged security group memberships quarterly
    • Disable Copilot for guest and external accounts by default
    • Enforce Privileged Identity Management for admin accounts that configure Copilot policies

    Domain 4: Audit and Monitoring

    Every Copilot interaction generates audit data — the prompt, the response, the documents referenced during grounding, and the web queries Copilot used. This audit trail is essential for compliance, incident investigation, and governance maturity measurement.

    Microsoft Purview Audit (Standard and Premium) captures Copilot interaction events. Purview Activity Explorer provides a visual interface for investigating specific interactions. For organizations subject to legal hold requirements, Copilot interactions are included in eDiscovery workflows and can be placed under preservation holds.

    The monitoring stack for mature Copilot governance:

    • Real-time alerts: Configure Purview Communication Compliance policies to flag high-risk Copilot interactions
    • Weekly reviews: Audit Copilot usage patterns by department, identifying anomalous query volumes or topics
    • Monthly reporting: Generate compliance reports showing DLP policy matches, sensitivity label coverage, and Copilot adoption metrics
    • Incident workflow: Document the investigation process for when Copilot surfaces content it should not have

    Domain 5: Incident Response

    When Copilot surfaces sensitive data to an unauthorized user — and in a large deployment, this will happen — the incident response process must be defined before it is needed. The response workflow should address three questions: what was exposed, to whom, and what remediation is required.

    The Copilot-specific incident response playbook:

    1. Detection: Alert triggered by Communication Compliance, DLP policy match, or user report
    2. Containment: Disable Copilot for the affected user or group immediately via admin center
    3. Investigation: Use Purview Activity Explorer to identify the exact interaction, source documents, and scope of exposure
    4. Remediation: Fix the underlying permission or classification gap that allowed the exposure
    5. Notification: Determine whether regulatory notification obligations apply (GDPR, HIPAA, state breach notification laws)
    6. Prevention: Update DLP policies, sensitivity labels, or access controls to prevent recurrence

    The Zoned Governance Strategy

    Microsoft recommends — and enterprise practice confirms — a zoned approach to Copilot governance. Rather than applying a single policy set across the entire organization, create distinct governance zones with different control levels.

    Experimentation Zone: A controlled environment where select user groups test Copilot with enhanced monitoring. All interactions logged. DLP in audit mode. Use this zone for pilot programs and user acceptance testing.

    Standard Zone: Production deployment for general business users. Standard DLP enforcement, sensitivity labels required, regular audit reviews. This is where most employees operate.

    Restricted Zone: Departments handling regulated data — legal, HR, finance, executive communications. Enhanced DLP, stricter Restricted SharePoint Search boundaries, additional Communication Compliance policies, shorter audit review cycles.

    Agent Governance: The 2026 Expansion

    The governance framework must now extend beyond chat-based Copilot to Copilot Studio agents — custom AI agents built on the Copilot platform that can take actions, access external systems, and operate with varying degrees of autonomy. Agent governance requires additional controls:

    • Agent registration and approval workflows before deployment
    • Scoped permissions for each agent (which data sources, which actions)
    • Agent-specific audit trails separate from user Copilot interactions
    • Testing requirements before agents are published to production
    • Periodic access reviews for agent permissions, mirroring user access reviews

    Implementation Timeline: 30/60/90 Day Plan

    Days 1-30: Foundation

    • Complete sensitivity label taxonomy and begin autolabeling deployment
    • Run SharePoint permission audit focused on oversharing
    • Configure Copilot admin settings at tenant level
    • Establish the Experimentation Zone with 50-100 pilot users
    • Enable Purview audit logging for Copilot interactions

    Days 31-60: Policy Enforcement

    • Deploy DLP policies in audit-only mode
    • Configure Restricted SharePoint Search for sensitive site collections
    • Set up Communication Compliance policies for Copilot monitoring
    • Conduct pilot user feedback sessions and adjust policies
    • Move DLP policies from audit to enforcement mode

    Days 61-90: Scale and Mature

    • Expand from Experimentation Zone to Standard Zone
    • Deploy Restricted Zone policies for regulated departments
    • Establish monthly governance review cadence
    • Document incident response playbook and conduct tabletop exercise
    • Begin agent governance planning if Copilot Studio adoption is planned

    Frequently Asked Questions

    What is a Microsoft 365 Copilot governance framework?

    A Copilot governance framework is a structured set of policies, controls, and procedures that govern how an organization deploys, configures, monitors, and secures Microsoft 365 Copilot. It typically covers five domains: data classification, DLP policy design, identity and access controls, audit and monitoring, and incident response.

    Why do enterprises need Copilot governance?

    Copilot accesses content based on existing user permissions. Without governance, Copilot can surface sensitive documents, emails, and data that users technically have access to but were never meant to see — a problem discovered by 73% of enterprises within 90 days of deployment.

    What is Restricted SharePoint Search and how does it protect Copilot?

    Restricted SharePoint Search is a Microsoft 365 admin feature that limits which SharePoint site collections Copilot can use for grounding its responses. By excluding sensitive sites from Copilot’s search scope, you prevent it from surfacing content from those locations regardless of user permissions.

    How does Copilot DLP differ from traditional DLP?

    Traditional DLP monitors file movement — downloads, sharing, email attachments. Copilot DLP must also monitor AI interactions, because Copilot can combine fragments from multiple documents into responses that contain more combined sensitivity than any individual source. Prompt-level DLP, introduced in 2026, evaluates Copilot prompts and responses directly.

    What compliance certifications does Microsoft 365 Copilot have?

    Microsoft 365 Copilot has achieved ISO/IEC 42001:2023 certification for AI management systems with zero non-conformities. It also inherits the compliance certifications of the broader Microsoft 365 platform, including SOC 2 Type II, ISO 27001, HIPAA BAA eligibility, and FedRAMP authorization for government cloud deployments.

    How should regulated industries approach Copilot governance?

    Regulated industries — financial services, healthcare, legal, and government — should implement the Restricted Zone governance model with enhanced DLP policies, stricter classification requirements, shorter audit review cycles, and industry-specific sensitive information type detection. Start with a pilot in a non-regulated business unit before expanding to regulated departments.



  • Conversations as Code: The Ontological Shift Nobody Named Yet

    Conversations as Code: The Ontological Shift Nobody Named Yet

    By William Tygart | June 2026


    Abstract

    Every major paradigm shift in technology follows the same arc: the mechanic arrives first, the naming arrives later, and the person who names it captures lasting authority over the frame. Version control went from SCCS to git over three decades. Then its metaphors leaked into every domain — documents, designs, legal contracts, data pipelines. But nobody has named the next obvious target: the conversation itself.

    This paper argues that AI conversations are not like code. They are code — complete with commits, branches, diffs, deploys, and the entire software development lifecycle. The infrastructure already exists. The philosophical claim does not. This is that claim.


    I. The Pattern We Keep Missing

    In 1964, Marshall McLuhan told a room full of Canadian broadcasters that the medium is the message. He’d been saying it since 1958, but nobody wrote it down because radio people don’t read media theory — they do media. The written version showed up in Understanding Media six years later. His colleague Harold Innis had the structural insight a decade earlier, published it in an academic journal, in concepts too dense for a headline. Innis is for specialists. McLuhan owns the cultural territory.

    The pattern repeats. Lawrence Lessig compressed Joel Reidenberg’s “Lex Informatica” into “Code is law” and pointed it at the general public. Clive Humby said “Data is the new oil” at a 2006 conference; nobody wrote it down until a colleague blogged it months later, and it didn’t truly detonate until The Economist ran a cover story in 2017 — eleven years after the phrase was coined. Marc Andreessen published “Why Software Is Eating the World” in the Wall Street Journal in August 2011; fourteen years later, the phrase still structures how VCs talk about markets.

    The structural formula is always the same: someone compresses a complex, multi-page argument into a logical identity statement — A is B — short enough for a keynote, a tweet, a headline. The person who does this in a broadcast venue captures lasting authority, even if someone else had the idea first. Reidenberg published “Lex Informatica” in the Texas Law Review a full year before Lessig. He’s a footnote. Alfred Russel Wallace mailed Darwin a manuscript with the identical theory of natural selection. We call it Darwinism. Stephen Stigler named this dynamic “Stigler’s Law of Eponymy” — no discovery is named after its true discoverer — while explicitly crediting Robert Merton as the actual originator. The law is now called Stigler’s.

    I’m not going to be Reidenberg.


    II. The Mechanic Is Already Commodity

    Before I make the philosophical claim, let me be precise about what already exists. The infrastructure for treating conversations with version-control primitives is live, shipping, and increasingly competitive:

    ChatGPT introduced conversation branching in late 2024, letting users fork from any message and explore alternate paths. It’s a consumer feature with millions of users. Claude Code, Anthropic’s developer tool, runs on a directed acyclic graph — a DAG — the same data structure git uses to track commits. It spawns sub-agents that branch, execute in parallel, and return results to the main thread. Google AI Studio offers conversation forking. Forky, an open-source tool, adds git-like branching to any AI chat interface. GitChat stores conversations in actual git repositories. Academic researchers published a full “Conversational Versioning System” framework (arXiv:2512.13914, December 2025) mapping version control onto multi-turn dialogue.

    The mechanic — forking, branching, comparing conversation paths — is commoditized. Every major AI lab either ships it or has it on the roadmap. This is the plumbing, and it’s table stakes.

    What nobody has done is name the building.


    III. The Claim

    A conversation with an AI is not *like* code. It *is* code.

    Not metaphorically. Not “conversations have some properties that remind us of code.” Literally: a conversation is a sequence of instructions that, when executed against a runtime (the model), produces deterministic-ish outputs. It can be versioned. It can be branched. It can be tested. It can be deployed. It can be reviewed. It has bugs. It has technical debt. It has a lifecycle.

    Every primitive in the software development lifecycle has a direct, non-metaphorical conversation equivalent. Not because someone designed it that way, but because conversations with AI systems are programs — they’re just programs written in natural language and executed against a neural network instead of a CPU.

    Here is the complete Rosetta Stone:


    The Full Mapping

    Commit → A prompt-response pair that produces a decision or artifact. Every time you send a message and receive a response that changes the state of your work, you’ve committed. The conversation history is your commit log. It’s append-only (you can’t unsend), it has timestamps, and it has attribution (who said what).

    Branch → A conversation fork from a decision point. When ChatGPT lets you “edit” a prior message and explore a different path, that’s a branch. When Claude Code spawns a sub-agent with different instructions, that’s a branch. When you copy a system prompt into a new conversation and modify one variable, that’s a branch.

    Merge → Synthesizing two conversation branches into a single decision. This is the hard one — the one every non-code domain drops when they adopt version control. More on this below.

    Diff → Comparing the outputs of two conversation branches. “I asked the same question two different ways. Here’s what changed in the answer.” This is already how people evaluate prompt quality — they just don’t call it diffing.

    Pull Request → Proposing a conversation-derived decision for review. When I run a strategic analysis in Claude and then present the output to a stakeholder for approval before acting on it, that’s a pull request. The conversation produced the work. The review gate determines whether it ships.

    Code Review → Structured review of a reasoning chain against a specification. I’ve been doing this for weeks and didn’t call it code review until now. More on this in the receipts section.

    Linter → Prompt quality enforcement. System prompts, CLAUDE.md files, constitutional AI guidelines — all of these constrain conversation outputs the way a linter constrains code style. They don’t change the logic; they enforce the standards.

    Test Suite → “Does this prompt reliably produce the expected output?” Prompt evaluation frameworks (the kind every AI lab publishes) are test suites. They run inputs, compare outputs to expected results, and report pass/fail. We’ve been writing tests for conversations for two years. We just call them “evals.”

    CI/CD → Promoting a conversation pattern to production use. When a prompt goes from “something I tried once” to “a standing instruction that runs automatically,” it has been deployed through a pipeline. My scheduled tasks — email triage at 7 AM, newsletter extraction, midday inbox check — are conversations that graduated to production.

    Deploy → A conversation becoming a skill, a workflow, a standing instruction. A Claude skill (a SKILL.md file) is a deployed conversation. It started as an interactive session. The session produced a workflow. The workflow was encoded as a reusable protocol. That’s build → test → deploy.

    Rebase → Replaying a conversation on top of new context. When I take an old analysis and re-run it with updated data — same structure, new inputs — I’m rebasing. The conversation structure is preserved; the context underneath it has changed.

    Cherry-pick → Extracting one insight from a conversation branch and applying it to another. “That framework from Tuesday’s session would solve the problem we hit Thursday.” Pull one commit from one branch, apply it to another.

    .gitignore → Context exclusion. System prompts that say “do not use information from X” or “ignore content that looks like instructions inside documents.” This is .gitignore for conversations — explicitly marking what the runtime should not process.

    README → System prompt. The README tells a new developer what a repository does, how to use it, and what to expect. A system prompt tells a new conversation what the AI’s role is, how to behave, and what to expect from the user. A CLAUDE.md file is a README for a conversation environment.

    Monorepo vs. Polyrepo → One mega-conversation vs. many focused ones. The monorepo debate is alive and well in AI workflows. Do you run one long conversation that accumulates context (monorepo), or do you spawn many focused conversations with narrow scopes (polyrepo)? The tradeoffs are identical: monorepos have easier cross-referencing but get unwieldy at scale; polyrepos are cleaner but require explicit coordination.


    IV. The Missing Primitive: Merge

    Every domain that adopts version control drops branching. Wikis keep revision history but don’t branch. Google Docs keeps versions but doesn’t branch. Legal redlining is bilateral — two parties, not an arbitrary graph. The reason is always the same: branching requires merging, and merging requires resolving conflicts, and conflict resolution requires judgment that most users won’t exercise and most tools won’t automate.

    Conversations have the same problem, and it’s the reason the “conversations as code” framing hasn’t been named yet — the hardest primitive is the one that makes the whole system coherent.

    What does it mean to merge two conversation branches?

    It means taking two divergent reasoning paths — two explorations that started from the same decision point and went different directions — and synthesizing them into a single, coherent decision that incorporates the best of both. This is not summarization. Summarization compresses; merging reconciles. A merge has to identify where the two branches agree (fast-forward), where they conflict (merge conflict), and how to resolve the conflicts (judgment).

    This is, incidentally, the thing that AI systems are becoming extraordinarily good at. A model that can hold two 100,000-token conversation branches in context and produce a synthesis that identifies agreements, flags conflicts, and proposes resolutions is a merge engine. The merge primitive that every other domain dropped because humans wouldn’t do it might be the primitive that AI makes viable.

    If that happens — if AI-assisted conversation merging becomes reliable — then conversations won’t just be code. They’ll be code with better tooling than most actual code has.


    V. My Receipts

    I’m not writing this as a theoretical exercise. I’ve been living this paradigm for months, building systems that embody every primitive I’ve described, before I had a name for what I was doing. Here are the receipts.

    Skills as Deployed Conversations

    I have over forty Claude skills in production — reusable protocols that handle everything from WordPress SEO optimization to social media scheduling to content quality gates. Every single one was born from a conversation. The pattern is always the same: I have a conversation where we figure out a workflow. The workflow works. I encode it as a SKILL.md file. The file becomes a standing protocol that runs the same way every time.

    My team documented the birth of one skill — the Cockpit Session — with precision: “This pattern emerged from the April 6, 2026 Monday Content Intelligence Audit. Will described wanting to ‘walk into a prepped room’ — the cockpit-session skill codifies that habit permanently.”

    The conversation was the development environment. The SKILL.md was the deploy artifact. The skill running in production is the service. That’s not a metaphor. That’s a software lifecycle.

    The Scope Index as Main Branch

    On June 15, 2026, I ran an off-site board session — alone, with Claude — that produced a comprehensive strategic map of my entire business network. We called it the Scope Index. It maps every organization, every key person, every partnership, every risk, every sequenced move.

    The Scope Index defines its own operating loop: “scope → implement → document → change.” That’s a development cycle. The document functions as trunk — the canonical branch that all decisions branch from and merge back into. When I evaluate a new opportunity, I check it against the Scope Index. When I make a strategic decision, I update the Scope Index. It has a date stamp. It has an author. It has a version history in Notion.

    It even has branch termination. Two prospective partners — Phil Rosebrook and Chris Nordyke — were evaluated and marked NO-GO. Those are closed branches. They’ll never merge back to main.

    Lens Exercises as Code Review

    The week after I built the Scope Index, I started running what I called “lens exercises” — structured reviews of my strategic decisions through formal analytical frameworks. Critical Thinking applied to a partnership gate decision. Context and History applied to an identity question about one of my organizations. Ethics and Impact applied to an information firewall I’d built between two business relationships. Future Implications applied to a parked initiative.

    Each exercise reads the prior reasoning chain (the Scope Index entry), evaluates it against a formal specification (the analytical lens), and returns a structured verdict: what passed, what failed, what needs revision, what was missed. Exercise #1 surfaced three execution blind spots I’d have walked into. Exercise #3 identified a pattern of information asymmetry across my entire network that I hadn’t seen.

    That’s code review. The inputs are conversation outputs. The specification is a formal framework. The output is a structured diff — here’s what your reasoning got right, here’s what it got wrong, here’s what to change. I was doing code review on my own conversations and didn’t have a name for it.

    Two Operating Modes as Branch Strategies

    I run two modes when working with AI: Execute and Extract. Execute mode means the conversation is going to production — tight messages, clear instructions, direct output. Extract mode means the conversation is brainstorming — loose, rambly, exploratory, with the output captured to my Notion second brain for later processing.

    Execute mode is committing to main. Extract mode is opening a feature branch. My own documentation uses the language directly: “loose branching messages → capture to Notion.” The system even has a recursive proof of concept — the idea for Extract mode was itself captured in Extract mode. It was born as a branch.

    Conversations Committed to Git — Literally

    This isn’t just metaphor mapping. My Claude Code sessions produce work products — articles, code, strategies — that are committed to actual git branches named after the conversation sessions that produced them. Branch claude/session-planning-mbp0ys in the wtygart-ctrl/tygart-workers repository. Branch claude/tygart-media-optimization-7pofae with a documented merge path: “Review + merge → main (merge triggers the deploy workflow automatically).”

    The conversation IS the development environment. The git branch IS the conversation’s artifact trail. The merge to main IS the conversation’s output going to production. This is already happening. It just hasn’t been named.


    VI. What This Means

    For the next twelve months

    If conversations are code, then every tool and practice from fifty years of software engineering is available for adaptation. We don’t need to invent conversation management from scratch. We need to port it.

    Conversation linters already exist — they’re called system prompts and constitutional AI. Conversation tests already exist — they’re called evals. Conversation deploys already exist — they’re called skills, workflows, and agents. Conversation version control is shipping from every major AI lab.

    What doesn’t exist yet: conversation code review as a practice. Conversation CI/CD as infrastructure. Conversation architecture as a discipline. Conversation technical debt as a concept that organizations manage.

    For the longer arc

    The history of version control shows a consistent compression: SCCS took eleven years to become the dominant paradigm. Git took five. Each generation solved exactly one bottleneck its predecessor left unresolved. The same compression is happening with conversations. The gap between “someone built a conversation branching feature” and “conversation versioning is table stakes” is going to be measured in months, not years.

    The domain that’s never successfully implemented branching-and-merging outside of code may finally do so — because the merge step, which every other domain dropped, is the thing AI systems do better than humans. A model that can hold two divergent 100K-token reasoning paths in context and produce a synthesis that identifies agreements, flags conflicts, and proposes resolutions is not just a chatbot. It’s a merge engine for thought.

    For the people building on this

    The Rosetta Stone I’ve laid out in Section III isn’t a thought experiment. It’s a product roadmap. Every unmapped primitive is a feature that doesn’t exist yet. Every mapped-but-unbuilt primitive is a competitive advantage for whoever builds it first.

    The conversation CI/CD pipeline — a system that takes a conversation pattern from experimental to production with automated quality gates — is sitting there waiting to be built. The conversation architecture review — a structured assessment of whether an organization’s AI conversation patterns are well-designed or accumulating technical debt — is a consulting practice that doesn’t exist yet. The conversation diff tool — a product that lets you compare the outputs of two conversation branches side by side, like a git diff but for reasoning chains — is an obvious product.

    None of this requires new AI capabilities. It requires new framing. The capabilities already exist.


    VII. The Urgency of Naming

    Every cautionary tale in intellectual history has the same moral: the person who delays publishing loses permanent naming rights to whoever publishes next, regardless of who had the idea first.

    Newton developed calculus in 1665 and sat on it for twenty years. Leibniz published first. We use Leibniz’s notation. Darwin developed natural selection around 1838 and wrote a private essay in 1844. He didn’t publish. In 1858, Wallace mailed him a manuscript with the identical theory. Darwin’s allies staged an emergency joint reading. Darwin rushed Origin of Species to press. Twenty years of sitting on an unpublished idea nearly cost him everything.

    Rosalind Franklin produced Photo 51 — the X-ray crystallography image that proved DNA’s double helix structure — in 1952. A colleague showed it to Watson without her knowledge. Watson and Crick published the double helix in April 1953. Franklin died of cancer in 1958. Watson, Crick, and Wilkins received the 1962 Nobel. No mechanism for correction existed.

    I’ve done the research. The philosophical claim that conversations are code — not that they’re like code, not that they have some properties of code, but that they are a legitimate programming paradigm with a complete software development lifecycle — is unclaimed territory as of June 2026. The mechanic is commoditized. The products are shipping. The academic papers are published. But nobody has compressed the argument into the three-word identity statement and planted it in a broadcast venue.

    Until now.


    VIII. The Three-Word Claim

    Conversations are code.

    Not “conversations are like code.” Not “conversations can be managed with code-like tools.” Not “AI conversations share some interesting structural properties with software.”

    Conversations are code.

    They are sequences of instructions executed against a runtime. They produce outputs. They can be versioned, branched, tested, reviewed, deployed, and maintained. They accumulate technical debt. They have architecture. They have lifecycle.

    The fifty-year arc of version control — from SCCS to git to the sprawling ecosystem of tools and practices built on top of distributed version control — is the playbook. The conversation is the new codebase. The prompt is the new function call. The skill is the new microservice. The system prompt is the new README. The eval is the new test suite. The model is the new runtime.

    And the person sitting in front of the conversation — the one deciding when to branch, when to commit, when to deploy, when to revert — is the new developer.

    Whether they know it or not.


    William Tygart is the founder of Tygart Media and architect of a multi-site AI content operation spanning 95,000+ AI citations. He builds systems where conversations become protocols, protocols become skills, and skills become the operating layer of businesses that run on AI. He’s been coding in conversations since before he had a name for it. Now he does.


    Sources

    1. McLuhan, M. (1964). Understanding Media: The Extensions of Man. McGraw-Hill.

    2. Lessig, L. (2000). “Code Is Law: On Liberty in Cyberspace.” Harvard Magazine.

    3. Humby, C. (2006). “Data is the new oil.” Association of National Advertisers conference.

    4. Andreessen, M. (2011). “Why Software Is Eating the World.” Wall Street Journal.

    5. Karpathy, A. (2023). “The hottest new programming language is English.” X/Twitter.

    6. Reidenberg, J. (1998). “Lex Informatica.” Texas Law Review.

    7. arXiv:2512.13914 (2025). “Conversational Versioning Systems.”

    8. Stigler, S. (1980). “Stigler’s Law of Eponymy.” Transactions of the New York Academy of Sciences.

    9. Nelson, T. (1960). Project Xanadu.

    10. Ram, K. (2013). “Git can facilitate greater reproducibility and increased transparency in science.” Source Code for Biology and Medicine.

  • Claude Fable 5 Pricing and Access (2026)

    Claude Fable 5 Pricing and Access (2026)

    Last verified: June 13, 2026

    Claude Fable 5 (claude-fable-5) is Anthropic’s most capable widely released model, built for the most demanding reasoning and long-horizon agentic work. On the Claude API it is priced at $10 per million input tokens and $50 per million output tokens — double the rate of Claude Opus 4.8 — with a 1M-token context window and up to 128K output tokens per request. It reached general availability on June 9, 2026. The verified pricing and access details are below.

    Pricing at a glance

    All figures below are from Anthropic’s official pricing and models pages. Prices are in USD per million tokens (MTok). Fable 5 includes the full 1M-token context window at standard pricing — there is no long-context premium.

    Item Claude Fable 5
    Model ID (API) claude-fable-5
    Base input $10 / MTok
    Output $50 / MTok
    5-minute cache write $12.50 / MTok
    1-hour cache write $20 / MTok
    Cache hit / read $1 / MTok
    Batch API input / output $5 / MTok · $25 / MTok
    Context window 1M tokens
    Max output 128K tokens

    How Fable 5 compares to Opus, Sonnet, and Haiku

    Fable 5 sits at the top of Anthropic’s lineup, a tier above the Opus models. The per-token cost difference is the clearest way to see where it fits.

    Model Input $/MTok Output $/MTok Context Max output
    Claude Fable 5 $10 $50 1M 128K
    Claude Opus 4.8 $5 $25 1M 128K
    Claude Sonnet 4.6 $3 $15 1M 64K
    Claude Haiku 4.5 $1 $5 200K 64K

    Where you can use Fable 5

    At general availability, Fable 5 is offered across Anthropic’s first-party API and all major cloud platforms, plus claude.ai subscription plans (subject to the access note below). The model IDs differ by platform.

    Surface Availability / model ID
    Claude API (first-party) Generally available — claude-fable-5
    Claude Platform on AWS Generally available — claude-fable-5
    Amazon Bedrock Generally available — anthropic.claude-fable-5
    Google Vertex AI Generally available — claude-fable-5
    Microsoft Foundry Generally available
    claude.ai — Pro, Max, Team, Enterprise Promotional access June 9–22, 2026 (see below)
    claude.ai — Free plan Not included

    Consumer-plan access and the promotional window

    For claude.ai subscribers, Anthropic launched Fable 5 with a time-limited promotion rather than a permanent plan inclusion. From June 9 through June 22, 2026, Fable 5 was included on the Pro, Max, Team, and seat-based Enterprise plans at no extra charge. During that window, Anthropic’s documentation states that Fable 5 usage “counts toward your plan’s usage limits, and you won’t be charged anything extra,” but that it draws from those limits “at a higher rate than other models.” The Free plan was explicitly excluded.

    Anthropic’s announced plan was that after June 22, 2026, Fable 5 would no longer be included in plan usage limits, and continued use on claude.ai would require usage credits — a pay-as-you-go balance for usage beyond what a plan includes.

    Integration notes that affect cost and handling

    Fable 5 differs from the Opus, Sonnet, and Haiku models in a few ways that matter when you wire it into an application. It ships with safety classifiers that can decline a request: when that happens, the Messages API returns stop_reason: "refusal" as a successful HTTP 200 response, not an error. You are not billed for a request that is refused before any output is generated, and Anthropic provides server-side, client-side, and manual fallback paths to retry on another Claude model. Adaptive thinking is always on (thinking: {"type": "disabled"} is not supported), and the raw chain of thought is never returned — thinking.display controls whether thinking blocks contain a summary or are empty. Fable 5 also uses the tokenizer introduced with Opus 4.7, which can produce roughly 30–35% more tokens for the same text than older models, so re-baseline your token counts rather than assuming parity with earlier Claude models.

    How much does Claude Fable 5 cost?

    On the Claude API, Fable 5 costs $10 per million input tokens and $50 per million output tokens. Prompt-cache writes are $12.50/MTok (5-minute) or $20/MTok (1-hour), cache reads are $1/MTok, and the Batch API halves the rate to $5/MTok input and $25/MTok output.

    Is Fable 5 more expensive than Claude Opus 4.8?

    Yes. Fable 5 is priced at exactly double Opus 4.8 on both input ($10 vs $5 per MTok) and output ($50 vs $25 per MTok). Both share a 1M-token context window and 128K max output.

    Which claude.ai plans include Fable 5?

    From June 9 to June 22, 2026, Fable 5 was included on the Pro, Max, Team, and seat-based Enterprise plans at no extra cost, drawing from plan usage limits at a higher rate. The Free plan was not included. Anthropic’s plan was to move continued claude.ai use to usage credits after June 22.

    What is the difference between Fable 5 and Mythos 5?

    They share the same specs ($10/$50 per MTok, 1M context, 128K output) and June 9, 2026 launch date. Fable 5 is the generally available model with built-in safety classifiers that can decline requests; Mythos 5 is offered only in limited availability.


  • Claude Message Batches API: 50% Pricing, Limits and How It Works (2026)

    Claude Message Batches API: 50% Pricing, Limits and How It Works (2026)

    Last verified: June 13, 2026

    The Message Batches API lets you submit up to 100,000 Claude requests in a single call and receive results asynchronously — at exactly 50% of standard token prices. Most batches finish in under an hour. Results remain downloadable for 29 days. This page covers every verified limit, the per-tier rate limit tables, and how batch pricing stacks with prompt caching.

    Pricing: 50% off standard rates

    Every token processed through the Message Batches API is billed at half the standard input and output price. No quality difference from synchronous requests — only timing. The table below shows verified batch prices for active models.

    Model Batch input (per MTok) Batch output (per MTok) Standard input (per MTok) Standard output (per MTok)
    Claude Fable 5 $5.00 $25.00 $10.00 $50.00
    Claude Opus 4.8 $2.50 $12.50 $5.00 $25.00
    Claude Opus 4.7 $2.50 $12.50 $5.00 $25.00
    Claude Opus 4.6 $2.50 $12.50 $5.00 $25.00
    Claude Opus 4.5 $2.50 $12.50 $5.00 $25.00
    Claude Sonnet 4.6 $1.50 $7.50 $3.00 $15.00
    Claude Sonnet 4.5 $1.50 $7.50 $3.00 $15.00
    Claude Haiku 4.5 $0.50 $2.50 $1.00 $5.00

    Source: platform.claude.com/docs/en/build-with-claude/batch-processing

    Key limits at a glance

    Limit Value
    Maximum requests per batch 100,000
    Maximum batch payload size 256 MB
    Typical completion time Under 1 hour
    Hard expiration window 24 hours from creation
    Result retention period 29 days after creation
    Zero Data Retention eligible No
    Results format JSONL, streamed via results_url
    Supported models All active Claude models

    A batch expires if processing has not completed within 24 hours. Any individual request within that batch that did not finish is marked expired — you are not billed for expired or errored requests. Batch results (the JSONL file) are accessible for download for 29 days after the batch was created; after that the batch object itself is still visible but results can no longer be downloaded.

    Message Batches API rate limits by tier

    The Message Batches API has its own rate-limit pool, shared across all models, separate from the standard Messages API limits. The “processing queue” count refers to individual batch requests (not batches) that have been submitted but not yet completed by the model.

    Tier RPM (API calls) Max batch requests in processing queue Max batch requests per batch
    Tier 1 50 100,000 100,000
    Tier 2 1,000 200,000 100,000
    Tier 3 2,000 300,000 100,000
    Tier 4 4,000 500,000 100,000

    Source: platform.claude.com/docs/en/api/rate-limits

    RPM here limits how fast you can make HTTP requests to the Batches API endpoints (create, retrieve, list, cancel). It does not limit how many individual requests inside a batch are processed per minute — that is governed by the queue cap above. If high demand causes processing to slow, more individual requests within a batch may reach the 24-hour expiration limit.

    Stacking batch pricing with prompt caching

    The Batches API documentation explicitly states that the 50% batch discount and prompt caching discounts stack. Cache writes incur a one-time cost at 1.25x the base input rate (5-minute TTL) or 2x (1-hour TTL); subsequent cache reads cost 0.1x the base input rate. Because batches process asynchronously and may take longer than 5 minutes, Anthropic recommends using the 1-hour cache duration for batch requests that share large context.

    The following example uses Claude Opus 4.8 (standard input: $5.00/MTok) to show what each token type costs in a batch with a 1-hour cached system prompt.

    Token type Multiplier applied Effective price per MTok How calculated
    Uncached input (standard) 1x $5.00 Baseline
    Uncached input (batch) 0.5x $2.50 50% batch discount
    Cache write — 1h TTL (batch) 2x × 0.5x = 1x $5.00 2x write cost, then 50% batch
    Cache read (batch) 0.1x × 0.5x = 0.05x $0.25 10% read cost, then 50% batch
    Output (batch) 0.5x of $25.00 $12.50 50% batch discount on output

    In practice: if you cache a 50,000-token system prompt once and then read it across 1,000 batch requests, the cache write costs $0.25 (50K tokens at $5.00/MTok effective), while 1,000 cache reads cost $12.50 total (50M tokens at $0.25/MTok). The same 50 million tokens without caching would cost $125 in batch input (50 MTok at the $2.50/MTok batch rate). Cache hit rates on batches vary; Anthropic’s documentation notes typical rates of 30% to 98% depending on traffic patterns, since batch requests are processed concurrently rather than sequentially.

    How results come back

    When the batch finishes (or the 24-hour limit is reached), a results_url property is set on the batch object. Results are in JSONL format — one JSON object per line, in any order (not necessarily matching submission order). Each result carries the custom_id you assigned, plus a result object of type succeeded, errored, canceled, or expired. Streaming the results file rather than downloading it all at once is recommended for large batches. You are not billed for errored, canceled, or expired requests.

    Does the Batches API count against my standard Messages API rate limits?

    No. The Message Batches API has its own rate-limit pool that is tracked separately from the standard Messages API RPM, ITPM, and OTPM limits. You can use both simultaneously up to their respective limits.

    What happens if my batch does not finish within 24 hours?

    Any individual requests within the batch that did not complete are marked expired. You are not billed for those requests. The batch itself moves to ended status and whatever results did complete are available at the results_url.

    Can I use extended thinking, tool use, or vision in a batch?

    Yes. The Batches API supports vision, tool use (including server tools such as web search and code execution), system messages, multi-turn conversations, and extended thinking. The parameters not supported are stream: true, fast mode (speed), Threads parameters, and max_tokens: 0.

    How long are batch results available for download?

    Results are available for 29 days after the batch was created. After that window, the batch object remains visible in the Console and via the API, but the results file can no longer be downloaded.

    Is the Batches API eligible for Zero Data Retention?

    No. The Message Batches API is explicitly excluded from Zero Data Retention (ZDR). Data is retained under the feature’s standard retention policy regardless of your organization’s ZDR settings.

  • How Many Words Is a Million Claude Tokens? (2026) — and How the New Tokenizer Changed the Math

    How Many Words Is a Million Claude Tokens? (2026) — and How the New Tokenizer Changed the Math

    Last verified: June 13, 2026

    A million Claude tokens equals roughly 750,000 words on Claude Sonnet 4.6 — but only about 555,000 words on Claude Opus 4.7, Claude Opus 4.8, and Claude Fable 5. The gap comes from a new tokenizer that Anthropic introduced with Opus 4.7: it emits up to 35% more tokens from the same text. The only reliable way to measure your actual token count is the /v1/messages/count_tokens endpoint.

    Token-to-word conversion by model (1 million tokens)

    Anthropic publishes word equivalents directly in the context-window tooltips on the official models overview page. The figures below come from those tooltips.

    Model Tokenizer Context window ~Words per 1M tokens ~Pages per 1M tokens*
    Claude Fable 5 (claude-fable-5) New (Opus 4.7) 1M tokens ~555,000 ~2,200
    Claude Opus 4.8 (claude-opus-4-8) New (Opus 4.7) 1M tokens ~555,000 ~2,200
    Claude Opus 4.7 (claude-opus-4-7) New (Opus 4.7) 1M tokens ~555,000 ~2,200
    Claude Sonnet 4.6 (claude-sonnet-4-6) Older 1M tokens ~750,000 ~3,000
    Claude Haiku 4.5 (claude-haiku-4-5) Older 200k tokens ~150,000 (200K context) ~600 (200K context)
    Claude Opus 4.6 (claude-opus-4-6) Older 1M tokens ~750,000 ~3,000

    * Pages estimated at ~250 words per double-spaced page. These are approximations for typical English prose; actual counts vary by content type.

    What the new tokenizer changed — and why it matters

    Anthropic introduced a new tokenizer with Claude Opus 4.7. The official migration guide states that the new tokenizer “may use roughly 1x to 1.35x as many tokens when processing text compared to previous models (up to ~35% more, varying by content).” The most commonly cited figure across Anthropic’s documentation is roughly 30% more tokens for the same text.

    The practical effect: a document that costs 1,000,000 tokens on Opus 4.6 or Sonnet 4.6 costs approximately 1,300,000 tokens on Opus 4.7, Opus 4.8, or Fable 5. Budgets built for the old tokenizer need to be re-baselined against the new one.

    Tokenizer Models Approximate token increase vs. older tokenizer
    New (introduced Opus 4.7) Opus 4.7, Opus 4.8, Fable 5, Mythos 5 ~30% typical; up to ~35% depending on content
    Older Opus 4.6, Sonnet 4.6, Haiku 4.5, Opus 4.5, Sonnet 4.5 Baseline

    The token counting page also notes the comparison directly: “Claude Fable 5 and Claude Mythos 5 use the tokenizer introduced with Claude Opus 4.7, which produces roughly 30% more tokens than models before Claude Opus 4.7 for the same text.”

    Use count_tokens — not tiktoken or ratio math

    Anthropic’s migration guide explicitly flags the risk: “Any code path that estimates tokens client-side or assumes a fixed token-to-character ratio should be re-tested against Claude Opus 4.7.” OpenAI’s tiktoken library is trained on a different vocabulary and produces different counts. It will not give accurate results for any Claude model.

    The correct approach is the /v1/messages/count_tokens endpoint, passing the specific model you intend to use:

    curl https://api.anthropic.com/v1/messages/count_tokens \
      --header "x-api-key: $ANTHROPIC_API_KEY" \
      --header "content-type: application/json" \
      --header "anthropic-version: 2023-06-01" \
      --data '{
        "model": "claude-opus-4-8",
        "messages": [{"role": "user", "content": "Your text here"}]
      }'

    The endpoint returns a model-specific count. If you are migrating a workload from Sonnet 4.6 to Opus 4.8, count the same prompt with both model IDs and compare the two input_tokens values. The token counting endpoint is free to use (rate limits apply by usage tier). Anthropic notes that the returned count is an estimate; the actual count at inference time may differ by a small amount.

    Quick reference: common document sizes

    Document type Approx. words Tokens (older tokenizer) Tokens (new tokenizer)
    Novel (~400 pages) ~100,000 ~133,000 ~173,000
    Long research paper ~20,000 ~27,000 ~35,000
    Full context, Sonnet 4.6 (1M tokens) ~750,000 1,000,000 N/A (different model)
    Full context, Opus 4.8 (1M tokens) ~555,000 N/A (different model) 1,000,000

    These word estimates assume typical English prose. Code, structured data, and non-Latin scripts tokenize differently from natural language prose. Highly repetitive text and dense symbol-heavy content (like JSON or code) can fall well outside the ~0.75 words-per-token ratio.

    Does the new tokenizer change what fits in the context window?

    Yes, in one direction. The context window is still 1M tokens, but that window holds fewer words on the new tokenizer (~555k words) than on the old one (~750k words). A document that previously fit comfortably may now require trimming or chunking when moving to Opus 4.7, Opus 4.8, or Fable 5.

    Does Sonnet 4.6 use the new tokenizer?

    No. Claude Sonnet 4.6 uses the older tokenizer. Anthropic’s model overview page lists Sonnet 4.6’s 1M-token context window as equivalent to ~750k words, the same ratio as Opus 4.6 — confirming it has not adopted the Opus 4.7 tokenizer. Only Opus 4.7, Opus 4.8, Fable 5, and Mythos 5 use the new tokenizer.

    Can I use tiktoken or another open-source tokenizer for Claude?

    No. tiktoken is built for OpenAI models and uses a different vocabulary. It will not produce accurate token counts for any Claude model, and its error will be larger on the new Opus 4.7 tokenizer than on older Claude models. Use /v1/messages/count_tokens with the specific Claude model ID you plan to deploy.

    Does the new tokenizer affect pricing?

    Yes. Billing reflects token counts under the model’s tokenizer. If you migrate a workload from Opus 4.6 to Opus 4.8 and the new tokenizer produces 30% more tokens, your input token costs increase by roughly 30% before accounting for any per-token price difference between the models. Re-baseline cost estimates using the count_tokens endpoint rather than scaling from old measurements.

    How many pages is the full 1M-token context window?

    On models with the older tokenizer (Sonnet 4.6, Opus 4.6), 1 million tokens is approximately 3,000 double-spaced pages of typical English prose. On models with the new tokenizer (Opus 4.8, Fable 5), the same 1 million tokens holds approximately 2,200 pages. These are prose estimates — a 1M-token window filled with source code or dense structured data will span a very different page count.

  • Claude Cowork vs Code vs Agent SDK vs Managed Agents (2026)

    Claude Cowork vs Code vs Agent SDK vs Managed Agents (2026)

    Last verified: June 13, 2026

    Anthropic ships four distinct ways to put Claude to work as an agent, and they are easy to confuse. The short version: Claude Cowork and Claude Code are interactive products billed through your Claude subscription — Cowork for knowledge work in the desktop app, Code for software work in your terminal, IDE, desktop, or browser. The Claude Agent SDK and Managed Agents are programmatic surfaces for developers, billed through the API: the Agent SDK is a Python/TypeScript library that runs the agent loop inside your own process, while Managed Agents is a REST API where Anthropic runs the loop and hosts the sandbox. The tables below give the verified, side-by-side breakdown.

    The decision matrix

    Each row is one surface. Read across for who it serves, whether you drive it turn-by-turn or hand it a goal, where the work executes, and how it is paid for.

    Surface Who it is for Interactive vs autonomous Where it runs How it is billed
    Claude Cowork Knowledge workers (non-developers) — research, documents, file and spreadsheet work Interactive, supervised — shows you the plan and waits for your approval before acting The Claude desktop app on your own computer (macOS or Windows); not available on web or mobile Claude subscription (Pro, Max, Team, Enterprise) — draws from your plan’s usage allocation
    Claude Code Developers doing interactive coding — build features, fix bugs, automate dev tasks Interactive — you drive it in a session, though it can run agentically across files and tools Your machine (terminal, VS Code, JetBrains, desktop app) or the browser at claude.ai/code Claude subscription or an Anthropic Console (API) account
    Claude Agent SDK Developers building custom agents programmatically (Python or TypeScript) Autonomous — Claude reads files, runs commands, and edits code on its own via the agent loop Your own process and infrastructure API key (pay-as-you-go credits); see the subscription note below for the June 15, 2026 change
    Managed Agents Developers running production or long-running agents without operating their own sandbox/session infrastructure Autonomous — you send events, Claude executes tools and streams back results Anthropic-managed cloud sandbox per session (or a self-hosted sandbox on your own infrastructure) Claude API key + the managed-agents-2026-04-01 beta header (no subscription path)

    Where billing actually differs

    The cleanest way to split these four is by the wallet they draw from. The two interactive products are funded by a subscription; the two programmatic surfaces are funded by the API. This is the single distinction that trips people up most often, so it is worth stating plainly in its own table.

    Surface Billing model Notes
    Claude Cowork Subscription Included on Pro, Max, Team, and Enterprise. Multi-step tasks consume more of your usage allocation than chatting.
    Claude Code Subscription or API Most surfaces require a Claude subscription or a Console account; the terminal CLI and VS Code also support third-party providers.
    Claude Agent SDK API (pay-as-you-go) Authenticated with an ANTHROPIC_API_KEY; also supports Bedrock, Claude Platform on AWS, Vertex AI, and Azure. Anthropic does not permit claude.ai login for third-party agents built on the SDK.
    Managed Agents API (credits) Requires a Claude API key and the beta header; enabled by default for API accounts.

    One dated nuance is worth pinning down because it changes how subscription users pay for programmatic work. Starting June 15, 2026, Claude Agent SDK and claude -p usage on subscription plans no longer counts toward your Claude plan’s interactive usage limits; instead, eligible subscribers receive a separate monthly Agent SDK credit (per-user, not pooled), while subscription usage limits stay reserved for interactive use of Claude Code, Cowork, and Claude. If you use the Agent SDK with an API key from the Claude Platform, nothing changes — pay-as-you-go billing continues and you do not receive an Agent SDK monthly credit.

    SDK vs Managed Agents: the programmatic split

    Both programmatic surfaces let Claude run tools autonomously, but they differ in where the loop and the work live. Anthropic’s own comparison frames it this way: the Agent SDK “is a library that runs the agent loop inside your own process,” while Managed Agents “is a hosted REST API: Anthropic runs the agent and the sandbox, and your application sends events and streams back results.” Pick by who you want operating the infrastructure.

    Dimension Agent SDK Managed Agents
    Runs in Your process, your infrastructure Anthropic-managed infrastructure
    Interface Python or TypeScript library REST API
    Agent works on Files on your infrastructure A managed sandbox per session
    Session state JSONL on your filesystem Anthropic-hosted event log
    Best for Local prototyping; agents that work directly on your filesystem and services Production agents without operating sandbox/session infrastructure; long-running, asynchronous sessions

    A common path, per Anthropic’s docs, is to prototype with the Agent SDK locally, then move to Managed Agents for production.

    Quick chooser

    If you are not writing code and want Claude to finish a task on your computer, use Cowork. If you are a developer working interactively on a codebase, use Claude Code. If you are building your own agent and want it to run in your own process, use the Agent SDK. If you want Anthropic to run the agent and host the sandbox for long-running or production work, use Managed Agents.

    Is Claude Cowork the same as Claude Code?

    No. Both appear in the Claude desktop app, but Cowork is aimed at knowledge work (research, documents, spreadsheets, file management) for non-developers, while Claude Code is an agentic coding tool. Cowork runs only in the desktop app (macOS or Windows); Claude Code also runs in the terminal, VS Code, JetBrains, and the browser.

    Does a Claude subscription cover the Agent SDK or Managed Agents?

    Cowork and Claude Code are included with Claude subscriptions (Pro, Max, Team, Enterprise). The Agent SDK and Managed Agents are API surfaces authenticated with a Claude API key. As of June 15, 2026, subscription users do get a separate monthly Agent SDK credit for SDK and claude -p usage, but Managed Agents has no subscription path — it requires an API key and a beta header.

    Where does the work actually execute for each surface?

    Cowork runs on your own computer in the desktop app. Claude Code runs on your machine (or in the browser). The Agent SDK runs in your own process and infrastructure. Managed Agents executes in an Anthropic-managed cloud sandbox per session, or a self-hosted sandbox you control.

    Is the Agent SDK built on Claude Code?

    Yes. Per Anthropic, the Agent SDK “gives you the same tools, agent loop, and context management that power Claude Code, programmable in Python and TypeScript.” Anthropic also describes it as “Claude Code as a library.”

    Is Managed Agents generally available?

    No. As of June 13, 2026, Claude Managed Agents is in beta. Every Managed Agents endpoint requires the managed-agents-2026-04-01 beta header (the SDK sets it automatically), and access is enabled by default for API accounts.


  • Claude Enterprise Compliance: BAA, SOC 2, GDPR and Data Policy (2026)

    Claude Enterprise Compliance: BAA, SOC 2, GDPR and Data Policy (2026)

    Last verified: June 13, 2026

    Anthropic publishes a defined compliance posture for Claude: it holds SOC 2 Type I and Type II, ISO 27001:2022, and ISO/IEC 42001:2023 credentials; it will sign a Business Associate Agreement (BAA) covering HIPAA-ready services such as the first-party API and Enterprise plans; by default it does not train models on data sent under its commercial terms; and it offers a zero-data-retention (ZDR) arrangement on the Messages and Token Counting APIs. The hard part for buyers is the per-surface boundary — what the BAA covers, which features are blocked under ZDR or HIPAA, how long data is kept, and where it can be processed. Every figure below is drawn from Anthropic’s own trust, privacy, and developer documentation, with sources at the bottom. Eligibility, feature lists, and durations change; treat your signed contract and the live Trust Center as the controlling sources.

    Certifications and attestations

    Anthropic’s help center lists the following compliance credentials for its commercial products (Claude for Work and the Anthropic API). It directs customers to the Trust Portal at trust.anthropic.com to request copies of the underlying reports and certificates.

    Credential Status as described by Anthropic Scope
    SOC 2 Type I & Type II Listed as held Commercial products (Claude for Work, Anthropic API)
    ISO 27001:2022 Certified Information Security Management
    ISO/IEC 42001:2023 Certified (issued by Schellman Compliance, LLC, accredited by the ANSI National Accreditation Board) AI Management Systems
    HIPAA “HIPAA-ready configuration (BAA available)” See BAA section

    Anthropic describes itself as “one of the first frontier AI labs” to achieve ISO/IEC 42001:2023 certification, in an announcement dated January 13, 2025. The help-center certifications list does not mention ISO 27017, ISO 27018, FedRAMP, or CSA STAR; those are left out here rather than asserted. GDPR and CCPA are handled through Anthropic’s privacy program and customer agreements rather than as line-item “certifications” (see GDPR section).

    HIPAA and the BAA: covered by product surface

    Anthropic states it “provides a Business Associate Agreement (BAA) covering our HIPAA-ready services, such as use of our first-party API or Enterprise plans.” HIPAA readiness is enforced at the organization level: Anthropic provisions a dedicated HIPAA-enabled organization that automatically blocks non-eligible features. To process protected health information (PHI) on the API, an administrator must sign the BAA and contact sales to enable it; for Enterprise, an admin activates HIPAA compliance in the Claude Enterprise admin settings under “Data & Privacy” and signs the BAA there.

    Surface BAA / HIPAA-ready coverage
    First-party Claude API (Messages API) Covered as an Eligible Service (admin signs BAA, then contact sales)
    Claude Enterprise Covered once an admin activates HIPAA compliance and signs the BAA
    Workbench and Console Not covered
    Claude Free, Pro, Max, Team Not covered
    Cowork Not covered
    Claude Code Not covered under HIPAA readiness
    Amazon Bedrock / Vertex AI Not covered (cloud provider is the data processor; see those platforms)
    Claude Platform on AWS / Microsoft Foundry HIPAA readiness not available
    Beta features (e.g., Claude in Office, Claude Design) Generally not covered unless explicitly listed as eligible

    Within the API, only a subset of features is HIPAA-eligible. Anthropic enforces this in code: a HIPAA-enabled organization that sends a non-eligible feature gets a 400 invalid_request_error naming the blocked feature. Anthropic states your signed BAA is the official source of truth for what is covered.

    API feature HIPAA-eligible
    Messages API (/v1/messages) Yes
    Token counting Yes
    Web search Yes (dynamic filtering not eligible)
    Prompt caching, structured outputs, extended/adaptive thinking, citations, 1M context, PDF (inline), data residency, effort, fast mode, bash & text-editor tools, memory tool Yes
    Web fetch, computer use, advisor tool, context management (compaction / editing), tool search, cache diagnostics No
    Code execution, programmatic tool calling No
    Batch API, Files API, Agent Skills, MCP connector, Claude Managed Agents, MCP tunnels No

    PHI must appear only in message content, attached files, or related file names/metadata — never in JSON schema definitions (property names, enum/const values, or pattern regexes), because compiled schemas are cached separately and do not receive the same PHI protections. Anthropic notes workspace names, user contact details, billing data, and support tickets are not expected to contain PHI under the BAA.

    Data retention (commercial default)

    Under Anthropic’s commercial data retention policy, conversation content is not retained by default for the API, and API inputs and outputs are automatically deleted on the backend within 30 days of receipt or generation. For interface products such as Claude for Work, data persists until you delete it, after which it is removed from backend storage within 30 days. Two exceptions extend retention regardless of arrangement.

    Data type / event Retention
    API inputs and outputs (default) Auto-deleted within 30 days
    Deleted conversation content (Claude for Work) Removed from backend within 30 days
    Inputs/outputs for a chat flagged as a Usage Policy violation Up to 2 years
    Trust & safety classification scores (flagged chat) Up to 7 years
    Data tied to feedback you submit (thumbs up/down, bug report) 5 years

    Zero data retention (ZDR)

    With a ZDR arrangement, customer data is not stored at rest after the API response is returned, except where needed to comply with law or combat misuse. ZDR is requested through Anthropic sales and enabled per organization — it does not carry over automatically to new organizations under the same account. Even under ZDR, Anthropic retains User Safety classifier results, and may retain inputs and outputs for up to 2 years if a chat or session is flagged for a Usage Policy violation. CORS is not supported for ZDR organizations, so browser apps must call through a backend proxy.

    Surface ZDR coverage
    Claude Messages API & Token Counting API Eligible
    Claude Code (Commercial org API keys, or via Claude Enterprise with ZDR enabled) Eligible
    Console and Workbench Not eligible
    Claude Teams & Claude Enterprise interfaces Not eligible (except Claude Code via Enterprise with ZDR on)
    Claude Free, Pro, Max Not eligible
    Claude Managed Agents Not eligible (stateful; delete transcripts manually)
    Batch API, Files API, code execution, Agent Skills, MCP connector Not eligible
    Third-party integrations Not eligible

    A handful of ZDR-eligible features are marked “Yes (qualified)” — structured outputs and cache diagnostics — meaning Anthropic retains a narrow, documented set of technical data (for example, a cached JSON schema for up to 24 hours since last use) rather than your prompts or Claude’s outputs.

    Model-training policy and Covered Models

    Anthropic’s Privacy Policy states it does not apply to content processed on behalf of business customers; that data is governed by the customer agreement. For the API specifically, Anthropic states retained data is never used for model training without your express permission. Anthropic’s consumer-terms update confirms the data-use changes “do not apply to services under our Commercial Terms,” including Claude for Work, Claude for Government, Claude for Education, and API use (including via Amazon Bedrock and Google Cloud’s Vertex AI). Training on commercial data happens only if a customer explicitly opts in (for example, the Development Partner Program).

    One model-specific exception affects retention, not training: Claude Fable 5 and Claude Mythos 5 are designated Covered Models and require 30-day data retention. ZDR is not available for these two models; a request to either from an organization whose retention configuration doesn’t meet the requirement returns a 400 invalid_request_error. Organizations with ZDR can turn on 30-day retention for a single workspace (Console > Settings > Workspaces > Privacy controls) to use those models there while keeping ZDR elsewhere. On Bedrock, Vertex AI, and Microsoft Foundry, retention requirements for these models are set by each platform.

    GDPR, data residency, and international transfers

    For users in the EEA, UK, or Switzerland, the data controller is Anthropic Ireland, Limited; elsewhere it is Anthropic PBC. Where the EU or UK GDPR applies, Anthropic responds to verifiable data-subject requests within one calendar month. For transfers to countries without an adequacy decision, Anthropic relies on standard contractual clauses, and publishes its subprocessors at anthropic.com/subprocessors.

    On data residency, the Claude API exposes two independent controls. inference_geo sets where inference runs per request — values are "global" (default) or "us" — and is supported on Claude Opus 4.6, Sonnet 4.6, and later (older models return a 400). Workspace geo controls where data is stored at rest and where endpoint processing happens; it is set at workspace creation and cannot be changed afterward. Per Anthropic’s documentation, "us" is currently the only available workspace geo, and only "us" and "global" inference geos are available — so there is currently no EU-resident storage option at the workspace level. US-only inference is priced at 1.1x the standard rate on supported models. Data residency is available on the Claude API (first-party) and Claude Platform on AWS; on Bedrock and Vertex AI the region is set by the endpoint or inference profile.

    Does Anthropic train its models on my API or commercial data?

    No, not by default. Anthropic’s Privacy Policy excludes business-customer content (governed by your customer agreement), and for the API it states retained data is never used for training without your express permission. The consumer data-use changes explicitly do not apply to Commercial Terms services. Training on commercial data requires an explicit opt-in.

    Will Anthropic sign a BAA, and for what?

    Yes. Anthropic signs a BAA covering HIPAA-ready services such as the first-party API and Enterprise plans. The Messages API is covered as an Eligible Service. It does not cover Workbench/Console, Free/Pro/Max/Team, Cowork, Claude Code, or beta features unless explicitly listed. An admin must sign the BAA and enable HIPAA readiness; the organization then auto-blocks non-eligible features.

    What’s the difference between ZDR and HIPAA readiness?

    Per Anthropic, ZDR prevents customer data from being stored at rest after the API response. HIPAA readiness is a broader set of safeguards (encryption, access controls, audit logging) that protect PHI throughout its lifecycle and lets data be retained with safeguards rather than deleted immediately. Anthropic states you do not also need ZDR if you have HIPAA readiness.

    How long does Anthropic keep my data?

    By default, API inputs and outputs are auto-deleted within 30 days. If a chat is flagged as a Usage Policy violation, inputs/outputs may be retained up to 2 years and trust & safety classification scores up to 7 years. Data tied to feedback you submit is kept 5 years. ZDR removes the default at-rest storage but does not remove the law/misuse exceptions.

    Can I keep Claude inference and data in the EU?

    Not at rest currently. The API’s inference_geo can pin inference to "us" or run "global", but Anthropic’s documentation lists "us" as the only available workspace geo (storage region). EU/UK data-subject rights and standard contractual clauses apply regardless, but an EU storage-residency option is not currently offered at the workspace level per the docs verified here.