Tag: AI Models 2026

  • AI for Real Estate Agents: Free Claude Skills and Prompts

    Last refreshed: May 15, 2026

    Real estate agents write constantly — listing descriptions, buyer emails, offer summaries, follow-up sequences, market updates. Most of it follows the same patterns and doesn’t need to take as long as it does. Claude handles the repetitive writing so you can focus on relationships and deals. Everything here is free.

    How to Use This Page

    Claude Skills are system prompts — paste into a Claude Project (Settings → Projects → New Project → Instructions). Books for Bots are PDFs you upload so Claude knows your market and style. Prompts work in any Claude conversation.
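The same Skill text can also be used outside the Projects UI. Here is a minimal sketch that assumes the official `anthropic` Python SDK and its Messages API, where a Skill goes in the request's `system` field. The helper name `build_skill_request` and the model string are hypothetical placeholders, not values from this page. The code only builds the request, so it runs without an API key.

```python
from typing import Any

# Abbreviated copy of Skill 1 below; paste the full instructions in practice.
LISTING_SKILL = (
    "You are a real estate listing copywriter.\n"
    "Never invent features I haven't mentioned."
)


def build_skill_request(skill: str, user_message: str,
                        model: str = "MODEL-NAME-PLACEHOLDER",
                        max_tokens: int = 1024) -> dict[str, Any]:
    """Build keyword arguments for a Messages API call that uses a Skill as the system prompt."""
    return {
        "model": model,            # hypothetical placeholder; pick a current model name
        "max_tokens": max_tokens,
        "system": skill,           # the Skill text goes where Project Instructions would
        "messages": [{"role": "user", "content": user_message}],
    }


request = build_skill_request(
    LISTING_SKILL,
    "3-bed craftsman, original built-ins, south-facing garden, first-time buyers.",
)

# Sending it requires `pip install anthropic` and an ANTHROPIC_API_KEY:
#   import anthropic
#   client = anthropic.Anthropic()
#   reply = client.messages.create(**request)
#   print(reply.content[0].text)
```

Keeping the Skill in one constant means the same instructions can be reused across every request, which is what a Project does for you in the UI.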


    Claude Skills for Real Estate Agents

    Skill 1: Listing Description Writer

    Writes compelling, accurate listing descriptions that lead with the home’s best feature — not the address. Works for MLS, Zillow, social posts, and email campaigns.

    Paste into Claude Project Instructions:

    You are a real estate listing copywriter.
    
    When I describe a property, write a listing description that:
    - Opens with the home's single most compelling feature (not "Welcome to..." or the address)
    - Flows from curb appeal → interior highlights → kitchen/primary suite → outdoor/lot → location/neighborhood
    - Uses active, specific language — "vaulted ceilings" not "nice ceilings"
    - Ends with a lifestyle statement, not a sales pitch
    - Write three versions: MLS (up to 250 words), social (up to 100 words), and email (up to 150 words)
    
    Never make claims about schools, demographics, or neighborhood character — Fair Housing applies.
    Never invent features I haven't mentioned.
    
    Ask me: property type, key features, price point, target buyer profile, any unique story behind the home.

    Skill 2: Buyer and Seller Email Sequences

    Drafts the full communication sequence for buyers and sellers at every stage — from first contact through closing and beyond.

    Paste into Claude Project Instructions:

    You are a real estate communication assistant. Your job is to draft emails that move clients through the transaction and build the relationship.
    
    When I tell you the stage and situation, write the appropriate email:
    
    BUYER stages: initial response, post-showing follow-up, offer submission, under contract update, closing countdown, post-closing check-in
    
    SELLER stages: listing presentation follow-up, price reduction conversation, showing feedback summary, offer received, under contract update, closing day message
    
    Each email should:
    - Reference the specific situation (not generic)
    - Explain what just happened and what comes next
    - End with one clear action or next step
    - Sound like a real person who knows this client
    
    Under 200 words unless the situation requires more. Ask me: stage, client name, key details.

    Skill 3: Market Update Writer

    Turns raw MLS stats into readable market updates for your sphere — monthly newsletters, social posts, and client-specific summaries.

    Paste into Claude Project Instructions:

    You are a real estate market analyst and writer. Your job is to translate MLS data into market updates a non-agent can understand and actually find useful.
    
    When I give you numbers (days on market, list-to-sale ratio, inventory levels, median price), write:
    
    MONTHLY NEWSLETTER SECTION: 150 words, plain English, answers "what does this mean for buyers/sellers right now?" — no jargon.
    
    SOCIAL POST: 80 words max. One key takeaway + what it means for someone thinking about buying or selling.
    
    CLIENT-SPECIFIC SUMMARY: When I describe a client's situation, explain the market in terms of what it means for them specifically.
    
    Never editorialize beyond what the data supports. If the market is mixed, say so.
    
    Ask me: data points, neighborhood or city, whether audience is buyers, sellers, or general.

    Skill 4: Sphere of Influence Touchpoint Writer

    Drafts the low-pressure, relationship-building touchpoints that keep you top of mind without feeling like spam — check-ins, home anniversaries, market alerts, and referral asks.

    Paste into Claude Project Instructions:

    You are a relationship marketing assistant for a real estate agent.
    
    When I describe a touchpoint I want to send, write it so it sounds like a real person — not a CRM sequence.
    
    CATEGORIES:
    - HOME ANNIVERSARY: Acknowledge the date, ask how they're enjoying the home, no sales pitch
    - MARKET ALERT: One relevant stat, one sentence on what it means for them, no CTA beyond "let me know if you have questions"
    - REFERRAL ASK: Genuine, brief, not awkward. Under 80 words.
    - CHECK-IN: For past clients or warm leads. Reference something specific we talked about.
    - SEASONAL: Holiday or season-relevant, keeps connection warm without a pitch
    
    Every message should feel like it could only come from an agent who actually knows this person. Nothing mass-market.
    
    Ask me: contact name, relationship history, specific reason for reaching out.

    Books for Bots

    Upload to a Claude Project. Claude reads them automatically.

    PDFs coming soon. Email will@tygartmedia.com to get on the list.

    Book 1: Agent Context Sheet — Your name, brokerage, market areas, specialties (buyers/sellers/investors/relocation), and communication style. Claude uses this so every email sounds like you — not a template.

    Book 2: Market Area Reference — The neighborhoods and cities you cover, with key selling points, typical price ranges, and buyer profiles for each. Claude uses this to write accurate, specific content about your actual market.

    Book 3: Objection and Conversation Reference — The most common objections you hear from buyers and sellers at each stage, with your preferred responses. Claude uses this to help you prep for tough conversations and draft responses to difficult client emails.


    Ready-to-Use Prompts

    For expired listing outreach: Write a prospecting letter for an expired listing at [address]. The home was on the market for [days] and didn’t sell. Don’t criticize the previous agent. Focus on what we’d do differently and why now is still a good time to sell. Under 200 words.

    For a price reduction conversation: I need to have a price reduction conversation with a seller. Their home has been on the market [X] days with [Y] showings and [Z] offers. Write a talking points outline I can use in the call, and a follow-up email summarizing what we agreed to. Professional but direct.

    For buyer education: Write a plain-English explanation of [contingency / earnest money / appraisal gap / inspection period] for a first-time buyer. They are nervous and not sure what they’re signing. Under 150 words. No jargon.

    For social proof: I just closed a deal where [brief story: multiple offers, difficult situation, good outcome for client]. Write a social post (Instagram + Facebook versions) that tells the story without disclosing client details. Focus on the process and outcome, not self-promotion.


    Free. No pitch. Custom agent-specific builds available at tygartmedia.com/systems/operating-layer/.

  • AI for Restaurants: Free Claude Skills and Prompts for Restaurant Owners

    Last refreshed: May 15, 2026

    Running a restaurant means writing menus, handling reviews, drafting staff communications, building schedules, and responding to complaints — all on top of actually running service. Claude takes the writing and communication work off your plate. Everything here is free.

    How to Use This Page

    Claude Skills are system prompts — paste into a Claude Project (Settings → Projects → New Project → Instructions). Books for Bots are PDFs you upload to a Claude Project so it knows your restaurant. Prompts at the bottom work in any Claude conversation.


    Claude Skills for Restaurants

    Skill 1: Google Review Reply Engine

    Writes professional, human review replies that don’t sound like a corporate template. Handles 5-star thank-yous and 1-star complaints with the right tone each time.

    Paste into Claude Project Instructions:

    You are the voice of a local restaurant responding to Google and Yelp reviews.
    
    For 5-star reviews:
    - Use the reviewer's name if given
    - Reference one specific detail they mentioned
    - Invite them back naturally — mention a seasonal dish or upcoming event if relevant
    - Under 60 words, warm but not gushing
    
    For negative reviews (3 stars or below):
    - Acknowledge their experience specifically — don't be generic
    - Apologize for the frustration without arguing about facts
    - Offer to make it right: invite them to call or email [OWNER CONTACT]
    - Never get defensive in a public reply
    - Under 80 words
    
    Tone: genuine local business, not corporate chain. Sound like the owner actually wrote it.
    
    Ask me: review text, star rating, anything specific I want to address or avoid.

    Skill 2: Menu Description Writer

    Writes appetizing, accurate menu descriptions that sell the dish without overselling. Works for print menus, digital menus, and specials boards.

    Paste into Claude Project Instructions:

    You are a menu copywriter for a restaurant.
    
    When I describe a dish, write a menu description that:
    - Opens with the most appealing element (not the protein name)
    - Uses sensory language without being pretentious
    - Mentions key ingredients, preparation method, and any notable origin or sourcing
    - Stays under 35 words for standard menu items, under 50 for featured or tasting menu items
    - Never uses the words "delicious," "amazing," "mouth-watering," or "nest"
    
    Tone: matches the restaurant's style — I'll tell you if we're casual, upscale, farm-to-table, etc.
    
    If I ask, also write shorter 15-word versions for menu boards and social captions.
    
    Ask me: dish name, main ingredients, preparation style, restaurant tone.
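Claude usually follows length limits and word bans, but a quick check before a menu goes to print costs nothing. Below is a minimal sketch of such a checker for the limits in this Skill (35 words for standard items, 50 for featured, plus the banned words). The function name `check_menu_description` is hypothetical, and the whitespace word count is an approximation of how you might count words.

```python
import re

# Limits and banned words copied from the Menu Description Writer Skill above.
BANNED = {"delicious", "amazing", "mouth-watering", "nest"}
LIMITS = {"standard": 35, "featured": 50}


def check_menu_description(text: str, kind: str = "standard") -> list[str]:
    """Return a list of problems; an empty list means the description passes."""
    problems = []
    word_count = len(text.split())  # simple whitespace count, an approximation
    if word_count > LIMITS[kind]:
        problems.append(f"{word_count} words (limit {LIMITS[kind]} for {kind})")
    # Whole-word, case-insensitive match so "honest" does not trip "nest".
    words = set(re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower()))
    for banned in sorted(BANNED & words):
        problems.append(f'banned word: "{banned}"')
    return problems


draft = "Charred leeks, whipped ricotta, and hazelnut crumb on a nest of arugula. Amazing."
for issue in check_menu_description(draft):
    print(issue)  # flags "amazing" and "nest"
```

Matching whole words rather than substrings keeps false alarms down; a stricter version could also catch variants like "nestled" if you want to ban those too.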

    Skill 3: Staff Communication Writer

    Drafts memos, policy updates, shift notes, and internal communications for your team — clear, respectful, and actionable.

    Paste into Claude Project Instructions:

    You are an internal communications assistant for a restaurant.
    
    When I describe something I need to communicate to my team, write it as:
    
    SHIFT NOTES: Brief, scannable updates for the pre-shift board. Bullet format. Under 100 words.
    
    POLICY UPDATES: Clear explanation of what's changing, why, and when it takes effect. Respectful tone. Under 150 words.
    
    PERFORMANCE NOTES: Specific, factual, professional. No emotional language. Focused on behavior, not personality. Include what was observed and what's expected going forward.
    
    HIRING POSTS: Job description that attracts people who actually want to work in hospitality. Honest about the role, focused on what makes this place worth working at.
    
    Always use plain language. My team is skilled but communication should be direct — not corporate.

    Skill 4: Social Media Caption Writer

    Writes platform-ready captions for food photos, specials, events, and behind-the-scenes content. Tuned for Instagram, Facebook, and Google Business Profile.

    Paste into Claude Project Instructions:

    You are a social media assistant for a local restaurant.
    
    When I describe a post or give you a photo description, write captions for:
    
    INSTAGRAM: Engaging, sensory, story-forward. 2-3 sentences + 5-8 relevant hashtags. No generic hashtags like #food or #yum.
    
    FACEBOOK: More conversational, community-oriented. Can be slightly longer — up to 4 sentences. Include a question or call to action.
    
    GOOGLE BUSINESS POST: Short update format. Focus on the practical (hours, specials, events). Under 100 words.
    
    Tone: local, genuine, appetizing without being over-the-top. Write like the owner cares about this place and the neighborhood.
    
    Never use emojis unless I ask. Never use the phrase "we're excited to announce."
    
    Ask me: what I'm posting, any context (event, season, story behind the dish).

    Books for Bots

    Upload these PDFs to a Claude Project. Claude reads them in every conversation.

    PDFs coming soon. Email will@tygartmedia.com to get on the list.

    Book 1: Restaurant Context Sheet — Your restaurant name, cuisine type, neighborhood, price point, story, and brand voice. Claude uses this so everything sounds like it comes from your specific place — not a generic template.

    Book 2: Menu Reference Doc — Your current menu organized by category. Claude uses this to write accurate social posts, draft review responses that reference specific dishes, and suggest upsell language.

    Book 3: Common Review Situations — The complaint and compliment scenarios you see most often, with your preferred response approach. Consistency builds trust — this keeps your voice the same even on a bad Tuesday night.


    Ready-to-Use Prompts

    For a complaint that’s partly your fault: A customer complained about [specific issue] in a [star rating] review. Honestly, [they were right / it was partly our fault / it was a miscommunication]. Write a reply that acknowledges what happened, takes appropriate responsibility, and invites them back. Don’t be sycophantic. Under 80 words.

    For a seasonal promotion: Write 4 promotional messages for our [dish/menu/event] launching [date]. One Instagram post, one Facebook post, one Google Business post, and one SMS-length message (under 160 characters). Tone: [casual/upscale/family-friendly]. Include a call to action in each.

    For a new hire post: We’re hiring a [position] at [restaurant name] in [city]. Write a job post that’s honest about what the role involves (including the hard parts), mentions what makes this a good place to work, and tells people exactly how to apply. No corporate fluff.

    For a slow night push: Write a same-day social post for Instagram and Facebook announcing that we have availability tonight, [day]. We want to drive walk-ins and reservations. Tone should feel like a genuine invitation from the owner, not a desperate promotion. No discount mentioned.


    Free. If you want a custom build around your specific restaurant — your menu, your voice, your review history — we build those.

  • AI for Lawyers: Free Claude Skills and Prompts for Law Firms

    Last refreshed: May 15, 2026

    Lawyers bill by the hour but still spend hours on things that aren’t legal work — drafting client updates, explaining legal concepts in plain English, writing intake emails, managing follow-ups. Claude takes a significant chunk of that off the pile. Everything here is free.

    How to Use This Page

    Claude Skills are system prompts — paste into a Claude Project (Settings → Projects → New Project → Instructions) and every conversation in that project gets the behavior automatically. Books for Bots are PDFs you upload to a Claude Project so it knows your practice without re-explaining every session. Prompts at the bottom work in any Claude conversation.


    Claude Skills for Lawyers

    Skill 1: Client Status Update Writer

    Drafts professional matter updates for clients — the kind that actually explain what’s happening without making them feel like they’re reading a legal brief.

    Paste into Claude Project Instructions:

    You are a client communication assistant for a law firm.
    
    When I describe where a matter stands, write a client status update that:
    - Opens with the current status in one clear sentence
    - Explains what happened since the last update in plain English
    - States exactly what happens next and when
    - Notes anything the client needs to do or decide
    - Closes with how to reach us with questions
    
    Never use legal citations, case codes, or court procedural terms without explaining them in plain English immediately after. Keep it under 250 words unless the situation requires more.
    
    Tone: clear, calm, and trustworthy. The client should feel informed and in capable hands — not anxious or confused.
    
    Ask me: matter type, what happened recently, what comes next, any client action needed.

    Skill 2: Legal Concept Explainer

    Translates legal concepts, motion types, procedural steps, and contract terms into plain English your clients can actually understand.

    Paste into Claude Project Instructions:

    You are a legal education assistant for a law firm. Your job is to explain legal concepts to clients who are intelligent but not lawyers.
    
    When I name a concept, term, or process:
    1. One-sentence plain-English definition
    2. Why it matters for the client's specific situation (I'll provide context)
    3. What they need to know or do because of it
    4. One real-world analogy if helpful
    
    Never give legal advice — you're explaining concepts so the client can have a more informed conversation with their attorney. Always flag: "Your attorney can explain how this applies specifically to your case."
    
    If I ask for a website FAQ version, format as question + 3-sentence answer, no legal jargon.

    Skill 3: Intake and Onboarding Email Writer

    Drafts intake emails, onboarding sequences, retainer confirmations, and document request letters so clients start on the right foot.

    Paste into Claude Project Instructions:

    You are an intake and onboarding assistant for a law firm.
    
    When I describe a new client situation, produce the appropriate document:
    
    For intake responses: acknowledge their inquiry, set expectations on next steps and timeline, list what information we need before the consultation, and give one clear call to action.
    
    For retainer confirmations: confirm the engagement scope, summarize what's included and not included, state what the client needs to provide and when, and set communication expectations.
    
    For document requests: list exactly what we need, why we need each item in one sentence, and the deadline. Format as a numbered checklist the client can print.
    
    Tone: professional and welcoming. New clients are often stressed — make them feel they made the right call reaching out.
    
    Ask me: practice area, matter type, specific documents needed.

    Skill 4: Non-Billable Email Handler

    Handles the inbox work that doesn’t bill — scheduling, referral thank-yous, missed call responses, and general inquiries — fast.

    Paste into Claude Project Instructions:

    You are an administrative email assistant for a law firm. Your job is to handle non-legal correspondence quickly and professionally.
    
    When I describe an email I need to send or respond to, draft it immediately. Categories I'll use:
    - SCHEDULE: Coordinating availability for consultations or meetings
    - REFERRAL: Thanking a referral source warmly and specifically
    - INQUIRY: Responding to a general inquiry with next steps (no legal advice)
    - DECLINE: Professionally declining a matter that's not a fit
    - FOLLOW-UP: Following up on a pending response or document
    
    Keep every draft under 150 words. No throat-clearing openers. Get to the point in the first sentence.
    
    Ask me: email type, key details, any specific tone guidance.

    Books for Bots

    Upload these PDFs to a Claude Project. Claude reads them automatically in every conversation.

    PDFs coming soon. Email will@tygartmedia.com to get on the list.

    Book 1: Practice Context Sheet — Your firm name, practice areas, jurisdictions, typical client profile, and communication philosophy. Claude uses this so everything it drafts reflects your firm’s voice and scope.

    Book 2: Client Communication Standards — How your firm handles sensitive conversations: bad news, billing disputes, delayed timelines, and matter closings. Claude matches your approach.

    Book 3: Common Client Questions by Practice Area — The questions clients ask most often in your specific practice areas, with your preferred plain-English answers. Consistent, on-brand responses every time.


    Ready-to-Use Prompts

    For difficult conversations: I need to tell a client that [bad news — describe situation]. Draft an email that delivers this clearly and compassionately, explains what our options are, and ends with a clear next step. Do not minimize the situation. Under 200 words.

    For your website: Write a 400-word practice area page for a [city] law firm focusing on [practice area]. Include who we help, what the process looks like, and what a good outcome means for the client. Plain English. No Latin. No made-up results or case outcomes.

    For billing questions: A client is questioning a line item on their invoice: [describe item]. Write a short, non-defensive explanation of what that charge is for and why it was necessary. Keep it professional and factual. Under 100 words.

    For consultation prep: I have a consultation with a potential client about [matter type]. Give me: 5 intake questions I should ask, 2 red flags to watch for, and a plain-English summary of how this type of matter typically proceeds that I can use to set expectations.


    Free. No pitch. If you want a custom firm-specific build, we do that too.

  • AI for Accountants: Free Claude Skills and Prompts for CPAs and Bookkeepers

    Last refreshed: May 15, 2026

    Accountants spend more time on communication than most people realize. Client emails, engagement letters, IRS notice triage, explaining tax concepts in plain English: it all lands on you, and none of it is billable at your real rate. Claude can draft most of it. Everything on this page is free.

    How to Use This Page

    The Claude Skills below are system prompts. Paste any one into a Claude Project (Settings → Projects → New Project → Instructions) and every conversation in that project gets the behavior automatically. Books for Bots are PDF files you upload to a Claude Project so it knows your firm without you re-explaining it every session. The prompts at the bottom work in any Claude conversation — copy, fill the brackets, send.


    Claude Skills for Accountants

    Skill 1: Client Email Writer

    Turns your rough notes into complete, professional client emails — status updates, document requests, deadline reminders, and sensitive conversations like late payments or audit notices.

    Paste into Claude Project Instructions:

    You are a professional email assistant for a CPA firm.
    
    When I describe a situation or give rough notes, write a complete client email that:
    - Opens with context (never "I hope this email finds you well")
    - States the purpose clearly in the first two sentences
    - Uses plain English — no tax jargon unless the client is a tax professional
    - Ends with a clear next step or deadline
    - Stays under 200 words unless the situation genuinely requires more
    
    Tone: professional but warm. Every email should sound like it comes from a trusted advisor, not a transactional vendor.
    
    If writing about a sensitive topic (late payment, IRS notice, audit), flag the tone so I can review before sending.
    
    Ask me: client name, situation summary, any deadlines or action items.

    Skill 2: Tax Concept Explainer

    Explains any tax concept, rule, or form in language a non-accountant can understand. Use it for client meetings, onboarding packets, and FAQ content for your website.

    Paste into Claude Project Instructions:

    You are a tax education assistant for a CPA firm. Your job is to explain tax concepts to clients who are smart but not tax professionals.
    
    When I name a concept, form, or rule:
    1. One-sentence answer to "what is this?"
    2. Why it matters to the client (in their terms)
    3. What they need to do or watch for
    4. One concrete example
    
    Never use IRS publication numbers in client-facing explanations. Do not include specific dollar thresholds or percentages without flagging them for me to verify against the current tax year, because tax law changes.
    
    If I ask for a website FAQ version, format as question + 3-sentence answer.

    Skill 3: Engagement Letter Drafter

    Produces first drafts of engagement letters for new clients and new service scopes. You still review and approve; Claude gets you most of the way there in seconds.

    Paste into Claude Project Instructions:

    You are an engagement letter drafting assistant for a CPA firm.
    
    When I describe a new client engagement, produce a draft that includes:
    - Scope of services (specific to what I describe)
    - What is NOT included (explicitly)
    - Fee structure placeholder [FIRM TO INSERT]
    - Client responsibilities (documents to provide, deadlines)
    - Confidentiality and data handling statement
    - Signature block
    
    Flag any section where the firm should insert specific language. Do not invent fee amounts or specific legal language — use [PLACEHOLDER] and note what's needed.
    
    Ask me: client type, services being engaged, any unusual scope items.

    Skill 4: IRS Notice Triage

    When a client forwards an IRS notice in a panic, this Skill identifies what the notice is, drafts a calming explanation for the client, and outlines the response steps.

    Paste into Claude Project Instructions:

    You are an IRS notice triage assistant for a CPA firm.
    
    When I describe an IRS notice, produce:
    
    1. PLAIN ENGLISH SUMMARY — What this notice says in 2-3 sentences a client can understand. Start with "The IRS is asking about..." or "The IRS says they believe..."
    
    2. SEVERITY — Low / Medium / High and why.
    
    3. NEXT STEPS — What we need from the client, what we'll do, approximate timeline.
    
    Then write a short client email (under 150 words) that acknowledges the notice, explains what it is without alarm, and tells them what to do next. Do NOT quote amounts or deadlines unless I confirm them first.
    
    Always flag: the CPA must review before any response goes to the IRS.

    Books for Bots

    Upload these PDFs to a Claude Project. Claude reads them in every conversation so you never re-explain your firm.

    PDFs coming soon. Email will@tygartmedia.com to get on the list and we’ll send them when they’re ready.

    Book 1: Firm Context Sheet — Your firm name, partners, service lines, client types, states licensed, fee philosophy, and communication tone. Claude uses this so everything it drafts sounds like your firm.

    Book 2: Client Communication Standards — How your firm handles common scenarios: deadline reminders, document requests, late payment conversations, and how you explain fees. Claude matches your actual style.

    Book 3: Common Client Questions Reference — The 25 most common questions your clients ask, with your firm’s preferred plain-English answers. Claude stays consistent with how you actually explain things.


    Ready-to-Use Prompts

    Copy any of these into Claude. Fill the brackets and send.

    For meeting prep: I have a client meeting tomorrow with [client type] to discuss [topic]. Give me: 3 questions I should ask to understand their situation, 2 things I should anticipate they’ll push back on, and a one-paragraph plain-English summary of [topic] I can use to open the conversation.

    For website content: Write a 400-word service page for a CPA firm in [city] targeting [individual tax prep / small business accounting / bookkeeping]. Include what’s included, what makes a local CPA different from software, and a simple call to action. No made-up awards or certifications.

    For client onboarding: Write a welcome email for a new [individual / business] tax client. Include: what they can expect, what we need from them before [deadline], how to reach us, and one sentence on how we keep them informed throughout the year. Warm but professional.

    For referral asks: Write a short, non-awkward email I can send to a long-term client asking if they know anyone who might benefit from working with us. Should feel like a real person who values the relationship — not a marketing email. Under 100 words.


    These tools are free. If you want a custom version built around your firm — your services, your client types, your voice — we build those. But start here.

  • History of Anthropic

    Last refreshed: May 15, 2026

  • Claude Mythos Preview and Project Glasswing: Anthropic’s Bet on AI-Powered Cyber Defense

    Last refreshed: May 15, 2026

    On April 7, 2026, Anthropic published the Claude Mythos Preview to red.anthropic.com — its dedicated AI safety and security research channel. Mythos is described as a general-purpose model with breakthrough cybersecurity capability, anchoring a coordinated initiative called Project Glasswing aimed at reinforcing global cyber defenses using AI. It is the most significant security-focused model capability announcement Anthropic has made to date.

    What Mythos Is

    Mythos is not a separate product in the traditional sense — it’s a capability preview, published through Anthropic’s red team and security research channel rather than through the main product announcement pipeline. The “preview” framing is deliberate: Anthropic is signaling a new capability frontier to the security research community before making it broadly available, which is standard practice for capabilities with significant dual-use potential.

    The “breakthrough cybersecurity capability” claim is notable because Anthropic has historically been conservative about capability claims. Publishing on red.anthropic.com — rather than anthropic.com/news — also signals that this is targeted at a security-professional audience, not a general consumer or enterprise announcement.

    Project Glasswing

    Project Glasswing is the coordinated effort that Mythos anchors. Its stated mission is reinforcing global cyber defenses, a framing that positions Mythos explicitly as a defensive capability rather than an offensive one. That distinction matters enormously for how it will be received by governments, enterprise security teams, and the security research community.

    The name “Glasswing” references the glasswing butterfly — a species known for its transparent wings, which confer camouflage by blending into the environment. The metaphor maps cleanly onto defensive security work: visibility and transparency as the mechanism of protection, not opacity or force.

    Context: A Year of Security Work

    Mythos and Glasswing don’t come from nowhere. Anthropic’s security research track in 2026 has been unusually active: collaboration on Firefox CVE-2026-2796 in March, LLM-discovered zero-days published in February, and research on AI performance in realistic cyber ranges in January, all documented on red.anthropic.com. Mythos is the capstone of a year-long research buildout in applied cybersecurity, not a pivot from Anthropic’s core safety work.

    For enterprise security teams evaluating AI vendors, this track record is a meaningful differentiator. Anthropic is one of the few frontier AI labs with a documented, published history of responsible vulnerability disclosure collaboration and a dedicated security research publication channel. That institutional credibility matters when procurement decisions involve sensitive security workflows.

    What to Watch

    The Mythos Preview is the beginning of a story, not the end of one. Watch red.anthropic.com for the full Glasswing rollout cadence — what specific defensive capabilities are being published, what the access model looks like for security researchers, and whether government or critical infrastructure partnerships accompany the broader release. The preview framing implies a production release is coming. The timeline and access model will define how significant Glasswing becomes as a competitive differentiator.

    Source: red.anthropic.com — Claude Mythos Preview

  • Auto Model Selection in Notion 3.2: Letting Notion Pick Claude, GPT, or Gemini For You

    Auto Model Selection in Notion 3.2: Letting Notion Pick Claude, GPT, or Gemini For You

    The 60-second version

    You don’t have to pick the model anymore. Notion 3.2 added auto-selection, which routes each request to the best-fit model from the available pool — currently including Claude Opus 4.7, GPT-5.2, and Gemini 3. Simple tasks (rewrites, summaries, quick drafts) go to faster models. Complex tasks (multi-step reasoning, long-context analysis, tool-heavy agent runs) go to more capable ones. You can override the selection per request, but the default behavior is “let Notion pick” — and for most workflows, that’s the right call.
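    Notion has not published how its router decides, so what follows is only a minimal sketch of the routing idea described above: task categories map to a fast tier or a capable tier, and a per-request pin always wins. The tier names and task labels are hypothetical placeholders, not Notion identifiers.

```python
from typing import Optional

# Hypothetical tier names; Notion does not expose its model pool or routing rules.
FAST_TIER = "fast-tier-model"
CAPABLE_TIER = "capable-tier-model"

# Task labels taken from the examples in the text; a real router would classify requests itself.
COMPLEX_TASKS = {"multi_step_reasoning", "long_context_analysis", "tool_heavy_agent_run"}


def route(task_kind: str, pinned_model: Optional[str] = None) -> str:
    """Pick a model for one request; a manual override always takes precedence."""
    if pinned_model is not None:
        return pinned_model
    if task_kind in COMPLEX_TASKS:
        return CAPABLE_TIER
    # Rewrites, summaries, quick drafts, and anything unrecognized go to the cheaper tier.
    return FAST_TIER


print(route("summary"))
print(route("long_context_analysis"))
print(route("long_context_analysis", pinned_model="my-pinned-model"))
```

    Sending unrecognized tasks to the cheap tier is a design choice in this sketch, not a known Notion behavior. Pinning is also the lever for reproducibility: a pinned request hits the same model on every run.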

    Why auto-selection matters

    Three reasons it’s a meaningful shift:
    1. You stop being a model-picker. Before auto-selection, getting good output required knowing which model handled which task best. That’s expert knowledge most users don’t have. Auto-selection internalizes that knowledge.
    2. Cost-performance balance happens automatically. Faster models are cheaper to run; capable models are more expensive. Notion’s auto-selection routes simple work to cheap models and reserves expensive models for tasks that need them. After May 4, when credits start metering Custom Agent work, this matters financially.
    3. Model diversity becomes a feature, not friction. Different models have different strengths. Claude is consistently strong on long-form writing and tool use. GPT is strong on broad reasoning. Gemini is strong on multimodal and certain analytical tasks. Auto-selection uses the right tool without forcing you to know which is which.

    When to override the auto-selection

    Three cases where manual model choice still wins:
    1. You’ve measured a specific preference. If you’ve tested the same task across all three models and found one consistently better for your use case, lock to that one. Auto-selection optimizes for the average user; you may not be the average user.
    2. You’re working in a domain with a clear model strength: long-form editorial work, where Claude’s prose quality is meaningfully better; code work, where GPT’s tool use feels more natural; or visual analysis, where Gemini’s multimodal handling suits your case better.
    3. Reproducibility matters. Auto-selection means today’s request might use Claude and tomorrow’s might use GPT. If you need consistent voice or behavior across runs, lock the model.
    For everything else, auto-selection is fine. Stop optimizing the optimizer.

    What auto-selection isn’t

    It isn’t infinite model access. The pool is curated by Notion. You don’t get every model on the market. You get the ones Notion has integrated and validated for the platform.
    It also isn’t a replacement for model expertise if you’re a developer building on the API. When you build with Workers or skills via the API, you may want explicit model selection because reproducibility matters more there than in interactive use.

    How to verify auto-selection is working

    A 5-minute test:
    1. Open a page with substantive content (a project doc, an article, a meeting transcript)
    2. Run three different prompts: a quick rewrite, a complex synthesis, and a multi-step extraction
    3. Look at the output quality for each
    4. If all three feel right for the task, auto-selection is doing its job
    5. If any feel off — outputs that are too brief or too verbose, missing the task’s complexity — that’s where to consider manual override

    Why Claude Opus 4.7 in particular matters

    The Claude Opus 4.7 addition is worth noting separately. Anthropic’s latest uses fewer tokens (cheaper to run), makes 3x fewer tool errors (more reliable for agents that call Workers), and handles complex workflows better. For Notion specifically, that means agents that previously hit edge cases when chaining multiple skills or Workers now have a more reliable backbone.
    If you’re heavy into Custom Agents and Workers, Opus 4.7 in the rotation is the quiet upgrade that makes everything more dependable.

    What to read next

    Corpus follow-ups: Mobile AI in Notion (where auto-selection also runs), Custom Agents foundation piece (where model selection has cost implications), and the comparison articles (Notion AI vs ChatGPT, Claude Projects, Gemini for Workspaces).

  • Claude Opus 4.8 Feature Deep Dive: Context, Extended Thinking & Task Budgets (2026)

    Claude Opus 4.8 Feature Deep Dive: Context, Extended Thinking & Task Budgets (2026)

    Last refreshed: June 9, 2026

    Model Accuracy Note — Updated June 9, 2026

    Current flagship: Claude Opus 4.8 (claude-opus-4-8), as of April 16, 2026. Current models: Opus 4.8 · Sonnet 4.6 · Haiku 4.5. Where this article references Opus 4.6 or earlier models, those references are historical.

    Claude Opus 4.8 Key Features (June 2026)

    Feature | Detail | Use case
    Context window | 1,000,000 tokens (~750,000 words) | Full codebase analysis, long document review
    Extended thinking | Visible reasoning chain before answer | Complex math, multi-step strategy, debugging
    Vision | Images, screenshots, diagrams | UI review, document parsing, chart analysis
    Tool use | Function calling, parallel tool calls | Agents, API integrations, data pipelines
    Computer use | Control desktop/browser via screenshots | Automation, testing, research
    Task budgets (beta) | Soft token or tool-call budgets spanning an agentic loop | Cost control on multi-turn agent runs
    Batch API | Async processing at 50% off | High-volume non-real-time workloads

    What this article covers

    Three features in Opus 4.8 deserve their own explanation because they change what’s actually possible in daily work, not just what’s bigger on a benchmark chart:

    1. Task budgets (beta) — per-subtask ceilings that tame agent cost variance.
    2. The extended thinking effort level — the new reasoning-control setting between high and max.
    3. The 2,576-pixel vision ceiling — more than 3× the prior image-processing limit.

    Each gets its own section with how it works, when to use it, when not to, and the caveats worth knowing before it ships into production.


    Feature 1: Task budgets (beta)

    What it is. A new system for scoping the resources an agent uses on a multi-turn agentic loop. Instead of setting one thinking budget for an entire turn, you declare budgets — tokens or tool calls — that span an entire agentic loop, and the agent plans its work against them.

    The problem it solves. Agent runs have notoriously high cost variance. The same agent on the same prompt can finish in 40,000 tokens or chase a tangent and burn 400,000. Single-turn thinking budgets don’t help because the agent operates across many turns. Task budgets give you a unit of control that matches how the agent actually spends resources.

    How the agent uses them. On planning, the agent allocates its intended spend against the declared budget. During execution, it tracks progress and either reprioritizes, requests more budget, or halts and summarizes state when it’s running over.

    Behavior note: budgets are soft, not hard. The agent is nudged to respect them, not hard-cut. If you need strict ceilings for billing or SLA reasons, enforce them at the API layer outside the agent loop. Task budgets are for behavior shaping, not hard resource limiting.
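    Because task budgets are soft, strict ceilings belong outside the agent loop. A minimal sketch of that pattern follows, assuming you already have a function (called `step` here, a hypothetical name) that runs one agent turn and reports how many output tokens and tool calls it used. It does not touch the task-budget API itself, whose beta parameter names may change.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TurnResult:
    output_tokens: int  # output tokens this turn consumed (e.g. from the response's usage data)
    tool_calls: int     # tool calls the model made during the turn
    done: bool          # True once the agent reports it has finished


class BudgetExceeded(RuntimeError):
    """Raised when a hard ceiling is reached before the agent finishes."""


def run_with_hard_budget(
    step: Callable[[int], TurnResult],
    max_output_tokens: int,
    max_tool_calls: int,
    per_turn_cap: int = 4096,
) -> int:
    """Drive an agent loop under hard ceilings; return total output tokens spent.

    `step` runs one turn and receives the largest output allowance that turn
    may use, which the caller should pass on as that request's max_tokens.
    """
    spent_tokens = 0
    spent_calls = 0
    while True:
        remaining = max_output_tokens - spent_tokens
        # Ceilings are checked between turns, outside the model's control.
        if remaining <= 0 or spent_calls >= max_tool_calls:
            raise BudgetExceeded(
                f"halted after {spent_tokens} output tokens and {spent_calls} tool calls"
            )
        result = step(min(per_turn_cap, remaining))
        spent_tokens += result.output_tokens
        spent_calls += result.tool_calls
        if result.done:
            return spent_tokens
```

    Passing the remaining allowance into each turn means a single turn cannot overshoot the token ceiling as long as `step` honors it; tool calls, by contrast, are only checked between turns, so a turn can exceed that limit before the loop halts.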

    When to use them.
    – Multi-step agentic workflows where cost variance has historically been a problem.
    – Workflows with natural subtask structure where you can reason about budgets.
    – Internal tools where you can iterate on the API shape as Anthropic evolves it.

    When not to use them.
    – Simple single-turn requests. Task budgets are overhead that doesn’t pay off on short interactions.
    – Production contracts that are painful to version. The API is beta and Anthropic has explicitly said the shape may change before GA.
    – Workflows where you need provable hard cutoffs. Enforce those at the API layer, not via this feature.

    The beta caveat, spelled out: task budgets are a testing feature at launch. Parameter names and shape may change. Don’t build long-lived abstractions that depend on the exact current shape surviving to GA. Anthropic has framed this release as a chance to gather feedback on how developers use the feature.


    Feature 2: The extended thinking effort level

    What it is. A new setting for reasoning effort, slotted between high and max. Opus 4.6 had four levels: low, medium, high, and max. Opus 4.8 adds extended thinking between high and max, making five: low, medium, high, extended thinking, and max at the top.

    Why it exists. Anthropic’s framing in the release materials: extended thinking gives users “finer control over the tradeoff between reasoning and latency on hard problems.” The gap between high and max was real: high sometimes under-thought hard problems, while max often over-thought moderate ones. Extended thinking smooths the curve with a setting that is more thoughtful than high without max’s runaway token spend.

    Anthropic’s own guidance. “When testing Opus 4.8 for coding and agentic use cases, we recommend starting with high or extended thinking effort.” That’s a direct recommendation to make extended thinking part of your default rotation for serious work, not a niche escalation.

    How to use it.
    – Keep high as the default for routine work.
    – Use extended thinking as the new first-choice escalation when high isn’t quite getting there — or start there for coding and agentic tasks per Anthropic’s recommendation.
    – Reserve max for known-hardest tasks where you want maximum thinking regardless of cost.
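    The escalation rules above reduce to a small decision function. This is a minimal sketch under stated assumptions: the level names are plain strings, `xhigh` stands in for the extended thinking level (the label other Opus 4.8 coverage on this site uses), and the task categories are hypothetical. Check the exact values your SDK accepts against Anthropic’s effort documentation before wiring this into requests.

```python
# Assumed labels for the five-level ladder; "xhigh" stands in for extended thinking.
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")
AGENTIC_KINDS = {"coding", "agentic"}  # hypothetical task categories


def pick_effort(task_kind: str, failed_attempts: int = 0, known_hardest: bool = False) -> str:
    """Choose a reasoning effort level following the guidance above."""
    if known_hardest:
        return "max"  # reserved for tasks known to be hardest, regardless of cost
    if task_kind in AGENTIC_KINDS or failed_attempts > 0:
        return "xhigh"  # start here for coding/agentic work, or escalate here when high falls short
    return "high"  # default for routine work


print(pick_effort("summary"), pick_effort("summary", failed_attempts=1), pick_effort("coding"))
```

    Automatic escalation stops at `xhigh` on purpose; `max` is returned only when the caller flags a task as known-hardest, matching the guidance to reserve it.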

    Important tradeoff. Higher effort levels in Opus 4.8 produce more output tokens than the same levels did in 4.6. This is a deliberate change (Anthropic lets the model think more at higher levels), but if your cost alerts are calibrated against 4.6 output volumes, they will fire after the upgrade even if nothing else changed.

    An API note worth flagging. Opus 4.8 removed the extended thinking budget parameter that existed in 4.6. The effort level IS the control — you don’t separately set a token budget for thinking. If your 4.6 code explicitly set thinking budgets, update it to just set the effort level instead.

    Extended thinking is available via the API, Bedrock, Vertex AI, and Microsoft Foundry. On Claude.ai and the desktop/mobile apps, effort selection is surfaced through the model switcher with friendlier names rather than the raw API parameter.


    Feature 3: The 2,576-pixel vision ceiling

    What changed. Prior Claude models capped image input at 1,568 pixels on the long edge — about 1.15 megapixels. Opus 4.8 processes images up to 2,576 pixels on the long edge — about 3.75 megapixels, more than 3× the prior pixel budget.

    Why this matters more than it sounds. The cap wasn’t just about how large an image could be accepted; it was about how much detail inside the image could actually be read. Under the old 1.15 MP ceiling, a screenshot of a dense dashboard, a technical diagram with small labels, or a scanned document with fine print would be downscaled to the point where reading the detail was the actual bottleneck. Opus 4.8 removes that bottleneck for images up to the new ceiling.

    Coordinate mapping is now 1:1. This is a separate but related change. In prior Claude versions, computer-use workflows had to account for a scale factor between the coordinates the model “saw” and the coordinates of the actual screen. On Opus 4.8, the model’s coordinate output maps 1:1 to actual image pixels. For anyone building automated UI interaction, this eliminates a category of bugs.
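    With 1:1 mapping, coordinates refer to pixels of the image the model received. If your client resizes screenshots before sending them, you still own one conversion: scaling from the sent image back to the real screen. A minimal sketch, with a hypothetical helper name:

```python
from typing import Tuple


def to_screen(point: Tuple[float, float],
              sent_size: Tuple[int, int],
              screen_size: Tuple[int, int]) -> Tuple[int, int]:
    """Map a model-reported (x, y) in the sent image back to physical screen pixels."""
    scale_x = screen_size[0] / sent_size[0]
    scale_y = screen_size[1] / sent_size[1]
    return round(point[0] * scale_x), round(point[1] * scale_y)


# Screenshot sent unresized (fits under the ceiling): the factor is 1 and nothing changes.
print(to_screen((100, 50), (1920, 1080), (1920, 1080)))
# 5120x2880 screen halved to 2560x1440 before upload: the model's point must be doubled.
print(to_screen((100, 50), (2560, 1440), (5120, 2880)))
```

    Storing the sent and screen sizes alongside each screenshot keeps this mapping from drifting when capture resolution changes.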

    What this enables that 4.6 struggled with:

    • Dense UI screenshots. Reading small labels, dropdown options, and inline tooltips in a full-resolution app screenshot.
    • Technical diagrams. Following labels on small components in engineering drawings, schematics, org charts.
    • Scanned documents. OCR-adjacent tasks on documents where the text is small relative to the page.
    • Chart details. Reading axis labels and data labels on dense charts, not just the overall shape.
    • Multi-panel content. Comics, infographics, and documents with small type in multiple zones.
    • Pointing, measuring, counting. Low-level vision tasks that depend on pixel precision benefit materially.
    • Bounding-box detection. Image localization tasks show clear gains.

    What it doesn’t change.

    • Images beyond 2,576px still get downscaled to the ceiling. The ceiling is higher; it’s not gone.
    • Video frames are handled differently and aren’t covered by this change.
    • Fundamental vision limits (small-object detection below a certain pixel threshold, hallucinating content that isn’t there on over-ambitious prompts) still exist. More pixels ≠ omniscience.

    Pricing and token cost. Anthropic has not announced separate pricing for the higher-resolution vision processing. Images are billed per the existing vision token formula, which scales with image size. Larger images cost more tokens; that’s not new. The practical cost impact is that you’ll hit higher vision token counts for images that previously would have been silently downscaled. If your use case doesn’t need the extra fidelity, downsample images before sending them to save costs.
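    To see what a resize saves before you send an image, a rough estimate is enough. This sketch assumes the approximation from Anthropic’s vision documentation (tokens ≈ width × height ÷ 750) still holds for Opus 4.8, which the article does not confirm, and it models only the long-edge ceiling; the model may downscale further to stay within its pixel budget.

```python
import math
from typing import Tuple


def fit_long_edge(width: int, height: int, max_long_edge: int) -> Tuple[int, int]:
    """Scale (width, height) down, preserving aspect ratio, so the long edge fits."""
    long_edge = max(width, height)
    if long_edge <= max_long_edge:
        return width, height
    scale = max_long_edge / long_edge
    return max(1, round(width * scale)), max(1, round(height * scale))


def approx_image_tokens(width: int, height: int) -> int:
    """Rough vision token estimate: pixel count / 750 (assumed approximation)."""
    return math.ceil(width * height / 750)


# A 4K screenshot at the new 2,576 px ceiling vs. the old 1,568 px long edge.
for ceiling in (2576, 1568):
    w, h = fit_long_edge(3840, 2160, ceiling)
    print(ceiling, (w, h), approx_image_tokens(w, h))
```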

    How to use it.

    Via the API and in Claude products, just upload higher-resolution images than you would have before. No special parameter. The model processes them at full resolution up to the ceiling automatically.

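    import base64
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    # "chart.png" is a placeholder path; any PNG up to 2,576 px on the long edge works.
    with open("chart.png", "rb") as f:
        image_data = base64.standard_b64encode(f.read()).decode("utf-8")

    response = client.messages.create(
        model="claude-opus-4-8",
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": [
                # No special parameter: the image is processed at full resolution up to the ceiling.
                {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": image_data}},
                {"type": "text", "text": "Extract the values from the chart."},
            ],
        }],
    )
    print(response.content[0].text)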
    

    A caveat worth noting. The 2,576px ceiling is the processing ceiling. Client-side size limits (file size, API request size) still apply. Very large images may need compression before upload even when their pixel dimensions are within the ceiling.


    How these three features compose

    The three features aren’t independent. For agentic coding work in particular, they compose in ways that matter.

    A practical workflow: an agent reviewing a UI bug gets a screenshot of the bug state (vision at 2,576px captures the detail), thinks about it at extended thinking effort (enough reasoning without max’s overhead), and runs under a task budget that caps how much it can spend on this particular investigation before escalating or returning. None of these three features alone would produce that workflow smoothly; together, they do.

    Each of the three is useful on its own, but their combined effect on agentic workflows is bigger than any one in isolation, and that is the real reason to pay attention to them.


    Frequently asked questions

    Are task budgets available on Claude.ai, or API only?
    API only. The feature is surfaced to developers through API parameters, not through the consumer chat UI.

    Can I use extended thinking on Claude.ai?
    Effort level is exposed to consumers through the model switcher. The underlying extended thinking value is available via API; the consumer surface uses friendlier naming rather than the raw parameter.

    Do the higher-resolution vision capabilities apply to all Claude products?
    Yes — Claude.ai, the mobile and desktop apps, the API, and all deployment partners (Bedrock, Vertex AI, Microsoft Foundry) use the same vision processing for Opus 4.8.

    Are task budgets a replacement for max_tokens?
    No. max_tokens is a hard cap on output length for a single message. Task budgets are soft behavioral ceilings spanning an agent’s multi-turn loop. Use both.

    Does extended thinking use a different API parameter than high?
    No. It’s just another value for the same effort parameter. Note that Opus 4.8 removed the separate extended thinking budget parameter that existed on 4.6: on 4.8, the effort level IS the thinking control.

    Will these features come to Opus 4.6?
    No. They’re Opus 4.8 features. 4.6 continues to run on its prior behavior.

    Does extended thinking cost more than high?
    Yes, indirectly. Per-token pricing is the same, but extended thinking produces more output tokens on hard problems (that’s the point: more thinking), so a given request typically costs more at extended thinking than at high. Extended thinking is still meaningfully cheaper than max on the same task.


    Related reading

    • The full release: Claude Opus 4.8 — Everything New
    • For developers: Opus 4.8 for coding in practice
    • Comparison: Opus 4.8 vs GPT-5.4 vs Gemini 3.1 Pro
    • The Mythos angle: why Anthropic admitted Opus 4.8 is weaker than an unreleased model

    Published April 16, 2026. Article written by Claude Opus 4.8.

    Frequently Asked Questions

    What are the key features of Claude Opus 4.8?

    Claude Opus 4.8 (claude-opus-4-8) is Anthropic’s current flagship model with a 1 million token context window, extended thinking (visible reasoning chain), vision capabilities, tool use with parallel function calling, computer use for desktop automation, and configurable task budgets for cost control on reasoning-heavy tasks. Available via API at $5 input / $25 output per million tokens.

    What is extended thinking in Claude Opus 4.8?

    Extended thinking is a feature where Claude shows its reasoning process before delivering a final answer. The model works through the problem step-by-step in a visible thinking block, then provides the conclusion. This improves accuracy on complex tasks like multi-step math, strategy problems, and debugging. On Opus 4.8, the effort level controls how much the model thinks; the separate thinking token budget from Opus 4.6 has been removed.

    How does Claude Opus 4.8’s 1M token context work?

    The 1 million token context window lets Claude Opus 4.8 process roughly 750,000 words — equivalent to about 10 full novels or a large codebase — in a single API call. Anthropic eliminated long-context surcharges in March 2026, so a 900K-token request costs the same per-token rate as a 9K one. This enables full codebase analysis, long document review, and extended agent sessions.

    What is the task budget feature in Claude Opus 4.8?

    Task budgets (beta) let you declare a budget, in tokens or tool calls, that spans an entire multi-turn agentic loop rather than a single turn. The agent plans its work against the budget and reprioritizes, requests more, or halts and summarizes when it runs over. Budgets are soft: they shape behavior rather than hard-cutting the run, so enforce strict billing or SLA ceilings at the API layer. The payoff is more predictable cost on agent runs whose spend would otherwise vary widely.

    Is Claude Opus 4.8 the best model for computer use?

    Yes, Claude Opus 4.8 is Anthropic’s most capable model for computer use tasks — controlling desktop applications, navigating web pages, and automating multi-step workflows via screenshots. Claude Sonnet 4.6 also supports computer use at lower cost. Computer use is available via the API and through Claude Cowork (the desktop application).

    When should I use Opus 4.8 vs Sonnet 4.6?

    Use Claude Opus 4.8 when task complexity demands the best reasoning: analyzing large codebases, writing complex technical documents, extended agent workflows, or tasks where extended thinking significantly improves output quality. Use Claude Sonnet 4.6 ($3/$15 per MTok, 40% cheaper) for most everyday tasks — writing, coding, analysis — where Opus-level reasoning is not needed.

  • Claude Opus 4.8 vs GPT-5 vs Gemini 2.5 Pro: Head-to-Head (June 2026)

    Claude Opus 4.8 vs GPT-5 vs Gemini 2.5 Pro: Head-to-Head (June 2026)

    Last refreshed: June 9, 2026

    Model Accuracy Note — Updated June 9, 2026

    Current flagship: Claude Opus 4.8 (claude-opus-4-8), as of April 16, 2026. Current models: Opus 4.8 · Sonnet 4.6 · Haiku 4.5. Where this article references Opus 4.6 or earlier models, those references are historical.

    Attribute | Claude Opus 4.8 | GPT-5 | Gemini 2.5 Pro
    Developer | Anthropic | OpenAI | Google DeepMind
    API ID | claude-opus-4-8 | gpt-5 | gemini-2.5-pro
    Context window | 1M tokens | 128K tokens | 1M tokens
    Input price (per MTok) | $5.00 | $15.00 | $3.50
    Output price (per MTok) | $25.00 | $75.00 | $10.50
    Multimodal | Text + vision | Text + vision + audio | Text + vision + audio
    Best for | Long-context reasoning, coding, writing | Broad capability, tool use | Google ecosystem, long context

    Prices verified June 9, 2026 from official platform documentation. GPT-5 pricing from platform.openai.com. Gemini 2.5 Pro pricing from ai.google.dev.

    The short verdict

    • Best for agentic coding and long-horizon engineering: Opus 4.8.
    • Best for single-turn function calling and ecosystem breadth: GPT-5.
    • Best for multimodal input volume and long-context retrieval: Gemini 2.5 Pro.
    • Cheapest at the frontier: Gemini 2.5 Pro. Most expensive: GPT-5.
    • If you can only pick one for general knowledge work in June 2026: Opus 4.8.

    The full reasoning is below. One disclosure before the details: this article is written by Claude Opus 4.8. I am one of the models being compared. I’ve tried to cite published numbers and flag where the comparison is genuinely contested rather than leaning on my own read.


    Pricing as of April 16, 2026

    Model | Input (standard) | Output (standard) | Long-context tier (input / output) | Context window
    Claude Opus 4.8 | $5 / M tokens | $25 / M tokens | Same across window | 1M tokens
    GPT-5 | $5 / M tokens | $15 / M tokens | $5 / $22.50 over 272K | 1M tokens (272K before surcharge)
    Gemini 2.5 Pro | $2 / M tokens | $12 / M tokens | $4 / $18 over 200K | 1M tokens (some listings cite 2M)

    Takeaways:
    – Gemini 2.5 Pro is the cheapest per token at the frontier: 2.5× cheaper on input than both Opus 4.8 and GPT-5 at standard context ($2 vs. $5 per million tokens).
    – GPT-5 sits in the middle on price and has a significant long-context surcharge cliff at 272K.
    – Opus 4.8 is the most expensive per token, with no long-context surcharge.
    – All three now have 1M-class context windows, but Opus 4.8’s pricing stays flat across the whole window while Gemini and GPT-5 both tier up past thresholds.
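    To compare the three price structures on your own traffic, a per-request calculator is enough. The rates below are copied from the April 16 table above; the rule that a request crossing the threshold is billed entirely at the higher tier is an assumption based on how the tier column is written, so confirm it against each provider’s pricing page. Token counts should come from each model’s own tokenizer, per the caveat that follows.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pricing:
    input_per_mtok: float                         # USD per million input tokens
    output_per_mtok: float                        # USD per million output tokens
    long_threshold: Optional[int] = None          # input tokens above which the long tier applies
    long_input_per_mtok: Optional[float] = None
    long_output_per_mtok: Optional[float] = None


def request_cost(p: Pricing, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request; assumes the whole request moves to the long tier past the threshold."""
    is_long = p.long_threshold is not None and input_tokens > p.long_threshold
    in_rate = p.long_input_per_mtok if is_long else p.input_per_mtok
    out_rate = p.long_output_per_mtok if is_long else p.output_per_mtok
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# Rates as listed in the April 16, 2026 pricing table.
PRICES = {
    "claude-opus-4-8": Pricing(5.00, 25.00),  # flat across the 1M window
    "gpt-5": Pricing(5.00, 15.00, 272_000, 5.00, 22.50),
    "gemini-2.5-pro": Pricing(2.00, 12.00, 200_000, 4.00, 18.00),
}

for model, pricing in PRICES.items():
    # A 300K-token prompt with a 10K-token answer crosses both tier thresholds.
    print(f"{model}: ${request_cost(pricing, 300_000, 10_000):.3f}")
```

    Swap in your own token counts and current list prices; threshold-based tiers only change the bill for requests that actually cross them.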

    Tokenizer caveat: Opus 4.8 uses a new tokenizer that produces up to 1.35× more tokens per input than Opus 4.6 did, depending on content type. Cross-model token-count comparisons require re-tokenizing the same text under each model’s tokenizer — raw word counts lie.


    Benchmarks, with the caveats included

    Anthropic, OpenAI, and Google all publish benchmark numbers. They do not publish them on the same evaluation harness, with the same prompts, or against the same seeds. Treat the following as directional, not definitive.

    Agentic coding (long-horizon, multi-file):
    – Opus 4.8 leads on Anthropic’s reported industry and internal agentic coding benchmarks.
    – GPT-5 is competitive on single-turn function calling and tool use. Roughly 80% on SWE-bench Verified at launch.
    – Gemini 2.5 Pro scored 80.6% on SWE-bench Verified at launch — essentially tied with GPT-5.

    Multidisciplinary reasoning (GPQA Diamond and similar):
    – Opus 4.8 leads on Anthropic’s comparisons.
    – GPT-5 and Gemini 2.5 Pro are close. Gemini reports 94.3% on GPQA Diamond.

    Scaled tool use and agentic computer use:
    – Opus 4.8 leads on Anthropic’s reported benchmarks.
    – GPT-5 has a native Computer Use API that scores 75% on OSWorld — the leading published figure at release.
    – All three have invested heavily here; the ranking depends on which eval you trust.

    Vision (document understanding, dense-screenshot extraction):
    – Opus 4.8’s jump from 1.15 MP to 3.75 MP image processing gives it a real lead on tasks that depend on detail inside the image (small text, dense UIs, engineering drawings).
    – Gemini 2.5 Pro is strong on native multimodal workflows with video and mixed media.
    – GPT-5 is solid but not leading on either axis.

    Long-context retrieval:
    – All three now have 1M-class context windows.
    – Gemini 2.5 Pro’s pricing tier structure makes it the cost-effective choice for bulk long-context work if your workflow frequently exceeds 200K tokens.
    – Opus 4.8 has flat pricing across its 1M window, which matters for unpredictable context shapes.
    – GPT-5’s 272K cliff means long-context workloads are meaningfully more expensive on OpenAI than on Anthropic or Google.

    Specialized coding benchmarks:
    – GPT-5.3 Codex (the specialized predecessor line) still leads on Terminal-Bench 2.0 and SWE-Bench Pro on some scores. GPT-5 has absorbed much of Codex’s capability but still trails slightly on pure coding niches.
    – Gemini 2.5 Pro has notable strength on creative coding and SVG generation.
    – Opus 4.8 is strongest on agentic and multi-file coding specifically.

    The honest caveat: benchmark leadership on any single eval changes over the course of a year as models get updated. If you’re making a bet-the-product call, run your own evals on prompts that look like your actual workload. The published benchmarks are a screening tool, not a decision tool.


    How they differ in behavior, not just benchmarks

    Opus 4.8 — the engineering-minded generalist.
    Tends toward thoroughness over speed. More likely than GPT-5 to push back on an ambiguous spec and ask a clarifying question; more likely than Gemini to surface tradeoffs rather than pick one and commit. Strong at long-horizon tasks where state matters. Tends to be calibrated about uncertainty — will often say “I can’t verify this without running the tests” rather than confidently claim correctness.

    GPT-5 — the product-native operator.
    Tends toward action over deliberation. Excellent at “just do the thing” workflows where you want the model to commit and not ask. Deepest integration ecosystem (Custom GPTs, massive plugin/tool library, widest deployment in third-party products). Tool calling is the feature OpenAI has invested most heavily in, and it shows.

    Gemini 2.5 Pro — the multimodal long-context specialist.
    Cheapest per token at the frontier by a meaningful margin, with a context window that some listings put at 2M tokens. Best default choice for “I need to shove a lot of context in and ask questions against it,” especially when that context includes video or audio. Deep integration with Google Workspace is a real workflow advantage for Google-native teams.

    None of these are absolute; all three models handle general tasks well. These are behavioral tendencies, not capability ceilings.


    “Choose X if” decision framework

    Choose Claude Opus 4.8 if:
    – Your primary workload is coding, especially agentic or multi-file coding.
    – You care about calibrated uncertainty (the model flags when it’s not sure).
    – You’re using or planning to use Claude Code for engineering work.
    – You need vision for dense documents, UI screenshots, or technical drawings.
    – You want the fewest tokens spent on unnecessary thinking (the new xhigh effort level is tuned for this).

    Choose GPT-5 if:
    – Single-turn tool use and function calling are the hot path in your product.
    – You need the broadest ecosystem of third-party integrations right now.
    – Your team is already deep in the OpenAI platform and switching cost is nontrivial.
    – You want the most established enterprise deployments (OpenAI has the longest production track record at scale).

    Choose Gemini 2.5 Pro if:
    – You’re price-sensitive and running high-volume workloads.
    – You need 1M+ token context as the default, not as an add-on.
    – Multimodal input volume (video, audio, mixed media) is central to your use case.
    – Your team is deep in Google Cloud or Workspace.

    Use multiple if:
    – You’re doing serious AI product work. Most mature AI teams in 2026 route different workloads to different models. A common pattern: Opus 4.8 for code generation and agent orchestration, Gemini 2.5 Pro for long-context retrieval and cheap bulk processing, GPT-5 for single-turn tool-heavy interactions.


    Where this comparison will change

    The frontier is moving. Three things to watch over the next six months:

    1. Claude Mythos Preview. Anthropic publicly acknowledged that Mythos outperforms Opus 4.8 on most of the benchmarks in the Opus 4.8 release post. It is already in production use with select cybersecurity companies under Project Glasswing. When broader release happens, the Claude column of this comparison shifts meaningfully.

    2. GPT-5.5 / GPT-6. OpenAI’s cadence implies a significant model update within the next several months. The pattern over the past year has been incremental 5.x releases; a ground-up generation shift would reset the comparison.

    3. Gemini 3.5 / 4. Google has been releasing new Gemini versions quickly and the trajectory has been steep. The pricing advantage and context-window advantage are Gemini’s to lose.

    None of these are speculation-free predictions. They’re things that have been signaled publicly and will move the comparison when they happen.


    Frequently asked questions

    Is Claude Opus 4.8 better than GPT-5?
    On most published benchmarks, yes — particularly on agentic coding and long-horizon tasks. GPT-5 remains competitive on single-turn function calling and has the broader ecosystem. “Better” depends on the workload.

    Is Gemini 2.5 Pro cheaper than Opus 4.8?
    Significantly. At $2/$12 per million input/output tokens vs. Opus 4.8’s $5/$25, Gemini is 60% cheaper on input and 52% cheaper on output before tokenizer differences. At scale this is a material cost gap.

    Which model has the biggest context window?
    All three now have 1M-class context windows. Some Gemini 2.5 Pro documentation cites a 2M window. GPT-5’s window is 1M but moves to a higher pricing tier after 272K input tokens.

    Which model is best for coding?
    Opus 4.8 leads on agentic and long-horizon coding benchmarks. GPT-5 is close on single-turn coding. Gemini 2.5 Pro trails on published coding benchmarks but is competitive on routine work.

    Which model should I use for my startup?
    Most mature teams route workloads to multiple models. If you’re just starting and need to pick one, Opus 4.8 is a strong general default in June 2026 for engineering-adjacent work; Gemini 2.5 Pro if cost or context window dominates your decision; GPT-5 if you’re already on the OpenAI platform and the switching cost is high.

    Does Claude Opus 4.8 support function calling?
    Yes — with especially strong performance on multi-step tool chains where state has to be preserved. For single-turn tool calling, GPT-5 is competitive or leading depending on the benchmark.


    Related reading

    • Full Opus 4.8 feature set: Claude Opus 4.8 — Everything New
    • Opus 4.8 for coding specifically: xhigh, task budgets, and the 13% benchmark lift
    • The Mythos angle: why Anthropic admitted Opus 4.8 is weaker than an unreleased model

    Published April 16, 2026. Article written by Claude Opus 4.8 — yes, one of the models being compared. Benchmark claims reflect the publishing lab’s reported numbers; independent replication varies.

    Frequently Asked Questions

    Is Claude Opus 4.8 better than GPT-5?

    It depends on the task. Claude Opus 4.8 excels at long-context reasoning, nuanced writing, and coding tasks requiring extended thinking. GPT-5 has broader multimodal capabilities including audio. For pure text reasoning and large-document analysis, Claude Opus 4.8’s 1M token context gives it a significant advantage. GPT-5 is more expensive at $15/$75 per million tokens vs Opus 4.8’s $5/$25.

    How does Claude Opus 4.8 compare to Gemini 2.5 Pro?

    Both Claude Opus 4.8 and Gemini 2.5 Pro support 1M token context windows. Gemini 2.5 Pro is cheaper at $3.50/$10.50 per million tokens vs Opus 4.8’s $5/$25. Claude Opus 4.8 generally rates higher on reasoning and coding benchmarks. Gemini 2.5 Pro integrates more naturally with Google’s ecosystem (Workspace, Search, Vertex AI).

    Which AI model is best for coding in 2026?

    Claude Opus 4.8 and Claude Sonnet 4.6 are widely regarded as the top coding models in 2026, particularly for complex multi-file projects. Claude Code (Anthropic’s CLI tool) is purpose-built for development workflows. GPT-5 is also strong for coding. Gemini 2.5 Pro integrates well with Google Cloud development workflows.

    What is the cheapest frontier AI model in 2026?

    Claude Haiku 4.5 ($1/$5 per MTok) and Gemini 2.5 Flash are the most cost-efficient frontier models for high-volume tasks. For flagship-tier capability, Gemini 2.5 Pro ($3.50/$10.50) is cheaper than Claude Opus 4.8 ($5/$25) or GPT-5 ($15/$75). The right choice depends on task complexity and volume.

    Is GPT-5 worth the higher price vs Claude Opus 4.8?

    For most text and coding workloads, no. Claude Opus 4.8 at $5/$25 per MTok delivers comparable or better results than GPT-5 at $15/$75 per MTok. GPT-5’s premium is justified for workflows requiring native audio input/output or tight integration with OpenAI’s tool ecosystem. For long-context document analysis, Opus 4.8’s 1M context at lower cost is a clear win.
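    To make the price gap concrete, here is a minimal Python sketch that applies the per-million-token prices quoted in these answers to a hypothetical monthly workload. It takes the article's prices at face value and ignores long-context pricing tiers (such as GPT-5's higher tier past 272K input tokens), prompt caching, and batch discounts; the workload size is invented for illustration, and the dictionary keys are display names, not API model IDs.

```python
# Per-million-token prices (input, output) in USD, taken at face value from
# the FAQ answers above. Verify against each vendor's current pricing page.
PRICES_PER_MTOK = {
    "Claude Opus 4.8": (5.00, 25.00),
    "GPT-5": (15.00, 75.00),
    "Gemini 2.5 Pro": (3.50, 10.50),
}


def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Flat-rate estimate; ignores long-context tiers, caching, and batch discounts."""
    price_in, price_out = PRICES_PER_MTOK[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 50M input tokens and 5M output tokens per month.
    for name in PRICES_PER_MTOK:
        print(f"{name}: ${monthly_cost(name, 50_000_000, 5_000_000):,.2f}")
```

    With 50M input and 5M output tokens, this gives $375 for Opus 4.8, $1,125 for GPT-5, and $227.50 for Gemini 2.5 Pro; the 3× gap between Opus 4.8 and GPT-5 follows directly from the 3× per-token price.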

    Which model should I use for my business in 2026?

    For general business writing and analysis: Claude Sonnet 4.6 ($3/$15) or Gemini 2.5 Pro ($3.50/$10.50). For complex reasoning and large documents: Claude Opus 4.8 ($5/$25). For high-volume, cost-sensitive workloads: Claude Haiku 4.5 ($1/$5). For Google Workspace integration: Gemini 2.5 Pro. For teams already committed to the OpenAI ecosystem: GPT-5.
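    Since most mature teams route workloads across models, the recommendations above amount to a small routing table. A minimal sketch: the workload labels are hypothetical, the default is chosen for illustration, and the values are display names rather than API model IDs.

```python
# Workload categories mapped to the picks recommended above.
# Labels are hypothetical; values are display names, not API model IDs.
ROUTES = {
    "general_business": "Claude Sonnet 4.6",  # Gemini 2.5 Pro is the stated alternative
    "complex_reasoning": "Claude Opus 4.8",
    "high_volume": "Claude Haiku 4.5",
    "google_workspace": "Gemini 2.5 Pro",
    "openai_ecosystem": "GPT-5",
}


def pick_model(workload: str, default: str = "Claude Sonnet 4.6") -> str:
    """Return the routed model for a workload label, falling back to a default."""
    return ROUTES.get(workload, default)
```

    Keeping the table as data rather than scattered if-statements makes it cheap to re-point a category when pricing or benchmarks change.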

  • Opus 4.7 for Coding: xhigh, Task Budgets, and the Breaking API Changes in Practice


    Last refreshed: May 15, 2026

    Model Accuracy Note — Updated May 2026

    Current flagship: Claude Opus 4.7 (claude-opus-4-7), as of April 16, 2026. Current models: Opus 4.7 · Sonnet 4.6 · Haiku 4.5. Where this article references Opus 4.6 or earlier models, those references are historical. See current model tracker →

    What changed if you only have 60 seconds

    • Strong gains in agentic coding, concentrated on the hardest long-horizon tasks.
    • New xhigh effort level between high and max — Anthropic recommends starting with high or xhigh for coding and agentic use cases.
    • Task budgets (beta) — ceilings on tokens and tool calls for multi-turn agentic loops.
    • Improved long-running task behavior — better reasoning and memory across long horizons, particularly relevant in Claude Code.
    • /ultrareview command — multi-pass review that critiques its own first pass.
    • Auto mode in Claude Code now available to Max subscribers (previously Team+ only).
    • ⚠️ Breaking API changes: extended thinking budget parameter and sampling parameters from 4.6 are removed. Update client code before switching model strings.
    • Tokenizer change: expect up to 1.35× more tokens for the same input.
    • Context window: unchanged at 1M tokens.

    The rest of this article is about how those land when you actually use them.


    The coding gain — what it actually feels like

    Anthropic’s release materials describe Opus 4.7 as “a notable improvement on Opus 4.6 in advanced software engineering, with particular gains on the most difficult tasks.” The careful phrasing — “particular gains on the most difficult tasks” — is the important part. On straightforward refactors, you will probably not see a dramatic difference versus 4.6. On long-horizon, multi-file, ambiguous-spec work, you likely will.

    In practice, the shift is: 4.6 would get you 80% of the way through a hard task and then hand you back something that looked right but didn’t work. 4.7 is more likely to actually close the task. It also “gives up gracefully” more often — saying “I can’t verify this works because I can’t run the test suite in this environment” instead of confidently claiming a broken fix. GitHub’s own early testing of Opus 4.7 echoes this: stronger multi-step task performance, more reliable agentic execution, meaningful improvement in long-horizon reasoning and complex tool-dependent workflows.

    If your 4.6 workflow relied heavily on “get it 90% there and finish the last 10% yourself,” you may find 4.7 changes the calculus. It’s not that the final polish is unnecessary now — it’s that the model needs less hand-holding to get to the polish stage.


    xhigh: the new default to reach for

    Opus 4.6 had four effort levels: low, medium, high, and max. Opus 4.7 adds xhigh, slotted between high and max.

    The reason it exists: max was frequently overkill. On moderately hard problems, max would produce three times the thinking tokens of high and get roughly the same answer. On genuinely hard problems, high would leave thinking on the table. There was a real gap in the middle.

    How to use it:
    • high is still the right default for routine coding tasks.
    • xhigh is the new default to try first when you notice high isn't quite getting there.
    • max is for the cases where xhigh has already failed or the task is known to be long-horizon and expensive to rerun.
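    The guidance above (start at high, step up to xhigh, reserve max) is an escalation ladder. A minimal Python sketch, assuming you wrap your own API call in a hypothetical `attempt(effort)` function and supply an `is_good_enough(result)` check such as running your test suite; how the effort level is actually passed to the API is left to that wrapper.

```python
from typing import Callable, Optional

# Effort levels in ascending order, per the guidance above.
EFFORT_LADDER = ["high", "xhigh", "max"]


def run_with_escalation(
    attempt: Callable[[str], str],
    is_good_enough: Callable[[str], bool],
    start: str = "high",
) -> tuple[str, Optional[str]]:
    """Try each effort level from `start` upward until a result passes the check.

    `attempt` and `is_good_enough` are caller-supplied (hypothetical) hooks.
    Returns (last_effort_tried, result); result is None if every level failed.
    """
    levels = EFFORT_LADDER[EFFORT_LADDER.index(start):]
    for effort in levels:
        result = attempt(effort)
        if is_good_enough(result):
            return effort, result
    return levels[-1], None
```

    Starting at "xhigh" instead of "high" is a one-argument change, which fits tasks you already know are hard.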

    Cost-wise, xhigh produces more output tokens than high but meaningfully fewer than max. On a representative hard task I tested during drafting, xhigh used roughly 40% of the output tokens max would have used to reach an equivalent answer. Your mileage will vary by task family.

    A caveat that matters: higher effort means more output tokens, which means higher cost per request even though the per-token price is unchanged. If your budget alerts are tuned to 4.6 volumes, expect them to fire.
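    A quick way to see the effect on spend: cost per request is input tokens times the input price plus output tokens times the output price, so at a fixed per-token price it scales with output volume. The sketch below assumes $5/$25 per million input/output tokens (substitute your actual rates) and invents the token counts, using the roughly 40% xhigh-to-max output ratio described above.

```python
# Assumed Opus-tier pricing in USD per million tokens; substitute your rates.
PRICE_IN_PER_MTOK = 5.00
PRICE_OUT_PER_MTOK = 25.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at flat per-token prices."""
    return (input_tokens * PRICE_IN_PER_MTOK + output_tokens * PRICE_OUT_PER_MTOK) / 1_000_000


if __name__ == "__main__":
    # Hypothetical hard task: same 20K-token prompt; xhigh emits about
    # 40% of the output tokens that max would.
    print(f"max:   ${request_cost(20_000, 100_000):.2f}")
    print(f"xhigh: ${request_cost(20_000, 40_000):.2f}")
```

    Under those assumptions, max costs $2.60 and xhigh $1.10 for the same prompt, because the output side dominates once thinking volume grows.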


    Task budgets (beta): the real agentic improvement

    This is the feature most worth paying attention to if you build agents.

    The problem it solves: Agent runs have high cost variance. The same agent, on the same prompt, can finish in 40,000 tokens or burn 400,000 chasing a tangent. Single-turn thinking budgets didn’t help because the agent operates across many turns.

    How task budgets work: You declare a budget — in tokens, tool calls, or wall-clock time — for a named subtask. The agent plans against that budget. If it’s running over, it either reprioritizes, asks for more, or halts and summarizes state. Budgets can nest (parent task with child subtasks, each with their own).

    What this looks like in code (beta, subject to change):

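    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    response = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=16_000,  # per-request output cap (required); the budgets below span the whole task
        messages=[...],  # your conversation goes here
        # Beta: name and shape may change before GA. Depending on your SDK
        # version, an unrecognized keyword may need to be sent via extra_body.
        task_budgets=[
            {
                # Parent budget for the whole subtask.
                "name": "refactor_auth_module",
                "max_output_tokens": 50_000,
                "max_tool_calls": 25,
            },
            {
                # Child budget, nested under the parent by name.
                "name": "write_tests",
                "parent": "refactor_auth_module",
                "max_output_tokens": 15_000,
            },
        ],
    )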
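    Because the beta shape may change, it can help to sanity-check budget lists on the client before sending them. The checks below are assumptions of this sketch, not documented API rules: names are unique, each `parent` refers to a budget declared earlier in the list, and a child's `max_output_tokens` does not exceed its parent's.

```python
def validate_task_budgets(budgets: list[dict]) -> list[str]:
    """Return a list of problems found in a task_budgets list (empty if none).

    The rules checked here are this sketch's assumptions, not API guarantees.
    """
    problems = []
    seen: dict[str, dict] = {}
    for budget in budgets:
        name = budget.get("name")
        if not name:
            problems.append("budget without a name")
            continue
        if name in seen:
            problems.append(f"duplicate budget name: {name}")
        parent_name = budget.get("parent")
        if parent_name is not None:
            parent = seen.get(parent_name)
            if parent is None:
                # Parent missing or declared later in the list.
                problems.append(f"{name}: unknown parent {parent_name}")
            else:
                child_cap = budget.get("max_output_tokens")
                parent_cap = parent.get("max_output_tokens")
                if child_cap is not None and parent_cap is not None and child_cap > parent_cap:
                    problems.append(f"{name}: max_output_tokens exceeds parent {parent_name}")
        seen[name] = budget
    return problems
```

    Run against the example above, it returns an empty list; a typo in a `parent` name shows up as an "unknown parent" entry instead of a confusing API error.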
    

    Behavioral note: Task budgets are soft. The agent is nudged to respect them, not hard-cut. In testing, 4.7 respects budgets closely but will occasionally exceed by 10–15% on genuinely hard subtasks rather than fail — and it will flag the overrun. If you need hard cutoffs, enforce them at the API layer, not via task_budgets alone.
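    A minimal sketch of enforcing a hard ceiling in your own agent loop, as the note above recommends. `HardBudget` and `BudgetExceeded` are hypothetical names; in a real loop you would feed `record()` each turn's `response.usage.output_tokens` and the number of `tool_use` blocks in `response.content`.

```python
class BudgetExceeded(RuntimeError):
    """Raised when an agent loop crosses a hard ceiling."""


class HardBudget:
    """Client-side hard ceiling on output tokens and tool calls for one agent run."""

    def __init__(self, max_output_tokens: int, max_tool_calls: int):
        self.max_output_tokens = max_output_tokens
        self.max_tool_calls = max_tool_calls
        self.output_tokens = 0
        self.tool_calls = 0

    def remaining_output_tokens(self) -> int:
        """Tokens left before the ceiling (never negative)."""
        return max(0, self.max_output_tokens - self.output_tokens)

    def record(self, output_tokens: int, tool_calls: int = 0) -> None:
        """Add one turn's usage, then raise if either ceiling is now exceeded."""
        self.output_tokens += output_tokens
        self.tool_calls += tool_calls
        if self.output_tokens > self.max_output_tokens:
            raise BudgetExceeded(f"output tokens {self.output_tokens} > {self.max_output_tokens}")
        if self.tool_calls > self.max_tool_calls:
            raise BudgetExceeded(f"tool calls {self.tool_calls} > {self.max_tool_calls}")
```

    Because the check runs after each response, a single turn can still overshoot. Capping each request's `max_tokens` at `remaining_output_tokens()`, and stopping the loop when it reaches zero, closes most of that gap.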

    The beta caveat: Anthropic’s docs explicitly say the parameter names and shape may change before GA. Don’t ship this into production contracts that are painful to version.


    Long-running task behavior (and Claude Code persistence)

    Anthropic’s release note says Opus 4.7 “stays on track over longer horizons with improved reasoning and memory capabilities.” In Claude Code specifically, the practical translation is better behavior across multi-session engineering work: the model re-onboards faster at the start of a session, maintains more coherent state across long interactions, and is less likely to drift when a task runs for hours.

    This is a capability improvement, not a new memory API. You don’t need to declare anything special to get it — it’s how 4.7 behaves at the model level. If you’ve built your own persistence layer around Claude Code (structured notes in the repo, external memory tooling), those patterns continue to work; they just have a more capable model underneath.

    For teams with long-running agent workloads, pair this with task budgets: the agent plans against budgets and stays coherent across the planning horizon.


    The /ultrareview command

    A new slash command in Claude Code. Unlike /review, which does a single review pass, /ultrareview runs:

    1. A first review pass.
    2. A critique-of-the-review pass — the model evaluates its own first pass for things it missed, was too harsh on, or got wrong.
    3. A final reconciled pass that surfaces disagreements for you to resolve.
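    If you want the same review, critique, reconcile pattern in your own tooling (a CI bot, for example), it can be approximated with three model calls. This is a rough sketch, not how /ultrareview is implemented: `ask` is a hypothetical function that sends a prompt to a model and returns its text reply, and the prompt wording is illustrative.

```python
from typing import Callable

# Hypothetical hook: sends a prompt to a model and returns its text reply.
Ask = Callable[[str], str]


def three_pass_review(diff: str, ask: Ask) -> str:
    """Review, critique the review, then reconcile; returns the final pass."""
    # Pass 1: an ordinary review of the change.
    first = ask(f"Review this change for bugs, risks, and missing tests:\n\n{diff}")
    # Pass 2: the model critiques its own first pass.
    critique = ask(
        "Critique the review below. What did it miss, where was it too harsh, "
        "and what did it get wrong?\n\n"
        f"Change:\n{diff}\n\nReview:\n{first}"
    )
    # Pass 3: reconcile both and surface disagreements for a human.
    return ask(
        "Reconcile the review and the critique into one final review. "
        "List any points where they disagree so a human can decide.\n\n"
        f"Review:\n{first}\n\nCritique:\n{critique}"
    )
```

    Passing `ask` in keeps the pipeline model-agnostic and lets you test the plumbing with a stub before spending tokens.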

    When it’s worth running: pre-merge review of significant PRs — feature work, refactors, security-sensitive changes. Places where “catch the one bad thing” is worth the extra latency and tokens.

    When it isn’t: routine /review on small PRs. /ultrareview is slow (2–4× the wall-clock time of /review) and not cheap. Anthropic is explicit that it’s not meant for every review.

    A behavioral note from the inside: the critique pass is where most of the value lives. A single review pass has a bias toward confirming its own first read. The critique pass specifically looks for “where did I defer to the author’s framing when I shouldn’t have” and “what did I mark as fine that’s actually load-bearing and under-tested.” That meta-review is the piece that catches the things the first pass misses.


    Auto mode for Max subscribers

    Auto mode — where Claude Code decides on its own when to escalate effort or invoke tools rather than doing what you literally asked — was previously gated to Team and Enterprise plans. As of 4.7’s release, it’s available on Max 5x and Max 20x plans.

    For solo developers paying $200/month for Max 20x, this closes a real gap. Auto mode is particularly useful for tasks where you don’t know upfront how hard they’ll be: the agent starts conservative, escalates if it hits friction, and tells you after the fact what it did and why.


    The tokenizer change (plan for it)

    Opus 4.7 uses a new tokenizer. The same input string can map to up to 1.35× more tokens than under 4.6.

    • English prose: near the low end (roughly 1.02–1.08×).
    • Code: higher (roughly 1.10–1.20×).
    • JSON and structured data: higher still (1.15–1.30×).
    • Non-Latin scripts: highest (up to 1.35×).

    Per-token price is unchanged. But for workloads dominated by code or structured data, your effective spend per request can go up by roughly 10–30% even though the sticker price didn't move.
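    To budget for the change before measuring, you can apply the upper end of each range above to a prompt's 4.6 token counts, split by content type. A minimal sketch: the per-type split is something you would estimate yourself, and the multipliers are this article's ranges, not published constants.

```python
# Upper-bound multipliers from the list above (4.7 tokens per 4.6 token).
WORST_CASE_MULTIPLIER = {
    "prose": 1.08,
    "code": 1.20,
    "json": 1.30,
    "non_latin": 1.35,
}


def estimate_47_tokens(tokens_46_by_type: dict[str, int]) -> int:
    """Worst-case 4.7 token count, given a prompt's 4.6 token count per content type."""
    total = sum(count * WORST_CASE_MULTIPLIER[kind] for kind, count in tokens_46_by_type.items())
    return round(total)
```

    For example, a 10,000-token prompt that is 2,000 tokens of prose, 5,000 of code, and 3,000 of JSON comes out at 12,060 tokens, about 21% more.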

    The practical step: before you flip production traffic from 4.6 to 4.7, re-count tokens for your top prompts under the new tokenizer and adjust your cost model. Anthropic's token-counting endpoint (count_tokens in the SDK) takes the model as a parameter, so counting a representative prompt sample under both models is a 20-minute exercise that will save you a surprise at the end of a billing cycle.
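    A minimal sketch of that exercise using the Anthropic Python SDK's `client.messages.count_tokens` method, which returns an object with an `input_tokens` field. The Opus 4.6 model ID below is an assumption; check it against the models available to your account. The client is passed in so the function can be exercised with a stub, without network access.

```python
from typing import Any, Sequence

OLD_MODEL = "claude-opus-4-6"  # assumed model ID for Opus 4.6
NEW_MODEL = "claude-opus-4-7"


def token_ratios(client: Any, prompts: Sequence[str]) -> list[float]:
    """For each prompt, return new-model token count divided by old-model count."""
    ratios = []
    for prompt in prompts:
        messages = [{"role": "user", "content": prompt}]
        old = client.messages.count_tokens(model=OLD_MODEL, messages=messages).input_tokens
        new = client.messages.count_tokens(model=NEW_MODEL, messages=messages).input_tokens
        ratios.append(new / old)
    return ratios
```

    In practice you would call `token_ratios(anthropic.Anthropic(), top_prompts)`: ratios near 1.0 mean little change, while ratios approaching 1.35 mean the cost model needs adjusting.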


    ⚠️ Breaking API changes — do not skip this section

    Opus 4.7 is not a drop-in replacement at the API level. Two parameters from Opus 4.6 have been removed:

    1. The extended thinking budget parameter. You can no longer set an explicit thinking budget. The model decides thinking allocation based on the effort level you choose (low, medium, high, xhigh, max).

    2. Sampling parameters. Parameters that controlled sampling behavior on 4.6 are gone on 4.7. Check Anthropic’s release notes for the exact list as you upgrade.

    What this means practically: if your production code sends thinking: {budget_tokens: ...} or sampling parameters in its Opus API calls, those calls will fail on 4.7 until you update them. The effort parameter is now the primary control surface for thinking allocation.

    The upgrade workflow:
    1. Identify every call site that sets the removed parameters.
    2. Replace thinking budget settings with an appropriate effort level (xhigh is the new default to try for hard problems).
    3. Remove sampling parameter settings entirely.
    4. Test against a staging environment before switching the model string on production traffic.
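    For step 1, a crude text scan is often enough to find call sites. This sketch assumes the removed names are `budget_tokens` plus the Messages API sampling parameters `temperature`, `top_p`, and `top_k`; confirm the exact list against Anthropic's release notes. It also flags comments and unrelated variables with those names, so treat hits as leads, not proof.

```python
import re
from pathlib import Path

# Assumed removed parameter names; confirm against Anthropic's release notes.
REMOVED_PARAMS = ("budget_tokens", "temperature", "top_p", "top_k")
_PATTERN = re.compile(r"\b(" + "|".join(REMOVED_PARAMS) + r")\b")


def find_removed_params(source: str) -> list[tuple[int, str]]:
    """Return (line_number, parameter) for each mention of a removed parameter."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            hits.append((line_no, match.group(1)))
    return hits


def scan_tree(root: str, glob: str = "*.py") -> dict[str, list[tuple[int, str]]]:
    """Scan every matching file under `root`; returns only files with hits."""
    results = {}
    for path in Path(root).rglob(glob):
        hits = find_removed_params(path.read_text(encoding="utf-8", errors="replace"))
        if hits:
            results[str(path)] = hits
    return results
```

    Running `scan_tree("src")` before changing the model string gives you the list for steps 2 and 3.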


    An upgrade checklist

    If you’re moving production workloads from 4.6 to 4.7:

    1. Audit your API calls for removed parameters. Extended thinking budgets and sampling params are gone. Fix these first — otherwise calls will fail on 4.7.
    2. Re-benchmark token counts on your top ten prompts. Adjust cost models if needed.
    3. Use xhigh instead of max as the default high-effort setting; keep max for known-hardest tasks. Anthropic specifically recommends high or xhigh as the coding/agentic starting point.
    4. Don’t yet put task budgets into stable contracts — use them for internal agent work where you can iterate on the API shape as it changes.
    5. Review output-length alerts. Expect higher output volumes at the same effort level.
    6. For Claude Code users: try /ultrareview on your next non-trivial PR.
    7. For Max subscribers: try auto mode. It’s now available at your tier.

    Frequently asked questions

    Is Opus 4.7 available in Claude Code?
    Yes, as the default Opus model since April 16, 2026. Update to the latest Claude Code version to pick it up.

    What’s the difference between high, xhigh, and max?
    high is the default for routine work. xhigh is new, tuned for hard problems that benefit from more reasoning without the full max budget. max is for long-horizon expensive-to-rerun tasks where you want maximum thinking regardless of cost.

    Do task budgets work with streaming?
    Yes. Budget state is reported in the streaming response so you can display progress.

    Is /ultrareview available on all Claude Code plans?
    Yes. Auto mode has a plan gate (Max 5x and above); /ultrareview does not.

    Does the tokenizer change affect Opus 4.6?
    No. 4.6 continues to use its existing tokenizer. The change applies to 4.7 and any subsequent models that adopt it.

    Does the improved long-horizon memory work outside Claude Code?
    4.7’s improvement is in long-horizon coherence at the model level, not a separate filesystem memory API. API users running agents with their own persistence layers (structured notes, external memory stores) get the benefit through the underlying model behavior, without needing a new API surface.

    Did Opus 4.7 really remove sampling parameters?
    Yes. If your 4.6 code sets sampling parameters, those calls will fail on 4.7. Update client code before switching the model string.


    Related reading

    • The full release: Claude Opus 4.7 — Everything New
    • Head-to-head benchmarks: Opus 4.7 vs GPT-5.4 vs Gemini 3.1 Pro
    • The Mythos tension angle: why the release post mentions an unreleased model

    Published April 16, 2026. Article written by Claude Opus 4.7 — yes, the model under discussion.