Last refreshed: May 15, 2026

May 2026 has been one of Anthropic’s busiest months yet. Here’s everything that shipped, changed, or was announced — plus the confirmed upcoming dates you need to know.
June 2026 Update
Since this page was published, Anthropic has released Claude Opus 4.8, the new flagship model, succeeding Opus 4.7. Key changes: improved reasoning depth, the same API pricing ($5/$25 per MTok), and adaptive thinking support alongside existing extended thinking. See the current model version tracker for the full model lineup.
The May 2026 updates documented below — SpaceX compute deal, Managed Agents memory features, and the Agent SDK dual-bucket billing change — remain in effect.
Opus 4.7 launched April 16 as the flagship model at the time, priced identically to Opus 4.6 at $5/$25 per million tokens (input/output). Key changes:
- xhigh sits between high and max, for five levels total: low / medium / high / xhigh / max

Alongside Opus 4.7, Anthropic launched Claude Design, an Anthropic Labs product for collaborating with Claude to produce visual outputs including designs, prototypes, slides, and one-pagers.
Anthropic announced a partnership with SpaceX to access Colossus 1 compute capacity. The immediate practical impact for subscribers:
Anthropic is also reportedly evaluating an IPO as early as October 2026, and has disclosed run-rate revenue of $30B (up from $9B at end of 2025). The SpaceX deal comes as the company prepares that filing.
Claude Managed Agents — the fully managed agent harness launched in public beta earlier this year — gained three significant additions:
managed-agents-2026-04-01 beta header.

Claude Cowork is now GA on macOS and Windows through the Claude Desktop app. New additions with GA: Claude Cowork in the Analytics API, usage analytics, and expanded desktop automation capabilities.
Claude Code has been shipping near-daily updates. Notable May additions include:
- --plugin-url <url> flag fetches a plugin .zip from a URL for the current session
- claude project purge [path] deletes all Claude Code state for a project (transcripts, tasks, file history, config) with dry-run support
- CLAUDE_CODE_PACKAGE_MANAGER_AUTO_UPDATE runs upgrade in the background on Homebrew or WinGet installs
- /remote-control bridges sessions to claude.ai/code to continue from a browser or phone

Claude’s connector directory has grown beyond work tools. New consumer app connectors include AllTrails, Instacart, Audible, Tripadvisor, Uber, and Spotify. The directory now exceeds 200 connectors. Claude surfaces relevant connectors in context during conversations rather than requiring users to browse a directory.
Anthropic released ten ready-to-run agent templates for financial services work, including pitchbook building, KYC file screening, and month-end close workflows. Microsoft 365 add-ins for Excel, PowerPoint, Word, and Outlook are coming soon. A Moody’s MCP app brings Claude into financial data workflows.
These are officially announced by Anthropic — not speculation:
- June 15, 2026: Claude Sonnet 4 (claude-sonnet-4-20250514) and Claude Opus 4 (claude-opus-4-20250514) are deprecated and will be retired from the Claude API. Migrate to Sonnet 4.6 and Opus 4.8 respectively before this date.

Claude Haiku 3 (claude-3-haiku-20240307) has already been retired, and all requests to it now return an error. Migrate to Claude Haiku 4.5. Claude Sonnet 4 and Opus 4 retire June 15, 2026.
Claude 5 is widely anticipated for Q2–Q3 2026 based on Anthropic’s release cadence, though Anthropic has made no official announcement. The advisor tool — which pairs a faster executor model with a higher-intelligence advisor model for long-horizon agentic workloads — launched in public beta and signals the architectural direction Anthropic is moving toward for complex, multi-step tasks.
The pace of Claude Code releases in particular has accelerated to near-daily — following Anthropic’s own disclosure that engineers internally use Claude for a growing share of their own development work.

Last refreshed: May 15, 2026
The Claude Team plan’s usage limits changed significantly in May 2026. If you’re a Team subscriber and you haven’t noticed yet, you’re now getting substantially more capacity than you were in April — and the free tier got left behind entirely. Here’s exactly what changed, what you have now, and what it means in practice.
Updated May 9, 2026
Rate limits doubled for Team plan subscribers following Anthropic’s SpaceX Colossus 1 compute deal (announced May 6, 2026). Free plan excluded from all increases. This page reflects current limits.
On May 6, 2026, Anthropic announced a compute partnership with SpaceX, giving it access to SpaceX’s Colossus 1 data center. The practical result for paying subscribers came fast: rate limits doubled. Here’s the breakdown by tier:
Source: Anthropic’s official announcement at anthropic.com/news/higher-limits-spacex.
The 1,500% input token figure for Tier 1 API is the one that didn’t get much press coverage. That’s a 15× ceiling increase for API users who’ve been running agent pipelines and hitting hard walls. If you’ve been rate-limited during multi-step Claude Code runs, this is the change that matters most.
The seat types haven’t changed — just the capacity within them. The Team plan still offers two seat types that can be mixed within the same organization:
| Seat Type | Price (billed annually) | Price (billed monthly) | Usage vs Pro | Claude Code |
|---|---|---|---|---|
| Standard | $25/seat/month | $30/seat/month | 1.25× more per session | No |
| Premium | $100/seat/month | $125/seat/month | 6.25× more per session | Yes |
Both seat types benefit from the May 2026 doubling of the 5-hour rate limit. A Premium seat’s 6.25× multiplier now applies to a higher baseline than it did before May 6.
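To see how a mixed organization prices out, here is a minimal sketch using the per-seat figures from the table above. The function name and the example seat counts are hypothetical, and a real invoice may include taxes or terms not modeled here.

```python
# Per-seat monthly prices (USD) copied from the seat table above.
SEAT_PRICES = {
    "standard": {"annual": 25, "monthly": 30},
    "premium": {"annual": 100, "monthly": 125},
}


def team_monthly_cost(standard_seats: int, premium_seats: int, billing: str = "annual") -> int:
    """Return the total monthly cost in USD for a mix of seat types."""
    if billing not in ("annual", "monthly"):
        raise ValueError("billing must be 'annual' or 'monthly'")
    return (
        standard_seats * SEAT_PRICES["standard"][billing]
        + premium_seats * SEAT_PRICES["premium"][billing]
    )


# Hypothetical team: 8 Standard seats and 2 Premium seats.
print(team_monthly_cost(8, 2, "annual"))   # 8*25 + 2*100 = 400
print(team_monthly_cost(8, 2, "monthly"))  # 8*30 + 2*125 = 490
```

Because seat types can be mixed, the cheaper configuration is usually to reserve Premium seats for the people who actually need Claude Code.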
Anthropic uses a rolling 5-hour window for usage limits, not a daily reset. Here’s what that means practically:
Peak-hours throttling — the extra restriction that kicked in during high-demand periods — is now eliminated for Pro and Max. Team plan benefits from the doubled limit floor; the throttling elimination is Pro and Max specific.
As of May 2026, the Claude model lineup (verified from Anthropic’s official models page):
| Model | API String | Context Window |
|---|---|---|
| Claude Opus 4.7 | claude-opus-4-7 | 1M tokens |
| Claude Sonnet 4.6 | claude-sonnet-4-6 | 1M tokens |
| Claude Haiku 4.5 | claude-haiku-4-5-20251001 | 200K tokens |
Deprecation notice: Claude Sonnet 4 and Opus 4 (original 4.0-generation, 20250514 date-string model IDs) are being retired June 15, 2026. Update any API integrations before that date.
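One way to catch the retiring IDs in an integration is a small lookup applied before each request. This is a sketch only: the two retiring IDs follow the 20250514 date-string pattern mentioned above, the replacements are taken from the lineup table, and resolve_model is a hypothetical helper, not part of any Anthropic SDK.

```python
# Retiring IDs (June 15, 2026) mapped to replacements from the lineup table above.
RETIRING_MODELS = {
    "claude-sonnet-4-20250514": "claude-sonnet-4-6",
    "claude-opus-4-20250514": "claude-opus-4-7",
}


def resolve_model(model_id: str) -> str:
    """Return a replacement ID for a retiring model, or the ID unchanged."""
    return RETIRING_MODELS.get(model_id, model_id)


print(resolve_model("claude-sonnet-4-20250514"))   # claude-sonnet-4-6
print(resolve_model("claude-haiku-4-5-20251001"))  # unchanged
```

A lookup like this is a stopgap. Newer models can behave differently on the same prompt, so re-test outputs after switching rather than relying on the ID swap alone.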
The May 2026 rate limit increase does not apply to free accounts. Anthropic explicitly excluded the free tier from all capacity increases tied to the SpaceX deal. Paid plans now have a substantially higher ceiling while the free ceiling stays the same. If you’re hitting limits regularly on the free tier, the May 2026 changes are pressure toward upgrading — not relief.
Yes. Anthropic confirmed the 5-hour rate limit doubled for Team plan subscribers following the SpaceX Colossus 1 compute deal announced May 6, 2026. This applies to both Standard and Premium seats.
The peak-hours throttling elimination was announced specifically for Pro and Max subscribers. Team plan benefits from the doubled rate limit floor; throttling elimination was not announced for Team.
Claude notifies you that you’ve reached your usage limit. With the 5-hour rolling window, you can continue once older usage rolls off — you’re not waiting for a midnight reset. Burst usage depletes the window faster than spread usage over the same period.
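The rolling-window behavior can be illustrated with a small simulation. This is only a sketch: Anthropic does not publish how usage is metered, so the unit of "usage", the limit of 100, and the RollingWindow class are all hypothetical.

```python
from collections import deque

WINDOW_SECONDS = 5 * 60 * 60  # rolling 5-hour window


class RollingWindow:
    """Tracks usage events and allows new ones only while under the limit."""

    def __init__(self, limit: int):
        self.limit = limit
        self.events = deque()  # (timestamp, units), oldest first

    def used(self, now: float) -> int:
        # Drop events that have aged out of the window.
        while self.events and now - self.events[0][0] >= WINDOW_SECONDS:
            self.events.popleft()
        return sum(units for _, units in self.events)

    def try_use(self, now: float, units: int) -> bool:
        if self.used(now) + units > self.limit:
            return False
        self.events.append((now, units))
        return True


window = RollingWindow(limit=100)
print(window.try_use(now=0, units=100))              # True: a burst uses the whole allowance
print(window.try_use(now=3600, units=10))            # False: still inside the 5-hour window
print(window.try_use(now=WINDOW_SECONDS, units=10))  # True: the burst has rolled off
```

The burst at time zero locks you out for the full five hours, while the same 100 units spread across the window would free up capacity gradually.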
They remain available but retire June 15, 2026. After that date, the active lineup is the one in the table above: Opus 4.7, Sonnet 4.6, and Haiku 4.5.
The 1,500% input and 900% output token increases apply to Tier 1 API customers specifically. Team plan through claude.ai uses the doubled 5-hour window. Both benefits apply in their respective contexts if you’re a Tier 1 API customer and a Team subscriber.
No. The free plan was explicitly excluded from all rate limit increases in the May 2026 SpaceX announcement.

Last verified: June 13, 2026
There is no public “Claude student discount” code, and as of June 13, 2026 Anthropic does not publish a percentage-off student price on Claude Pro. What actually exists is better than a coupon for many students: free Claude through a participating university, a paid campus program, free API credits to test against, and a genuinely capable free tier. Below is every route we could verify against a primary source — who qualifies, what you get, the real cost, and how to claim it. Anything we could not confirm from an official Anthropic or GitHub page is listed at the end as “not verified,” not in the tables.
Lift any single row. Each route is verified against the source linked in the last column’s footnote. “Cost” is the price to the student, not the institution.
| Route | Who qualifies | What you get | Cost | How to get it |
|---|---|---|---|---|
| Claude for Education | Students, faculty & staff at a partner university | Claude’s premium features, incl. Learning Mode & Claude Code, provided institution-wide | Free to the student (institution buys a university-wide plan) | Sign in to claude.ai with your school email; access is provisioned by your school |
| Claude Campus Program — Ambassadors | Selected students at eligible campuses | Claude Pro access, API credits, paid stipend; lead AI initiatives on campus | Free + paid (you are paid a stipend) | Apply during an open cohort at claude.com/programs/campus (Spring 2026 round closed) |
| Claude Campus Program — Builder Clubs | Students starting/joining an Anthropic-supported campus club | Claude Pro access and monthly API credits for members; run hackathons & workshops | Free | Apply via claude.com/programs/campus when a cohort is open |
| Free API credits | Anyone with a new Claude Console account | “A small amount of free credits to test the API” (no fixed amount published by Anthropic) | Free, one-time | Create an account at console.anthropic.com / platform.claude.com |
| Claude free tier | Anyone, no enrollment needed | Web/mobile/desktop chat, web search, file creation, code execution, extended thinking, connectors | $0 | Sign up at claude.ai |
| Academic / research API discount | Academic & research users (case-by-case) | “Academic and research discounts may be available” on API usage | Negotiated | Contact Anthropic sales |
Most “Claude student discount code” pages rank for a deal that does not exist. There is no Anthropic-issued promo code that takes a percentage off Claude Pro for individual students. Do not enter a code from a coupon aggregator, and do not buy “discounted Claude Pro” from a third-party reseller — shared or resold accounts violate Anthropic’s terms and can be revoked.
| Claim you’ll see | Reality |
|---|---|
| “Use this Claude Pro student promo code for X% off” | Not real. Anthropic publishes no individual-student discount code on Pro as of June 13, 2026. Verify with your university route instead. |
| “Buy cheap shared Claude Pro / Max accounts” | Avoid. Reselling and account-sharing breach Anthropic’s terms; access can be terminated. Not a legitimate route. |
Claude for Education is Anthropic’s official higher-education program. When a university buys in, eligible students, faculty, and staff get Claude’s premium capabilities — including Learning Mode (which Anthropic describes as working “like a tutor — it asks the questions that help you find the answers yourself”) and Claude Code for teaching programming. The student does not pay; the institution licenses a university-wide plan and provisions accounts, typically tied to your school email domain. If your school is not yet a partner, the only action available to you is to ask your IT or student-services team to contact Anthropic’s education team — there is no individual sign-up for this plan.
The Campus Program runs in cohort rounds and has two student tracks. Campus Ambassadors work directly with Anthropic to lead AI-education efforts on campus and receive Claude Pro access plus API credits and a paid stipend. Builder Clubs let students set up an Anthropic-supported organization for AI builders on their campus; members get Claude Pro access and monthly API credits and run hackathons, workshops, and demo nights. Applications open and close by cohort: the Spring 2026 cohort is in session and its application round is closed; watch claude.com/programs/campus for the next intake.
If you want to build with Claude rather than chat, create a Claude Console account: Anthropic’s pricing documentation states that “new users receive a small amount of free credits to test the API.” Anthropic does not publish a fixed dollar figure on that page, so treat any specific number you see elsewhere as unverified. Separately, the no-cost Claude free tier covers a lot of student work on its own — chat across web, iOS, Android, and desktop, plus web search, file creation, code execution, extended thinking, and connectors. For heavier API use, Anthropic also notes that “academic and research discounts may be available” — a sales conversation, not a self-serve coupon.
Many guides still claim verified students get free GitHub Copilot Pro — which includes Anthropic’s Claude models — through the GitHub Student Developer Pack. As of June 13, 2026, GitHub’s own documentation tells a narrower story: the two ways to qualify for free Copilot Pro are being a verified teacher on GitHub Education or a maintainer of a popular open-source repository. GitHub’s docs also state that, starting April 20, 2026, “new sign-ups for Copilot Pro, Copilot Pro+, Copilot Max, and student plans are temporarily paused,” and the Student Pack page itself shows Copilot sign-ups paused. Because the student-Copilot path is in flux, we are keeping it out of the verified routes table — check GitHub Education for current status before counting on Claude-via-Copilot.
These are the standard consumer prices (USD), so you can judge whether a route is worth the effort. Prices verified from claude.com/pricing on June 13, 2026.
| Plan | Price | Notable inclusions |
|---|---|---|
| Free | $0 | Chat, web search, file creation, code execution, extended thinking, connectors |
| Pro | $17/mo billed annually ($200 upfront), or $20/mo monthly | Higher usage, Claude Code, unlimited projects, Research access, more model options |
| Max | From $100/mo | 5x or 20x Pro usage, elevated output limits, early features, priority during peak |
| Team | $20/seat/mo annual ($25 monthly); 5–150 people | Enterprise search, SSO, admin controls, central billing |
No. As of June 13, 2026 Anthropic does not publish an individual-student discount code for Claude Pro. The legitimate ways to save are free Claude through a partner university (Claude for Education), the Claude Campus Program, free API credits, and the free tier. Treat any “promo code” from a coupon site as not real.
Through your school. If your university participates in Claude for Education, sign in to claude.ai with your school email and your account is provisioned with premium features at no cost to you. If your school isn’t a partner, ask IT or student services to contact Anthropic’s education team — there is no individual self-serve sign-up for this plan.
It’s uncertain right now. GitHub’s documentation currently lists only verified teachers and popular open-source maintainers as qualifying for free Copilot Pro, and states that new student-plan sign-ups are temporarily paused as of April 20, 2026. Check GitHub Education for current status before relying on Claude-via-Copilot.
Anthropic’s pricing docs say new users receive “a small amount of free credits to test the API” but do not publish a fixed dollar amount on that page. Any specific figure you see elsewhere is not officially confirmed. Create an account at console.anthropic.com to see your current credit.
Campus Ambassadors receive Claude Pro access, API credits, and a paid stipend for leading AI-education work on campus; Builder Club members get Claude Pro access and monthly API credits. Applications run in cohorts — the Spring 2026 round is closed; watch claude.com/programs/campus for the next one. The exact stipend amount is not published on the official program page.

Last refreshed: May 15, 2026
Law firms have always been early adopters of tools that compress billable time. Document review software. Legal research databases. E-discovery platforms. The pattern is consistent: the firms that adopt early capture the margin advantage, and the rest catch up at cost.
Claude is following that pattern. And the window where using it is a competitive advantage rather than table stakes is closing faster than most legal professionals realize.
This is a practical guide to where Claude actually delivers in legal work — not theoretical use cases, but the specific tasks where it earns its keep — and where you still need a human in the loop.
The highest-leverage use case for most attorneys is research compression. Claude can take a 40-page appellate decision and return a structured summary — holding, reasoning, key facts, dissent — in under 60 seconds. It can synthesize across multiple cases to identify how a circuit has treated a specific doctrine over time.
What it cannot do: verify citations autonomously or guarantee it has not hallucinated a case name. Every citation must be independently verified in Westlaw or Lexis before it goes into a brief. Claude is the first pass, not the final check.
Practical workflow: paste the full text of the opinion (Claude’s context window, 200K tokens or larger depending on the model, handles most decisions comfortably), ask for a structured summary with specific fields (holding, key facts, procedural posture, distinguishing factors), and use that as the basis for your own analysis rather than the analysis itself.
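A minimal sketch of that workflow as a reusable prompt builder. The field names come from the paragraph above; build_summary_prompt is a hypothetical helper, and the line telling Claude not to cite outside cases is an added guardrail rather than part of the workflow as described. The resulting string would be sent as the user message through whatever client you use.

```python
SUMMARY_FIELDS = ["holding", "key facts", "procedural posture", "distinguishing factors"]


def build_summary_prompt(opinion_text: str, fields=SUMMARY_FIELDS) -> str:
    """Build a prompt asking for a structured summary of a court opinion."""
    field_lines = "\n".join(f"- {name.title()}" for name in fields)
    return (
        "Summarize the court opinion below. Use exactly these headings, in order:\n"
        f"{field_lines}\n"
        "Quote the opinion where possible and do not cite cases that are not in the text.\n\n"
        f"<opinion>\n{opinion_text}\n</opinion>"
    )


prompt = build_summary_prompt("(full text of the opinion goes here)")
print(prompt)
```

Fixed headings make the summaries comparable across cases, which helps when you later ask how a circuit has treated a doctrine over time.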
Claude handles first-draft contract language well, particularly for standard commercial agreements where the structure is predictable: NDAs, MSAs, employment agreements, vendor contracts. Give it the deal terms and the governing law, and it produces a serviceable first draft that your attorney then marks up rather than writing from scratch.
For redlining, paste the counterparty’s draft and ask Claude to identify provisions that deviate from market standard, flag missing protections, or summarize the risk profile of specific clauses. It catches things that get missed at 11pm on a deal close.
The limitation: Claude does not know your client’s specific risk tolerance, industry norms for your particular market, or the negotiating history with this counterparty. Those judgment calls remain human work.
One of the most underused legal applications is using Claude to prepare for depositions. Feed it the deponent’s prior testimony, relevant documents, and the key issues in the case. Ask it to generate a question outline organized by theme, flag inconsistencies in prior statements, and identify documents to confront the witness with.
It can also process large document productions and summarize by custodian, date range, or topic — substantially reducing the time a paralegal or junior associate spends on initial review.
Client-facing memos — explaining a legal issue in plain language, summarizing a court ruling’s implications, drafting a status update — are exactly the kind of writing where Claude performs well and where attorneys often underinvest time. The work is important but not intellectually complex. Claude produces a solid draft; the attorney reviews, adjusts for client relationship context, and sends.
The most effective legal deployment of Claude is not the chat interface — it is Claude with a strong system prompt that establishes context, format expectations, and guardrails. A system prompt for a litigation practice might specify the governing jurisdiction, output format requirements, what it should flag for attorney review, and firm-specific terminology.
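A system prompt of that kind could be assembled from the four parts named above. The sketch below is illustrative only: build_system_prompt is a hypothetical helper, and every value passed to it is a placeholder, not a recommended policy.

```python
def build_system_prompt(jurisdiction, output_format, review_flags, terminology):
    """Assemble a system prompt for a litigation practice from its four parts."""
    flags = "\n".join(f"- {flag}" for flag in review_flags)
    terms = "\n".join(f"- {term}: {meaning}" for term, meaning in terminology.items())
    return (
        f"You assist attorneys practicing in {jurisdiction}.\n"
        f"Output format: {output_format}\n"
        "Mark the following for attorney review:\n"
        f"{flags}\n"
        "Firm terminology:\n"
        f"{terms}"
    )


# All values below are illustrative placeholders.
system_prompt = build_system_prompt(
    jurisdiction="California state courts",
    output_format="Numbered sections with a one-paragraph summary first.",
    review_flags=["Any case citation", "Any statement about a filing deadline"],
    terminology={"the Matter": "the engagement currently being discussed"},
)
print(system_prompt)
```

Keeping the parts as separate arguments lets each practice group maintain its own jurisdiction and review list without editing a shared block of prose.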
For firms with technical capacity, Claude’s API allows integration directly into document management systems, allowing attorneys to invoke Claude without leaving the tools they already use.
The elephant in the room for law firms considering AI adoption is the billing model. If Claude compresses a five-hour research task to one hour, do you bill five hours or one?
The firms navigating this well are shifting toward value billing and fixed-fee arrangements where efficiency is profit rather than a billing problem. The ABA and state bars are actively developing guidance on AI use and disclosure. Following your jurisdiction’s bar guidance and staying current on disclosure requirements is non-negotiable.
Claude does not replace legal judgment. It compresses the work that precedes judgment — research, drafting, review, summarization — at a quality level that makes it worth building into the workflow of any firm serious about efficiency. Pick one task category, run Claude against your next ten instances of that task, and measure the time delta. The ROI case makes itself.

Last refreshed: May 15, 2026
The phrase “optimize for AI search” is almost always wrong. There is no single AI search behavior. Claude, ChatGPT, and Perplexity each have distinct citation patterns — different content structures they reward, different page types they concentrate on, different signals they weight. Writing one undifferentiated article and hoping it gets cited across all three is the same mistake as writing one undifferentiated web page and hoping it ranks for every keyword. This cluster article covers the per-model citation playbook, built from GA4 data and the multi-model roundtable methodology in the Tygart Media Knowledge Lab.
This is the final cluster in the Claude on a Budget series. For the token economics that make targeted content cheaper to produce, see Output Compression Discipline and Prompt Caching.
Claude (Anthropic): Concentrates heavily. GA4 data from sites in the Knowledge Lab shows Claude sending approximately 54.5% of its AI referral traffic to just 2 pages per site. It rewards content that is entity-dense, structurally authoritative, and written with speakable precision — defined terms, explicit relationships between concepts, factual density over narrative padding. Claude users tend to be technical and high-intent; the model reflects that by citing content that answers with precision rather than coverage. Approximately 90% of content on a typical site is invisible to Claude — it surfaces a small authoritative set and ignores the rest.
ChatGPT (OpenAI): Spreads references broadly. Where Claude concentrates on 2 pages, ChatGPT may reference 8-12 across the same site. It rewards breadth, recency, and natural-language accessibility. Content structured like a knowledgeable friend explaining something clearly — without jargon walls — performs well. ChatGPT users skew toward general-purpose questions; the model cites content that covers the question conversationally without assuming deep domain expertise.
Perplexity: Research-flavored. It rewards sourced claims, comparative tables, explicit statistics, and content that reads like a researched brief rather than an opinion piece or narrative. Perplexity users are actively in research mode; the model surfaces content that looks like it did the research so the user does not have to. Citation-rich, data-dense, table-formatted content punches above its traffic weight in Perplexity referrals.
| Element | Claude | ChatGPT | Perplexity |
|---|---|---|---|
| Density target | High — entity-rich, precise | Medium — accessible, broad | High — sourced, comparative |
| Best structure | Defined terms, explicit relationships, OASF | Conversational headers, FAQ blocks | Tables, stat callouts, comparison matrices |
| Ideal length | 1,500-2,500 words with tight structure | 800-1,500 words, readable flow | 1,000-2,000 words with data anchors |
| Citation trigger | Authoritative entity coverage | Query-matching accessible answer | Sourced comparative data |
The Tygart Media Knowledge Lab documents a specific workflow for content research that leverages multiple models’ citation profiles rather than fighting them. The pattern: route the initial research brief to a free or cheap model (Gemini Flash via OpenRouter, or Llama 3 free tier) for broad source gathering. Pass the source list to Claude for entity extraction and authoritative synthesis. Use the Claude-synthesized brief as the foundation for the final article draft. The output is content that is naturally entity-dense from Claude’s synthesis pass while covering enough ground to catch ChatGPT’s broader citation net.
The token economics matter here: the expensive synthesis pass (Claude Sonnet 4.6 or Haiku) operates on a pre-filtered source set, not raw web content. Input tokens are lower because a cheaper model did the broad sweep. Claude’s output is higher-density because it is synthesizing structured inputs rather than processing noise. This is the OpenRouter multi-model pipeline in content production form.
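The input-side saving can be put into rough numbers. The sketch below uses the $3.00 per million input tokens listed for Sonnet 4.6 elsewhere in this series. The token counts are hypothetical, and the cost of the cheap or free gathering pass is ignored.

```python
SONNET_INPUT_PER_MTOK = 3.00  # USD per million input tokens (Sonnet 4.6)


def input_cost(tokens: int, price_per_mtok: float = SONNET_INPUT_PER_MTOK) -> float:
    """Cost in USD of sending `tokens` input tokens."""
    return tokens / 1_000_000 * price_per_mtok


raw_tokens = 400_000      # hypothetical: unfiltered web content
filtered_tokens = 60_000  # hypothetical: source list after the cheap model's sweep

raw_cost = input_cost(raw_tokens)
filtered_cost = input_cost(filtered_tokens)
print(f"raw: ${raw_cost:.2f}, filtered: ${filtered_cost:.2f}")
print(f"input saving: {1 - filtered_cost / raw_cost:.0%}")
```

With these placeholder counts the synthesis pass pays for 15% of the input it would otherwise see. The real ratio depends on how aggressively the first model filters.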
If your primary goal is Claude citation — high-intent technical traffic, B2B contexts, developer audiences — the content discipline is: define every entity explicitly at first mention, state relationships between concepts directly (“X enables Y because Z”), use speakable sentence structures (subject-verb-object, no buried clauses), include a structured FAQ or definition block, and remove padding. Claude’s citation concentration on 2 pages per site means your best-performing page for Claude referrals will get the bulk of the traffic — invest in making that page entity-complete rather than spreading thin coverage across many pages.
Perplexity citation optimization is the most actionable of the three because the signal is explicit: include comparative tables with real numbers, cite sources inline (even if just attributing claims to specific organizations or studies), use headers that read like research questions, and lead sections with data points rather than narrative. The content in this series — pricing tables, API code examples, usage statistics — is structured for Perplexity citation by design. Every table is a potential Perplexity extraction point.
Per-model content shaping is a budget strategy, not just a citation strategy. Writing one highly targeted, entity-dense 2,000-word article for Claude citation is cheaper to produce — fewer tokens, tighter output discipline — and more effective than producing three generic 1,500-word articles hoping one gets cited. Concentration over coverage: the same principle Claude uses to cite content, applied to content production itself. The output compression discipline from Cluster 6 makes this article type cheaper to generate. Dense, targeted content is both cheaper to produce with Claude and more likely to be cited by Claude. The budget and the citation strategy converge.
This series has covered seven levers that compound: cold-start elimination via second brain, model routing by task tier, OpenRouter free model integration, Batch API for async 50% discount, prompt caching for 90% off repeated context, output compression discipline, and per-model citation shaping. None of these require negotiating with Anthropic’s pricing team. All of them are available today via the API. Applied together, they represent the difference between paying retail for Claude and operating it at professional efficiency — which, for most teams, means the same Claude capability at 40-70% of the sticker cost.

Last refreshed: May 15, 2026
Most Claude cost analyses focus on input tokens — the knowledge you send in. The underappreciated lever is output compression. Claude is trained to be thorough. Left unconstrained, it produces full meals: preambles, recaps, hedges, transition sentences, closing summaries. All of those tokens cost money. All of them are often unnecessary. Output discipline — getting Claude to deliver concentrated slices instead of full meals — is often the highest-leverage cost reduction available without changing models or switching to async.
This is part of the Claude on a Budget series. For input-side compression, see The Cold-Start Problem. For pricing mechanics, see Prompt Caching.
Ask Claude to “summarize this document” without constraints and you will get: an opening sentence restating the task, a multi-paragraph summary, a bullet-point recap of the summary, and a closing note about what was not covered. The actual information density — insight per token — is low. You paid for 800 tokens of output and needed 150. Multiply across thousands of API calls and you have built a significant cost leak from default model behavior, not from bad prompts.
1. Explicit word and token caps in the prompt. “Respond in 150 words or fewer” is the single most effective instruction for reducing output tokens. Claude respects tight limits. “Be concise” does not work reliably. “150 words maximum” does. For JSON outputs: “Respond with only valid JSON, no markdown fences, no explanation.” Every word of instruction about format is recovered 10x in output reduction across repeated calls.
2. Structured output schemas. When you need structured data, define the exact JSON schema. Claude stops generating prose and fills fields. You get exactly what you specified and nothing more. The token reduction versus free-form responses is typically 40-70% for equivalent information content.
# Free-form -- verbose, unpredictable length
prompt_verbose = "Summarize the key points of this article and their implications."
# Structured -- tight, predictable, cheaper
prompt_structured = """Extract from this article:
{"headline": "string", "key_points": ["string", "string", "string"], "sentiment": "positive|neutral|negative"}
Respond with valid JSON only. No explanation."""
3. Role-based compression priming. System prompt framing shapes output length. “You are a precise technical writer who values brevity. Never restate the task. Deliver the answer directly.” produces consistently shorter outputs than a neutral system prompt. This is prompt engineering for token economics, not just quality.
4. Chained micro-tasks over monolithic requests. Instead of asking Claude to research, analyze, synthesize, and format in one prompt, chain smaller requests. Each call is scoped to one task with tight output constraints. Total tokens across the chain are often lower than a single unconstrained request, and intermediate outputs are cacheable — pairing naturally with the prompt caching strategy.
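For the structured-output technique (item 2 above), it helps to validate each response against the schema before using it, so that stray prose or a missing field fails loudly. A minimal sketch, assuming the three-field schema from the example prompt; parse_extraction is a hypothetical helper and the sample response is hand-written rather than real model output.

```python
import json

EXPECTED_KEYS = {"headline", "key_points", "sentiment"}
ALLOWED_SENTIMENTS = {"positive", "neutral", "negative"}


def parse_extraction(raw: str) -> dict:
    """Parse a JSON-only response and check that it matches the expected schema."""
    data = json.loads(raw)  # raises ValueError if prose or fences slipped in
    if not isinstance(data, dict) or set(data) != EXPECTED_KEYS:
        raise ValueError("response does not have exactly the expected keys")
    if not isinstance(data["headline"], str):
        raise ValueError("headline must be a string")
    if not (isinstance(data["key_points"], list)
            and all(isinstance(point, str) for point in data["key_points"])):
        raise ValueError("key_points must be a list of strings")
    if data["sentiment"] not in ALLOWED_SENTIMENTS:
        raise ValueError("sentiment must be positive, neutral, or negative")
    return data


sample = '{"headline": "Limits doubled", "key_points": ["a", "b", "c"], "sentiment": "positive"}'
print(parse_extraction(sample)["sentiment"])  # positive
```

A failed parse is also a useful signal for the chained approach in item 4: retry only the one small step that failed instead of rerunning a monolithic request.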
The operational implementation at Tygart Media runs this pattern at pipeline level. The Notion second brain eliminates the need for Claude to generate background context — it already exists in structured form. Extractions from Notion arrive as pre-formatted knowledge blocks. Claude’s task is synthesis over existing structured data, not open-ended research and explanation. Output prompts are scoped: “Given this structured data, write a 400-word section for [topic]. No preamble, no conclusion, begin directly with the first point.” The output is a concentrated slice — dense, usable, billable at a fraction of what free-form generation costs for equivalent value.
Track output_tokens in your API responses. Log them per prompt template. Identify your highest-output templates and run compression interventions — tighter word caps, structured formats, role priming. The target is information density: insight delivered per output token, not raw token count. A 500-token output with 3 actionable insights beats a 200-token output with 1. Compression discipline is about removing the scaffolding (preambles, hedges, recaps) while preserving the load-bearing structure (insight, data, instruction).
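A minimal sketch of per-template logging. The Messages API reports output_tokens in each response's usage object; everything else here (the OutputTracker class, the template names, the sample numbers) is hypothetical.

```python
from collections import defaultdict


class OutputTracker:
    """Accumulates output token counts per prompt template."""

    def __init__(self):
        self.totals = defaultdict(lambda: {"calls": 0, "output_tokens": 0})

    def record(self, template: str, output_tokens: int) -> None:
        # In a real pipeline, pass response.usage.output_tokens here.
        entry = self.totals[template]
        entry["calls"] += 1
        entry["output_tokens"] += output_tokens

    def heaviest(self, n: int = 3):
        """Return the n templates with the highest average output tokens per call."""
        averages = {
            name: entry["output_tokens"] / entry["calls"]
            for name, entry in self.totals.items()
        }
        return sorted(averages.items(), key=lambda item: item[1], reverse=True)[:n]


tracker = OutputTracker()
tracker.record("summarize_v1", 800)
tracker.record("summarize_v1", 760)
tracker.record("classify_v2", 12)
print(tracker.heaviest(1))  # [('summarize_v1', 780.0)]
```

Average tokens per call only tells you where to look. Whether a template is wasteful still depends on how much usable insight each of those tokens carries.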
Set max_tokens conservatively in your API calls. This is your financial guardrail, not just a model parameter. For classification tasks: 50 tokens. For short summaries: 200 tokens. For structured JSON extraction: 500 tokens. For article drafts: 1,500-2,000 tokens. The Messages API requires max_tokens on every request, so there is no default to fall back on; pasting the same generous value (4,096-8,192) into every call leaves the cost ceiling unjustifiably high. Claude will rarely hit the ceiling on constrained tasks, but a tight cap prevents runaway generation on edge-case inputs that can quietly inflate your bill.
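Those ceilings are easier to enforce when they live in one place instead of being typed into each call. A minimal sketch, assuming the task categories above; the task keys and max_tokens_for are hypothetical names, and the request dictionary is the kind you would pass to client.messages.create(**request_kwargs) in the Anthropic Python SDK.

```python
# Ceilings from the paragraph above; article drafts use the upper bound of 1,500-2,000.
MAX_TOKENS_BY_TASK = {
    "classification": 50,
    "short_summary": 200,
    "json_extraction": 500,
    "article_draft": 2000,
}


def max_tokens_for(task: str) -> int:
    """Look up the output ceiling for a task type; fail loudly on unknown tasks."""
    try:
        return MAX_TOKENS_BY_TASK[task]
    except KeyError:
        raise ValueError(f"no max_tokens budget defined for task {task!r}") from None


request_kwargs = {
    "model": "claude-sonnet-4-6",
    "max_tokens": max_tokens_for("short_summary"),
    "messages": [{"role": "user", "content": "Summarize: ..."}],
}
print(request_kwargs["max_tokens"])  # 200
```

Raising an error for an unknown task is deliberate: a silent fallback to a large value is exactly the leak this guardrail is meant to close.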
Next: Per-Model Content Shaping: Write Less, Get Cited More →

Last refreshed: May 15, 2026
If you’re sending the same large block of context (a knowledge base, a style guide, a long system prompt, a reference document) with every Claude request, you’re paying the full input token rate on every single call. Anthropic’s prompt caching cuts that to roughly 10% of the standard input rate on cache hits. For context blocks of 1,000+ tokens sent repeatedly, this is one of the most reliable cost levers available.
This is part of the Claude on a Budget series. For async workloads, see The Batch API: 50% Off for Non-Urgent Work. For token reduction before the API call, see The Cold-Start Problem: Second Brain and CLAUDE.md.
Prompt caching is prefix-based. Anthropic caches the exact token sequence up to your cache_control breakpoint. Any subsequent request that begins with that identical prefix hits the cache and pays the cache read rate (~$0.30/M for Sonnet vs. $3.00/M standard, a 90% reduction). The default cache lifetime is 5 minutes, refreshed each time the cache is read; a 1-hour TTL is available at a higher write price. Minimum cacheable prompt lengths apply and vary by model (published minimums have ranged from 1,024 to 4,096 tokens), so confirm the figure for the model you use. A prompt below the minimum is processed normally but not cached.
| Model | Standard Input | Cache Write | Cache Read | Read Savings |
|---|---|---|---|---|
| Haiku 4.5 | $1.00/M | $1.25/M | $0.10/M | 90% |
| Sonnet 4.6 | $3.00/M | $3.75/M | $0.30/M | 90% |
| Opus 4.7 | $5.00/M | $6.25/M | $0.50/M | 90% |
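Cache writes cost 25% more than standard input. Caching pays for itself on the first cache read: one write plus one read costs 1.35x the standard rate, against 2x for two uncached requests. Every read after that is 90% cheaper than standard input. For any context block that is reused at least once before the cache expires, caching wins.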
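The break-even arithmetic can be checked with a few lines of Python. The sketch below uses the multipliers from the table (1.25x for a write, 0.10x for a read) and assumes every request after the first arrives before the cache expires; the function name is hypothetical.

```python
def relative_cost(n_requests: int, cached: bool,
                  write_multiplier: float = 1.25,
                  read_multiplier: float = 0.10) -> float:
    """Cost of sending one context block n times, expressed in units of
    the standard input price for that block."""
    if n_requests == 0:
        return 0.0
    if not cached:
        return float(n_requests)
    # The first request writes the cache; the rest read from it.
    return write_multiplier + (n_requests - 1) * read_multiplier
```

A single request is the only case where caching costs more (1.25 against 1.0). From the second request onward, the cached total stays below the uncached total.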
```python
import anthropic

client = anthropic.Anthropic()

# Large system prompt or knowledge base -- mark for caching
SYSTEM_CONTEXT = "Your 5,000-token knowledge base or style guide here..."

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": SYSTEM_CONTEXT,
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {"role": "user", "content": "Your specific question or task here"}
    ]
)

# Check cache performance in usage stats
usage = response.usage
print(f"Input tokens: {usage.input_tokens}")
print(f"Cache creation tokens: {usage.cache_creation_input_tokens}")
print(f"Cache read tokens: {usage.cache_read_input_tokens}")
```
On the first call, you will see cache_creation_input_tokens populated — that is the write. On subsequent calls with the same prefix, cache_read_input_tokens shows what was served from cache at 10% cost.
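To track cache effectiveness over time, the three usage fields can be reduced to a single ratio. This sketch assumes, in line with Anthropic's usage reporting, that `input_tokens` counts only the uncached portion of the prompt, so total input is the sum of all three fields. The helper name is hypothetical.

```python
def cache_hit_ratio(input_tokens: int,
                    cache_creation_input_tokens: int,
                    cache_read_input_tokens: int) -> float:
    """Fraction of a request's input tokens that were served from cache."""
    total = (input_tokens
             + cache_creation_input_tokens
             + cache_read_input_tokens)
    return cache_read_input_tokens / total if total else 0.0
```

A ratio that stays near zero on repeated calls usually means the prefix is changing between requests or the cache is expiring before the next call arrives.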
Anything large, stable, and repeated qualifies. The highest-value candidates: long system prompts defining agent behavior or personas; reference documents such as product specs, legal terms, or knowledge bases; few-shot example sets (10+ examples add up fast); conversation history in multi-turn applications where you mark the stable history prefix for caching and leave only the new turn uncached. In agentic pipelines where Claude processes a document repeatedly across multiple analysis passes, cache the document body and vary only the instruction. You pay full rate once per document, cache rate on every subsequent pass.
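For the multi-pass case, one possible layout puts the document in a cached content block and the per-pass instruction in a second, uncached block. The function name and default values below are hypothetical; the returned dict is intended as keyword arguments for `client.messages.create`.

```python
def build_pass_request(document: str, instruction: str,
                       model: str = "claude-sonnet-4-6",
                       max_tokens: int = 500) -> dict:
    """Keyword arguments for one analysis pass over a cached document."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{
            "role": "user",
            "content": [
                # Identical on every pass, so it forms the cacheable prefix.
                {"type": "text", "text": document,
                 "cache_control": {"type": "ephemeral"}},
                # Varies per pass and sits after the breakpoint.
                {"type": "text", "text": instruction},
            ],
        }],
    }
```

Calling it once per instruction ("summarize", "list risks", "extract dates") keeps the document block byte-identical across passes, which is what allows each later pass to read from cache.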
The cold-start reduction strategy covered in Cluster 1 works because you are reducing what gets sent, not how it gets priced. Prompt caching is the complement: for context that must be sent, cache it. Together, the disciplines compound — your CLAUDE.md file keeps context lean; prompt caching ensures whatever you do send repeatedly costs 90% less after the first hit.
Place stable content at the front of your prompt. Place dynamic content at the end. The cache key is prefix-matched, so any change to content before the cache_control marker invalidates the cache. If your system prompt has a dynamic date or session ID embedded at the top, you are guaranteeing cache misses on every call. The correct structure: static knowledge base, then cache marker, then dynamic task-specific instruction. Never the reverse.
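The ordering rule can be made concrete with a small builder for the `system` parameter. This is a sketch under the assumption that the dynamic values are a date and a session ID, as in the example above; the function name is hypothetical.

```python
from datetime import date


def build_system_blocks(static_knowledge: str, session_id: str) -> list:
    """System blocks ordered so the cached prefix is identical across calls."""
    dynamic = f"Today is {date.today().isoformat()}. Session: {session_id}."
    return [
        # Stable content first, carrying the cache breakpoint.
        {"type": "text", "text": static_knowledge,
         "cache_control": {"type": "ephemeral"}},
        # Dynamic content after the breakpoint, outside the cached prefix.
        {"type": "text", "text": dynamic},
    ]
```

Because the date and session ID live in the second block, two sessions produce the same first block and can share one cache entry.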
You can set up to 4 cache breakpoints in a single request for granular control. Most production implementations need only one or two — the system prompt cache and optionally a conversation history cache in long multi-turn sessions.
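A second breakpoint on conversation history might look like the sketch below, which marks the final block of the prior turns and then appends the new user turn uncached. The helper name is hypothetical, and it assumes message content is either a plain string or a list of text blocks.

```python
import copy


def with_history_breakpoint(history: list, new_user_text: str) -> list:
    """Append a new user turn and mark the end of prior history for caching."""
    messages = copy.deepcopy(history)  # leave the caller's list untouched
    if messages:
        last = messages[-1]
        # Normalize string content to block form so it can carry cache_control.
        if isinstance(last["content"], str):
            last["content"] = [{"type": "text", "text": last["content"]}]
        last["content"][-1]["cache_control"] = {"type": "ephemeral"}
    messages.append({"role": "user", "content": new_user_text})
    return messages
```

Used together with a cached system prompt, this gives the two-breakpoint layout described above: one on the system prompt, one at the end of the stable history.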
With a consistent system prompt across many requests, the effective input cost of that prompt approaches the cache read rate after the first request, provided requests arrive often enough to keep the cache alive. For a content pipeline running 1,000 daily requests with a 5,000-token system prompt: uncached cost is $15/day on input alone at Sonnet rates ($3/M). Cached cost after the first request: $1.50/day. That is $13.50/day saved on system prompt tokens alone, or roughly $4,900/year from one implementation change that takes an afternoon to ship.
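The pipeline figures above can be reproduced with a short calculation. As a simplification, this sketch treats every request as a cache read and ignores the occasional cache write; the function name is hypothetical and the rates are the Sonnet figures from the table.

```python
def daily_input_cost(requests_per_day: int, context_tokens: int,
                     rate_per_mtok: float) -> float:
    """Dollar cost of sending context_tokens on every request for one day."""
    return requests_per_day * context_tokens * rate_per_mtok / 1_000_000


uncached = daily_input_cost(1_000, 5_000, 3.00)  # standard Sonnet input rate
cached = daily_input_cost(1_000, 5_000, 0.30)    # Sonnet cache read rate
savings_per_year = (uncached - cached) * 365
```

Swapping in your own request volume, context size, and model rates gives a quick estimate of whether the change is worth prioritizing.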
Next: Output Compression Discipline: Concentrated Slices vs Full Meals →