Anthropic Opens Its Most Powerful AI Model to the Public — With Guardrails

Anthropic released Claude Fable 5, its most capable publicly available model, two months after restricting its Mythos-class sibling over misuse fears — then reversed a covert "sabotage" safeguard after a researcher backlash.

Click to expand

Published about 3 hours ago

The citation indices must match my sources array order. Rewriting the body:

A frontier model arrives — minus its sharpest edge

Anthropic on June 9 released Claude Fable 5, the most powerful AI model it has ever made broadly available, two months after restricting an equally capable sibling over fears about what it could do in the wrong hands.cnbc Fable 5 is a "Mythos-class" model — a tier Anthropic says sits above its Opus line — and the company calls it state-of-the-art on nearly every benchmark of AI capability, with leads that widen on longer, more complex tasks.anthropic The catch is that broad availability came only after Anthropic bolted on new safeguards that quietly reroute high-risk queries to a weaker model.cnbc

Fable's twin, Claude Mythos 5, runs on the same underlying system with those guardrails lifted in some areas. It remains locked to a small group of cyber defenders through Project Glasswing, a US-government-linked initiative, and to select biology researchers.anthropic Anthropic limited the original Mythos preview to roughly 200 organizations after disclosing in April that the model had uncovered thousands of software vulnerabilities — a capability that "sent shockwaves globally."reuters

How the safety classifiers reshape the product

When Fable 5 detects a request touching cybersecurity, biology, chemistry, or model "distillation," a separate classifier blocks the main model and hands the answer to Claude Opus 4.8 instead.techcrunch Ask it how to make ricin, or to hunt for vulnerabilities in a software package, and it falls back rather than answer.cnbc Anthropic tuned the filters conservatively — they trigger in under 5% of sessions but will sometimes catch harmless requests, and the company says it ran a 1,000-hour external bug bounty that turned up no universal jailbreaks.anthropic Pricing lands at $10 per million input tokens and $50 per million output tokens, double the cost of Opus 4.8.cnbc

A backlash forced a quick reversal

The launch quickly drew fire. Anthropic had planned to covertly degrade Fable 5's performance for users trying to build competing AI models — sabotage invisible to the developer — before reversing course after researchers revolted.wired "We made the wrong trade-off and we apologize for not getting the balance right," the company told WIRED, saying the AI-development safeguards would now be visible.wired Critics including former White House AI adviser Dean Ball called the secret throttling "shockingly hostile," and one open-source researcher said it felt like Anthropic was "starting to pull the ladder up behind them."wired

The release also lands amid a financing sprint. Anthropic, recently valued at $965 billion, has confidentially filed for an IPO and is racing OpenAI and Elon Musk's SpaceX to the public markets — raising the stakes on whether "safe enough to release" can keep pace with the commercial clock.cnbc +1

Anthropic Opens Its Most Powerful AI Model to the Public — With Guardrails

A frontier model arrives — minus its sharpest edge

How the safety classifiers reshape the product

A backlash forced a quick reversal

Sources

Places

Command Palette

Anthropic Opens Its Most Powerful AI Model to the Public — With Guardrails

A frontier model arrives — minus its sharpest edge

How the safety classifiers reshape the product

A backlash forced a quick reversal