Anthropic Thinks Its Own Success Is Key to Making AI Safe

Anthropic has spent the last five years warning the world about how advanced artificial intelligence could enable mass destruction, destabilize society, and cause a litany of other grave harms. But simultaneously, it has become one of the most powerful forces pushing AI capabilities forward. The company is now among the top developers and distributors of cutting-edge AI models and courts customers like the US military. It was recently valued at almost $1 trillion.

At first glance, Anthropic’s stark messaging and its actions seem fundamentally at odds.

But inside the company, many people don’t see a contradiction. To understand why, you first have to understand that Anthropic operates based on two core beliefs. The first is that artificial intelligence is the most transformative technology in human history, and its arrival is inevitable. The only real question is whether it leads to catastrophe or extraordinary prosperity.

The second is that the world will be better off if Anthropic remains at the frontier of the AI race, according to several former employees who spoke to WIRED on the condition of anonymity. Internally, leaders and employees at the company often refer to themselves as the “good guys,” meaning the responsible stewards of AI technology, two of the sources said. The company sees accumulating power (whether in the form of capital, compute, research talent, or political influence) not as an end in itself, but as the price of fulfilling its mission: “to ensure the world safely makes the transition through transformative AI.”

Helen Toner, executive director of Georgetown’s Center for Security and Emerging Technology and a former OpenAI board member, uses an analogy to describe Anthropic’s worldview. She compares powerful AI to a forest filled with both magical treasures and dangerous monsters. All the villagers nearby are rushing in, lured by the treasure. In her telling, Anthropic wants to venture farther into the forest than anyone else while investing heavily in taming the monsters—that is, capturing AI’s benefits while containing its catastrophic risks.

“What’s distinctive about Anthropic is they’re like, ‘People are going in the forest anyway, we have to do it first.’ This is very explicitly their strategy: build cutting-edge AI in order to be a serious player at the table who can talk about what cutting-edge AI systems should look like, what risks they pose, and pushing for reasonable safeguards,” Toner tells me. “They’re very straightforward about this. It’s just a weird enough strategy that people have a hard time hearing it.”

Anthropic CEO Dario Amodei outlined this approach plainly in a conversation with his cofounders posted on the company’s career page: “You have to find a way to actually be competitive, to actually lead the industry in some cases, and yet manage to do things safely,” he says. “If you can do that, the gravitational pull you exert is so great.”

Anthropic was founded in 2021 by a group of former OpenAI employees who defected after losing faith in the ability of the company’s leadership—particularly CEO Sam Altman—to safely bring transformational AI into the world. That sentiment still shapes the company today. Two of the former employees I spoke with say that, in internal discussions, Anthropic executives often describe Altman and OpenAI—and, to a lesser extent, Meta and Elon Musk’s xAI—as cautionary examples that help define Anthropic’s own sense of responsibility.

In many regards, Anthropic is just like any other Silicon Valley company. Many startups market themselves as David fighting the outdated, entrenched Goliaths of the industries they want to disrupt. Google, Facebook, and Apple were all founded upon idealistic principles, which later became muddied or were abandoned altogether as they became richer, larger, and more influential.
