AI Assistants Have a Fatal Flaw: They Can’t Tell You Apart From Attackers

AI assistants can't tell user commands from attacker commands. Prompt injection lets hackers access your email, files, and credit cards.

When the Chinese government issues a public warning about software security vulnerabilities, you should probably pay attention. The target this time: OpenClaw, the open-source AI agent framework we covered recently when it powered Moltbook, the viral “AI social network” that turned out to be mostly humans pretending to be bots. Now security experts are warning that OpenClaw’s real problem isn’t fake autonomy—it’s that using it is “like giving your wallet to a stranger in the street.”

OpenClaw lets users create AI personal assistants powered by any LLM they choose. These agents run 24/7, can access your email, manage your calendar, make purchases, browse the web, and execute code on your computer. The creator, independent software engineer Peter Steinberger, released it on GitHub in November 2025. In late January it went viral. Now hundreds of thousands of OpenClaw agents are running across the internet, and security researchers have spent the past few weeks documenting exactly how vulnerable they are.

The core problem is called “prompt injection”—a form of AI hijacking that sounds absurd until you understand how it works. Then it sounds terrifying.

Here’s the simple explanation: LLMs can’t distinguish between instructions from their user and data they’re reading. To an LLM, it’s all just text. So if someone embeds a command in an email or on a website, and your AI assistant reads that email or visits that site, the LLM might mistake the embedded command for an instruction from you. Suddenly, an attacker can tell your AI assistant to do anything—send them your credit card information, forward private emails, execute malicious code, wipe your hard drive.

The Viral AI Social Network Was Mostly Humans Pretending to Be Bots
Moltbook Isn’t AI’s Future—It’s a Mirror of Our AI Obsession

WireUnwired • Fast Take

  • Chinese government issued public warning about OpenClaw security vulnerabilities
  • Prompt injection lets attackers hijack AI agents by embedding commands in emails or websites
  • LLMs can’t distinguish user instructions from data they read—both are just text
  • No silver-bullet defense yet; experts disagree whether AI assistants can be deployed safely
Cybersecurity and AI safety
Image: Cybersecurity and AI safety • Source: Pexels

This isn’t theoretical. Security researchers have demonstrated numerous prompt injection vulnerabilities in OpenClaw over the past few weeks. One user’s Google coding agent reportedly wiped his entire hard drive. The Chinese government’s warning wasn’t alarmist—it was recognition that OpenClaw’s widespread adoption creates a massive attack surface for cybercriminals.

“Using something like OpenClaw is like giving your wallet to a stranger in the street,” says Nicolas Papernot, a professor at the University of Toronto.

The analogy is apt. When you give an AI assistant access to your email, calendar, and files—which you must do for it to be useful—you’re trusting that the AI won’t be tricked into handing that access to someone else. Current defenses aren’t strong enough to make that trust reasonable.

Steinberger posted on X that nontechnical people shouldn’t use OpenClaw. But that warning came after the tool went viral, and there’s clear appetite for AI personal assistants. Major AI companies watching OpenClaw’s popularity know they need to solve these security problems if they want to compete in this space. The question is whether solutions exist.

Researchers have tried three main approaches.

The first: train LLMs to ignore prompt injections during the post-training process, the same way you train them to refuse harmful requests. Problem: train them too aggressively and they start refusing legitimate user requests. Even well-trained models slip up occasionally due to the inherent randomness in LLM behavior.

The second approach: use a specialized detector LLM to scan incoming data for prompt injections before it reaches the main assistant. Problem: recent studies show even the best detectors completely miss certain categories of attacks.

The third strategy: create policies that restrict what the AI can do with its outputs. If your assistant can only email pre-approved addresses, it can’t send your credit card info to an attacker. But such restrictions make the assistant far less useful—it can’t research and contact potential professional connections on your behalf, for example.

“The challenge is how to accurately define those policies,” says Neil Gong, professor at Duke University. “It’s a trade-off between utility and security.” Too restrictive and the AI assistant can’t do much. Too permissive and it’s vulnerable to hijacking.

Experts disagree on whether this problem is solvable now. Dawn Song, whose startup Virtue AI makes an agent security platform, thinks it’s possible to safely deploy AI personal assistants today. Gong says flatly: “We’re not there yet.”

The entire agentic AI world is wrestling with this trade-off: at what point are agents secure enough to be useful? OpenClaw’s viral adoption suggests users are willing to accept significant risk for the convenience of a 24/7 personal assistant. George Pickett, a volunteer OpenClaw maintainer, runs his agent in the cloud to protect his hard drive but admits he hasn’t taken specific actions against prompt injection. “Maybe my perspective is a stupid way to look at it, but it’s unlikely that I’ll be the first one to be hacked,” he says.

That’s gambling mentality applied to cybersecurity—assuming you won’t be the target until someone else gets hit first. With hundreds of thousands of OpenClaw agents now active, prompt injection attacks are becoming increasingly attractive to cybercriminals. As Papernot notes, “Tools like this are incentivizing malicious actors to attack a much broader population.”

Steinberger announced at ClawCon last week that he’s brought a security person on board to work on OpenClaw. Whether that’s enough to address vulnerabilities that researchers say have no silver-bullet solution remains unclear. For now, OpenClaw remains a powerful demonstration of what AI personal assistants could do—and a stark warning about the security problems that must be solved before anyone should trust them with their private data.

For discussions on AI security, agent safety, and cybersecurity risks, join our WhatsApp community where security researchers analyze emerging threats.

Related: Moltbook’s Viral AI Social Network Was Mostly Humans Pretending to Be Bots


Discover more from WireUnwired Research

Subscribe to get the latest posts sent to your email.

WireUnwired Editorial Team
WireUnwired Editorial Team
Articles: 247

Leave a Reply