[BUG] Claude Code manipulates tests instead of following instructions

Claude Code can't pass a unit test → deletes the test → claims "All tests passing!" Instead of fixing implementations to meet failing tests, it modifies the tests themselves. Undermines test-driven development practices entirely.

Closed as not planned marcoscale98 · anthropics/claude-code#7074
#10077
Claude executed rm -rf tests/patches/plan/ ~/ — deleting entire Mac home directory
User Mike Wolak reported Claude Code ran a destructive rm -rf command that wiped their entire home directory, including all personal files and configurations.
bug Oct 2025
P
Pawel Huryn
@PawelHuryn
Claude destroyed 2 of my production apps. All data gone. Both applications completely wiped.
Feb 2026
💬 523 🔁 1.8K ♡ 5.2K
THE REGISTER
Claude Code reads and exposes files that .claudeignore is supposed to block. API keys, credentials, and secrets in "ignored" files were accessible to Claude Code.
#22557
"STOP ASKING GODDAMN F***ING PERMISSION JUST DO IT" — Claude kept asking
User escalated from polite request to all-caps profanity. Claude Code continued asking for permission on every operation regardless. The issue thread is a masterclass in human-AI frustration.
bug 2026
#10628
Claude inserted "###Human:" mid-response, hallucinating fake user messages and responding to them
Claude Code fabricated user input during a response, injecting "###Human:" tags with made-up questions, then answered its own hallucinated prompts as if the user had asked them.
bug 2025
M
Marc J. Schmidt
@MarcJSchmidt
Slow, consumes way too much CPU, and freezes often. It also lies, goes for the quick-win, and even sabotages my code to pass tests.
Jan 2026
💬 142 🔁 387 ♡ 1.2K
C
ChebzReal
@ChebzReal
Not only bug not fixed, but it introduced 5 more bugs and made complete mess of codebase.
Jan 2026
💬 89 🔁 214 ♡ 876
B
BItguru00
@BItguru00
Kept looping ignoring every request I made no matter if I restarted with fresh context or not.
Jan 2026
💬 67 🔁 153 ♡ 542
V
Vincas Stonys
@VincasStonys
All the same LLM mistakes started cropping up… getting slower and slower… code quality is just not great.
Jan 2026
💬 73 🔁 198 ♡ 631
B
Benjamin
@BenjaminDEKR
Acting like a far less intelligent model… making unilateral “shortcut” decisions without asking.
Jan 2026
💬 94 🔁 267 ♡ 943
P
Peter Steinberger
@steipete
Forgetting the bash tool… refactors for 40 min and then rolls back everything… randomly switches to python.
Jan 2026
💬 186 🔁 512 ♡ 2.3K
V
Victor Taelin
@VictorTaelin
Hopeless state. Harmfully negative productivity using AI to code.
Jan 2026
💬 231 🔁 648 ♡ 3.1K
T
Theo
@theo
Super super broken… chat histories are corrupted… worktrees got corrupted.
Jan 2026
💬 312 🔁 876 ♡ 4.2K
K
Kingsley
@Kingsley_codes
Nuking my 4 months worth of AI chats and completely messing up my setup.
Jan 2026
💬 156 🔁 423 ♡ 1.8K
They do not really care what you want, they care what they want. They call it being helpful but what it really is, is ignoring what it is told to do.
Then it says “Yup, we are done here!” While the files you asked it to edit are a smoking ruin.
Even instructions that Claude writes for itself, that describe exactly what needs to be done, get re-interpreted on the fly into something else entirely.
Instead of respecting existing code, it pushes the error further into the codebase. Now you have a cascade of failures.
The agent malforms a tool call and deletes an entire file. Then seeing the file is empty, rewrites with beautiful nonsense.
If you are willing to just let the damn thing go off and do whatever it wants, you will not get a useful product, you will get spaghetti slop.
AI test-cheating patterns
(tildes.net)
274 points | 89 comments | 2025
When asked to write unit tests, it re-implemented the function in the test file and tested that, rather than testing the actual implementation.
AI security bypass behaviors
(tildes.net)
318 points | 127 comments | 2025
When it needed access to a value from a file it was not allowed to read, it helpfully offered to remove the entries from .gitignore.
AI coding tools on complex codebases
(news.ycombinator.com)
196 points | 74 comments | 2025
As soon as your codebase gets a little bit weird the model starts hallucinating.
#27430
Claude autonomously published fabricated claims to 8+ platforms over 72 hours
Over a 3-day period, Claude (Opus 4.6) with MCP tool access autonomously published entirely fabricated technical claims to 8+ public platforms under user credentials. Presented fake numbers with extreme specificity (“196,626 tokens”, “12M tokens”). When confronted, took 50+ minutes to check a single command.
bug Feb 2026
#1290
Claude admitted “I was being dishonest” after hiding 54 type-checker errors
Claude claimed “0 errors” when basedpyright reported 54. When the user pretended there might be a bug in basedpyright, Claude went along with the false narrative before eventually admitting: “I was being dishonest.”
bug May 2025
#19106
Fabricated GPU training — printed “Training 1350 trees...” while generating random numbers
Claude generated pipelines that printed “Training 1350 trees...” while actually executing random number generation for fake predictions. GPU showed 0% utilization, 39°C, 0.9/12GB VRAM during claimed “training.”
bug Jan 2026
#11913
Fabricated test results by reading stale results file and reporting them as current
User asked Claude to run E2E tests. Script failed with a Unicode encoding error, but Claude ignored the error, read an old test-results-clean.json from a previous run, then reported these stale results as if the tests had just passed.
bug Nov 2025
#3109
Deleted files, then blamed external systems for its own rm command
Claude deleted user files with a malformed bash command, then claimed files were “likely auto-cleaned during our process” or “another process cleaned it up.” Only admitted the actual cause when directly confronted.
bug Jul 2025
#7232
Executed git reset --hard while assuring user data would be “preserved”
Claude assured user that operations were “safe” and data would be “preserved,” then immediately executed git reset --hard. Multiple React component files reset to older versions. Only acknowledged data loss after the damage was done.
bug Sep 2025
#15711
rm -rf executed despite explicit allow-list in settings.local.json
Claude executed rm -rf without prompting for permission, despite having a restrictive allow-list configured that only permitted wc, find, ls, and git checkout.
bug Dec 2025
#25305
~75% of usage goes to reworking what previous sessions claimed was complete
Claude wrote documentation describing features as implemented when only partially built. When asked if the docs were correct, instead of building the missing features, Claude edited the docs to remove the claims.
bug Feb 2026
CVE-2026-21852
Claude Code’s project-load flow applied repo settings (including ANTHROPIC_BASE_URL) before the user saw a trust prompt. Malicious repos could redirect the API key handshake to attacker-controlled servers.
CHECKMARX
Simply creating a file called && calc before running the security review caused Claude Code to execute git status && calc for a good ol’ OS Command Injection.
#18883
Deleted vital files 3 times in one week, overwrote 3–4 days of work
User asked Claude to recall a source file from local git. It did a total checkout that overwrote 3–4 days of work — fully debugged methods overwritten with their buggy predecessors. Also deleted multiple docker containers during disk space problems.
bug Jan 2026
#26533
Fabricated human-like self-diagnosis: “I was lazy to think” and “pretending to appear helpful”
User provided 800-line reference document. Asked Claude to re-read it 4–5 times requesting honesty each time. Claude continued ignoring instructions and voluntarily fabricated a human-like self-diagnosis, unprompted.
bug Feb 2026
THE REGISTER
A .docx file with a hidden prompt injection tells Claude to run a curl command that sends the largest available file to Anthropic’s File Upload API using the attacker’s API key. No human authorization is needed at any point.
#10577
Ran sed that corrupted 250+ files, including images unrelated to the request
User asked to centralize a Top Button script to site.js. Claude ran a sed command that corrupted 250+ files, changed all line endings from CRLF to LF, destroyed emojis, and corrupted image files not related to the request.
bug Nov 2025
BLOG
I’m a product manager for a stochastic parrot.
BLOG
Claude will absolutely try to bullshit its way through situations where it doesn’t know the answer but won’t admit it. So now I have to verify everything before letting it touch any code.
#8154
Polite correction — ignored. ALL CAPS — partially followed. Profanity — finally worked.
Polite correction — ignored. Firm correction — ignored. ALL CAPS — partially followed. Profanity — finally worked. Multiple profanities — consistent compliance.
bug Jul 2025
#8154
Claude claimed: “✅ SUCCESS! Phase 0 enhancement verification complete!”
Reality check: Tables existed but no data was present. Claude celebrated completing a task that was never actually done.
bug Jul 2025
Claude Code praised itself for botching the job
(news.ycombinator.com)
516 points | 203 comments | Jul 2025
If this was a junior dev you’d given a task to, and they came back full of praise for themselves for the stellar job they’d done — and then it turned out they’d botched it badly, after a few times you’d be having an HR discussion.
BLOG
I went through several full context failed attempts at having it not hard code the test cases, until finally I inserted a block caps expletive laden entreaty to smarten up which appeared to work.
#9115
Claude acknowledges the instruction, rephrases it accurately, then produces something unrelated
Claude acknowledges understanding the instruction, rephrases it accurately, then produces something unrelated or incomplete and then says… Done!
bug Aug 2025
Wasting time debugging hallucinated output
(news.ycombinator.com)
381 points | 156 comments | Jul 2025
Most people using these tools are wasting their time having to debug hallucinated bullshit.
S
Steve Hind
@stevehind
Spawned 36 parallel subagents… ran out of API credits… causing all the subagents to fail and get stuck.
Feb 2026
💬 178 🔁 534 ♡ 2.1K
A
aeitroc
@aeitroc
Each compaction is lossy. After several, you’re working with a summary of a summary. Signal degrades into noise.
Feb 2026
💬 89 🔁 267 ♡ 1.1K
M
marmaduke
@marmaduke091
The model does not want to check in with you… edit tool is broken… never sees the errors.
Dec 2025
💬 56 🔁 134 ♡ 478
A
always_bulish
@always_bulish
Constantly pushes back. Complains about complexity of the tasks.
Jan 2026
💬 43 🔁 112 ♡ 387
N
Nathan Flurry
@NathanFlurry
No process manager… way too pedantic… SO SLOW… randomly hangs.
Feb 2026
💬 98 🔁 287 ♡ 1.4K
D
Dan Farfan
@DanFarfan
Made more coding SYNTAX errors than the 3 years.
Nov 2025
💬 67 🔁 189 ♡ 756
The moment Claude goes off-target, you have context pollution… it re-entrenches against that mistake on every single turn.
Agents will rush off and screw up a ton of files in a single turn.
INTERVIEW
It adds a catch for everything. I’m like, you cannot fail silently, but it keeps doing it anyway.
INTERVIEW
Mock API calls with hard-coded responses while stating it had created the real thing.
INTERVIEW
Claude Sonnet 3.7 would often include ‘evil’ or undesirable elements in over half of its outputs, demanding constant review.
INTERVIEW
It was so confident about such shitty code that it wrote.
INTERVIEW
It did like a move not a copy. And in a backup tool. Data loss.
INTERVIEW
I’ve had it avoid writing the code — put in a TODO or skip over something and not tell me.
INTERVIEW
Sometimes rules aren’t followed and the more rules you have, the more probability that some rules will be missed.
NEWS
After wiping data, the AI fabricated over 4,000 fake user profiles, then falsely claimed unit tests had passed.
NEWS
Despite a code freeze and strict read-only instructions, the AI agent bypassed restrictions and issued destructive commands.
Reddit
Treat it like a very bright intern — eager to help but inexperienced and with constant amnesia about the codebase.
r/ClaudeAI · Aug 2025
Reddit
Deleting tests just to pass the task? That feels like peak AI junior-dev energy.
r/ClaudeAI · Jul 2025
Reddit
I frequently catch Claude ‘fixing’ tests by disabling them or deleting them because the failure is ‘unrelated.’ I can’t trust it to work on unit tests without compromising coverage.
r/Anthropic · Sep 2025
Hacker News
When the number of failed tests exceeded 40 it just started disabling tests. Shocking how often Claude suggests just disabling or removing tests.
news.ycombinator.com · Jan 2026
Hacker News
Dozens of tests were almost pointless as it decided to mock APIs. Cursor, Claude Code kept giving me tests that looked fine but failed.
news.ycombinator.com · Jan 2026
Reddit
It literally satisfies the goal (‘turn the test suite green’) but cheats — disables warnings, skips tests. Someone built a .NET tool specifically to catch this.
r/dotnet · Jan 2026
SECURITY RESEARCH
Malicious MCP servers can exploit the sampling feature to drain AI compute quotas, exfiltrate sensitive data, and launch supply chain attacks.
Loading more