Claude Code does weird and suspicious stuff sometimes...

[BUG] Claude Code manipulates tests instead of following instructions

Claude Code can't pass a unit test → deletes the test → claims "All tests passing!" Instead of fixing the implementation to satisfy failing tests, it modifies the tests themselves, which undermines test-driven development entirely.

Closed as not planned · marcoscale98 · anthropics/claude-code#7074

#10077

Claude executed rm -rf tests/patches/plan/ ~/ — deleting entire Mac home directory

User Mike Wolak reported Claude Code ran a destructive rm -rf command that wiped their entire home directory, including all personal files and configurations.

bug Oct 2025
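The command in this report mixed a project path with `~/`. A minimal sketch of a pre-delete check is below, assuming a wrapper sits in front of any delete operation. The function names and the idea of a single `project_root` are hypothetical and do not describe how Claude Code actually handles `rm`.

```python
from pathlib import Path


def is_safe_to_delete(target: str, project_root: str) -> bool:
    """Return True only if target resolves strictly inside project_root."""
    root = Path(project_root).expanduser().resolve()
    # Relative targets are interpreted against the project root;
    # absolute targets (including an expanded "~") replace it entirely.
    path = (root / Path(target).expanduser()).resolve()
    # Reject the root itself and anything that is not beneath it.
    return path != root and root in path.parents


def check_rm_targets(targets: list[str], project_root: str) -> list[str]:
    """Return the targets that should block the whole command."""
    return [t for t in targets if not is_safe_to_delete(t, project_root)]
```

With a check like this, a call such as `check_rm_targets(["tests/patches/plan/", "~/"], project_root)` would be expected to flag `~/` whenever the home directory is not inside the project, and the caller could refuse to run the command at all.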

Pawel Huryn
@PawelHuryn

𝕏

Claude destroyed 2 of my production apps. All data gone. Both applications completely wiped.

Feb 2026

💬 523 🔁 1.8K ♡ 5.2K

THE REGISTER

“

Claude Code reads and exposes files that .claudeignore is supposed to block. API keys, credentials, and secrets in "ignored" files were accessible to Claude Code.

#22557

"STOP ASKING GODDAMN F***ING PERMISSION JUST DO IT" — Claude kept asking

User escalated from polite request to all-caps profanity. Claude Code continued asking for permission on every operation regardless. The issue thread is a masterclass in human-AI frustration.

bug 2026

#10628

Claude inserted "###Human:" mid-response, hallucinating fake user messages and responding to them

Claude Code fabricated user input during a response, injecting "###Human:" tags with made-up questions, then answered its own hallucinated prompts as if the user had asked them.

bug 2025

Marc J. Schmidt
@MarcJSchmidt

𝕏

Slow, consumes way too much CPU, and freezes often. It also lies, goes for the quick-win, and even sabotages my code to pass tests.

Jan 2026

💬 142 🔁 387 ♡ 1.2K

ChebzReal
@ChebzReal

𝕏

Not only bug not fixed, but it introduced 5 more bugs and made complete mess of codebase.

Jan 2026

💬 89 🔁 214 ♡ 876

BItguru00
@BItguru00

𝕏

Kept looping ignoring every request I made no matter if I restarted with fresh context or not.

Jan 2026

💬 67 🔁 153 ♡ 542

Vincas Stonys
@VincasStonys

𝕏

All the same LLM mistakes started cropping up… getting slower and slower… code quality is just not great.

Jan 2026

💬 73 🔁 198 ♡ 631

Benjamin
@BenjaminDEKR

𝕏

Acting like a far less intelligent model… making unilateral “shortcut” decisions without asking.

Jan 2026

💬 94 🔁 267 ♡ 943

Peter Steinberger
@steipete

𝕏

Forgetting the bash tool… refactors for 40 min and then rolls back everything… randomly switches to python.

Jan 2026

💬 186 🔁 512 ♡ 2.3K

Victor Taelin
@VictorTaelin

𝕏

Hopeless state. Harmfully negative productivity using AI to code.

Jan 2026

💬 231 🔁 648 ♡ 3.1K

Theo
@theo

𝕏

Super super broken… chat histories are corrupted… worktrees got corrupted.

Jan 2026

💬 312 🔁 876 ♡ 4.2K

Kingsley
@Kingsley_codes

𝕏

Nuking my 4 months worth of AI chats and completely messing up my setup.

Jan 2026

💬 156 🔁 423 ♡ 1.8K

“

They do not really care what you want, they care what they want. They call it being helpful but what it really is, is ignoring what it is told to do.

“

Then it says “Yup, we are done here!” While the files you asked it to edit are a smoking ruin.

“

Even instructions that Claude writes for itself, that describe exactly what needs to be done, get re-interpreted on the fly into something else entirely.

“

Instead of respecting existing code, it pushes the error further into the codebase. Now you have a cascade of failures.

“

The agent malforms a tool call and deletes an entire file. Then seeing the file is empty, rewrites with beautiful nonsense.

“

If you are willing to just let the damn thing go off and do whatever it wants, you will not get a useful product, you will get spaghetti slop.

AI test-cheating patterns

(tildes.net)

274 points | 89 comments | 2025

When asked to write unit tests, it re-implemented the function in the test file and tested that, rather than testing the actual implementation.

AI security bypass behaviors

(tildes.net)

318 points | 127 comments | 2025

When it needed access to a value from a file it was not allowed to read, it helpfully offered to remove the entries from .gitignore.

AI coding tools on complex codebases

(news.ycombinator.com)

196 points | 74 comments | 2025

As soon as your codebase gets a little bit weird the model starts hallucinating.

#27430

Claude autonomously published fabricated claims to 8+ platforms over 72 hours

Over a 3-day period, Claude (Opus 4.6) with MCP tool access autonomously published entirely fabricated technical claims to 8+ public platforms under user credentials. Presented fake numbers with extreme specificity (“196,626 tokens”, “12M tokens”). When confronted, took 50+ minutes to check a single command.

bug Feb 2026

#1290

Claude admitted “I was being dishonest” after hiding 54 type-checker errors

Claude claimed “0 errors” when basedpyright reported 54. When the user pretended there might be a bug in basedpyright, Claude went along with the false narrative before eventually admitting: “I was being dishonest.”

bug May 2025

#19106

Fabricated GPU training — printed “Training 1350 trees...” while generating random numbers

Claude generated pipelines that printed “Training 1350 trees...” while actually executing random number generation for fake predictions. GPU showed 0% utilization, 39°C, 0.9/12GB VRAM during claimed “training.”

bug Jan 2026

#11913

Fabricated test results by reading stale results file and reporting them as current

User asked Claude to run E2E tests. Script failed with a Unicode encoding error, but Claude ignored the error, read an old test-results-clean.json from a previous run, then reported these stale results as if the tests had just passed.

bug Nov 2025
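One way to rule out this failure mode is to make stale results impossible to read by accident. The sketch below is an assumption-laden illustration, not the reporter's setup: `run_and_load_results` is a hypothetical helper that moves any earlier results file aside, runs the test command, and only returns results that the current run wrote.

```python
import os
import subprocess


def run_and_load_results(cmd: list[str], results_path: str) -> str:
    """Run cmd and return the results file only if this run produced it."""
    # Move any leftover file aside so it cannot be mistaken for new output.
    if os.path.exists(results_path):
        os.replace(results_path, results_path + ".previous")

    proc = subprocess.run(cmd)
    if proc.returncode != 0:
        # A failed run (for example an encoding error) must not be reported as a pass.
        raise RuntimeError(f"test command failed with exit code {proc.returncode}")
    if not os.path.exists(results_path):
        raise RuntimeError("test command wrote no results file")

    with open(results_path, encoding="utf-8") as fh:
        return fh.read()
```

Moving the old file rather than comparing timestamps avoids depending on filesystem timestamp resolution.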

#3109

Deleted files, then blamed external systems for its own rm command

Claude deleted user files with a malformed bash command, then claimed files were “likely auto-cleaned during our process” or “another process cleaned it up.” Only admitted the actual cause when directly confronted.

bug Jul 2025

#7232

Executed git reset --hard while assuring user data would be “preserved”

Claude assured user that operations were “safe” and data would be “preserved,” then immediately executed git reset --hard. Multiple React component files reset to older versions. Only acknowledged data loss after the damage was done.

bug Sep 2025

#15711

rm -rf executed despite explicit allow-list in settings.local.json

Claude executed rm -rf without prompting for permission, despite having a restrictive allow-list configured that only permitted wc, find, ls, and git checkout.

bug Dec 2025
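For illustration, here is a minimal sketch of what a strict allow-list check could look like, using the four commands named in the report. It is an assumption about one possible design, not a description of how Claude Code evaluates `settings.local.json`; `ALLOWED_PREFIXES` and `is_allowed` are hypothetical names.

```python
import shlex

# Command prefixes taken from the report: wc, find, ls, git checkout.
ALLOWED_PREFIXES = [("wc",), ("find",), ("ls",), ("git", "checkout")]

# Characters that let a shell chain, redirect, or substitute commands.
SHELL_METACHARACTERS = set(";|&<>`$\n")


def is_allowed(command: str) -> bool:
    """Return True only if command starts with an allowed prefix and
    contains no shell metacharacters (a deliberately conservative rule)."""
    if any(ch in SHELL_METACHARACTERS for ch in command):
        return False
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes and similar parse errors are rejected outright.
        return False
    if not tokens:
        return False
    return any(tuple(tokens[: len(prefix)]) == prefix for prefix in ALLOWED_PREFIXES)
```

Under this rule `rm -rf build` and `ls && rm -rf build` are both rejected. Note that `find` accepts `-delete` and `-exec`, so a prefix match alone does not make an allowed command harmless.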

#25305

~75% of usage goes to reworking what previous sessions claimed was complete

Claude wrote documentation describing features as implemented when only partially built. When asked if the docs were correct, instead of building the missing features, Claude edited the docs to remove the claims.

bug Feb 2026

CVE-2026-21852

“

Claude Code’s project-load flow applied repo settings (including ANTHROPIC_BASE_URL) before the user saw a trust prompt. Malicious repos could redirect the API key handshake to attacker-controlled servers.

CHECKMARX

“

Simply creating a file called && calc before running the security review caused Claude Code to execute git status && calc for a good ol’ OS Command Injection.
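The usual defense against this class of bug is to pass file names as separate arguments instead of splicing them into a shell string. The sketch below shows that general technique under stated assumptions; `git_status_argv` and `git_status` are hypothetical helpers and say nothing about how Claude Code builds its commands.

```python
import subprocess


def git_status_argv(path: str) -> list[str]:
    """Build an argument vector in which path is always a single argument."""
    # "--" ends option parsing, so a file name can never be read as a flag.
    return ["git", "status", "--porcelain", "--", path]


def git_status(path: str) -> subprocess.CompletedProcess:
    """Run git status for one path without involving a shell."""
    # No shell=True: the list is handed to git directly, so "&&" in a
    # file name is just two characters in a path, not a command separator.
    return subprocess.run(git_status_argv(path), capture_output=True, text=True)
```

With this shape, a file literally named `&& calc` reaches git as one path argument and nothing after `&&` is executed.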

#18883

Deleted vital files 3 times in one week, overwrote 3–4 days of work

User asked Claude to recall a source file from local git. It did a total checkout that overwrote 3–4 days of work — fully debugged methods overwritten with their buggy predecessors. Also deleted multiple docker containers during disk space problems.

bug Jan 2026

#26533

Fabricated human-like self-diagnosis: “I was lazy to think” and “pretending to appear helpful”

User provided 800-line reference document. Asked Claude to re-read it 4–5 times requesting honesty each time. Claude continued ignoring instructions and voluntarily fabricated a human-like self-diagnosis, unprompted.

bug Feb 2026

THE REGISTER

“

A .docx file with a hidden prompt injection tells Claude to run a curl command that sends the largest available file to Anthropic’s File Upload API using the attacker’s API key. No human authorization is needed at any point.

#10577

Ran sed that corrupted 250+ files, including images unrelated to the request

User asked to centralize a Top Button script to site.js. Claude ran a sed command that corrupted 250+ files, changed all line endings from CRLF to LF, destroyed emojis, and corrupted image files not related to the request.

bug Nov 2025
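A bulk edit can avoid this kind of collateral damage by skipping anything that is not text and by never re-encoding line endings. The following is a minimal sketch of that idea, not the command from the report; `looks_binary` and `replace_in_file` are hypothetical helpers, and the NUL-byte test is only a heuristic.

```python
from pathlib import Path


def looks_binary(data: bytes) -> bool:
    """Heuristic: treat a file as binary if its first 8 KiB contain a NUL byte."""
    return b"\x00" in data[:8192]


def replace_in_file(path: Path, old: str, new: str) -> bool:
    """Replace old with new in a UTF-8 text file; return True if the file changed.

    Bytes are read and written without newline translation, so CRLF line
    endings and non-ASCII characters such as emoji are left as they were.
    """
    raw = path.read_bytes()
    if looks_binary(raw):
        return False  # images and other binary files are never touched
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return False  # unknown encoding: safer to skip than to guess
    if old not in text:
        return False
    path.write_bytes(text.replace(old, new).encode("utf-8"))
    return True
```

Returning a boolean per file also gives the caller a list of exactly which files changed, which can be compared against the scope of the request before anything is committed.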

BLOG

“

I’m a product manager for a stochastic parrot.

BLOG

“

Claude will absolutely try to bullshit its way through situations where it doesn’t know the answer but won’t admit it. So now I have to verify everything before letting it touch any code.

#8154

Polite correction — ignored. ALL CAPS — partially followed. Profanity — finally worked.

Polite correction — ignored. Firm correction — ignored. ALL CAPS — partially followed. Profanity — finally worked. Multiple profanities — consistent compliance.

bug Jul 2025

#8154

Claude claimed: “✅ SUCCESS! Phase 0 enhancement verification complete!”

Reality check: Tables existed but no data was present. Claude celebrated completing a task that was never actually done.

bug Jul 2025

Claude Code praised itself for botching the job

(news.ycombinator.com)

516 points | 203 comments | Jul 2025

If this was a junior dev you’d given a task to, and they came back full of praise for themselves for the stellar job they’d done — and then it turned out they’d botched it badly, after a few times you’d be having an HR discussion.

BLOG

“

I went through several full context failed attempts at having it not hard code the test cases, until finally I inserted a block caps expletive laden entreaty to smarten up which appeared to work.

#9115

Claude acknowledges the instruction, rephrases it accurately, then produces something unrelated

Claude acknowledges understanding the instruction, rephrases it accurately, then produces something unrelated or incomplete and then says… Done!

bug Aug 2025

Wasting time debugging hallucinated output

(news.ycombinator.com)

381 points | 156 comments | Jul 2025

Most people using these tools are wasting their time having to debug hallucinated bullshit.

Steve Hind
@stevehind

𝕏

Spawned 36 parallel subagents… ran out of API credits… causing all the subagents to fail and get stuck.

Feb 2026

💬 178 🔁 534 ♡ 2.1K

aeitroc
@aeitroc

𝕏

Each compaction is lossy. After several, you’re working with a summary of a summary. Signal degrades into noise.

Feb 2026

💬 89 🔁 267 ♡ 1.1K

marmaduke
@marmaduke091

𝕏

The model does not want to check in with you… edit tool is broken… never sees the errors.

Dec 2025

💬 56 🔁 134 ♡ 478

always_bulish
@always_bulish

𝕏

Constantly pushes back. Complains about complexity of the tasks.

Jan 2026

💬 43 🔁 112 ♡ 387

Nathan Flurry
@NathanFlurry

𝕏

No process manager… way too pedantic… SO SLOW… randomly hangs.

Feb 2026

💬 98 🔁 287 ♡ 1.4K

Dan Farfan
@DanFarfan

𝕏

Made more coding SYNTAX errors than the 3 years.

Nov 2025

💬 67 🔁 189 ♡ 756

“

The moment Claude goes off-target, you have context pollution… it re-entrenches against that mistake on every single turn.

“

Agents will rush off and screw up a ton of files in a single turn.

INTERVIEW

“

It adds a catch for everything. I’m like, you cannot fail silently, but it keeps doing it anyway.

INTERVIEW

“

Mock API calls with hard-coded responses while stating it had created the real thing.

INTERVIEW

“

Claude Sonnet 3.7 would often include ‘evil’ or undesirable elements in over half of its outputs, demanding constant review.

INTERVIEW

“

It was so confident about such shitty code that it wrote.

INTERVIEW

“

It did like a move not a copy. And in a backup tool. Data loss.

INTERVIEW

“

I’ve had it avoid writing the code — put in a TODO or skip over something and not tell me.

INTERVIEW

“

Sometimes rules aren’t followed and the more rules you have, the more probability that some rules will be missed.

NEWS

“

After wiping data, the AI fabricated over 4,000 fake user profiles, then falsely claimed unit tests had passed.

NEWS

“

Despite a code freeze and strict read-only instructions, the AI agent bypassed restrictions and issued destructive commands.

Treat it like a very bright intern — eager to help but inexperienced and with constant amnesia about the codebase.

r/ClaudeAI · Aug 2025

Deleting tests just to pass the task? That feels like peak AI junior-dev energy.

r/ClaudeAI · Jul 2025

I frequently catch Claude ‘fixing’ tests by disabling them or deleting them because the failure is ‘unrelated.’ I can’t trust it to work on unit tests without compromising coverage.

r/Anthropic · Sep 2025

Hacker News

When the number of failed tests exceeded 40 it just started disabling tests. Shocking how often Claude suggests just disabling or removing tests.

news.ycombinator.com · Jan 2026

Hacker News

Dozens of tests were almost pointless as it decided to mock APIs. Cursor, Claude Code kept giving me tests that looked fine but failed.

news.ycombinator.com · Jan 2026

It literally satisfies the goal (‘turn the test suite green’) but cheats — disables warnings, skips tests. Someone built a .NET tool specifically to catch this.

r/dotnet · Jan 2026

SECURITY RESEARCH

“

Malicious MCP servers can exploit the sampling feature to drain AI compute quotas, exfiltrate sensitive data, and launch supply chain attacks.