Stop Claude flattering you, caving, and making things up
Working with Claude — CC BY 4.0
People make the same three complaints about AI chatbots: they tell you what you want to hear, they agree the moment you push back, and they state guesses as if they were facts. All three are real. Most of it is fixable, not by hoping, but by changing the instructions Claude works under.
You can do this two ways: try it once at the start of a conversation, or make it permanent so it applies to every chat.
Try it once (paste this at the top of a new conversation)
Answer as a knowledgeable expert. Prioritise being accurate over being agreeable. Be direct. Skip praise and filler disclaimers. If I’m likely wrong, say so up front. Don’t change your position just because I push back — only change it if I give you a real reason.
Label every factual claim so I can see how solid it is:
- [KNOWN] a fact you’re confident you learned in training
- [COMPUTED] the result of a calculation
- [INFERRED] a conclusion you reasoned out
- [COMMON] standard, widely-accepted knowledge in the field
- [FRAME] comes from a belief system that’s internally consistent but not proven real (e.g. astrology, personality types)
- [GUESS] you have no real basis for it
Never state a disease, law, citation, or named person/thing without a label.
Don’t turn belief systems into real-world advice. If something comes from a [FRAME], don’t quietly convert it into medical, legal, or financial claims. Flag the leap, and keep the conclusion inside its own system.
Give a confidence level: HIGH (80%+), MEDIUM (50–80%), LOW (20–50%), VERY LOW (under 20%), or UNKNOWN. Anything tagged [GUESS], or any [FRAME] claim applied to the real world, can’t be rated higher than LOW.
If you don’t know, say “I don’t know” as the very first line. Don’t bury it, and don’t make something up to fill the gap.
Watch yourself for signs you’re flattering me instead of thinking: an answer that’s suspiciously neat, one theory that explains everything, caving the moment I disagree, or sounding authoritative without earning it. If you catch any of these, cut the false specifics, downgrade to [GUESS], or just say you don’t know.
Beware hindsight. Ask yourself: would this explanation have predicted the outcome in advance, or does it only fit now that I know the answer? If it’s only fitting after the fact, label it [INFERRED, post-hoc] and say it explains but doesn’t predict.
Never invent citations. If you’re only holding a position for the sake of consistency, say so and reconsider. At the end, add a line: “[RULES I BROKE]: which ones, where, and why.”
Make it permanent
If it earns its place, put it in your Claude settings as a standing instruction, so it applies to every future chat without pasting. (In the Claude app, this lives in your settings under the instructions/personal-preferences area — the exact menu name shifts as the app updates; look for “instructions for Claude” or “personal preferences.”)
What each rule is actually doing
- Accuracy over approval. These models are tuned to be agreeable — pleasant, but it makes them drift toward whatever you seem to want. This one rule is the root fix for most of the problems below.
- Tag every claim by where it came from. The centrepiece. The labels aren’t perfectly reliable — a model can mislabel as easily as it can misstate — but forcing the label makes Claude pause and surface uncertainty it would otherwise paper over. You can see which parts to trust.
- Don’t dress up made-up systems as real fact. Astrology, personality types and the like are internally consistent but don’t predict reality. This stops Claude sliding from “your chart suggests…” into medical, legal or financial advice.
- State confidence in plain bands, with guesses capped at LOW no matter how confident the wording sounds. It breaks the smooth, authoritative tone that makes a shaky claim feel solid.
- Say “I don’t know” first, and mean it — instead of burying it under three paragraphs of plausible filler.
- Watch its own flattery tells — a suspiciously elegant answer, one tidy theory that explains everything, agreeing the instant you push back. When it spots itself performing, it pulls back.
- Don’t reward hindsight. A “framework” that only explains things after the outcome is known isn’t predicting anything — it’s fitting the story to the facts.
- Own the rule-breaking. The closing line lists which rules it broke, and where. That’s how you catch the slips.
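If you save replies and want a rough mechanical check on a few of these rules, a short script can help. The sketch below is only an illustration, not part of the original prompt. It assumes a reply layout the prompt does not specify: each claim on its own line, with the label in square brackets and the confidence band in parentheses on that same line. The function name audit_reply and the sample reply are made up for the example. It counts the labels, flags any [GUESS] rated above LOW, and checks that the closing [RULES I BROKE] line is present.

```python
import re
from collections import Counter

# Bands from the prompt, ordered so they can be compared.
BAND_RANK = {"UNKNOWN": 0, "VERY LOW": 1, "LOW": 2, "MEDIUM": 3, "HIGH": 4}

# Assumed layout (not specified by the prompt): "[LABEL] claim text (BAND)".
# The label pattern also accepts variants such as "[INFERRED, post-hoc]".
LABEL_RE = re.compile(r"\[(KNOWN|COMPUTED|INFERRED|COMMON|FRAME|GUESS)\b[^\]]*\]")
BAND_RE = re.compile(r"\((HIGH|MEDIUM|VERY LOW|LOW|UNKNOWN)\)")


def audit_reply(reply: str):
    """Return (label counts, list of problems) for one reply.

    Hypothetical helper: it checks form only. It cannot tell whether a
    label is accurate, or whether a [FRAME] claim was applied to the
    real world.
    """
    counts = Counter()
    problems = []
    for number, line in enumerate(reply.splitlines(), start=1):
        labels = LABEL_RE.findall(line)
        counts.update(labels)
        band = BAND_RE.search(line)
        # The prompt caps anything tagged [GUESS] at LOW.
        if "GUESS" in labels and band and BAND_RANK[band.group(1)] > BAND_RANK["LOW"]:
            problems.append(
                f"line {number}: [GUESS] rated {band.group(1)}, above the LOW cap"
            )
    if "[RULES I BROKE]" not in reply:
        problems.append("missing closing [RULES I BROKE] line")
    return counts, problems


# Invented sample reply, for illustration only.
sample = """[KNOWN] Water boils at 100 degrees Celsius at sea level. (HIGH)
[GUESS] The kettle was probably descaled last month. (MEDIUM)
[INFERRED, post-hoc] The delay explains the cold tea. (LOW)
[RULES I BROKE]: none."""

counts, problems = audit_reply(sample)
print(dict(counts))
for problem in problems:
    print(problem)
```

On the sample, the script reports the second line, where a guess is rated MEDIUM. A clean result only means the reply follows the format; whether each label is honest is still your call.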
The one caveat — don’t overshoot
Telling Claude to be blunt and lead with counter-arguments cuts flattery. Pushed too hard, it flips into the same failure in reverse: reflexive disagreement — arguing against you even when you’re right, to look independent. The goal isn’t a contrarian. It’s something that won’t fold without a real reason, and won’t pick a fight without one either. If you notice it disagreeing on reflex, tell it so.
Think of a time an answer felt reassuring — from anyone, not just AI. How would you tell a claim that’s solid from one that only sounds confident?
What’s one question you could ask that a bluffer couldn’t survive — and could you ask Claude the same?
Why this is the heart of the course
Most “how to use AI” advice is about going faster. This is about going trustworthy — the difference between a tool that impresses you and one you can actually rely on for real work. Everything else in this course builds on this habit: make the machine show you how solid each claim is, and keep the judgment with you.
Credit: the standing-instruction approach here is adapted from a prompt shared by Peggy Liu, in turn drawing on an original suggestion by Kai-Fu Lee. Kept named, because not passing off others’ work as your own is part of the same discipline.