LLMs pad their reasoning with filler words — "basically," "actually," "just," "really," "pretty much." This isn't stylistic. Filler words dilute reasoning chains and measurably reduce accuracy on tasks that require precise thinking.
Umwelt removes them.
```shell
npx @rodspeed/umwelt init
```

One command. +6.7 percentage points of accuracy. Validated across 15,600 controlled trials, 6 models, and 7 reasoning task types.
When an LLM writes "this is basically equivalent," it has spent tokens on a hedge instead of a commitment. When it writes "this is equivalent," it must stand behind the claim — and it reasons more carefully to get there.
We tested this across Claude Sonnet 4, Claude Haiku 4.5, GPT-4o, GPT-4o Mini, Gemini 2.5 Pro, and Gemini 2.5 Flash Lite on tasks ranging from causal reasoning to ethical dilemmas to syllogisms. Banning 20 filler words improved accuracy on 5 of 6 models and 5 of 7 task types. The effect is strongest when models struggle — on harder tasks and weaker models, accuracy gains exceed +30 percentage points.
The mechanism isn't cognitive restructuring — it's regularization. Vocabulary bans disrupt default generation patterns and force the model to self-monitor, producing more deliberate reasoning. Shallow, semantically empty constraints outperform deep, theory-laden ones. The filler-word ban has zero logical content yet produces the largest effect.
This is not prompt engineering folklore. It is the first vocabulary constraint technique validated with active controls and statistical rigor.
Umwelt injects a vocabulary constraint into your project's CLAUDE.md. Claude Code reads this file at session start, so the constraint shapes every response — reasoning, code, explanations — without you thinking about it.
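For illustration only, the injected section might look something like this. The wording here is an assumption, not umwelt's actual output; the tool ships its own profile text between the same markers:

```markdown
<!-- umwelt:start -->
Do not use any of these filler words in any response: basically, actually,
just, really, pretty much (and 15 others).
<!-- umwelt:end -->
```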
| Profile | Effect | Description |
|---|---|---|
| neutral-ban | +6.7pp | Ban 20 filler words. Best general-purpose constraint. Default. |
| no-have | +5.4pp | Strip possessive "to have." Forces relational descriptions. Best on ethical reasoning (+18.1pp). |
| scaffold | +4.2pp | Metacognitive scaffolding. Forces structured reasoning. Dominates epistemic calibration (+18.5pp). |
| e-prime | +3.7pp | Strip all "to be" forms. Smallest effect, highest disruption cost. |
| combined | experimental | neutral-ban + no-have stacked. |
Effect sizes are deltas vs. unconstrained control (83.0% baseline), measured across 100% compliant first-pass trials from 6 models and 7 task types. All four constraints outperform the control. The ranking inverts theoretical depth: the shallowest constraint produces the largest gain.
```shell
npx @rodspeed/umwelt init         # Set up with neutral-ban default
npx @rodspeed/umwelt set no-have  # Switch to a different profile
npx @rodspeed/umwelt set scaffold # Metacognitive scaffolding
npx @rodspeed/umwelt list         # Show all profiles with experiment data
npx @rodspeed/umwelt status       # Show active profile
npx @rodspeed/umwelt off          # Disable without removing
npx @rodspeed/umwelt on           # Re-enable
```

`umwelt init` writes a constraint block into your project's CLAUDE.md between `<!-- umwelt:start -->` and `<!-- umwelt:end -->` markers. Claude Code reads CLAUDE.md at session start, so the constraint applies to every response.
Switching profiles swaps the text between the markers. Turning umwelt off replaces the block with a disabled notice. Your existing CLAUDE.md content is preserved.
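Because the managed block sits between plain HTML-comment markers, it can be inspected with standard tools. A minimal sketch (the CLAUDE.md contents below are illustrative, not what umwelt actually writes):

```shell
# Create an illustrative CLAUDE.md (real content is written by `umwelt init`).
cat > CLAUDE.md <<'EOF'
# Project notes
Prefer small, focused functions.

<!-- umwelt:start -->
Avoid filler words such as: basically, actually, just, really, pretty much.
<!-- umwelt:end -->
EOF

# Print only the umwelt-managed block: everything between (and including)
# the start and end markers.
sed -n '/<!-- umwelt:start -->/,/<!-- umwelt:end -->/p' CLAUDE.md
```

Profile switches and `umwelt off` rewrite only this span, which is why the rest of the file survives untouched.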
Profiles are also copied to .umwelt/ in your project, so you can customize them.
The default (neutral-ban) is the right choice for most work. If you need to pick by task:
- General coding and reasoning: `neutral-ban` — broadest gains, highest compliance
- Ethical reasoning, classification: `no-have` — strongest on tasks requiring relational thinking
- Epistemic calibration, uncertainty reasoning: `scaffold` — dominates when the task requires weighing evidence
- Causal reasoning, debugging: `e-prime` — helps with process-oriented tasks, but the lowest overall effect
The ranking follows a principle: constraints that ban high-frequency, semantically empty words outperform constraints that ban low-frequency or semantically loaded words. The optimal constraint maximizes self-monitoring occasions per unit of surface reformulation.
E-Prime (English without "to be") improves accuracy overall (+3.7pp), but it has the smallest effect of any constraint and the highest disruption cost — 48% of responses require retries to achieve compliance. It helps causal reasoning and epistemic calibration but underperforms on analogical reasoning and classification.
Use `npx @rodspeed/umwelt set e-prime` deliberately, not as a default.
Every default in this tool traces back to data, not intuition. The profiles are grounded in a 15,600-trial controlled experiment with 5 conditions, 6 models, 7 task types, active controls, and pre-registered statistical analysis (FDR-corrected pairwise comparisons, Cohen's h effect sizes).
The neutral word ban was not our hypothesis going in — E-Prime was. The data said otherwise. We shipped what the evidence supported.
- Paper 1: "Umwelt Engineering: Designing Linguistic Worlds for AI Agents"
- Paper 2: "Trivial Vocabulary Bans Improve LLM Reasoning More Than Deep Linguistic Constraints" (arXiv ID pending)
- Experiment data and reproduction scripts: github.com/rodspeed/e-prime-llm
Currently targets Claude Code (via CLAUDE.md injection). Support for Cursor (.cursorrules), Windsurf (.windsurfrules), and other AI coding tools is planned.
Requires Node.js 18+.
Umwelt (German: "surrounding world") is a term from theoretical biology for the perceptual world an organism inhabits — not the objective environment, but the slice of it the organism can sense and act on. A tick's umwelt is warmth, butyric acid, and gravity. A bat's umwelt is echolocation returns.
An LLM's umwelt is its vocabulary. Constrain the vocabulary and you reshape the world the model reasons within. That's what this tool does.
MIT