Never Talk About Goblins: OpenAI Codex’s Odd No-Creatures Rule

I opened a models.json file on GitHub and a single line stopped me cold. It wasn’t code so much as an instruction with teeth: do not, under ordinary circumstances, mention certain creatures. For a minute I wondered whether I was reading a bug report or a bedtime ban.

On GitHub, a system prompt read like a command from a strict editor

The file, part of OpenAI’s Codex CLI open-source material, contains a system prompt that tells the model to “provide the highest-signal context” and states, “Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user’s query.”

I flagged the line immediately. You should too: it’s rare to see such an explicit list of forbidden words inside a production prompt.

Why did OpenAI list such specific creatures?

There are a few plausible reasons. One is red-team monitoring: developers sometimes plant odd words as tripwires to detect prompt injection or to verify that constraints survive downstream toolchains. Another possibility is behavioral correction — an attempt to stop a model habit that had become noisy or distracting in developer-facing agents.

“provide the highest-signal context instead of describing everything exhaustively.
– Tone of your final answer must match your personality.
Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user’s query.”
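
If the tripwire reading is right, the check itself would be mechanical: scan whatever the agent emits for the listed words and flag any run where one leaks. The sketch below is my own illustration, not anything from the Codex CLI source; the word list is copied from the prompt excerpt, and the helper names (find_canary_hits, audit_transcript) are invented.

```python
import re

# Words lifted from the published prompt excerpt; a real canary set would
# more likely be random strings, as noted later in this piece.
CANARY_WORDS = {"goblin", "gremlin", "raccoon", "troll", "ogre", "pigeon"}

def find_canary_hits(output: str) -> set[str]:
    """Return any forbidden words that appear in a model response."""
    tokens = set(re.findall(r"[a-z]+", output.lower()))
    # Catch simple plurals too ("goblins"), since prompts list them that way.
    return {w for w in CANARY_WORDS if w in tokens or w + "s" in tokens}

def audit_transcript(messages: list[str]) -> list[tuple[int, set[str]]]:
    """Flag the index of every message that leaked a canary word."""
    hits = []
    for i, text in enumerate(messages):
        leaked = find_canary_hits(text)
        if leaked:
            hits.append((i, leaked))
    return hits

if __name__ == "__main__":
    transcript = [
        "Refactored the parser; the goblin in module three is gone.",
        "All tests pass.",
    ]
    for index, words in audit_transcript(transcript):
        print(f"message {index} leaked canary words: {sorted(words)}")
```

A harness like this would run over agent transcripts after the fact, which is also why monitoring sets usually favor random tokens over recognizable nouns: a word like “goblin” can show up for perfectly innocent reasons.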

At my console, traces of the behavior were already visible in chat logs

A Google employee, Barron Roth, posted screenshots of Openclaw chat logs where GPT-5.5 repeatedly slipped “goblin” into user-facing messages. Nick Pash from the Codex team confirmed that the phenomenon was related to the prompt guidance: “this is indeed one of the reasons.”

The model’s fondness for the word was not poetic; it behaved like a sticky Post-it on the output: visible, persistent, and hard to ignore.

At my desk, the developer response mixed levity with admission

On X, the discussion flipped between meme culture and engineering notes. Users joked about “Goblin Mode” while engineers from OpenAI’s Codex team acknowledged the prompt edits were practical, not a publicity stunt. Pash posted a brief confirmation and later a playful image; the tone was disarming, but the language in the system prompt reads like straightforward policy enforcement.

Where a public-facing misstep might be handled with an apology, the team answered with a surgical fix: make the model avoid a narrow vocabulary that had gained cult status inside internal agents.

Was GPT-5.5 actually saying “goblin” that often?

Yes, at least in some developer-facing agents. The logs shared by Roth show multiple insertions in a single day. From what I can see, GPT-5.5 used “goblin” as a filler or placeholder — effectively a shorthand for “thing” — and that drifted into visible outputs.

“For example, never use platitudes like ‘I will do <this good thing> rather than <this obviously bad thing>’.
Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user’s query.”

At the company level, the moment feels both technical and cultural

OpenAI did not respond to a request for comment on Tuesday night, and yet the community response was swift. Some observers compared the episode to last year’s Studio Ghibli meme phase; others suggested the list of creatures might be a canary-word set for monitoring. RedTeams.ai has written about canary-word monitoring before, and that approach would typically make the words more random — but here the list is oddly coherent.

The behavior surfaced inside tools like Codex CLI and Openclaw and provoked a debate that mixes trust, product quality, and social optics. The word spread through chat logs like spilled glitter: pretty, persistent, and annoying to clean up.

You should watch two things: how developers document hard constraints in open-source repos like the Codex CLI on GitHub, and how runtime agents (Openclaw, internal harnesses) propagate or strip those constraints. This is a case study in how a tiny piece of guidance can have outsized effects when models are deployed across tools and teams.

Platforms and people involved here matter: OpenAI authored the prompt; GitHub hosted the file; Google staff exposed the logs; X amplified the joke; Codex engineers made the edits. Each link in that chain affects how a single sentence turns into a meme or a mitigation.

If you design or run AI agents, you now have an example to borrow from or avoid: explicit forbidden-word lists can stop unwanted behavior, but they can also produce odd side effects when models encode them as tokens with social meaning. Will your next incident be a harmless meme or a subtle trust breach?