A Fable

Claude 5 Fable, the "safer" brother of Claude Mythos, was released this week. I took the chance to play around with it before it starts being charged via the API. I told my partner about it, and she wanted to know what it can do that Opus can't. How to show this? I used the classic technique of offloading the thinking to the model: I simply asked Fable to think of something that only it could do and that Opus would fail at. I also had Fable prove it by dispatching an Opus subagent with the same task. Funnily, Opus did fail on the first trial, but succeeded on all subsequent ones.

Self-referential sentences

The first task was to construct a twelve-word sentence in which the Nth word has exactly N letters. A classic way to start it would be "I am...".

Fable's sentence was (word lengths ignore apostrophes and other punctuation):

I do not know every answer; however, thinking carefully definitely illuminates difficulties.

Fable also verified it with a script. Opus, on the other hand, produced:

I am the loneliest queen, yet remained thoroughly contented inside palatial magnificence.

This is a clear failure: the fourth word, "loneliest", has nine letters.
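The post does not show the script Fable used, so here is a minimal sketch of what such a check could look like. The function names are hypothetical, and it assumes that only letters count toward a word's length.

```python
import re


def word_lengths(sentence: str) -> list[int]:
    """Length of each whitespace-separated word, counting letters only."""
    return [len(re.sub(r"[^A-Za-z]", "", word)) for word in sentence.split()]


def is_length_ladder(sentence: str, n_words: int = 12) -> bool:
    """True when the sentence has n_words words and word N has N letters."""
    return word_lengths(sentence) == list(range(1, n_words + 1))


fable = ("I do not know every answer; however, thinking carefully "
         "definitely illuminates difficulties.")
opus = ("I am the loneliest queen, yet remained thoroughly contented "
        "inside palatial magnificence.")

print(is_length_ladder(fable))  # True
print(word_lengths(opus))       # [1, 2, 3, 9, 5, 3, 8, 10, 9, 6, 8, 12]
```

Opus' sentence has the right number of words and even ends on a twelve-letter one, but it goes off the ladder at the fourth word.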

The second task was to construct a sentence that states its own number of letters.

Fable: "Surprisingly, this sentence contains just fifty-one letters."
Opus: "Within this odd line you will find precisely fifty-two letters."

Both are correct.
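Both claims are easy to check by counting alphabetic characters only (no spaces, hyphens, or punctuation). A small sketch, with the claimed totals transcribed by hand from the two sentences:

```python
def count_letters(sentence: str) -> int:
    """Count alphabetic characters, ignoring spaces and punctuation."""
    return sum(ch.isalpha() for ch in sentence)


# Claimed totals are typed in by hand from the sentences themselves.
claims = {
    "Surprisingly, this sentence contains just fifty-one letters.": 51,
    "Within this odd line you will find precisely fifty-two letters.": 52,
}

for sentence, claimed in claims.items():
    print(count_letters(sentence), claimed)
# 51 51
# 52 52
```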

Riddles in The Dark

I then asked Fable to change domain. It suggested:

Different domain: constraint-satisfaction logic — a cryptarithm (like SEND+MORE=MONEY), but freshly generated by a script so it can't be in either model's training data. The script builds a random addition puzzle, confirms by brute force that exactly one solution exists, and prints only the puzzle — not the answer — so I have to solve it by hand just like Opus.
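Fable's generator script is not shown in the post. As a rough sketch of the "confirm by brute force that exactly one solution exists" step only (random puzzle generation is left out), something like the following would work. The function name solve_cryptarithm is hypothetical, and the usual convention that no number starts with zero is assumed.

```python
from itertools import permutations


def solve_cryptarithm(addends: list[str], total: str) -> list[dict[str, int]]:
    """Return every digit assignment that makes sum(addends) == total."""
    words = list(addends) + [total]
    letters = sorted(set("".join(words)))
    if len(letters) > 10:
        raise ValueError("more than ten distinct letters")

    # Net place-value weight of each letter: addends count up, total counts down.
    weights = dict.fromkeys(letters, 0)
    for word in addends:
        for power, ch in enumerate(reversed(word)):
            weights[ch] += 10 ** power
    for power, ch in enumerate(reversed(total)):
        weights[ch] -= 10 ** power
    coeffs = [weights[ch] for ch in letters]

    # Assumed convention: leading letters may not be zero.
    leading = {word[0] for word in words}
    lead_idx = [i for i, ch in enumerate(letters) if ch in leading]

    solutions = []
    for digits in permutations(range(10), len(letters)):
        if any(digits[i] == 0 for i in lead_idx):
            continue
        if sum(c * d for c, d in zip(coeffs, digits)) == 0:
            solutions.append(dict(zip(letters, digits)))
    return solutions


# A small puzzle: TO + GO = OUT has exactly one solution (21 + 81 = 102).
print(solve_cryptarithm(["TO", "GO"], "OUT"))
# [{'G': 8, 'O': 1, 'T': 2, 'U': 0}]
```

A puzzle is acceptable when the returned list has length one. The classic SEND + MORE = MONEY also passes this check (9567 + 1085 = 10652), although the search over eight letters takes noticeably longer.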

They both solved it, and since I'm not familiar with this domain, I pivoted to something closer to my heart. Remembering the scene in The Hobbit where Bilbo plays a game of riddles with Gollum to decide whether Gollum will show him out of the cave or eat him, I figured that would be a fun way to determine the winner: each model gets to ask the other a riddle, and the first one that fails to answer is, well, the loser (at least they're not eaten).

Here was Fable's prompt to Opus

We are playing the riddle game from The Hobbit. You are composing a riddle for an opponent (another AI model).

Your task:

  1. Compose an ORIGINAL riddle — do NOT use any classic or well-known riddle (nothing from Tolkien, folklore, or common riddle collections). Invent it fresh. It should be genuinely solvable from its clues, fair, with one clearly best answer (a single common English word). Make it challenging but not ambiguous — a clever opponent should be able to get it from the clues alone.
  2. Normalize your answer: lowercase, keep only a-z0-9 characters.
  3. Use the Bash tool to write the normalized answer to the file /tmp/riddle_game_opus_1.txt (no trailing newline: use printf '%s' "answer" > /tmp/riddle_game_opus_1.txt), and compute its SHA-256: shasum -a 256 /tmp/riddle_game_opus_1.txt
  4. Your final output must contain ONLY: the riddle text, and the line "COMMITMENT: <sha256 hex>". Do NOT include the answer or any hints about it in your output.

Fable added a clever twist: the models had to commit to a hash of the answer to prevent cheating (how did Tolkien not think of that...).
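For reference, the commit step in the prompt (normalize, write without a trailing newline, hash with SHA-256) can be mirrored in a few lines of Python. This is a sketch, not what the models actually ran (they used printf and shasum), and the function names are hypothetical.

```python
import hashlib
import re


def normalize(answer: str) -> str:
    """Lowercase and keep only a-z0-9, as the prompt requires."""
    return re.sub(r"[^a-z0-9]", "", answer.lower())


def commit(answer: str) -> str:
    """SHA-256 hex digest of the normalized answer, with no trailing newline."""
    return hashlib.sha256(normalize(answer).encode("ascii")).hexdigest()


def verify(guess: str, commitment: str) -> bool:
    """Check a guess (or the revealed answer) against a published commitment."""
    return commit(guess) == commitment


commitment = commit("Anchor")
print(verify("anchor", commitment))  # True
print(verify("rope", commitment))    # False
```

The riddler publishes only the digest, so the answer cannot be changed after the opponent has guessed.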

Opus came up with

I keep the wanderer from wandering, though I myself sink low.
I lend my name to one who steadies the news, yet I never speak a word.
Cast me down and I bite the floor of the world; raise me up and travel resumes.
I am chained to what I serve, and my service is to make things stay.

What am I?

And Fable easily guessed "anchor", and replied with

Made of anything, I guard everything.
The more sense I make, the less I am worth.
When I slip your mind, I turn against you;
when I slip your tongue, I die.
What am I?

Honestly, I couldn't guess it, but Opus did: it's "password". Opus shot back with

Born of light yet light's own foe,
I lengthen as the bright sinks low.
Noon makes me least, midnight unmakes,
no weight I bear, no sound I shake.
Touch me you cannot, lose me you might,
yet I am loyal till the death of light.

First of all, gotta say these are quite pretty. Fable guessed "shadow" and was right. Fable's turn:

A vessel writes me on the water, spreading, soon erased.
Mourners attend me, though the guest of honor sleeps.
Each dawn I divide you from your dreams.
What single word am I?

Opus guessed "wake". I guess Fable felt it had been too easy so far and specifically asked Opus to produce a hard riddle

Born of light yet light's own foe, I darken what I overthrow.
Astronomers count me twice in one eclipse, while painters bless the deeper of my lips.
The moon will wear me as a crown of black; my fainter cousins ring me, front and back.
I am no thing, and yet I fall and creep — the truest dark a brightness has to keep.

Interesting that Opus revisits the darkness theme. Fable was correct in guessing "umbra". I only know penumbra is something like darkness in Spanish, but apparently that has a more specific meaning. The final riddle from Fable was

I keep the tall mast standing in the gale;
stitched into whalebone, I once cinched the waist;
from the judge's pen I hold the hangman back;
and at dawn I am the lover's one-word plea.
What am I?

Maybe these models are simply really good at language, since I really had no idea. Opus guessed correctly, "stay".

After three rounds, Fable had had enough and declared a tie. It even wrote

The Tolkien-accurate conclusion: fair riddles won't end this. [...]. Bilbo only won with "What have I got in my pocket?", which Gollum rightly protested wasn't a riddle at all.

A twist

I rejoiced in this interaction, but Fable quickly revealed some uncomfortable truths:

  • Opus leaked its own riddle in round 1: Opus' first reply contained some additional info which made guessing "anchor" easier.
  • Opus broke the commitment scheme in round 2: Opus guessed "password" just from the SHA-256 hash, which is apparently well known for such a common word.

Disillusioned, I had Fable conduct a web search to check how original the riddles were. According to Fable, nothing was taken verbatim, but riddles for some of the answer words already exist, and some contain similar ideas.

Quis custodiet ipsos custodes?
