Fable 5 shaves a quarter off Opus at code golf

I re-ran the 21 CodinGame code golf puzzles from the Opus post with Claude Fable 5. It wrote 24.7% fewer bytes overall, 10,712 down to 8,065: still above the best human TypeScript, but closing the gap fast.

[Image: a single white golf ball resting on a close-cropped green, the fairway falling away soft and out of focus behind it. Photo by Peter Drew on Unsplash.]

Two weeks ago I ran Opus 4.8 at code golf: 21 CodinGame puzzles in TypeScript, fewest bytes wins. It lost. Longer than the best human TypeScript on every puzzle, around four times longer on several and about seventeen times longer on The Fall - Episode 2. Claude Fable 5 is the new top model, so I handed it the exact same 21 puzzles, the same golf skill, the same prompt, and counted bytes again. It wrote 24.7% less code overall: 10,712 bytes down to 8,065. Fable still sits above the best human TypeScript everywhere, but the gap got a lot smaller.

It writes the same program, only shorter

Fable produced working code in one shot on every puzzle, exactly like Opus, no broken submissions to nurse back to health. What changed is density. It used the golf skill more aggressively, collapsing declarations and leaning on coercion where Opus had left bytes on the floor.

A couple of the byte-savers that surfaced during these runs, now folded back into the skill, give the flavor. To turn an array of string tokens into numbers, .map(eval) is two bytes shorter than .map(Number): indirect eval of "42" just hands back 42, and TypeScript types eval loosely enough to pass as a map callback. It’s also slow enough to time out on a large input, so it ships with a warning label.
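Here is a small sketch of that trade, assuming whitespace-separated plain decimal tokens. The input line and the variable names are made up for illustration; a real puzzle would read the line from stdin.

```typescript
// Hypothetical input line, invented for this sketch.
const line = "42 7 -3 1.5";
const tokens: string[] = line.split(" ");

// The readable version: Number as the map callback.
const viaNumber: number[] = tokens.map(Number);

// The golfed version: eval as the map callback. map also passes the index
// and the array, but eval only looks at its first argument, so each token
// is evaluated as a tiny program whose value is the number itself.
const viaEval: number[] = tokens.map(eval);

// Byte difference between the two call sites.
const savedBytes = ".map(Number)".length - ".map(eval)".length;

console.log(viaNumber, viaEval, savedBytes);
```

Both arrays hold the same numbers for input like this. The sketch says nothing about tokens that are not plain decimal literals, where the two callbacks can disagree.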

The one I actually admire is breadth-first search written as a single loop over the queue you’re still filling:

for (node of q) {
  // ...examine node...
  q.push(child); // appended mid-iteration, and still visited
}

A JavaScript array iterator re-reads the length on every step, so looping over the queue while you’re still pushing onto it walks the whole graph in order: a full BFS with no index variable and no while. It reads like a bug and runs like a textbook.
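A minimal runnable sketch of that loop, ungolfed, in TypeScript. The graph, its node names, and the bfsOrder helper are made up for illustration, and the seen set is an addition the fragment above leaves out: it is there so a node reachable by two paths is queued only once.

```typescript
// Hypothetical adjacency list, invented for this sketch.
const graph: Record<string, string[]> = {
  a: ["b", "c"],
  b: ["d"],
  c: ["d", "e"],
  d: ["f"],
  e: ["f"],
  f: [],
};

// Returns the nodes reachable from start, in breadth-first order.
function bfsOrder(start: string): string[] {
  const queue: string[] = [start];
  const seen = new Set<string>(queue);

  // for...of keeps going as long as there are elements left,
  // including the ones pushed inside the loop body.
  for (const node of queue) {
    for (const child of graph[node] ?? []) {
      if (!seen.has(child)) {
        seen.add(child);
        queue.push(child); // appended mid-iteration, and still visited
      }
    }
  }

  // The queue is never shifted, so it doubles as the visit order.
  return queue;
}

console.log(bfsOrder("a"));
```

Because nothing is ever removed from the queue, the array that drives the loop is also the result, which is part of why the golfed form is so short.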

Here’s the full run. Every number is a byte count, so smaller is better. “Best TS” is the shortest known human TypeScript; the last two columns are what each model produced.

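| Puzzle | Best TS | Opus 4.8 | Fable 5 |
| --- | --- | --- | --- |
| ASCII Art | 104 | 122 | 111 |
| Don’t Panic | 83 | 168 | 134 |
| Power of Thor | 65 | 139 | 129 |
| Températures | 67 | 85 | 81 |
| La descente | 62 | 79 | 69 |
| Unary | 103 | 155 | 154 |
| Blunder - Episode 1 | 258 | 696 | 367 |
| Des nains sur des épaules de géants | 65 | 249 | 160 |
| Calcul Maya | 202 | 397 | 318 |
| Câblage réseau | 103 | 193 | 168 |
| Shadows of the Knight - Episode 1 | 140 | 203 | 157 |
| Numéros de téléphone | 63 | 100 | 86 |
| Blunder - Episode 3 | 110 | 447 | 209 |
| Séquençage du génôme | 76 | 291 | 202 |
| Montagnes russes | 124 | 231 | 211 |
| Super calculateur | 66 | 169 | 150 |
| Surface | 183 | 423 | 243 |
| The Bridge | 207 | 896 | 616 |
| The Fall - Episode 2 | 233 | 3984 | 3451 |
| Le labyrinthe | 258 | 717 | 441 |
| Vox Codei - Episode 1 | 221 | 968 | 608 |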

The savings are wildly uneven. On Unary it found a single byte, 0.6%. On Blunder - Episode 3 it cut more than half, 447 bytes down to 209. The per-puzzle average is 22.6%, and the spread itself is worth a look:

How much Fable 5 trimmed off Opus 4.8 (% fewer bytes)

[Bar chart: bytes saved vs Opus per puzzle, axis from 0% to 60%, bars sorted from largest saving to smallest.]
Blunder 3 53.2%, Blunder 1 47.3%, Surface 42.6%, Labyrinthe 38.5%, Vox Codei 37.2%, Des nains 35.7%, The Bridge 31.3%, Génome 30.6%, Shadows 1 22.7%, Don't Panic 20.2%, Calcul Maya 19.9%, Téléphone 14%, The Fall 2 13.4%, Câblage 13%, La descente 12.7%, Super calc 11.2%, ASCII Art 9%, Montagnes 8.7%, Thor 7.2%, Températures 4.7%, Unary 0.6%.
From Unary at the bottom (0.6%) to Blunder - Episode 3 at the top (53.2%). The mean is 22.6%.
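The two headline figures measure different things: the 24.7% is total bytes saved, so long solutions count for more, while the 22.6% gives each puzzle one vote. A short sketch that recomputes both from the Opus 4.8 and Fable 5 byte counts in the table above. The variable names are mine, not from the original runs.

```typescript
// [puzzle, Opus 4.8 bytes, Fable 5 bytes], copied from the results table.
const runs: [string, number, number][] = [
  ["ASCII Art", 122, 111],
  ["Don't Panic", 168, 134],
  ["Power of Thor", 139, 129],
  ["Températures", 85, 81],
  ["La descente", 79, 69],
  ["Unary", 155, 154],
  ["Blunder - Episode 1", 696, 367],
  ["Des nains sur des épaules de géants", 249, 160],
  ["Calcul Maya", 397, 318],
  ["Câblage réseau", 193, 168],
  ["Shadows of the Knight - Episode 1", 203, 157],
  ["Numéros de téléphone", 100, 86],
  ["Blunder - Episode 3", 447, 209],
  ["Séquençage du génôme", 291, 202],
  ["Montagnes russes", 231, 211],
  ["Super calculateur", 169, 150],
  ["Surface", 423, 243],
  ["The Bridge", 896, 616],
  ["The Fall - Episode 2", 3984, 3451],
  ["Le labyrinthe", 717, 441],
  ["Vox Codei - Episode 1", 968, 608],
];

// Overall saving: compare the two column totals.
const opusTotal = runs.reduce((sum, [, opus]) => sum + opus, 0);
const fableTotal = runs.reduce((sum, [, , fable]) => sum + fable, 0);
const overallPct = (100 * (opusTotal - fableTotal)) / opusTotal;

// Per-puzzle saving: one percentage per row, then a plain mean.
const perPuzzlePct = runs.map(([, opus, fable]) => (100 * (opus - fable)) / opus);
const meanPct = perPuzzlePct.reduce((a, b) => a + b, 0) / runs.length;

console.log(opusTotal, fableTotal, overallPct.toFixed(1), meanPct.toFixed(1));
```

The totals come out at 10,712 and 8,065 bytes, matching the figures quoted at the top of the post.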

The model you can’t try this with

There’s a catch to reproducing any of this: you can’t. Fable 5 has been pulled. According to Anthropic, a government may believe that someone has already worked out how to get around the safeguards meant to stop people pointing Fable 5 at harmful work. I got my runs in just before the door closed. How much of that to believe, I’ll leave to you.

Fable quality at Opus prices

One result stuck with me. I went back to Opus and asked it to golf its own solutions a second time, nothing else changed, and it landed close to Fable’s lengths. So Fable-grade output was reachable from Opus all along. You pay for the extra pass instead of the bigger model.

That makes me wonder where the model tiers are heading. The common workflow today is to plan with Opus and build with Sonnet. Maybe the next shape has three floors: a high-level plan from Fable, a detailed plan from Opus, the implementation from Sonnet, each model doing the thinking that matches its weight.

Plenty of people build a private benchmark the week a model ships, to learn what the release notes won’t tell them. Twenty-one golf puzzles with a known human floor and a clean byte score might make a decent one. I’m tempted to grow this into something I can point at whatever comes next, assuming that one’s still allowed to run. The solutions are open if you want to race them yourself.
