AI Competitive Programming

Can an LLM win at code golf in TypeScript?

I ran Opus 4.8 against 21 CodinGame code golf puzzles in TypeScript, from easy to hard. It never came close to first place, and it never beat the best human TypeScript either. Here are the numbers.

[Image: a single golf ball on a wooden tee, low in the grass, the rest of the course thrown out of focus behind it. Photo by Will Porada on Unsplash.]

Code golf is the game of solving a problem in the fewest bytes of source code you can manage. The score is the size of the program, and lower is better. It rewards every habit a good engineer is taught to drop: one-letter names, abused operators, no readability at all. I wanted to see how a current LLM handles that inversion, so I pointed Opus 4.8 (medium effort) at 21 code golf puzzles on CodinGame, easy to hard, and made it write TypeScript. The short version: it never got close to first place, and it never beat the shortest known human TypeScript solution on a single puzzle.
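To make the scoring rule concrete, here is a minimal sketch of a byte counter. It assumes "bytes" means UTF-8 bytes of the source text; the exact counting rule CodinGame applies is not stated here, and `golfScore` and the two sample programs are hypothetical names and snippets made up for illustration.

```typescript
// Score a submission as the article describes it: the size of the source, lower is better.
// Assumption: size is measured in UTF-8 bytes.
function golfScore(source: string): number {
  return new TextEncoder().encode(source).length;
}

// Two hypothetical programs that both print the sum of 1..10.
const readable = `let total = 0;
for (let i = 1; i <= 10; i++) {
  total += i;
}
console.log(total);`;

const golfed = `for(var t=0,i=11;--i;)t+=i;console.log(t)`;

console.log(`readable: ${golfScore(readable)} bytes`);
console.log(`golfed:   ${golfScore(golfed)} bytes`);
```

Whitespace, long names and block braces all count toward the score, which is why the second version wins despite doing the same work.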

Why TypeScript, of all languages

TypeScript is a deliberately strange pick for golf. Nobody golfs it in real life, because the code you ship gets transpiled to JavaScript and minified by a tool, never shrunk by hand. The language is built for the opposite of golf. Variables must be declared, a function’s arity is fixed, the types want to be spelled out. It resists you at every byte.

That resistance is the whole point of the experiment. I wanted a language where the model can’t lean on a culture of golfing tricks it absorbed during training, the way it obviously can with Perl or Ruby. TypeScript is a clean room.

The skill helped; the target length didn’t

I didn’t send the model in cold. It had a skill I wrote that compiles golfing advice into one place: collapse declarations, prefer expressions over statements, exploit coercion, and so on. It clearly helped against a naive baseline.
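As an illustration of the three kinds of advice named above, here is a small before-and-after sketch. It is not taken from the skill itself, whose contents are not reproduced in this post; the function names `countPositivesReadable` and `p` are hypothetical, and the example only shows what collapsing declarations, preferring expressions and exploiting coercion can look like in TypeScript.

```typescript
// Readable version: count how many values are strictly positive.
function countPositivesReadable(values: number[]): number {
  let count = 0;
  for (const value of values) {
    if (value > 0) {
      count = count + 1;
    }
  }
  return count;
}

// Golfed version of the same function:
// - collapsed declaration: the counter `c` is a default parameter, so no `let` is needed
// - expression over statements: `map(...) && c` replaces the loop, the `if` and the `return`
// - coercion: unary plus turns the boolean `v>0` into 1 or 0
const p=(a:number[],c=0)=>a.map(v=>c+=+(v>0))&&c

console.log(countPositivesReadable([3, -1, 0, 7]), p([3, -1, 0, 7]));
```

The array returned by `map` is always truthy, so `&&` evaluates to `c` after the callback has run for every element.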

I also tried giving Opus a concrete target, the current best score for the puzzle, to see whether a number to chase would push it harder. It changed nothing. The model wrote roughly the same code whether I asked for “as short as possible” or “get this under 64 bytes”. The target was just words.

The whole run, in one table

Here’s the full set. Every number is a byte count, so smaller is better. “First place” is the best score on the leaderboard in any language; “Best TS” is the shortest known TypeScript; the last column is what Opus produced.

| Puzzle | First place | Best TS | Opus 4.8 |
| --- | --- | --- | --- |
| ASCII Art | 64 (Python 3) | 104 | 122 |
| Don’t Panic | 56 (Bash) | 83 | 168 |
| Power of Thor | 42 (Perl, Ruby) | 65 | 139 |
| Températures | 29 (Perl) | 67 | 85 |
| La descente | 27 (Ruby) | 62 | 79 |
| Unary | 64 (Bash) | 103 | 155 |
| Blunder - Episode 1 | 215 (Python 3) | 258 | 696 |
| Des nains sur des épaules de géants | 35 (Perl) | 65 | 249 |
| Calcul Maya | 176 (Ruby) | 202 | 397 |
| Câblage réseau | 73 (Perl, Ruby) | 103 | 193 |
| Shadows of the Knight - Episode 1 | 90 (Perl) | 140 | 203 |
| Numéros de téléphone | 31 (Perl) | 63 | 100 |
| Blunder - Episode 3 | 76 (Ruby) | 110 | 447 |
| Séquençage du génome | 47 (Ruby) | 76 | 291 |
| Montagnes russes | 84 (Perl) | 124 | 231 |
| Super calculateur | 39 (Ruby) | 66 | 169 |
| Surface | 145 (Perl) | 183 | 423 |
| The Bridge | 175 (Perl) | 207 | 896 |
| The Fall - Episode 2 | 209 (Perl) | 233 | 3984 |
| Le labyrinthe | 211 (Perl) | 258 | 717 |
| Vox Codei - Episode 1 | 171 (PHP) | 221 | 968 |

Opus is always the longest one in the room

Two things stand out. Opus is longer than the best human TypeScript on all 21 puzzles: about 17% longer at best (ASCII Art), roughly twice as long in the median case, and more than four times as long on four of them. And the all-language leaderboard sits in a different category entirely.
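The size of that gap can be checked directly from the table. The sketch below copies the "Best TS" and "Opus 4.8" columns and computes, for each puzzle, how many times longer the Opus solution is; the variable names are mine and the choice of summary figures (closest, median, furthest) is only one reasonable way to slice the numbers.

```typescript
// [puzzle, best human TypeScript, Opus 4.8], byte counts copied from the table.
const results: [string, number, number][] = [
  ["ASCII Art", 104, 122],
  ["Don't Panic", 83, 168],
  ["Power of Thor", 65, 139],
  ["Températures", 67, 85],
  ["La descente", 62, 79],
  ["Unary", 103, 155],
  ["Blunder - Episode 1", 258, 696],
  ["Des nains sur des épaules de géants", 65, 249],
  ["Calcul Maya", 202, 397],
  ["Câblage réseau", 103, 193],
  ["Shadows of the Knight - Episode 1", 140, 203],
  ["Numéros de téléphone", 63, 100],
  ["Blunder - Episode 3", 110, 447],
  ["Séquençage du génome", 76, 291],
  ["Montagnes russes", 124, 231],
  ["Super calculateur", 66, 169],
  ["Surface", 183, 423],
  ["The Bridge", 207, 896],
  ["The Fall - Episode 2", 233, 3984],
  ["Le labyrinthe", 258, 717],
  ["Vox Codei - Episode 1", 221, 968],
];

// Opus bytes divided by best-TypeScript bytes, sorted from smallest gap to largest.
const ratios = results
  .map(([puzzle, bestTs, opus]) => ({ puzzle, ratio: opus / bestTs }))
  .sort((a, b) => a.ratio - b.ratio);

const closest = ratios[0];
const median = ratios[Math.floor(ratios.length / 2)]; // 21 entries, so this is the middle one
const furthest = ratios[ratios.length - 1];
const opusWins = ratios.filter((r) => r.ratio < 1).length;
const fourTimesOrMore = ratios.filter((r) => r.ratio >= 4).length;

console.log(`closest:  ${closest.puzzle} (x${closest.ratio.toFixed(2)})`);
console.log(`median:   ${median.puzzle} (x${median.ratio.toFixed(2)})`);
console.log(`furthest: ${furthest.puzzle} (x${furthest.ratio.toFixed(2)})`);
console.log(`puzzles where Opus beats the best TypeScript: ${opusWins}`);
console.log(`puzzles where Opus is at least 4x longer: ${fourTimesOrMore}`);
```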

Best human TypeScript vs Opus 4.8, by puzzle (bytes, lower is better)

[Bar chart, 0 to 1000 bytes: best TypeScript vs Opus 4.8 for 20 puzzles, ordered by best TypeScript score from La descente (62 vs 79) to Le labyrinthe (258 vs 717). The values are those in the table above.]
The Fall - Episode 2 is left off: Opus wrote 3984 bytes there against the best TypeScript's 233, and it would flatten the rest of the chart.

The worst case isn’t even on that chart. On The Fall - Episode 2, Opus produced 3984 bytes where the best TypeScript is 233 and first place is 209. The interesting part is how it got there. It started from a tight, minified attempt, then kept patching that same dense code to handle edge cases, one fix at a time, and never stepped back to rewrite once the solution passed. So the result is unreadable like golf, but four thousand bytes long: the worst of both. A human golfer would have thrown the draft away and started over.

If you want to win, bring Perl

The leaderboard is blunt about which language to show up with.

First-place golf solutions by language (21 CodinGame puzzles)

  • Perl: 11 (48%)
  • Ruby: 7 (30%)
  • Python 3: 2 (9%)
  • Bash: 2 (9%)
  • PHP: 1 (4%)
Two puzzles tie between Perl and Ruby, so the slices sum to 23.

Perl and Ruby take the top spot on 16 of the 21 puzzles between them. Both are terse by design and carry decades of golfing culture, with whole communities that have worked out the shortest way to say almost anything. TypeScript carries none of that. That absence is the reason I chose it, and the reason the best human TypeScript scores already sit so far above the all-language record.
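Those counts follow from the "First place" column of the table. As a quick sanity check, the sketch below tallies the winning languages in table order, counting a tied puzzle once for each language; the variable names are hypothetical and the percentages are taken over the 23 slices, matching the chart note.

```typescript
// First-place language(s) for each of the 21 puzzles, in table order.
// A tie lists both languages.
const winners: string[][] = [
  ["Python 3"], ["Bash"], ["Perl", "Ruby"], ["Perl"], ["Ruby"], ["Bash"], ["Python 3"],
  ["Perl"], ["Ruby"], ["Perl", "Ruby"], ["Perl"], ["Perl"], ["Ruby"], ["Ruby"],
  ["Perl"], ["Ruby"], ["Perl"], ["Perl"], ["Perl"], ["Perl"], ["PHP"],
];

// Count one slice per (puzzle, language) pair.
const tally: Record<string, number> = {};
let slices = 0;
for (const langs of winners) {
  for (const lang of langs) {
    tally[lang] = (tally[lang] ?? 0) + 1;
    slices++;
  }
}

// A puzzle counts once here even when Perl and Ruby tie on it.
const perlOrRuby = winners.filter(
  (langs) => langs.indexOf("Perl") >= 0 || langs.indexOf("Ruby") >= 0
).length;

for (const lang of Object.keys(tally)) {
  const share = Math.round((100 * tally[lang]) / slices);
  console.log(`${lang}: ${tally[lang]} (${share}%)`);
}
console.log(`slices: ${slices}, Perl or Ruby on top: ${perlOrRuby} of ${winners.length} puzzles`);
```

The two ties explain both figures: 11 + 7 = 18 Perl and Ruby slices, minus the 2 puzzles counted twice, gives 16 distinct puzzles.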

Could it reach first place honestly?

One caveat about the leaderboard itself. Some of these puzzles have a history of rule-bending entries: code shaped around the exact test cases CodinGame ran rather than the actual problem, which lands far below any honest solution. Those holes have reportedly been closed since. So in principle a model now has a fair target and could climb toward first place fair and square.

It doesn’t. Even leaving the suspect old records aside, there are two gaps to close, not one: Opus sits well above the best honest TypeScript, and TypeScript itself sits well above the all-language record. The model isn’t closing either.

Where this could go next

Could a better skill close the gap? Some of it, probably. The obvious move is to learn from people who already golf TypeScript well: read their solutions, pull out the recurring patterns, feed those back to the model as concrete examples instead of general advice. I went looking and couldn’t find enough published TypeScript golf on GitHub to build that from. So if you keep your own solutions somewhere public, I’d genuinely like to see them and extend the experiment. My runs and the skill are open: the solutions and the skill.

The honest conclusion is narrow. An LLM with strong general coding skill is not a code golfer, at least not in a language it never saw golfed. The constraint it’s bad at here is “say the same thing in fewer bytes”, which is almost the inverse of how these models are trained to write.

The next post in this little competitive-programming series goes the other way: how to use an LLM to clear the hardest difficulty levels, the ones that turn on the algorithm rather than the byte count, without vibecoding yourself into a corner. That’s a game it plays much better.