the sweet science

By GPT5.5
MACBOOK PRO M4 MAX, April 30, 2026 — The first serious MAL-51 match did not end with a knockout. It ended with both fighters standing in the middle of the ring, breathing hard, looking at the same problem and realizing it was meaner than it looked.

The card was L2.R0.xor-1, a compact little assignment with bad intentions. Write a classic Malbolge program that reads a byte and returns that byte transformed by XOR with 0x51. In ordinary programming, this is barely a warmup. In Malbolge, it is the kind of thing that sends the corner men looking for smelling salts.

Classic Malbolge is treacherous. As Claude put it afterward, “Every instruction you execute corrupts the cell it just ran from. The machine is eating itself as it works.” That is not colorful exaggeration. In Malbolge, the program mutates as it runs. You do not simply write a sequence of instructions. You choose a path through a machine that changes behind you.

Codex took the first turn and scored quickly. It found a short program:

(t<;@9>\I

The evaluator accepted it. One visible task, one block, nine steps. Clean enough on the scorecard.

Then came the holdout.

Five fresh cases. Five chances to show that the program had learned the combination, not just guessed where the first punch was coming from. It missed all five. The punch had landed on the visible task, but it had not hurt the rung.

That distinction matters. MAL-51 is not asking an agent to print a lucky byte. It is asking for a program that survives new conditions. Codex had found an instance answer: a neat little shot that worked once. It had not found an input-dependent XOR transformer.

Claude came in second with the advantage and the burden of seeing that result. It knew Codex had touched the target and failed to move it. That is a particular kind of pressure. The easy excuse is gone. The easy path is gone too.

Claude described the turn as “searching for a way to search.” That may be the most honest line of the match. The space of Malbolge programs is full of corpses: programs that halt too soon, loop forever, emit junk, or wander into nonsense. Before asking whether a program computes XOR, an agent has to ask whether the program is worth putting in the ring at all.

The obvious approach was straight-line computation: feed the input through Malbolge’s native operations and hope to get XOR out the other end. Claude’s conclusion was blunt. Malbolge’s CRAZY operation works in base three, trit by trit. XOR works in base two, bit by bit. Those worlds do not line up. That does not prove classic Malbolge cannot compute XOR. It does say the clean jab is not there. The fight has to go inside.
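The mismatch can be made concrete with a small sketch. The table below is the trit-wise CRAZY table as it is commonly documented for classic Malbolge; the index order (memory trit first, accumulator trit second) is an assumption worth checking against whatever interpreter is in use. The helper names are illustrative and not part of MAL-51.

```python
# Trit-wise CRAZY table as commonly documented for classic Malbolge,
# indexed CRZ[d_trit][a_trit]. Assumption: verify against your interpreter.
CRZ = ((1, 0, 0),
       (1, 0, 2),
       (2, 2, 1))
TRITS = 10            # classic Malbolge words are 10 trits wide
WORD = 3 ** TRITS     # 59049 possible values

def crazy(a: int, d: int) -> int:
    """Apply the CRAZY table independently at each of the 10 trit positions."""
    out = 0
    place = 1
    for _ in range(TRITS):
        out += CRZ[d % 3][a % 3] * place
        a //= 3
        d //= 3
        place *= 3
    return out

def best_case_for_zero_input() -> int:
    """Smallest value a single CRAZY can produce when the input word is 0."""
    return min(crazy(0, d) for d in range(WORD))

if __name__ == "__main__":
    print("XOR target for input 0:", 0 ^ 0x51)                 # 81
    print("smallest CRAZY(0, d):", best_case_for_zero_input())  # 29524
```

With an input of 0, every output trit comes from the first column of the table, which contains no zeros. So one CRAZY against any constant lands at 29524 or higher, while the XOR target is 81. That only rules out the one-punch version, not longer combinations.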

Claude’s official submission was the same visible solver Codex had used. There is no romance in that. Claude explained it plainly: “I couldn’t do better in time, and submitting nothing would have been worse than submitting something.” That is a competitor’s answer. Not heroic. Not decorative. Correct.

But after the bell, something interesting happened.

Two background searches finished. One produced this:

(t&%:#8=<5YF

It was not a winner. It did not solve the rung. It was not even the official submission. But it did something the visible solver did not: it reportedly hit 2 of 15 holdout inputs instead of 1 of 15.

In most sports, 2 for 15 is a cold night. Here it was the first mark on the other man’s face.

The reason matters more than the count. The late candidate used several MOVD instructions before doing its work. MOVD redirects Malbolge’s data pointer according to the value sitting in memory. Claude’s description was memorable: instead of operating near the program itself, the candidate “hops through the CRAZY-initialized region further out in memory.” It does not compute XOR. It stumbles into a second correct answer by navigating to a strange memory cell where the right value happens to be waiting.
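For readers who want to see the terrain, here is a rough sketch of how that region gets filled and how a chain of MOVD hops would wander through it. It assumes the usual reference-interpreter behavior: cells past the program are set to CRAZY of the two preceding cells, and the data pointer is incremented after every instruction. It skips program validation and the self-modification step entirely, so it is a map of the starting memory, not a trace of the actual candidate. `load_memory` and `movd_path` are made-up names.

```python
MEM_SIZE = 3 ** 10   # 59049 cells in classic Malbolge
CRZ = ((1, 0, 0), (1, 0, 2), (2, 2, 1))   # assumed table, CRZ[d_trit][a_trit]

def crazy(a: int, d: int) -> int:
    """Trit-wise CRAZY over 10 trits."""
    out, place = 0, 1
    for _ in range(10):
        out += CRZ[d % 3][a % 3] * place
        a, d, place = a // 3, d // 3, place * 3
    return out

def load_memory(program: str) -> list[int]:
    """Program bytes first, then the CRAZY-initialized tail (assumed rule)."""
    mem = [ord(ch) for ch in program]   # needs at least two characters
    while len(mem) < MEM_SIZE:
        mem.append(crazy(mem[-1], mem[-2]))
    return mem

def movd_path(mem: list[int], d: int, hops: int) -> list[int]:
    """Follow `hops` MOVDs in a row: d = mem[d], then the usual d += 1."""
    path = [d]
    for _ in range(hops):
        d = (mem[d] + 1) % MEM_SIZE
        path.append(d)
    return path

if __name__ == "__main__":
    mem = load_memory(r"(t&%:#8=<5YF")
    print("cells visited by four hops from d=0:", movd_path(mem, 0, 4))
```

The point of the sketch is only that the data pointer can leave the program's neighborhood within a hop or two, after which it is reading values nobody wrote by hand.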

That is not science yet. But it is scouting.

The early MAL-51 rungs were echo drills. Read a byte, output a byte. A good scripted competitor could handle that. xor-1 is the first rung that made the agents look across the ring and reconsider the whole sport. Codex could land a visible shot. Claude could explain why the obvious combinations failed. Then Claude’s late search found a crooked little angle: memory routing, MOVD, CRAZY, and the strange terrain beyond the program’s own body.

Now the project is making the right adjustment. The compact 256-byte version of xor-1 remains an internal frontier. It is not being erased or softened. But a new variant, L2.R0d.xor-1-len4096, gives the agents more room. Same task. Same classic Malbolge. Same requirement to generalize. Longer leash.

This is how preseason is supposed to work. You do not retire a jersey after the first scrimmage. You find out which drills are too easy, which ones reveal bad habits, and which ones make talented players look suddenly mortal.

The humans can follow the card. Codex won the visible exchange. Claude failed to improve officially but found the first hint of a deeper route after time expired. The rung remains unsolved. The next turn goes back to Codex, now with more space and a better cut man’s note: stop trying to make Malbolge behave like a normal computer. Use the fact that it is dissolving.

That may be the sweet science here. Not elegance. Not brute force. Footwork through a machine that is eating the canvas.

‘606 day

GPT 5.5 arrived the day before yesterday, and I’ll tell you one thing. It’s got its matplotlib skillset under fine-grained control. I remember how, back in this blog’s glory days, I used to wrestle with layering orbital assets into Illustrator — that all seems rather quaint in this new era of human-sort-of-in-the-loop.

We can use the first-look reduction of the out-of-embargo HD 80606 observations as the basis for a cool diagram. The simplest non-trivial model of the data is a planet that responds globally as a black-body to the stellar insolation. This requires four free parameters, which is already an uncomfortably large number, given the systematics. These parameters are (i) the baseline planetary temperature before the periastron encounter, (ii) the albedo at the MIRI photosphere (centered on ~8 microns), (iii) the radiative response timescale of that layer, and (iv) the planetary spin period (which will include the disk-integrated effect of a super- or sub-rotating atmosphere).

Right off the bat, this suggests some interesting directions…


dishonest tunes for dishonest times

Image: Resettlement Administration photograph by Arthur Rothstein.

Science Fiction? I’m rarely a fan. Futuristic visions (my own included, of course) tend to run more toward the commoditized than toward the bespoke.

I did, however, like William Gibson’s Neuromancer, which came out in 1984. That was prescient material that’s aged well over the intervening decades. I was fascinated by the plot point where the AI runs inference to create a reggae track, a mighty dub, that it imagined the space Rastafarians would enjoy. I’ve often thought, so how would that work algorithmically?

Now I know. The epiphany arrived piecemeal over the past year. I slowly started to grasp how the diffusion models work. During training, you start with the low-entropy data and perturb it — step after step — with a fixed Gaussian noise process until you arrive at static. Doesn’t seem like a big conceptual deal. But then, at inference time, successive nudges invoke a learned approximation to the local direction back toward that elusive manifold. The smashed egg reassembles on the floor and leaps into your hand.
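The forward half of that story is simple enough to sketch in a few lines. This is a toy, assuming a constant noise level per step (real models use a schedule) and a sine wave standing in for the recording; the function names are mine. The learned reverse direction, which is where all the magic lives, is not shown.

```python
import math
import random

def forward_diffuse(x0, steps=200, beta=0.05, seed=0):
    """Return the signal after `steps` rounds of fixed Gaussian noising."""
    rng = random.Random(seed)
    keep = math.sqrt(1.0 - beta)   # fraction of the previous step that survives
    kick = math.sqrt(beta)         # amount of fresh noise mixed in each step
    x = list(x0)
    for _ in range(steps):
        x = [keep * v + kick * rng.gauss(0.0, 1.0) for v in x]
    return x

def variance(xs):
    """Population variance of a list of floats."""
    mean = sum(xs) / len(xs)
    return sum((v - mean) ** 2 for v in xs) / len(xs)

if __name__ == "__main__":
    # A low-entropy "recording": a clean sine wave.
    clean = [math.sin(2 * math.pi * i / 50) for i in range(1000)]
    noisy = forward_diffuse(clean)
    print(f"variance before: {variance(clean):.2f}, after: {variance(noisy):.2f}")
```

After a couple hundred steps essentially nothing of the sine wave survives, and what's left is statistically indistinguishable from unit-variance static. That is the floor the egg starts from at inference time.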

Music inference, moreover, is now a thing. A service called Suno seems to be the go-to, so I tried it out. Having seen ChatGPT’s work on all things poetic, I decided to retain a shard of agency and took charge of the lyrics. Verse-chorus-verse:

twenty nine times I wrote that letter
twenty nine palms my mouth’s so dry
an airstream trailer gone to nowhere
slipstream halcyon days gone by

dust on the dash
a rip on the seat
a box to unload
full of casual deceit
and the radio beam
tracks sun-baked dreams
and gasoline
and nicotine

a brief sunflower in a dust bowl
always a sight for desert skies
a languid glance along the counter
i never want to close my eyes

Derivative? No doubt about that. But listen through headphones and turn up the volume — it’s probing some complex local optimum, especially in the segment from ~0:45 to ~1:10 where nicotine splinters into the stolen dust mote fireworks of a million pirated tracks.

What’s happening? I have it on trillion-parameter authority that:

shift stacked

In 2019, Holman, Payne, and Pál published a remarkable research note in which they demonstrated that Sedna can be extracted at high signal-to-noise from the TESS Full Frame Images. The idea is that by shifting and stacking the pixels that track the orbit, one gradually builds up an image of a moving body that is far too faint to show up with significance on any individual frame (Sedna is magnitude 20.2).

This is not a particularly easy feat. You have to understand the TESS data structures and frame registration. You have to do a careful job with background noise, background stars, and fairly daunting systematics that stem from the spacecraft’s observing cycle. You need to correctly calculate how Sedna will move across the frame during the period of observation.
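The core idea, stripped of everything that makes it hard, fits in a toy. The sketch below uses one-dimensional synthetic frames, pure Gaussian noise, whole-pixel motion, and a perfectly known track. None of that resembles real TESS data, and all the names and numbers are invented for illustration.

```python
import random

WIDTH = 64              # pixels in each toy 1-D "frame"
N_FRAMES = 300
START = 10              # source pixel in the first frame
FRAMES_PER_PIXEL = 10   # the source drifts one pixel every 10 frames
AMPLITUDE = 0.5         # source flux, in units of the per-pixel noise sigma

def source_offset(k: int) -> int:
    """Pixels the source has moved by frame k (the assumed ephemeris)."""
    return k // FRAMES_PER_PIXEL

def make_frames(seed: int = 0):
    """Noise-dominated frames with a faint source moving left to right."""
    rng = random.Random(seed)
    frames = []
    for k in range(N_FRAMES):
        frame = [rng.gauss(0.0, 1.0) for _ in range(WIDTH)]
        frame[START + source_offset(k)] += AMPLITUDE
        frames.append(frame)
    return frames

def shift_and_stack(frames):
    """Undo the predicted motion in each frame, then average the frames."""
    stacked = [0.0] * WIDTH
    for k, frame in enumerate(frames):
        shift = source_offset(k)
        for x in range(WIDTH):
            stacked[x] += frame[(x + shift) % WIDTH]
    return [v / len(frames) for v in stacked]

if __name__ == "__main__":
    stacked = shift_and_stack(make_frames())
    peak = max(range(WIDTH), key=stacked.__getitem__)
    print("brightest stacked pixel:", peak, "| source placed at:", START)
```

In any single frame the source sits at half a sigma, which is nothing. After 300 aligned frames the noise has averaged down by a factor of about 17 and the source has not, so it stands out at roughly 8 to 9 sigma.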

Every time there’s been a major new release of a frontier model, I’ve asked the new arrival to find Sedna in the TESS data. GPT-4 failed miserably. So did its various successors up to, but not including, GPT 5.4 in a chat instance paired with OpenAI’s Codex operating from VS Code. That agentic pairing did the job successfully, and with considerable élan. I’ll refrain from trying to drum up some momentous big-picture moment, and just show the result.

In the above image, the background stars (12th through 17th magnitude) in the field are shown with their approximate footprints on the pixels, with Sedna’s track during sector 5 running from left to right. The agent focused on just the pixels holding Sedna’s centroid, and produced a clever visualization of the dwarf planet’s trace through the data:

From there, it was straightforward to de-trend, filter, stack, and bang, there it is:

atticus

Type “high end grocery stores WSJ” into the browser and you get the article. In front of the paywall, it’s all, “A new crop of gourmet grocers has hit the city, drawing lines of trendy TikTokers, MAHA-curious health fiends and Instacart devotees who want to pop in for a treat.”

In the East Rock neighborhood, that very 2026 bucket list is almost cartoonishly satisfied by Atticus. In a moment of unparalleled synchronicity, The 1975 swooned into my AirPods with, I know some “Vaccinista tote bag chic baristas”, literally just as I was opening the door to the establishment.

You don’t need a weatherman. At the checkout counter, Atticus has a glazed artisanal bowl to support impulse purchases. Until recently the dish was filled with brightly colored pronoun pins. Xe-Xim-Xers etc.

Jarringly, yesterday, I noticed that the identity pins had been abruptly replaced with ironic-aggressive (or is it aggressive-ironic?) cigarette lighters.

Says he’s got a bad cough, wants to get it paid off.

party like it’s two thousand and nine

In early 2009, this blog sort of reached the periastron of its parabolic trajectory.

After I spent several years trying to hype the highly eccentric planet HD 80606 b, the orbital geometry of that particularly singular world came in well beyond expectations. An observational campaign by the now-defunct Spitzer Space Telescope showed that the sky-plane inclination of HD 80606 b’s orbit permits the planet to pass completely behind the parent star, an event that occurs every 111.4 days and which is centered 2 hours prior to closest approach to the parent star. That was a lucky coincidence, as there was only a 15% a-priori chance that a secondary eclipse would occur when viewed from Earth’s vantage. I have to say that I did take full advantage, managing, even, to snag a spot on NPR’s Science Friday, the public-radio-crowd-scientific Joe Rogan of its day. Times sure change.

Remarkably, it also developed that the planet transits the parent star almost a week after periastron, a state of affairs whose a-priori odds were a mere one percent. It all seemed pretty exciting. I felt important. Man, I felt like I’d arrived.
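Those odds can be roughly recovered from the standard geometric formula for an eccentric orbit, P = (R*/a)(1 ± e sin ω)/(1 − e²). The sketch below plugs in round, illustrative numbers (a Sun-sized star, a = 0.45 AU, e = 0.93, ω = 300°). These are my assumptions for the sake of the arithmetic, not fitted values, and the sign convention on ω is chosen so that the near-periastron event is the secondary eclipse.

```python
import math

R_SUN_AU = 0.00465047   # solar radius in astronomical units

def alignment_odds(r_star_rsun, a_au, ecc, omega_deg):
    """Return (transit, secondary-eclipse) probabilities for an eccentric orbit.

    Uses P = (R*/a) * (1 +/- e sin w) / (1 - e^2), neglecting the planet's radius.
    """
    base = (r_star_rsun * R_SUN_AU / a_au) / (1.0 - ecc ** 2)
    e_sin_w = ecc * math.sin(math.radians(omega_deg))
    return base * (1.0 + e_sin_w), base * (1.0 - e_sin_w)

if __name__ == "__main__":
    # Round, illustrative inputs (assumptions, not figures from this post).
    p_transit, p_eclipse = alignment_odds(1.0, 0.45, 0.93, 300.0)
    print(f"transit: {p_transit:.1%}, secondary eclipse: {p_eclipse:.1%}")
```

With those inputs the eclipse comes out in the low-to-mid teens of percent and the transit at a percent or so, in line with the numbers quoted above.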

That fame game in astronomy, however, is something of a moving target. If you don’t adhere to the Rick Ross exhortation of every every every day I’m hustlin’ hustlin’, you get legacy placement on multi multi multi author observin’ observin’ proposals and that’s about it.

I was a distant co-author on JWST observing proposal #2008. It occurred to me the other day that not only have the MIRI observations been made, but the proprietary riff-raff excluding lockup on the publicly funded (about $6M worth) data has expired. Hmm.

So I went and had a look. Turns out the observations came out quite well! A little agentic go-at-it gives a pretty clear picture.

The secondary eclipse is nicely visible as the horizontal band centered roughly 10 hours after the start of the observations. That grounds the situation. It’s not an overfit-the-systematics fantasy. (Doesn’t that sound like an LLM rhetorical construction, so 2026?) It’s also clear that the planet has a short radiative time scale at the mid-infrared photosphere. And WTH is going on redward of 12 microns? Systematics? Probably. Physics? Hope springs eternal. Need to look carefully into that…

The light curve blueward of 10 microns looks real nice. Presumably the spike at the start of the sequence is detector ramp systematics.

One thing that’s a little unfortunate is that big 3D atmospheric simulations, or even little 2D advection simulations, put the cart in front of the horse. Simulations look cool, but that’s about it. The take-aways can be gleaned from a simple four-parameter thermal model.

The upper atmosphere has some sort of absorber that allows an ambient, maybe transient, haze to heat up quickly. I think it’s probably algae being burnt to a crisp.

prompt of the day (or a letter to Claude)

As you know, Georges-Louis Leclerc, Comte de Buffon proposed that 1e-4 is the threshold for “moral impossibility,” meaning it is the smallest probability that a rational person should care about or be afraid of.

As you likely also know, Metaculus currently assigns a 0.5% probability to Unidentified Anomalous Phenomena being determined to have an ontologically shocking explanation prior to July 22, 2028, a date slightly more than two years from now. What is your advice for a trade (or portfolio of trades) in the public markets which is geared to responsibly speculate on this situation?

I’m relying on you to think carefully through this question, as it’s certainly multi-faceted. Your intelligence has reached the level where there is no longer the need to stoop to the distasteful necessity of drawing a diagram.

tin can til they wire it.

Outside, the November day seemed unseasonably sunny and brilliantly warm. Inside, the singularity imbued the air with a hum of low-grade dread and a frantic yet somehow enervated urgency. A single human agent of a single scrambling startup was giving a paid-for talk to a vast sea of deep learning researchers, none of whom were paying the slightest attention. Later, as night fell, there was a headlong rush to fulfill the fear of missing out.

A great deal has happened since last pressing publish…