analog transformers

Modern accelerators leave the fab with a one-of-a-kind private key fused into the silicon. Have a trusted landmark server (call it a rack-mounted HSM bolted to the floor in Hsinchu) send the chip a fresh nonce, ask the chip to sign it (“attestation”), and time the round trip.

Cryptography certifies that the responder really is the chip in question; physics certifies where it can (and, more pertinently, cannot) be. Light in hollow-core fiber covers roughly 300 km per millisecond, so a 1 ms round trip caps the one-way distance to the device at 150 km, and any time the chip spends signing only shrinks that bound. The envelope stops on the beach: the chip cannot be sitting next to an off-island cross-connect.
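The arithmetic behind that bound fits in a few lines. This is a minimal sketch, not anyone's production protocol: the function name `max_one_way_km` and the `processing_ms` parameter are made up for illustration, and the 300 km/ms figure is the best-case speed quoted above.

```python
# Assumed best-case propagation speed (km per millisecond), as quoted in the text.
KM_PER_MS = 300.0

def max_one_way_km(rtt_ms: float, processing_ms: float = 0.0) -> float:
    """Upper bound on the device's distance from the landmark server.

    rtt_ms: measured round-trip time of the signed nonce.
    processing_ms: hypothetical time the chip spends signing, if known.
    """
    if processing_ms < 0 or rtt_ms < processing_ms:
        raise ValueError("round-trip time must cover the processing time")
    # Half of the in-flight time is spent travelling in each direction.
    return (rtt_ms - processing_ms) * KM_PER_MS / 2.0

print(max_one_way_km(1.0))       # 150.0
print(max_one_way_km(1.0, 0.5))  # 75.0
```

Real paths are slower and less direct than a straight run of ideal fiber, so the true distance can only be smaller than what this returns.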

And so on. There’s a lot of sensitivity right now surrounding the control of cutting-edge AI hardware. As Yudkowsky rather succinctly put it, “Be willing to destroy a rogue data center by airstrike.” Ding dong nuke ’em.

That sort of thing spurs one to start thinking about alternative, and in particular analog, approaches. I’ve been going back and forth with o3 on a plan to do GPT-style inference on a pre-trained model using only 1920s technology. This proof-of-concept setup cannot be remotely bricked, even if it sits somewhere not favored by the current powers-that-be. (Note that sampling is restricted to T=0 in this implementation.) Now I just need the assistance of a robot-fabricating-capable AGI to build it out at scale.
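For readers who have not met the T=0 restriction before: it means the next token is always the one with the largest logit, so the output stage needs a comparator rather than a source of randomness. A rough sketch in Python, with made-up logits and hypothetical helper names, comparing it to ordinary temperature sampling:

```python
import math

def softmax(logits, temperature):
    """Token probabilities at a given temperature (must be > 0)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_token(logits):
    """The T=0 limit: pick the index of the largest logit, no randomness."""
    return max(range(len(logits)), key=lambda i: logits[i])

logits = [1.0, 3.5, 0.2, 3.4]  # illustrative values only

print(greedy_token(logits))    # index of the winning token
print(softmax(logits, 1.0))    # spread-out distribution
print(softmax(logits, 0.01))   # nearly all mass on the winner
```

As the temperature drops toward zero, the softmax distribution collapses onto the same index that `greedy_token` returns.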

Illustrating step 4, we have:

Step 4 – Passive linear maps.
Three collimated beams enter from the left, emerging from the lamp-and-slit assembly a meter away that follows the film-loop encoder. Each beam carries one of the 16 optical channels modulated by the token-plus-position embedding. The brass posts hold a dense 16 × 16 Reck lattice of cube beam-splitters, mirrors, phase plates, and neutral-density slides; the front mesh realises W-Q, the centre W-K, and the rear W-V. As the beams zig-zag through the grid, made visible by stray tobacco smoke, they are successively mixed and attenuated, imprinting the fixed learned weights that will become queries, keys, and values. The processed beams leave the table at the far edge, heading for the selenium-diode dot-product stage.
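Numerically, what the three meshes do is apply three fixed 16 × 16 matrices to the same 16-channel input. The sketch below shows that digital equivalent in plain Python. The weights are random placeholders standing in for the pre-trained values, and the helper names (`random_matrix`, `apply_map`) are invented for illustration.

```python
import random

CHANNELS = 16  # number of optical channels, per the description above

def random_matrix(n, rng):
    # Placeholder weights; a real build would load the pre-trained values.
    return [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]

def apply_map(matrix, vec):
    """Matrix-vector product: each output channel is a fixed weighted mix of the inputs."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

rng = random.Random(0)
W_Q = random_matrix(CHANNELS, rng)  # front mesh
W_K = random_matrix(CHANNELS, rng)  # centre mesh
W_V = random_matrix(CHANNELS, rng)  # rear mesh

# Stand-in for the token-plus-position embedding carried by the beams.
x = [rng.uniform(0.0, 1.0) for _ in range(CHANNELS)]

q = apply_map(W_Q, x)
k = apply_map(W_K, x)
v = apply_map(W_V, x)
print(len(q), len(k), len(v))
```

Because the maps are linear and the weights never change, nothing here needs to be switched or clocked, which is what lets a passive arrangement of splitters and attenuators stand in for the multiply-accumulate.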