Malbolge (a second look)

A page from the mysterious Voynich Manuscript at Yale’s Beinecke Rare Book and Manuscript Library

Oklo dot org certainly wouldn’t be considered a heavily trafficked website, but given that it’s been online for more than sixteen years, it does attract a steady trickle of visitors. Examining the Google Analytics, one notices curious ebbs and flows of activity, and one item stands out: this 2013 post on the esoteric programming language Malbolge attracts of order ten visits per day with remarkable consistency. It’s not fully clear why.

Quoting from the 2013 post, which in turn drew from the Wikipedia article of that era,

Malbolge is a public domain esoteric programming language invented by Ben Olmstead in 1998, named after the eighth circle of hell in Dante’s Inferno, the Malebolge.

The peculiarity of Malbolge is that it was specifically designed to be impossible to write useful programs in. However, weaknesses in this design have been found that make it possible (though still very difficult) to write Malbolge programs in an organized fashion.

Malbolge was so difficult to understand when it arrived that it took two years for the first Malbolge program to appear. The first Malbolge program was not written by a human being; it was generated by a beam search algorithm designed by Andrew Cooke and implemented in Lisp.

The “Hello World” source can be represented (see here for details):

(=<`#9]~6ZY327Uv4-QsqpMn&+Ij"'E%e{Ab~w=_:]Kw%o44Uqp0/Q?xNvL:`H%c#DD2^WV>gY;dts76qKJImZkj

Due to its finite number (3^10) of memory locations, each of which holds a ten-‘trit’ ternary number, the classical specification of Malbolge is not Turing complete. However, a version known as Malbolge Unshackled, which is now understood to be Turing complete, was released in 2007.
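
The arithmetic behind that memory limit is easy to check. The short Python sketch below is an illustration only: the helper name to_trits is hypothetical and not part of any Malbolge tooling. It counts the addressable locations and writes a value out as ten trits.

```python
# Illustrative sketch: the size of classical Malbolge's memory and the
# ten-trit form of a stored value. `to_trits` is a hypothetical helper.

TRITS_PER_WORD = 10
MEMORY_SIZE = 3 ** TRITS_PER_WORD   # 59049 locations, holding values 0..59048


def to_trits(value, width=TRITS_PER_WORD):
    """Return `value` as a list of base-3 digits, most significant first."""
    if not 0 <= value < 3 ** width:
        raise ValueError("value does not fit in the given number of trits")
    digits = []
    for _ in range(width):
        value, digit = divmod(value, 3)   # peel off the lowest trit
        digits.append(digit)
    return digits[::-1]


print(MEMORY_SIZE)                 # 59049
print(to_trits(MEMORY_SIZE - 1))   # ten 2s: the largest storable value
print(to_trits(5))                 # [0, 0, 0, 0, 0, 0, 0, 0, 1, 2]
```

A machine whose entire state fits in 59049 such words has only finitely many configurations, which is why the classical language falls short of Turing completeness.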

Indeed, in the interval following the 2013 post, it develops that there has been significant progress on Malbolge. Key advances were made by Lou Scheffer, who elucidates the critical realization on his website:

The correct way to think about Malbolge, I’m convinced, is as a cryptographer and not a programmer. Think of it as a complex code and/or algorithm that transforms input to output. Then study it to see if you can take advantage of its weaknesses to forge a message that produced the output you want.

And with that, a strange world just over the horizon begins to congeal in the mind’s eye. A Malbolge program, viewed in this manner, is not unlike an inefficient, inherently compromised cousin to the SHA-256 hash. One imagines bizarre blockchains. Esoteric cryptocurrencies. NFTs.

Exploiting weaknesses in the language, Scheffer demonstrated the existence of a program that copies its input to its output, effectively performing the Unix echo command. The source (uu-encoded) looks like this:

begin 666 copy.mb
M1"="04 _/CT\.SHY.#<V-30S,C$P+RXM+"LJ*2@G)B4D(R(A?GU\>WIY>'=V
M=71S<G%P;VYM;&MJ:6AG9F5D8V)A8%]>75Q;6EE85U955%-245!/3DU,2TI)
M2$=&141#0D% /SX]/#LZ.3@W-C4T,S(Q,"\N+2PK*BDH)R8E)",B(7Y]?'MZ
M>7AW=G5T<W)Q<&]N;6QK:FEH9V9E9&-B86!?7EU<6UI96%=655134E%03TY-
M3$M*24A'1D5$0R9?O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]
MO;V]Y+V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]
MO;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]
MO;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]
MO;V]O>2]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]
DO;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;T*
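
For readers who want to turn a listing like this back into raw bytes, here is a minimal Python sketch built on the standard library's binascii module. The function names and the round-trip payload are assumptions made for illustration; the sketch skips the begin and end lines and decodes each body line independently.

```python
import binascii


def uudecode_lines(lines):
    """Decode the body lines of a uuencoded listing into bytes."""
    out = bytearray()
    for line in lines:
        line = line.rstrip("\n")
        # Skip the header, the trailer, and any blank lines.
        if not line or line.startswith("begin ") or line == "end":
            continue
        out += binascii.a2b_uu(line.encode("ascii"))
    return bytes(out)


def uuencode_lines(data, name="payload.bin", mode="666"):
    """Encode bytes as uuencoded lines (hypothetical helper for the demo)."""
    lines = [f"begin {mode} {name}"]
    for start in range(0, len(data), 45):   # 45 bytes per full line
        chunk = binascii.b2a_uu(data[start:start + 45])
        lines.append(chunk.decode("ascii").rstrip("\n"))
    lines.append("end")
    return lines


# Round trip on a made-up payload rather than on the listing itself.
payload = bytes(range(256))
assert uudecode_lines(uuencode_lines(payload)) == payload
```

The first character of each body line encodes that line's byte count, so the leading M on the full lines stands for 45 bytes and the leading D on the last line for 36.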

Over the past two years an amazing additional development has taken place. At her GitHub site, Kamila Szewczyk has published a LISP interpreter written in Malbolge Unshackled. The interpreter takes a LISP program, executes it, and displays the result. The abstract of her accompanying paper reads:

MalbolgeLISP is the most complex Malbolge Unshackled program to date (2020, 2021). Unlike other Malbolge programs generated by different toolchains (for instance, LAL, HAL or the “pseudo-instruction” language developed by the Nagoya university), MalbolgeLISP can be used to express complex computations (like folds, monads, efficient scans, iteration and point-free programming), while being able to run within reasonable time and resource constrains on mid-end personal computers. The project aims to research not the cryptanalysis aspect of Malbolge, but its suitability for complex appliances, which could be useful for cryptography and intellectual property protection, and it would certainly raise the bar for future Malbolge programs while exploring functional and array programming possibilities using inherently imperative and polymorphism-oriented Malbolge code.

Time to get to work on the Malbola white paper and issue a coin.

Stacks

1930s Police Line Up — Los Angeles

For a change of pace in one’s academic reading, I recommend the late University of Chicago Professor Raven I. McDavid Jr.’s 1981 memoir of his colleague David Maurer. Both gentlemen were deeply invested connoisseurs and leading authorities on vernacular English (that is, slang). The opening lines of McDavid’s memoir in American Speech, vol. 57, No. 4 (Winter, 1982) invite a click on the JSTOR link to the full text.

Maurer’s books are all very much worth reading, but they reach an apex with The Big Con — The Story of the Confidence Man, which was published in 1940, and recounts, in straight-narrative detail, the elaborate confidence games that flourished throughout America during the decades bracketing the First World War. Maurer expertly works the lexicon of swindling into a narrative that sparkles on the page. In the rundown on operation of the big store, we find passages such as:

“…And most important of all, he has official custody of the “B.R.” or boodle. This is the money which is used to play the mark in the store. For this purpose, a minimum of about $5,000 is necessary, but the more the better; in the really big stores the boodle may contain a large sum of cash, perhaps as much as $20,000. This money is made up in bundles presumably containing $500, $1,000, $5,000, etc., but really composed of one-dollar bills for filler and having $50, $100, or $1,000 bills on the top and bottom to make the stack look real. Each bundle is stacked carefully and bound with sealed labels like those used in banks for marking bundles of bills. A rubber band around each end holds the pack together. When a skillful manager makes up his boodle, he can make $10,000 in real cash look like several hundred thousand dollars. This money is used over and over again by the shills in placing bets and is paid out again to them when they win. The idea is to keep as much money circulating before the eyes of the mark as possible.”

The larcenous attraction of a stack is undeniable. T.I., Rubber Band Man, Lil Wayne, …gotta hand full of stacks…, are channeling the precise appeal that led the marks of a century ago to part ways with their money to the charms of expert insidemen.

Remarkably, the pleasing qualities of the stack have been recognized not just by extravagant rappers, but also within that buttoned-up and soberly scientific realm of periodograms of time series data.

The numerical ratio of the stable oxygen isotopes, 18O and 16O, provides a nonlinear proxy for global temperature. Broadly speaking, an increased fraction of 18O in a deposited layer corresponds to a cooler climate, with more of Earth’s water locked up in the form of ice. A time series for the ratio spanning the last tens or hundreds of thousands of years can be obtained from ice cores from Greenland or Antarctica, but if one is interested in longer intervals — out to millions of years — sediment cores from the deep oceans provide the best measure.

In a 2005 paper that has accumulated a stack of over seven thousand citations, Lisiecki and Raymo demonstrate the dramatic utility of stacked periodograms. They gathered finely-sampled depth-runs of delta-18O measurements from 57 drill sites spread out over the world’s oceans.

Depending on the site, the sequences in some cases extend back in time by more than five million years. When properly stretched and squeezed to account for variations in deposition rate with time and location, the resulting stack of delta-18O time-series looks like this:

The individual sedimentary records look awfully squiggly, but when the pile is combined and Fourier-analyzed, the overall effect recalls that $100 bill expertly rubber-banded onto a stack of singles. The periodogram of the stacked time series shows a succession of clear-cut peaks.

The power at 23 kyr represents the climate forcing induced by the precession of Earth’s spin axis. The 41 kyr peak is caused by the excursions in Earth’s axial tilt, or obliquity (which varies between about 22.1 and 24.5 degrees), and the large peak with 100 kyr periodicity arises from variations in Earth’s orbital eccentricity: the influence of Venus, Jupiter and Saturn accumulating in slow rains of foraminifera through the depths.
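
The stacking effect can be reproduced with a toy model. The Python sketch below is not the Lisiecki and Raymo procedure: it assumes 57 synthetic records already aligned on a common 1 kyr grid, each carrying the same three sinusoids (23, 41 and 100 kyr, with made-up amplitudes) buried in independent Gaussian noise. It averages the records and then evaluates a simple periodogram at integer trial periods.

```python
import math
import random

# (period in kyr, amplitude): toy values chosen for illustration only.
SIGNALS = [(23.0, 1.0), (41.0, 1.0), (100.0, 1.5)]


def synthetic_record(times, rng, noise_sigma=3.0):
    """One toy record: the shared periodic signal plus independent noise."""
    return [
        sum(a * math.sin(2 * math.pi * t / p) for p, a in SIGNALS)
        + rng.gauss(0.0, noise_sigma)
        for t in times
    ]


def stack(records):
    """Average records sample by sample (assumes a common time grid)."""
    return [sum(column) / len(column) for column in zip(*records)]


def periodogram(times, values, periods):
    """Power at each trial period from projections onto sine and cosine."""
    mean = sum(values) / len(values)
    powers = []
    for period in periods:
        omega = 2 * math.pi / period
        c = sum((v - mean) * math.cos(omega * t) for t, v in zip(times, values))
        s = sum((v - mean) * math.sin(omega * t) for t, v in zip(times, values))
        powers.append((c * c + s * s) / len(values))
    return powers


rng = random.Random(0)               # fixed seed so the run is repeatable
times = list(range(1000))            # 1000 kyr sampled every 1 kyr
records = [synthetic_record(times, rng) for _ in range(57)]
stacked = stack(records)

periods = list(range(10, 151))       # trial periods in kyr
power = dict(zip(periods, periodogram(times, stacked, periods)))
strongest = max(power, key=power.get)
print(strongest)                     # should land at or next to 100 kyr
```

Averaging N records with independent noise shrinks the noise by roughly the square root of N, which is what lets the shared periodicities stand clear of the squiggles. The real analysis also has to stretch and squeeze each record onto a common timescale first, a step this sketch simply assumes has been done.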

Stack appeal is certainly at work in the now-famous peas-in-a-pod diagram published by the California Planet Search Team.

The Kepler multiple-transiting systems were all well-known for years before the CPS paper was published. Yet it took the simple expedient of a stack running nearly a full column down the journal page to open one’s eyes to an emphatic realization. When the orbital periods run from days to weeks, a given system prefers to manufacture a single characteristic type of planet, arrayed logarithmically evenly in a clutch of four or so. This is the single most important result that has emerged from three decades of planet detection.

Over the years, the transit detection technique has come to dominate the production of worlds for the planetary catalogs. While remarkably effective, the method does have drawbacks. It works only when geometric alignments are close to perfect, and it gives radii (or planet-star radius ratios) rather than masses.
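
Both drawbacks follow from simple geometry. As a hedged illustration, the Python sketch below uses the standard approximations that the transit depth is the square of the planet-star radius ratio (ignoring limb darkening) and that the chance of a suitable alignment for a circular orbit is roughly the stellar radius divided by the orbital distance. The Earth-Sun numbers are an example chosen here, not values from the text.

```python
# Approximate transit geometry for a small planet on a circular orbit.
# Function names are illustrative; constants are rounded reference values.

R_SUN_KM = 695_700.0
R_EARTH_KM = 6_371.0
AU_KM = 149_597_870.7


def transit_depth(planet_radius, star_radius):
    """Fractional dip in starlight: the ratio of the two disk areas."""
    return (planet_radius / star_radius) ** 2


def transit_probability(star_radius, orbital_distance):
    """Chance that a randomly oriented circular orbit shows transits."""
    return star_radius / orbital_distance


depth = transit_depth(R_EARTH_KM, R_SUN_KM)
prob = transit_probability(R_SUN_KM, AU_KM)
print(f"depth = {depth * 1e6:.0f} ppm, probability = {prob * 100:.2f}%")
```

Note that neither expression contains the planet's mass, which is why a transit survey alone yields sizes but not densities.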

For the cohort of planets in multiple-transiting systems that lie close to low-order mean-motion resonances, planetary masses can be estimated by fitting to the transit timing variations. Curiously, planets measured using this approach tend to have substantially lower densities than the subset of transiting planets whose masses (or rather M sin i’s) have been extracted directly using the classic Doppler wobble technique.

In general, for systems like the ones in the diagram above, one would require relatively massive planets and a cooperative low-activity host star to get an accurate set of M sin i’s from radial velocities alone. Spectacular examples do exist, of course, and one can find me enthusing about the various discoveries if one scrolls back through the stack of posts that has accumulated over the years, especially during the late aughts.
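
To see why massive planets are needed, it helps to compute the size of the Doppler wobble itself. The Python sketch below evaluates the standard expression for the stellar radial velocity semi-amplitude K. The function name and the Jupiter and Earth test cases are illustrative choices, and using M sin i in place of the true planet mass in the denominator is an approximation that matters little when the planet is much lighter than the star.

```python
import math

G = 6.674e-11          # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30       # kg
M_JUP = 1.898e27       # kg
M_EARTH = 5.972e24     # kg
DAY = 86_400.0         # seconds


def rv_semi_amplitude(m_sin_i, m_star, period_s, eccentricity=0.0):
    """Stellar reflex velocity semi-amplitude K in m/s."""
    return (
        (2 * math.pi * G / period_s) ** (1 / 3)
        * m_sin_i
        / (m_star + m_sin_i) ** (2 / 3)
        / math.sqrt(1 - eccentricity ** 2)
    )


k_jupiter = rv_semi_amplitude(M_JUP, M_SUN, 4332.59 * DAY)
k_earth = rv_semi_amplitude(M_EARTH, M_SUN, 365.25 * DAY)
print(f"Jupiter analog: {k_jupiter:.1f} m/s, Earth analog: {k_earth:.2f} m/s")
```

A Jupiter analog tugs its star by about a dozen meters per second, while an Earth analog manages only about nine centimeters per second, well below the velocity jitter of an active star.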

In a recently published paper, Yale graduate student Sam Cabot and I took inspiration from Lisiecki and Raymo’s runaway benthic delta-18O success and asked the following question: What if you clear the known planets from the radial velocity data that has been accumulated over the years and stack the resulting periodograms? Will the cumulative signature of all the peas-in-pods lurking in the data be visible?

Satisfyingly, the answer, to 1.6-sigma confidence, is yes.