(Slow) speedrun: Germaine de Staël

I’m starting to write up a review of Isaiah Berlin’s The Roots of Romanticism, and this quote fragment jumped out at me:

Suppose you went to Germany and spoke there to the people who had once been visited by Madame de Staël, who had interpreted the German soul to the French.

It’s a poetic turn of phrase, and I have just about enough mild curiosity to fancy doing a speedrun on her. Currently I know absolutely nothing. Maybe I’ll also expand it to the people she visited, if it turns out that she’s at the centre of some interesting intellectual circle.

I’m calling this one a slow speedrun because it’s too hot here and like most people in the UK I don’t have air conditioning, so I’m writing this with my feet in a tub of ice water as a poor substitute. It’ll still be an hour long, but I’ll take it easy and probably won’t get through as much as normal.

Right, let’s go. Slowly.


Right, start with wikipedia as ever.

Full name Anne Louise Germaine de Staël-Holstein, commonly known as Madame de Staël. 1766 – 1817.

She was a voice of moderation in the French Revolution and the Napoleonic era up to the French Restoration.

Her intellectual collaboration with Benjamin Constant between 1794 and 1810 made them one of the most celebrated intellectual couples of their time.

OK I’ve never even heard of him. Open in new tab.

She discovered sooner than others the tyrannical character and designs of Napoleon.[5] For many years she lived as an exile – firstly during the Reign of Terror and later due to personal persecution by Napoleon.

In exile she became the centre of the Coppet group with her unrivalled network of contacts across Europe.

Ah, brilliant, there’s an intellectual scene, that’s what I was hoping for. Open in new tab.

In 1814 one of her contemporaries observed that "there are three great powers struggling against Napoleon for the soul of Europe: England, Russia, and Madame de Staël".

Nice. Now I understand the allusion in that Berlin quote.

Known as a witty and brilliant conversationalist, and often dressed in daring outfits, she stimulated the political and intellectual life of her times. Her works, whether novels, travel literature or polemics, which emphasised individuality and passion, made a lasting mark on European thought. De Staël spread the notion of Romanticism widely by its repeated use

OK, now for some historical background on her childhood. Only child of a popular Parisian salon host and a prominent banker and statesman. They both have wikipedia pages too but I doubt I’d get to them.

Mme Necker wanted her daughter educated according to the principles of Jean-Jacques Rousseau and endow her with the intellectual education and Calvinist discipline instilled in her by her pastor father.

Haha, poor child. Sounds like she turned out quite well given the circumstances.

At the age of 13, she read Montesquieu, Shakespeare, Rousseau and Dante.[10] This exposure probably contributed to a nervous breakdown in adolescence, but the seeds of a literary vocation had been sown.

Her father got into trouble by releasing the national budget, which had always been kept secret. So he got dismissed and they moved to a chateau on Lake Geneva. Then back to Paris once the fuss died down.

Aged 11, Germaine had suggested to her mother she marry Edward Gibbon, a visitor to her salon, whom she found most attractive. Then, she reasoned, he would always be around for her.[12] In 1783, at seventeen, she was courted by William Pitt the Younger and by the fop Comte de Guibert, whose conversation, she thought, was the most far-ranging, spirited and fertile she had ever known.

It’s very tempting to get sidetracked and read the article on fops, but let’s not. After this her parents got impatient and married her off to some Swedish diplomat.

On the whole, the marriage seems to have been workable for both parties, although neither seems to have had much affection for the other.

Now we’re getting to her actual work.

In 1788, de Staël published Letters on the works and character of J.J. Rousseau.[15] In this panegyric, written initially for a limited number of friends (in which she considered his housekeeper Thérèse Levasseur as unfaithful), she demonstrated evident talent, but little critical discernment.

OK, she was 22 at this point. Now there’s another argument between her father and the king and he gets dismissed and exiled.

In December 1788 her father persuaded the king to double the number of deputies at the Third Estate in order to gain enough support to raise taxes to defray the excessive costs of supporting the revolutionaries in America. This approach had serious repercussions on Necker’s reputation; he appeared to consider the Estates-General as a facility designed to help the administration rather than to reform government.[16] In an argument with the king, whose speech on 23 June he didn’t attend, Necker was dismissed and exiled on 11 July. On Sunday, 12 July the news became public and an angry Camille Desmoulins suggested storming the Bastille.[17]

Oh but it doesn’t last long:

On 16 July he was reappointed; Necker entered Versailles in triumph.

But then he resigned a couple of years later and moved to Switzerland. This is about the time that Germaine de Staël holds a salon.

The increasing disturbances caused by the Revolution made her privileges as the consort of an ambassador an important safeguard. Germaine held a salon in the Swedish embassy, where she gave "coalition dinners", which were frequented by moderates such as Talleyrand and De Narbonne, monarchists (Feuillants) such as Antoine Barnave, Charles Lameth and his brothers Alexandre and Théodore, the Comte de Clermont-Tonnerre, Pierre Victor, baron Malouet, the poet Abbé Delille, Thomas Jefferson, the one-legged Minister Plenipotentiary to France Gouverneur Morris, Paul Barras, a Jacobin (from the Plain) and the Girondin Condorcets.

That’s quite a list.

Lots of complicated revolutionary stuff after this, things got bad and she fled to Switzerland as well. Then went to England for a bit and caused a scandal:

In January 1793, she made a four-month visit to England to be with her then lover, the Comte de Narbonne at Juniper Hall. (Since 1 February France and Great Britain were at war.) Within a few weeks she was pregnant; it was apparently one of the reasons for the scandal she caused in England.

Back in Switzerland for a while, then she meets Benjamin Constant, then moves back to Paris with him.

In 1796 she published Sur l’influence des passions, in which she praised suicide, a book which attracted the attention of the German writers Schiller and Goethe.

Still absorbed by French politics, Germaine reopened her salon.[41] It was during these years that Mme de Staël arguably exerted most political influence.

More trouble, she leaves Paris for a bit. This is complicated. Then back again. I feel like I’m learning a lot about where she lived and not much about her ideas.

De Staël completed the initial part of her first most substantial contribution to political and constitutional theory, "Of present circumstances that can end the Revolution, and of the principles that must found the republic of France".

Now we’re getting into her conflict with Napoleon.

On 6 December 1797 she had a first meeting with Napoleon Bonaparte in Talleyrand’s office and again on 3 January 1798 during a ball. She made it clear to him she did not agree with his planned French invasion of Switzerland. He ignored her opinions and would not read her letters.

and later:

He did not like her cultural determinism and generalizations, in which she stated that "an artist must be of his own time".[48][51] In his opinion a woman should stick to knitting.[52] He said about her, according to the Memoirs of Madame de Rémusat, that she "teaches people to think who had never thought before, or who had forgotten how to think".

Still running a salon but it’s getting dangerous. In 1803 Napoleon exiles her from Paris and she travels with Constant to Germany.

33 minutes left, I might have to speed up and not get bogged down in every detail. Though it looks like this is the interesting bit: she meets Goethe, Schiller and Schlegel. Her father dies and it looks like Coppet is the name of the place she’s inherited:

On 19 May she arrived in Coppet and found herself its wealthy and independent mistress, but her sorrow for her father was deep.

In July Constant wrote about her, "She exerts over everything around her a kind of inexplicable but real power. If only she could govern herself, she might have governed the world."

Next she visited Italy and wrote a book on it; Napoleon decided she was having too much fun and sent her back to Coppet.

Her house became, according to Stendhal, "the general headquarters of European thought" and was a debating club hostile to Napoleon, "turning conquered Europe into a parody of a feudal empire, with his own relatives in the roles of vassal states"

Some more travels in France and then Vienna. Benjamin Constant has also married someone else in the meantime, without telling her.

De Staël set to work on her book about Germany – in which she presented the idea of a state called "Germany" as a model of ethics and aesthetics and praised German literature and philosophy.[76] The exchange of ideas and literary and philosophical conversations with Goethe, Schiller, and Wieland had inspired de Staël to write one of the most influential books of the nineteenth century on Germany.

Yet more convoluted stuff where she gets back into France and then gets exiled again when she tries to publish the Germany book there.

She found consolation in a wounded veteran officer named Albert de Rocca, twenty-three years her junior, to whom she got privately engaged in 1811 but did not marry publicly until 1816.

I think I missed what happened to her first husband. It’s too hot to keep track of all this stuff.

Now there’s some complicated journey across eastern Europe to Russia. Then Sweden, then England.

She met Lord Byron, William Wilberforce, the abolitionist and Sir Humphry Davy, the chemist and inventor. According to Byron, "She preached English politics to the first of our English Whig politicians … preached politics no less to our Tory politicians the day after."[85] In March 1814 she invited Wilberforce for dinner and would devote the remaining years of her life to the fight for the abolition of the slave trade.

Returns to Paris yet again, where her salon is popular yet again, then flees to Coppet yet again. This is why I’m getting bogged down. Byron visited Coppet a lot.

"Byron was particularly critical of de Staël’s self-dramatizing tendencies"

haha.

One final trip to Paris:

Despite her increasing ill-health, she returned to Paris for the winter of 1816–17, living at 40, rue des Mathurins. Constant argued with de Staël, who had asked him to pay off his debts to her. A warm friendship sprang up between Madame de Staël and the Duke of Wellington, whom she had first met in 1814, and she used her influence with him to have the size of the Army of Occupation greatly reduced.[94]

She had become confined to her house, paralyzed since 21 February 1817. She died on 14 July 1817

So I’m finally through her biography. My god. She basically travelled everywhere and met everyone. I got tired reading this.

Oh I missed the bit about her novels somehow.

De Staël published a provocative, anti-Catholic novel Delphine, in which the femme incomprise (misunderstood woman) living in Paris between 1789 and 1792, is confronted with conservative ideas about divorce after the Concordat of 1801.

This is before Napoleon exiled her.

Right I have 18 minutes left, I think I’ll look up the Coppet group article. Oh boring, it’s just a couple of short paragraphs and a big list of names.

The Coppet group (Groupe de Coppet), also known as the Coppet circle, was an informal intellectual and literary gathering centred on Germaine de Staël during the time period between the establishment of the Napoleonic First Empire (1804) and the Bourbon Restoration of 1814-1815.[1][2][3][4] The name comes from Coppet Castle in Switzerland.

Core group: her family plus Humboldt, Schlegel and a bunch of names I don’t recognise. Long list of visitors; the ones I recognise from a quick skim are Byron, Clausewitz and Humphry Davy.

This doesn’t seem like a very tightly knit scene, too many people and too varied in their views. Maybe not as interesting as I was hoping for. Did a quick google and nothing is really standing out. Fine, let’s look up Benjamin Constant instead for the last ten minutes.

Henri-Benjamin Constant de Rebecque (French: [kɔ̃stɑ̃]; 25 October 1767 – 8 December 1830), or simply Benjamin Constant, was a Swiss-French political thinker, activist and writer on political theory and religion.

I’m sort of running out of energy now. It’s got hotter and this tub of ice water has warmed up. Something something proponent of classical liberalism, wrote some essays and pamphlets and so on. Skim for interesting bits.

Constant looked to Britain rather than to ancient Rome for a practical model of freedom in a large mercantile society. He drew a distinction between the "Liberty of the Ancients" and the "Liberty of the Moderns".

Ancients: participatory, burdensome, good for small homogeneous societies. Moderns: less direct participation, voters elect representatives.

He criticised several aspects of the French Revolution, and the failures of the social and political upheaval. He stated how the French attempted to apply ancient republican liberties to a modern state. Constant realized that freedom meant drawing a line between a person’s private life and that of state interference.[19] He praised the noble spirit of regenerating the state. However, he stated that it was naïve for writers to believe that two thousand years had not brought some changes in the customs and needs of the people.

Constant believed that, in the modern world, commerce was superior to war. He attacked Napoleon’s belligerence, on the grounds that it was illiberal and no longer suited to modern commercial social organization. Ancient Liberty tended to rely on war, whereas a state organized on the principles of Modern Liberty would tend to be at peace with all other peaceful nations.

Ah, nice link back to Berlin:

The British philosopher and historian of ideas, Sir Isaiah Berlin has acknowledged his debt to Constant.

Four minutes to go but I’ll end it there, I’m tired of this.


That worked ok apart from the bit where I got tired at the end. I feel like I learned a lot more about personal life and travels round Europe than I did about her ideas – would have been nice to understand more about the romanticism connection, exactly what ideas she picked up from Germany, etc. Still, she was interesting enough that that didn’t bother me too much.

Now I’m going to have a shower and cool down.

Hacker News folk wisdom on visual programming

I’m a fairly frequent Hacker News lurker, especially when I have some other important task that I’m avoiding. I normally head to the Active page (lots of comments, good for procrastination) and pick a nice long discussion thread to browse. So over time I’ve ended up with a good sense of what topics come up a lot. “The Bay Area is too expensive.” “There are too many JavaScript frameworks.” “Bootcamps: good or bad?” I have to admit that I enjoy these. There’s a comforting familiarity in reading the same internet argument over and over again.

One of the more interesting recurring topics is visual programming:

[Screenshot of Hacker News thread titles about visual programming]

Visual Programming Doesn’t Suck. Or maybe it does? These kinds of arguments usually start with a few shallow rounds of yay/boo. But then often something more interesting happens. Some of the subthreads get into more substantive points, and people with a deep knowledge of the tool in question turn up, and at this point the discussion can become genuinely useful and interesting.

This is one of the things I genuinely appreciate about Hacker News. Most fields have a problem with ‘ghost knowledge’, hard-won practical understanding that is mostly passed on verbally between practitioners and not written down anywhere public. At least in programming some chunk of it makes it into forum posts. It’s normally hidden in the depths of big threads, but that’s better than nothing.

I decided to read a bunch of these visual programming threads and extract some of this folk wisdom into a more accessible form. The background for how I got myself into this is a bit convoluted. In the last year or so I’ve got interested in the development of writing as a technology. There are two books in particular that have inspired me:

  • Walter Ong’s Orality and Literacy: the Technologizing of the Word. This is about the history of writing and how it differs from speech; I wrote a sort of review here. Everything that we now consider obvious, like vowels, full stops and spaces between words, had to be invented at some point, and this book gives a high level overview of how this happened and why.
  • Catarina Dutilh Novaes’s Formal Languages in Logic. The title makes it sound like a maths textbook, but Novaes is a philosopher and really it’s much closer to Ong’s book in spirit, looking at formal languages as a type of writing and exploring how they differ from ordinary written language.

Dutilh Novaes focuses on formal logic, but I’m curious about formal and technical languages more generally: how do we use the properties of text in other fields of mathematics, or in programming? What is text good at, and what is it bad at? Comment threads on visual programming turn out to be a surprisingly good place to explore this question. If something’s easy in text but difficult in a specific visual programming tool, you can guarantee that someone will turn up to complain about it. Some of these complaints are fairly superficial, but some get into some fairly deep properties of text: linearity, information density, an alphabet of discrete symbols. And conversely, enthusiasm for a particular visual feature can be a good indicator of what text is poor at.

So that’s how I found myself plugging through a text file with 1304 comments pasted into it and wondering what the hell I had got myself into.

What I did

Note: This post is looong (around 9000 words), but also very modular. I’ve broken it into lots of subsections that can be read relatively independently, so it should be fairly easy to skip around without reading the whole thing. Also, a lot of the length is from liberal use of quotes from comment threads. So hopefully it’s not quite as bad as it looks!

This is not supposed to be some careful scientific survey. I decided what to include and how to categorise the results based on whatever rough qualitative criteria seemed reasonable to me. The basic method, such as it was, was the following:

The basic structure of the rest of the post is the following:

  • A breakdown of what commenters normally meant by ‘visual programming’ in these threads. It’s a pretty broad term, and people come in with very different understandings of it.
  • Common themes. This is the main bulk of the post, where I’ve pulled out topics that came up in multiple threads.
  • A short discussion-type section with some initial questions that came to mind while writing this. There are many directions I could take this in, and this post is long enough without discussing these in detail, so I’ll just wave at some of them vaguely. Probably I’ll eventually write at least one follow-up post to pick up some of these strands when I’ve thought about them more.

Types of visual programming

There are also a lot of disparate visual programming paradigms that are all classed under “visual”, I guess in the same way that both Haskell and Java are “textual”. It makes for a weird debate when one party in a conversation is thinking about patch/wire dataflow languages as the primary VPLs (e.g. QuartzComposer) and the other one is thinking about procedural block languages (e.g. Scratch) as the primary VPLs.

seanmcdirmid

One difficulty with interpreting these comments is that people often start arguing about ‘visual programming’ without first specifying what type of visual programming they mean. Sometimes this gets cleared up further into a comment thread, when people start naming specific tools, and sometimes it never gets cleared up at all. There were a few broad categories that came up frequently, so I’ll start by summarising them below.

Node-based interfaces

Example LabVIEW screen (source)

There are a large number of visual programming tools that are roughly in the paradigm of ‘boxes with some arrows between them’, like the LabVIEW example above. I think the technical term for these is ‘node-based’, so that’s what I’ll call them. These ended up being the main topic of conversation in four of the six discussions, and mostly seemed to be the implied topic when someone was talking about ‘visual programming’ in general. Most of these tools are special-purpose ones that are mainly used in a specific domain. These domains came up repeatedly:

Laboratory and industrial control. LabVIEW was the main tool discussed in this category. In fact it was probably the most commonly discussed tool of all, attracting its fair share of rants but also many defenders.

Game engines. Unreal Engine’s Blueprints was probably the second most common topic. This is a visual gameplay scripting system.

Music production. Max/MSP came up a lot as a tool for connecting and modifying audio clips.

Visual effects. Houdini, Nuke and Blender all have node-based editors for creating effects.

Data migration. SSIS was the main tool here, used for migrating and transforming Microsoft SQL Server data.

Other tools that got a few mentions include Simulink (Matlab-based environment for modelling dynamical systems), Grasshopper for Rhino3D (3D modelling), TouchDesigner (interactive art installations) and Azure Logic Apps (combining cloud services).

The only one of these I’ve used personally is SSIS, and I only have a basic level of knowledge of it.

Block-based IDEs

Scratch development environment (source).

This category includes environments like Scratch that convert some of the syntax of normal programming into coloured blocks that can be slotted together. These are often used as educational tools for new programmers, especially when teaching children.

This was probably the second most common thing people meant by ‘visual programming’, though there was some argument about whether they should count, as they mainly reproduce the conventions of normal text-based programming:

Scratch is a snap-together UI for traditional code. Just because the programming text is embedded inside draggable blocks doesn’t make it a visual language, its a different UI for a text editor. Sure, its visual, but it doesn’t actually change the language at all in any way. It could be just as easily represented as text, the semantics are the same. Its a more beginner-friendly mouse-centric IDE basically.

dkersten

Drag-n-drop UI builders

Drag-n-drop UI builders came up a bit, though not as much as I originally expected, and generally without naming any specific tool (Delphi did get a couple of mentions). In particular there was very little discussion of the new crop of no-code/low-code tools, I think because most of these threads predate the current hype wave.

These tools are definitely visual, but not necessarily very programmatic — they are often intended for making one specific layout rather than a dynamic range of layouts. And the visual side of UI design tends to run into conflict with the ability to specify dynamic behaviour:

The main challenge in this particular domain is describing what is supposed to happen to the layout when the size of the window changes, or if there are dependencies among visual elements (e.g. some element only appears when a check box is checked). When laying things out visually you can only ever design one particular instance of a layout. If all your elements are static, this works just fine. But if the layout is in any way dynamic (with window resizing being the most common case) you now have to either describe what you want to have happen when things change, or have the system guess. And there are a lot of options: scaling, cropping, letterboxing, overflowing, “smart” reflow… The possibilities are endless, so describing all of that complexity in general requires a full programming language. This is one the reasons that even CSS can be very frustrating, and people often resort to Javascript to get their UI to do the Right Thing.

lisper

These tools also have less of the discretised, structured element that is usually associated with programming — for example, node-based tools still have a discrete ‘grammar’ of allowable box and arrow states that can be composed together. UI tools are relatively continuous and unstructured, where UI elements can be resized to arbitrary pixel sizes.
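
To make the ‘you can only ever design one particular instance’ point concrete, here’s a minimal sketch (my own illustration in Python, not based on any real UI toolkit). A static mockup is a single set of coordinates; a resizable layout is a function from window size and state to coordinates — in other words, a small program rather than a picture:

```python
# Minimal sketch, illustrative only: a static mockup is one concrete set of
# coordinates, but a resizable layout is a *function* from window size (and
# state, like a checkbox) to coordinates -- i.e. a program, not a picture.
from dataclasses import dataclass

@dataclass
class Box:
    name: str
    x: int
    y: int
    width: int
    height: int

# What a purely visual mockup captures: one particular instance.
static_layout = [Box("sidebar", 0, 0, 200, 600), Box("content", 200, 0, 600, 600)]

def layout(window_width: int, window_height: int, show_sidebar: bool = True) -> list[Box]:
    """The dynamic version: positions depend on window size and a checkbox."""
    sidebar_width = 200 if (show_sidebar and window_width >= 600) else 0
    boxes = []
    if sidebar_width:
        boxes.append(Box("sidebar", 0, 0, sidebar_width, window_height))
    boxes.append(Box("content", sidebar_width, 0, window_width - sidebar_width, window_height))
    return boxes

print(layout(1024, 768))  # wide window: sidebar plus content
print(layout(400, 768))   # narrow window: the sidebar collapses away
```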

Spreadsheets

There’s a good argument for spreadsheets being a visual programming paradigm, and a very successful one:

I think spreadsheets also qualify as visual programming languages, because they’re two-dimensional and grid based in a way that one-dimensional textual programming languages aren’t.

The grid enables them to use relative and absolute 2D addressing, so you can copy and paste formulae between cells, so they’re reusable and relocatable. And you can enter addresses and operands by pointing and clicking and dragging, instead of (or as well as) typing text.

DonHopkins

Spreadsheets are definitely not the canonical example anyone has in mind when talking about ‘visual programming’, though, and discussion of spreadsheets was confined to a few subthreads.
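
The relative/absolute addressing point is what makes spreadsheet formulae ‘relocatable’, so it’s worth spelling out. Here’s a toy sketch of the idea (my own illustration, not how any real spreadsheet engine works): a relative reference is effectively an offset from the cell holding the formula, so pasting the formula elsewhere shifts its references, while a $-prefixed absolute reference stays pinned.

```python
# Toy sketch (illustrative only): relative references shift when a formula is
# copied to a new row; absolute ($-prefixed) references stay pinned in place.

def shift_reference(ref: str, row_offset: int) -> str:
    """Shift a reference like 'B2' down by row_offset rows; '$B$2' stays put."""
    if ref.startswith("$"):
        return ref
    column, row = ref[0], int(ref[1:])
    return f"{column}{row + row_offset}"

def paste_formula(refs: list[str], row_offset: int) -> list[str]:
    return [shift_reference(ref, row_offset) for ref in refs]

# The formula in C1 is =A1+B1, represented here as its list of references.
print(paste_formula(["A1", "B1"], 3))    # pasted into C4 -> ['A4', 'B4']
print(paste_formula(["$A$1", "B1"], 3))  # pasted into C4 -> ['$A$1', 'B4']
```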

Visual enhancements of text-based code

As a believer myself, I think the problem is that visual programming suffers the same problem known as the curse of Artificial Intelligence:

“As soon as a problem in AI is solved, it is no longer considered AI because we know how it works.” [1]

Similarly, as soon as a successful visual interactive feature (be it syntax highlighting, trace inspectors for step-by-step debugging, “intellisense” code completion…) gets adopted by IDEs and become mainstream, it is no longer considered “visual” but an integral and inevitable part of classic “textual programming”.

[1] http://www.artificial-intelligence.com/comic/7

TuringTest

There were several discussions of visual tooling for understanding normal text-based programs better, through debugging traces, dependency graphs, inheritance hierarchies, etc. Again, these were mostly confined to a few subthreads rather than being a central example of ‘visual programming’.

Several people also pointed out that even text-based programming in a plain text file has a number of visual elements. Code as written by humans is not a linear string of bytes; we make use of indentation and whitespace and visually distinctive characters:

Code is always written with “indentation” and other things that demonstrate that the 2d canvas distribution of the glyphs you’re expressing actually does matter for the human element. You’re almost writing ASCII art. The ( ) and [ ] are even in there to evoke other visual types.
nikki93

Brackets are a nice example — they curve towards the text they are enclosing, reinforcing the semantic meaning in a visual way.

Experimental or speculative interfaces

At the other end of the scale from brackets and indentation, we have completely new and experimental visual interfaces. Bret Victor’s Dynamicland and other experiments were often brought up here, along with speculations on the possibilities opened up by VR:

As long as we’re speculating: I kind of dream that maybe we’ll see programming environments that take advantage of VR.

Humans are really good at remembering spaces. (“Describe for me your childhood bedroom.” or “What did your third grade teacher look like?”)

There’s already the idea of “memory palaces” [1] suggesting you can take advantage of spatial memory for other purposes.

I wonder, what would it be like to learn or search a codebase by walking through it and looking around?

[1] https://en.wikipedia.org/wiki/Method_of_loci

danblick

This is the most exciting category, but it’s so wide open and untested that it’s hard to say anything very specific. So, again, this was mainly discussed in tangential subthreads.

Common themes

There were many talking points that recurred again and again over the six threads. I’ve tried to collect them here.

I’ve ordered them in rough order of depth, starting with complaints about visual programming that could probably be addressed with better tooling and then moving towards more fundamental issues that engage with the specific properties of text as a medium (there’s plenty of overlap between these categories, it’s only a rough grouping). Then there’s a grab bag of interesting remarks that didn’t really fit into any category at the end.

Missing tooling

A large number of complaints in all threads were about poor tooling. As a default format, text has an enormous ecosystem of existing tools for input, search, diffing, formatting, etc etc. Most of these could presumably be replicated for any given visual format, but there are many kinds of visual formats and generally these are missing at least some of the conveniences programmers expect. I’ve discussed some of the most common ones below.

Managing complexity

This topic came up over and over again, normally in relation to node-based tools, and often linking to either this Daily WTF screenshot of LabVIEW nightmare spaghetti or the Blueprints from Hell website. Boxes and arrows can get really messy once there are a lot of boxes and a lot of arrows.

Unreal has a VPL and it is a pain to use. A simple piece of code takes up so much desktop real estate that you either have to slowly move around to see it all or have to add more monitors to your setup to see it all. You think spaghetti code is bad imagine actually having a visual representation of it you have to work with. Organization doesn’t exist you can go left, up, right, or down.

smilesnd

The standard counterargument to this was that LabVIEW and most other node-based environments do come with tools for encapsulation: you can generally ‘box up’ sets of nodes into named function-like subdiagrams. The extreme types of spaghetti code are mostly produced by inexperienced users with a poor understanding of the modularisation options available to them, in the same way that a beginner Python programmer with no previous coding experience might write one giant script with no functions:

Somehow people form the opinion that once you start programming in a visual language that you’re suddenly forced, by some unknown force, to start throwing everything into a single diagram without realizing that they separate their text-based programs into 10s, 100s, and even 1000s of files.

Poorly modularized and architected code is just that, no matter the paradigm. And yes, there are a lot of bad LabVIEW programs out there written by people new to the language or undisciplined in their craft, but the same holds true for stuff like Python or anything else that has a low barrier to entry.

bmitc

Viewed through this lens there’s almost an argument that visual spaghetti is a feature not a bug — at least you can directly see that you’ve created a horrible mess, without having to be much of a programming expert.

There were a few more sophisticated arguments against node-based editors that acknowledged the fact that encapsulation existed but still found the mechanics of clicking through layers of subdiagrams to be annoying or confusing.

It may be that I’m just not a visual person, but I’m currently working on a project that has a large visual component in Pentaho Data Integrator (a visual ETL tool). The top level is a pretty simple picture of six boxes in a pipeline, but as you drill down into the components the complexity just explodes, and it’s really easy to get lost. If you have a good 3-D spatial awareness it might be better, but I’ve started printing screenshots and laying them out on the floor. I’m really not a visual person though…

ianmcgowan

IDEs for text-based languages normally have features like code folding and call hierarchies for moving between levels, but these conventions are less developed in node-based tools. This may be just because these tools are more niche and have had less development time, or it may genuinely be a more difficult problem for a 2D layout — I don’t know enough about the details to tell.

Input

In general, all the dragging quickly becomes annoying. As a trained programmer, you can type faster than you can move your mouse around. You have an algorithm clear in your head, but by the time you’ve assembled it half-way on the screen, you already want to give up and go do something else.

TeMPOraL

Text-based languages also have a highly-refined interface for writing the language — most of us have a great big rectangle sitting on our desks with a whole grid of individual keys mapping to specific characters. In comparison, a visual tool based on a different paradigm won’t have a special input device, so it will either have to rely on the mouse (lots of tedious RSI-inducing clicking around) or involve learning a new set of special-purpose keyboard shortcuts. These shortcuts can work well for experienced programmers:

If you are a very experienced programmer, you program LabVIEW (one of the major visual languages) almost exclusively with the keyboard (QuickDrop).

Let me show you an example (gif) I press “Ctrl + space” to open QuickDrop, type “irf” (a short cut I defined myself) and Enter, and this automatically drops a code snippet that creates a data structure for an image, and reads an image file.

link to gif

cdtwoaway

But it’s definitely a barrier to entry.

Formatting

If you have any desire for aesthetics, you’ll be spending lots of time moving wires around.

prewett

Another tedious feature of many node-based tools is arranging all the boxes and arrows neatly on the screen. It’s irrelevant for the program output, but makes a big difference to readability. (Also it’s just downright annoying if the lines look wrong — my main memory of SSIS is endless tweaking to get the arrows lined up nicely).

Text-based languages are more forgiving, and also people tend to solve the problem with autoformatters. I don’t have a good understanding of why these aren’t common in node-based editors. (Maybe they actually are and people were complaining about the tools that are missing them? Or maybe the sort of formatting that is useful is just not automatable, e.g. grouping boxes by semantic meaning). It’s definitely a harder problem than formatting text, but there was some argument about exactly how hard it is to get at least a reasonable solution:

Automatic layout is hard? Yes, an optimal solution to graph layout is NP-complete, but so is register allocation, and my compiler still works (and that isn’t even its bottleneck). There’s plenty of cheap approximations that are 99% as good.

ken
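
To give a flavour of what a ‘cheap approximation’ might look like, here’s a toy layered layout (my own sketch, nowhere near what production layout engines do): put each node in a column according to its longest distance from a source, then stack the nodes within each column.

```python
# Toy sketch of a cheap automatic layout (illustrative only): column = longest
# distance from a source node, row = position within that column. Crude, but
# it already removes most of the hand-arranging of boxes.
from collections import defaultdict

def layered_layout(edges: list[tuple[str, str]]) -> dict[str, tuple[int, int]]:
    nodes = {node for edge in edges for node in edge}
    preds = defaultdict(set)
    for src, dst in edges:
        preds[dst].add(src)

    depth_cache: dict[str, int] = {}
    def depth(node: str) -> int:
        if node not in depth_cache:
            depth_cache[node] = 1 + max((depth(p) for p in preds[node]), default=-1)
        return depth_cache[node]

    columns = defaultdict(list)
    for node in sorted(nodes):
        columns[depth(node)].append(node)

    return {name: (col, row)
            for col, names in columns.items()
            for row, name in enumerate(names)}

# A small dataflow graph: read -> parse -> (validate, enrich) -> write
edges = [("read", "parse"), ("parse", "validate"), ("parse", "enrich"),
         ("validate", "write"), ("enrich", "write")]
print(layered_layout(edges))
# read lands in column 0, parse in column 1, validate and enrich share
# column 2, and write ends up in column 3.
```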

Version control and code review

Same story again — text comes with a large ecosystem of existing tools for diffing, version control and code review. It sounds like at least the more developed environments like LabVIEW have some kind of diff tool, and an experienced team can build custom tools on top of that:

We used Perforce. So a custom tool was integrated into Perforce’s visual tool such that you could right-click a changelist and submit it for code review. The changelist would be shelved, and then LabVIEW’s diff tool (lvcompare.exe) would be used to create screenshots of all the changes (actually, some custom tools may have done this in tandem with or as a replacement of the diff tool). These screenshots, with a before and after comparison, were uploaded to a code review web server (I forgot the tool used), where comments could be made on the code. You could even annotate the screenshots with little rectangles that highlighted what a comment was referring to. Once the comments were resolved, the code would be submitted and the changelist number logged with the review. This is based off of memory, so some details may be wrong.

This is important because it shows that such things can exist. So the common complaint is more about people forgetting that text-based code review tools originally didn’t exist and were built. It’s just that the visual ones need to be built and/or improved.

bmitc

But you don’t just get nice stuff out of the box.

Debugging

Opinions were split on debugging. Visual, flow-based languages can make it easy to see exactly which route through the code is activated:

Debugging in unreal is also really cool. The “code paths” light up when activated, so it’s really easy to see exactly which branches of code are and aren’t being run – and that’s without actually using a debugger. Side note – it would be awesome if the lines of text in my IDE lit up as they were run. Also, debugging games is just incredibly fun and sometimes leads to new mechanics.

phantom_package

I remember this being about the only enjoyable feature of my brief time working with SSIS — boxes lit up green if everything went to plan, and red if they hit an exception. It was satisfying getting a nice run of green boxes once a bug was fixed.

On the other hand, there were problems with complexity again. Here are some complaints about LabVIEW debugging:

3) debugging is a pain. LabVIEW’s trace is lovely if you have a simple mathematical function or something, but the animation is slow and it’s not easy to check why the value at iteration 1582 is incorrect. Nor can you print anything out, so you end up putting an debugging array output on the front panel and scrolling through it.

4) debugging more than about three levels deep is painful: it’s slow and you’re constantly moving between windows as you step through, and there’s no good way to figure out why the 20th value in the leaf node’s array is wrong on the 15th iteration, and you still can’t print anything, but you can’t use an output array, either, because it’s a sub-VI and it’s going to take forever to step through 15 calls through the hierarchy.

prewett

Use cases

There was a lot of discussion on what sort of problem domains are suited to ‘visual programming’ (which often turned out to mean node-based programming specifically, but not always).

Better for data flow than control flow

A common assertion was that node-based programming is best suited to data flow situations, where a big pile of data is tipped into some kind of pipeline that transforms it into a different form. Migration between databases would be a good example of this. On the other hand, domains with lots of branching control flow were often held to be difficult to work with. Here’s a representative quote:

Control flow is hard to describe visually. Think about how often we write conditions and loops.

That said – working with data is an area that lends itself well to visual programming. Data pipelines don’t have branching control flow and So you’ll see some really successful companies in this space.

macklemoreshair

I’m not sure how true this is? There wasn’t much discussion of why this would be the case, and it seems that LabVIEW for example has decent functionality for loops and conditions:

Aren’t conditionals and loops easier in visual languages? If you need something to iterate, you just draw a for loop around it. If you need two while loops each doing something concurrently, you just draw two parallel while loops. If you need to conditionally do something, just draw a conditional structure and put code in each condition.

One type of control structure I have not seen a good implementation of is pattern matching. But that doesn’t mean it can’t exist, and it’s also something most text-based languages don’t do anyway.

bmitc

Looking at some examples, these don’t look too bad.
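
For what it’s worth, here’s how I’d caricature the distinction (my own sketch, not taken from the threads): a data-flow job is essentially a straight composition of transformations, which maps naturally onto boxes and arrows, whereas control-flow-heavy code keeps branching and looping on intermediate state, and every branch is another wire that has to go somewhere.

```python
# Caricature of the distinction, illustrative only.
# Data flow: a straight pipeline of transformations -- one arrow in and one
# arrow out of each stage, easy to draw as boxes and wires.
def clean(rows: list[str]) -> list[str]:
    return [row.strip() for row in rows if row.strip()]

def parse(rows: list[str]) -> list[int]:
    return [int(row) for row in rows]

def summarise(values: list[int]) -> dict:
    return {"count": len(values), "total": sum(values)}

print(summarise(parse(clean([" 1", "2 ", "", "3"]))))  # {'count': 3, 'total': 6}

# Control flow: branching and looping on intermediate state -- every `if`,
# `elif` and `continue` is another arrow on the diagram.
def categorise(values: list[int]) -> list[str]:
    labels = []
    for value in values:
        if value < 0:
            labels.append("negative")
        elif value == 0:
            continue  # drop zeros entirely
        elif value % 2 == 0:
            labels.append("even")
        else:
            labels.append("odd")
    return labels

print(categorise([-1, 0, 2, 3]))  # ['negative', 'even', 'odd']
```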

Maybe the issue is that there is a conceptual tension between data flow and control flow situations themselves, rather than just the representation of them? Data flow pipelines often involve multiple pieces of data going through the pipeline at once and getting processed concurrently, rather than sequentially. At least one comment addressed this directly:

One of the unappreciated facets of visual languages is precisely the dichotomy between easy dataflow vs easy control flow. Everyone can agree that

–> [A] –> [B] –>

——>

represents (1) a simple pipeline (function composition) and (2) a sort of local no-op, but what about more complex representations? Does parallel composition of arrows and boxes represent multiple data inputs/outputs/computations occurring concurrently, or entry/exit points and alternative choices in a sequential process? Is there a natural “split” of flowlines to represent duplication of data, or instead a natural “merge” for converging control flows after a choice? Do looping diagrams represent variable unification and inference of a fixpoint, or the simpler case of a computation recursing on itself, with control jumping back to an earlier point in the program with updated data?

zozbot34

Overall I’d have to learn a fair bit more to understand what the problem is.

Accessible to non-programmers

Less controversially, visual tools are definitely useful for people with little programming experience, as a way to get started without navigating piles of intimidating syntax.

So the value ends up being in giving more people who are unskilled or less skilled in programming a way to express “programmatic thinking” and algorithms.

I have taught dozens of kids scratch and that’s a great application that makes programming accessible to “more” kids.

sfifs

Inherently visual tasks

Visual programming is, unsurprisingly, well-suited to tasks that have a strong visual component. We see this on the small scale with things like colour pickers, which are far more helpful for choosing a colour than typing in an RGB code and hoping for the best. So even primarily text-based tools might throw in some visual features for tasks that are just easier that way.

Some domains, like visual effects, are so reliant on being able to see what you’re doing that visual tools are a no-brainer. See the TouchDesigner tutorial mentioned in this comment for an impressive example. If you need to do a lot of visual manipulation, giving up the advantages of text is a reasonable trade:

Why is plain text so important? Well for starters it powers version control and cut and pasting to share code, which are the basis of collaboration, and collaboration is how we’re able to construct such complex systems. So why then don’t any of the other apps use plain text if it’s so useful? Well 100% of those apps have already given up the advantages of plain text for tangential reasons, e.g., turning knobs on a synth, building a model, or editing a photo are all terrible tasks for plain text.

robenkleene

Niche domains

A related point was that visual tools are generally designed for niche domains, and rarely get co-opted for more general programming. A common claim was that visual tools favour concrete situations over abstract ones:

There is a huge difference between direct manipulation of concrete concepts, and graphical manipulation of abstract code. Visual programming works much better with the former than the latter.

seanmcdirmid

It does seem to be the case that visual tools generally ‘stay close to the phenomena’. There’s a tension between showing a concrete example of a particular situation, and being able to go up to a higher level of abstraction and dynamically generate many different examples. (A similar point came up in the section on drag-n-drop editors above.)

Deeper structural properties of text

“Text is the most socially useful communication technology. It works well in 1:1, 1:N, and M:N modes. It can be indexed and searched efficiently, even by hand. It can be translated. It can be produced and consumed at variable speeds. It is asynchronous. It can be compared, diffed, clustered, corrected, summarized and filtered algorithmically. It permits multiparty editing. It permits branching conversations, lurking, annotation, quoting, reviewing, summarizing, structured responses, exegesis, even fan fic. The breadth, scale and depth of ways people use text is unmatched by anything. There is no equivalent in any other communication technology for the social, communicative, cognitive and reflective complexity of a library full of books or an internet full of postings. Nothing else comes close.”

— Graydon Hoare, always bet on text, quoted by devcriollo

In this section I’ll look at properties that apply more specifically to text. Not everything in the quote above came up in discussion (and much of it is applicable to ordinary language more than to programming languages), but it does give an idea of the special position held by text.

Communicative ability

I think the reason is that text is already a highly optimized visual way to represent information. It started with cave paintings and evolved to what it is now.

“Please go to the supermarket and get two bottles of beer. If you see Joe, tell him we are having a party in my house at 6 tomorrow.”

It took me a few seconds to write that. Imagine I had to paint it.

Changu

The communicative range of text came up a few times. I’m not convinced on this one. It’s true that ordinary language has this ability to finely articulate incredibly specific meanings, in a way that pictures can’t match. But the real reference class we want to compare to is text-based programming, not ordinary language. Programming languages have a much more restrictive set of keywords that communicate a much smaller set of ideas, mostly to do with quantity, logical implication and control flow.

In the supermarket example above, the if-then structure could be expressed in these keywords, but all the rest of the work would be done by tokens like “bottlesOfBeer”, which are meaningless to the computer and only help the human reading it.

As soon as we’ve assigned something a variable name, we’ve already altered our code into a form to assist our cognition.

sinker

It seems much more reasonable that this limited structure of keywords can be ported to a visual language, and in fact a node-based tool like LabVIEW seems to have most of them. Visual languages generally still have the ability to label individual items with text, so you can still have a “bottlesOfBeer” label if you want and get the communicative benefit of language. (It is true that a completely text-free language would be a pain to deal with, but nobody seems to be doing that anyway.)

Information density

A more convincing related point is that text takes up very little space. We’re already accustomed to distinguishing letters, even if they’re printed in a smallish font, and they can be packed together closely. It is true that the text-based version of the supermarket program would probably take up less space than a visual version.

This complaint came up a lot in relation to mathematical tasks, which are often built up by composing a large number of simpler operations. This can become a massive pain if the individual operations take up a lot of space:

Graphs take up much more space on the screen than text. Grab a pen and draw a computational graph of a Fourier transformation! It takes up a whole screen. As a formula, it takes up a tiny fraction of it. Our state machine used to take up about 2m x 2m on the wall behind us.

Regic
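
To give a sense of the density gap: the standard formula for the discrete Fourier transform fits comfortably on one line,

$$X_k = \sum_{n=0}^{N-1} x_n \, e^{-2\pi i k n / N}, \qquad k = 0, \dots, N-1,$$

while the corresponding computational graph really is a screenful of multiply-and-add nodes.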

Many node-based tools seem to have some kind of special node for typing in maths in a more conventional linear way, to get around this problem.

(Sidenote: this didn’t come up in any of the discussions, but I am curious as to how fundamental this limitation is. Part of it comes from the sheer familiarity of text. The first letters we learned as a child were printed a lot bigger! So presumably we could learn to distinguish closely packed shapes if we were familiar enough with the conventions. At this point, of course, with a small number of distinctive glyphs, it would share a lot of properties with text-based language. See the section on discrete symbols below.)

Linearity

Humans are centered around linear communication. Spoken language is essentially linear, with good use of a stack of concepts. This story-telling mode maps better on a linear, textual representation than on a graphical representation. When provided with a graph, it is difficult to find the start and end. Humans think in graphs, but communicate linearly.

edejong

The linearity of text is a feature that is mostly preserved in programming. We don’t literally read one giant 1D line of symbols, of course. It’s broken into lines and there are special structures for loops. But the general movement is vertically downwards. “1.5 dimensions” is a nice description:

When you write text-based code, you are also restricted to 2 dimensions, but it’s really more like 1.5 because there is a heavy directionality bias that’s like a waterfall, down and across. I cannot copy pictures or diagrams into a text document. I cannot draw arrows between comments to the relevant code; I have to embed the comment within the code because of this dimensionality/directionality constraint. I cannot “touch” a variable (wire) while the program is running to inspect its value.

bmitc

It’s true that many visual environments give up this linearity and allow more general positioning in 2D space (arbitrary placing of boxes and arrows in node-based programming, for example, or the 2D grids in spreadsheets). This has benefits and costs.

On the costs side, linear structures are a good match to the sequential execution of program instructions. They’re also easy to navigate and search through, top to bottom, without getting lost in branching confusion. Developing tools like autoformatters is more straightforward (we saw this come up in the earlier section on missing tooling).

On the benefits side, 2D structures give you more of an expressive canvas for communicating the meaning of your program: grouping similar items together, for example, or using shapes to distinguish between types of object.

In LabVIEW, not only do I have a 2D surface for drawing my program, I also get another 2D surface to create user interfaces for any function if I need. In text-languages, you only have colors and syntax to distinguish datatypes. In LabVIEW, you also have shape. These are all additional dimensions of information.

bmitc

They can also help in remembering where things are:

One of the interesting things I found was that the 2-dimensional layout helped a lot in remembering where stuff was: this was especially useful in larger programs.

dpwm

And the match to sequential execution is less important if your target domain is also non-sequential in some way:

If the program is completely non-sequential, visual tools which reflects the structure of the program are going to be much better than text. For example, if you are designing a electronic circuit, you draw a circuit diagram. Describing a electronic circuit purely in text is not going to be very helpful.

nacc

Small discrete set of symbols

Written text IS a visual medium. It works because there is a finite alphabet of characters that can be combined into millions of words. Any other “visual” language needs a similar structure of primitives to be unambiguously interpreted.

c2the3rd

This is a particularly important point that was brought up by several commenters in different threads. Text is built up from a small number of distinguishable characters. Text-based programming languages add even more structure, restricting to a constrained set of keywords that can only be combined in predefined ways. This removes ambiguity in what the program is supposed to do. The computer is much stupider than a human and ultimately needs everything to be completely specified as a sequence of discrete primitive actions.

At the opposite end of the spectrum is, say, an oil painting, which is also a visual medium but much more of an unconstrained, freeform one, where brushstrokes can swirl in any arbitrary pattern. This freedom is useful in artistic fields, where rich ambiguous associative meaning is the whole point, but becomes a nuisance in technical contexts. So different parts of the spectrum are used for different things:

Because each method has its pros and cons. It’s a difference of generality and specificity.

Consider this list as a ranking: 0 and 1 >> alphabet >> Chinese >> picture.

All 4 methods can be useful in some cases. Chinese has tens of thousands of characters, some people consider the language close to pictures, but real pictures have more than that (infinite variants).

Chinese is harder to parse than alphabet, and picture is harder than Chinese. (Imagine a compiler than can understand arbitrary picture!)

c_shu

Visual programs are still generally closer to the text-based program end of the spectrum than the oil painting one. In a node-based programming language, for example, there might be a finite set of types of boxes, and defined rules on how to connect them up. There may be somewhat more freedom than normal text, with the ability to place boxes anywhere on a 2D canvas, but it’s still a long way from being able to slap any old brushstroke down. One commenter compared this to diagrammatic notation in category theory:

Category theorists deliberately use only a tiny, restricted set of the possibilities of drawing diagrams. If you try to get a visual artist or designer interested in the diagrams in a category theory book, they are almost certain to tell you that nothing “visual” worth mentioning is happening in those figures.

Visual culture is distinguished by its richness on expressive dimensions that text and category theory diagrams just don’t have.

theoh
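
Going back to the node-based case for a moment, the ‘constrained grammar’ idea is easy to make concrete. Here’s a toy sketch (my own, not modelled on any particular tool) of a node system with a finite set of node types, typed ports, and a single rule: you can only wire an output port to an input port of the same type.

```python
# Toy sketch of a node-and-wire "grammar" (my own illustration): a finite set
# of node types, each with typed input and output ports, and one rule -- an
# output may only be wired to an input of the same type.

NODE_TYPES = {
    # node type: (input port types, output port types)
    "ReadFile":  ((), ("string",)),
    "ParseInts": (("string",), ("int_list",)),
    "Sum":       (("int_list",), ("int",)),
    "Display":   (("int",), ()),
}

def can_connect(src_type: str, src_port: int, dst_type: str, dst_port: int) -> bool:
    """Check whether wiring src's output port to dst's input port is allowed."""
    outputs = NODE_TYPES[src_type][1]
    inputs = NODE_TYPES[dst_type][0]
    return (src_port < len(outputs) and dst_port < len(inputs)
            and outputs[src_port] == inputs[dst_port])

print(can_connect("ReadFile", 0, "ParseInts", 0))  # True: string -> string
print(can_connect("ReadFile", 0, "Sum", 0))        # False: string != int_list
```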

Drag-n-drop editors are a bit further towards the freeform end of the spectrum, allowing UI elements to be resized continuously to arbitrary sizes. But there are still constraints — maybe your widgets have to be rectangles, for example, rather than any old hand-drawn shape. And, as discussed in earlier sections, there’s a tension between visual specificity and dynamic programming of many potential visual states at once. Drag-n-drop editors arguably lose a lot of the features of ‘true’ languages by giving up structure, and more programmatic elements are likely to still use a constrained set of primitives.

Finally, there was an insightful comment questioning how successful these constrained visual languages are compared to text:

I am not aware of a constrained pictorial formalism that is both general and expressive enough to do the job of a programming language (directed graphs may be general enough, but are not expressive enough; when extended to fix this, they lose the generality.)

… There are some hybrids that are pretty useful in their areas of applicability, such as state transition networks, dataflow models and Petri nets (note that these three examples are all annotated directed graphs.)

mannykannot

This could be a whole blog post topic in itself, and I may return to it in a follow-up post — Dutilh Novaes makes similar points in her discussion of tractability vs expressiveness in formal logic. Too much to go into here, but I do think this is important.

Grab bag of other interesting points

This section is exactly what it says — interesting points that didn’t fit into any of the categories above.

Allowing syntax errors

This is a surprising one I wouldn’t have thought of, but it came up several times and makes a lot of sense on reflection. A lot of visual programming tools are too good at preventing syntax errors. Temporary errors can actually be really useful for refactoring:

This is also one of the beauties of text programming. It allows temporary syntax errors while restructuring things.

I’ve used many visual tools where every block you laid out had to be properly connected, so in order to refactor it you had to make dummy blocks as input and output and all other kinds of crap. Adding or removing arguments and return values of functions/blocks is guaranteed to give you rsi from excessive mousing.

Too

I don’t quite understand why this is so common in visual tools specifically, but it may be to do with the underlying representation? One comment pointed out that this was a more general problem with any kind of language based on an abstract syntax tree that has to be correct at every point:

For my money, the reason for this is that a human editing code needs to write something invalid – on your way from Valid Program A to Valid Program B, you will temporarily write Invalid Jumble Of Bytes X. If your editor tries to prevent you writing invalid jumbles of bytes, you will be fighting it constantly.

The only languages with widely-used AST-based editing is the Lisp family (with paredit). They get away with this because:

  1. Lisp ‘syntax’ is so low-level that it doesn’t constrain your (invalid) intermediate states much. (ie you can still write a (let) or (cond) with the wrong number of arguments while you’re thinking).
  2. Paredit modes always have an “escape hatch” for editing text directly (eg you can usually highlight and delete an unbalanced parenthesis). You don’t need it often (see #1) – but when you need it, you really need it.

meredydd

Maybe an always-valid AST is just the more common way to build a visual language, which would explain why the problem shows up there so often?
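
To make the ‘Valid Program A → Invalid Jumble Of Bytes X → Valid Program B’ point concrete, here’s a trivial everyday refactor (my own toy example, not something from the thread). Dropping a parameter from a function in a plain text editor naturally passes through a state that doesn’t even parse, and the editor doesn’t mind:

```python
# Valid Program A: the function we start from.
def area(width, height, scale):
    return width * height * scale

# Invalid Jumble X: halfway through removing 'scale' the buffer doesn't parse,
# and a text editor is perfectly happy to hold this state while we think:
#
#     def area(width, height
#         return width * height * scale
#
# (unclosed parenthesis, missing colon, and a name about to become undefined).

# Valid Program B: the refactor finished.
def area(width, height):
    return width * height
```

A structure editor that insists the tree is well-formed at every keystroke won’t let you sit in that middle state, which is exactly the friction the comments above describe.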

Hybrids

Take what we all see at the end of whiteboard sessions. We see diagrams composed of text and icons that represent a broad swath of conceptual meaning. There is no reason why we can’t work in the same way with programming languages and computer.

bmitc

Another recurring theme was a wish for hybrid tools that combined the good parts of visual and text-based tools. One example that came up in the ‘information density’ section was doing maths in a textual format in an otherwise visual tool, which seems to work quite well:

UE4 Blueprints are visual programming, and are done very well. For a lot of things they work are excellent. Everything has a very fine structure to it, you can drag off pins and get context aware options, etc. You can also have sub-functions that are their own graph, so it is cleanly separated. I really like them, and use them for a lot of things.

The issue is that when you get into complex logic and number crunching, it quickly becomes unwieldy. It is much easier to represent logic or mathematics in a flat textual format, especially if you are working in something like K. A single keystroke contains much more information than having to click around on options, create blocks, and connect the blocks. Even in a well-designed interface.

Tools have specific purposes and strengths. Use the right tool for the right job. Some kind of hybrid approach works in a lot of use cases. Sometimes visual scripting is great as an embedded DSL; and sometimes you just need all of the great benefits of high-bandwidth keyboard text entry.

mgreenleaf

Even current text-based environments have some hybrid aspect, as most IDEs support syntax highlighting, autocompletion, code folding etc to get some of the advantages of visualisation.

Visualising the wrong thing

The last comment I’ll quote is sort of ranty but makes a deep point. Most current visual tools only visualise the kind of things (control flow, types) that are already displayed on the screen in a text-based language. It’s a different representation of fundamentally the same thing. But the visualisations we actually want may be very different, and more to do with what the program does than what it looks like on the screen.

‘Visual Programming’ failed (and continues to fail) simply because it is a lie; just because you surround my textual code with boxes and draw arrows showing the ‘flow of execution’ does not make it visual! This core misunderstanding is why all these ‘visual’ tools suck and don’t help anyone do anything practical (read: practical = complex systems).

When I write code, for example a layout algorithm for a set of gui elements, I visually see the data in my head (the gui elements), then I run the algorithm and see the elements ‘move’ into position dependent upon their dock/anchor/margin properties (also taking into account previously docked elements positions, parent element resize delta, etc). This is the visual I need to see on screen! I need to see my real data being manipulated by my algorithms and moving from A to B. I expect with this kind of animation I could easily see when things go wrong naturally, seeing as visual processing happens with no conscious effort.

Instead visual programming thinks I want to see the textual properties of my objects in memory in fancy coloured boxes, which is not the case at all.

hacker_9

I’m not going to try and comment seriously on this, as there’s almost too much to say — it points toward a large number of potential tools and visual paradigms, many of which are speculative or experimental. But it’s useful to end here, as a reminder that the scope of visual programming is not just some boxes with arrows between them.

Final thoughts

This post is long enough already, so I’ll keep this short. I collected all these quotes as a sort of exploratory project with no very clear aim in mind, and I’m not yet sure what I’m going to do with it. I probably want to write at least one follow-up post making links back to the Dutilh Novaes and Ong books on text as a technology. Other than that, here are a few early ideas that came to mind as I wrote it:

How much is ‘visual programming’ a natural category? I quickly discovered that commenters had very different ideas of what ‘visual programming’ meant. Some of these are at least partially in tension with each other. For example, drag-n-drop UI editors often allow near-arbitrary placement of UI elements on the screen, using an intuitive visual interface, but are not necessarily very programmatic. On the other hand, node-based editors allow complicated dynamic logic, but are less ‘visual’, reproducing a lot of the conventions of standard text-based programming. Is there a finer-grained classification that would be more useful than the generic ‘visual programming’ label?

Meaning vs fluency. One of the most appealing features of visual tools is that they can make certain inherently visual actions much more intuitive (a colour picker is a very simple example of this). And proponents of visual programming are often motivated by making programming more understandable. At the same time, a language needs to be a fluent medium for writing code quickly. At the fluent stage, it’s common to ignore the semantic meaning of what you’re doing, and rely on unthinkingly executing known patterns of symbol manipulation instead. Designing for transparent meaning vs designing for fluency are not the same thing — Vim is a great example of a tool that is incomprehensible to beginners but excellent for fluent text manipulation. It could be interesting to explore the tension between them.

‘Missing tooling’ deep dives. I’m not personally all that interested in following this up, as it takes me some way from the ‘text as technology’ angle I came in from, but it seems like an obvious one to mention. The ‘missing tooling’ subsections of this post could all be dug into in far more depth. For each one, it would be valuable to compare many existing visual environments, and understand what’s already available and what the limitations are compared to normal text.

Is ‘folk wisdom from internet forums’ worth exploring as a genre of blog post? Finally, here’s a sort of meta question, about the form of the post rather than the content. There’s an extraordinary amount of hard-to-access knowledge locked up in forums like Hacker News. While writing this post I got distracted by a different rabbit hole about Delphi, which somehow led me to another one about Smalltalk, which… well, you know how it goes. I realised that there were many other posts in this genre that could be worth writing. Maybe there should be more of them?

If you have thoughts on these questions, or on anything else in the post, please leave them in the comments!

Speedrun: Abacus schools

(This is a speedrun post, where I set a one hour timer to see what I can find out about a subject. See the category tag for more examples.)

I’m currently reading Catarina Dutilh Novaes’s Formal Languages in Logic, and one part of the section on the historical development of mathematical notation jumped out at me as potentially interesting. Abbaco (‘abacus’) schools were a kind of practical school in medieval southern Europe that trained the sons of merchants and artisans in useful mathematics for bookkeeping and business. Apparently the mathematical culture associated with these schools actually went beyond the university education of the time in some respects, and helped push forward the development of algebra:

Indeed, modern algebra (and its notation) will ultimately emerge from the sub-scientific tradition of the abbaco schools, rather than the somewhat solidified academic tradition taught at the medieval universities.

I find this sort of semi-informal institution on the edges of academia intriguing… I’m not sure how much I care about the details, but it seems worth an hour of investigation at least. There’s also a mention of Leonardo da Vinci and Dante Alighieri attending these schools, which could be interesting to follow up.

This speedrun session is also a bit different because we’re trying out a group speedrun event, and David MacIver and Eve Bigaj have also joined. Let’s see how it goes… As usual I typed this as I went and have done only minor tidying up afterwards, so there may be a bunch of typos and dodgy formatting.


There’s a wikipedia article, but it isn’t very long. Looks like there are a few other useful links though.

Abacus school is a term applied to any Italian school or tutorial after the 13th century, whose commerce-directed curriculum placed special emphasis on mathematics, such as algebra, among other subjects. These schools sprang after the publication of Fibonacci’s Book of the Abacus and his introduction of the Hindu-Arabic numeral system. In Fibonacci’s viewpoint, this system, originating in India around 400 BCE, and later adopted by the Arabs, was simpler and more practical than using the existing Roman numeric tradition. Italian merchants and traders quickly adopted the structure as a means of producing accountants, clerks, and so on, and subsequently abacus schools for students were established.

So, yep, practical education for merchants and traders.

Significant for a couple of reasons. First they got rid of Roman numerals.

The number of Roman characters a merchant needed to memorize to carry out financial transactions as opposed to Hindu-numerals made the switch practical. Commercialists were first introduced to this new system through Leonardo Fibonacci, who came from a business family and had studied Arabic math. Being convinced of its uses, abacus schools were therefore created and dominated by wealthy merchants, with some exceptions

Also they were instrumental in rising literacy levels.

Nothing about algebra here! Another thing on the search page mentioned Cardano though so hopefully there will be a link.

Then there’s a bunch of stuff about the school system.

Italian abacus school systems differed more in their establishment than in their curriculum during the Middle Ages. For example, institutions and appointed educators were set up in a number of ways, either through commune patronage or independent masters’ personal funds. Some abbaco teachers tutored privately in homes. All instructors, however, were contractually bound to their agreement which usually meant that they could supplement their salary with tuition fees or other rates.

Could be an overlap here with medieval guild funding of universities (e.g. in Bologna), another subject I’m considering speedrunning on.

Independent teachers could also be hired by the commune, but for lower wages.[19] Most times, free-lance masters were contracted by a group of parents in a similar fashion to that of communal agreements, thus establishing their own school if the number of students being tutored was significant in size.[20] Abbaco apprentices training to become masters could also tutor household children and pay for their studies simultaneously.

Last (short) section is on the curriculum.

Arithmetic, geometry, bookkeeping, reading and writing in the vernacular were the basic elementary and secondary subjects in the abbaco syllabus for most institutions, which began in the fall, Mondays through Saturdays.

… Mathematical problems dealt with the everyday exchange of different types of goods or monies of differing values, whether it was in demand or in good quality, and how much of it was being traded. Other problems dealt with distribution of profits, where each member invested a certain sum and may have later withdrawn a portion of that amount

Well that wasn’t a very informative article. There isn’t one in Italian either, just Arabic (same info as English) and Persian (a stub where I’m not even going to bother to hit translate). So I need to leave wikipedia very early.

OK, this looks good and more what I was after. ‘Solving the Cubic with Cardano – Aspects of Abbaco Mathematics’ by William Branson.

To understand the abbaco mathematics used by Cardano, we have to step back and look at the medieval tradition of abbaco schools and their masters. Though the subject is a fascinating and deep one, there is one particular aspect of this tradition that is crucial in the following account: abbaco masters thought in terms of canonical problems, and one particular canonical problem, the “Problem of Ten,” arises in the solution of the cubic that we will examine.

Quick summary of what they were, similar to wikipedia.

Abbaco mathematics was rhetorical—in Cardano’s time, most of the algebraic symbols with which we are so familiar were either recently invented, concurrent with the Ars Magna, or were well in the future. For example, ‘+’ and ‘–’ were first recorded in the 1480s, and were not in common use in 1545, when the Ars Magna was published. Robert Recorde would not invent the equals sign until 1557, and the use of letters and exponential notation would have to await Francois Viete in the 1590s and the Geometrie of Rene Descartes of 1637 [Note 2]. What Descartes would write as \(x^3 = ax + b\), Cardano wrote as “cubus aequalis rebus & numero” [Cardano 1662, Chapter 12, p. 251].

OK this is similar to what Dutilh Novaes was saying, people were solving problems that were algebraic in nature with unknowns to solve for, but the notation was still very wordy.

Rhetorical formulas can be difficult to remember, so algebraic rules were presented with canonical examples, which encoded the rules as algorithms within the examples. Thus, the mind of the abbaco master was a storehouse of such canonical examples, to which he compared the new problems that he came across in his work. When he recognized a parallel structure between the new problem and a canonical problem, he could solve the new problem by making appropriate substitutions into the canonical example.

So these ‘wordy’ forms still had some kind of canonical structures, it wasn’t just free text but was a kind of notation.

Such canonical examples occurred even in the foundational texts of abbaco mathematics, including the Algebra of al-Khwarizmi. An important example for us, one that occurs implicitly in Cardano’s solution to the cubic, is the “problem of ten” [Note 3]. Most abbaco texts had such problems, and one from Robert of Chester’s 1215 translation of al-Khwarizmi’s Algebra into Latin [al-Khwarizmi, p. 111] ran as follows:

Denarium numerum sic in duo diuido, vt vna parte cum altera multiplicata, productum multiplicationis in 21 terminetur. Iam ergo vnam partem, rem proponimus quam cum 10 sine re, quae alteram partem habent, multiplicamus…

In his translation of this passage into English, Louis Karpinski used (x) for ‘rem’ (thing), and so I offer my own translation, without symbols [Note 4]:

Ten numbers in two parts I divide in such a way, in order that one part with the other multiplied has the product of the multiplication conclude with 21. Now therefore one part we declare the thing, and then, with 10 without the thing, which the other part is, we multiply…

My god I can’t even be bothered to read all of that… very glad we don’t do maths like that now…

The structure of the “problem of ten” was that of a number \(a\) broken into two parts \(x\) and \(y\), with a condition on the parts; symbolically: \[x + y = a \quad \text{and} \quad f(x, y) = b\] for some function \(f(x, y)\) and number \(b\). The usual method of solution was to express the two parts as “thing” and “number minus thing” and then to substitute into the condition, as al-Khwarizmi did above. The “problem of ten” was canonical for quadratic problems, and served as a way to remember the rules for solving such problems.
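
To spell out the arithmetic behind that canonical instance (this working is mine, not part of the quoted article): the number is 10 and the condition is that the product of the parts is 21. Calling one part the ‘thing’ \(x\), the other part is \(10 - x\), and

\[x(10 - x) = 21 \quad\Longrightarrow\quad x^2 - 10x + 21 = 0 \quad\Longrightarrow\quad (x - 3)(x - 7) = 0,\]

so the two parts are 3 and 7, which is exactly the substitute-into-a-remembered-template style of reasoning described above.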

This was used in Cardano’s solution to the cubic, apparently, but there’s no more detail on this page, it just ends there. Looks like a book extract or something.

There’s another MAA page on abbaco schools, though, so I’ll read that next. This is ‘Background: The Abbaco Tradition’ by Randy K. Schwarz.

Bit more detail on where these schools were:

They arose first in northern Italy, whose economy was the most vibrant in Europe during this period (Spiesser 2003, pp. 34-35). A banker and official in Florence, Italy, reported that in 1345 at least 1,000 boys in that city alone were receiving instruction in abbaco and algorismo (Biggs 2009, p. 73). Such schools also began to appear in neighboring southern France, and a few in Catalonia (the area around Barcelona, Spain) and coastal North Africa. These four regions of the western Mediterranean had extensive trade and cultural ties with one another at the time, so it isn’t surprising that they shared methods of practical mathematics and its instruction (Høyrup 2006).

Mentions the Fibonacci book again as a common ancestor. Ah so this is why Fibonacci knew this stuff:

He was only a boy, he reports, when his father, a customs official representing Pisan merchants at their trading enclave of Bugia, in what is now Algeria, brought him to the customs house there to be taught Hindu-Arabic numerals and arithmetic (Sigler 2002, pp. 3, 15)

This article is part of a series on something called the Pamiers manuscript, which translated some of this into French maybe? or some language in modern France anyway. look up later if time.

Nice picture of teaching in an abbaco school here.

In general, the abbaco texts offered practical, simplified treatments in which mathematical techniques were distilled into easy-to-remember rules and algorithms. The focus was on how to carry these out rather than on justifying the theory behind them. At the same time, the books were often innovative in their solutions to particular problems and especially in their pedagogical approach: their presentation was popular, and they introduced the use of illustrations and vernacular languages to the history of mathematics textbooks.

Reference here to something called Swetz 1987, ‘Capitalism and Arithmetic: The New Math of the 15th Century’.

OK this article finishes here too… and I still have 34 minutes, this might be a difficult speedrun for finding information. I may as well skim the intro page and find out what the Pamiers manuscript is while I’m here.

Pamiers is in the far south of France, south of Toulouse near the Pyrenees. Written in the Languedocian language.

One of the striking features of the Pamiers manuscript is the fact that it includes the world’s earliest known instance in which a negative number was accepted as the answer to a problem for purely mathematical reasons. The fact that this occurred in the context of a commercial arithmetic, rather than a more scholastic or theoretical work, is a surprise.

Ah, nice, this is the sort of thing I was hoping for, new ideas coming up in the context of practical problems.

Back to wikipedia for now, what else can I find?

I found a pdf by Albrecht Heeffer which is very short but does mention one interesting book.

The abbaco or abbacus tradition (spelled with double b to distinguish it from the material calculating device called ‘abacus’) has the typical characteristics of a so-called ‘sub-scientific’ tradition of mathematical practice (coined by Jens Høyrup). It is supported by lay culture, e.g. merchants, artisans and surveyors. Knowledge is disseminated through master-apprentice relationships, often within family relations. Texts, as far as they are extant, are written in the vernacular. The tradition is open to foreign influences, including cross-cultural practices. Typically, the tradition is underrepresented in the history of mathematics.

Dutilh Novaes also mentioned the Høyrup book so maybe that is what I should really be reading. It’s this ‘sub-scientific’ angle that I’m interested in.

Abbaco masters made subtle but important contributions to the development of early symbolism. Their two centuries of algebraic practice paved the road for the development of symbolic algebra during the sixteenth century. They introduced mathematical techniques such as complete induction which is believed to have emerged a century later

Yeah, ok, so this is an interesting subject but I probably need to be reading books to find the good bits, rather than skimming the internet. Similar to Vygotsky speedrun maybe.

Let’s find out what this Høyrup book is called. Ah it must be this book mentioned on his wikipedia page: ‘Jacopo da Firenze’s Tractatus algorismi and early italian abacus culture.’ Yes I’m definitely going to buy these chapters off Springer for 25.95 euros each, sounds like a great idea.

Ah here’s a copy of a pdf by Høyrup! It’s 34 pages so I don’t have time to go into the details, but I can skim it. Hm also it looks like it’s mainly arguing about the centrality of Fibonacci in the tradition, I’m not interested in that, I’m interested in the sub-scientific thing.

First though I’d like to chase up that thing about Dante and da Vinci.

20 minutes left.

Search ‘da Vinci abbacco school’, oh god the results are full of random schools named after him and references to The Da Vinci Code. Must include: abbaco.

I have found another vaguely useful paper though, ‘The Market for Luca Pacioli’s Summa Arithmetica’ by Alan Sangster and others. Something here about the two-track nature of education in Renaissance Italy, with these schools at the practical end.

The curriculum of the vernacular schools emerged from the merchant culture and was designed to prepare sons of merchants and craftsmen for their future working lives [Grendler, 1990]. There was another parallel set of schools, the Latin (either scholastic or humanist) schools, where the sons of the privileged were taught in Latin.

The two sets of schools taught very different subjects. The Latin schools sought to teach the future leaders of society and those that aided them, e.g., secretaries and lawyers [Grendler,1989, p. 311]. They specialized in the trivium of grammar, rhetoric, and logic… On the rare occasions when mathematics was taught in these schools, it took the form of “classical or medieval Latin mathematics” [Grendler, 1989, p. 309]. In contrast to the vernacular schools, boys leaving the humanist schools often went to university.

Hang on, why don’t I just look on da Vinci’s wikipedia page? It just says the following:

Despite his family history, Leonardo only received a basic and informal education in (vernacular) writing, reading and math, possibly because his artistic talents were recognized early.

which would at least be consistent with going to one of these schools. And Dante Alighieri:

Not much is known about Dante’s education; he presumably studied at home or in a chapter school attached to a church or monastery in Florence.

Hm, so what did Dutilh Novaes say? Ah, it’s a quote from Heeffer 2007, ‘Humanist Repudiation of Eastern Influences in Early Modern Mathematics’. Pdf is here. Should have looked this up to start with!

Actually I’m confused because, although this is very relevant looking, it doesn’t have the quote in it at all. Ah well, I may as well read it for the rest of the time anyway (only 5 minutes left!). The thing about Dante and da Vinci isn’t really important.

Here’s some more on the sub-scientific idea:

Jens Høyrup coined the term sub-scientific mathematics for a long tradition of practice which has been neglected by historians. As a scholar working on a wide period of mathematical practice, from Babylonian algebra to the seventeenth century, Høyrup has always paid much attention to the more informal transmission of mathematical knowledge which he calls sub-scientific structures.

This is pretty complicated to skim quickly.

The sub-scientific tradition was a cross-cultural amalgam of several traditions. Merchant type arithmetic and recreational problems show a strong similarity with Indian sources. Algebra descended from the Arabs. By the time Regiomontanus learned algebra in Italy it was practiced by abbaco masters for more than 250 years. The tradition of surveying and mensuration within practical geometry goes back to Babylonian times.

Some stuff on ‘proto-algebraic rules’.

Our main hypothesis is that many recipes or precepts for arithmetical problem solving, in abbaco texts and arithmetic books before the second half of the sixteenth century, are based on proto-algebraic rules. We call these rules proto-algebraic because they are, or could be based originally on algebraic derivations. Yet their explanation, communication and application do not involve algebra at all. Proto-algebraic rules are disseminated together with the problems to which they can be applied. The problem functions as a vehicle for the transmission of this sub-scientific structure. Little attention has yet been given to sub-scientific mathematics or proto-algebraic rules.

Ding! Time’s up.


Hm, that was kind of annoying to do a speedrun on, because the Wikipedia article was so short and I had to jump quickly to a bunch of other sources which all either had very limited detail or way too much detail. I never did get to the bottom of the Dante and da Vinci thing.

I’m also still not that clear on the details of exactly what new techniques they introduced, but it looks like they were relevant to Cardano’s solution of the cubic, and also to the use of negative numbers in problems. They also introduced a bunch of schematic templates for solving problems, which later developed into modern algebraic notation.

The idea of ‘sub-scientific’ traditions sounds interesting more generally too, maybe I should look up the Høyrup book. Overall this looks like a topic where I’m better off reading books and papers than skimming random web pages.

Crackpot time 3: speculations will turn out well?

In 2017 I wrote two posts about my experiences with attempting to do physics outside of academia, which I called Crackpot Time 1 and Crackpot Time 2. At the time I was trying to reconnect to a more expansive, free-ranging energy that I had lost during the hyperfocus on technical details required for Ph.D. work. I was enjoying the ‘crackpot’ label as a kind of tongue-in-cheek pointer to the style of thinking I was trying to cultivate. I wanted to directly attack any topic that looked interesting, without fussing about whether the topic was ‘too ambitious’, or ‘too difficult’, or ‘not my field’. Small details like a total lack of relevant expertise didn’t matter.

I had a lot of this kind of energy in 2017, which was a very good year intellectually for me. I went to two deeply unusual and inspiring physics workshops that immediately raised my ambitions for what it would be possible for me to do in my spare time alongside a full time job. At the same time I was starting to take my side interest in mathematical intuition more seriously, and get oriented reading some phenomenology for the first time, so it was an intense time where I felt like the horizon was opening up fast in all directions. I started this blog and cranked out a bunch of short, unpolished but enthusiastic blog posts to try and make some sense of my thoughts.

I’ve been meaning to write another Crackpot Time update ever since, but just… never have. Partly that’s because I started a monthly newsletter practice in 2018 that took over some of the same role. But also it’s the standard inspiring workshop problem: the inspired feeling eventually wears off and then you have to do the hard bit, which is doing the actual work. This is less immediately exciting and doesn’t autogenerate breathless updates about how amazing everything is, so they stopped appearing. I’ve finally decided to crank one out anyway, even if it’s effortful and uninspired.

At the beginning of 2020 I got this fortune cracker for Chinese New Year. Perfect fortune for a crackpot, right?

I’m now trying to evaluate whether speculations did in fact turn out well. It’s weirdly hard to decide. I’m normally at least somewhat confused by my progress – trying to do independent work in a complicated domain is slow and ambiguous at the best of times – but I think this is the most confused I’ve been in a long time. Long 2020 has obviously been enormously strange for everyone, and then on top of that I’m in a hard-to-interpret stuck phase. This is my best attempt to explain what I’ve been up to, and where I’m at now.

Focus and accountability

I’m not going to try and go over everything I’ve done since 2017, nobody cares, including me, but I’ll do a few quick catch-up paragraphs to get me to the beginning of 2020. I had two good strategic ideas at the start of 2018. The first was to pick a very specific topic to focus on. My natural tendency is to dissipate my energies going partway down some interesting rabbit hole before getting distracted by something else, ending up with only a very vague high-level understanding of anything. Useful for getting a sense of the territory, useless for making any sort of meaningful contribution to it.

To counteract that, I picked a single 8-page paper, A toy model for quantum mechanics by S. J. van Enk, as my focus for the whole year. I had some sense that this particular paper would be a good anchor for me, and that turned out to be correct. The core toy model is very concrete and easy to play around with, but touches on a number of ideas in quantum foundations that interest me – negative probabilities, the phase space formulation of quantum physics, the Spekkens toy model. There are also some intriguing potential connections to my favourite recurring fascination, retrocausal interpretations of quantum physics. Having to stick close to the anchor paper meant I could explore aspects of these big topics without disappearing off into uselessly ungrounded speculation.

The second good idea was to use a monthly email newsletter as an accountability mechanism, inspired by this post. This wasn’t a Substack or anything, just a bog standard email that I sent out to a handful of people. I’d ramble a bit about what I’d done in the month, and that gave me a bit more incentive to stay on track. I stuck fairly closely to the area of this paper for the whole of 2018 and didn’t stray much further in 2019 either. This gave me far more focussed knowledge than I’d previously managed to pick up working on my own.

At the beginning of 2018 I wrote the following:

My plan for 2018 is to go beyond just learning some physics in my spare time and to do ‘something novel’, interpreted broadly. ‘Novel’ in this case doesn’t have to mean original research (though that would definitely count) – I’m thinking of a wider conception of what counts as a novel contribution, in the style of Chris Olah and Shan Carter’s Research debt essay (I wrote some comments on it here).

I’ve never been too fussed about whether anything I do is original in the sense required for an academic physics paper, as a completely new technical contribution to the field. But my ambitions are higher than just passively making notes from a textbook. I want to follow my own curiosity trail through a subject, write down what I notice on the way, and highlight ideas and connections that currently aren’t available in digested blog post form. The sort of work that Olah and Carter call ‘research distillation’ in the essay linked above.

This took longer to spin up than I was initially hoping for, and I spent most of 2018 just learning background and writing notes. I finally got going in 2019 and had a few thoughts on negative probabilities from a somewhat novel angle, which produced a couple of posts and a mildly popular twitter thread. So that takes me up to 2020, and the fortune cookie.

Long 2020

In early 2020 I had a tedious hour-plus, two-bus commute to work and sometimes skimmed some interesting-looking papers on my phone. Otherwise I wasn’t getting much done, because my energy was sapped by the stupid commute. I decided to have a twitter break in February to claw back whatever time I could, which worked fairly well. Some time near the end of the month I spent a Saturday holed up in a corner of Bath University library, where I had an idea for a very basic toy model that was quite limited by itself but maybe extensible in some interesting way. I was excited to figure out what it could do and started fiddling around with that for the next week or two.

I got back on twitter on March 1 to discover everything had been replaced by coronavirus panic, which was a big shock to me because I had almost completely ignored it until then. So I started catching up on panic, and the toy model went out of my head for the next couple of months along with everything else that wasn’t covid. I no longer had the bus commute, but I also couldn’t think properly, so that didn’t help much.

After a couple of months my brain came back online at least partially, but the toy model was completely dropped. (I still haven’t managed to pay it any consistent attention, it’s a loose thread at the moment.) Instead I remembered the papers I’d been reading on the bus. I’d been learning about Abramsky and Hardy’s logical Bell inequality work, and I realised that I could use the tools from this to finish off a half-baked idea I had for a post on Bell’s theorem, connecting a classic popular-science explanation to the version you’d find in a textbook. The logical Bell inequality techniques made a natural bridge between the two, and over the summer I was able to use this idea to extend my scrappy notes into a full post that I was pretty happy with. I was finally managing the kind of distillation work I’d been thinking about at the start of 2018.

After that I was on a roll, and found a second use for the logical Bell techniques. In my 2019 posts on negative probabilities I used a very simple toy model created by Dan Piponi as an illustrative example. I picked it because it was simple, but I was also intrigued by its relation to quantum physics – it’s structurally similar to qubit phase space, but the specific numbers are different. In a sense it’s even further from classical physics, with the negative probability being more negative than anything allowed in quantum physics.

I’d noticed before that this was interestingly parallel to a much more well-known case of something being ‘worse than quantum physics’, the Popescu-Rohrlich box, but thought it was only a vague similarity. Once I had the logical Bell tools I realised that there was an exact numerical analogy. I couldn’t find this described anywhere else, so I started writing it up.

Unfortunately this took long enough that it took me into the long depressing UK lockdown winter. The news was a constant stream of miserable statistics from the new covid variant mixed in with increasingly batshit US election nonsense, the weather was dark and grey, and working from home was getting more and more tedious. I eventually managed to finish the ‘worse than quantum mechanics’ stuff and get it out as two blog posts, but that overstressed my limited ability to care about things and once I published the posts I lost interest. I made some very half-hearted attempts to find out more about whether this was actually novel, and when this wasn’t completely straightforward I just dropped it. That was some time around February and I still haven’t picked it up again.

So… now what?

I’m writing this up now because I suddenly have a lot of free time. I’ve just quit my job – last day was last Friday – and haven’t lined up another one. I’m planning at least a couple of months off before I start thinking seriously about getting a new job. So this would be the perfect time to pick this up again. I’m not too bothered if I can’t get my attention back round to physics, because I have other weird projects that I am still keen to work on, but it does seem like a shame to just drop all this stuff. I’m not going to push it though.

The thing I’m feeling most is the lack of social support. I’m not naturally plugged into a community of people in quantum foundations who are thinking about similar topics, so it can be difficult to stay motivated. David MacIver has a great newsletter post on Maintaining Niche Interests, where he talks about struggling with the same problem:

“Nobody actually wants to know” is a bit unfair. It’s more like… there are people who are interested, but they are both less interested than I am in the subject, and also I don’t talk to them much. The people who I talk to on a regular basis are not interested, because this is mostly not their field.

I feel it even more keenly in comparison with some of my other interests that I talk about on this blog and newsletter and on Twitter, where I do have some sort of community. I can talk about some pretty niche topics – Derrida, Vygotsky, the Prussian education system – and get meaningful informed responses from other people. Book recommendations, suggestions for related areas to explore, that sort of thing. It’s not the same as being in a densely-networked in-person research group, but it goes a surprisingly long way.

The pandemic has definitely made it worse. I do normally get some sense of shared community from the physics society I’m in, which organises workshops and meetups (including the two really inspiring ones I went to in 2017). But it’s very much a community built around meeting in person, rather than around producing large quantities of English-language text on the public internet. We’ve tried a few online calls and talks, but it’s not the same.

Even without the pandemic, though, I struggle with this. I’m just not very good at collaborating when it comes to physics. A lot of this is rooted in defensiveness – I’m just weird for a physicist, kind of slow and mediocre technically and with an odd thinking style, highly focussed on examples and weak on abstraction. I go into any interaction worrying that I’m going to look stupid and expecting to not be able to get my point across, which makes it even harder to get my point across, which… you get the idea. It’s difficult. I think I could make good incremental progress on this in the same way I made progress on blogging, but getting the right supportive environment to start the feedback loop going is tricky. Physics culture is not known for providing what I want.

In the meantime I’m going to keep plugging on with other projects and not force anything. After all, it’s been a strange enough year that I should probably feel happy that I did anything at all. Hopefully my interest in physics will return soon and I can get a better sense of whether speculations have turned out well.

Speedrun: The Prussian education system

This is another of my research speedrun experiments – I’ve made a category for them now, so look at the earlier ones if you want to know more.

Today’s topic was inspired by this tweet:

I’d noticed this one too. If you hang around parts of the internet where people talk about how School Is Bad a lot, someone will eventually bring up ‘the Prussian education system’ and how it was designed to indoctrinate factory workers or something. There is never any detail beyond this, we all nod sagely and move on.

Presumably there is more to learn about this topic. Let’s set that one hour timer and find out…


Ok, so… um… where’s Prussia? Somewhere round where Germany is now presumably, but which bit?

Prussia was a historically prominent German state that originated in 1525 with a duchy centered on the region of Prussia on the southeast coast of the Baltic Sea… Prussia, with its capital first in Königsberg and then, when it became the Kingdom of Prussia in 1701, in Berlin, decisively shaped the history of Germany.

Ah, so it included Königsberg, of bridge fame. And a big stripe of Baltic coast at its peak (1870 map).

My historical knowledge is not great and this will be a problem for contextualising all this stuff. Ah well, just get a quick sense of time and space. Done space, in terms of time we have:

The name Prussia derives from the Old Prussians; in the 13th century, the Teutonic Knights—an organized Catholic medieval military order of German crusaders—conquered the lands inhabited by them. In 1308, the Teutonic Knights conquered the region of Pomerelia with (Danzig) Gdańsk.

Then bla bla bla usual complicated mid european wars…

The union of Brandenburg and the Duchy of Prussia in 1618 led to the proclamation of the Kingdom of Prussia in 1701.

Prussia entered the ranks of the great powers shortly after becoming a kingdom,[5][6][7][8] and exercised most influence in the 18th and 19th centuries.

Then lots of complicated 20th century history.

… The Kingdom ended in 1918 along with other German monarchies that collapsed as a result of the German Revolution.

etc etc up to

Prussia existed de jure until its formal abolition by the Allied Control Council Enactment No. 46 of 25 February 1947.

Right I am now an expert on Prussia, ten minutes down.

Next is the wikipedia article on the Prussian education system.

The Prussian education system refers to the system of education established in Prussia…

yep I got that bit…

… as a result of educational reforms in the late 18th and early 19th century, which has had widespread influence since. The Prussian education system was introduced as a basic concept in the late 18th century and was significantly enhanced after Prussia’s defeat in the early stages of the Napoleonic Wars. The Prussian educational reforms inspired other countries and remains important as a biopower in the Foucaultian sense for nation-building.

Oh so is Foucault the source of this meme?? ‘Biopower’ is a bit of jargon I hadn’t heard before, open in new tab.

The term itself is not used in German literature, which refers to the primary aspects of the Humboldtian education ideal respectively as the Prussian reforms; however, the basic concept remains fruitful and has led to various debates and controversies.

Open the Humboldtian thing in another tab.

I’ll go through the wikipedia page sections in turn.

Origin

The basic foundations of a generic Prussian primary education system were laid out by Frederick the Great with the Generallandschulreglement, a decree of 1763 which was written by Johann Julius Hecker. Hecker had already before (in 1748) founded the first teacher’s seminary in Prussia.

Haha wtf:

His concept of providing teachers with the means to cultivate mulberries for homespun silk, which was one of Frederick’s favorite projects, found the King’s favour.

So this is in some way related to the king’s pet mulberry growing project??

It expanded the existing schooling system significantly and required that all young citizens, both girls and boys, be educated by mainly municipality-funded schools from the age of 5 to 13 or 14.

OK so this was one of the first systems of tax funded compulsory education. (compare the UK where this happened in the 1880s, it was still fresh history at the time of Lark Rise)

Topics are reading, writing and god stuff:

The Prussian system consisted of an eight-year course of primary education, called Volksschule. It provided not only basic technical skills needed in a modernizing world (such as reading and writing), but also music (singing) and religious (Christian) education in close cooperation with the churches and tried to impose a strict ethos of duty, sobriety and discipline. Mathematics and calculus were not compulsory at the start, and taking such courses required additional payment by parents.

There were also later educational stages preparing for university.

Oh wow so it already had national testing and a national curriculum (that was a big controversy in the UK in the 1990s).

The Prussian system, after its modest beginnings, succeeded in reaching compulsory attendance, specific training for teachers, national testing for all students (both female and male students), a prescribed national curriculum for each grade and mandatory kindergarten.

So it really did have a lot of the features of modern schooling, I see why it comes up so often. Teacher training as well, and credential gating for the civil service:

In 1810, Prussia introduced state certification requirements for teachers, which significantly raised the standard of teaching.[9] The final examination, Abitur, was introduced in 1788, implemented in all Prussian secondary schools by 1812 and extended to all of Germany in 1871. Passing the Abitur was a prerequisite to entering the learned professions and higher echelons of the civil service.

Outreach

The overall system was soon widely admired for its efficiency and reduction of illiteracy, and inspired education leaders in other German states and a number of other countries, including Japan and the United States.

The Japan link could be interesting… won’t follow that tangent…

The underlying Humboldtian educational ideal of brothers Alexander and Wilhelm von Humboldt was about much more than primary education; it strived for academic freedom and the education of both cosmopolitan-minded and loyal citizens from the earliest levels. The Prussian system had strong backing in the traditional German admiration and respect for Bildung as an individual’s drive to cultivate oneself from within.

These reforms ‘… had a background in the middle and upper middle strata of society and were pioneered by the Bildungsbürgertum.’ Look up that word: ‘a social class that emerged in mid-18th century Germany as an educated class of the bourgeoisie with an educational ideal based on idealistic values and classical antiquity. The Bildungsbürgertum could be described as the intellectual and economic upper bourgeoisie’

The concept as such faced strong resistance both from the top, as major players in the ruling nobility feared increasing literacy among peasants and workers would raise unrest, and from the very poor, who preferred to use their children as early as possible for rural or industrial labor.

Reformers got their chance after the defeat of Prussia in the Napoleonic Wars.

In 1809 Wilhelm von Humboldt, having been appointed minister of education, promoted his idea of a generic education based on a neohumanist ideal of broad general knowledge, in full academic freedom without any determination or restriction by status, profession or wealth.

Now some stuff on interaction with the nationalist movement, featuring my friend Fichte from The Roots of Romanticism. OK so he was keen on education reform as a part of his German nationalism project:

Fichte and other philosophers, such as the Brothers Grimm, tried to circumvent the nobility’s resistance to a common German nation state via proposing the concept of a Kulturnation, nationhood without needing a state but based on a common language, musical compositions and songs, shared fairy tales and legends and a common ethos and educational canon.

Then something about a guy called Jahn who liked gymnastics a lot and shoehorned a bunch of it into the curriculum. The forefather of horrible PE lessons.

Also privileging of High German as an official language.

Now a lot of stuff about Pietism.

Pietist theology stressed the need for "inner spirituality" (Innerlichkeit [de]), to be found through the reading of Scripture. Consequently, Pietists helped form the principles of the modern public school system, including the stress on literacy, while more Calvinism-based educational reformers (English and Swiss) asked for externally oriented, utilitarian approaches and were critical of internally soul searching idealism.

Oh I see, this is important, Pietism actually wanted people to read! Yeah so there’s a whole cluster of interest groups coming together.

Shit I’m 30 minutes in and need to speed up a bit. This is all too interesting! Though normally the wiki article tails off later anyway, so maybe I’m ok.

Some stuff about attitudes to teachers:

Generations of Prussian and also German teachers, who in the 18th century often had no formal education and in the very beginning often were untrained former petty officers, tried to gain more academic recognition, training and better pay and played an important role in various protest and reform movements throughout the 19th and into the 20th century… There is a long tradition of parody and ridicule, where teachers were being depicted in a janus-faced manner as either authoritarian drill masters or, on the other hand, poor wretches which were suffering the constant spite of pranking pupils, negligent parents and spiteful local authorities.

Open ‘Biedermeier’ tab though I don’t have time to look at it… ‘an era in Central Europe between 1815 and 1848 during which the middle class grew in number and the arts appealed to common sensibilities’.

Spread to other countries

Austria first under Maria Theresa, then widely after the French Revolution. Estonia and Latvia, Norway and Sweden, Finnish nationalist movement.

France and the UK took longer, ‘France due to conflicts between a radical secular state and the Catholic Church’ and UK just because of generally not liking change I think. Some stuff in the US too, Horace Mann and the common school movement in Massachusetts.

Now a section about tensions between Prussian system and Anglo culture:

The basic concept of a state-oriented and administered mass educational system is still not granted in the English-speaking world, where either the role of the state as such or the role of state control specifically in education faces still (respectively again) considerable skepticism… One of the important differences is that in the German tradition, there is stronger reference to the state as an important principle, as introduced for example by Hegel’s philosophy of the state, which is in opposition to the Anglo-American contract-based idea of the state.

Ah here’s a bit on the interaction with the Prussian system and military and industrial aims:

Early Prussian reformers took major steps to abandon both serfdom and the line formation as early as 1807 and introduced mission-type tactics in the Prussian military in the same year. The latter enlarged freedom in execution of overall military strategies and had a major influence in the German and Prussian industrial culture, which profited from the Prussian reformers’ introduction of greater economic freedom. The mission-type concept, which was kept by later German armed forces, required a high level of understanding, literacy (and intense training and education) at all levels and actively invited involvement and independent decision making by the lower ranks.

Ah so I’m nearly at the end of the article with 18 minutes to go, the rest is postwar legacy and I’d rather stay more in the historical period. I’ll look up Humboldt first and then maybe Foucault’s biopower thing if time?

Humboldtian model

Haha that’s confusing, there are two different Humboldts with two different ideals:

This article is about Wilhelm von Humboldt’s university concept. For the romantic ideal of science related to Alexander von Humboldt, see Humboldtian science.

So this goes beyond vocational training:

Sometimes called simply the Humboldtian model, it integrates the arts and sciences with research to achieve both comprehensive general learning and cultural knowledge, and it is still followed today.

From his letter to the Prussian king:

There are undeniably certain kinds of knowledge that must be of a general nature and, more importantly, a certain cultivation of the mind and character that nobody can afford to be without. People obviously cannot be good craftworkers, merchants, soldiers or businessmen unless, regardless of their occupation, they are good, upstanding and – according to their condition – well-informed human beings and citizens.

Greek classics are important:

Humboldt believed that study of the Hellenic past would help the German national consciousness, reconciling it with modernity but distinguishing it from French culture, which he saw as rooted in the Roman tradition.

Academic freedom independent from political/economic/religious influences.

Study should be guided by humanistic ideals and free thought, and knowledge should be formed on the basis of logic, reason, and empiricism rather than authority, tradition, or dogma.

University reform:

The University of Berlin, founded in 1810 under the influence of Wilhelm von Humboldt and renamed the Humboldt University of Berlin after World War II, is traditionally seen as the model institution of the 19th century.

Fichte was appointed there by Humboldt.

The university’s features included a unity in teaching and research, the pursuit of higher learning in the philosophy faculty, freedom of study for students (Lernfreiheit, contrasted with the prescriptive curricula of the French system) and corporate autonomy for universities despite state funding.

Don’t have time to check now, but I wonder how this interacted with the history of the Ph.D. system. I know that started in Germany…

Haha, France banned beards:

It was in competition with the post-Revolutionary French concept of the grandes écoles. The French system lacked the freedom of German universities and instead imposed severe discipline and control over curriculum, awarding of degrees, conformity of views, and personal habits, instituting, for example, a ban on beards in 1852.

OK 9 minutes left for Foucault’s biopower:

It relates to the practice of modern nation states and their regulation of their subjects through "an explosion of numerous and diverse techniques for achieving the subjugations of bodies and the control of populations".[1] Foucault first used the term in his lecture courses at the Collège de France,[2][3] and the term first appeared in print in The Will to Knowledge, Foucault’s first volume of The History of Sexuality. In Foucault’s work, it has been used to refer to practices of public health, regulation of heredity, and risk regulation, among many other regulatory mechanisms often linked less directly with literal physical health.

So, control of the state over peoples’ bodies.

Modern power, according to Foucault’s analysis, becomes encoded into social practices as well as human behavior, as the human subject gradually acquiesces to subtle regulations and expectations of the social order. It is an integral feature and essential to the workings of—and makes possible the emergence of—the modern nation state, capitalism, etc.

Hm this is going to take me a long way off topic. The article has no mention of Prussian anything. Let’s go back to something else for 5 minutes… what’s this Bildungsbürgertum article…

As a class of wealthy non-noble people, emerging first in the free imperial cities, they gained material wealth, social position and a better education, which was based on Humboldt’s educational ideal. The idea of Bildung (i.e. culture, education) was shaped by a belief in human perfectibility, specifically that an individual’s potential could be realized through a classical education.

In the late absolutist management state there existed a need for a large number of educated officials to implement reforms. To avoid a violent revolution, as in France, a national class was formed that had access to cultural education and thus to political positions. As a result, many educational institutions were established, significantly more in Germany. The universities established in Germany, including the Humboldt University, became a model for modern universities in other countries. This new class was not primarily defined politically or economically, but mainly culturally.

And the Biedermeier article?

Although the term itself derives from a literary reference from the period, it is used mostly to denote the artistic styles that flourished in the fields of literature, music, the visual arts and interior design.

The Biedermeier period does not refer to the era as a whole, but to a particular mood and set of trends that grew out of the unique underpinnings of the time in Central Europe

Ah so the word comes from a parody:

The term "Biedermeier" appeared first in literary circles in the form of a pseudonym, Gottlieb Biedermaier, used by the country doctor Adolf Kussmaul and lawyer Ludwig Eichrodt in poems that the duo had published in the Munich satirical weekly Fliegende Blätter in 1850.[4]

The verses parodied the people of the era, namely Samuel Friedrich Sauter, a primary teacher and sort of amateurish poet, as depoliticized and petit-bourgeois.


Time’s up! That went pretty well. In terms of sources I didn’t even leave Wikipedia because there was plenty there, so maybe not the most exciting from that perspective. Got a bit distracted down rabbit holes at the end, but that normally happens.

I definitely know a bit more than just ‘boo Prussian education system’ now, and the historical background was interesting. It meshed pretty well with The Roots of Romanticism in terms of time and place, so I had a bit more context than I was expecting.

I’d still like to know why it’s such a meme online… does it trace through Foucault or something else? If you have any leads, let me know!

Funny Turns

After a discussion about obscure Google Scholar hits on twitter last night, I just remembered this long list I made a few years ago. If you dig around postmodern/continental stuff long enough you discover there are a lot of Turns. Linguistic turns, rhetorical turns, hermeneutic turns… I never really did figure out what it was all about (potential speedrun question?)

But what are the weirder ones? I previously did this with the ‘X and its Discontents’ snowclone, which was funnier because people use it for very specific things like Newport or the Lawn Chemical Economy. This time it was mostly long boring abstract adjectives, which is maybe why I never published it. Still, here they are…

Linguistic
Postmodern
Hermeneutic
Interpretive
Mobility
Affective
Boy
Pragmatic
Practice
Cultural
Cognitive
Communicative
Corporeal
Complexity
Constructive
Constructivist
Spatial
Social
Sociological
Sociopolitical
Argumentative
Multilingual
Relational
Semantic
Semiotic
Structural
Systemic
Governance
Ontological
Reflexive
Rhetorical
Computational
Digital
Empirical
Ideational
Educational
Postsecular
Spiritual
Ideological
Action
Local
Narrative
Translational
Demotic
Archival
Performative
Deliberative
Iconic
Postcolonial
Decolonial
Territorial
Infrastructure
Intersectional
Neuroscientific
Transnational
Descriptive
Practical
Material
Participatory
Deconstructive
Leaderist
Cosmopolitan
Biographical
Spectral
Qualitative
Moral
Normative
Visual
Theoretical
Curatorial
Evolutionary
Ecological
Algorithmic
Neoliberal
Intercultural
Ethnographic
Consumerist
Geological
Animal

Speedrun: “Sensemaking”

This is a genre of post I’ve been experimenting with where I pick a topic, set a one hour timer and see what I can find out in that time. Previously: Marx on alienation and the Vygotsky Circle.

I’ve been seeing the term ‘sensemaking’ crop up more and more often. I even went to a workshop with the word in the title last year! I quite like it, and god knows we could all do with making more sense right now, but I’m pretty vague on the details. Are there any nuances of meaning that I’m missing by interpreting it in its everyday sense? I have a feeling that it has a kind of ecological tinge, group sensemaking more than individual sensemaking, but I could be off the mark.

Also, what’s the origin of the term? I get the impression that it’s associated with some part of the internet that’s not too distant from my own corner, but I’m not exactly sure which one. Time to find out…


OK start with wikipedia:

https://en.wikipedia.org/wiki/Sensemaking

> Sensemaking or sense-making is the process by which people give meaning to their collective experiences. It has been defined as "the ongoing retrospective development of plausible images that rationalize what people are doing" (Weick, Sutcliffe, & Obstfeld, 2005, p. 409). The concept was introduced to organizational studies by Karl E. Weick in the 1970s and has affected both theory and practice.

Who’s Weick?

> Karl Edward Weick (born October 31, 1936) is an American organizational theorist who introduced the concepts of "loose coupling", "mindfulness", and "sensemaking" into organizational studies.

And, um, what’s organizational studies?

Organizational studies is "the examination of how individuals construct organizational structures, processes, and practices and how these, in turn, shape social relations and create institutions that ultimately influence people".[1]

OK, something sociology-related. It’s a stub so probably not a huge subfield?

Weick ‘key contributions’ subheadings: ‘enactment’, ‘loose coupling’, ‘sensemaking’, ‘mindfulness’, ‘organizational information theory’

> Although he tried several degree programs within the psychology department, the department finally built a degree program specifically for Weick and fellow student Genie Plog called "organizational psychology".[3]

Only quoting this bc Genie Plog is a great name.

So, enactment: ‘certain phenomena are created by being talked about’. Fine.

Loose coupling:

> Loose coupling in Weick’s sense is a term intended to capture the necessary degree of flex between an organization’s internal abstraction of reality, its theory of the world, on the one hand, and the concrete material actuality within which it finally acts, on the other.

Hm that could be interesting but might take me too far off topic.

Sensemaking:

> People try to make sense of organizations, and organizations themselves try to make sense of their environment. In this sense-making, Weick pays attention to questions of ambiguity and uncertainty, known as equivocality in organizational research that adopts information processing theory.

bit vague but the next bit is more concrete:

> His contributions to the theory of sensemaking include research papers such as his detailed analysis of the breakdown of sensemaking in the case of the Mann Gulch disaster,[8] in which he defines the notion of a ‘cosmology episode’ – a challenge to assumptions that causes participants to question their own capacity to act.

Mann Gulch was a big firefighting disaster:

> As the team approached the fire to begin fighting it, unexpected high winds caused the fire to suddenly expand, cutting off the men’s route and forcing them back uphill. During the next few minutes, a "blow-up" of the fire covered 3,000 acres (1,200 ha) in ten minutes, claiming the lives of 13 firefighters, including 12 of the smokejumpers. Only three of the smokejumpers survived. The fire would continue for five more days before being controlled.

> The United States Forest Service drew lessons from the tragedy of the Mann Gulch fire by designing new training techniques and safety measures that developed how the agency approached wildfire suppression. The agency also increased emphasis on fire research and the science of fire behavior.

This is interesting but I’m in danger of tab explosion here. Keep a tab open with the paper and move on. Can’t resist opening the cosmology episode page though:

> A cosmology episode is a sudden loss of meaning, followed eventually by a transformative pivot, which creates the conditions for revised meaning.

ooh nice. Weick again:

> "Representations of events normally hang together sensibly within the set of assumptions that give them life and constitute a ‘cosmos’ rather than its opposite, a ‘chaos.’ Sudden losses of meaning that can occur when an event is represented electronically in an incomplete, cryptic form are what I call a ‘cosmology episode.’ Representations in the electronic world can become chaotic for at least two reasons: The data in these representations are flawed, and the people who manage those flawed data have limited processing capacity. These two problems interact in a potentially deadly vicious circle."

This is the kind of page that looks like it was written by one enthusiast. But it is pretty interesting. Right, back to Weick.

‘Mindfulness’: this is at a collective, organisational level

> The effective adoption of collective mindfulness characteristics by an organization appears to cultivate safer cultures that exhibit improved system outcomes.

I’m not going to look up ‘organizational information theory’, I have a bit of a ‘systems thinking’ allergy and I don’t wanna.

Right, back to sensemaking article. Roots in social psychology. ‘Shifting the focus from organizations as entities to organizing as an activity.’

‘Seven properties of sensemaking’. Ugh I hate these sort of numbered lists but fine.

  1. Identity. ‘who people think they are in their context shapes what they enact and how they interpret events’

  2. Retrospection. ‘the point of retrospection in time affects what people notice (Dunford & Jones, 2000), thus attention and interruptions to that attention are highly relevant to the process’.

  3. Enaction. ‘As people speak, and build narrative accounts, it helps them understand what they think, organize their experiences and control and predict events’

  4. Social activity. ‘plausible stories are preserved, retained or shared’.

  5. Ongoing. ‘Individuals simultaneously shape and react to the environments they face… As Weick argued, "The basic idea of sensemaking is that reality is an ongoing accomplishment that emerges from efforts to create order and make retrospective sense of what occurs"’

  6. Extract cues from the context.

  7. Plausibility over accuracy.

The sort of gestalt I’m getting is that it focusses on social rather than individual thinking, and action-oriented contextual in-the-thick-of-it doing rather than abstract planning ahead. Some similar terminology to ethnomethodology I think? e.g. accountability.

Ah yeah: ‘Sensemaking scholars are less interested in the intricacies of planning than in the details of action’

> The sensemaking approach is often used to provide insight into factors that surface as organizations address either uncertain or ambiguous situations (Weick 1988, 1993; Weick et al., 2005). Beginning in the 1980s with an influential re-analysis of the Bhopal disaster, Weick’s name has come to be associated with the study of the situated sensemaking that influences the outcomes of disasters (Weick 1993).

‘Categories and related concepts’:

> The categories of sensemaking included: constituent-minded, cultural, ecological, environmental, future-oriented, intercultural, interpersonal, market, political, prosocial, prospective, and resourceful. The sensemaking-related concepts included: sensebreaking, sensedemanding, sense-exchanging, sensegiving, sensehiding, and sense specification.

Haha OK it’s this sort of ‘fluidity soup’ that I have an allergy to. Too many of these buzzwords together. ‘Systems thinking’ is just a warning sign.

‘Other applications’: military stuff. Makes sense, lots of uncertainty and ambiguity there. Patient safety (looks like another random paragraph added by an enthusiast).

There’s a big eclectic ‘see also’ list. None of those are jumping out as the obvious next follow. Back to google. What I really want to know is why people are using this word now in some internet subcultures. Might be quite youtube centred? In which case there is no hope of tracking it down in one speedrun.

Oh yeah let’s look at google images:

Looks like businessy death by powerpoint contexts, not so helpful.

31 minutes left. Shit this goes quick!!

Google is giving me lots of video links. One is Daniel Schmachtenberger, ‘The War on Sensemaking’. Maybe this is the subcultural version I’ve been seeing? His name is familiar. Ok google ‘daniel schmachtenberger sensemaking’. Rebel Wisdom. Yep I’ve vaguely heard of that.

OK here is a Medium post about that series, by Andrew Sweeny:

> There is a war going on in our current information ecosystem. It is a war of propaganda, emotional manipulation, blatant or unconscious lies. It is nothing new, but is reaching a new intensity as our technology evolves. The result is that it has become harder and harder to make sense of the world, with potentially fatal consequences. If we can’t make sense of the world, neither can we make good decisions or meet the many challenges we face as a species.

Yes this is the sort of context I was imagining:

> In War on Sensemaking, futurist and visionary Daniel Schmachtenberger outlines in forensic detail the dynamics at play in this new information ecology — one in which we are all subsumed. He explores how companies, government, and media take advantage of our distracted and vulnerable state, and how we as individuals can develop the discernment and sensemaking skills necessary to navigate this new reality. Schmachtenberger has an admirable ability to diagnose this issue, while offering epistemological and practical ways to help repair the dark labyrinth of a broken information ecology.

It’d be nice to trace the link from Weick to this.

Some stuff about zero sum games and bullshit. Mentions Vervaeke.

> Schmachtenberger also makes the point that in order to become a good sensemaker we need ‘stressors’ — demands that push our mind, body, and heart beyond comfort, and beyond the received wisdom we have inherited. It is not enough to passively consume information: we first need to engage actively with the information ecology we live in and start being aware of how we respond to it, where it is coming from, and why it is being used.

Getting the sense that ‘information ecology’ is a key phrase round here.

Oh yeah ‘Game B’! I’ve heard that phrase around. Some more names: ‘Jordan Hall, Jim Rutt, Bonnita Roy’.

‘Sovereignty’: ‘becoming responsible for our own shit’… ‘A real social, ‘kitchen sink level’ of reality must be cultivated to avoid the dangers of too much abstraction, individualism, and idealism.’ Seems like a good idea.

‘Rule Omega’. This one is new to me:

> Rule Omega is simple, but often hard to put into practice. The idea is that every message contains some signal and some noise, and we can train ourselves to distinguish truth and nonsense — to separate the wheat from the chaff. If we disapprove of 95% of a distasteful political rant, for instance, we could train ourselves to hear the 5% that is true.

> Rule Omega means learning to recognise the signal within the noise. This requires a certain attunement and generosity towards the other, especially those who think differently than we do. And Rule Omega can only be applied to those who are willing to engage in a different game, and work with each other in good faith.

Also seems like a Good Thing. Then some stuff about listening to people outside your bubble. Probably a link here to ‘memetic tribes’ type people.

This is a well written article, glad I picked something good.

‘Information war’ and shadow stuff:

> Certainly there are bad actors and conspiracies to harm us, but there is also the ‘shadow within’. The shadow is the unacknowledged part we play in the destruction of the commons and in the never-ending vicious cycle of narrative war. We need to pay attention to the subtle lies we tell ourselves, as much as the ‘big’ lies that society tells us all the time. The trouble is: we can’t help being involved in destructive game theory logic, to a greater or lesser degree.

‘Anti-rivalrous systems’. Do stuff that increases value for others as well as yourself. Connection to ‘anti-rivalrous products’ in economics.

‘Information immune system’. Yeah this is nice! It sort of somehow reminds me of the old skeptics movement in its attempts to help people escape nonsense, but rooted in a warmer and more helpful set of background ideas, and with less tribal outgroup bashing. Everything here sounds good and if it helps people out of ideology prisons I’m all for it. Still kind of curious about intellectual underpinnings… like is there a straight line from Weick to this or did they just borrow a resonant phrase?

‘The dangers of concepts’. Some self-awareness that these ideas can be used to create more bullshit and misinformation themselves.

> As such it can be dangerous to outsource our sensemaking to concepts — instead we need to embody them in our words and actions. Wrestling with the snake of self-deception and illusion and trying to build a better world in this way is a tough game. But it is the only game worth playing.

Games seem to be a recurring motif. Maybe Finite and Infinite Games is another influence.

OK 13 minutes left, what to do? Maybe trace out the link? google ‘schmachtenberger weick’. Not finding much. I’m now on some site called Conversational Leadership which seems to be connected to this scene somehow. Ugh not sure what to do. Back to plain old google ‘sensemaking’ search.

Let’s try this article by Laura McNamara, an organizational anthropologist. Nice job title! Yeah her background looks really interesting:

> Principal Member of Technical Staff at Sandia National Laboratories. She has spent her career partnering with computer scientists, software engineers, physicists, human factors experts, I/O psychologists, and analysts of all sorts.

OK maybe she is trying to bridge the gap between old and new usages:

> Sensemaking is a term that gets thrown around a lot without much consideration about where the concept came from or what it really means. If sensemaking theory is democratizing, that’s a good thing.

6 minutes left so I won’t get through all of this. Pick some interesting bits.

> One of my favorite books about sensemaking is Karl Weick’s, Sensemaking in Organizations. I owe a debt of thanks to the nuclear engineer who suggested I read it. This was back in 2001, when I was at Los Alamos National Laboratory (LANL). I’d just finished my dissertation and was starting a postdoctoral position in the statistics group, and word got around that the laboratories had an anthropologist on staff. My nuclear engineer friend was working on a project examining how management changes were impacting team dynamics in one of LANL’s radiochemistry bench laboratories. He called me asking if I had time to work on the project with him, and he asked if I knew much about “sensemaking.” Apparently, his officemate had recently married a qualitative evaluation researcher, who suggested that both of these LANL engineers take the time to read Karl Weick’s book Sensemaking in Organizations.

> My nuclear engineer colleague thought it was the most brilliant thing he’d ever read and was shocked, SHOCKED, that I’d never heard of sensemaking or Karl Weick. I muttered something about anthropologists not always being literate in organizational theory, got off the phone, and immediately logged onto Amazon and ordered it.

Weick’s influences:

> … a breathtakingly broad array of ideas – Emily Dickinson, Anthony Giddens, Pablo Neruda, Edmund Leach…

‘Recipe for sensemaking:’

> Chapter Two of Sensemaking in Organizations contains what is perhaps Weick’s most cited sentence, the recipe for sensemaking: “How can I know what I think until I see what I say?”

And this from the intro paragraph, could be an interesting reference:

> in his gorgeous essay Social Things (which you should read if you haven’t already), Charles Lemert reminds us that social science articulates our native social intelligence through instruments of theory, concepts, methods, language, discourse, texts. Really good sociology and anthropology sharpen that intelligence. They’re powerful because they enhance our understanding of what it means to be human, and they really should belong to everyone.

Something about wiki platforms for knowledge sharing:

> For example, back in 2008, my colleague Nancy Dixon and I did a brief study—just a few weeks—examining how intelligence analysts were responding to the introduction of Intellipedia, a wiki platform intended to promote knowledge exchange and cross-domain collaboration across the United States Intelligence community.

DING! Time’s up.


That actually went really well! Favourite speedrun so far, felt like I found out a lot. Most of the references I ended up on were really well-written and clear this time, no wading through rubbish.

I’m still curious to trace the link between Weick and the recent subculture. Also I might read more of the disaster stuff, and read that last McNamara article more carefully. Lots to look into! If anyone has any other suggestions, please leave a comment 🙂

Worse than quantum physics, part 2

This is Part 2 of a two part explanation — Part 1 is here. It won’t make much sense on its own!

In this post I’m going to get into the details of the analogy I set up last time. So far I’ve described how the PR box is ‘worse than quantum physics’ in a specific sense: it violates the CHSH inequality more strongly than any quantum system, pushing past the Tsirelson bound of 2\sqrt{2} to reach the maximum possible value of 4. I also introduced Piponi’s box example, another even simpler ‘worse than quantum physics’ toy system.

This time I’ll explain the connection between Piponi’s box and qubit phase space, and then show that a similar CHSH-inequality-like ‘logical Bell inequality’ holds there too. In this case the quantum system has a Tsirelson-like bound of \sqrt{3}, interestingly intermediate between the classical limit of 1 and the maximum possible value of 3 obtained by Piponi’s box. Finally I’ll dump a load of remaining questions into a Discussion section in the hope that someone can help me out here.

A logical Bell inequality for the Piponi box

Here’s the table from the last post again:


| Measurement | T | F |
|---|---|---|
| a | **1** | 0 |
| b | **1** | 0 |
| a \oplus b | **1** | 0 |

As with the PR box, we can use the highlighted cells in the table to get a version of Abramsky and Hardy’s logical Bell inequality \sum p_i \leq N-1, this time with N = 3. The highlighted cells correspond to the three incompatible propositions a, b, a\oplus b, whose probabilities sum to \sum p_i = 3, violating the inequality by the maximum amount.

Converting to expected values E_i = 2p_i -1 gives

\sum E_i = 3 > N-2 = 1.

So that’s the Piponi box ↔ PR box part of the analogy sorted. Next I want to talk about the qubit phase space ↔ Bell state part. But first it will be useful to rewrite the table of Piponi box results in a way that makes the connection to qubit phase space more obvious:



The four boxes represent the four ‘probabilities’ P(a,b) introduced in the previous post, which can be negative. To recover the values in the table, add up rows, columns or diagonals of the diagram. For example, to find p(\lnot a), sum up the left hand column:

p(\lnot a) = P(\lnot a, b) + P(\lnot a, \lnot b) = \frac{1}{2} - \frac{1}{2} = 0.

Or to find p(a \oplus b), sum up the top-left-to-bottom-right diagonal:

p(a \oplus b) = P(a, \lnot b) + P(\lnot a, b) = \frac{1}{2} + \frac{1}{2} = 1.
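If you want to check these sums without squinting at the diagram, here’s a quick Python sketch that takes the four P(a,b) values above and does the row, column and diagonal sums, plus the logical Bell check:

```python
# Quasi-probabilities for the Piponi box, keyed by the truth values of (a, b)
P = {(True, True): 0.5, (True, False): 0.5,
     (False, True): 0.5, (False, False): -0.5}

def p(pred):
    """Probability of a predicate over (a, b): sum the quasi-probabilities where it holds."""
    return sum(v for (a, b), v in P.items() if pred(a, b))

p_a, p_b, p_xor = p(lambda a, b: a), p(lambda a, b: b), p(lambda a, b: a != b)
print(p_a, p_b, p_xor)            # 1.0 1.0 1.0, matching the table
print(p(lambda a, b: not a))      # 0.0, the p(not a) example above
print(p_a + p_b + p_xor)          # 3.0 > N - 1 = 2, the logical Bell violation
```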

I made the diagram below to show how this works in general, and now I’m not sure whether that was a good idea. It’s kind of busy and looking at the example above is probably a lot more helpful. On the other hand, I’ve gone through the effort of making it now and someone might find it useful, so here it is:


Qubit phase space

That’s the first part of the analogy done, between the PR box and Piponi’s box model. Now for the second part, between the CHSH system and qubit phase space. I want to show that the same set of measurements that I used for Piponi’s box also crops up in quantum mechanics as measurements on the phase space of a single qubit. This quantum case also violates the classical bound of \sum E_i = 1, but, as with the Tsirelson bound for an entangled qubit system, it doesn’t reach the maximum possible value. Instead, it tops out at \sum E_i = \sqrt{3}.

The measurements a, b, a\oplus b can be instantiated for a qubit in the following way. For a qubit |\psi\rangle, take

p(a)  = \langle \psi | Q_z | \psi \rangle ,

p(b) = \langle \psi | Q_x | \psi \rangle ,

with Q_i  = \frac{1}{2}(I-\sigma_i) for the Pauli matrices \sigma_i. The a\oplus b diagonal measurements then turn out to correspond to

p(a\oplus b) = \langle \psi | Q_y | \psi \rangle ,

completing the set of measurements.

This is the qubit phase space I described in my second post on negative probability – for more details on how this works and how the corresponding P(a,b)s are calculated, see for example the papers by Wootters on finite-state Wigner functions and Picturing Qubits in Phase Space.
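Here’s a minimal numpy sketch of those definitions, nothing more than the Q_i operators written out explicitly (the function name is just for this sketch); the last line checks the |0\rangle case that comes next:

```python
import numpy as np

# Pauli matrices and the measurement operators Q_i = (I - sigma_i) / 2
I2 = np.eye(2)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)
Qz, Qx, Qy = (I2 - sz) / 2, (I2 - sx) / 2, (I2 - sy) / 2

def box_probs(psi):
    """Return (p(a), p(b), p(a XOR b)) = expectation values of Qz, Qx, Qy in state psi."""
    psi = psi / np.linalg.norm(psi)
    return tuple(float(np.real(psi.conj() @ Q @ psi)) for Q in (Qz, Qx, Qy))

print(box_probs(np.array([1, 0], dtype=complex)))   # |0>: (0.0, 0.5, 0.5)
```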

As a simple example, in the case of the qubit state |0\rangle these measurements give

p(a) = 0

p(b) = \frac{1}{2}

p(a\oplus b) = \frac{1}{2},

leading to the following phase space:



A Tsirelson-like bound for qubit phase space

Now, we want to find the qubit state |\psi\rangle which gives the largest value of \sum p_i. To do this, I wrote out |\psi\rangle in the general Bloch sphere form |\psi\rangle = \cos(\theta / 2) |0\rangle + e^{i\phi} \sin(\theta / 2) |1\rangle and then maximised the sum of the highlighted cells in the table:

\sum p_i = p(a) + p(b) + p(a\oplus b) = \frac{3}{2} - \frac{1}{2}(\cos\theta + \sin\theta\cos\phi + \sin\theta\sin\phi )

This is a straightforward calculation but the details are kind of fiddly, so I’ve relegated them to a separate page (like the boring technical appendix at the back of a paper, but blog post style). Anyway the upshot is that this quantity is maximised when \phi = \frac{5\pi}{4} , \sin\theta = \frac{\sqrt{2}}{\sqrt{3}} and \cos\theta = -\frac{1}{\sqrt{3}}, leading to the following table:


| Measurement | T | F |
|---|---|---|
| a | **\frac{1}{2}\left(1 + \frac{1}{\sqrt{3}} \right)** | \frac{1}{2}\left(1 - \frac{1}{\sqrt{3}} \right) |
| b | **\frac{1}{2}\left(1 + \frac{1}{\sqrt{3}} \right)** | \frac{1}{2}\left(1 - \frac{1}{\sqrt{3}} \right) |
| a \oplus b | **\frac{1}{2}\left(1 + \frac{1}{\sqrt{3}} \right)** | \frac{1}{2}\left(1 - \frac{1}{\sqrt{3}} \right) |

The corresponding qubit phase space, if you’re interested, is the following:


Notice the negative ‘probability’ in the bottom left, with a value of around -0.183. This is in fact the most negative value possible for qubit phase space.

This time, adding up the numbers in the highlighted cells of the table gives

\sum p_i = \frac{3}{2}\left(1 + \frac{1}{\sqrt{3}} \right),

or, in terms of expectation values,

\sum E_i = \sum (2p_i - 1) =   \sqrt{3}.

So \sqrt{3} is our Tsirelson-like bound for this system, in between the classical limit of 1 and the Piponi box value of 3.
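If you’d rather not wade through the appendix-page trigonometry, a brute-force grid search over the Bloch sphere gives the same answer (this just maximises the expression for \sum p_i above):

```python
import numpy as np

# psi = cos(theta/2)|0> + e^{i phi} sin(theta/2)|1>, so
# sum p_i = 3/2 - (cos(theta) + sin(theta) cos(phi) + sin(theta) sin(phi)) / 2
thetas = np.linspace(0, np.pi, 1001)
phis = np.linspace(0, 2 * np.pi, 1001)
T, P = np.meshgrid(thetas, phis)
S = 1.5 - 0.5 * (np.cos(T) + np.sin(T) * (np.cos(P) + np.sin(P)))

i = np.unravel_index(S.argmax(), S.shape)
print(S.max())                        # ~2.366 = (3/2)(1 + 1/sqrt(3))
print(2 * S.max() - 3, np.sqrt(3))    # sum E_i ~ 1.732, i.e. sqrt(3)
print(np.cos(T[i]), P[i])             # cos(theta) ~ -1/sqrt(3), phi ~ 5 pi / 4
```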


Further questions

As with all of my physics blog posts, I end up with more questions than I started with. Here are a few of them:

Is this analogy already described in some paper somewhere? If so, please point me at it!

Numerology. Why \sqrt{3} and not some other number? As a first step, I can do a bit of numerology and notice that \sqrt{3} = \sqrt{N/2}, where N=6 is the number of cells in the table, and that this rule also fits the CHSH bound of 2\sqrt{2}, where there are N=16 cells.

I can also try this formula on the Mermin example from my Bell post. In that case N=36, so the upper bound implied by the rule would be 3\sqrt{2} … which turns out to be correct. (I didn’t find the upper bound in the post, but you can get it by putting \tfrac{1}{8}(2+\sqrt 2) in all the highlighted cells of the table, similarly to CHSH.)

The Mermin example is close enough to CHSH that it’s not really an independent data point for my rule, but it’s reassuring that it still fits, at least.

What does this mean? Does it generalise? I don’t know. There’s a big literature on different families of Bell results and their upper bounds, and I don’t know my way around it.

Information causality. OK, playing around with numbers is fine, but what does it mean conceptually? Again, I don’t really know my way around the literature. I know there’s a bunch of papers, starting from this one by Pawlowski et al, that introduces a physical principle called ‘information causality’. According to that paper, this states that, for a sender Alice and a receiver Bob,

> the information gain that Bob can reach about the previously unknown to him data set of Alice, by using all his local resources and m classical bits communicated by Alice, is at most m bits.

This principle somehow leads to the Tsirelson bound… as you can see I have not looked into the details yet. This is probably what I should do next. It’s very much phrased in terms of having two separated systems, so I don’t know whether it can be applied usefully in my case of a single qubit.

If you have any insight into any of these questions, or you notice any errors in the post, please let me know in the comments below, or by email.

Worse than quantum physics

I’m still down the rabbithole of thinking way too much about quantum foundations and negative probabilities, and this time I came across an interesting analogy, which I will attempt to explain in this post and the next one. This should follow on nicely from my last post, where I talked about one of the most famous weird features of quantum physics, the violation of the Bell inequalities.

It’s not necessary to read all of that post to understand this one, but you will need to be somewhat familiar with the Bell inequalities (and the CHSH inequality in particular) from somewhere else. For the more technical parts, you’ll also need to know a little bit about Abramsky and Hardy’s logical Bell formulation, which I also covered in the last post. But the core idea probably makes some kind of sense without that background.

So, in that last post I talked about the CHSH inequality and how quantum physics violates the classical upper limit of 2. The example I went through in the post is designed to make the numbers easy, and reaches a value of 2.5, but it’s possible to pick a set of measurements that pushes it further again, to a maximum of 2\sqrt{2} (which is about 2.828). This value is known as the Tsirelson bound.

This maximum value is higher than anything allowed by classical physics, but doesn’t reach the absolute maximum that’s mathematically attainable. The CHSH inequality is normally written something like this:

| E(a,b) + E(\bar{a}, b) + E(a, \bar{b}) - E(\bar{a}, \bar{b}) | \leq 2.

Each of the Es has to be between -1 and +1, so if it was possible to always measure +1 for the first three and -1 for the last one you’d get 4.

This kind of hypothetical ‘superquantum correlation’ is interesting because of the potential to illuminate what’s special about the Tsirelson bound – why does quantum mechanics break the classical limit, but not go all the way? So systems that are ‘worse than quantum physics’ and push all the way to 4 are studied as toy models that can hopefully illuminate something about the constraints on quantum mechanics. The standard example is known as the Popescu-Rohrlich (PR) box, introduced in this paper.

This sounds familiar…

I was reading up on the PR box a while back, and it reminded me of something else I looked into. In my blog posts on negative probability, I used a simple example due to Dan Piponi. This example has the same general structure as measurements on a qubit, but it’s also ‘worse than quantum mechanics’, in the sense that one of the probabilities is more negative than anything allowed in quantum mechanics. Qubits are somewhere in the middle, in between classical systems and the Piponi box.

I immediately noticed the similarity, but at first I thought it was probably something superficial and didn’t investigate further. But after learning about Abramsky and Hardy’s logical formulation of the Bell inequalities, which I covered in the last post, I realised that there was an exact analogy.

This is really interesting to me, because I had no idea that there was any sort of Tsirelson bound equivalent for a single particle system. I’ve already spent quite a bit of time in the last couple of years thinking about the phase space of a single qubit, because it seems to me that a lot of essential quantum weirdness is hidden in there already, before you even consider entanglement with a second qubit – you’ve already got the negative probabilities, after all. But I wasn’t expecting this other analogy to turn up.

I haven’t come across this result in the published literature. But I also haven’t done anything like a thorough search, and it’s quite difficult to search for, because Piponi’s example is in a blog post, rather than a paper. So maybe it’s new, or maybe it’s too simple to write down and stuck in the ghost library, or maybe it’s all over the place and I just haven’t found it yet. I really don’t know, and it seemed like the easiest thing was to just write it up and then try and find out once I had something concrete to point at. I am convinced it hasn’t been written up at anything like a blog-post-style introductory level, so hopefully this can be useful however it turns out.

Post structure

I decided to split this argument into two shorter parts and post them separately, to make it more readable. This first part is just background on the Tsirelson bound and the PR box – there’s nothing new here, but it was useful for me to collect the background I need in one place. I also give a quick description of Piponi’s box model.

In the second post, I’ll move on to explaining the single qubit analogy. This is the interesting bit!

The Tsirelson bound: Mermin’s machine again

To illustrate how Tsirelson’s bound is attained, I’ll go back to Mermin’s machine from the last post. I’ll use the same basic setup as before, but move the settings on the detectors:


This time the two settings on each detector are at right angles to each other, and the right hand detector settings are rotated 45 degrees from the left hand detector. As before, quantum mechanics says that the probabilities of different combinations of lights flashing will obey

p(T,T) = p(F,F) = \frac{1}{2}\cos^2\left(\frac{\theta}{2}\right),

p(T,F) = p(F,T) = \frac{1}{2}\sin^2\left(\frac{\theta}{2}\right),

where \theta is the angle between the detector settings. The numbers are more of a hassle than in Mermin’s example, which was picked for simplicity – here’s the table of probabilities:


| Dial setting | (T,T) | (T,F) | (F,T) | (F,F) |
|---|---|---|---|---|
| ab | **\tfrac{1}{8}(2+\sqrt 2)** | \tfrac{1}{8}(2-\sqrt 2) | \tfrac{1}{8}(2-\sqrt 2) | **\tfrac{1}{8}(2+\sqrt 2)** |
| ab' | \tfrac{1}{8}(2-\sqrt 2) | **\tfrac{1}{8}(2+\sqrt 2)** | **\tfrac{1}{8}(2+\sqrt 2)** | \tfrac{1}{8}(2-\sqrt 2) |
| a'b | **\tfrac{1}{8}(2+\sqrt 2)** | \tfrac{1}{8}(2-\sqrt 2) | \tfrac{1}{8}(2-\sqrt 2) | **\tfrac{1}{8}(2+\sqrt 2)** |
| a'b' | **\tfrac{1}{8}(2+\sqrt 2)** | \tfrac{1}{8}(2-\sqrt 2) | \tfrac{1}{8}(2-\sqrt 2) | **\tfrac{1}{8}(2+\sqrt 2)** |

Then we follow the logical Bell procedure of the last post, take a set of mutually contradictory propositions (the highlighted cells) and add up their probabilities. This gives \sum p_i = 2+\sqrt 2, or, converting to expectation values E_i = 2p_i - 1,

\sum E_i = 2\sqrt 2 .

This is the Tsirelson bound.
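To be explicit about which cells are highlighted: the four propositions are ‘lights the same’ for the ab, a'b and a'b' settings, and ‘lights different’ for ab' (the bold cells above). A quick check of the arithmetic:

```python
from math import sqrt

# Rows of the table above, as (TT, TF, FT, FF)
hi, lo = (2 + sqrt(2)) / 8, (2 - sqrt(2)) / 8
rows = {"ab": (hi, lo, lo, hi), "ab'": (lo, hi, hi, lo),
        "a'b": (hi, lo, lo, hi), "a'b'": (hi, lo, lo, hi)}

same = lambda r: r[0] + r[3]
diff = lambda r: r[1] + r[2]

p = [same(rows["ab"]), diff(rows["ab'"]), same(rows["a'b"]), same(rows["a'b'"])]
print(sum(p), 2 + sqrt(2))                       # both ~3.414
print(sum(2 * x - 1 for x in p), 2 * sqrt(2))    # both ~2.828, the Tsirelson bound
```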

The PR box

The idea of the PR box is to get the highest violation of the inequality possible, by shoving all of the probability into the highlighted cells, like this:

| Dial setting | (T,T) | (T,F) | (F,T) | (F,F) |
|---|---|---|---|---|
| ab | **1/2** | 0 | 0 | **1/2** |
| a\bar{b} | **1/2** | 0 | 0 | **1/2** |
| \bar{a}b | **1/2** | 0 | 0 | **1/2** |
| \bar{a}\bar{b} | 0 | **1/2** | **1/2** | 0 |

This time each of the four highlighted propositions has probability 1, so adding up the highlighted cells gives \sum p_i = 4 and the expectation values reach the maximum possible \sum E_i = 4.

Signalling

This is kind of an aside in the context of this post, but the original motivation for the PR box was to demonstrate that you could push past the quantum limit while still not allowing signalling between the two devices: if you only have access to the left hand box, for example, you can’t learn anything about the right hand box’s dial setting. Say you set the left hand box to dial setting a. If the right hand box was set to b you’d end up measuring T with a probability of

p(T,T| a,b) + p(T,F| a,b) = \frac{1}{2} + 0 = \frac{1}{2}.

If the right hand box was set to \bar{b} instead you’d still get \frac{1}{2}:

p(T,T| a,\bar{b}) + p(T,F| a,\bar{b}) = \frac{1}{2} + 0 = \frac{1}{2}.

The same conspiracy holds if you set the left hand box to \bar{a}, so whatever you do you can’t find out anything about the right hand box.

Negative probabilities

Another interesting feature of the PR box, which will be directly relevant here, is the connection to negative probabilities. Say you want to explain the results of the PR box in terms of underlying probabilities P(a,\bar{a},b,\bar{b}) for all of the settings at once. This can’t be done in terms of normal probabilities, which is not surprising: this property of having consistent results independent of the measurement settings you choose is exactly what’s broken down for non-classical systems like the CHSH system and the PR box.

However you can reproduce the results if you allow some negative probabilities. In the case of the PR box, you end up with the following:


P(T,T,T,T) = \frac{1}{2}

P(T,T,T,F) = 0

P(T,T,F,T) = -\frac{1}{2}

P(T,T,F,F) = 0

P(T,F,T,T) = 0

P(T,F,T,F) = 0

P(T,F,F,T) = \frac{1}{2}

P(T,F,F,F) = 0

P(F,T,T,T) = -\frac{1}{2}

P(F,T,T,F) = \frac{1}{2}

P(F,T,F,T) = \frac{1}{2}

P(F,T,F,F) = 0

P(F,F,T,T) = 0

P(F,F,T,F) = 0

P(F,F,F,T) = 0

P(F,F,F,F) = 0

(I got these from Abramsky and Brandenburger’s An Operational Interpretation of Negative Probabilities and No-Signalling Models.) To get back the probabilities in the table above, sum up all relevant Ps for each dial setting. As an example, take the top left cell of the table above. To get the probability of (T,T) for dial setting (a,b), sum up all cases where a and b are both T:

P(T,T,T,T) + P(T,T,T,F) + P(T,F,T,T) + P(T,F,T,F) = \frac{1}{2}

In this way we recover the values of all the measurements in the table – it’s only the Ps that are negative, not anything we can actually measure. This feature, along with the way that the number -\tfrac{1}{2} crops up specifically, is what reminded me of Piponi’s blog post.
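Here’s a sketch of that marginalisation in full, using only the sixteen P values above; it rebuilds the whole PR box table, checks the no-signalling property from the previous section, and adds up the highlighted cells:

```python
from itertools import product

# Nonzero entries of the quasi-distribution over (a, abar, b, bbar); everything else is 0
P = {(True, True, True, True): 0.5,   (True, True, False, True): -0.5,
     (True, False, False, True): 0.5, (False, True, True, True): -0.5,
     (False, True, True, False): 0.5, (False, True, False, True): 0.5}

def cell(left_setting, right_setting, left_val, right_val):
    """Probability of (left_val, right_val) for one choice of dial settings (0 = unbarred, 1 = barred)."""
    total = 0.0
    for a, abar, b, bbar in product([True, False], repeat=4):
        if (a, abar)[left_setting] == left_val and (b, bbar)[right_setting] == right_val:
            total += P.get((a, abar, b, bbar), 0.0)
    return total

outcomes = [(True, True), (True, False), (False, True), (False, False)]
for ls, rs in product([0, 1], repeat=2):
    print(ls, rs, [cell(ls, rs, lv, rv) for lv, rv in outcomes])
# Reproduces the PR box table row by row, and every measurable entry is non-negative

# No-signalling: p(left box flashes T | setting a) is 1/2 whichever setting the right box uses
print(cell(0, 0, True, True) + cell(0, 0, True, False),
      cell(0, 1, True, True) + cell(0, 1, True, False))   # 0.5 0.5

# The four highlighted propositions all have probability 1, so sum E_i = 4
same = lambda ls, rs: cell(ls, rs, True, True) + cell(ls, rs, False, False)
diff = lambda ls, rs: cell(ls, rs, True, False) + cell(ls, rs, False, True)
print(sum(2 * p - 1 for p in [same(0, 0), same(0, 1), same(1, 0), diff(1, 1)]))   # 4.0
```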

Piponi’s box model

The device in Piponi’s example is a single box containing two bits a and b, and you can make one of three measurements: the value of a, the value of b, or the value of a \oplus b. The result is either T or F, with probabilities that obey the following table:


| Measurement | T | F |
|---|---|---|
| a | 1 | 0 |
| b | 1 | 0 |
| a \oplus b | 1 | 0 |

These measurements are inconsistent and can’t be described with any normal probabilities P(a,b), but, as with the PR box, they can with negative probabilities:

P(T,T) = \frac{1}{2}

P(T,F) = \frac{1}{2}

P(F,T) = \frac{1}{2}

P(F,F) = -\frac{1}{2}

For example, the probability of measuring a\oplus b and getting F is

P(T,T) + P(F,F) = \frac{1}{2} - \frac{1}{2} = 0,

as in the table above.
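In fact the negative entry isn’t just one way of doing it, it’s forced. The table plus normalisation gives four linear constraints on the four values of P(a,b), and solving them lands you on exactly the quasi-distribution above:

```python
import numpy as np

# Unknowns x = [P(T,T), P(T,F), P(F,T), P(F,F)]
A = np.array([[1, 1, 0, 0],    # p(a = T)       = P(T,T) + P(T,F) = 1
              [1, 0, 1, 0],    # p(b = T)       = P(T,T) + P(F,T) = 1
              [0, 1, 1, 0],    # p(a XOR b = T) = P(T,F) + P(F,T) = 1
              [1, 1, 1, 1]],   # normalisation                    = 1
             dtype=float)
x = np.linalg.solve(A, np.ones(4))
print(x)   # [ 0.5  0.5  0.5 -0.5]: the only candidate has a negative entry,
           # so no ordinary probability distribution can reproduce the table
```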

Notice that -\frac{1}{2} crops up again! The similarities to the PR box go deeper, though. The PR box is a kind of extreme version of the CHSH state of two entangled qubits – same basic mathematics but pushing the correlations up higher. Analogously, Piponi’s box is an extreme version of the phase space for a single qubit. In both cases, quantum mechanics is perched intriguingly in the middle between classical mechanics and these extreme systems. I’ll go through the details of the analogy in the next post.

Bell’s theorem and Mermin’s machine

> Anybody who’s not bothered by Bell’s theorem has to have rocks in his head.

— ‘A distinguished Princeton physicist’, as told to David Mermin

This post is a long, idiosyncratic discussion of the Bell inequalities in quantum physics. There are plenty of good introductions already, so this is a bit of a weird thing to spend my time writing. But I wanted something very specific, and couldn’t find an existing version that had all the right pieces. So of course I had to spend far too much time making one.

My favourite introduction is Mermin’s wonderful Quantum Mysteries for Anyone. This is an absolute classic of clear explanation, and lots of modern pop science discussions derive from it. It’s been optimised for giving a really intense gut punch of NOTHING IN THE WORLD MAKES SENSE ANY MORE, which I’d argue is the main thing you want to get out of learning about the Bell inequalities.

However, at some point if you get serious you’ll want to actually calculate things, which means you’ll need to make the jump from Mermin’s version to the kind of exposition you see in a textbook. The most common modern version of the Bell inequalities you’ll see is the CHSH inequality, which looks like this:

| E(a,b) + E(\bar{a}, b) + E(a, \bar{b}) - E(\bar{a}, \bar{b}) | \leq 2

(It doesn’t matter what all of that means, at the moment… I’ll get to that later.) The standard sort of derivations of this tend to involve a lot of fussing with algebraic rearrangements and integrals full of \lambdas and so forth. The final result is less of a gut punch and more of a diffuse feeling of unease: "well I guess this number has to be between -2 and 2, but it isn’t".

This feels like a problem to me. There’s a 1929 New Yorker cartoon which depicts ordinary people in the street walking around dumbstruck by Einstein’s theory of general relativity. This is a comic idea because the theory was famously abstruse (particularly back then when good secondary explanations were thin on the ground). But the Bell inequalities are accessible to anyone with a very basic knowledge of maths, and weirder than anything in relativity. I genuinely think that everyone should be walking down the street clutching their heads in shock at the Bell inequalities, and a good introduction should help deliver you to this state. (If you don’t have rocks in your head, of course. In that case nothing will help you.)

It’s also a bit of an opaque black box. For example, why is there a minus sign in front of one of the Es but not the others? I was in a discussion group a few years back with a bunch of postdocs and PhD students, all of us with a pretty strong interest in quantum foundations, and CHSH came up at some point. None of us had much of a gut sense for what that minus sign was doing… it was just something that turned up during some algebra.

I wanted to trace a path from Mermin’s explanation to the textbook one, in the hope of propagating some of that intuitive force forward. I wrote an early draft of the first part of this post for a newsletter in 2018 but couldn’t see how to make the rest of it work, so I dropped it. This time I had a lot more success using some ideas I learned in the meantime. I ended up taking a detour through a third type of explanation, the ‘logical Bell inequalities’ approach of Abramsky and Hardy. This is a general method that can be used on a number of other similar ‘no-go theorems’, not just Bell’s original. It gives a lot more insight into what’s actually going on (including that pesky minus sign). It’s also surprisingly straightforward: the main result is a few steps of propositional logic.

That bit of propositional logic is the most mathematically involved part of this post. The early part just requires some arithmetic and the willingness to follow what Mermin calls ‘a simple counting argument on the level of a newspaper braintwister’. No understanding of the mathematics of quantum theory is needed at all! That’s because I’m only talking about why the results of quantum theory are weird, and not how the calculations that produce those results are done.

If you also want to learn to do the calculations, starting from a basic knowledge of linear algebra and complex numbers, I really like Michael Nielsen and Andy Matuschak’s Quantum Country, which covers the basic principles of quantum mechanics and also the Bell inequalities. You’d need to do the ‘Quantum computing for the very curious’ part, which introduces a lot of background ideas, and then the ‘Quantum mechanics distilled’ part, which has the principles and the Bell stuff.

There’s also nothing about how the weirdness should be interpreted, because that is an enormous 90-year-old can of rotten worms and I would like to finish this post some time in my life 🙂

Mermin’s machine

So, on to Mermin’s explanation. I can’t really improve on it, and it would be a good idea to go and read that now instead, and come back to my version afterwards. I’ve repeated it here anyway though, partly for completeness and partly because I’ve changed some notation and other details to mesh better with the Abramsky and Hardy version I’ll come to later.

(Boring paragraph on exactly what I changed, skip if you don’t care: I’ve switched Mermin’s ‘red’ and ‘green’ to ‘true’ and ‘false’, and the dial settings from 1,2,3 on both sides to a, a', a'' on the left side and b, b', b'' on the right side. I’ve also made one slightly more substantive change. Mermin explains at the end of his paper that in his setup, ‘One detector flashes red or green according to whether the measured spin is along or opposite to the field; the other uses the opposite color convention’. I didn’t want to introduce the complication of having the two detectors with opposite wiring, and have made them both respond the same way, flashing T for along the field and F for opposite. But I also wanted to keep Mermin’s results. To do that I had to change the dial positions of the right hand dial, so that a is opposite b, a' is opposite b', and a'' is opposite b''. )

Anyway, Mermin introduces the following setup:



The machine in the middle is the source. It fires out some kind of particle – photons, electrons, frozen peas, whatever. We don’t really care how it works, we’ll just be looking at why the results are weird.

The two machines on the right and left side are detectors. Each detector has a dial with three settings. On the left they’re labelled a, a' and a''. On the right, they’re b, b' and b''.

On the top of each are two lights marked T and F for true and false. (Again, we don’t really care what’s true or false, we’re keeping everything at a kind of abstract, operational level and not going into the practical details. It’s just two possible results of a measurement.)

It’s vital to this experiment that the two detectors cannot communicate at all. If they can, there’s nothing weird about the results. So assume that a lot of work has gone into making absolutely sure that the detectors are definitely not sharing information in any way at all.

Now the experiment just consists of firing out pairs of particles, one to each detector, with the dials set to different values, and recording which of the two lights, T or F, flashes on each side. So you get a big list of results of the form

ab'TF, a''bFT, a'b'FF, ...

The second important point, other than the detectors not being able to communicate, is that you have a free choice of setting the dials. You can set them both beforehand, or when the particles are both ‘in flight’, or even set the right hand dial after the left hand detector has already received its particle but before the right hand particle gets there. It doesn’t matter.

Now you do like a million billion runs of this experiment, enough to convince you that the results are not some weird statistical fluctuation, and analyse the results. You end up with the following table:


| Dial setting | (T,T) | (T,F) | (F,T) | (F,F) |
|---|---|---|---|---|
| ab | 1/2 | 0 | 0 | 1/2 |
| ab' | 1/8 | 3/8 | 3/8 | 1/8 |
| ab'' | 1/8 | 3/8 | 3/8 | 1/8 |
| a'b | 1/8 | 3/8 | 3/8 | 1/8 |
| a'b' | 1/2 | 0 | 0 | 1/2 |
| a'b'' | 1/8 | 3/8 | 3/8 | 1/8 |
| a''b | 1/8 | 3/8 | 3/8 | 1/8 |
| a''b' | 1/8 | 3/8 | 3/8 | 1/8 |
| a''b'' | 1/2 | 0 | 0 | 1/2 |

Each dial setting has a row, and the entries in that row give the probabilities for getting the different results. So for instance if you set the dials to a' and b, there’s a 1/8 chance of getting (T,T).

This doesn’t obviously look particularly weird at first sight. It only turns out to be weird when you start analysing the results. Mermin condenses two results from this table which are enough to show the weirdness. The first is:

Result 1: This result relates to the cases where the two dials are set to ab, a'b', or a''b''. In these cases both lights always flash the same colour. So you might get ab TT, ab FF, a'b' TT etc, but never ab TF or a''b'' FT.

This is pretty easy to explain. The detectors can’t communicate, so if they do the same thing it must be something to do with the properties of the particles they are receiving. We can explain it straightforwardly by postulating that each particle has an internal state with three properties, one for each dial position. Each of these takes two possible values which we label T or F. We can write these states as e.g.

TTF

TTF

where the the entries on the top line refer to the left hand particle’s state when the dial is in the a, a' and a'' positions respectively, and the bottom line refers to the right hand particle’s state when the dial is in the b, b', b'' position.

Result 1 implies that the states of the two particles must always be the same. So the state above is an allowed one, but e.g.

TTF

TFF

isn’t.

Mermin says:

> This hypothesis is the obvious way to account for what happens in [Result 1]. I cannot prove that it is the only way, but I challenge the reader, given the lack of connections between the devices, to suggest any other.

Because the second particle will always have the same state to the first one, I’ll save some typing and just write the first one out as a shorthand. So the first example state will just become TTF.

Now on to the second result. This one covers the remaining options for dial settings, a'b', a''b and the like.

Result 2: For the remaining states, the lights flash the same colour 1/4 of the time, and different colours 3/4 of the time.

This looks quite innocuous on first sight. It’s only when you start to consider how it meshes with Result 1 that things get weird.

(This is the part of the explanation that requires some thinking ‘on the level of a newspaper braintwister’. It’s fairly painless and will be over soon.)

Our explanation for result 1 is that particles in each run of the experiment have an underlying state, and both particles have the same state. Let’s go through the implications of this, starting with the example state TTF.

I’ve enumerated the various options for the dials in the table below. For example, if the left dial is a and the right dial is b', we know that the left detector will light up T and the right will light up T, so the two lights are the same.


| Dial setting | Lights |
|---|---|
| ab' | same |
| ab'' | different |
| a'b | same |
| a'b'' | different |
| a''b | different |
| a''b' | different |

Overall there’s a 1/3 chance of being the same and a 2/3 chance of being different. You can convince yourself that this is also true for all the states with two Ts and an F or vice versa: TTF, TFF, TFT, FTT, FTF, FFT.

That leaves TTT and FFF as the other two options. In those cases the lights will flash the same colour no matter what the dial is set to.

So whatever the underlying state is, the chance of the two lights being the same is at least ⅓. But this is incompatible with Result 2, which says that the lights are the same only ¼ of the time.

(The thinky part is now done.)
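(If you’d rather a computer did the braintwister for you, here’s the counting argument as a few lines of Python: enumerate the eight possible hidden states and the six mixed dial settings, and look at how often the lights agree.)

```python
from itertools import product

# Dial settings where the left and right labels differ (ab', a''b, etc.)
mixed = [(i, j) for i, j in product(range(3), repeat=2) if i != j]

for state in product("TF", repeat=3):     # the shared hidden state, e.g. ('T', 'T', 'F')
    p_same = sum(state[i] == state[j] for i, j in mixed) / len(mixed)
    print("".join(state), p_same)
# TTF-type states give p(same) = 1/3, TTT and FFF give 1; never the 1/4 of Result 2
```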

So Results 1 and 2 together are completely bizarre. No assignment of states will work. But this is exactly what happens in quantum mechanics!

You probably can’t do it with frozen peas, though. The details don’t matter for this post, but here’s a very brief description if you want it: the particles should be two spin-half particles prepared in a specific ‘singlet’ state, the dials should connect to magnets that can be oriented in three states at 120 degree angles from each other, and the lights on the detectors measure spin along and opposite to the field. The magnets should be set up so that the state for setting a on the left hand side is oriented at 180 degrees from the state for setting b on the right hand side; similarly a' should be opposite b' and a'' opposite b''. I’ve drawn the dials on the machine to match this. Quantum mechanics then says that the probabilities of the different results are

p(T,T) = p(F,F) = \frac{1}{2}\cos^2{\frac{\theta}{2}}

p(T,F) = p(F,T) = \frac{1}{2}\sin^2{\frac{\theta}{2}}

where \theta is the angle between the left hand magnet state and the direction opposite the right hand magnet state (remember the right hand magnets are flipped 180 degrees relative to their partners, so for the ab setting \theta = 0). This reproduces the numbers in the table above.
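If you want to see that it really does reproduce the table, here’s a short sketch of the calculation, with the magnet orientations set up as described (left hand settings at 0, 120 and 240 degrees, each right hand setting pointing the opposite way to its partner):

```python
from math import cos, sin, radians

left  = {"a": 0, "a'": 120, "a''": 240}
right = {"b": 180, "b'": 300, "b''": 60}   # each opposite its left hand partner

def row(l, r):
    # theta = angle between the left magnet and the direction opposite the right magnet
    theta = radians((left[l] - (right[r] + 180)) % 360)
    same, diff = 0.5 * cos(theta / 2) ** 2, 0.5 * sin(theta / 2) ** 2
    return (same, diff, diff, same)        # (T,T), (T,F), (F,T), (F,F)

for l in left:
    for r in right:
        print(l, r, [round(p, 3) for p in row(l, r)])
# ab, a'b', a''b'' come out as (0.5, 0, 0, 0.5); every mixed setting as (0.125, 0.375, 0.375, 0.125)
```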

Once more with less thinking

Mermin’s argument is clear and compelling. The only problem with it is that you have to do some thinking. There are clever details that apply to this particular case, and if you want to do another case you’ll have to do more thinking. Not good. This is where Abramsky and Hardy’s logical Bell approach comes in. It requires more upfront setup (so actually more thinking in the short term – this section title is kind of a lie, sorry) but can then be applied systematically to all kinds of problems.

This first involves reframing the entries in the probability table in terms of propositional logic. For example, we can write the result (T,F) for (a’,b) as a' \land \lnot b. Then the entries of the table correspond to the probabilities we assign to each statement: in this case, \text{prob}(a' \land \lnot b) = \frac{3}{8}.

Now, look at the following highlighted cells in three rows of the grid:


| Dial setting | (T,T) | (T,F) | (F,T) | (F,F) |
|---|---|---|---|---|
| ab | **1/2** | 0 | 0 | **1/2** |
| ab' | 1/8 | 3/8 | 3/8 | 1/8 |
| ab'' | 1/8 | 3/8 | 3/8 | 1/8 |
| a'b | 1/8 | 3/8 | 3/8 | 1/8 |
| a'b' | **1/2** | 0 | 0 | **1/2** |
| a'b'' | 1/8 | 3/8 | 3/8 | 1/8 |
| a''b | 1/8 | 3/8 | 3/8 | 1/8 |
| a''b' | 1/8 | 3/8 | 3/8 | 1/8 |
| a''b'' | **1/2** | 0 | 0 | **1/2** |

These correspond to the three propositions

\phi_1 = (a\land b) \lor (\lnot a \land\lnot b)

\phi_2 = (a'\land b') \lor (\lnot a' \land\lnot b')

\phi_3 = (a''\land b'') \lor (\lnot a'' \land\lnot b'') ,

which can be written more simply as

\phi_1 = a \leftrightarrow b

\phi_2 = a' \leftrightarrow b'

\phi_3 = a'' \leftrightarrow b''.

where the \leftrightarrow stands for logical equivalence. This also means that a can be substituted for b, and so on, which will be useful in a minute.

Next, look at the highlighted cells in these three rows:


| Dial setting | (T,T) | (T,F) | (F,T) | (F,F) |
|---|---|---|---|---|
| ab | 1/2 | 0 | 0 | 1/2 |
| ab' | 1/8 | **3/8** | **3/8** | 1/8 |
| ab'' | 1/8 | **3/8** | **3/8** | 1/8 |
| a'b | 1/8 | 3/8 | 3/8 | 1/8 |
| a'b' | 1/2 | 0 | 0 | 1/2 |
| a'b'' | 1/8 | **3/8** | **3/8** | 1/8 |
| a''b | 1/8 | 3/8 | 3/8 | 1/8 |
| a''b' | 1/8 | 3/8 | 3/8 | 1/8 |
| a''b'' | 1/2 | 0 | 0 | 1/2 |

These correspond to

\phi_4 = (a\land \lnot b') \lor (\lnot a \land b')

\phi_5 = (a\land \lnot b'') \lor (\lnot a \land b'')

\phi_6 = (a'\land \lnot b'') \lor (\lnot a' \land b'') ,

which can be simplified to

\phi_4 = a \oplus b'

\phi_5 = a \oplus b''

\phi_6 = a' \oplus b''.

where the \oplus stands for exclusive or.

Now it can be shown quite quickly that these six propositions are mutually contradictory. First use the first three propositions to get rid of b , b' and b'', leaving

a \oplus a'

a \oplus a''

a' \oplus a''

You can check that these are contradictory by drawing out the truth table, or maybe just by looking at them, or maybe by considering the following stupid dialogue for a while (this post is long and I have to entertain myself somehow):


Grumpy cook 1: You must have either beans or chips but not both.

Me: OK, I’ll have chips.

Grumpy cook 2: Yeah, and also you must have either beans or peas but not both.

Me: Fine, looks like I’m having chips and peas.

Grumpy cook 3: Yeah, and also you must have either chips or peas but not both.

Me:

Me: OK let’s back up a bit. I’d better have beans instead of chips.

Grumpy cook 1: You must have either beans or chips but not both.

Me: I know. No chips. Just beans.

Grumpy cook 2: Yeah, and also you must have either beans or peas but not both.

Me: Well I’ve already got to have beans. But I can’t have them with chips or peas. Got anything else?

Grumpy cook 3: NO! And remember, you must have either chips or peas.

Me: hurls tray
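(Or, for the truth table option, the whole thing is a three-line brute force:)

```python
from itertools import product

# No assignment of truth values to (a, a', a'') satisfies all three exclusive ors
ok = [s for s in product([True, False], repeat=3)
      if (s[0] != s[1]) and (s[0] != s[2]) and (s[1] != s[2])]
print(ok)   # [] : jointly contradictory, hurl tray here
```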


So, yep, the six highlighted propositions are inconsistent. But this wouldn’t necessarily matter, as some of the propositions are only probabilistically true. So you could imagine that, if you carefully set some of them to false in the right ways in each run, you could avoid the contradiction. However, we saw with Mermin’s argument above that this doesn’t save the situation – the propositions have ‘too much probability in total’, in some sense, to allow you to do this. Abramsky and Hardy’s logical Bell inequalities will quantify this vague ‘too much probability in total’ idea.

Logical Bell inequalities

This bit involves a few lines of logical reasoning. We’ve got a set of propositions \phi_i (six of them in this example case, N in general), each with probability p_i. Let P be the probability of all of them happening together. Call this combined statement

\Phi = \bigwedge_i \phi_i.

Then

1 - P = \text{prob}\left( \lnot\Phi\right) = \text{prob}\left(\bigvee_i \lnot\phi_i\right)

where the second equivalence is de Morgan’s law. This is at most the sum of the probabilities of all the \lnot\phi_i s:

1 - P \leq \sum_i \text{prob}(\lnot\phi_i)

= \sum_i (1 - p_i)

= N - \sum_i p_i .

where N is the total number of propositions. Rearranging gives

\sum_i p_i \leq N + P - 1.

Now suppose the \phi_i are jointly contradictory, as in the Mermin example above, so that the combined probability P = 0. This gives the logical Bell inequality

\sum_i p_i \leq N-1 .

This is the precise version of the ‘too much probability’ idea. In the Mermin case, there are six propositions, three with probability 1 and three with probability ¾, which sum to 5.25. This is greater than N-1 = 5, so the inequality is violated.

This inequality can be applied to lots of different setups, not just Mermin’s. Abramsky and Hardy use the CHSH inequality mentioned in the introduction to this post as their first example. This is probably the most common example used to introduce Bell’s theorem, though the notation is usually somewhat different. I’ll go through Abramsky and Hardy’s version and then connect it back to the standard textbook notation.

The CHSH inequality

The CHSH experiment only uses two settings on each side, not three. I’ve drawn a ‘CHSH machine’ in the style of Mermin’s machine to illustrate it:



There are two settings a and \bar{a} on the left side, 60 degrees apart. And there are two settings b and \bar{b} on the right side, also 60 degrees apart, with b opposite a. This leads to the following table:


Dial setting      (T,T)   (T,F)   (F,T)   (F,F)
ab                 1/2     0       0       1/2
a\bar{b}           3/8     1/8     1/8     3/8
\bar{a}b           3/8     1/8     1/8     3/8
\bar{a}\bar{b}     1/8     3/8     3/8     1/8

Now it’s just a case of following the same reasoning as for the Mermin case. The highlighted rows correspond to the propositions

\phi_1 = (a \land b) \lor (\lnot a \land \lnot b) = a \leftrightarrow b

\phi_2 = (a \land \bar{b}) \lor (\lnot a \land \lnot \bar{b}) = a \leftrightarrow \bar{b}

\phi_3 = (\bar{a} \land b) \lor (\lnot \bar{a} \land \lnot b) = \bar{a} \leftrightarrow b

\phi_4 = (\lnot \bar{a} \land \bar{b}) \lor (\bar{a} \land \lnot \bar{b}) = \bar{a} \oplus \bar{b}

As with Mermin’s example, these four propositions can be seen to be contradictory. Rather than trying to make up more stupid dialogues, I’ll just follow the method in the paper. First use \phi_3 to replace \bar{a} with b in \phi_4:

\phi_4 = b \oplus \bar{b} .

Then use \phi_1 to swap out b again, this time with a:

\phi_4 = a \oplus \bar{b} .

Finally use \phi_2 to swap out a with \bar{b}, leaving

\bar{b} \oplus \bar{b}

which is clearly contradictory.

(Sidenote: I guess these sorts of arguments for showing a contradiction do involve some thinking, which is what I was trying to avoid earlier. But in each case you could just draw out a truth table, which is a stupid method that a computer could do. So I think it's reasonable to say that this is less thinking than Mermin's method.)

Again, this violates the logical Bell inequality. In total, we have

\sum_i p_i = 1 + \frac{3}{4}  + \frac{3}{4}  + \frac{3}{4} = 3.25 > 3.
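
Both steps can be checked with the same little helpers as before, assuming jointly_contradictory and logical_bell_violated from the earlier sketches are still knocking around:

# v = (a, a-bar, b, b-bar)
chsh_props = [lambda v: v[0] == v[2],   # phi_1: a <-> b
              lambda v: v[0] == v[3],   # phi_2: a <-> b-bar
              lambda v: v[1] == v[2],   # phi_3: a-bar <-> b
              lambda v: v[1] != v[3]]   # phi_4: a-bar xor b-bar

print(jointly_contradictory(chsh_props, 4))       # True
print(logical_bell_violated([1, 3/4, 3/4, 3/4]))  # True: 3.25 > 3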

The textbook version of this inequality is a bit different. For a start, it uses an ‘expectation value’ for each proposition rather than a straightforward probability, where truth is associated with +1 and falsity with -1. So each proposition \phi_i has an expectation value E_i with

E_i = (+1)\cdot p_i + (-1)\cdot (1-p_i) = 2p_i -1.

Then summing over the E_is gives

\sum_i E_i = \sum_i (2p_i-1) = 2\sum_i p_i - N

and then, using the previous form of the logical Bell inequality,

\sum_i E_i \leq 2(N-1) - N = N-2.

A similar argument applied to the negated propositions \lnot\phi_i (which are also jointly contradictory in this case) shows that \sum_i E_i \geq -(N-2), so that this is a bound above and below:

|\sum_i E_i| \leq N - 2.

In this case N = 4 and so the inequality becomes |\sum_i E_i| \leq 2. However, adding up the E_is associated with the propositions \phi_i gives 2.5, so the inequality is violated.
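
As a sanity check, here's that conversion in code (expectation_bound_violated is another made-up name):

def expectation_bound_violated(probs):
    # Convert each p_i into E_i = 2*p_i - 1, then test |sum E_i| <= N - 2.
    total = sum(2 * p - 1 for p in probs)
    return abs(total), abs(total) > len(probs) - 2

print(expectation_bound_violated([1, 3/4, 3/4, 3/4]))  # (2.5, True)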

There’s still a little further to go to get the textbook version, but we’re getting close. The textbook version writes the CHSH inequality as

| E(a,b) + E(\bar{a}, b) + E(a, \bar{b}) - E(\bar{a}, \bar{b}) | \leq 2.

where the expectation value is written in the form

E(a,b) = \int A(a,\lambda) B(b, \lambda)\rho(\lambda) d\lambda.

The \lambda are ‘hidden variables’ – properties of the particles that dispose them to act in various ways. For example, in the Mermin case, we imagined them to have hidden states, like

TFF

TFF

that controlled their response to each dial, and showed that any choice of these hidden states would lead to a contradiction.

For a given \lambda, A(a, \lambda) and B(b, \lambda) are the values measured by the left and right hand machines respectively. In our case these values are always either +1 (if the machine flashes T) or -1 (if the machine flashes F). The CHSH argument can also be adapted to a more realistic case where some experimental runs have no detection at all, and the outcome can also be 0, but this simple version won't do that.

For the dial settings a and b, all we care about with these hidden variables is whether they make the machines respond true or false. So in our case the only relevant feature of \lambda is which of the four outcomes a\land b, a\land \lnot b, \lnot a\land b, \lnot a\land\lnot b it produces, and the integral can just become a sum over those four possibilities:

E(a,b) = (+1 \times +1)\cdot p(a\land b) + (+1 \times -1)\cdot p(a\land \lnot b) + (-1 \times +1)\cdot p(\lnot a\land b) + (-1 \times -1)\cdot p(\lnot a\land \lnot b)

= p(a\land b) + p(\lnot a\land \lnot b) - p(a\land \lnot b) - p(\lnot a\land b)

= p((a\land b) \lor (\lnot a\land \lnot b)) - p((a\land \lnot b) \lor(\lnot a\land b)).

Now that first proposition (a\land b) \lor (\lnot a\land \lnot b) is just \phi_1 from earlier, which had probability p_1. And the second one covers all the remaining possibilities, so it has probability 1-p_1. So

E(a,b) = p_1 - (1-p_1) = 2p_1 - 1 = E_1.
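
In code, this is just reading the probabilities straight off the ab row of the table. (E_from_row is a made-up helper, not anything standard.)

def E_from_row(pTT, pTF, pFT, pFF):
    # Expectation of the product of the two +/-1 outcomes for one dial setting.
    return pTT + pFF - pTF - pFT

print(E_from_row(1/2, 0, 0, 1/2))  # 1.0, which is 2*p_1 - 1 with p_1 = 1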

The argument goes through exactly the same way for E(a, \bar{b}) and E(\bar{a}, b). The last case, E(\bar{a}, \bar{b}), is slightly different. We get

E(\bar{a}, \bar{b}) = p((\bar{a}\land \bar{b}) \lor (\lnot \bar{a}\land \lnot \bar{b})) - p((\bar{a}\land \lnot \bar{b}) \lor(\lnot \bar{a}\land \bar{b}))

following the same logic as before. But this time \phi_4 matches the second proposition (\bar{a}\land \lnot \bar{b}) \lor(\lnot \bar{a}\land \bar{b}), not the first, so that

E(\bar{a}, \bar{b}) = (1-p_4) - p_4 = 1 - 2p_4 = -E_4.

This is where the minus sign in the CHSH inequality comes in! We have

|\sum_i E_i| = | E(a, b) + E(a, \bar{b}) + E(\bar{a}, b) - E(\bar{a}, \bar{b}) | \leq 2.
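
Feeding all four rows of the table into the same E_from_row helper reproduces the violation:

E_ab        = E_from_row(1/2, 0,   0,   1/2)   # a, b
E_a_bbar    = E_from_row(3/8, 1/8, 1/8, 3/8)   # a, b-bar
E_abar_b    = E_from_row(3/8, 1/8, 1/8, 3/8)   # a-bar, b
E_abar_bbar = E_from_row(1/8, 3/8, 3/8, 1/8)   # a-bar, b-bar

print(abs(E_ab + E_a_bbar + E_abar_b - E_abar_bbar))  # 2.5, which is > 2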

So we end up with the standard inequality, but with a bit more insight into where the pieces come from. Also, importantly, it’s easy to extend to other situations. For example, you could follow the same method with the six Mermin propositions from earlier to make a kind of ‘Mermin-CHSH inequality’:

|\sum_i E_i| = | E(a, b) + E(a', b') + E(a'', b'') - E(a, b') - E(a, b'') - E(a', b'') | \leq 4.
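
Plugging in the Mermin numbers from earlier – the three agreement propositions had probability 1 and the three exclusive-or propositions had probability 3/4 – gives a rough numerical check:

E_same = 2 * 1 - 1          # E(a,b) = E(a',b') = E(a'',b'') = 1
E_diff = -(2 * (3/4) - 1)   # E(a,b') = E(a,b'') = E(a',b'') = -E_i = -0.5

print(abs(3 * E_same - 3 * E_diff))  # 4.5, which is > 4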

Or you could have three particles, or a different set of measurements, or you could investigate what happens with other tables of correlations that don’t appear in quantum physics… this is a very versatile setup. The original paper has many more examples.

Final thoughts

There are still some loose ends that it would be good to tie up. I’d like to understand exactly how the inequality-shuffling in a ‘textbook-style’ proof of the CHSH inequality connects to Abramsky and Hardy’s version. Presumably some of it is replicating the same argument, but in a more opaque form. But also some of it must need to deal with the fact that it’s a more general setting, and includes things like measurements returning 0 as well as +1 or -1. It would be nice to figure out which bits are which. I think Bell’s original paper didn’t have the zero thing either, so that could be one place to look.

On the other hand… that all sounds a bit like work, and I can’t be bothered for now. I’d rather apply some of this to something interesting. My next post is probably going to make some connections between the logical Bell inequalities and my previous two posts on negative probability.

If you know the answers to my questions above and can save me some work, please let me know in the comments! Also, I’d really like to know if I’ve got something wrong. There are a lot of equations in this post and I’m sure to have cocked up at least one of them. More worryingly, I might have messed up some more conceptual points. If I’ve done that I’m even more keen to know!