## Grabiner on eighteenth century mathematics

These are some notes I wrote a couple of years ago on Judith Grabiner’s paper ‘Is Mathematical Truth Time Dependent?’ David Chapman suggested I put them up somewhere public, so here they are with a few tweaks and comments. They’re still notes, though – don’t expect proper sentences everywhere! I’m not personally hugely interested in the framing question about mathematical truth, but I really enjoyed the main part of the paper, which compares the ‘if it works, do it’ culture of eighteenth century mathematics to the focus on rigour that came later.

I haven’t read all that much history of mathematics, so I don’t have a lot of context to put this in. If something looks off or oversimplified let me know.

I found this essay in an anthology called New Directions in the Philosophy of Mathematics, edited by Thomas Tymoczko. I picked this book up more or less by luck when I was a PhD student and a professor was having a clearout, and I didn’t have high hopes – nothing I’d previously read about philosophy of mathematics had made much sense to me. Platonism, logicism, formalism and the rest all seemed equally bad, and I wasn’t too interested in formal logic and foundations. However, this book promised something different:

The origin of this book was a seminar in the philosophy of mathematics held at Smith College during the summer of 1979. An informal group of mathematicians, philosophers and logicians met regularly to discuss common concerns about the nature of mathematics. Our meetings were alternately frustrating and stimulating. We were frustrated by the inability of traditional philosophical formulations to articulate the actual experience of mathematicians. We did not want yet another restatement of the merits and vicissitudes of the various foundational programs – platonism, logicism, formalism and intuitionism. However, we were also frustrated by the difficulty of articulating a viable alternative to foundationalism, a new approach that would speak to mathematicians and philosophers about their common concerns. Our meetings were most exciting when we managed to glimpse an alternative.

There’s plenty of other good stuff in the book, including some famous pieces like Thurston’s classic On proof and progress in mathematics, and a couple of reprinted sections of Lakatos’s Proofs and Refutations.

Anyway, here are the notes. Anything I’ve put in quotes is Grabiner. Anything in square brackets is some random tangent I’ve gone off on.

Two “generalizations about the way many eighteenth-century mathematicians worked”:

1. “… the primary emphasis was on getting results”. Huge explosion in creativity, but “the chances are good that these results were originally obtained in ways utterly different from the ways we prove them today. It is doubtful that Euler and his contemporaries would have been able to derive their results if they had been burdened with our standards of rigor”.
2. “… mathematicians placed great reliance on the power of symbols. Sometimes it seems to have been assumed that if one could just write down something which was symbolically coherent, the truth of the statement was guaranteed.” This extended to e.g. manipulating infinite power series just like very long polynomials.

Euler’s Taylor expansion of $\cos(nz)$ starting from the binomial expansion as one example. He takes $z$ as infinitely small and $n$ as infinitely large, and is happy to assume their product is finite without worrying too much. “The modern reader may be left slightly breathless”, but he gets the right answer.
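For anyone who wants to see the breathless step, here is a compressed sketch of the argument as it is usually presented from Euler's *Introductio in analysin infinitorum*. This is my reconstruction, not Grabiner's text, and it uses modern notation ($i$ for $\sqrt{-1}$). Start from de Moivre's formula and expand both powers binomially; the odd terms cancel:

$$
\cos(nz) = \tfrac{1}{2}\left[(\cos z + i\sin z)^n + (\cos z - i\sin z)^n\right]
= \cos^n z - \frac{n(n-1)}{2!}\cos^{n-2} z\,\sin^2 z + \frac{n(n-1)(n-2)(n-3)}{4!}\cos^{n-4} z\,\sin^4 z - \cdots
$$

Now take $z$ infinitely small, so that $\sin z = z$ and $\cos z = 1$, and $n$ infinitely large, so that $n(n-1) = n^2$, $n(n-1)(n-2)(n-3) = n^4$, and so on. Writing $v = nz$ for the (assumed finite) product gives

$$
\cos v = 1 - \frac{v^2}{2!} + \frac{v^4}{4!} - \frac{v^6}{6!} + \cdots
$$

which is the correct series, even though every approximation along the way would make a modern analysis lecturer wince.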

Trust in symbol manipulation was “somewhat anomalous in the history of mathematics”. Grabiner suggests it came from the recent success of algebra and the calculus. E.g. Leibniz’s notation, which “does the thinking for us” (chain rule as example). This also extended out of maths, e.g. Lavoisier’s idea of ‘chemical algebra’.
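A one-line illustration of what "does the thinking for us" means for the chain rule (my example, not Grabiner's): the notation invites you to cancel $du$ as if derivatives were fractions, and doing so gives the right answer even though they aren't literally fractions.

$$
\frac{dy}{dx} = \frac{dy}{du}\,\frac{du}{dx},
\qquad\text{e.g. } y = \sin(x^2),\ u = x^2:\quad \frac{dy}{dx} = \cos(u)\cdot 2x = 2x\cos(x^2).
$$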

18th c was interested in foundations (e.g. Berkeley on calculus being insufficiently rigorous) but this was “not the basic concern” and generally was “relegated to Chapter I of textbooks, or found in popularizations”, not in research papers.

This changed in the 19th c beginning with Cauchy and Bolzano – beginnings of rigorous treatments of limits, continuity etc.

## Why did standards change?

“The first explanation which may occur to us is like the one we use to justify rigor to our students today: the calculus was made rigorous to avoid errors, and to correct errors already made.” Doesn’t really hold up – there were surprisingly few mistakes in the 18th c stuff as they “had an almost unerring intuition”.

[I’ve been meaning to look into this for a while, as I get sick of that particular justification being trotted out, always with the same dubious examples. One of these is Weierstrass’s continuous-everywhere-differentiable-nowhere function. This is a genuine example of something the less rigorous approach failed to find, but it came much later, so isn’t what got people started on rigour.

The other example normally given is about something called “the Italian school of algebraic geometry”, which apparently went off the rails in the early 20th c and published false stuff. There’s some information on that in the answers to a MathOverflow question by Kevin Buzzard and the linked email from David Mumford – from a quick read it looks like it was one guy, Severi, who really lost it. Anyway, this is also a lot later than the 18th century.]
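For reference, the Weierstrass function from that tangent is usually written as below, with the conditions on $a$ and $b$ from Weierstrass's original 1872 presentation (later work weakened them). Each term is bounded by $a^n$, so the series converges uniformly and $W$ is continuous; under these conditions it nevertheless has no derivative at any point.

$$
W(x) = \sum_{n=0}^{\infty} a^n \cos(b^n \pi x), \qquad 0 < a < 1,\quad b \text{ an odd positive integer},\quad ab > 1 + \tfrac{3}{2}\pi.
$$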

It is true though that by the end of the 18th c they were getting into topics – complex functions, multivariable calculus – where “there are many plausible conjectures whose truth is relatively difficult to evaluate intuitively”, so rigour was more useful.

Second possible explanation – need to unify the mass of results thrown up in the 18th c. Probably some truth to this: current methods were hitting diminishing returns, time to “sit back and reflect”.

Third explanation – prior existence of rigour in Euclid’s geometry. Berkeley’s attack on calculus was on this line.

One other interesting factor she suggests – an increasing need for mathematicians to teach (as they became employees of government-sponsored institutions rather than being attached to royal courts). École Polytechnique as model for these.

“Teaching always makes the teacher think carefully about the basis for the subject”. Moving from self-educated or apprentice-master set-ups, where you learn piecemeal from examples of successful thinking, to a more formalised ‘here are the foundations’ approach.

Her evidence – foundational work often originated in lecture courses. This was true for Lagrange, Cauchy, Weierstrass and Dedekind.

[I don’t know how strong this evidence is, but it’s a really interesting theory. I’ve had literalbanana’s blog post on indexicality thoroughly stuck in my head for the last month, so I’m seeing that idea everywhere – this is one example. Teaching a large class forces you to take knowledge that was previously highly situated and indexical – ‘oh yes, you need to do this’ – and pull it out into a public form that makes sense to people not deeply immersed in that context. Compare Thurston’s quote in On proof and progress in mathematics: “When a significant theorem is proved, it often (but not always) happens that the solution can be communicated in a matter of minutes from one person to another within the subfield. The same proof would be communicated and generally understood in an hour talk to members of the subfield. It would be the subject of a 15- or 20-page paper, which could be read and understood in a few hours or perhaps days by members of the subfield.”]

## How did standards change?

Often 18th c methods were repurposed/generalised. E.g. haphazard comparisons of particular series to the geometric series became Cauchy’s general convergence tests. Or old methods of computing the error term epsilon for the nth approximation get turned round, so that we are given epsilon and show we can always find n to beat that error term. This is essentially the definition of convergence we still use today.
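To make the 'turning round' concrete, here is a minimal sketch using the geometric series, assuming $0 \le r < 1$. The partial sum $S_n = 1 + r + \dots + r^n$ differs from the limit $S = 1/(1-r)$ by $r^{n+1}/(1-r)$. The old direction fixes $n$ and computes that error; the modern definition instead says that for every $\varepsilon > 0$ there is an $N$ with $|S_n - S| < \varepsilon$ for all $n \ge N$. The function names are my own, hypothetical choices.

```python
def geometric_error(r: float, n: int) -> float:
    """Error left after the partial sum S_n = 1 + r + ... + r**n, for 0 <= r < 1.

    This is the 'old' direction: fix n, compute how far S_n is from 1 / (1 - r).
    """
    return r ** (n + 1) / (1 - r)


def first_n_within(r: float, eps: float) -> int:
    """The 'turned round' direction: given eps > 0, find an n that beats it.

    The error never grows as n increases, so once one n beats eps, every later
    n does too, which is the 'for all n >= N' clause in the modern definition.
    """
    if not 0 <= r < 1:
        raise ValueError("this sketch only handles 0 <= r < 1")
    if eps <= 0:
        raise ValueError("eps must be positive")
    n = 0
    while geometric_error(r, n) >= eps:
        n += 1
    return n


if __name__ == "__main__":
    N = first_n_within(0.5, 1e-3)
    partial = sum(0.5 ** k for k in range(N + 1))
    print(N, abs(2 - partial))  # 10 0.0009765625
```

With $r = 0.5$ the error after $S_n$ is just $0.5^n$, so the first $n$ to get under $10^{-3}$ is 10. A closed form using logarithms would also work, but the loop mirrors the "given epsilon, find n" phrasing more directly.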

## Conclusion

Goes back to original question: is mathematical truth time-dependent? Sets up two bad options to knock down…

• Relativism. “‘Sufficient unto the day is the rigor thereof.’ Mathematical truth is just what the editors of the Transactions say it is.” This wouldn’t explain why Cauchy and Weierstrass were ever unsatisfied in the first place.

• MAXIMAL RIGOUR AT ALL TIMES. The 18th c was just sloppy. “According to this high standard, which textbooks sometimes urge on students, Euler would never have written a line.”

[A lot of my grumpiness about rigour is because it was exactly what I didn’t need as a first year maths student. I was just exploring the 18th century explosion myself and discovering the power of mathematics, and what I needed right then was to be able to run with that and learn more cool shit, without fussing over precise epsilon-delta definitions. Maybe it would have worked for me a couple of years later, if I’d seen enough examples to have come across a situation where rigour was useful. This seems to vary a lot though – David Chapman replied that lack of rigour was what he was annoyed by at that age, and he was driven to the library to read about Dedekind cuts.]

… then suggests “a third possibility”:

• A Kuhnian picture where mathematics grows “not only by successive increments, but also by occasional revolutions”. “We can be consoled that most of the old bricks will find places somewhere in the new structure”.

## The shitpost-to-scholarship pipeline

I’m at @ssica3003’s Sensemaker Workshop today, and thought it would be fun to get a blog post out while I’m here, so I dug out this draft I wrote back in August for the newsletter. I wasn’t sure I liked it much at the time, but reading it back it’s better than I remembered, and it works as a first stab at the idea, at least. Hopefully I can get it a little further down the pipeline. There are some questions at the end, so let me know if you have any thoughts.

Anyway, the rough idea is… there’s an extraordinary explosion of creative idea generation going on online. And there’s this fascinating kind of pipeline where people will start feeling out the vague beginnings of an idea through twitter threads and dumb throwaway posts and blog comments and email conversations, and then if something looks promising they’ll discuss more and pull in bits of other people’s ideas, and gradually build up to more thought out, polished work.

I’m excited about this culture for a lot of reasons. It’s a kind of online version of the casual, unobservable ‘dark matter’ part of academia, the part you can’t access by looking at published work – all the throwing around wild claims over coffee in the common room and in the pub on Friday evenings, bits of ‘yeah, that paper is unreadable, but this is what they’re really talking about’ insight from people in the know, standing round the whiteboard trying to figure something out, group meeting gossip, and the like. And it’s an incredibly vivid and alive version, at a time when large parts of normal academia have become rigid and bureaucratised and plain boring. This seems important to me: it’s the shitposting engine that produces the raw generative power that can drive more focussed work further down the line.

There’s a kind of wild energy; people aren’t afraid to go after big topics. We’ve got ourselves free of the constraints of the academic pimple factory:

An example of Little History is an essay by Matt Might (clearly a Marvel superhero in a counterfactual universe) titled The Illustrated Guide to a Phd. Go read it. It’ll only take a minute. It frames the sum of all human knowledge as a big circular bubble, and your PhD as a little pimple on the surface of it. I’ll call this the Mighty Diagram. It gets passed around in graduate student circles with depressing frequency.

Instead of a dent in the universe, you get a pimple on an uncritically proceduralist conceptualization of the frontier of knowledge as the sum of all the peer-reviewed academic literature in the world.

What makes this essay utterly horrifying is that it is actually an accurate description of what a PhD is; it calibrates academic career expectations correctly and offers an accurate sense of perspective on the peer-reviewed life. I suspect Matt Might sincerely intended the essay as a helpful guide to academic survival, but its effect is to put aspiring scholars in their place, rather than help them find a sense of place in the universe. It’s a You Are Here map for your intellectual journey at the end of a PhD, you disgusting little pimple, you. Kneel before this awe-inspiring edifice of knowledge that you’re lucky to be allowed to add a pimple to.

This rings very true with my own experience of academia, and the mindset it got me into. I personally found that after a couple of years out of there my thinking kind of cleared and became more expansive, and I was able to have good ideas again.

Unfortunately, this pipeline only goes so far. Currently, I think we’re in something like this situation:

It currently tends to dump ideas out somewhere around the ‘insight porn’ point – ideas that you read, think ‘oh that’s clever’, hit the like button, maybe comment on or talk about for a bit, then completely forget a week later. In the best case, a fragment of the idea or a bit of new jargon escapes into the local thought soup and can be combined with other ideas that are currently percolating. Sometimes this can be quite a powerful effect on its own. But there are a lot of places that academia still goes to that just can’t be reached in this way.

One of my favourite examples of this dynamic is Sarah Perry’s theory of mess. This is a genuinely great idea, and it’s not just a vague ‘insight’ – it’s an initial sketch of a satisfying explanatory theory of what mess is, complete with some very convincing examples and thought experiments (put a kaleidoscope filter on your mess, and it’s no longer mess!). But as far as I can tell, it got the same treatment as everything else that goes down the pipe – we all liked it and moved on. No real discussion (that I know of) of how to test it, or what has already been done in this line, or probing to see where it might fall down. Does it work? Who knows! On to the next idea!

Now, there’s an obvious explanation for why this happens. Most of us are not doing this as a full time job. We’re fitting this into the spare time we get, alongside paid work or other responsibilities. So we’re only really interested in doing the enjoyable parts of the idea generation process. Chucking around ideas is easy and fun, whereas checking whether they actually work is hard and boring. It’s not a big surprise that people prefer easy and fun work to hard and boring work.

There’s a lot of truth to this, but I think it’s slightly too cynical, in that it makes the first part of the pipeline sound too easy and the second part too hard. Chucking around ideas is easy, but to be able to do that we need to have some good ones to chuck around, and that’s not exactly trivial. We have some advantage in being able to go after very broad, vague, ambiguous, undeveloped topics, and slowly clear the fog. There’s no pressure to quickly get to a point where we can publish something. And at the same time, polishing up ideas is hardly some unrelenting tedious grind. Calculating can be fun, testing can be fun, writing up can be fun. If your eventual aim is to publish in traditional academia then there are some definite unfun parts, like altering your conversational blog post style to fit a more academic register, but this is only one part of the process.

## My own experiments

For me, at least, it just feels unsatisfying to leave ideas at the insight porn stage. There’s a natural pull in the direction of getting further down the pipeline, rather than a tedious sense of duty.  I’ve been playing around with some haphazard experiments of my own, and I think I’ve got past the insight porn stage too with some of them, but nowhere near as far as I’d like. I’ll go through a couple of examples.

A few years ago, I wrote a tumblr post called stupid bat and ball, title all lower case, 700 words of low-effort writing not far above the shitpost level. I wasn’t really expecting it to go anywhere further. But it did contain a small core of insight – the bat and ball question of the Cognitive Reflection Test is different to the other two questions in some respect, and so the questions don’t really form a natural set. When I got the WordPress blog I reposted it, and eventually it attracted some really good comments that probed the mechanics of the bat and ball question much more deeply than I had. So I realised that this idea probably was worth investigating and that I should up my game a bit, and I started reading some of the literature. I discovered that the bat and ball question came first and the others were picked ‘to be like it’, with no elaboration of the process for picking them, which confirmed my suspicion that not much work went into question validation. And I found a fascinating follow-up paper showing how ridiculously sticky the wrong answer is.

The comments to this post pushed things further again, coming up with more detailed explorations of how the difficulty relates to the way the problem maps numbers to an abstract quantity (the difference in price), but fools you into mapping it to a concrete one (the price of the bat). @_awbery pointed out that this abstract/concrete confusion is completely missing from the other two questions, where all the quantities map to concrete objects. And anders devised a set of ‘similar questions’ that turn up the level of abstractness one step at a time. These comments point towards something like rat-running experiments for the Cognitive Reflection Test, getting an understanding of how the tools we’re trying to use actually work before using them to make inferences about abstractions like ‘cognitive reflection’. I do think a potentially valuable contribution could be made here.

But… I’m not really the person to do it. (Even if I cared more about this specific question than I do. I’d pretty much used up my remaining store of shits-to-give on writing the blog post, and didn’t even have enough left to engage with the comments as fully as I’d have liked to.) Doing psych research without fooling yourself sounds like an absolute minefield even if you know what you’re doing, and I have no expertise at all.

So I guess in this case I quit the pipeline at the level of having a sort of slapdash lit review with some pointers to interesting ways to take it further. Not the most impressive result. But the interesting bit for me was the distance I travelled from the original tumblr post, which I’d put no effort into at all, and the way the project took on a life of its own, with other people helping to propel this considerably further than I’d ever thought to take it myself.

My other example is all my thinking about negative probability in the last year and a half. Although it sounds superficially like a kind of crackpot topic, there are deep links to quantum mechanics on phase space, and I’ve been using my fascination with this as a serious starting point to learn all kinds of interesting things in quantum foundations/quantum information. I’ve been experimenting with the discipline of using a single paper as my focus, and this has been incredibly helpful for keeping me on track, and damping down my normal habit of wandering from subject to subject too quickly to pick up anything useful.

I’m more serious about this project than the bat and ball one – it actually connects to an enduring deep interest rather than something I blundered into by accident. Again, I’m not yet as far down the pipeline as I want to be, but I’ve got past the vague insight level. My last couple of posts explored an intriguing decomposition of the Wigner function for a qubit that I found myself, and that I can see some potential use for in interpreting negative probabilities. Since then I’ve had quite a few more ideas that I want to investigate, and I’ve started to link things into a more coherent picture. There’s also a lot more I could be doing in terms of making contact with people in academia and asking questions (something I’m rather bad at). I can definitely see how to push further.

It’s still really funny to me that I’m cheerfully crashing about between cognitive psych and quantum foundations, with a few clueless forays into reading Derrida for good measure. Whereas in academia I’d have felt daring if I tried to pivot from burst to continuous sources of gravitational waves from neutron stars, or something. Obviously this is too scattered for me to get anything done, and I need to get better at idea triage. But there’s something really psychologically healthy about this mindset of just taking a direct run at whatever I feel like, instead of thinking ‘oh, that’s outside my field, I can’t think about that.’ I want to keep this even as I hopefully learn to focus my efforts more usefully.

## Questions

Right, I want to push this out now so I can stop being antisocial at the workshop. I’ll end with some questions:

• What are examples of people navigating the whole shitpost-to-scholarship pipeline successfully on the public internet? I’m particularly interested in people who are trying for academia-style focussed research on specific object-level questions, rather than big-picture synthesis or popularisation.
• Is there any kind of institutional support out there, or is it all just individual weird nerds pursuing individual weird research programs?
• Has anyone written about this well already? For a start, Venkatesh Rao had a couple of excellent threads here and here on a similar topic. It’s a much more pessimistic take, which actually fits my current drizzle-soaked winter-brain opinions better than these cheery ramblings from last summer – for example, in most of my experiments I haven’t managed to get much further than this sort of ‘reading published literature and blogging a few derivative observations’ stuff. I’d like to hear about anything else relevant that people have liked.

## The middle distance

At the end of my last post, I talked about Brian Cantwell Smith’s idea of ‘the middle distance’ – an intermediate space between complete causal disconnectedness and rigid causal coupling. I was already vaguely aware of this idea from a helpful exchange somewhere in the bowels of a Meaningness comments section but hadn’t quite grasped its importance (the whole thread is worth reading, but I’m thinking about the bit starting here). Then I blundered into my own clumsy restatement of the idea while thinking about cognitive decoupling, and finally saw the point. So I started reading On the Origin of Objects.

It’s a difficult book, with a lot more metaphysics than I realised I was signing up for, and this ‘middle distance’ idea is only a small part of a very complex, densely interconnected argument that I don’t understand at all well and am not even going to attempt to explain. But the examples Smith uses to illustrate the idea are very accessible without the rest of the machinery of the book, and helpful on their own.

I was also surprised by how little I could find online – searching for e.g. “brian cantwell smith” “middle distance” turns up lots of direct references to On the Origin of Objects, and a couple of reviews, but not much in the way of secondary commentary explaining the term. You pretty much have to go and read the whole book. So I thought it was worth making a post that just extracted these three examples.

### Example 1: Super-sunflowers

Smith’s first example is fanciful but intended to quickly give the flavour of the idea:

… imagine that a species of “super-sunflower” develops in California to grow in the presence of large redwoods. Suppose that ordinary sunflowers move heliotropically, as the myth would have it, but that they stop or even droop when the sun goes behind a tree. Once the sun re-emerges, they can once again be effectively driven by the direction of the incident rays, lifting up their faces, and reorienting to the new position. But this takes time. Super-sunflowers perform the following trick: even when the sun disappears, they continue to rotate at approximately the requisite ¼° per minute, so that the super-sunflowers are more nearly oriented to the light when the sun appears.

A normal sunflower is directly coupled to the movement of the sun. This is analogous to simple feedback systems like, for example, the bimetallic strip in a thermostat, which curls when the strip is heated and one side expands more than the other. In some weak sense, the curve of the bimetallic strip ‘represents’ the change in temperature. But the coupling is so direct that calling it ‘representation’ is dragging in more intentional language than we need. It’s just a load of physics.

The super-sunflower brings in a new ingredient: it carries on attempting to track the sun even when the two are out of direct causal contact. Smith argues that this disconnected tracking is the (sunflower) seed that genuine intentionality grows from. We are now on the way to something that can really be said to ‘represent’ the movement of the sun:

This behaviour, which I will call “non-effective tracking”, is no less than the forerunner of semantics: a very simple form of effect-transcending coordination in some way essential to the overall existence or well-being of the constituted system.
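A toy simulation of the difference, under my own assumptions rather than anything in Smith: the sun moves at the quoted ¼° per minute, a tree hides it for a hypothetical 60-minute window, and a flower that can see the sun turns towards it at a hypothetical maximum of 1° per minute (standing in for "this takes time"). Drooping is ignored. The only difference between the two flowers is whether they keep rotating while the sun is hidden.

```python
SUN_RATE = 0.25  # degrees per minute: the rate given in Smith's passage
TURN_RATE = 1.0  # hypothetical: fastest a flower can reorient when it sees the sun


def simulate(minutes: int, occluded: range, keep_turning: bool) -> list[float]:
    """Return how far (in degrees) the flower lags the sun at each minute.

    keep_turning=False is an ordinary sunflower: it only moves when it can
    see the sun. keep_turning=True is a super-sunflower: behind the tree it
    keeps rotating at SUN_RATE with no input from the sun at all.
    """
    heading = 0.0
    lags = []
    for t in range(minutes):
        sun = SUN_RATE * t
        if t in occluded:
            if keep_turning:
                heading += SUN_RATE  # non-effective tracking
        else:
            # effective coupling: turn towards the sun, but only so fast
            gap = sun - heading
            heading += max(-TURN_RATE, min(TURN_RATE, gap))
        lags.append(sun - heading)
    return lags


if __name__ == "__main__":
    ordinary = simulate(240, range(100, 160), keep_turning=False)
    super_flower = simulate(240, range(100, 160), keep_turning=True)
    print(ordinary[159], ordinary[160])  # 15.0 14.25
    print(max(super_flower))             # 0.0
```

When the sun re-emerges the ordinary flower is 15° behind and needs another twenty-odd minutes to catch up, while the super-sunflower is already pointing the right way, despite having had no causal contact with the sun for the whole hour.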

### Example 2: Error checking

Now for a more realistic example. Consider the following simple error-checking system:

There’s a 32-bit word that we want to send, but we want to be sure that it’s been transmitted correctly. So we also send a 6-bit ‘check code’ containing the number of ones in the word (19 of them in this instance, or 010011 in binary). If the number of ones in the received word doesn’t match the code, we know something’s gone wrong.

Obviously, we want the 6-bit code to stay coordinated with the 32-bit word for the whole storage period, and not just randomly change to some other count of ones, or it’s useless. Less obviously (“because it is such a basic assumption underlying the whole situation that we do not tend to think about it explicitly”), we don’t want the 6-bit code to invariably be correlated to the 32-bit word, so that a change in the word always changes the code. Otherwise we couldn’t do error checking at all! If a cosmic ray flips one of the bits in the word, we want the code to remain intact, so we can use it to detect the error. So again we have this ‘middle distance’ between direct coupling and irrelevance.
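Here is a minimal sketch of the scheme. The specific word is a hypothetical one I picked to have 19 ones; it is not the word from Smith's figure. The point to notice is that the code is computed once and then left alone, so a later change to the word does not drag the code along with it.

```python
def check_code(word: int) -> str:
    """Count the ones in a 32-bit word and write the count as a 6-bit string.

    The count can be anything from 0 to 32 (33 values), so 5 bits would not
    be enough and 6 are needed.
    """
    if not 0 <= word < 2**32:
        raise ValueError("expected a 32-bit word")
    return format(bin(word).count("1"), "06b")


def looks_intact(word: int, code: str) -> bool:
    """Compare a (possibly corrupted) word with the check code stored alongside it."""
    return check_code(word) == code


word = 0xF0F0F0F7         # hypothetical word with 19 ones
code = check_code(word)   # '010011'

# A 'cosmic ray' flips one bit of the word but leaves the code alone: detected.
print(looks_intact(word ^ (1 << 5), code))              # False

# Two flips in opposite directions keep the count at 19: not detected.
print(looks_intact(word ^ (1 << 5) ^ (1 << 3), code))   # True
```

The second case is a limitation of counting ones rather than part of Smith's argument, but it shows how much work the "stays put while the word changes" property is doing: the code is only useful because it is decoupled from the word it describes.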

### Example 3: File caches

One final real-world example: file caches. We want the data stored in the cache to be similar to the real data, or it’s not going to be much of a cache. At the same time, though, if we make everything exactly the same as the original data store, it’s going to take exactly as long to access the cache as it is to access the original data, so that it’s no longer really a cache.
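A minimal read-cache sketch, with hypothetical class names and no eviction or write policy (real caches have both). It shows the three phases: the cache copies from the store when it has to, answers from its detached copy afterwards even while the store changes, and is brought back into line only when something reconnects it.

```python
class SlowStore:
    """Hypothetical backing store that counts how often it is actually read."""

    def __init__(self, data: dict):
        self.data = dict(data)
        self.reads = 0

    def read(self, key):
        self.reads += 1
        return self.data[key]


class Cache:
    """Keeps detached copies; only contacts the store on a miss."""

    def __init__(self, store: SlowStore):
        self.store = store
        self.copies = {}

    def get(self, key):
        if key not in self.copies:        # miss: effective contact with the store
            self.copies[key] = self.store.read(key)
        return self.copies[key]           # hit: answered from the copy alone

    def invalidate(self, key):
        self.copies.pop(key, None)        # next get() reconnects with the store


store = SlowStore({"x": 1})
cache = Cache(store)
cache.get("x")
cache.get("x")
print(store.reads)          # 1: the second get never touched the store
store.data["x"] = 2         # the store changes while the cache is out of contact
print(cache.get("x"))       # 1: stale, but still matching what was read
cache.invalidate("x")
print(cache.get("x"))       # 2: reconciled
```

If `get` always went to the store, the cache would be perfectly coupled and pointless; if it never did, it would be irrelevant. The speed-up lives entirely in the gap between those two.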

## Flex and slop

In all these examples, it’s important that the ‘representing’ system tries to stay coordinated with the distant ‘represented’ system while they’re out of direct contact. The super-sunflower keeps turning, the check code maintains its count of ones, the file cache maintains the data that was previously written to it:

In all these situations, what starts out as effectively coupled is gradually pulled apart, but separated in such a way as to honor a non-effective long-distance coordination condition, leading eventually to effective reconnection or reconciliation.

For this to be possible, the world needs to be able to support the right level of separation:

The world is fundamentally characterized by an underlying flex or slop – a kind of slack or ‘play’ that allows some bits to move about or adjust without much influencing, and without being much influenced by, other bits. Thus we can play jazz in Helsinki, as loud as we please, without troubling the Trappists in Montana. Moths can fly into the night with only a minimal expenditure of energy, because they have to rearrange only a tiny fraction of the world’s mass. An idea can erupt in Los Angeles, turn into a project, capture the fancy of hundreds of people, and later subside, never to be heard of again, all without having any impact whatsoever on the goings-on in New York.

This slop makes causal disconnection possible – ‘subjects’ can rearrange the representation independently of the ‘objects’ being represented. (This is what makes computation ‘cheap’ – we can rearrange some bits without having to also rearrange some big object elsewhere that they are supposed to represent some aspect of.) To make the point, Smith compares this with two imaginary worlds where this sort of ‘middle distance’ representation couldn’t get started. The first world consists of nothing but a huge assemblage of interlocking gears that turn together exactly without slipping, all at the same time. In this world, there is no slop at all, so nothing can ever get out of causal contact with anything else. You could maybe say that one cog ‘represents’ another cog, but really everything is just like the thermostat, too directly coupled to count interestingly as a representation. The second world is just a bunch of particles drifting in the void without interaction. This has gone beyond slop into complete irrelevance. Nothing is connected enough to have any kind of structural relation to anything else.

The three examples given above – file caches, error checking and the super-sunflower – are really only one step up from the thermostat, too simple to have anything much like genuine ‘intentional content’. The tracking behaviour of the representing object is too simple – the super-sunflower just moves across the sky, and the file cache and check code just sit there unchanged. Smith acknowledges this, and says that the exchange between ‘representer’ and ‘represented’ has to have a lot more structure, with alternating patterns of being in and out of causal contact, and some other ‘stabilisation’ patterns that I don’t really understand, that somehow help to individuate the two as separate objects. At this point, the concrete examples run completely dry, and I get lost in some complicated argument about ‘patterns of cross-cutting extension’ which I haven’t managed to disentangle yet. The basic idea illustrated by the three examples was new to me, though, and worth having on its own.

## Cognitive decoupling and banana phones

Last year I wrote a post which used an obscure term from cognitive psychology and an obscure passage from The Bell Jar to make a confused point about something I didn’t understand very well. I wasn’t expecting this to go very far, but it got more interest than I expected, and some very thoughtful comments. Then John Nerst wrote a much clearer summary of the central idea, attached it to a noisily controversial argument-of-the-month and sent it flying off around the internet. Suddenly ‘cognitive decoupling’ was something of a hit.

If I’d known this was going to happen I might have put a bit more effort into the original blog post. For a start, I might have done some actual reading, instead of just grabbing a term I liked the sound of from one of Sarah Constantin’s blog posts and running with it. So I wanted to understand how the term as we’ve been applying it differs from Stanovich’s original use, and what his influences were. I haven’t done a particularly thorough job on this, but I have turned up a few interesting things, including a surprisingly direct link to a 1987 paper on pretending that a banana is a phone. I also learned that the intellectual history I’d hallucinated for the term based on zero reading was completely wrong, but wrong in a way that’s been strangely productive to think about. I’ll describe both the actual history and my weird fake one below. But first I’ll briefly go back over what the hell ‘cognitive decoupling’ is supposed to mean, for people who don’t want to wade through all those links.

## Roses, tripe, and the bat and ball again

Stanovich is interested in whether, to use Constantin’s phrase, ‘rational people exist’. In this case ‘rational’ behaviour is meant to mean something like systematically avoiding cognitive biases that most people fall into. One of his examples is the Wason selection task, which asks people to choose which cards to turn over in order to test the statement ‘If the card has an even number on one face it will be red on the reverse’. More vivid real-world situations, like Stanovich’s example of ‘if you eat tripe you will get sick’, are much easier for people to reason about than the decontextualised card-picking version. (Cosmides and Tooby’s beer version is even easier than the tripe one.)

A second example he gives is the ‘rose syllogism’:

Premise 1: All living things need water
Premise 2: Roses need water
Therefore, Roses are living things

A majority of university students incorrectly judge this as valid, whereas almost nobody thinks this structurally equivalent version makes sense:

Premise 1: All insects need oxygen
Premise 2: Mice need oxygen
Therefore, Mice are insects

The rose conclusion fits well with our existing background understanding of the world, so we are inclined to accept it. The mouse conclusion is stupid, so this doesn’t happen.
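The ‘structurally equivalent’ claim can be checked mechanically by throwing the content away and searching small set-theoretic models for a counterexample: sets A, B and C where both premises hold but the conclusion fails. Here is a minimal Python sketch, reading ‘Roses need water’ as ‘all roses need water’; the helper names are made up for illustration.

```python
from itertools import combinations, product


def subsets(universe):
    """All subsets of a small finite universe, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def find_counterexample(premises, conclusion, universe):
    """Look for sets (A, B, C) where every premise holds but the conclusion fails."""
    for a, b, c in product(subsets(universe), repeat=3):
        if all(p(a, b, c) for p in premises) and not conclusion(a, b, c):
            return a, b, c
    return None


# Form shared by the rose and mouse arguments:
#   All A are B;  all C are B;  therefore all C are A.
# (A = living things / insects, B = things needing water / oxygen,
#  C = roses / mice.)
rose_form = find_counterexample(
    premises=[lambda a, b, c: a <= b, lambda a, b, c: c <= b],
    conclusion=lambda a, b, c: c <= a,
    universe={"x"},
)
print(rose_form)  # (frozenset(), frozenset({'x'}), frozenset({'x'}))

# A genuinely valid form, for contrast:
#   All A are B;  all C are A;  therefore all C are B.
valid_form = find_counterexample(
    premises=[lambda a, b, c: a <= b, lambda a, b, c: c <= a],
    conclusion=lambda a, b, c: c <= b,
    universe={"x", "y", "z"},
)
print(valid_form)  # None
```

The counterexample is a one-object world in which that object is a C and a B but not an A – a mouse that needs oxygen without being an insect. The rose argument has exactly the same form, so it is just as invalid; the only difference is that its conclusion happens to be true.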

A final example would be the bat and ball problem from the Cognitive Reflection Test: ‘A bat and a ball cost $1.10. The bat costs $1 more than the ball. How much does the ball cost?’. I’ve already written about that one in excruciating detail, so I won’t repeat myself too much, but in this case the interfering context isn’t so much background knowledge as a very distracting wrong answer.
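For reference, the algebra itself is one line. Writing $b$ for the price of the ball in dollars:

$b + (b + 1) = 1.10 \;\Rightarrow\; 2b = 0.10 \;\Rightarrow\; b = 0.05$

The distracting answer, $b = 0.10$, would make the bat cost 1.10 dollars and the total 1.20, so it fails the very first condition.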

Stanovich’s contention is that people who manage to navigate these problems successfully have an unusually high capacity for something he calls ‘cognitive decoupling’: separating out the knowledge we need to reason about a specific situation from other, interfering contextual information. In a 2013 paper with Toplak he describes decoupling as follows:

When we reason hypothetically, we create temporary models of the world and test out actions (or alternative causes) in that simulated world. In order to reason hypothetically we must, however, have one critical cognitive capability—we must be able to prevent our representations of the real world from becoming confused with representations of imaginary situations. The so-called cognitive decoupling operations are the central feature of Type 2 processing that make this possible…

The important issue for our purposes is that decoupling secondary representations from the world and then maintaining the decoupling while simulation is carried out is the defining feature of Type 2 processing.

(‘Type 2’ is a more recent name for ‘System 2’, in the ‘System 1’/‘System 2’ dual process typology made famous by Kahneman’s Thinking, Fast and Slow. See Kaj Sotala’s post here for a nice discussion of Stanovich and Evans’s work relating this split to the idea of cognitive decoupling, and other work that has questioned the relevance of this split.)

I don’t know how well this works as an explanation of what’s really going on in these situations. I haven’t dug into the history of the Wason or rose-syllogism tests at all, and, as with the bat and ball question, I’d really like to know what was done to validate these as good tests. What similar questions were tried? What other explanations, like prior exposure to logical reasoning, were identified, and how were these controlled for? I don’t have time for that currently. For the purposes of this post, I’m more interested in understanding what Stanovich’s influences were in coming up with this idea, rather than whether it’s a particularly good explanation.

## Context, wide and narrow

Constantin’s post is more or less what she calls a ‘fact post’, summarising research in the area without too much editorial gloss. When I picked this up, I was mostly excited by the one bit of speculation at the end, and the striking ‘cognitive decoupling elite’ phrase, and didn’t make any effort to stay close to Stanovich’s meaning. Now I’ve read some more, I think that in the end we didn’t drift too far away. Here is Nerst’s summary of the idea:

High-decouplers isolate ideas from each other and the surrounding context. This is a necessary practice in science which works by isolating variables, teasing out causality and formalizing and operationalizing claims into carefully delineated hypotheses. Cognitive decoupling is what scientists do.

To a high-decoupler, all you need to do to isolate an idea from its context or implications is to say so: “by X I don’t mean Y”. When that magical ritual has been performed you have the right to have your claims evaluated in isolation. This is Rational Style debate…

While science and engineering disciplines (and analytic philosophy) are populated by people with a knack for decoupling who learn to take this norm for granted, other intellectual disciplines are not. Instead they’re largely composed of what’s opposite the scientist in the gallery of brainy archetypes: the literary or artistic intellectual.

This crowd doesn’t live in a world where decoupling is standard practice. On the contrary, coupling is what makes what they do work. Novelists, poets, artists and other storytellers like journalists, politicians and PR people rely on thick, rich and ambiguous meanings, associations, implications and allusions to evoke feelings, impressions and ideas in their audience. The words “artistic” and “literary” refers to using idea couplings well to subtly and indirectly push the audience’s meaning-buttons.

Now of course, Nerst is aiming at a much wider scope – he’s trying to apply this to controversial real-world arguments, rather than experimental studies of cognitive biases. But he’s talking about roughly the same mechanism of isolating an idea from its surrounding context.

There is a more subtle difference, though, that I find interesting. It’s not a sharp distinction so much as a difference in emphasis. In Nerst’s description, we’re looking at the coupling between one specific idea and its whole background context, which can be a complex soup of ‘thick, rich and ambiguous meanings, associations, implications and allusions’. This is a clear ‘outside’ description of the beautiful ‘inside’ one that I pulled from The Bell Jar, talking about how it actually feels (to some of us, anyway) to drag ideas out from the context that gave them meaning:

Botany was fine, because I loved cutting up leaves and putting them under the microscope and drawing diagrams of bread mould and the odd, heart-shaped leaf in the sex cycle of the fern, it seemed so real to me.

The day I went in to physics class it was death.

A short dark man with a high, lisping voice, named Mr Manzi, stood in front of the class in a tight blue suit holding a little wooden ball. He put the ball on a steep grooved slide and let it run down to the bottom. Then he started talking about let a equal acceleration and let t equal time and suddenly he was scribbling letters and numbers and equals signs all over the blackboard and my mind went dead.

… I may have made a straight A in physics, but I was panic-struck. Physics made me sick the whole time I learned it. What I couldn’t stand was this shrinking everything into letters and numbers. Instead of leaf shapes and enlarged diagrams of the hole the leaves breathe through and fascinating words like carotene and xanthophyll on the blackboard, there were these hideous, cramped, scorpion-lettered formulas in Mr Manzi’s special red chalk.

In this description, the satisfying thing about the botany classes is the rich sensory context: the sounds of the words, the vivid images of ferns and bread mould, the tactile sense of chopping leaves. This is a very broad-spectrum idea of context.

Now, Stanovich does seem to want cognitive decoupling to apply in situations where people access a wide range of background knowledge (‘roses are living things’), but when he comes to hypothesising a mechanism for how this works he goes for something with a much narrower focus. In the 2013 paper with Toplak he talks about specific, explicit ‘representations’ of knowledge interfering with other explicit representations. (I’ll go into more detail later about exactly what he means by a ‘representation’.) He cites an older paper, Pretense and Representation by Leslie, as inspiration for the ‘decoupling’ term:

In a much-cited article, Leslie (1987) modeled pretense by positing a so-called secondary representation (see Perner 1991) that was a copy of the primary representation but that was decoupled from the world so that it could be manipulated — that is, be a mechanism for simulation.

This is very clearly about being able to decouple one specific explicit belief from another similarly explicit ‘secondary representation’, rather than the whole background morass of implicit context. I wanted to understand how this was supposed to work, so I went back and read the paper. This is where the banana phones come in.

## Pretending a banana is a phone

The first surprise for me was how literal this paper was. (Apparently 80s cognitive science was like that.) Leslie is interested in how pretending works – how a small child pretends that a banana is a telephone, to take his main example. And the mechanism he posits is… copy-and-paste, but for the brain:

As in, we get some kind of perceptual input which causes us to store a ‘representation’ that means ‘this is a banana’. Then we make a copy of this. Now we can operate on the copy (‘this banana is a telephone’) without also messing up the banana representation. They’ve become decoupled.
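Here is a toy Python analogy of that copy-and-paste mechanism, just to make the idea concrete. The dictionary fields and function names are made up for illustration, and this should not be read as Leslie’s own formalism; it only shows why operating on a copy keeps the original from being overwritten, and what goes wrong without one.

```python
import copy


def perceive_banana():
    """Hypothetical primary representation produced by perception."""
    return {"object": "banana", "is_a": "banana", "uses": ["eat"]}


def pretend_phone(rep):
    """Treat the represented object as a telephone (mutates rep in place)."""
    rep["is_a"] = "telephone"
    rep["uses"].append("call grandma")
    return rep


# Coupled: 'pretend' is just another name for the primary representation,
# so the pretence leaks back into what we believe about the world
# (what Leslie calls representational abuse).
coupled = perceive_banana()
pretend = coupled              # no copy is made here
pretend_phone(pretend)
print(coupled["is_a"])         # telephone

# Decoupled: pretend on a deep copy and the primary stays intact.
# (A shallow copy.copy() would still share the nested 'uses' list.)
primary = perceive_banana()
secondary = pretend_phone(copy.deepcopy(primary))
print(primary["is_a"], secondary["is_a"])   # banana telephone
```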

What are these ‘representations’? Leslie has this to say:

What I mean by representation will, I hope, become clear as the discussion progresses. It has much in common with the concepts developed by the information-processing, or cognitivist, approach to cognition and perception…

This is followed by a long string of references to Chomsky, Dennett, etc. So his main influence appears to be, roughly, computational theories of mind. Looking at how he uses the term in the paper itself, it appears that we’re in the domain of Good Old-Fashioned AI: ‘representations’ can be put into a rough correspondence with English propositions about bananas, telephones, and cups of tea, and are then used as a kind of raw material to run inference rules on and come to new conclusions:

Leslie doesn’t talk about how all these representations come to mean anything in the real world — how do we know that the string of characters ‘cups contain water’, or its postulated mental equivalent, has anything to do with actual cups and actual water? How do we even parse the complicated flux of the real world into discrete named objects, like ‘cups’, to start with? There’s no story in the paper that tries to bridge this gap — these representations are just sitting there ‘in the head’, causally disconnected from the world.

Well, OK, maybe 80s cognitive science was like that. Maybe Leslie thought that someone else already had a convincing story for how this bit works, and he could just apply the resulting formalism of propositions and inference rules. But this same language of ‘representations’ and ‘simulations’ is still being used uncritically in much more recent papers. Stanovich and Toplak, for example, reproduce Leslie’s decoupling diagram and describe it using the same terms:

For Leslie (1987), the decoupled secondary representation is necessary in order to avoid representational abuse — the possibility of confusing our simulations with our primary representations of the world as it actually is… decoupled representations of actions about to be taken become representations of potential actions, but the latter must not infect the former while the mental simulation is being carried out.

There’s another strange thing about Stanovich using this paper as a model to build on. (I completely missed this, but David Chapman pointed it out to me in an earlier conversation.) Stanovich is interested in what makes actions or behaviours rational, and he wants cognitive decoupling to be at least a partial explanation of this. Leslie is looking at toddlers pretending that bananas are telephones. If even very young children are passing this test for ‘rationality’, it’s not going to be much use for discriminating between ‘rational’ and ‘irrational’ behaviour in adults. So Stanovich would need a narrower definition of ‘decoupling’ that excludes the banana-telephone example if he wants to eventually use it as a rationality criterion.

So I wasn’t very impressed with this as a plausible mechanism for decoupling. Then again, the mechanism I’d been imagining turns out to have some obvious failings too.

## Rabbits and the St. Louis Arch

When I first started thinking about cognitive decoupling, I imagined a very different history for the term. ‘Decoupling’ sounds very physicsy to me, bringing up associations of actual interaction forces and coupling constants, and I’d been reading Dreyfus’s Why Heideggerian AI Failed, which discusses dynamical-systems-inspired models of cognition:

Fortunately, there is at least one model of how the brain could provide the causal basis for the intentional arc. Walter Freeman, a founding figure in neuroscience and the first to take seriously the idea of the brain as a nonlinear dynamical system, has worked out an account of how the brain of an active animal can find and augment significance in its world. On the basis of years of work on olfaction, vision, touch, and hearing in alert and moving rabbits, Freeman proposes a model of rabbit learning based on the coupling of the brain and the environment…

The organism normally actively seeks to improve its current situation. Thus, according to Freeman’s model, when hungry, frightened, disoriented, etc., the rabbit sniffs around until it falls upon food, a hiding place, or whatever else it senses it needs. The animal’s neural connections are then strengthened to the extent that reflects the extent to which the result satisfied the animal’s current need. In Freeman’s neurodynamic model, the input to the rabbit’s olfactory bulb modifies the bulb’s neuron connections according to the Hebbian rule that neurons that fire together wire together.

In many ways this still sounds like a much more promising starting point to me than the inference-rule-following of the Leslie paper. For a start, it seems to fit much better with what’s known about the architecture of the brain (I think – I’m pretty ignorant about this). Neurons are very slow compared to computer processors, but make up for this by being very densely interconnected. So getting anything useful done would rely on a huge amount of activation happening in parallel, producing a kind of global, diffuse ‘background context’ that isn’t sharply divided into separate concepts.

Better still, the problem of how situations intrinsically mean something about the world is sidestepped, because in this case, the rabbit and environment are literally, physically coupled together. A carrot smell out in the world pulls the rabbit’s olfactory bulb into a different state, which in turn pulls the rabbit into a different kind of behaviour, which then alters the global structure of the bulb in such a way that this behaviour is more likely to occur again in the future. This coupling is so direct that referring to it as a ‘representation’ seems like overkill:

Freeman argues that each new attractor does not represent, say, a carrot, or the smell of carrot, or even what to do with a carrot. Rather, the brain’s current state is the result of the sum of the animal’s past experiences with carrots, and this state is directly coupled with or resonates to the affordance offered by the current carrot.
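The Hebbian rule in the Dreyfus quote (‘neurons that fire together wire together’) can be illustrated with a generic toy update in NumPy. This is only the textbook rule, not Freeman’s neurodynamic model: the eight units, the learning rate and the binary ‘carrot’ pattern are all arbitrary assumptions. After a few exposures, a partial cue is enough to reactivate the whole pattern, which is one simple way of reading ‘more likely to occur again in the future’.

```python
import numpy as np

n_units = 8            # toy 'olfactory bulb' size (arbitrary)
learning_rate = 0.1    # arbitrary
weights = np.zeros((n_units, n_units))

# Made-up binary activity pattern standing in for 'smelling a carrot'.
carrot = np.array([1, 1, 0, 0, 1, 0, 0, 0], dtype=float)


def hebbian_update(w, x, eta):
    """Strengthen each connection w[i, j] in proportion to x[i] * x[j]."""
    w = w + eta * np.outer(x, x)
    np.fill_diagonal(w, 0.0)   # no self-connections
    return w


for _ in range(5):             # five sniffs of carrot
    weights = hebbian_update(weights, carrot, learning_rate)

# Units that fired together are now wired together; the rest are not.
print(weights[0, 1], weights[0, 2])

# A partial cue (only unit 0 active) spreads activity along the learned
# connections and reactivates the rest of the carrot pattern.
cue = np.zeros(n_units)
cue[0] = 1.0
recalled = (weights @ cue > 0).astype(float)
recalled[0] = 1.0              # keep the cued unit itself switched on
print(recalled)                # same as the carrot pattern
```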

However, this is also where the problems come in. Everything is so closely causally coupled that there’s no room in this model for decoupling! The idea behind ‘cognitive decoupling’ is to be able to pull away from the world long enough to consider things in the abstract, without all the associations that normally get dragged along for free. In the olfactory bulb model, the rabbit is so locked into its surroundings that this sort of distance is unattainable.

At some point I was googling a bunch of keywords like ‘dynamical systems’ and ‘decoupling’ in the hope of fishing up something interesting, and I came across a review by Rick Grush of Mind as Motion: Explorations in the Dynamics of Cognition by Port and van Gelder, which had a memorable description of the problem:

…many paradigmatically cognitive capacities seem to have nothing at all to do with being in a tightly coupled relationship with the environment. I can think about the St. Louis Arch while I’m sitting in a hot tub in southern California or while flying over the Atlantic Ocean.

Even this basic kind of decoupling from a situation – thinking about something that’s not happening to you right now – needs some capacities that are missing from the olfactory bulb model. Grush even uses the word ‘decoupling’ to describe this:

…what is needed, in slightly more refined terms, is an executive part, C (for Controller), of an agent, A, which is in an environment E, decoupling from E, and coupling instead to some other system E’ that stands in for E, in order for the agent to ‘think about’ E (see Figure 2). Cognitive agents are exactly those which can selectively couple to either the ‘real’ environment, or to an environment model, or emulator, perhaps internally supported, in order to reason about what would happen if certain actions were undertaken with the real environment.

This actually sounds like a plausible alternate history for Stanovich’s idea, with its intellectual roots in dynamical systems rather than the representational theory of mind. So maybe my hallucinations were not too silly after all.
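Grush’s controller/emulator picture can be sketched as a small Python toy. Everything here is an assumption for illustration: a one-dimensional ‘line world’ stands in for E, and the emulator E’ is simply a copy of it, whereas in Grush’s account the emulator would be some (possibly imperfect) internal model. The controller C decouples from E to try out each action on E’, then re-couples to E to act.

```python
import copy
from dataclasses import dataclass


@dataclass
class LineWorld:
    """Toy environment E: an agent's position on a number line (hypothetical)."""
    position: int = 0

    def step(self, action: int) -> int:
        self.position += action
        return self.position


class Controller:
    """Executive part C: couples either to E or to a stand-in E'."""

    def __init__(self, goal: int):
        self.goal = goal

    def plan(self, real_env: LineWorld, actions=(-1, 0, 1)) -> int:
        # Decouple from E: try each action on an emulator E' instead.
        best_action, best_distance = None, None
        for action in actions:
            emulator = copy.deepcopy(real_env)   # E' stands in for E
            distance = abs(self.goal - emulator.step(action))
            if best_distance is None or distance < best_distance:
                best_action, best_distance = action, distance
        return best_action

    def act(self, real_env: LineWorld) -> int:
        # Re-couple to the real environment and carry out the chosen action.
        return real_env.step(self.plan(real_env))


world = LineWorld()
agent = Controller(goal=3)
trajectory = [agent.act(world) for _ in range(5)]
print(trajectory)  # [1, 2, 3, 3, 3]
```

The only actions that ever touch the real world are the ones already tried out on the stand-in, which is the ‘selective coupling’ in the quote, in its crudest possible form.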

## Final thoughts

I still think that the idea of cognitive decoupling is getting at something genuinely interesting – otherwise I wouldn’t have spent all this time rambling on about it! I don’t think the current representational story for how it works is much good. But the ability to isolate ‘abstract structure’ (whatever that means, exactly) from its surrounding context does seem to be a real skill that people vary in. In practice I expect that much of this context will be more of a diffuse associational soup than the sharp propositional statements of Leslie’s pretence model.

It’s interesting to me that the banana phone model and the olfactory bulb model both run into problems, but in opposite directions. Leslie’s banana phone relies on a bunch of free-floating propositions (‘this is a banana’), with no story for how they refer to actual bananas and phones out in the world. Freeman’s rabbit olfactory bulb has no problem with this – relevance is guaranteed through direct causal coupling to the outside world – but it’s so directly coupled that there’s no space for decoupling. We need something between these two extremes.

David Chapman pointed out to me that Brian Cantwell Smith already has a term for this in The Origin of Objects – he calls it ‘the middle distance’ between direct coupling and causal irrelevance. I’ve been reading the book and have already found his examples to be hugely useful in thinking about this more clearly. These are worth a post in their own right, so I’ll describe them in a followup to this one.

## Messy calculations

I’ve been looking back at some of the mess I produced while trying to get an initial grip on the ideas I wrote up in my last two posts on negative probability. One of my main interests in this blog is the gulf between maths as it is formally written up and the weird informal processes that produce mathematical ideas in the first place, so I thought this might make a kind of mini case study. Apologies in advance for my handwriting.

I do all my work in cheap school-style exercise books. The main thread of what I’m thinking about goes front-to-back: that’s where anything reasonably well-defined that I’m trying to do will go. Working through lecture notes, doing exercises, any calculations where I’m reasonably clear on what I’m actually calculating. But if I have no idea what I’m even trying to do, it goes in the back, instead. The back has all kinds of scribbles and disorganised crap:

Most of it is no good, but new ideas also tend to come from the back. The Wigner function decomposition was definitely a back-of-the-book kind of thing. I’ve mostly forgotten what I was thinking when I made all these scribblings, and I wouldn’t trust the remembered version even if I had one, so I’ll try to refrain from too much analysis.

The idea seems to originate here:

This has the key idea already: start with equal probability for all squares, and then add on correction terms until the bottom left corner goes negative. But the numbers are complete bullshit! Looking back, I can’t make sense of them at all. For instance, I was trying to add $\frac{1}{8}$ to things, instead of $\frac{1}{4}$. Why? No idea! It’s not like $\frac{1}{8}$ is a number that had come up in any of my previous calculations, so I have no idea what I was thinking.

Even with the bullshit numbers, I must have had some inkling that this line of thought was worth pursuing, so I wrote it out again. This time I realised the numbers were wrong and crossed them out, writing the correct ones to the right:

The little squares above the main squares are presumably to tell me what to do: add $\frac{1}{4}$ to the filled in squares and subtract it from the blank ones.

I then did a sanity check on an example with no negative probabilities, and it worked:

At that point, I think I was convinced it worked in general, even though I’d only checked two cases. So I moved to the front of the book. After that, the rest of it looks like actual legit maths that a sane person would do, so it’s not so interesting. But I had to produce this mess to get there.

## Negative probability: now with added equations!

OK, so this is where I go back through everything from the last post, but this time show how all the fiddling around with boxes relates back to quantum physics, and also go into some technical details like explaining what I meant by ‘half the information’ in the discussion at the end. This is unavoidably going to need more maths than the last post, and enough quantum physics knowledge to be OK with qubits and density matrices. I’ll start by translating everything into a standard physics problem.

## Qubit phase space

So, first off, instead of the ‘strange machine’ of the last post we will have a qubit state – as a first example I’ll take the $|0\rangle$ state. The three questions then become measurements on it. Specifically, we look at the expectation values $q_i$ of the operators $Q_i = \frac{1}{2}(I-\sigma_i)$, where the $\sigma_i$ are the three Pauli matrices. Each $Q_i$ is a projector, so $q_i$ is the probability of getting the answer 1 to the corresponding question.

For $|0\rangle$ we get the following:

$q_z = \langle 0 | Q_z | 0 \rangle = 0$

$q_x = \langle 0 | Q_x | 0 \rangle = \frac{1}{2}$

$q_y = \langle 0 | Q_y | 0 \rangle = \frac{1}{2}$
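These three values are easy to check numerically. A minimal NumPy sketch, writing $|0\rangle$ as the column vector $(1, 0)$ in the computational basis (the helper name `q_values` is just for illustration):

```python
import numpy as np

I2 = np.eye(2)
sigma = {
    "x": np.array([[0, 1], [1, 0]], dtype=complex),
    "y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "z": np.array([[1, 0], [0, -1]], dtype=complex),
}
# Q_i = (I - sigma_i) / 2 for each Pauli matrix.
Q = {k: 0.5 * (I2 - s) for k, s in sigma.items()}


def q_values(psi):
    """Expectation values q_i = <psi|Q_i|psi> for a normalised state psi."""
    return {k: float(np.real(np.vdot(psi, Qk @ psi))) for k, Qk in Q.items()}


ket0 = np.array([1, 0], dtype=complex)
print(q_values(ket0))  # {'x': 0.5, 'y': 0.5, 'z': 0.0}
```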

This can be represented on the same sort of 2×2 grid I used in the previous post:

The $|0\rangle$ state has a definite value of 0 for the $Q_z$ measurement, so the probabilities in the cells where $Q_z = 0$ must sum to 1. For the $Q_x$ measurement there is an equal chance of either $Q_x = 0$ or $Q_x = 1$. The third measurement, $Q_y$, can be shown to be associated with the diagonals of the grid, in the same way as in Piponi’s example in the previous post, and again there is an equal chance of either value. Imposing all these conditions gives the probability assignment above.

The 2×2 grid is called the phase space of the qubit, and the function that assigns probabilities to each cell is called the Wigner function $W$. To save on drawing diagrams, I’ll represent this as a square-bracketed matrix from now on:

$W = \begin{bmatrix} W(0,1) && W(1,1) \\ W(0,0) && W(1,0) \end{bmatrix}$

For much more detail on how this all works, the best option is probably to read Wootters, who developed a lot of the ideas in the first place. There’s his original paper, which has all the technical details, and a nice follow-up paper on Picturing Qubits in Phase Space which gives a bit more intuition for what’s going on.

In the previous post I gave the following formula for the Wigner function:

$W = \frac{1}{4}\Bigg( \begin{bmatrix}1 && 1 \\ 1 && 1 \end{bmatrix} + q_z\begin{bmatrix}-1 && 1 \\ -1 && 1 \end{bmatrix} + (1-q_z)\begin{bmatrix}1 && -1 \\ 1 && -1 \end{bmatrix}$

$\quad +q_x\begin{bmatrix}1 && 1 \\ -1 && -1 \end{bmatrix} + (1-q_x)\begin{bmatrix}-1 && -1 \\ 1 && 1 \end{bmatrix} + q_y\begin{bmatrix}1 && -1 \\ -1 && 1 \end{bmatrix} + (1-q_y)\begin{bmatrix}-1 && 1 \\ 1 && -1 \end{bmatrix} \Bigg),$

which simplifies to

$W = \frac{1}{2}\begin{bmatrix}-q_z + q_x + q_y && q_z + q_x - q_y \\ 2 - q_z - q_x - q_y && q_z -q_x + q_y\end{bmatrix}$
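The simplification can be checked numerically by coding up both forms and comparing them on random values of $q_z, q_x, q_y$. A short NumPy sketch; the function and array names are my own, and the matrices use the same layout as the bracket notation above, with $W(0,1)$ in the top left:

```python
import numpy as np

ONES = np.ones((2, 2))
# Correction patterns multiplying q_z, q_x and q_y in the long-form formula;
# the (1 - q) terms use the same patterns with the signs flipped.
A_Z = np.array([[-1, 1], [-1, 1]])
A_X = np.array([[1, 1], [-1, -1]])
A_Y = np.array([[1, -1], [-1, 1]])


def wigner_long(qz, qx, qy):
    """Start from 1/4 everywhere, then add each question's correction terms."""
    total = ONES.copy()
    for q, pattern in ((qz, A_Z), (qx, A_X), (qy, A_Y)):
        total = total + q * pattern + (1 - q) * (-pattern)
    return total / 4


def wigner_short(qz, qx, qy):
    """The simplified closed form."""
    return 0.5 * np.array([
        [-qz + qx + qy, qz + qx - qy],
        [2 - qz - qx - qy, qz - qx + qy],
    ])


rng = np.random.default_rng(1)
for _ in range(1000):
    q = rng.uniform(0, 1, size=3)
    assert np.allclose(wigner_long(*q), wigner_short(*q))

print(wigner_short(0, 0.5, 0.5).tolist())  # [[0.5, 0.0], [0.5, 0.0]]
```

The last line is the $|0\rangle$ state, with its whole probability in the left-hand column, where $Q_z = 0$.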

This is a somewhat different form to the standard formula for the Wigner function, but I’ve checked that they’re equivalent. I’ve put the details on a separate notes page here, in a sort of blog post version of the really boring technical appendix you get at the back of papers.

## Magic states

As with the example in the last blog post, it’s possible to get qubit states where some of the values of the Wigner function are negative. The numbers don’t work out so nicely this time, but as one example we can take the qubit state $|\psi\rangle = \frac{1}{\sqrt{1 + (1+\sqrt{2})^2}}\begin{pmatrix} 1 + \sqrt{2} \\ 1 \end{pmatrix}$. (This is the +1 eigenvector of the operator $\frac{1}{\sqrt{2}}\left(\sigma_z + \sigma_x\right)$.)

The Wigner function for $|\psi\rangle$ is

$W_\psi = \begin{bmatrix} \frac{1}{4} && \frac{1-\sqrt{2}}{4} \\ \frac{1 + \sqrt{2}}{4} && \frac{1}{4} \end{bmatrix} \approx \begin{bmatrix} 0.25 && -0.104 \\ 0.604 && 0.25 \end{bmatrix},$

with one negative entry. I learned while writing this that the states with negative values are called magic states by quantum computing people! These are the states that provide the ‘magic’ for quantum computing, in terms of giving a speed-up over classical computing. I’d like to be able to say more about this link, but I’ll never finish the post if I have to get my head around all of that too, so instead I’ll link to this post by Earl Campbell that goes into more detail and points to some references. A quick note on the geometry, though:

The six eigenvectors of the Pauli matrices form the corners of an octahedron on the Bloch sphere, as in my dubious sketch above. We’ve already seen that the $|0\rangle$ state has no magic – all the values are nonnegative. This also holds for the other five, which have the following Wigner functions:

$W_{|1\rangle} = \begin{bmatrix} 0 && \frac{1}{2} \\ 0 && \frac{1}{2} \end{bmatrix}, W_{|+\rangle} = \begin{bmatrix} 0 && 0 \\ \frac{1}{2} && \frac{1}{2} \end{bmatrix}, W_{|-\rangle} = \begin{bmatrix} \frac{1}{2} && \frac{1}{2} \\ 0 && 0 \end{bmatrix},$

$W_{|y_+\rangle} = \begin{bmatrix} 0 && \frac{1}{2} \\ \frac{1}{2} && 0 \end{bmatrix}, W_{|y_-\rangle} = \begin{bmatrix} \frac{1}{2} && 0 \\ 0 && \frac{1}{2} \end{bmatrix}.$

The other states on the surface of the octahedron or inside it also have no magic. The magic states are the ones outside the octahedron, and the further they are from the octahedron the more magic they are. So the most magic states are on the surface of the sphere opposite the middle of the triangular faces.
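To double-check the Wigner functions above, the following sketch computes $W$ directly from a state vector, by working out $q_z, q_x, q_y$ and plugging them into the simplified formula. It reports which of the six octahedron corners and the example state $|\psi\rangle$ have a negative entry. The state vectors are written unnormalised and normalised inside the (hypothetical) helper `q_values`.

```python
import numpy as np

SIGMA = {
    "z": np.array([[1, 0], [0, -1]], dtype=complex),
    "x": np.array([[0, 1], [1, 0]], dtype=complex),
    "y": np.array([[0, -1j], [1j, 0]], dtype=complex),
}


def q_values(psi):
    """q_i = <psi|(I - sigma_i)/2|psi> for i = z, x, y (psi normalised first)."""
    psi = psi / np.linalg.norm(psi)
    return [float(np.real(np.vdot(psi, 0.5 * (np.eye(2) - SIGMA[k]) @ psi)))
            for k in ("z", "x", "y")]


def wigner(psi):
    """Wigner function laid out as [[W(0,1), W(1,1)], [W(0,0), W(1,0)]]."""
    qz, qx, qy = q_values(psi)
    return 0.5 * np.array([
        [-qz + qx + qy, qz + qx - qy],
        [2 - qz - qx - qy, qz - qx + qy],
    ])


s2 = np.sqrt(2)
states = {
    "|0>": [1, 0],
    "|1>": [0, 1],
    "|+>": [1, 1],
    "|->": [1, -1],
    "|y+>": [1, 1j],
    "|y->": [1, -1j],
    "psi": [1 + s2, 1],
}

# Only 'psi' should report a negative entry.
for name, vec in states.items():
    W = wigner(np.array(vec, dtype=complex))
    verdict = "negative entry" if W.min() < -1e-12 else "all entries >= 0"
    print(f"{name:5} {verdict}")
```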

## Half the information

Why can’t we have a probability of $-\frac{1}{2}$ as before? Well, I briefly mentioned the reason in the previous blog post, but I can go into more detail now. There are constraints on the values of $W$ that forbid values this negative. First off, the values of $W$ have to sum to 1 – this makes sense, as they are supposed to be something like probabilities.

The second constraint is more interesting. Taking the $|0\rangle$ state as an example again, this state has a definite answer to one of the questions and no information at all about the other two. There’s redundancy in the questions, so exact answers to two of them would be enough to pin down the state precisely. So we have half of the possible information.

This turns out to be the most information you can get from any qubit state, in some sense. I say ‘in some sense’ because it’s a pretty odd definition of information.

I learned about this from a fascinating paper by van Enk, A toy model for quantum mechanics, which was actually my starting point for thinking about this whole topic. He starts with the Spekkens toy model, a very influential idea that reproduces a number of the features of quantum mechanics using a very simple model. Again, this is too big a topic to get into all the details, but the most basic system in this model maps to the six ‘non-magic’ qubit states listed above, in the corners of the octahedron. These all share the half-the-knowledge property of the $|0\rangle$ state, where we know the answer to one question exactly and have no idea about the others.

Now van Enk’s aim is to extend this idea of ‘half the knowledge’ to more general probability distributions over the four boxes. But this requires having some kind of measure $M$ of what half the knowledge means. He stipulates that this measure should have $M = \frac{1}{2}$ for the six half-the-knowledge states we already have, which seems reasonable. Also, it should have $M = 1$ for states where we know all the information (impossible in quantum physics), and $M = \frac{1}{4}$ for the state of total ignorance about all questions. Or to put it a bit differently,

$M = 2^{-H}$,

where $H$ is an entropy measure – it decreases from 2 to 1 to 0 as we learn more information about the system. There’s a parametrised family $H_\alpha$ of entropies known as the Rényi entropies, which reproduce this behaviour for the cases above, and differ for other distributions over the boxes. (I have some rough notes about these here, which may or may not be helpful.) By far the most well-known one is the Shannon entropy $H_1$, used widely in information theory, but it turns out that this one doesn’t reproduce the states found in quantum physics. Instead, van Enk picks $H_2$, the collision entropy. This has quite a simple form:

$H_2 = -\log_2 \left(\sum_i W_i^2 \right)$,

where the $W_i$ are the four components of $W$ – we’re just summing the squares of them. So then our information measure is just $M_2 = \sum_i W_i^2$, and the second constraint on $W$ is that this can have value at most $\frac{1}{2}$:

$\sum_i W_i^2 \leq \frac{1}{2}$.
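A small NumPy sketch of these quantities, applied to the Wigner functions that have come up so far. The function names are mine; `renyi_entropy` uses the standard definition $H_\alpha = \frac{1}{1-\alpha}\log_2 \sum_i p_i^\alpha$, with $\alpha = 1$ taken as the Shannon limit, and is only applied to the ordinary (non-negative) reference distributions.

```python
import numpy as np


def renyi_entropy(p, alpha):
    """Renyi entropy in bits for a non-negative distribution p."""
    p = np.asarray(p, dtype=float).ravel()
    if np.any(p < 0):
        raise ValueError("this sketch assumes non-negative p")
    if alpha == 1:                       # Shannon limit
        nz = p[p > 0]
        return float(-np.sum(nz * np.log2(nz)))
    return float(np.log2(np.sum(p ** alpha)) / (1 - alpha))


def m2(W):
    """M_2 = sum of squared entries = 2**(-H_2); fine with negative entries."""
    return float(np.sum(np.asarray(W, dtype=float) ** 2))


s2 = np.sqrt(2)
reference = {
    "total ignorance": np.full((2, 2), 0.25),
    "half the knowledge (|0>)": np.array([[0.5, 0.0], [0.5, 0.0]]),
    "complete knowledge": np.array([[0.0, 0.0], [1.0, 0.0]]),
}

# All the Renyi entropies agree on these three cases (2, 1 and 0 bits)...
for name, W in reference.items():
    print(name, [round(renyi_entropy(W, a), 6) for a in (0.5, 1, 2, 3)])

# ...but differ on a lopsided distribution.
lopsided = [0.7, 0.1, 0.1, 0.1]
print("lopsided", [round(renyi_entropy(lopsided, a), 3) for a in (0.5, 1, 2, 3)])

# The constraint sum_i W_i^2 <= 1/2, including for the magic state.
W_psi = np.array([[0.25, (1 - s2) / 4], [(1 + s2) / 4, 0.25]])
for name, W in {**reference, "magic state psi": W_psi}.items():
    print(f"{name}: M_2 = {m2(W):.3f}, allowed: {m2(W) <= 0.5 + 1e-12}")
```

The magic state sits exactly on the bound, like $|0\rangle$, while complete knowledge breaks it. One small observation: the Shannon entropy can’t even be evaluated on $W_\psi$, since it takes the logarithm of each entry, whereas $H_2$ only involves squares.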

Why this particular entropy measure? That’s something I don’t really understand. Van Enk describes it as ‘the measure of information advocated by Brukner and Zeilinger’, and links to their paper, but so far I haven’t managed to follow the argument there, either. If anyone reads this and has any insight, I’d like to know!

## Questions

In some ways, I know a lot more about negative probabilities than I did when I started getting interested in this. But conceptually I’m almost as confused as I was at the start! I think the main improvement is that I have some more focussed questions to be confused about:

• Is the way of decomposing the Wigner function that I described in these posts any use for making sense of the negative probabilities? I found it quite helpful for Piponi’s example, in giving some more insight into how the negative value connects to that particular answer being ‘especially inconsistent’. Is it also useful for thinking about qubits?
• Any link to the idea of negative probabilities representing events ‘unhappening’? As I said at the beginning of the first post, I love this idea but have never seen it fully developed anywhere in a satisfying way.
• What’s going on with this collision entropy measure anyway?

I’m not a quantum foundations researcher – I’m just an interested outsider trying to understand how all these ideas fit together. So I’m likely to be missing a lot of context that people in the field would have. If you read this and have pointers to things that I’m missing, please let me know in the comments!

## Negative probability

I’ve been thinking about the idea of negative probabilities a lot recently, and whether it’s possible to make any sense of them. (For some very muddled and meandering background on how I got interested in this, you could wade through my ramblings here, here, here and here, but thankfully none of that is required to understand this post.)

To save impatient readers the hassle of reading this whole thing: I’m not going to come up with any brilliant way of interpreting negative probabilities in this blog post! But recently I did notice a few things that are interesting and that I haven’t seen collected together anywhere else, so I thought it would be worth writing them up.

Now, why would you even bother trying to make sense of negative probabilities? I’m not going to go into this in any depth – John Baez has a great introductory post on negative probability that motivates the idea, and links to a good chunk of the (not very large) literature. This is well worth reading if you want to know more. But there are a couple of main routes that lead people to get interested in this thing.

The first route is pretty much pure curiosity: what happens if we try extending the normal idea of probabilities to negative numbers? This is often introduced by analogy with the way we use negative numbers to simplify calculations in practical applications. For example, there’s a fascinating discussion of negative probability by Feynman which starts with the following simple situation:

A man starting a day with five apples who gives away ten and is given eight during the day has three left. I can calculate this in two steps: 5 – 10 = -5 and -5 + 8 = 3.

The final answer is satisfactorily positive and correct although in the intermediate steps of calculation negative numbers appear. In the real situation there must be special limitations of the time in which the various apples are received and given since he never really has a negative number, yet the use of negative numbers as an abstract calculation permits us freedom to do our mathematical calculations in any order, simplifying the analysis enormously, and permitting us to disregard inessential details.

So, although we never actually have a negative number of apples, allowing them to appear in intermediate calculations makes the maths simpler.

The second route is that negative probabilities actually crop up in exactly this way in quantum physics! This isn’t particularly obvious in the standard formulation learned in most undergrad courses, but the theory can also be written in a different way that closely resembles classical statistical mechanics. However, unlike the classical case, the resulting ‘distribution’ is not a normal probability distribution, but a quasiprobability distribution that can also take negative values.

As with Feynman’s apples, these negative values don’t map to anything we observe directly: all measurements we could make give results that occur with zero or positive probabilities, as you would expect. The negative probabilities instead come in as intermediate steps in the calculation.

This should become clearer when I work through a toy example. The particular example I’ll use (which I got from an excellent blog post by Dan Piponi) doesn’t come up in quantum physics, but it’s very close: its main advantage is that the numbers are a bit simpler, so it’s easier to concentrate on the ideas. I’ll do this in two pieces: one that requires no particular physics or maths background and just walks through the example using basic arithmetic, and one that makes connections back to the quantum mechanics literature and might drop in a Pauli matrix or two. This is the no-maths one.

Neither of these routes really gets to the point of fully making sense of negative probabilities. In the apple example, we have a tool for making calculations easier, but we also have an interpretation of ‘a negative apple’, in terms of taking away one of the apples you have already. For negative probabilities, we mostly just have the calculational tool. It’s tempting to try and follow the apple analogy and interpret negative probabilities as being to do with something like ‘events unhappening’ – many people have suggested this (see e.g. Michael Nielsen here), and I certainly share the intuition that something like this ought to be possible, but I’ve never seen anything fully worked out along those lines that I’ve found really satisfying.

In the absence of a compelling intuitive explanation, I find it helpful to work through examples and get an idea of how they work. Even if we don’t end up with a good explanation for what negative probabilities are, we can see what they do, and start to build up a better understanding of them that way.

## A strange machine

OK, so let’s go through Piponi’s example (here’s the link again). He describes it very clearly and concisely in the post, so it might be a good idea to just switch to reading that first, but for completeness I’ll also reproduce it here.

Piponi asks us to consider a case where:

a machine produces boxes with (ordered) pairs of bits in them, each bit viewable through its own door.

So you could have 0 in both boxes, 0 in the first and 1 in the second, and so on. Now suppose we ask the following three questions about the boxes:

1. Is the first box in state 0?
2. Is the second box in state 0?
3. Are the boxes both in the same state?

I’ll work through two possible sets of answers to these questions: one consistent and unobjectionable set (Example 1), and one inconsistent and stupid one (Example 2).

Let’s say that we find that the answer to the first question is ‘yes’, the answer to the second is ‘no’, and the answer to the third is ‘no’. This makes sense, and we can interpret this easily in terms of an underlying state of the two boxes. The first box is in state 0, the second box is in state 1, and so of course the two are in different states, which agrees with the answer to the third question.

We can represent this situation with the grid below:

The system is in state ‘first box 0, second box 1’, with probability 1, and the other states have probability 0. This is all very obvious – I’m just labouring the point so I can compare it to the case of inconsistent answers, where things get weird.

Now suppose we find an inconsistent set of answers when we measure the box: ‘no’ to all three questions. This doesn’t make much intuitive sense: both boxes are in state 1, but also they are in different states. Even so, Piponi shows that you can still assign something like ‘probabilities’ to the squares on the grid, as long as you’re OK with one of them being negative (-½ for ‘both boxes in state 0’, and ½ for each of the other three cells):

Let’s go through how this matches up with the answers to the questions. For the first question, we have

$P(\text{first box 0}) = P(\text{first box 0, second box 0}) + P(\text{first box 0, second box 1})$

$P(\text{first box 0}) = -\frac{1}{2} + \frac{1}{2} = 0$

so the answer is ‘no’ as required. Similarly, for the other two questions we have

$P(\text{second box 0}) = P(\text{first box 0, second box 0}) + P(\text{first box 1, second box 0})$

$P(\text{second box 0}) = -\frac{1}{2} + \frac{1}{2} = 0$

and

$P(\text{boxes same}) = P(\text{first box 0, second box 0}) + P(\text{first box 1, second box 1})$

$P(\text{boxes same}) = -\frac{1}{2} + \frac{1}{2} = 0$

so we get ‘no’ to all three, at the expense of having introduced this weird negative probability in one cell of the grid.
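The same three checks can be run mechanically. Here is a small Python sketch using exact fractions, with each cell keyed by (first box, second box), a labelling I’ve chosen for convenience:

```python
from fractions import Fraction

half = Fraction(1, 2)

# Piponi's assignment, keyed by (first box, second box).
W = {(0, 0): -half, (0, 1): half, (1, 0): half, (1, 1): half}

p_first_0 = W[(0, 0)] + W[(0, 1)]   # question 1: first box in state 0?
p_second_0 = W[(0, 0)] + W[(1, 0)]  # question 2: second box in state 0?
p_same = W[(0, 0)] + W[(1, 1)]      # question 3: both boxes in the same state?
total = sum(W.values())             # should still add up to 1

print(p_first_0, p_second_0, p_same, total)
```

All three come out as 0 and the four cells still sum to 1, so the assignment behaves like a probability distribution except that one cell is negative.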

It’s not obvious at all what the negative probability means, though! Piponi doesn’t explain how he came up with this solution, but I’m guessing it’s one of either ‘solve the equations and get the answer’ or ‘notice that these numbers happen to work’.

I wanted to think a bit more about interpretation, and although I haven’t fully succeeded, I did notice a more enlightening calculation method, which maybe points in a useful direction. I’ll describe it below.

## A calculation method

Some motivating intuition: all four possible assignments of bits to boxes are inconsistent with the answers in Example 2, but ‘both bits are zero’ is particularly inconsistent. It’s inconsistent with the answers to all three questions, whereas the other assignments are inconsistent with only one question each (for example, ‘both bits are 1’ matches the answer to the first two questions, but is inconsistent with the two states being different).

So you can maybe think in terms of consecutively answering the three questions and penalising assignments that are inconsistent. ‘Both bits are zero’ is an especially bad answer, so it gets clobbered three times instead of just once, pushing the probability negative.
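To make ‘particularly inconsistent’ concrete, this short Python sketch counts how many of the three observed answers each assignment of bits contradicts. The cell labelling and function names are mine, for illustration only.

```python
# Cells are labelled (first box, second box).
CELLS = [(a, b) for a in (0, 1) for b in (0, 1)]


def answers_for(a, b):
    """What a definite assignment of bits would answer to the three questions."""
    return (a == 0, b == 0, a == b)


observed = (False, False, False)  # Example 2: 'no' to all three questions


def clashes(cell):
    """How many of the observed answers this assignment contradicts."""
    return sum(pred != obs for pred, obs in zip(answers_for(*cell), observed))


clash_counts = {cell: clashes(cell) for cell in CELLS}
print(clash_counts)
```

‘Both bits are zero’ clashes three times and every other assignment clashes once. As a quick check against the numbers above, if each clash costs ½ relative to agreeing with an answer, the cells come out as 1 - 3/2 = -½ and 1 - ½ = ½, which is exactly Piponi’s assignment.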

The method I’ll describe is a more formal version of this. I’ll go through it first for Example 1, with consistent answers, to show it works there.

### Back to Example 1

Imagine that we start in a state of complete ignorance. We have no idea what the underlying state is, so we just assign probability ¼ to each cell of the grid, like this:

(I’ll stop drawing the axes every time from this point on.) We then ask the three questions in succession and make corrections. For the first question, ‘is the first box in state 0’, we have the answer ‘yes’, so after we learn this we know that the left two cells of the grid now have probability ½ each, and the right two have probability 0. We can think of this as adding a correction term to our previous state of ignorance:

Notice that the correction term has some negative probabilities in it! But these seem relatively benign from an interpretational point of view – they are just removing probability from some cells so that it can be reassigned to others, and the final answer is still positive. It’s kind of similar to saying $P(\text{heads}) = 1 - P(\text{tails})$, where we subtract some probability to get to the answer.

Next, we add on two more correction terms, one for each of the remaining two questions. The correction term for the second question needs to remove probability from the bottom row and add it to the top row, and the one for the third question corrects the diagonals:

So the system is definitely in the top left state, which is what we found before. It’s good to verify that the method works on a conventional example like this, where the final probabilities are positive.

### Example 2 again

I’ll follow the same method again for Piponi’s example, starting from complete uncertainty and then adding on a correction for each question (this time the answer is ‘no’ each time). This time I’ll do it all in one go:

So we’ve got the same probabilities as Piponi, with the weird negative -½ probability for ‘both in state 0’. This time we get a little bit more insight into where it comes from: it’s picking up a negative correction term from all three questions.
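To make the bookkeeping explicit, here is a minimal Python sketch of the correction method for definite yes/no answers. It uses exact fractions and keys cells by (first box, second box); all the names are my own illustrative choices rather than anything from Piponi’s post. For each cell it also reports the three individual correction terms, so you can see ‘both in state 0’ collecting -¼ from every question.

```python
from fractions import Fraction

QUARTER = Fraction(1, 4)
CELLS = [(a, b) for a in (0, 1) for b in (0, 1)]  # (first box, second box)

QUESTIONS = [
    lambda a, b: a == 0,  # 1. Is the first box in state 0?
    lambda a, b: b == 0,  # 2. Is the second box in state 0?
    lambda a, b: a == b,  # 3. Are the boxes both in the same state?
]


def corrections(cell, answers):
    """The three correction terms for one cell: +1/4 if the cell agrees
    with the observed answer to that question, -1/4 if it disagrees."""
    return [QUARTER if question(*cell) == answer else -QUARTER
            for question, answer in zip(QUESTIONS, answers)]


def grid(answers):
    """Complete ignorance (1/4 per cell) plus one correction per question."""
    return {cell: QUARTER + sum(corrections(cell, answers)) for cell in CELLS}


example_1 = grid([True, False, False])   # 'yes', 'no', 'no'
example_2 = grid([False, False, False])  # 'no' to all three

for cell in CELLS:
    terms = corrections(cell, [False, False, False])
    print(f"{cell}: {example_2[cell]} from corrections "
          f"{', '.join(str(t) for t in terms)}")
```

Example 1 puts all the probability on (0, 1), the first box in state 0 and the second in state 1, while Example 2 reproduces the -½ on (0, 0).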

## Discussion

This ‘strange machine’ looks pretty bizarre. But it’s extremely similar to a situation that actually comes up in quantum physics. I’ll go into the details in the follow-up post (‘now with added equations!’), but this example almost replicates the quasiprobability distribution for a qubit, one of the simplest systems in quantum physics. The main difference is that Piponi’s machine is slightly ‘worse’ than quantum physics, in that the -½ value is more negative than anything you get there.

The two examples I did were ones where all three questions have definite yes/no answers, but my method of starting from a state of ignorance and adding on corrections carries over in the obvious way when you have a probability distribution over ‘yes’ and ‘no’. As an example, say you have a 0.8 probability of ‘no’ for the first question. Then you add 0.8 times the correction matrix for ‘no’, with the negative probabilities on the left-hand side, and 0.2 times the correction matrix for ‘yes’, with the negative probabilities on the right-hand side. That’s all there is to it. Just to spell it out I’ll add the general formula: if the three questions have answer ‘no’ with probabilities $q_1$, $q_2$, $q_3$ respectively, then we assign probabilities to the cells as follows:
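Working this through from the starting value of ¼ and the three weighted correction terms gives the following, as I work it out. I’m writing $W_{ab}$ for the cell where the first box is in state $a$ and the second box is in state $b$; that subscript labelling is my own.

$W_{00} = 1 - \frac{1}{2}(q_1 + q_2 + q_3)$

$W_{01} = \frac{1}{2}(q_2 + q_3 - q_1)$

$W_{10} = \frac{1}{2}(q_1 - q_2 + q_3)$

$W_{11} = \frac{1}{2}(q_1 + q_2 - q_3)$

As a check, the four values always add up to 1; setting $q_1 = q_2 = q_3 = 1$ gives $W_{00} = -\frac{1}{2}$ and $\frac{1}{2}$ in the other three cells, which are Piponi’s numbers; and $q_1 = 0$, $q_2 = q_3 = 1$ gives $W_{01} = 1$ with the rest 0, as in Example 1.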

(If you’re wondering where the $W$ comes from, it’s just the usual letter used to label this thing – it stands for ‘Wigner’, and is a discrete version of his Wigner function.)
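The same recipe with partial answers can be sketched in Python. This is only an illustration under my own assumptions: cells are keyed by (first box, second box), `q[k]` is the probability of ‘no’ to question k, and the names `QUESTIONS` and `wigner` are hypothetical.

```python
# Each question is True for the cells that would answer 'yes' to it.
QUESTIONS = [
    lambda a, b: a == 0,  # 1. Is the first box in state 0?
    lambda a, b: b == 0,  # 2. Is the second box in state 0?
    lambda a, b: a == b,  # 3. Are the boxes both in the same state?
]


def wigner(q):
    """Cell values given q[k] = probability of 'no' to question k.
    Starts from 1/4 everywhere, then for each question adds (1 - q)
    times the 'yes' correction plus q times the 'no' correction."""
    grid = {}
    for a in (0, 1):
        for b in (0, 1):
            value = 0.25  # complete ignorance
            for question, q_no in zip(QUESTIONS, q):
                # The 'yes' correction: +1/4 where 'yes' fits this cell, else -1/4
                yes_term = 0.25 if question(a, b) else -0.25
                # The 'no' correction is its mirror image
                value += (1 - q_no) * yes_term + q_no * (-yes_term)
            grid[(a, b)] = value
    return grid


partial = wigner([0.8, 0.5, 0.5])    # the 0.8 example from the text
one_known = wigner([1.0, 0.5, 0.5])  # certain about question 1 only
piponi = wigner([1.0, 1.0, 1.0])     # 'no' to everything
```

With `[0.8, 0.5, 0.5]` the cells come out, up to floating-point rounding, as 0.1, 0.1, 0.4, 0.4, with the extra weight in the right-hand cells where the first box is 1. Being certain about the first question and ignorant about the other two gives two cells of ½ and two of 0, all non-negative, whereas ‘no’ to everything reproduces Piponi’s -½.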

It turns out that all examples in quantum physics are of the type where you don’t have certain knowledge of the answers to all three questions. It’s possible to know the answer to one of them for certain, but then you have to be completely ignorant about the other two, and assign probability ½ to both answers. More usually, you will have partial information about all three questions, with a constraint that the total information you get about the system is at most half the total possible information, in a specific technical sense. To go into this in detail will require some more maths, which I’ll get to in the next post.

## SimCity Bricolage

In the references to The World Beyond Your Head I found an intriguing paper by Mizuko Ito, Mobilizing Fun in the Production and Consumption of Children’s Software, following the interactions between children and adults at an after-school computer club. It’s written in a fairly heavy dialect of academicese, but the dialogue samples are fascinating. Here a kid, Jimmy, is playing SimCity 2000, with an undergrad, Holly, watching:

J: (Budget window comes up and Jimmy dismisses it.) Yeah. I’m going to bulldoze a skyrise here. (Selects bulldozer tool and destroys building.) OK. (Looks at H.) Ummm! OK, wait, OK. Should I do it right here?

H: Sure, that might work… that way. You can have it …

J: (Builds highway around city.) I wonder if you can make them turn. (Builds highway curving around one corner) Yeah, okay.

H: You remember, you want the highway to be … faster than just getting on regular streets. So maybe you should have it go through some parts.

J: (Dismisses budget pop-up window. Points to screen.) That’s cool! (Inaudible.) I can make it above?

H: Above some places, I think. I don’t know if they’d let you, maybe not.

J: (Moves cursor over large skyscraper.) That’s so cool!

H: Is that a high rise?

J: Yeah. I love them.

H: Is it constantly changing, the city? Is it like …

J: (Builds complicated highway intersection. Looks at H.)

H: (Laughs.)

J: So cool. (Builds more highway grids in area, creating a complex overlap of four intersections.)

H: My gosh, you’re going to have those poor drivers going around in circles.

J: I’m going to erase that all. I don’t like that, OK. (Bulldozes highway system and blows up a building in process.) Ohhh …

H: Did you just blow up something else?

J: Yeah. (Laughs.)

H: (Laughs.)

J: I’m going to start a new city. I don’t understand this one. I’m going to start with highways. (Quits without saving city.)

As Ito puts it, “by the end Jimmy has wasted thousands of dollars on a highway to nowhere, blown up a building, and trashed his city.” So what’s the point of playing the game in this way?

Well, for a start, it lets him make cool stuff and then blow it up. That might be all the explanation we need! But I think he’s also doing something genuinely useful for understanding the game itself.

Ito mainly seems to be interested in the social dynamics of the situation – the conflict between Jimmy finding ‘fun’, ‘spectacular’ effects in the game, and Holly trying to drag him back to more ‘educational’ behaviours. I can see that too, but I’m interested in a slightly different reading.

To my mind, Jimmy is ‘sketching’: he’s finding out what the highway tool can do as a tool, rather than immediately subsuming it to the overall logic of the game. The highway he’s building is in a pointless location and doesn’t function very well as a highway, but that doesn’t matter. He’s investigating how to make it turn, how to make it intersect with other roads, how to raise it above ground level. While focussed on this, he ignores any more abstract considerations that would pull him out of engagement with the tool. For example, he dismisses the budget popup as fast as he can, so that he can get back to bulldozing buildings.

Now he knows what the tool does, he may as well just trash the current city and start a new one where he can use his knowledge in a more productive way. His explorations are useless in the context of the current game, but will give him raw material to work with later in a different city, where he might need a fancy junction or an overhead highway.

I first wrote a version of this for the newsletter last year. Reading it back this time, I noticed something else: Jimmy’s explorations are a great example of bricolage. I first learned this term from Sherry Turkle and Seymour Papert’s Epistemological Pluralism and the Revaluation of the Concrete, which I talked about here once before. In Turkle and Papert’s sense of the word, adapted from Lévi-Strauss, bricolage is a particular style of programming computers:

Bricoleurs construct theories by arranging and rearranging, by negotiating and renegotiating with a set of well-known materials.

… They are not drawn to structured programming; their work at the computer is marked by a desire to play with the elements of the program, to move them around almost as though they were material elements — the words in a sentence, the notes on a keyboard, the elements of a collage.

… bricoleur programmers, like Levi-Strauss’s bricoleur scientists, prefer negotiation and rearrangement of their materials. The bricoleur resembles the painter who stands back between brushstrokes, looks at the canvas, and only after this contemplation, decides what to do next. Bricoleurs use a mastery of associations and interactions. For planners, mistakes are missteps; bricoleurs use a navigation of midcourse corrections. For planners, a program is an instrument for premeditated control; bricoleurs have goals but set out to realize them in the spirit of a collaborative venture with the machine. For planners, getting a program to work is like “saying one’s piece”; for bricoleurs, it is more like a conversation than a monologue.

One example in the paper is ‘Alex, 9 years old, a classic bricoleur’, who comes up with a clever repurposing of a Lego motor:

When working with Lego materials and motors, most children make a robot walk by attaching wheels to a motor that makes them turn. They are seeing the wheels and the motor through abstract concepts of the way they work: the wheels roll, the motor turns. Alex goes a different route. He looks at the objects more concretely; that is, without the filter of abstractions. He turns the Lego wheels on their sides to make flat “shoes” for his robot and harnesses one of the motor’s most concrete features: the fact that it vibrates. As anyone who has worked with machinery knows, when a machine vibrates it tends to “travel,” something normally to be avoided. When Alex ran into this phenomenon, his response was ingenious. He doesn’t use the motor to make anything “turn,” but to make his robot (greatly stabilized by its flat “wheel shoes”) vibrate and thus “travel.” When Alex programs, he likes to keep things similarly concrete.

This is a similar mode of investigation to Jimmy’s. Alex is seeing what kinds of things the motor and wheels can do, as part of an ongoing conversation with his materials, without immediately subsuming them to the normal logic of motors and wheels. In the process, he’s discovered something he wouldn’t have found if he’d just made a normal car. Similarly, Jimmy will have more freedom with the highway tool in the future than if he had followed all the rules about budgets and city planning before he understood everything that it can do.

Alternatively, maybe I’m massively overanalysing this short contextless stretch of dialogue, and Jimmy just likes making stuff explode. Maybe he just keeps making and trashing a series of similarly broken cities for the sheer fun of it. Either way, mashing these two papers together has been a fun piece of bricolage of my own.

## Book Review: The World Beyond Your Head

I wrote a version of this for the newsletter last year and decided to expand it out into a post. I’ve also added in a few thoughts based on an email conversation about the book with David MacIver a while back, and a few more thoughts inspired by a more recent post.

This wasn’t a book I’d been planning to read. In fact, I’d never even heard of it. I was just working in the library one day, and the cover caught my attention. It’s been given the subtitle ‘How To Flourish in an Age of Distraction’, and it looks like the publisher has tried to sell it as a sort of book-length version of one of those hand-wringers in The Atlantic about how we all gawp at our phones too much. I’m a sucker for those. This is a bit pathetic, I know, but there are certain repetitive journalistic topics that I like simply because they’re repetitive, and the repetition has given them a comfortingly familiar texture, and ‘we all gawp at our phones too much’ is one of them. So I had a flick through.

The actual contents turned out to be less comfortingly familiar, but a lot more interesting. Actually, I recognised a lot of it! Merleau-Ponty on perception… Polanyi on tacit knowledge… lots of references to embodied cognition. This looks like my part of the internet! I hadn’t seen this set of ideas collected together in a pop book before, so I thought I’d better read it.

The author, Matthew Crawford, previously wrote a book called Shop Class As Soul Craft, on the advantages of working in the skilled trades. In this one he zooms out further to take a more philosophical look at why working with your hands with real objects is so satisfying. There’s a lot of good stuff in the book, which I’ll get to in a minute. I still struggled to warm to it, though, despite it being full of topics I’m really interested in. Some of this was just a tone thing. He writes in a style I’ve seen before and don’t get on with – I’m not American and can’t place it very exactly, but I think it’s something like ‘mild social conservatism repackaged for the educated coastal elite’. According to Wikipedia he writes for something called The New Atlantis, which may be one of the places this style comes from. I don’t know. There’s also a more generic ‘get off my lawn’ thing going on, where we are treated to lots of anecdotes about how the airport is too loud and there’s too much advertising and children’s TV is terrible and he can’t change the music in the gym.

The oddest thing for me was his choice of pronouns for the various example characters he makes up throughout the book to illustrate his points. This is always a pain because every option seems to annoy someone, but using ‘he’ consistently would at least have fitted the grumpy old man image quite well. Maybe his editor told him not to do that, though, or maybe he has some kind of point to make, because what he actually decided to do was use a mix of ‘he’ and ‘she’, but only ever pick the pronoun that fits traditional expectations of what gender the character would be. Because he mostly talks about traditionally masculine occupations, this means that maybe 80% of the characters, and almost all of the sympathetic ones, are male – all the hockey players, carpenters, short-order cooks and motorcycle mechanics he’s using to demonstrate skilled interaction with the environment. The only female characters I remember are a gambling addict, a New Age self-help bore, a disapproving old lady, and one musician who actually gets to embody the positive qualities he’s interested in. It’s just weird, and I found it very distracting.

OK, that’s all my whining about tone done. I have some more substantive criticisms later, but first I want to talk about some of the things I actually liked. Underneath all the owning-the-libs surface posturing he’s making a subtle and compelling argument. Unpacking this argument is quite a delicate business, and I kind of understand why the publishers just rounded it off to the gawping at phones thing.

# Violins and slot machines

Earlier, I said that Crawford is out to explain ‘why working with your hands with real objects is so satisfying’, but actually he’s going for something a little more nuanced and specific than that. Not all real objects are satisfying to work with. Here’s his discussion of one that isn’t, at least for an adult:

When my oldest daughter was a toddler, we had a Leap Frog Learning Table in the house. Each side of the square table presents some sort of electromechanical enticement. There are four bulbous piano keys; a violin-looking thing that is played by moving a slide rigidly located on a track; a transparent cylinder full of beads mounted on an axle such that any attempt, no matter how oblique, makes it rotate; and a booklike thing with two thick plastic pages in it.

… Turning off the Leap Frog Learning Table would produce rage and hysterics in my daughter… the device seemed to provide not just stimulation but the experience of agency (of a sort). By hitting buttons, the toddler can reliably make something happen.

The Leap Frog Learning Table is designed to take very complicated data from the environment – toddlers bashing the thing any old how, at any old speed or angle – and funnel this mess into a very small number of possible outcomes. The ‘violin-looking thing’ has only one translational degree of freedom, along a single track. Similarly, the cylinder can only be rotated around one axis. So the toddler’s glancing swipe at the cylinder is not dissipated into uselessness, but instead produces a satisfying rolling motion – they get to ‘make something happen’.

This is extremely satisfying for a toddler, who struggles to manipulate the more resistant objects of the adult world. But there is very little opportunity for growth or mastery there. The toddler has already mastered the toy to almost its full extent. Hitting the cylinder more accurately might make it spin for a bit longer, but it’s still pretty much the same motion.

At the opposite end of the spectrum would be a real violin. I play the violin, and you could describe it quite well as a machine for amplifying tiny changes in arm and hand movements into very different sounds (mostly horrible ones, which is why beginners sound so awful). There are a large number of degrees of freedom – the movements of each jointed finger in three-dimensional space, including those on the bow hand, contribute to the final sound. Also, almost all of them are continuous degrees of freedom. There are no keys or frets to accommodate small mistakes in positioning.

Crawford argues that although tools and instruments that transmit this kind of rich information about the world can be very frustrating in the short term, they also have enormous potential for satisfaction in the long term as you come to master them. Whereas objects like the Leap Frog Learning Table have comparatively little to offer if you’re not two years old:

Variations in how you hit the button on a Leap Frog Learning Table or a slot machine do not similarly produce variations in the effect you produce. There is a closed loop between your action and the effect that you perceive, but the bandwidth of variability has been collapsed… You are neither learning something about the world, as the blind man does with his cane, nor acquiring something that could properly be called a skill. Rather, you are acting within the perception-action circuits encoded in the narrow affordances of the game, learned in a few trials. This is a kind of autistic pseudo-action, based on exact repetition, and the feeling of efficacy that it offers evidently holds great appeal.

(As a warning, Crawford consistently uses ‘autistic’ in this derogatory sense throughout the book; if that sounds unpleasant, steer clear.)

Objects can also be actively deceptive, rather than just tediously simple. In the same chapter there’s some interesting material on gambling machines, and the tricks used to make them addictive. Apparently one of the big innovations here was the idea of ‘virtual reel mapping’. Old-style mechanical fruit machines would have three actual reels with images on them that you needed to match up, and just looking at the machine would give you a rough indication of the total number of images on the reel, and therefore the rough odds of matching them up and winning.

Early computerised machines followed this pattern, but soon the machine designers realised that there no longer needed to be this close coupling between the machine’s internal representation and what the gambler sees. So the newer machines would have a much larger number of virtual reel positions that are mostly mapped to losing combinations, with a large percentage of these combinations being ‘near misses’ to make the machine more addictive. The machine still looks simple, like the toddler’s toy, but the intuitive sense of odds you get from watching the machine becomes completely useless, because the internal logic of the machine is now doing something very complicated that the screen actively hides from you. A machine like this is actively ‘out to get you’, alienating you from the evidence of your own eyes.
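Here is a toy Python sketch of the gap that virtual reel mapping opens up. Every number in it is hypothetical, picked only to make the arithmetic easy, and is not taken from the book or from any real machine: a reel shows 10 symbols, but the random number generator actually chooses among 64 virtual stops, only one of which maps to the jackpot and twenty of which map to the blank stops right next to it.

```python
from fractions import Fraction

# All numbers here are hypothetical, chosen only to illustrate the idea.
VISIBLE_STOPS = 10  # symbols the gambler can see on each physical reel

# The random number generator picks one of these virtual stops, and each
# virtual stop is then mapped onto a symbol shown on the physical reel.
virtual_reel = (
    ["jackpot"] * 1              # only one virtual stop shows the jackpot...
    + ["next to jackpot"] * 20   # ...but many land just beside it (near misses)
    + ["other"] * 43
)

# Odds a player might estimate by eye: 1 in 10 per reel, three reels.
apparent_odds = Fraction(1, VISIBLE_STOPS) ** 3

# Odds actually implemented by the mapping.
p_jackpot_per_reel = Fraction(virtual_reel.count("jackpot"), len(virtual_reel))
actual_odds = p_jackpot_per_reel ** 3

# How often a single reel stops right beside the jackpot.
p_near_miss_per_reel = Fraction(virtual_reel.count("next to jackpot"), len(virtual_reel))

print(apparent_odds, actual_odds, p_near_miss_per_reel)
```

With these made-up numbers, the odds you’d estimate by eye are 1 in 1,000 for three jackpots, but the machine is really running 1 in 262,144, and each reel stops next to the jackpot 5 times in 16 rather than the 2 in 10 the visible layout suggests.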

# Apples and sheep

Before reading the book I’d never really thought carefully about any of these properties of objects. For a while after reading it, I noticed them everywhere. Here’s one (kind of silly) example.

Shortly after reading the book I was visiting my family, and came across this wooden puzzle my aunt made:

I had a phase when I was ten or so where I was completely obsessed with this puzzle. Looking back, it’s not obvious why. It’s pretty simple and looks like the kind of thing that would briefly entertain much younger children. I was a weird kid and also didn’t have a PlayStation – maybe that’s explanation enough? But I didn’t have some kind of Victorian childhood where I was happy with an orange and a wooden toy at Christmas. I had access to plenty of plastic and electronic nineties tat that was more obviously fun.

I sat down for half an hour to play with this thing and try and remember what the appeal was. The main thing is that it turns out to be way more controllable than you might expect. The basic aim of the puzzle is just to get the ball bearings in the holes in any old order. This is the game that stops being particularly rewarding once you’re over the age of seven. But it’s actually possible to learn to isolate individual ball bearings by bashing them against the sides until one separates off, and then tilt the thing very precisely to steer one individually into a specific hole. That gives you a lot more options for variants on the basic game. For example, you can fill in the holes in a spiral pattern starting from the middle. Or construct a ‘fence’ of outer apples with a single missing ‘gate’ apple, steer two apples into the central pen (these apples are now sheep), and then close the gate with the last one.

The other interesting feature is that because this is a homemade game, the holes are not uniformly deep. The one in the top right is noticeably shallower than the others, and the ball bearing in this slot can be dislodged fairly easily while keeping the other nine in their place. This gives the potential for quite complicated dynamics of knocking down specific apples, and then steering other ones back in.

Still an odd way to have spent my time! But I can at least roughly understand why. The apple puzzle is less like the Leap Frog Learning Table than you might expect, and so the game can reward a surprisingly high level of skill. Part of this is from the continuous degrees of freedom you have in tilting the board, but the cool thing is that a lot of it comes from unintentional parts of the physical design. My aunt made the basic puzzle for small children, and the more complicated puzzles happened to be hidden within it.

The ability to dislodge the top right apple is not ‘supposed’ to be part of the game at all – an abstract version you might code up would have identical holes. But the world is going about its usual business of being incorrigibly plural, and there is just way more of it than any one abstract ruleset needs. The variation in the holes allows some of that complexity to accidentally leak in, breaking the simple game out into a much richer one.

# Pebbles and birdsong

Now for the criticism part. I think there’s a real deficiency in the book that goes deeper than the tone issues I pointed out at the start. Crawford is insightful in his discussions of the kind of complexity that many handcrafted objects exhibit, that’s often standardised away in the modern world. But in his enthusiasm for people doing real things with real tools he’s forgotten the advantages of the systematised, ‘autistic’ world he dislikes. Funnelling messy reality into repeatable categories is how we get shit done at scale. It’s not just some unpleasant feature of modernity, either. Even something as simple as counting with pebbles relies on this:

To make the method work, you must choose bits-of-rock of roughly even sizes, so you can distinguish them from littler bits—stray grains of sand or dust in the bucket—that don’t count. How even? Even enough that you can make a reliable-enough judgement.

The counting procedure abstracts away the vivid specificity of the individual pebbles, and reduces them to simplistic interchangeable tokens. But there’s not much point in complaining about this. You need to do this to get the job done! And you can always break them back out into individuality later on if you want to do something else, like paint a still life of them.

I’m finding myself going back yet again to Christopher Norris’s talk on Derrida, which I discussed in my braindump here. (I’m going to repeat myself a bit in the next section. This was the most thought-provoking single thing I read last year, and I’m still working through the implications, so everything seems to lead back there at the moment.) Derrida picks apart some similar arguments by Rousseau, who was concerned with the bad side of systematisation in music:

One way of looking at Rousseau’s ideas about the melody/harmony dualism is to view them as the working-out of a tiff he was having with Rameau. Thus he says that the French music of his day is much too elaborate, ingenious, complex, ‘civilized’ in the bad (artificial) sense — it’s all clogged up with complicated contrapuntal lines, whereas the Italian music of the time is heartfelt, passionate, authentic, spontaneous, full of intense vocal gestures. It still has a singing line, it’s still intensely melodious, and it’s not yet encumbered with all those elaborate harmonies.

Crawford is advocating for something close to Rousseau’s pure romanticism. He brings along more recent and sophisticated arguments from phenomenology and embodied cognition, but he’s still very much on the side of spontaneity over structure. And I think he’s still vulnerable to the same arguments that Derrida was able to use against Rousseau. Norris explains it as follows:

… Rousseau gets into a real argumentative pickle when he says – lays it down as a matter of self-evident truth – that all music is human music. Bird-song just doesn’t count, he says, since it is merely an expression of animal need – of instinctual need entirely devoid of expressive or passional desire – and is hence not to be considered ‘musical’ in the proper sense of that term. Yet you would think that, given his preference for nature above culture, melody above harmony, authentic (spontaneous) above artificial (‘civilized’) modes of expression, and so forth, Rousseau should be compelled – by the logic of his own argument – to accord bird-song a privileged place vis-à-vis the decadent productions of human musical culture. However Rousseau just lays it down in a stipulative way that bird-song is not music and that only human beings are capable of producing music. And so it turns out, contrary to Rousseau’s express argumentative intent, that the supplement has somehow to be thought of as always already there at the origin, just as harmony is always already implicit in melody, and writing – or the possibility of writing – always already implicit in the nature of spoken language.

Derrida is pointing out that human music always has a structured component. We don’t just pour out an unmarked torrent of frequencies. We define repeatable lengths of notes, and precise intervals between pitches. (The evolution of these is a complicated engineering story in itself.) This doesn’t make music ‘inauthentic’ or ‘artificial’ in itself. It’s a necessary feature of anything we’d define as music.

I’d have been much happier with the book if it had some understanding of this interaction – ‘yes, structure is important, but I think we have too much of it, and here’s why’. But all we get is the romantic side. As with Rousseau’s romanticism, this tips over all too easily into pure reactionary nostalgia for an imagined golden age, and then we have to listen to yet another anecdote about how everything in the modern world is terrible. It’s not the eighteenth century any more, and we can do better now. And for all its genuine insight, this book mostly just doesn’t.

## Book Review: The Eureka Factor

Last month I finally got round to reading The Eureka Factor by John Kounios and Mark Beeman, a popular book summarising research on ‘insightful’ thinking. I first mentioned it a couple of years ago after I’d read a short summary article, when I realised it was directly relevant to my recurring ‘two types of mathematician’ obsession:

The book is not focussed on maths – it’s a general interest book about problem solving and creativity in any domain. But it looks like it has a very similar way of splitting problem solvers into two groups, ‘insightfuls’ and ‘analysts’. ‘Analysts’ follow a linear, methodical approach to work through a problem step by step. Importantly, they also have cognitive access to those steps – if they’re asked what they did to solve the problem, they can reconstruct the argument.

Of course, nobody is really a pure ‘insightful’ or ‘analyst’. And most significant problems demand a mixed strategy. But it does seem like many people have a tendency towards one or the other.

I wasn’t too sure what I was getting into. The replication crisis has made me hyperaware of the dangers of uncritically accepting any results in psychology, and I’m way too ignorant of the field to have a good sense for which results still look plausible. However, the book turned out to be so extraordinarily Relevant To My Interests that I couldn’t resist writing up a review anyway.

The final chapters had a few examples along the lines of ‘[weak environmental effect] primes people to be more/less insightful’, and I know enough to stay away from those, but the earlier parts look somewhat more solid to me. I haven’t made much effort to trace back references, though, and I could easily still be being too credulous.

(I didn’t worry so much about replication with my previous post on the Cognitive Reflection Test. Getting the bat and ball question wrong is hardly the kind of weak effect that you need a sensitive statistical instrument to detect. It’s almost impossible to stop people getting it wrong! I did steer clear of any more dubious priming-style results, though, like the claim that people do better on the CRT when reading it ‘in a disfluent font’.)

## Insight and intuition

First, it’s worth getting clear on exactly what Kounios and Beeman mean by ‘insight’. As they use it, insight is a specific type of creative thinking, which they define more generally as ‘the ability to reinterpret something by breaking it down into its elements and recombining these elements in a surprising way to achieve some goal.’ Insight is distinguished by its suddenness and lack of conscious control:

When this kind of creative recombination takes place in an instant, it’s an insight. But recombination can also result from the more gradual, conscious process that cognitive psychologists call “analytic” thought. This involves methodically and deliberately considering many possibilities until you find the solution. For example, when you’re playing a game of Scrabble, you must construct words from sets of letters. When you look at the set of letters “A-E-H-I-P-N-Y-P” and suddenly realize that they can form the word “EPIPHANY,” then that would be an insight. When you systematically try out different combinations of the letters until you find the word, that’s analysis.
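To make the ‘analysis’ half of the Scrabble example concrete, here is a rough Python sketch of a fully explicit search: it tries orderings of the letters one at a time until one turns up in a word list. The tiny `WORDS` set and the function name are made up for illustration, standing in for a real dictionary, and nobody is claiming people literally do this. It just shows what a search looks like when every step is on the surface.

```python
from itertools import permutations

# Hypothetical stand-in for a real Scrabble dictionary.
WORDS = {"EPIPHANY", "HAPPY", "PINE", "YAP"}

def analytic_anagram(letters, words):
    """Try every ordering of the letters; return the first one that is a word."""
    for perm in permutations(letters):
        candidate = "".join(perm)
        if candidate in words:
            return candidate
    return None  # exhausted every ordering without finding a word

print(analytic_anagram("AEHIPNYP", WORDS))
```

Every step here could be written down afterwards, which is exactly the kind of access to your own reasoning that the ‘insight’ route lacks.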

Insights tend to have a few other features in common. Solving a problem by insight is normally very satisfying: the insight comes into consciousness along with a small jolt of positive affect. The insight itself is usually preceded by a longer period of more effortful thought about the problem. Sometimes this takes place just before the moment of insight, while at other times there is an ‘incubation’ phase, where the solution pops into your head while you’ve taken a break from deliberately thinking about it.

I’m not really going to get into this part in my review, but the related word ‘intuition’ is also used in an interestingly specific sense in the book, to describe the sense that a new idea is lurking beneath the surface, but is not consciously accessible yet. Intuitions often precede an insight, but have a different feel to the insight itself:

This puzzling phenomenon has a strange subjective quality. It feels like an idea is about to burst into your consciousness, almost as though you’re about to sneeze. Cognitive psychologists call this experience “intuition,” meaning an awareness of the presence of information in the unconscious mind — a new idea, solution, or perspective — without awareness of the information itself, at least until it pops into consciousness.

## Insight problems

To study insight, psychologists need to come up with problems that reliably trigger an insight solution. One classic example discussed in The Eureka Factor is the Nine Dot Problem, where you are asked to connect a three-by-three grid of nine dots using only four straight lines, without taking your pen off the page.

If you’ve somehow avoided seeing this puzzle before, think about it for a while first. In the absence of any kind of built-in spoiler blocks for wordpress.com sites, I’ll insert a bunch of blank space here so that you hopefully have to scroll down off your current screen to see my discussion of the solution:

If you didn’t figure it out, a solution can be found in the Wikipedia article on insight problems here. It’ll probably look irritatingly obvious once you see it. The key feature of the solution is that the lines you draw have to extend outside the confines of the square of dots you start with (thus spawning a whole subgenre of annoying business literature on ‘thinking outside the box’). Nothing in the rules forbids this, but the setup focusses most people’s attention on the grid itself, and breaking out of this mindset requires a kind of reframing, a throwing away of artificially imposed constraints. This is a common characteristic of insight problems.

This characteristic also makes insight hard to test. For testing purposes, it’s useful to have a large stock of similar puzzles in hand. But a good reframing like the one in the Nine Dot Problem tends to be a bit of a one-off: once you’ve had the idea of extending the lines outside the box, it applies trivially to all similar puzzles, and not at all to other types of puzzle.

(I talked about something similar in my last post, on the Cognitive Reflection Test. The test was inspired by one good puzzle, the ‘bat and ball problem’, and adds two other questions that were apparently picked to be similar. Five thousand words and many comments later, it’s not obvious to me or most of the other commenters that these three problems form any kind of natural set at all.)

Kounios and Beeman discuss several of these eye-catching ‘one-off’ problems in the book, but the research of their own that they describe focusses on a more standardisable kind of puzzle, the Remote Associates Test. This test gives you three words, such as

PINE
CRAB
SAUCE

and asks you to find the common word that links them. The authors claim that these can be solved either with or without insight, and they asked participants to self-categorise each response as either ‘insightful’ or ‘analytic’:

The analytic approach is to consciously search through the possibilities and try out potential answers. For example, start with “pine.” Imagine yourself thinking: What goes with “pine”? Perhaps “tree”? “Pine tree” works. “Crab tree”? Hmmm … maybe. “Tree sauce”? No. Have to try something else. How about “cake”? “Crab cake” works. “Cake sauce” is a bit of a reach but might be acceptable. However, “pine cake” and “cake pine” definitely don’t work. What else? How about “crabgrass”? That works. But “pine grass”? Not sure. Perhaps there is such a thing. But “sauce grass” and “grass sauce” are definitely out. What else goes with “sauce”? How about “applesauce”? That’s good. “Pineapple” and “crab apple” also work. The answer is “apple”!

This is analytical thought: a deliberate, methodical, conscious search through the possible combinations. But this isn’t the only way to come up with the solution. Perhaps you’re trying out possibilities and get stuck or even draw a blank. And then, “Aha! Apple” suddenly pops into your awareness. That’s what would happen if you solved the problem by insight. The solution just occurs to you and doesn’t seem to be a direct product of your ongoing stream of thought.
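The analytic walk-through quoted above is close to an explicit algorithm, so here is a minimal Python sketch of it. It assumes a small hand-made lexicon of compounds (`COMPOUNDS` is invented for illustration and is nowhere near a real phrase dictionary) and a ready-made list of candidate words, tests the candidates in order, and keeps the first one that forms a known compound with all three cues, in either word order.

```python
# Hypothetical mini-lexicon of compounds and two-word phrases.
COMPOUNDS = {
    "pine tree", "crab tree", "crab cake", "crabgrass",
    "applesauce", "pineapple", "crab apple",
}

def links(cue, candidate, compounds):
    """True if cue+candidate or candidate+cue (joined or spaced) is a known compound."""
    forms = [
        f"{cue}{candidate}", f"{cue} {candidate}",
        f"{candidate}{cue}", f"{candidate} {cue}",
    ]
    return any(form in compounds for form in forms)

def analytic_rat(cues, candidates, compounds):
    """Try candidates one at a time; return the first that links to every cue."""
    for candidate in candidates:
        if all(links(cue, candidate, compounds) for cue in cues):
            return candidate
    return None  # no candidate worked for all three cues

print(analytic_rat(["pine", "crab", "sauce"],
                   ["tree", "cake", "grass", "apple"], COMPOUNDS))
```

Note that the sketch is simply handed its candidate list. Arguably the hard part of the test is generating ‘apple’ in the first place, which is where the two styles part ways.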

This categorisation seems suspiciously neat, and if I rely on my own introspection for solving one of these (which is obviously dubious itself) it feels like more of a mix. I’ll often generate some verbal noise about cakes and trees that sounds vaguely like I’m doing something systematic, but the main business of solving the thing seems to be going on nonverbally elsewhere. But I do think there’s something there – the answer can be very immediate and ‘poppy’, or it can surface after a longer and more accessible process of trying plausible words. This was tested in a more objective way by seeing what people do when they don’t come up with the answer:

Insightfuls made more “errors of omission.” When waiting for an insight that hadn’t yet arrived, they had nothing to offer in its place. So when the insight didn’t arrive in time, they let the clock run out without having made a guess. In contrast, Analysts made more “errors of commission.” They rarely timed out, but instead guessed – sometimes correctly – by offering the potential solution they had been consciously thinking about when their time was almost up.

Kounios and Beeman’s research focussed on finding neural correlates of the ‘aha’ moment of insight, using a combination of an EEG test to pinpoint the time of the insight, and fMRI scanning to locate the brain region:

We found that at the moment a solution pops into someone’s awareness as an insight, a sudden burst of high-frequency EEG activity known as “gamma waves” can be picked up by electrodes just above the right ear. (Gamma waves represent cognitive processing in the brain, such as paying attention to something or linking together different pieces of information.) We were amazed at the abruptness of this burst of activity—just what one would expect from a sudden insight. Functional magnetic resonance imaging showed a corresponding increase in blood flow under these electrodes in a part of the brain’s right temporal lobe called the “anterior superior temporal gyrus” (see figure 5.2), an area that is involved in making connections between distantly related ideas, as in jokes and metaphors. This activity was absent for analytic solutions.

So we had found a neural signature of the aha moment: a burst of activity in the brain’s right hemisphere.

I’m not sure how settled this is, though. I haven’t tried to do a proper search of the literature, but certainly a review from 2010 describes the situation as very much in flux:

A recent surge of interest into the neural underpinnings of creative behavior has produced a banquet of data that is tantalizing but, considered as a whole, deeply self-contradictory.

(The book was published somewhat later, in 2015, but mostly cites research from prior to this review, such as this paper.)

As an outsider it’s going to be pretty hard for me to judge this without spending a lot more time than I really want to right now. However, regardless of how this holds up, I was really interested in the authors’ discussion of why a right-hemisphere neural correlate of insight would make sense.

## Insight and context

One of the authors, Mark Beeman, had previously studied language deficits in people who had suffered brain damage to the right hemisphere. One such patient was the trial attorney D.B.:

What made D.B. “lucky” was that the stroke had damaged his right hemisphere rather than his left. Had the stroke occurred in the mirror-image left-hemisphere region, he would have experienced Wernicke’s aphasia, a profound deficit of language comprehension. In the worst cases, people with Wernicke’s aphasia may be completely unable to understand written or spoken language.

Nevertheless, D.B. didn’t feel lucky. He may have been better off than if he’d had a left-hemisphere stroke, but he felt that his language ability was far from normal. He said that he “couldn’t keep up” with conversations or stories the way he used to. He felt impaired enough that he had stopped litigating trials—he thought that it would have been a disservice to his clients to continue to represent them in court.

D.B. and the other patients were able to understand the straightforward meanings of words and the literal meanings of sentences. Even so, they complained about vague difficulties with language. They failed to grasp the gist of stories or were unable to follow multiple-character or multiple-plot stories, movies, or television shows. Many didn’t get jokes. Sarcasm and irony left them blank with incomprehension. They could sometimes muddle along without these abilities, but whenever things became subtle or implicit, they were lost.

An example of the kind of problem D.B. struggled with is the following:

Saturday, Joan went to the park by the lake. She was walking barefoot in the shallow water, not knowing that there was glass nearby. Suddenly, she grabbed her foot in pain and called for help, and the lifeguard came running.

If D.B. was given a statement about something that occurred explicitly in the text, such as ‘Joan went to the park on Saturday’, he could say whether it was true or false with no problems at all. In fact, he did better than all of the control subjects on these sorts of explicit questions. But if he was instead presented with a statement like ‘Joan cut her foot’, where some of the facts are left implicit, he was unable to answer.

This was interesting to me, because it seems so directly relevant to the discussion last year on ‘cognitive decoupling’. This is a term I’d picked up from Sarah Constantin, who herself got it from Keith Stanovich:

Stanovich talks about “cognitive decoupling”, the ability to block out context and experiential knowledge and just follow formal rules, as a main component of both performance on intelligence tests and performance on the cognitive bias tests that correlate with intelligence. Cognitive decoupling is the opposite of holistic thinking. It’s the ability to separate, to view things in the abstract, to play devil’s advocate.

The patients in Beeman’s study have so much difficulty with contextualisation that they struggle with anything at all that is left implicit, even straightforward inferences like ‘Joan cut her foot’. This appears to match other evidence from visual half-field studies, where subjects are presented with words on either the right or left half of the visual field. Words shown on the left half go first to the right hemisphere, so that the right hemisphere gets a head start on interpreting the stimulus. This shows a similar difference between hemispheres:

The left hemisphere is sharp, focused, and discriminating. When a word is presented to the left hemisphere, the meaning of that word is activated along with the meanings of a few closely related words. For example, when the word “table” is presented to the left hemisphere, this might strongly energize the concepts “chair” and “kitchen,” the usual suspects, so to speak. In contrast, the right hemisphere is broad, fuzzy, and promiscuously inclusive. When “table” is presented to the right hemisphere, a larger number of remotely related words are weakly invoked. For example, “table” might activate distant associations such as “water” (for underground water table), “payment” (for paying under the table), “number” (for a table of numbers), and so forth.

Why would picking up on these weak associations be relevant to insight? The story seems to be that this tangle of secondary meanings – the ‘Lovecraftian penumbra of monstrous shadow phalanges’ – works to pull your attention away from the obvious interpretation you’re stuck with, helping you to find a clever new reframing of the problem.

This makes a lot of sense to me as a rough outline. In my own experience at least, the kind of thinking that is likely to lead to an insight experience feels softer and more diffuse than the more ‘analytic’ kind, more a process of sort of rolling the ideas around gently in your head and seeing if something clicks than a really focussed investigation of the problem. ‘Thinking too hard’ tends to break the spell. This fits well with the idea that insights are triggered by activation of weak associations.

## Final thoughts

There’s a lot of other interesting material in the book about the rest of the insight process, including the incubation period leading up to an insight flash, and the phenomenon of ‘intuitions’, where you feel that an insight is on its way but you don’t know what it is yet. I’ll never get through this review if I try to cover all of that, so instead I’m going to finish up with a couple of weak associations of my own that got activated while reading the book.

I’ve been getting increasingly dissatisfied with the way dual process theories split cognition into a fast/automatic/intuitive ‘System 1’ and a slow/effortful/systematic ‘System 2’. System 1 in particular has started to look to me like an amorphous grab bag of all kinds of things that would be better separated out.

The Eureka Factor has pushed this a little further, by bringing out a distinction between two things that normally get lumped under System 1 but are actually very different. One obvious type of System 1-ish behaviour is routine action, the way you go about tasks you have done many times before, like making a sandwich or walking to work. These kinds of activities require very little explicit thought and generally ‘just happen’ in response to cues in the environment.

The kind of ‘insightful’ thinking discussed in The Eureka Factor would also normally get classed under System 1: it’s not very systematic and involves a fast, opaque process where the answer just pops into your head without much explanation. But it’s also very different to routine action. It involves deliberately choosing to think about a new situation, rather than one you have seen many times before, and a successful insight gives you a qualitatively new kind of understanding. The insight flash itself is a very noticeable, enjoyable feature of your conscious attention, rather than the effortless, unexamined state of absorbed action.

This was pointed out to me once before by Sarah Constantin, in the comments section of her post Distinctions in Types of Thought:

You seem to be lumping “flashes of insight” in with “effortless flow-state”. I don’t think they’re the same. For one thing, inspiration generally comes in bursts, while flow-states can persist for a while (driving on a highway, playing the piano, etc.) Definitely, “flashes of insight” aren’t the same type of thought as “effortful attention” — insight feels easy, instant, and unforced. But they might be their own, unique category of thought. Still working out my ontology here.

I’d sort of had this at the back of my head since then, but the book has really brought out the distinction clearly. I’m sure these aren’t the only types of thinking getting shoved into the System 1 category, and I get the sense that there’s a lot more splitting out that I need to do.

I also thought about how the results in the book fit in with my perennial ‘two types of mathematician’ question. (This is a weird phenomenon I’ve noticed where a lot of mathematicians have written essays about how mathematicians can be divided into two groups; I’ve assembled a list of examples here.) ‘Analytic’ versus ‘insightful’ seems to be one of the distinctions between groups, at least. It seems relevant to Poincaré’s version, for instance:

The one sort are above all preoccupied with logic; to read their works, one is tempted to believe they have advanced only step by step, after the manner of a Vauban who pushes on his trenches against the place besieged, leaving nothing to chance.

The other sort are guided by intuition and at the first stroke make quick but sometimes precarious conquests, like bold cavalrymen of the advance guard.

In fact, Poincaré once also gave a striking description of an insight flash himself:

Just at this time, I left Caen, where I was living, to go on a geologic excursion under the auspices of the School of Mines. The incidents of the travel made me forget my mathematical work. Having reached Coutances, we entered an omnibus to go some place or other. At the moment when I put my foot on the step, the idea came to me, without anything in my former thoughts seeming to have paved the way for it, that the transformations I had used to define the Fuchsian functions were identical with those of non-Euclidean geometry. I did not verify the idea; I should not have had time, as, upon taking my seat in the omnibus, I went on with a conversation already commenced, but I felt a perfect certainty. On my return to Caen, for conscience’ sake, I verified the result at my leisure.

If the insight/analysis split is going to be relevant here, it would require that people favour either ‘analytic’ or ‘insight’ solutions as a general cognitive style, rather than switching between them freely depending on the problem. The authors do indeed claim that this is the case:

Most people can, and to some extent do, use both of these approaches. A pure type probably doesn’t exist; each person falls somewhere on an analytic-insightful continuum. Yet many—perhaps most—people tend to gravitate toward one of these styles, finding their particular approach to be more comfortable or natural.

This is based on their own research, where they recorded participants’ self-reports of whether they were using an ‘insight’ or ‘analytic’ approach to solve anagrams, and compared these with EEG recordings of their resting state. They found a number of differences, including more right-hemisphere activity in the ‘insight’ group, and lower levels of communication between the frontal lobe and other parts of the brain, indicating a more disorderly thinking style with less top-down control. This may suggest more freedom to allow weak associations between thoughts to have a crack at the problem, without being overruled by the dominant interpretation.

Again, and you’ve probably got very bored of this disclaimer, I have no idea how well the details of this will hold up. That’s true for pretty much every specific detail in the book that I’ve discussed here. Still, the link between insight and weak associations makes a lot of sense to me, and the overall picture certainly triggered some useful reframings. That seems very appropriate for a book about insight.