Funny Turns

After a discussion about obscure Google Scholar hits on twitter last night, I just remembered this long list I made a few years ago. If you dig around postmodern/continental stuff long enough you discover there are a lot of Turns. Linguistic turns, rhetorical turns, hermeneutic turns… I never really did figure out what it was all about (potential speedrun question?)

But what are the weirder ones? I previously did this with the ‘X and its Discontents’ snowclone, which was funnier because people use it for very specific things like Newport or the Lawn Chemical Economy. This time it was mostly long boring abstract adjectives, which is maybe why I never published it. Still, here they are…

Linguistic
Postmodern
Hermeneutic
Interpretive
Mobility
Affective
Boy
Pragmatic
Practice
Cultural
Cognitive
Communicative
Corporeal
Complexity
Constructive
Constructivist
Spatial
Social
Sociological
Sociopolitical
Argumentative
Multilingual
Relational
Semantic
Semiotic
Structural
Systemic
Governance
Ontological
Reflexive
Rhetorical
Computational
Digital
Empirical
Ideational
Educational
Postsecular
Spiritual
Ideological
Action
Local
Narrative
Translational
Demotic
Archival
Performative
Deliberative
Iconic
Postcolonial
Decolonial
Territorial
Infrastructure
Intersectional
Neuroscientific
Transnational
Descriptive
Practical
Material
Participatory
Deconstructive
Leaderist
Cosmopolitan
Biographical
Spectral
Qualitative
Moral
Normative
Visual
Theoretical
Curatorial
Evolutionary
Ecological
Algorithmic
Neoliberal
Intercultural
Ethnographic
Consumerist
Geological
Animal

Speedrun: “Sensemaking”

This is a genre of post I’ve been experimenting with where I pick a topic, set a one hour timer and see what I can find out in that time. Previously: Marx on alienation and the Vygotsky Circle.

I’ve been seeing the term ‘sensemaking’ crop up more and more often. I even went to a workshop with the word in the title last year! I quite like it, and god knows we could all do with making more sense right now, but I’m pretty vague on the details. Are there any nuances of meaning that I’m missing by interpreting it in its everyday sense? I have a feeling that it has a kind of ecological tinge, group sensemaking more than individual sensemaking, but I could be off the mark.

Also, what’s the origin of the term? I get the impression that it’s associated with some part of the internet that’s not too distant from my own corner, but I’m not exactly sure which one. Time to find out…


OK start with wikipedia:

https://en.wikipedia.org/wiki/Sensemaking

Sensemaking or sense-making is the process by which people give meaning to their collective experiences. It has been defined as "the ongoing retrospective development of plausible images that rationalize what people are doing" (Weick, Sutcliffe, & Obstfeld, 2005, p. 409). The concept was introduced to organizational studies by Karl E. Weick in the 1970s and has affected both theory and practice.

Who’s Weick?

Karl Edward Weick (born October 31, 1936) is an American organizational theorist who introduced the concepts of "loose coupling", "mindfulness", and "sensemaking" into organizational studies.

And, um, what’s organizational studies?

Organizational studies is "the examination of how individuals construct organizational structures, processes, and practices and how these, in turn, shape social relations and create institutions that ultimately influence people".[1]

OK, something sociology-related. It’s a stub so probably not a huge subfield?

Weick ‘key contributions’ subheadings: ‘enactment’, ‘loose coupling’, ‘sensemaking’, ‘mindfulness’, ‘organizational information theory’

Although he tried several degree programs within the psychology department, the department finally built a degree program specifically for Weick and fellow student Genie Plog called "organizational psychology".[3]

Only quoting this bc Genie Plog is a great name.

So, enactment: ‘certain phenomena are created by being talked about’. Fine.

Loose coupling:

Loose coupling in Weick’s sense is a term intended to capture the necessary degree of flex between an organization’s internal abstraction of reality, its theory of the world, on the one hand, and the concrete material actuality within which it finally acts, on the other.

Hm that could be interesting but might take me too far off topic.

Sensemaking:

People try to make sense of organizations, and organizations themselves try to make sense of their environment. In this sense-making, Weick pays attention to questions of ambiguity and uncertainty, known as equivocality in organizational research that adopts information processing theory.

bit vague but the next bit is more concrete:

His contributions to the theory of sensemaking include research papers such as his detailed analysis of the breakdown of sensemaking in the case of the Mann Gulch disaster,[8] in which he defines the notion of a ‘cosmology episode’ – a challenge to assumptions that causes participants to question their own capacity to act.

Mann Gulch was a big firefighting disaster:

As the team approached the fire to begin fighting it, unexpected high winds caused the fire to suddenly expand, cutting off the men’s route and forcing them back uphill. During the next few minutes, a "blow-up" of the fire covered 3,000 acres (1,200 ha) in ten minutes, claiming the lives of 13 firefighters, including 12 of the smokejumpers. Only three of the smokejumpers survived. The fire would continue for five more days before being controlled.

The United States Forest Service drew lessons from the tragedy of the Mann Gulch fire by designing new training techniques and safety measures that developed how the agency approached wildfire suppression. The agency also increased emphasis on fire research and the science of fire behavior.

This is interesting but I’m in danger of tab explosion here. Keep a tab open with the paper and move on. Can’t resist opening the cosmology episode page though:

A cosmology episode is a sudden loss of meaning, followed eventually by a transformative pivot, which creates the conditions for revised meaning.

ooh nice. Weick again:

"Representations of events normally hang together sensibly within the set of assumptions that give them life and constitute a ‘cosmos’ rather than its opposite, a ‘chaos.’ Sudden losses of meaning that can occur when an event is represented electronically in an incomplete, cryptic form are what I call a ‘cosmology episode.’ Representations in the electronic world can become chaotic for at least two reasons: The data in these representations are flawed, and the people who manage those flawed data have limited processing capacity. These two problems interact in a potentially deadly vicious circle."

This is the kind of page that looks like it was written by one enthusiast. But it is pretty interesting. Right, back to Weick.

‘Mindfulness’: this is at a collective, organisational level

The effective adoption of collective mindfulness characteristics by an organization appears to cultivate safer cultures that exhibit improved system outcomes.

I’m not going to look up ‘organizational information theory’, I have a bit of a ‘systems thinking’ allergy and I don’t wanna.

Right, back to sensemaking article. Roots in social psychology. ‘Shifting the focus from organizations as entities to organizing as an activity.’

‘Seven properties of sensemaking’. Ugh I hate these sorts of numbered lists but fine.

  1. Identity. ‘who people think they are in their context shapes what they enact and how they interpret events’

  2. Retrospection. ‘the point of retrospection in time affects what people notice (Dunford & Jones, 2000), thus attention and interruptions to that attention are highly relevant to the process’.

  3. Enaction. ‘As people speak, and build narrative accounts, it helps them understand what they think, organize their experiences and control and predict events’

  4. Social activity. ‘plausible stories are preserved, retained or shared’.

  5. Ongoing. ‘Individuals simultaneously shape and react to the environments they face… As Weick argued, "The basic idea of sensemaking is that reality is an ongoing accomplishment that emerges from efforts to create order and make retrospective sense of what occurs"’

  6. Extract cues from the context.

  7. Plausibility over accuracy.

The sort of gestalt I’m getting is that it focusses on social rather than individual thinking, and action-oriented contextual in-the-thick-of-it doing rather than abstract planning ahead. Some similar terminology to ethnomethodology I think? e.g. accountability.

Ah yeah: ‘Sensemaking scholars are less interested in the intricacies of planning than in the details of action’

The sensemaking approach is often used to provide insight into factors that surface as organizations address either uncertain or ambiguous situations (Weick 1988, 1993; Weick et al., 2005). Beginning in the 1980s with an influential re-analysis of the Bhopal disaster, Weick’s name has come to be associated with the study of the situated sensemaking that influences the outcomes of disasters (Weick 1993).

‘Categories and related concepts’:

The categories of sensemaking included: constituent-minded, cultural, ecological, environmental, future-oriented, intercultural, interpersonal, market, political, prosocial, prospective, and resourceful. The sensemaking-related concepts included: sensebreaking, sensedemanding, sense-exchanging, sensegiving, sensehiding, and sense specification.

Haha OK it’s this sort of ‘fluidity soup’ that I have an allergy to. Too many of these buzzwords together. ‘Systems thinking’ is just a warning sign.

‘Other applications’: military stuff. Makes sense, lots of uncertainty and ambiguity there. Patient safety (looks like another random paragraph added by an enthusiast).

There’s a big eclectic ‘see also’ list. None of those are jumping out as the obvious next follow. Back to google. What I really want to know is why people are using this word now in some internet subcultures. Might be quite youtube centred? In which case there is no hope of tracking it down in one speedrun.

Oh yeah let’s look at google images:

Looks like businessy death by powerpoint contexts, not so helpful.

31 minutes left. Shit this goes quick!!

Google is giving me lots of video links. One is Daniel Schmachtenberger, ‘The War on Sensemaking’. Maybe this is the subcultural version I’ve been seeing? His name is familiar. Ok google ‘daniel schmachtenberger sensemaking’. Rebel Wisdom. Yep I’ve vaguely heard of that.

OK here is a Medium post about that series, by Andrew Sweeny:

There is a war going on in our current information ecosystem. It is a war of propaganda, emotional manipulation, blatant or unconscious lies. It is nothing new, but is reaching a new intensity as our technology evolves. The result is that it has become harder and harder to make sense of the world, with potentially fatal consequences. If we can’t make sense of the world, neither can we make good decisions or meet the many challenges we face as a species.

Yes this is the sort of context I was imagining:

In War on Sensemaking, futurist and visionary Daniel Schmachtenberger outlines in forensic detail the dynamics at play in this new information ecology — one in which we are all subsumed. He explores how companies, government, and media take advantage of our distracted and vulnerable state, and how we as individuals can develop the discernment and sensemaking skills necessary to navigate this new reality. Schmachtenberger has an admirable ability to diagnose this issue, while offering epistemological and practical ways to help repair the dark labyrinth of a broken information ecology.

It’d be nice to trace the link from Weick to this.

Some stuff about zero sum games and bullshit. Mentions Vervaeke.

Schmachtenberger also makes the point that in order to become a good sensemaker we need ‘stressors’ — demands that push our mind, body, and heart beyond comfort, and beyond the received wisdom we have inherited. It is not enough to passively consume information: we first need to engage actively with the information ecology we live in and start being aware of how we respond to it, where it is coming from, and why it is being used.

Getting the sense that ‘information ecology’ is a key phrase round here.

Oh yeah ‘Game B’! I’ve heard that phrase around. Some more names: ‘Jordan Hall, Jim Rutt, Bonnita Roy’.

‘Sovereignty’: ‘become responsible for our own shit’… ‘A real social, ‘kitchen sink level’ of reality must be cultivated to avoid the dangers of too much abstraction, individualism, and idealism.’ Seems like a good idea.

‘Rule Omega’. This one is new to me:

Rule Omega is simple, but often hard to put into practice. The idea is that every message contains some signal and some noise, and we can train ourselves to distinguish truth and nonsense — to separate the wheat from the chaff. If we disapprove of 95% of a distasteful political rant, for instance, we could train ourselves to hear the 5% that is true.

Rule Omega means learning to recognise the signal within the noise. This requires a certain attunement and generosity towards the other, especially those who think differently than we do. And Rule Omega can only be applied to those who are willing to engage in a different game, and work with each other in good faith.

Also seems like a Good Thing. Then some stuff about listening to people outside your bubble. Probably a link here to ‘memetic tribes’ type people.

This is a well written article, glad I picked something good.

‘Information war’ and shadow stuff:

Certainly there are bad actors and conspiracies to harm us, but there is also the ‘shadow within’. The shadow is the unacknowledged part we play in the destruction of the commons and in the never-ending vicious cycle of narrative war. We need to pay attention to the subtle lies we tell ourselves, as much as the ‘big’ lies that society tells us all the time. The trouble is: we can’t help being involved in destructive game theory logic, to a greater or lesser degree.

‘Anti-rivalrous systems’. Do stuff that increases value for others as well as yourself. Connection to ‘anti-rivalrous products’ in economics.

‘Information immune system’. Yeah this is nice! It sort of somehow reminds me of the old skeptics movement in its attempts to help people escape nonsense, but rooted in a warmer and more helpful set of background ideas, and with less tribal outgroup bashing. Everything here sounds good and if it helps people out of ideology prisons I’m all for it. Still kind of curious about intellectual underpinnings… like is there a straight line from Weick to this or did they just borrow a resonant phrase?

‘The dangers of concepts’. Some self-awareness that these ideas can be used to create more bullshit and misinformation themselves.

As such it can be dangerous to outsource our sensemaking to concepts — instead we need to embody them in our words and actions. Wrestling with the snake of self-deception and illusion and trying to build a better world in this way is a tough game. But it is the only game worth playing.

Games seem to be a recurring motif. Maybe Finite and Infinite Games is another influence.

OK 13 minutes left, what to do? Maybe trace out the link? google ‘schmachtenberger weick’. Not finding much. I’m now on some site called Conversational Leadership which seems to be connected to this scene somehow. Ugh not sure what to do. Back to plain old google ‘sensemaking’ search.

Let’s try this article by Laura McNamara, an organizational anthropologist. Nice job title! Yeah her background looks really interesting:

Principal Member of Technical Staff at Sandia National Laboratories. She has spent her career partnering with computer scientists, software engineers, physicists, human factors experts, I/O psychologists, and analysts of all sorts.

OK maybe she is trying to bridge the gap between old and new usages:

Sensemaking is a term that gets thrown around a lot without much consideration about where the concept came from or what it really means. If sensemaking theory is democratizing, that’s a good thing.

6 minutes left so I won’t get through all of this. Pick some interesting bits.

One of my favorite books about sensemaking is Karl Weick’s, Sensemaking in Organizations. I owe a debt of thanks to the nuclear engineer who suggested I read it. This was back in 2001, when I was at Los Alamos National Laboratory (LANL). I’d just finished my dissertation and was starting a postdoctoral position in the statistics group, and word got around that the laboratories had an anthropologist on staff. My nuclear engineer friend was working on a project examining how management changes were impacting team dynamics in one of LANL’s radiochemistry bench laboratories. He called me asking if I had time to work on the project with him, and he asked if I knew much about “sensemaking.” Apparently, his officemate had recently married a qualitative evaluation researcher, who suggested that both of these LANL engineers take the time to read Karl Weick’s book Sensemaking in Organizations.

My nuclear engineer colleague thought it was the most brilliant thing he’d ever read and was shocked, SHOCKED, that I’d never heard of sensemaking or Karl Weick. I muttered something about anthropologists not always being literate in organizational theory, got off the phone, and immediately logged onto Amazon and ordered it.

Weick’s influences:

… a breathtakingly broad array of ideas – Emily Dickinson, Anthony Giddens, Pablo Neruda, Edmund Leach…

‘Recipe for sensemaking:’

Chapter Two of Sensemaking in Organizations contains what is perhaps Weick’s most cited sentence, the recipe for sensemaking: “How can I know what I think until I see what I say?”

And this from the intro paragraph, could be an interesting reference:

in his gorgeous essay Social Things (which you should read if you haven’t already), Charles Lemert reminds us that social science articulates our native social intelligence through instruments of theory, concepts, methods, language, discourse, texts. Really good sociology and anthropology sharpen that intelligence. They’re powerful because they enhance our understanding of what it means to be human, and they really should belong to everyone.

Something about wiki platforms for knowledge sharing:

For example, back in 2008, my colleague Nancy Dixon and I did a brief study—just a few weeks—examining how intelligence analysts were responding to the introduction of Intellipedia, a wiki platform intended to promote knowledge exchange and cross-domain collaboration across the United States Intelligence community.

DING! Time’s up.


That actually went really well! Favourite speedrun so far, felt like I found out a lot. Most of the references I ended up on were really well-written and clear this time, no wading through rubbish.

I’m still curious to trace the link between Weick and the recent subculture. Also I might read more of the disaster stuff, and read that last McNamara article more carefully. Lots to look into! If anyone has any other suggestions, please leave a comment 🙂

Worse than quantum physics, part 2

This is Part 2 of a two part explanation — Part 1 is here. It won’t make much sense on its own!

In this post I’m going to get into the details of the analogy I set up last time. So far I’ve described how the PR box is ‘worse than quantum physics’ in a specific sense: it violates the CHSH inequality more strongly than any quantum system, pushing past the Tsirelson bound of 2\sqrt{2} to reach the maximum possible value of 4. I also introduced Piponi’s box example, another even simpler ‘worse than quantum physics’ toy system.

This time I’ll explain the connection between Piponi’s box and qubit phase space, and then show that a similar CHSH-inequality-like ‘logical Bell inequality’ holds there too. In this case the quantum system has a Tsirelson-like bound of \sqrt{3}, interestingly intermediate between the classical limit of 1 and the maximum possible value of 3 obtained by Piponi’s box. Finally I’ll dump a load of remaining questions into a Discussion section in the hope that someone can help me out here.

A logical Bell inequality for the Piponi box

Here’s the table from the last post again:


Measurement T F
a 1 0
b 1 0
a \oplus b 1 0

As with the PR box, we can use the yellow highlighted cells in the table to get a version of Abramsky and Hardy’s logical Bell inequality \sum p_i \leq N-1, this time with N = 3 cells. These cells correspond to the three incompatible propositions a, b, a\oplus b, with combined probability \sum p_i = 3, violating the inequality by the maximum amount.

Converting to expected values E_i = 2p_i -1 gives

\sum E_i = 3 > N-2 = 1.
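The arithmetic here is small enough to check in a few lines. A quick Python sanity check (mine, not part of the original argument):

```python
# Piponi box: the three incompatible propositions a, b, a XOR b
# each hold with probability 1.
p = [1.0, 1.0, 1.0]                    # p(a), p(b), p(a XOR b)
N = len(p)                             # number of highlighted cells

sum_p = sum(p)                         # combined probability
sum_E = sum(2 * p_i - 1 for p_i in p)  # expected values E_i = 2 p_i - 1

print(sum_p)  # 3.0 > N - 1 = 2: maximal violation of the logical Bell inequality
print(sum_E)  # 3.0 > N - 2 = 1
```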

So that’s the Piponi box ↔ PR box part of the analogy sorted. Next I want to talk about the qubit phase space ↔ Bell state part. But first it will be useful to rewrite the table of Piponi box results in a way that makes the connection to qubit phase space more obvious:



The four boxes represent the four ‘probabilities’ P(a,b) introduced in the previous post, which can be negative. To recover the values in the table, add up rows, columns or diagonals of the diagram. For example, to find p(\lnot a), sum up the left hand column:

p(\lnot a) = P(\lnot a, b) + P(\lnot a, \lnot b) = \frac{1}{2} - \frac{1}{2} = 0.

Or to find p(a \oplus b), sum up the top-left-to-bottom-right diagonal:

p(a \oplus b) = P(a, \lnot b) + P(\lnot a, b) = \frac{1}{2} + \frac{1}{2} = 1.
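Writing the four quasiprobabilities out explicitly makes the bookkeeping easy to check. A small sketch (mine; the fourth value, P(a, b) = 1/2, is the one forced by the marginals in the table):

```python
# Quasiprobabilities P(a, b) for the Piponi box; note the negative entry.
P = {
    (True, True):   0.5,   # P(a, b)
    (True, False):  0.5,   # P(a, not-b)
    (False, True):  0.5,   # P(not-a, b)
    (False, False): -0.5,  # P(not-a, not-b)
}

# Marginals come from summing columns, rows or diagonals of the 2x2 table:
p_not_a = P[(False, True)] + P[(False, False)]   # left-hand column
p_a_xor_b = P[(True, False)] + P[(False, True)]  # the a XOR b diagonal

print(p_not_a)          # 0.0, as in the table
print(p_a_xor_b)        # 1.0
print(sum(P.values()))  # 1.0: still a normalised (quasi)distribution
```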

I made the diagram below to show how this works in general, and now I’m not sure whether that was a good idea. It’s kind of busy and looking at the example above is probably a lot more helpful. On the other hand, I’ve gone through the effort of making it now and someone might find it useful, so here it is:


Qubit phase space

That’s the first part of the analogy done, between the PR box and Piponi’s box model. Now for the second part, between the CHSH system and qubit phase space. I want to show that the same set of measurements that I used for Piponi’s box also crops up in quantum mechanics as measurements on the phase space of a single qubit. This quantum case also violates the classical bound of \sum E_i = 1, but, as with the Tsirelson bound for an entangled qubit system, it doesn’t reach the maximum possible value. Instead, it tops out at \sum E_i = \sqrt{3}.

The measurements a, b, a\oplus b can be instantiated for a qubit in the following way. For a qubit |\psi\rangle, take

p(a)  = \langle \psi | Q_z | \psi \rangle ,

p(b) = \langle \psi | Q_x | \psi \rangle ,

with Q_i  = \frac{1}{2}(I-\sigma_i) for the Pauli matrices \sigma_i. The a\oplus b diagonal measurements then turn out to correspond to

p(a\oplus b) = \langle \psi | Q_y | \psi \rangle ,

completing the set of measurements.

This is the qubit phase space I described in my second post on negative probability – for more details on how this works and how the corresponding P(a,b)s are calculated, see for example the papers by Wootters on finite-state Wigner functions and Picturing Qubits in Phase Space.

As a simple example, in the case of the qubit state |0\rangle these measurements give

p(a) = 0

p(b) = \frac{1}{2}

p(a\oplus b) = \frac{1}{2},

leading to the following phase space:



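These numbers are easy to reproduce. A minimal numpy sketch (my own check of the |0\rangle example above, not from the original post):

```python
import numpy as np

# Pauli matrices and the measurement operators Q_i = (I - sigma_i) / 2
I2 = np.eye(2)
sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
sigma_y = np.array([[0, -1j], [1j, 0]])
sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)
Qz, Qx, Qy = (I2 - sigma_z) / 2, (I2 - sigma_x) / 2, (I2 - sigma_y) / 2

psi = np.array([1, 0], dtype=complex)  # the state |0>

p_a = np.vdot(psi, Qz @ psi).real      # p(a)
p_b = np.vdot(psi, Qx @ psi).real      # p(b)
p_axb = np.vdot(psi, Qy @ psi).real    # p(a XOR b)

print(p_a, p_b, p_axb)  # 0.0 0.5 0.5
```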
A Tsirelson-like bound for qubit phase space

Now, we want to find the qubit state |\psi\rangle which gives the largest value of \sum p_i. To do this, I wrote out |\psi\rangle in the general Bloch sphere form |\psi\rangle = \cos(\theta / 2) |0\rangle + e^{i\phi} \sin(\theta / 2) |1\rangle and then maximised the value of the highlighted cells in the table:

\sum p_i = p(a) + p(b) + p(a\oplus b) = \frac{3}{2} - \frac{1}{2}(\cos\theta + \sin\theta\cos\phi + \sin\theta\sin\phi )

This is a straightforward calculation but the details are kind of fiddly, so I’ve relegated them to a separate page (like the boring technical appendix at the back of a paper, but blog post style). Anyway the upshot is that this quantity is maximised when \phi = \frac{5\pi}{4} , \sin\theta = \frac{\sqrt{2}}{\sqrt{3}} and \cos\theta = -\frac{1}{\sqrt{3}}, leading to the following table:


Measurement T F
a \frac{1}{2}\left(1 + \frac{1}{\sqrt{3}} \right) \frac{1}{2}\left(1 - \frac{1}{\sqrt{3}} \right)
b \frac{1}{2}\left(1 + \frac{1}{\sqrt{3}} \right) \frac{1}{2}\left(1 - \frac{1}{\sqrt{3}} \right)
a \oplus b \frac{1}{2}\left(1 + \frac{1}{\sqrt{3}} \right) \frac{1}{2}\left(1 - \frac{1}{\sqrt{3}} \right)

The corresponding qubit phase space, if you’re interested, is the following:


Notice the negative ‘probability’ in the bottom left, with a value of around -0.183. This is in fact the most negative value possible for qubit phase space.

This time, adding up the numbers in the yellow-highlighted cells of the table gives

\sum p_i = \frac{3}{2}\left(1 + \frac{1}{\sqrt{3}} \right),

or, in terms of expectation values,

\sum E_i = \sum (2p_i - 1) =   \sqrt{3}.

So \sqrt{3} is our Tsirelson-like bound for this system, in between the classical limit of 1 and the Piponi box value of 3.
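The maximisation can also be checked numerically. A brute-force sketch (mine) over the Bloch sphere, using the same measurement operators as before:

```python
import numpy as np

sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
sigma_y = np.array([[0, -1j], [1j, 0]])
sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)
Qs = [(np.eye(2) - s) / 2 for s in (sigma_z, sigma_x, sigma_y)]  # Q_z, Q_x, Q_y

def sum_p(theta, phi):
    """p(a) + p(b) + p(a XOR b) for the Bloch-sphere state (theta, phi)."""
    psi = np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])
    return sum(np.vdot(psi, Q @ psi).real for Q in Qs)

# The claimed maximiser: phi = 5 pi / 4, cos(theta) = -1 / sqrt(3)
max_p = sum_p(np.arccos(-1 / np.sqrt(3)), 5 * np.pi / 4)
print(max_p)          # (3/2)(1 + 1/sqrt(3)), about 2.366
print(2 * max_p - 3)  # sum of E_i, about sqrt(3) = 1.732

# Coarse grid search: no other state should do better
grid = max(sum_p(t, f)
           for t in np.linspace(0, np.pi, 200)
           for f in np.linspace(0, 2 * np.pi, 200))
assert grid <= max_p + 1e-9
```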


Further questions

As with all of my physics blog posts, I end up with more questions than I started with. Here are a few of them:

Is this analogy already described in some paper somewhere? If so, please point me at it!

Numerology. Why \sqrt{3} and not some other number? As a first step, I can do a bit of numerology and notice that \sqrt{3} = \sqrt{N/2}, where N=6 is the number of cells in the table, and that this rule also fits the CHSH bound of 2\sqrt{2}, where there are N=16 cells.

I can also try this formula on the Mermin example from my Bell post. In that case N=36, so the upper bound implied by the rule would be 3\sqrt{2} … which turns out to be correct. (I didn’t find the upper bound in the post, but you can get it by putting \tfrac{1}{8}(2+\sqrt 2) in all the highlighted cells of the table, similarly to CHSH.)

The Mermin example is close enough to CHSH that it’s not really an independent data point for my rule, but it’s reassuring that it still fits, at least.
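For what it’s worth, the three data points do all fit the \sqrt{N/2} rule; a trivial check (mine):

```python
from math import isclose, sqrt

# (number of cells N in the highlighted table, known quantum bound)
cases = {
    "qubit phase space": (6, sqrt(3)),       # 3 x 2 table
    "CHSH":              (16, 2 * sqrt(2)),  # 4 x 4 table
    "Mermin":            (36, 3 * sqrt(2)),  # from the earlier Bell post
}

for name, (N, bound) in cases.items():
    assert isclose(bound, sqrt(N / 2)), name
print("sqrt(N/2) reproduces all three bounds")
```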

What does this mean? Does it generalise? I don’t know. There’s a big literature on different families of Bell results and their upper bounds, and I don’t know my way around it.

Information causality. OK, playing around with numbers is fine, but what does it mean conceptually? Again, I don’t really know my way around the literature. I know there’s a bunch of papers, starting from this one by Pawlowski et al, that introduces a physical principle called ‘information causality’. According to that paper, this states that, for a sender Alice and a receiver Bob,

the information gain that Bob can reach about the previously unknown to him data set of Alice, by using all his local resources and m classical bits communicated by Alice, is at most m bits.

This principle somehow leads to the Tsirelson bound… as you can see I have not looked into the details yet. This is probably what I should do next. It’s very much phrased in terms of having two separated systems, so I don’t know whether it can be applied usefully in my case of a single qubit.

If you have any insight into any of these questions, or you notice any errors in the post, please let me know in the comments below, or by email.

Worse than quantum physics

I’m still down the rabbithole of thinking way too much about quantum foundations and negative probabilities, and this time I came across an interesting analogy, which I will attempt to explain in this post and the next one. This should follow on nicely from my last post, where I talked about one of the most famous weird features of quantum physics, the violation of the Bell inequalities.

It’s not necessary to read all of that post to understand this one, but you will need to be somewhat familiar with the Bell inequalities (and the CHSH inequality in particular) from somewhere else. For the more technical parts, you’ll also need to know a little bit about Abramsky and Hardy’s logical Bell formulation, which I also covered in the last post. But the core idea probably makes some kind of sense without that background.

So, in that last post I talked about the CHSH inequality and how quantum physics violates the classical upper limit of 2. The example I went through in the post is designed to make the numbers easy, and reaches a value of 2.5, but it’s possible to pick a set of measurements that pushes it further again, to a maximum of 2\sqrt{2} (which is about 2.828). This value is known as the Tsirelson bound.

This maximum value is higher than anything allowed by classical physics, but doesn’t reach the absolute maximum that’s mathematically attainable. The CHSH inequality is normally written something like this:

| E(a,b) + E(\bar{a}, b) + E(a, \bar{b}) - E(\bar{a}, \bar{b}) | \leq 2.

Each of the Es has to be between -1 and +1, so if it was possible to always measure +1 for the first three and -1 for the last one you’d get 4.

This kind of hypothetical ‘superquantum correlation’ is interesting because of the potential to illuminate what’s special about the Tsirelson bound – why does quantum mechanics break the classical limit, but not go all the way? So systems that are ‘worse than quantum physics’ and push all the way to 4 are studied as toy models that can hopefully illuminate something about the constraints on quantum mechanics. The standard example is known as the Popescu-Rohrlich (PR) box, introduced in this paper.

This sounds familiar…

I was reading up on the PR box a while back, and it reminded me of something else I looked into. In my blog posts on negative probability, I used a simple example due to Dan Piponi. This example has the same general structure as measurements on a qubit, but it’s also ‘worse than quantum mechanics’, in the sense that one of the probabilities is more negative than anything allowed in quantum mechanics. Qubits are somewhere in the middle, in between classical systems and the Piponi box.

I immediately noticed the similarity, but at first I thought it was probably something superficial and didn’t investigate further. But after learning about Abramsky and Hardy’s logical formulation of the Bell inequalities, which I covered in the last post, I realised that there was an exact analogy.

This is really interesting to me, because I had no idea that there was any sort of Tsirelson bound equivalent for a single particle system. I’ve already spent quite a bit of time in the last couple of years thinking about the phase space of a single qubit, because it seems to me that a lot of essential quantum weirdness is hidden in there already, before you even consider entanglement with a second qubit – you’ve already got the negative probabilities, after all. But I wasn’t expecting this other analogy to turn up.

I haven’t come across this result in the published literature. But I also haven’t done anything like a thorough search, and it’s quite difficult to search for, because Piponi’s example is in a blog post, rather than a paper. So maybe it’s new, or maybe it’s too simple to write down and stuck in the ghost library, or maybe it’s all over the place and I just haven’t found it yet. I really don’t know, and it seemed like the easiest thing was to just write it up and then try and find out once I had something concrete to point at. I am convinced it hasn’t been written up at anything like a blog-post-style introductory level, so hopefully this can be useful however it turns out.

Post structure

I decided to split this argument into two shorter parts and post them separately, to make it more readable. This first part is just background on the Tsirelson bound and the PR box – there’s nothing new here, but it was useful for me to collect the background I need in one place. I also give a quick description of Piponi’s box model.

In the second post, I’ll move on to explaining the single qubit analogy. This is the interesting bit!

The Tsirelson bound: Mermin’s machine again

To illustrate how Tsirelson’s bound is attained, I’ll go back to Mermin’s machine from the last post. I’ll use the same basic setup as before, but move the settings on the detectors:


This time the two settings on each detector are at right angles to each other, and the right hand detector settings are rotated 45 degrees from the left hand detector. As before, quantum mechanics says that the probabilities of different combinations of lights flashing will obey

p(T,T) = p(F,F) = \frac{1}{2}\cos^2\left(\frac{\theta}{2}\right),

p(T,F) = p(F,T) = \frac{1}{2}\sin^2\left(\frac{\theta}{2}\right),

where \theta is the angle between the detector settings. The numbers are more of a hassle than in Mermin’s example, which was picked for simplicity – here’s the table of probabilities:


Dial setting (T,T) (T,F) (F,T) (F,F)
ab \tfrac{1}{8}(2+\sqrt 2) \tfrac{1}{8}(2-\sqrt 2) \tfrac{1}{8}(2-\sqrt 2) \tfrac{1}{8}(2+\sqrt 2)
ab' \tfrac{1}{8}(2-\sqrt 2) \tfrac{1}{8}(2+\sqrt 2) \tfrac{1}{8}(2+\sqrt 2) \tfrac{1}{8}(2-\sqrt 2)
a'b \tfrac{1}{8}(2+\sqrt 2) \tfrac{1}{8}(2-\sqrt 2) \tfrac{1}{8}(2-\sqrt 2) \tfrac{1}{8}(2+\sqrt 2)
a'b' \tfrac{1}{8}(2+\sqrt 2) \tfrac{1}{8}(2-\sqrt 2) \tfrac{1}{8}(2-\sqrt 2) \tfrac{1}{8}(2+\sqrt 2)

Then we follow the logical Bell procedure of the last post: take a set of mutually contradictory propositions (the highlighted cells) and add up their probabilities. This gives \sum p_i = 2+\sqrt 2, or, converting to expectation values E_i = 2p_i - 1,

\sum E_i = 2\sqrt 2 .

This is the Tsirelson bound.
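If you want to check this numerically rather than trusting my arithmetic, here’s a quick Python sketch (mine, not from any of the sources) that rebuilds the table from the cos² formula. The effective angles between the chosen settings – 45° for ab, a'b and a'b', 135° for ab' – are read off the detector arrangement described above.

```python
import math

def row(theta_deg):
    # p(T,T) = p(F,F) = cos^2(theta/2)/2, p(T,F) = p(F,T) = sin^2(theta/2)/2
    t = math.radians(theta_deg) / 2
    same = math.cos(t) ** 2 / 2
    diff = math.sin(t) ** 2 / 2
    return {"TT": same, "TF": diff, "FT": diff, "FF": same}

# angle between the two chosen settings for each dial pair
angles = {"ab": 45, "ab'": 135, "a'b": 45, "a'b'": 45}
table = {pair: row(theta) for pair, theta in angles.items()}

# probabilities of the four highlighted propositions:
# a <-> b, a XOR b', a' <-> b, a' <-> b'
p = [
    table["ab"]["TT"] + table["ab"]["FF"],
    table["ab'"]["TF"] + table["ab'"]["FT"],
    table["a'b"]["TT"] + table["a'b"]["FF"],
    table["a'b'"]["TT"] + table["a'b'"]["FF"],
]

sum_p = sum(p)                        # 2 + sqrt(2), about 3.414
sum_E = sum(2 * pi - 1 for pi in p)   # 2*sqrt(2), the Tsirelson bound
print(sum_p, sum_E)
```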

The PR box

The idea of the PR box is to get the highest violation of the inequality possible, by shoving all of the probability into the highlighted cells, like this:

Dial setting (T,T) (T,F) (F,T) (F,F)
ab 1/2 0 0 1/2
a\bar{b} 0 1/2 1/2 0
\bar{a}b 1/2 0 0 1/2
\bar{a}\bar{b} 1/2 0 0 1/2

This time, adding up all the highlighted boxes gives the maximum \sum E_i = 4 .

Signalling

This is kind of an aside in the context of this post, but the original motivation for the PR box was to demonstrate that you could push past the quantum limit while still not allowing signalling between the two devices: if you only have access to the left hand box, for example, you can’t learn anything about the right hand box’s dial setting. Say you set the left hand box to dial setting a. If the right hand box was set to b you’d end up measuring T with a probability of

p(T,T| a,b) + p(T,F| a,b) = \frac{1}{2} + 0 = \frac{1}{2}.

If the right hand box was set to \bar{b} instead you’d still get \frac{1}{2}:

p(T,T| a,\bar{b}) + p(T,F| a,\bar{b}) = 0 + \frac{1}{2} = \frac{1}{2}.

The same conspiracy holds if you set the left hand box to \bar{a}, so whatever you do you can’t find out anything about the right hand box.
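As a sanity check on the no-signalling property, here’s a little sketch (my own) that computes the left hand box’s marginal for every combination of settings in the table above:

```python
from fractions import Fraction as F

h = F(1, 2)
# PR box table from above: columns are (T,T), (T,F), (F,T), (F,F)
pr = {
    ("a", "b"): [h, 0, 0, h],
    ("a", "b_bar"): [0, h, h, 0],
    ("a_bar", "b"): [h, 0, 0, h],
    ("a_bar", "b_bar"): [h, 0, 0, h],
}

def left_marginal(row):
    # probability that the LEFT light flashes T: p(T,T) + p(T,F)
    return row[0] + row[1]

# for either left setting, the marginal is 1/2 regardless of the right setting,
# so looking at the left box alone tells you nothing about the right dial
for left in ("a", "a_bar"):
    for right in ("b", "b_bar"):
        print(left, right, left_marginal(pr[(left, right)]))
```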

Negative probabilities

Another interesting feature of the PR box, which will be directly relevant here, is the connection to negative probabilities. Say you want to explain the results of the PR box in terms of underlying probabilities P(a,\bar{a},b,\bar{b}) for all of the settings at once. This can’t be done in terms of normal probabilities, which is not surprising: this property of having consistent results independent of the measurement settings you choose is exactly what’s broken down for non-classical systems like the CHSH system and the PR box.

However you can reproduce the results if you allow some negative probabilities. In the case of the PR box, you end up with the following:


P(T,T,T,T) = \frac{1}{2}

P(T,T,T,F) = 0

P(T,T,F,T) = -\frac{1}{2}

P(T,T,F,F) = 0

P(T,F,T,T) = 0

P(T,F,T,F) = 0

P(T,F,F,T) = \frac{1}{2}

P(T,F,F,F) = 0

P(F,T,T,T) = -\frac{1}{2}

P(F,T,T,F) = \frac{1}{2}

P(F,T,F,T) = \frac{1}{2}

P(F,T,F,F) = 0

P(F,F,T,T) = 0

P(F,F,T,F) = 0

P(F,F,F,T) = 0

P(F,F,F,F) = 0

(I got these from Abramsky and Brandenburger’s An Operational Interpretation of Negative Probabilities and No-Signalling Models.) To get back the probabilities in the table above, sum up all relevant Ps for each dial setting. As an example, take the top left cell of the table above. To get the probability of (T,T) for dial setting (a,b), sum up all cases where a and b are both T:

P(T,T,T,T) + P(T,T,T,F) + P(T,F,T,T) + P(T,F,T,F) = \frac{1}{2}

In this way we recover the values of all the measurements in the table – it’s only the Ps that are negative, not anything we can actually measure. This feature, along with the way that the number -\tfrac{1}{2} crops up specifically, is what reminded me of Piponi’s blog post.
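Here’s a small sketch (mine) of that recovery step, using the ordering P(a, \bar{a}, b, \bar{b}) for the sixteen signed values:

```python
from fractions import Fraction as F
from itertools import product

h = F(1, 2)
# joint quasi-probabilities P(a, a_bar, b, b_bar), with True/False for T/F
P = {
    (True, True, True, True): h,     (True, True, True, False): 0,
    (True, True, False, True): -h,   (True, True, False, False): 0,
    (True, False, True, True): 0,    (True, False, True, False): 0,
    (True, False, False, True): h,   (True, False, False, False): 0,
    (False, True, True, True): -h,   (False, True, True, False): h,
    (False, True, False, True): h,   (False, True, False, False): 0,
    (False, False, True, True): 0,   (False, False, True, False): 0,
    (False, False, False, True): 0,  (False, False, False, False): 0,
}

# the signed values still sum to 1, like a proper distribution
total = sum(P.values())

# marginalise out a_bar and b_bar: p(T,T) at dial setting (a,b)
p_TT_ab = sum(P[(True, x, True, y)] for x, y in product([True, False], repeat=2))
print(total, p_TT_ab)  # 1 and 1/2
```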

Piponi’s box model

The device in Piponi’s example is a single box containing two bits a and b, and you can make one of three measurements: the value of a, the value of b, or the value of a \oplus b. The result is either T or F, with probabilities that obey the following table:


Measurement T F
a 1 0
b 1 0
a \oplus b 1 0

These measurements are inconsistent and can’t be described with any normal probabilities P(a,b), but, as with the PR box, they can be with negative probabilities:

P(T,T) = \frac{1}{2}

P(T,F) = \frac{1}{2}

P(F,T) = \frac{1}{2}

P(F,F) = -\frac{1}{2}

For example, the probability of measuring a\oplus b and getting F is

P(T,T) + P(F,F) = \frac{1}{2} - \frac{1}{2} = 0,

as in the table above.
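The whole of Piponi’s box fits in a few lines, so here’s a sketch (mine again) checking all three measurements against the table:

```python
from fractions import Fraction as F

h = F(1, 2)
# joint quasi-probabilities P(a, b) for the two bits in the box
P = {(True, True): h, (True, False): h, (False, True): h, (False, False): -h}

# each measurement's probability of T is a marginal over the quasi-distribution
p_a = sum(p for (a, b), p in P.items() if a)         # measure a
p_b = sum(p for (a, b), p in P.items() if b)         # measure b
p_xor = sum(p for (a, b), p in P.items() if a != b)  # measure a XOR b
print(p_a, p_b, p_xor)  # all three are 1, matching the table
```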

Notice that -\frac{1}{2} crops up again! The similarities to the PR box go deeper, though. The PR box is a kind of extreme version of the CHSH state of two entangled qubits – same basic mathematics but pushing the correlations up higher. Analogously, Piponi’s box is an extreme version of the phase space for a single qubit. In both cases, quantum mechanics is perched intriguingly in the middle between classical mechanics and these extreme systems. I’ll go through the details of the analogy in the next post.

Bell’s theorem and Mermin’s machine

> Anybody who’s not bothered by Bell’s theorem has to have rocks in his head.

— ‘A distinguished Princeton physicist’, as told to David Mermin

This post is a long, idiosyncratic discussion of the Bell inequalities in quantum physics. There are plenty of good introductions already, so this is a bit of a weird thing to spend my time writing. But I wanted something very specific, and couldn’t find an existing version that had all the right pieces. So of course I had to spend far too much time making one.

My favourite introduction is Mermin’s wonderful Quantum Mysteries for Anyone. This is an absolute classic of clear explanation, and lots of modern pop science discussions derive from it. It’s been optimised for giving a really intense gut punch of NOTHING IN THE WORLD MAKES SENSE ANY MORE, which I’d argue is the main thing you want to get out of learning about the Bell inequalities.

However, at some point if you get serious you’ll want to actually calculate things, which means you’ll need to make the jump from Mermin’s version to the kind of exposition you see in a textbook. The most common modern version of the Bell inequalities you’ll see is the CHSH inequality, which looks like this:

| E(a,b) + E(\bar{a}, b) + E(a, \bar{b}) - E(\bar{a}, \bar{b}) | \leq 2

(It doesn’t matter what all of that means, at the moment… I’ll get to that later.) The standard derivations of this tend to involve a lot of fussing with algebraic rearrangements and integrals full of \lambdas and so forth. The final result is less of a gut punch and more of a diffuse feeling of unease: "well I guess this number has to be between -2 and 2, but it isn’t".

This feels like a problem to me. There’s a 1929 New Yorker cartoon which depicts ordinary people in the street walking around dumbstruck by Einstein’s theory of general relativity. This is a comic idea because the theory was famously abstruse (particularly back then when good secondary explanations were thin on the ground). But the Bell inequalities are accessible to anyone with a very basic knowledge of maths, and weirder than anything in relativity. I genuinely think that everyone should be walking down the street clutching their heads in shock at the Bell inequalities, and a good introduction should help deliver you to this state. (If you don’t have rocks in your head, of course. In that case nothing will help you.)

It’s also a bit of an opaque black box. For example, why is there a minus sign in front of one of the Es but not the others? I was in a discussion group a few years back with a bunch of postdocs and PhD students, all of us with a pretty strong interest in quantum foundations, and CHSH came up at some point. None of us had much of a gut sense for what that minus sign was doing… it was just something that turned up during some algebra.

I wanted to trace a path from Mermin’s explanation to the textbook one, in the hope of propagating some of that intuitive force forward. I wrote an early draft of the first part of this post for a newsletter in 2018 but couldn’t see how to make the rest of it work, so I dropped it. This time I had a lot more success using some ideas I learned in the meantime. I ended up taking a detour through a third type of explanation, the ‘logical Bell inequalities’ approach of Abramsky and Hardy. This is a general method that can be used on a number of other similar ‘no-go theorems’, not just Bell’s original. It gives a lot more insight into what’s actually going on (including that pesky minus sign). It’s also surprisingly straightforward: the main result is a few steps of propositional logic.

That bit of propositional logic is the most mathematically involved part of this post. The early part just requires some arithmetic and the willingness to follow what Mermin calls ‘a simple counting argument on the level of a newspaper braintwister’. No understanding of the mathematics of quantum theory is needed at all! That’s because I’m only talking about why the results of quantum theory are weird, and not how the calculations that produce those results are done.

If you also want to learn to do the calculations, starting from a basic knowledge of linear algebra and complex numbers, I really like Michael Nielsen and Andy Matuschak’s Quantum Country, which covers the basic principles of quantum mechanics and also the Bell inequalities. You’d need to do the ‘Quantum computing for the very curious’ part, which introduces a lot of background ideas, and then the ‘Quantum mechanics distilled’ part, which has the principles and the Bell stuff.

There’s also nothing about how the weirdness should be interpreted, because that is an enormous 90-year-old can of rotten worms and I would like to finish this post some time in my life 🙂

Mermin’s machine

So, on to Mermin’s explanation. I can’t really improve on it, and it would be a good idea to go and read that now instead, and come back to my version afterwards. I’ve repeated it here anyway though, partly for completeness and partly because I’ve changed some notation and other details to mesh better with the Abramsky and Hardy version I’ll come to later.

(Boring paragraph on exactly what I changed, skip if you don’t care: I’ve switched Mermin’s ‘red’ and ‘green’ to ‘true’ and ‘false’, and the dial settings from 1,2,3 on both sides to a, a', a'' on the left side and b, b', b'' on the right side. I’ve also made one slightly more substantive change. Mermin explains at the end of his paper that in his setup, ‘One detector flashes red or green according to whether the measured spin is along or opposite to the field; the other uses the opposite color convention’. I didn’t want to introduce the complication of having the two detectors with opposite wiring, and have made them both respond the same way, flashing T for along the field and F for opposite. But I also wanted to keep Mermin’s results. To do that I had to change the dial positions of the right hand dial, so that a is opposite b, a' is opposite b', and a'' is opposite b''. )

Anyway, Mermin introduces the following setup:



The machine in the middle is the source. It fires out some kind of particle – photons, electrons, frozen peas, whatever. We don’t really care how it works, we’ll just be looking at why the results are weird.

The two machines on the right and left side are detectors. Each detector has a dial with three settings. On the left they’re labelled a, a' and a''. On the right, they’re b, b' and b''.

On the top of each are two lights marked T and F for true and false. (Again, we don’t really care what’s true or false, we’re keeping everything at a kind of abstract, operational level and not going into the practical details. It’s just two possible results of a measurement.)

It’s vital to this experiment that the two detectors cannot communicate at all. If they can, there’s nothing weird about the results. So assume that a lot of work has gone into making absolutely sure that the detectors are definitely not sharing information in any way at all.

Now the experiment just consists of firing out pairs of particles, one to each detector, with the dials set to different values, and recording whether the lights flash T or F. So you get a big list of results of the form

ab'TF, a''bFT, a'b'FF, ...

The second important point, other than the detectors not being able to communicate, is that you have a free choice of setting the dials. You can set them both beforehand, or when the particles are both ‘in flight’, or even set the right hand dial after the left hand detector has already received its particle but before the right hand particle gets there. It doesn’t matter.

Now you do like a million billion runs of this experiment, enough to convince you that the results are not some weird statistical fluctuation, and analyse the results. You end up with the following table:


Dial setting (T,T) (T,F) (F,T) (F,F)
ab 1/2 0 0 1/2
ab' 1/8 3/8 3/8 1/8
ab'' 1/8 3/8 3/8 1/8
a'b 1/8 3/8 3/8 1/8
a'b' 1/2 0 0 1/2
a'b'' 1/8 3/8 3/8 1/8
a''b 1/8 3/8 3/8 1/8
a''b' 1/8 3/8 3/8 1/8
a''b'' 1/2 0 0 1/2

Each dial setting has a row, and the entries in that row give the probabilities for getting the different results. So for instance if you set the dials to a' and b, there’s a 1/8 chance of getting (T,T).

This doesn’t obviously look particularly weird at first sight. It only turns out to be weird when you start analysing the results. Mermin condenses two results from this table which are enough to show the weirdness. The first is:

Result 1: This result relates to the cases where the two dials are set to ab, a'b', or a''b''. In these cases both lights always flash the same colour. So you might get ab TT, ab FF, a'b' TT etc, but never ab TF or a''b'' FT.

This is pretty easy to explain. The detectors can’t communicate, so if they do the same thing it must be something to do with the properties of the particles they are receiving. We can explain it straightforwardly by postulating that each particle has an internal state with three properties, one for each dial position. Each of these takes two possible values which we label T or F. We can write these states as e.g.

TTF

TTF

where the entries on the top line refer to the left hand particle’s state when the dial is in the a, a' and a'' positions respectively, and the bottom line refers to the right hand particle’s state when the dial is in the b, b', b'' position.

Result 1 implies that the states of the two particles must always be the same. So the state above is an allowed one, but e.g.

TTF

TFF

isn’t.

Mermin says:

> This hypothesis is the obvious way to account for what happens in [Result 1]. I cannot prove that it is the only way, but I challenge the reader, given the lack of connections between the devices, to suggest any other.

Because the second particle will always have the same state as the first one, I’ll save some typing and just write the first one out as a shorthand. So the first example state will just become TTF.

Now on to the second result. This one covers the remaining options for dial settings, ab', a''b and the like.

Result 2: For the remaining dial settings, the lights flash the same colour 1/4 of the time, and different colours 3/4 of the time.

This looks quite innocuous on first sight. It’s only when you start to consider how it meshes with Result 1 that things get weird.

(This is the part of the explanation that requires some thinking ‘on the level of a newspaper braintwister’. It’s fairly painless and will be over soon.)

Our explanation for result 1 is that particles in each run of the experiment have an underlying state, and both particles have the same state. Let’s go through the implications of this, starting with the example state TTF.

I’ve enumerated the various options for the dials in the table below. For example, if the left dial is a and the right dial is b', we know that the left detector will light up T and the right will light up T, so the two lights are the same.


Dial setting Lights
ab' same
ab'' different
a'b same
a'b'' different
a''b different
a''b' different

Overall there’s a 1/3 chance of being the same and a 2/3 chance of being different. You can convince yourself that this is also true for all the states with two Ts and an F or vice versa: TTF, TFF, TFT, FTT, FTF, FFT.

That leaves TTT and FFF as the other two options. In those cases the lights will flash the same colour no matter what the dial is set to.

So whatever the underlying state is, the chance of the two lights being the same is at least ⅓. But this is incompatible with Result 2, which says that the probability is ¼.

(The thinky part is now done.)

So Results 1 and 2 together are completely bizarre. No assignment of states will work. But this is exactly what happens in quantum mechanics!
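The counting argument can also be checked by brute force – here’s a sketch (mine) that enumerates all eight hidden states and all six mixed dial settings:

```python
from itertools import product
from fractions import Fraction

# hidden state: the T/F value for each of the three dial positions
states = list(product([True, False], repeat=3))

# the six mixed dial settings as (left index, right index), e.g. (0, 1) is ab'
mixed = [(0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1)]

# for each state, the fraction of mixed settings where the lights agree
# (both particles carry the same state s)
agree = {
    s: Fraction(sum(1 for l, r in mixed if s[l] == s[r]), len(mixed))
    for s in states
}

# every hidden state gives agreement at least 1/3,
# but Result 2 says quantum mechanics gives 1/4
print(min(agree.values()))  # 1/3
```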

You probably can’t do it with frozen peas, though. The details don’t matter for this post, but here’s a very brief description if you want it: the particles should be two spin-half particles prepared in a specific ‘singlet’ state, the dials should connect to magnets that can be oriented in three states at 120 degree angles from each other, and the lights on the detectors measure spin along and opposite to the field. The magnets should be set up so that the state for setting a on the left hand side is oriented at 180 degrees from the state for setting b on the right hand side; similarly a' should be opposite b' and a'' opposite b''. I’ve drawn the dials on the machine to match this. Quantum mechanics then says that the probabilities of the different results are

p(T,T) = p(F,F) = \frac{1}{2}\cos^2{\frac{\theta}{2}}

p(T,F) = p(F,T) = \frac{1}{2}\sin^2{\frac{\theta}{2}}

where \theta is the angle between the magnet states on the left and right sides. This reproduces the numbers in the table above.

Once more with less thinking

Mermin’s argument is clear and compelling. The only problem with it is that you have to do some thinking. There are clever details that apply to this particular case, and if you want to do another case you’ll have to do more thinking. Not good. This is where Abramsky and Hardy’s logical Bell approach comes in. It requires more upfront setup (so actually more thinking in the short term – this section title is kind of a lie, sorry) but can then be applied systematically to all kinds of problems.

This first involves reframing the entries in the probability table in terms of propositional logic. For example, we can write the result (T,F) for (a',b) as a' \land \lnot b. Then the entries of the table correspond to the probabilities we assign to each statement: in this case, \text{prob}(a' \land \lnot b) = \frac{3}{8}.

Now, look at the following highlighted cells in three rows of the grid – the (T,T) and (F,F) cells of the rows ab, a'b' and a''b'':


Dial setting (T,T) (T,F) (F,T) (F,F)
ab 1/2 0 0 1/2
ab' 1/8 3/8 3/8 1/8
ab'' 1/8 3/8 3/8 1/8
a'b 1/8 3/8 3/8 1/8
a'b' 1/2 0 0 1/2
a'b'' 1/8 3/8 3/8 1/8
a''b 1/8 3/8 3/8 1/8
a''b' 1/8 3/8 3/8 1/8
a''b'' 1/2 0 0 1/2

These correspond to the three propositions

\phi_1 = (a\land b) \lor (\lnot a \land\lnot b)

\phi_2 = (a'\land b') \lor (\lnot a' \land\lnot b')

\phi_3 = (a''\land b'') \lor (\lnot a'' \land\lnot b'') ,

which can be written more simply as

\phi_1 = a \leftrightarrow b

\phi_2 = a' \leftrightarrow b'

\phi_3 = a'' \leftrightarrow b''.

where the \leftrightarrow stands for logical equivalence. This also means that a can be substituted for b, and so on, which will be useful in a minute.

Next, look at the highlighted cells in these three rows – the (T,F) and (F,T) cells of the rows ab', ab'' and a'b'':


Dial setting (T,T) (T,F) (F,T) (F,F)
ab 1/2 0 0 1/2
ab' 1/8 3/8 3/8 1/8
ab'' 1/8 3/8 3/8 1/8
a'b 1/8 3/8 3/8 1/8
a'b' 1/2 0 0 1/2
a'b'' 1/8 3/8 3/8 1/8
a''b 1/8 3/8 3/8 1/8
a''b' 1/8 3/8 3/8 1/8
a''b'' 1/2 0 0 1/2

These correspond to

\phi_4 = (a\land \lnot b') \lor (\lnot a \land b')

\phi_5 = (a\land \lnot b'') \lor (\lnot a \land b'')

\phi_6 = (a'\land \lnot b'') \lor (\lnot a' \land b'') ,

which can be simplified to

\phi_4 = a \oplus b'

\phi_5 = a \oplus b''

\phi_6 = a' \oplus b''.

where the \oplus stands for exclusive or.

Now it can be shown quite quickly that these six propositions are mutually contradictory. First use the first three propositions to get rid of b, b' and b'', leaving

a \oplus a'

a \oplus a''

a' \oplus a''

You can check that these are contradictory by drawing out the truth table, or maybe just by looking at them, or maybe by considering the following stupid dialogue for a while (this post is long and I have to entertain myself somehow):


Grumpy cook 1: You must have either beans or chips but not both.

Me: OK, I’ll have chips.

Grumpy cook 2: Yeah, and also you must have either beans or peas but not both.

Me: Fine, looks like I’m having chips and peas.

Grumpy cook 3: Yeah, and also you must have either chips or peas but not both.

Me:

Me: OK let’s back up a bit. I’d better have beans instead of chips.

Grumpy cook 1: You must have either beans or chips but not both.

Me: I know. No chips. Just beans.

Grumpy cook 2: Yeah, and also you must have either beans or peas but not both.

Me: Well I’ve already got to have beans. But I can’t have them with chips or peas. Got anything else?

Grumpy cook 3: NO! And remember, you must have either chips or peas.

Me: hurls tray
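If you’d rather let a computer hurl the tray, here’s the truth table version as a quick sketch (mine):

```python
from itertools import product

# assignments of T/F to beans (a), chips (a') and peas (a'')
# that satisfy all three 'one but not both' (XOR) constraints
consistent = [
    (a, a1, a2)
    for a, a1, a2 in product([True, False], repeat=3)
    if (a != a1) and (a != a2) and (a1 != a2)
]
print(consistent)  # empty list: no assignment survives
```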


So, yep, the six highlighted propositions are inconsistent. But this wouldn’t necessarily matter, as some of the propositions are only probabilistically true. So you could imagine that, if you carefully set some of them to false in the right ways in each run, you could avoid the contradiction. However, we saw with Mermin’s argument above that this doesn’t save the situation – the propositions have ‘too much probability in total’, in some sense, to allow you to do this. Abramsky and Hardy’s logical Bell inequalities will quantify this vague ‘too much probability in total’ idea.

Logical Bell inequalities

This bit involves a few lines of logical reasoning. We’ve got a set of propositions \phi_i (six of them in this example case, N in general), each with probability p_i. Let P be the probability of all of them happening together. Call this combined statement

\Phi = \bigwedge_i \phi_i.

Then

1 - P = \text{prob}\left( \lnot\Phi\right) = \text{prob}\left(\bigvee_i \lnot\phi_i\right)

where the second equality is de Morgan’s law. This is at most the sum of the probabilities of all the \lnot\phi_i s:

1 - P \leq \sum_i \text{prob}(\lnot\phi_i)

= \sum_i (1 - p_i)

= N - \sum_i p_i .

where N is the total number of propositions. Rearranging gives

\sum_i p_i \leq N + P - 1.

Now suppose the \phi_i are jointly contradictory, as in the Mermin example above, so that the combined probability P = 0. This gives the logical Bell inequality

\sum_i p_i \leq N-1 .

This is the precise version of the ‘too much probability’ idea. In the Mermin case, there are six propositions, three with probability 1 and three with probability ¾, which sum to 5.25. This is greater than N-1 = 5, so the inequality is violated.
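In code, the violation is one line of arithmetic (a sketch of mine, with the six proposition probabilities read off the Mermin table):

```python
# three certain propositions and three with probability 3/4
p = [1, 1, 1, 3 / 4, 3 / 4, 3 / 4]
N = len(p)

lhs = sum(p)   # 5.25
bound = N - 1  # 5, the logical Bell bound for jointly contradictory propositions
print(lhs, bound, lhs > bound)
```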

This inequality can be applied to lots of different setups, not just Mermin’s. Abramsky and Hardy use the CHSH inequality mentioned in the introduction to this post as their first example. This is probably the most common example used to introduce Bell’s theorem, though the notation is usually somewhat different. I’ll go through Abramsky and Hardy’s version and then connect it back to the standard textbook notation.

The CHSH inequality

The CHSH experiment only uses two settings on each side, not three. I’ve drawn a ‘CHSH machine’ in the style of Mermin’s machine to illustrate it:



There are two settings a and \bar{a} on the left side, 60 degrees apart. And there are two settings b and \bar{b} on the right side, also 60 degrees apart, with b opposite a. This leads to the following table:


Dial setting (T,T) (T,F) (F,T) (F,F)
ab 1/2 0 0 1/2
a\bar{b} 3/8 1/8 1/8 3/8
\bar{a}b 3/8 1/8 1/8 3/8
\bar{a}\bar{b} 1/8 3/8 3/8 1/8

Now it’s just a case of following the same reasoning as for the Mermin case. The highlighted rows correspond to the propositions

\phi_1 = (a \land b) \lor (\lnot a \land \lnot b) = a \leftrightarrow b

\phi_2 = (a \land \bar{b}) \lor (\lnot a \land \lnot \bar{b}) = a \leftrightarrow \bar{b}

\phi_3 = (\bar{a} \land b) \lor (\lnot \bar{a} \land \lnot b) = \bar{a} \leftrightarrow b

\phi_4 = (\lnot \bar{a} \land \bar{b}) \lor (\bar{a} \land \lnot \bar{b}) = \bar{a} \oplus \bar{b}

As with Mermin’s example, these four propositions can be seen to be contradictory. Rather than trying to make up more stupid dialogues, I’ll just follow the method in the paper. First use \phi_3 to replace \bar{a} with b in \phi_4:

\phi_4 = b \oplus \bar{b} .

Then use \phi_1 to swap out b again, this time with a:

\phi_4 = a \oplus \bar{b} .

Finally use \phi_2 to swap out a with \bar{b}, leaving

\bar{b} \oplus \bar{b}

which is clearly contradictory.

(Sidenote: I guess these sort of arguments to show a contradiction do involve some thinking, which is what I was trying to avoid earlier. But in each case you could just draw out a truth table, which is a stupid method that a computer could do. So I think it’s reasonable to say that this is less thinking than Mermin’s method.)

Again, this violates the logical Bell inequality. In total, we have

\sum_i p_i = 1 + \frac{3}{4}  + \frac{3}{4}  + \frac{3}{4} = 3.25 > 3.

The textbook version of this inequality is a bit different. For a start, it uses an ‘expectation value’ for each proposition rather than a straightforward probability, where truth is associated with +1 and falsity with -1. So each proposition \phi_i has an expectation value E_i with

E_i = (+1)\cdot p_i + (-1)\cdot (1-p_i) = 2p_i -1.

Then summing over the E_is gives

\sum_i E_i = \sum_i (2p_i-1) = 2\sum_i p_i - N

and then, using the previous form of the logical Bell inequality,

\sum_i E_i \leq 2(N-1) - N = N-2.

A similar argument for -E_i shows that \sum_i E_i \geq -(N-2), so that this is a bound above and below:

|\sum_i E_i| \leq N - 2.

In this case N = 4 and so the inequality becomes |\sum_i E_i| \leq 2. However adding up the E_is associated to the propositions \phi_i gives 2.5, so the inequality is violated.
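Putting the expectation-value version into code (my sketch, with the CHSH table from above):

```python
from fractions import Fraction as F

# CHSH table rows: probabilities for (T,T), (T,F), (F,T), (F,F)
rows = {
    ("a", "b"): (F(1, 2), 0, 0, F(1, 2)),
    ("a", "b_bar"): (F(3, 8), F(1, 8), F(1, 8), F(3, 8)),
    ("a_bar", "b"): (F(3, 8), F(1, 8), F(1, 8), F(3, 8)),
    ("a_bar", "b_bar"): (F(1, 8), F(3, 8), F(3, 8), F(1, 8)),
}

def E(setting):
    # expectation of the product of the two +/-1 outcomes
    tt, tf, ft, ff = rows[setting]
    return tt + ff - tf - ft

chsh = (E(("a", "b")) + E(("a", "b_bar"))
        + E(("a_bar", "b")) - E(("a_bar", "b_bar")))
print(chsh)  # 5/2, which violates the bound of 2
```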

There’s still a little further to go to get the textbook version, but we’re getting close. The textbook version writes the CHSH inequality as

| E(a,b) + E(\bar{a}, b) + E(a, \bar{b}) - E(\bar{a}, \bar{b}) | \leq 2.

where the expectation value is written in the form

E(a,b) = \int A(a,\lambda) B(b, \lambda)\rho(\lambda) d\lambda.

The \lambda are ‘hidden variables’ – properties of the particles that dispose them to act in various ways. For example, in the Mermin case, we imagined them to have hidden states, like

TFF

TFF

that controlled their response to each dial, and showed that any choice of these hidden states would lead to a contradiction.

For a given \lambda, A(a, \lambda) and B(b, \lambda) are the values measured by the left and right hand machines respectively. In our case these values are always either +1 (if the machine flashes T) or -1 (if the machine flashes F). The CHSH argument can also be adapted to a more realistic case where some experimental runs have no detection at all, and the outcome can also be 0, but this simple version won’t do that.

For the dial settings a and b, all we care about with these hidden variables is whether they make the machines respond true or false. So in our case \lambda just ranges over the four possibilities \{ a\land b, a\land \lnot b, \lnot a\land b, \lnot a\land\lnot b \}, and the integral can just become a sum:

E(a,b) = (+1 \times +1)\cdot p(a\land b) + (+1 \times -1)\cdot p(a\land \lnot b) + (-1 \times +1)\cdot p(\lnot a\land b) + (-1 \times -1)\cdot p(\lnot a\land \lnot b)

= p(a\land b) + p(\lnot a\land \lnot b) - p(a\land \lnot b) - p(\lnot a\land b).

= p((a\land b) \lor (\lnot a\land \lnot b)) - p((a\land \lnot b) \lor(\lnot a\land b)).

Now that first proposition (a\land b) \lor (\lnot a\land \lnot b) is just \phi_1 from earlier, which had probability p_1. And the second one covers all the remaining possibilities, so it has probability 1-p_1. So

E(a,b) = p_1 - (1-p_1) = 2p_1 - 1 = E_1.

The argument goes through exactly the same way for E(a, \bar{b}) and E(\bar{a}, b). The last case, E(\bar{a}, \bar{b}), is slightly different. We get

E(\bar{a}, \bar{b}) = p((\bar{a}\land \bar{b}) \lor (\lnot \bar{a}\land \lnot \bar{b})) - p((\bar{a}\land \lnot \bar{b}) \lor(\lnot \bar{a}\land \bar{b}))

following the same logic as before. But this time \phi_4 matches the second proposition (\bar{a}\land \lnot \bar{b}) \lor(\lnot \bar{a}\land \bar{b}), not the first, so that

E(\bar{a}, \bar{b}) = (1-p_4) - p_4 = 1 - 2p_4 = -E_4.

This is where the minus sign in the CHSH inequality comes in! We have

|\sum_i E_i| = | E(a, b) + E(a, \bar{b}) + E(\bar{a}, b) - E(\bar{a}, \bar{b}) | \leq 2.

So we end up with the standard inequality, but with a bit more insight into where the pieces come from. Also, importantly, it’s easy to extend to other situations. For example, you could follow the same method with the six Mermin propositions from earlier to make a kind of ‘Mermin-CHSH inequality’:

|\sum_i E_i| = | E(a, b) + E(a', b') + E(a'', b'') - E(a, b') - E(a, b'') - E(a', b'') | \leq 4.
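A quick check of this one too (my sketch, reading the expectation values off the nine-row Mermin table – the matched pairs are perfectly correlated, and the mixed pairs have E = 2 \cdot 1/4 - 1 = -1/2):

```python
from fractions import Fraction as F

E_matched = F(1)     # E(a,b), E(a',b'), E(a'',b'')
E_mixed = F(-1, 2)   # E(a,b'), E(a,b''), E(a',b'')

# the 'Mermin-CHSH' combination: three plus signs, three minus signs
combo = 3 * E_matched - 3 * E_mixed
print(combo)  # 9/2, which exceeds the bound of 4
```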

Or you could have three particles, or a different set of measurements, or you could investigate what happens with other tables of correlations that don’t appear in quantum physics… this is a very versatile setup. The original paper has many more examples.

Final thoughts

There are still some loose ends that it would be good to tie up. I’d like to understand exactly how the inequality-shuffling in a ‘textbook-style’ proof of the CHSH inequality connects to Abramsky and Hardy’s version. Presumably some of it is replicating the same argument, but in a more opaque form. But also some of it must need to deal with the fact that it’s a more general setting, and includes things like measurements returning 0 as well as +1 or -1. It would be nice to figure out which bits are which. I think Bell’s original paper didn’t have the zero thing either, so that could be one place to look.

On the other hand… that all sounds a bit like work, and I can’t be bothered for now. I’d rather apply some of this to something interesting. My next post is probably going to make some connections between the logical Bell inequalities and my previous two posts on negative probability.

If you know the answers to my questions above and can save me some work, please let me know in the comments! Also, I’d really like to know if I’ve got something wrong. There are a lot of equations in this post and I’m sure to have cocked up at least one of them. More worryingly, I might have messed up some more conceptual points. If I’ve done that I’m even more keen to know!

Speedrun: The Vygotsky Circle

I did a ‘speedrun’ post a couple of months ago where I set a one hour timer and tried to find out as much as I could about Marx’s theory of alienation. That turned out to be pretty fun, so I’m going to try it again with another topic where I have about an hour’s worth of curiosity.

I saw a wikipedia link to something called ‘the Vygotsky Circle’ a while back. I didn’t click the link (don’t want to spoil the fun!) but from the hoverover it looks like that includes Vygotsky, Luria and… some other Russian psychologists, I guess? I’d heard of those two, but I only have the faintest idea of what they did. Here’s the entirety of my current knowledge:

  • Vygotsky wrote a book called Thought and Language. Something about internalisation?
  • Luria’s the one who went around pestering peasants with questions about whether bears in the Arctic are white. And presumably a load of other stuff… he pops up in pop books with some frequency. E.g. I think he did a study of someone with an extraordinary memory?

That’s about it, so plenty of room to learn more. And also anything sounds about ten times more interesting if it’s a Circle. Suddenly it’s an intellectual movement, not a disparate bunch of nerds. So… let’s give this a go.


OK first go to that wiki article.

The Vygotsky Circle (also known as Vygotsky–Luria Circle[1][2]) was an influential informal network of psychologists, educationalists, medical specialists, physiologists, and neuroscientists, associated with Lev Vygotsky (1896–1934) and Alexander Luria (1902–1977), active in 1920-early 1940s in the Soviet Union (Moscow, Leningrad and Kharkiv).

So who’s in it?

The Circle included altogether around three dozen individuals at different periods, including Leonid Sakharov, Boris Varshava, Nikolai Bernstein, Solomon Gellerstein, Mark Lebedinsky, Leonid Zankov, Aleksei N. Leontiev, Alexander Zaporozhets, Daniil Elkonin, Lydia Bozhovich, Bluma Zeigarnik, Filipp Bassin, and many others. German-American psychologist Kurt Lewin and Russian film director and art theorist Sergei Eisenstein are also mentioned as the “peripheral members” of the Circle.

OK that’s a lot of people! Hm this is a very short article. Maybe the Russian one is longer? Nope. So this is the entirety of the history of the Circle given:

The Vygotsky Circle was formed around 1924 in Moscow after Vygotsky moved there from the provincial town of Gomel in Belarus. There at the Institute of Psychology he met graduate students Zankov, Solov’ev, Sakharov, and Varshava, as well as future collaborator Aleksander Luria.[5]:427–428 The group grew incrementally and operated in Moscow, Kharkiv, and Leningrad; all in the Soviet Union. From the beginning of World War II (1 Sept 1939) to the start of the Great Patriotic War (22 June 1941), several centers of post-Vygotskian research were formed by Luria, Leontiev, Zankov, and Elkonin. The Circle ended, however, when the Soviet Union was invaded by Germany to start the Great Patriotic War.

However, by the end of the 1930s a new center was formed around 1939 under the leadership of Luria and Leontiev. In the after-war period this developed into the so-called “School of Vygotsky-Leontiev-Luria”. Recent studies show that this “school” never existed as such.

There are two problems that are related to the Vygotsky circle. First was the historical recording of the Soviet psychology with innumerable gaps in time and prejudice. Second was the almost exclusive focus on the person, Lev Vygotsky, himself to the extent that the scientific contributions of other notable characters have been considerably downplayed or forgotten.

This is all a bit more nebulous than I was hoping for. Lots of references and sources at least. May end up just covering Vygotsky and Luria.

OK Vygotsky wiki article. What did he do?

He is known for his concept of the zone of proximal development (ZPD): the distance between what a student (apprentice, new employee, etc.) can do on their own, and what they can accomplish with the support of someone more knowledgeable about the activity. Vygotsky saw the ZPD as a measure of skills that are in the process of maturing, as supplement to measures of development that only look at a learner’s independent ability.

Also influential are his works on the relationship between language and thought, the development of language, and a general theory of development through actions and relationships in a socio-cultural environment.

OK here’s the internalisation thing I vaguely remembered hearing about:

… the majority of his work involved the study of infant and child behavior, as well as the development of language acquisition (such as the importance of pointing and inner speech[5]) …

Influenced by Piaget, but differed on inner speech:

Piaget asserted that egocentric speech in children “dissolved away” as they matured, while Vygotsky maintained that egocentric speech became internalized, what we now call “inner speech”.

Not sure I’ve picked a good topic this time, pulls in way too many directions so this is going to be very shallow and skip around. And ofc there’s lots of confusing turbulent historical background, and all these pages refer to various controversies of interpretation 😦 Skip to Luria, can always come back:

Alexander Romanovich Luria (Russian: Алекса́ндр Рома́нович Лу́рия, IPA: [ˈlurʲɪjə]; 16 July 1902 – 14 August 1977) was a Russian neuropsychologist, often credited as a father of modern neuropsychological assessment. He developed an extensive and original battery of neuropsychological tests during his clinical work with brain-injured victims of World War II, which are still used in various forms. He made an in-depth analysis of the functioning of various brain regions and integrative processes of the brain in general. Luria’s magnum opus, Higher Cortical Functions in Man (1962), is a much-used psychological textbook which has been translated into many languages and which he supplemented with The Working Brain in 1973.

… became famous for his studies of low-educated populations in the south of the Soviet Union showing that they use different categorization than the educated world (determined by functionality of their tools).

OK so this was early on.

Some biographical stuff. Born in Kazan, studied there, then moved to Moscow where he met Vygotsky. And others:

During the 1920s Luria also met a large number of scholars, including Aleksei N. Leontiev, Mark Lebedinsky, Alexander Zaporozhets, Bluma Zeigarnik, many of whom would remain his lifelong colleagues.

Leontiev’s turned up a few times, open in another tab.

OK the phrase ‘cultural-historical psychology’ has come up. Open the wikipedia page:

Cultural-historical psychology is a branch of avant-garde and futuristic psychological theory and practice of the “science of Superman” associated with Lev Vygotsky and Alexander Luria and their Circle, who initiated it in the mid-1920s–1930s.[1] The phrase “cultural-historical psychology” never occurs in the writings of Vygotsky, and was subsequently ascribed to him by his critics and followers alike, yet it is under this title that this intellectual movement is now widely known.

This all sounds like a confusing mess where I’d need to learn way more background than I’m going to pick up in an hour. Back to Luria. Here’s the peasant-bothering stuff:

The 1930s were significant to Luria because his studies of indigenous people opened the field of multiculturalism to his general interests.[12] This interest would be revived in the later twentieth century by a variety of scholars and researchers who began studying and defending indigenous peoples throughout the world. Luria’s work continued in this field with expeditions to Central Asia. Under the supervision of Vygotsky, Luria investigated various psychological changes (including perception, problem solving, and memory) that take place as a result of cultural development of undereducated minorities. In this regard he has been credited with a major contribution to the study of orality.

That last bit has a footnote to Ong’s Orality and Literacy. Another place I’ve seen the name before.

In 1933, Luria married Lana P. Lipchina, a well-known specialist in microbiology with a doctorate in the biological sciences.

Then studied aphasia:

In his early neuropsychological work in the end of the 1930s as well as throughout his postwar academic life he focused on the study of aphasia, focusing on the relation between language, thought, and cortical functions, particularly on the development of compensatory functions for aphasia.

This must be another pop-science topic where I’ve come across him before. Hm where’s the memory bit? Oh I missed it:

Apart from his work with Vygotsky, Luria is widely known for two extraordinary psychological case studies: The Mind of a Mnemonist, about Solomon Shereshevsky, who had highly advanced memory; and The Man with a Shattered World, about a man with traumatic brain injury.

Ah this turns out to be late on in his career:

Among his late writings are also two extended case studies directed toward the popular press and a general readership, in which he presented some of the results of major advances in the field of clinical neuropsychology. These two books are among his most popular writings. According to Oliver Sacks, in these works “science became poetry”.[31]

In The Mind of a Mnemonist (1968), Luria studied Solomon Shereshevskii, a Russian journalist with a seemingly unlimited memory, sometimes referred to in contemporary literature as “flashbulb” memory, in part due to his fivefold synesthesia.

In The Man with the Shattered World (1971) he documented the recovery under his treatment of the soldier Lev Zasetsky, who had suffered a brain wound in World War II.

OK 27 minutes left. I’ll look up some of the other characters. Leontiev first. Apparently he was ‘a Soviet developmental psychologist, philosopher and the founder of activity theory.’ What’s activity theory?

Activity theory (AT; Russian: Теория деятельности)[1] is an umbrella term for a line of eclectic social sciences theories and research with its roots in the Soviet psychological activity theory pioneered by Sergei Rubinstein in 1930s. At a later time it was advocated for and popularized by Alexei Leont’ev. Some of the traces of the theory in its inception can also be found in a few works of Lev Vygotsky.[2] These scholars sought to understand human activities as systemic and socially situated phenomena and to go beyond paradigms of reflexology (the teaching of Vladimir Bekhterev and his followers) and classical conditioning (the teaching of Ivan Pavlov and his school), psychoanalysis and behaviorism.

So maybe he founded it or maybe he just advocated for it. This is all a bit of a mess. But, ok, it’s an umbrella term for moving past behaviourism.

One of the strengths of AT is that it bridges the gap between the individual subject and the social reality—it studies both through the mediating activity. The unit of analysis in AT is the concept of object-oriented, collective and culturally mediated human activity, or activity system.

This all looks sort of interesting, but a bit vague, and will probably take me down some other rabbithole. Back to Leontiev.

After Vygotsky’s early death, Leont’ev became the leader of the research group nowadays known as the Kharkov School of Psychology and extended Vygotsky’s research framework in significantly new ways.

Oh shit completely missed the whole thing about Vygotsky’s early death. Back to him… died aged 37! Of tuberculosis. Mostly became famous after his death, and through the influence of his students. Ah this bit on his influence might be useful. Soviet influence first:

In the Soviet Union, the work of the group of Vygotsky’s students known as the Vygotsky Circle was responsible for Vygotsky’s scientific legacy.[42] The members of the group subsequently laid a foundation for Vygotskian psychology’s systematic development in such diverse fields as the psychology of memory (P. Zinchenko), perception, sensation, and movement (Zaporozhets, Asnin, A. N. Leont’ev), personality (Lidiya Bozhovich, Asnin, A. N. Leont’ev), will and volition (Zaporozhets, A. N. Leont’ev, P. Zinchenko, L. Bozhovich, Asnin), psychology of play (G. D. Lukov, Daniil El’konin) and psychology of learning (P. Zinchenko, L. Bozhovich, D. El’konin), as well as the theory of step-by-step formation of mental actions (Pyotr Gal’perin), general psychological activity theory (A. N. Leont’ev) and psychology of action (Zaporozhets).

That at least says something about what all of those names did. Open Zinchenko tab as first.

Then North American influence:

In 1962 a translation of his posthumous 1934 book, Thinking and Speech, published with the title, Thought and Language, did not seem to change the situation considerably.[citation needed] It was only after an eclectic compilation of partly rephrased and partly translated works of Vygotsky and his collaborators, published in 1978 under Vygotsky’s name as Mind in Society, that the Vygotsky boom started in the West: originally, in North America, and later, following the North American example, spread to other regions of the world.[citation needed] This version of Vygotskian science is typically associated with the names of its chief proponents Michael Cole, James Wertsch, their associates and followers, and is relatively well known under the names of “cultural-historical activity theory” (aka CHAT) or “activity theory”.[45][46][47] Scaffolding, a concept introduced by Wood, Bruner, and Ross in 1976, is somewhat related to the idea of ZPD, although Vygotsky never used the term.

Ah so Thought and Language was posthumous.

Then a big pile of controversy about how his work was interpreted. Now we’re getting headings like ‘Revisionist movement in Vygotsky Studies’, think I’ll bail out now. 16 minutes left.

OK let’s try Zinchenko page.

The main theme of Zinchenko’s research is involuntary memory, studied from the perspective of the activity approach in psychology. In a series of studies, Zinchenko demonstrated that recall of the material to be remembered strongly depends on the kind of activity directed on the material, the motivation to perform the activity, the level of interest in the material and the degree of involvement in the activity. Thus, he showed that following the task of sorting material in experimental settings, human subjects demonstrate a better involuntary recall rate than in the task of voluntary material memorization.

This influenced Leontiev and activity theory. That’s about all the detail there is. What to do next? Look up some of the other people I guess. Try a few, they’re all very short articles, give up with that.

Fine I’ll just google ‘vygotsky thought and language’ and see what i get. MIT Press description:

Vygotsky’s closely reasoned, highly readable analysis of the nature of verbal thought as based on word meaning marks a significant step forward in the growing effort to understand cognitive processes. Speech is, he argues, social in origins. It is learned from others and, at first, used entirely for affective and social functions. Only with time does it come to have self-directive properties that eventually result in internalized verbal thought. To Vygotsky, “a word is a microcosm of human consciousness.”

OK, yeah that does sound interesting.

Not finding great sources. 8 minutes left. Zone of proximal development section of Vygotsky’s page:

“Zone of Proximal Development” (ZPD) is a term Vygotsky used to characterize an individual’s mental development. He originally defined the ZPD as “the distance between the actual developmental level as determined by independent problem solving and the level of potential development as determined through problem solving under adult guidance or in collaboration with more capable peers.” He used the example of two children in school who originally could solve problems at an eight-year-old developmental level (that is, typical for children who were age 8). After each child received assistance from an adult, one was able to perform at a nine-year-old level and one was able to perform at a twelve-year-old level. He said “This difference between twelve and eight, or between nine and eight, is what we call the zone of proximal development.” He further said that the ZPD “defines those functions that have not yet matured but are in the process of maturation, functions that will mature tomorrow but are currently in an embryonic state.” The zone is bracketed by the learner’s current ability and the ability they can achieve with the aid of an instructor of some capacity.

ZPD page itself:

Vygotsky spent a lot of time studying the impact of school instruction on children and noted that children grasp language concepts quite naturally, but that math and writing did not come as naturally. Essentially, he concluded that because these concepts were taught in school settings with unnecessary assessments, they were of more difficulty to learners. Piaget believed that there was a clear distinction between development and teaching. He said that development is a spontaneous process that is initiated and completed by the children, stemming from their own efforts. Piaget was a proponent of independent thinking and critical of the standard teacher-led instruction that was common practice in schools.

But also:

… He believed that children would not advance very far if they were left to discover everything on their own. It’s crucial for a child’s development that they are able to interact with more knowledgeable others. They would not be able to expand on what they know if this wasn’t possible.

OK 3 minutes left. Let’s wildly skip between tabs learning absolutely nothing. Hm maybe this would have been interesting? ‘Vygotsky circle as a personal network of scholars: restoring connections between people and ideas’.

Ding! Didn’t get much past reading the title.


Well that didn’t work as well as the alienation one. Sprawling topic, and I wasn’t very clear on what I wanted to get out of it. History of the Circle itself or just some random facts about what individual people in it did? I mostly ended up with the second one, and not much insight into what held it together conceptually, beyond some vague idea about ‘going beyond behaviourism’/’looking at general background of human activity, not just immediate task’.

Still, I guess I know a bit more about these people than I did going in, and would be able to orient more quickly if I wanted to find out anything specific.

The Mane Six as Mitford Sisters


[Written as part of Notebook Blog Month.]

I’ve saved the most important topic for last. As far as I can tell, nobody on the internet has tackled the vital question of how My Little Pony characters map to Mitford sisters. So I’m going to fix that.

As a bit of background, the Mitfords were a wildly eccentric English aristocratic family. The novelist Nancy Mitford is probably the most famous of them, but her five sisters were an impressively bizarre mix of communists, fascists, socialites, farmers and Hitler obsessives. (There’s also a brother who nobody cares about.) I’m not particularly well up on Mitford lore, but I am a big fan of Nancy Mitford’s The Pursuit of Love, and I’m fascinated by eccentrics of all kinds, so I know the basics.

My Little Pony also has six main characters. (Plus Spike.) They turn out to match up surprisingly closely with Mitford sisters, right up to the point where they don’t, and then I just have to make it up.

So, first of all, Nancy Mitford is Twilight Sparkle. This one is completely obvious. In The Pursuit of Love, Nancy is the narrator of a fictionalised version of the Mitfords’ lives, with her as the quieter, more studious observer. Done.

Deborah Mitford was a famous socialite known for… being social and stuff. Lots of witty correspondence with other witty socialite types. She also ran a big stately home that was open to the public, which is kind of like organising parties if you squint hard enough. So Pinkie Pie.

Jessica Mitford was the most adventurous and rebellious of the sisters, running away to Spain and then later becoming an activist in the US, where she worked on civil rights campaigns and joined the Communist Party. Also investigated unscrupulous business practices in the funeral home industry for some reason. Clearly has to be Rainbow Dash.

Unity Valkyrie Mitford is the oddest of the lot. She became completely obsessed with Hitler, stalked him around Munich, eventually made her way into his inner circle, shot herself in the head when Britain declared war on Germany, survived the attempt and lasted out almost another ten years before dying of meningitis caused by swelling around the bullet, which was never removed. I feel like this is a job for Rarity, partly because of the rhyming name, partly because she’s the only one capable of pulling off this much drama.

OK, this is the point where the mapping gets a bit trickier. The two remaining sisters are Pamela and Diana. Pamela was the most retiring of the Mitfords, staying out of the public eye, at least in comparison to the others. She was practical-minded, loved animals and the countryside and managed a farm for a while. A pretty good fit for either Applejack or Fluttershy. She did still manage to do some weird Mitford stuff, marrying a bisexual millionaire physicist and then becoming the ‘companion’ of an Italian horsewoman after they divorced.

Diana is the dud Mitford. She was another fascist, and not even a spectacularly bizarre one like Unity. Mainly known for marrying Oswald Mosley, leader of the British Union of Fascists, editing a fascist magazine, and spending some time in prison during the war for being a fascist.

I don’t really want to lumber either of them with Diana, but I need to make a choice, so I’ve introduced an outside tie break. Fluttershy is best pony. So Fluttershy gets Pamela Mitford and poor old Applejack is stuck with Diana Mitford.

I’m sure everyone’s relieved that this major open question has been definitively answered at last.

Bullshitting about bullshit jobs


[Written as part of Notebook Blog Month.]

Today’s topic is bullshit jobs. I’ve done no preparation for this beyond a reread of David Graeber’s essay, On the Phenomenon of Bullshit Jobs: A Work Rant. This piece was enormously popular when it came out in 2013, so presumably there’s a large secondary literature of commentaries and follow-ups. I haven’t read any of it, this is me just bullshitting from first principles. So I’m probably repeating a lot of obvious talking points.

Rereading the article, a couple of things jumped out straight away:

  • A lot of the specifics of his argument are not particularly convincing (I’ll get to this in a minute).
  • It rings deeply true anyway, because we all know deep in our hearts and viscera that so many jobs are full of useless bullshit. It’s not surprising that it was so popular.

So, I’ll briefly go over his thesis. Advances in productivity and automation should have freed up lots of time by now, and we should have got the 15-hour working week that John Maynard Keynes expected. But clearly we haven’t. Graeber rules out the consumer treadmill as an explanation:

The standard line today is that he didn’t figure in the massive increase in consumerism. Given the choice between less hours and more toys and pleasures, we’ve collectively chosen the latter. This presents a nice morality tale, but even a moment’s reflection shows it can’t really be true. Yes, we have witnessed the creation of an endless variety of new jobs and industries since the ’20s, but very few have anything to do with the production and distribution of sushi, iPhones, or fancy sneakers.

Instead, he points the finger at what he describes as whole new classes of jobs:

… rather than allowing a massive reduction of working hours to free the world’s population to pursue their own projects, pleasures, visions, and ideas, we have seen the ballooning of not even so much of the ‘service’ sector as of the administrative sector, up to and including the creation of whole new industries like financial services or telemarketing, or the unprecedented expansion of sectors like corporate law, academic and health administration, human resources, and public relations. And these numbers do not even reflect on all those people whose job is to provide administrative, technical, or security support for these industries, or for that matter the whole host of ancillary industries (dog-washers, all-night pizza delivery) that only exist because everyone else is spending so much of their time working in all the other ones.

These are what I propose to call ‘bullshit jobs’.

It’s as if someone were out there making up pointless jobs just for the sake of keeping us all working.

So, in this view there are a bunch of jobs that are bullshit. Here’s another sample list further down:

A world without teachers or dock-workers would soon be in trouble, and even one without science fiction writers or ska musicians would clearly be a lesser place. It’s not entirely clear how humanity would suffer were all private equity CEOs, lobbyists, PR researchers, actuaries, telemarketers, bailiffs or legal consultants to similarly vanish.

It’s not very satisfying to me to leave this at the level of a big binary list of jobs that are bullshit (telemarketers, corporate law) and jobs that are not (teachers, tube drivers). To my mind the bullshit is much more fractally distributed throughout the whole economy. It’s definitely true that some jobs are much more prone to gathering bullshit than others. But most of the ‘bullshit’ jobs Graeber lists serve some useful functions. I’m not sure what he’s got against actuaries – insurance seems like a reasonable thing to me, and somebody needs to work out how much it should cost. And some level of financial and legal work needs to go on. (Some of these I do find hard to defend at all. I think telemarketing might actually be pure bullshit given a narrow enough definition? Does anyone need to bother other people by phone any more? I mean I really hate phones so I could be biased here, but that does sound like bullshit to me.)

At the other end of the scale, a lot of his ‘non-bullshit’ jobs get mixed up with bullshit too. Teachers are always having to grapple with the latest bullshit government initiative, for example. The bullshit is mixed right through everything.

I want to probe a bit deeper into what factors are upstream of jobs becoming bullshit. There definitely seem to be warning signs for bullshit. For a start, jobs are particularly likely to contain a lot of bullshit if they contain a lot of abstraction layers. For example:

  • selling abstract things (financial derivatives) rather than concrete things (potatoes)
  • managing people who do things, rather than doing things directly
  • contracting out work to a second company, rather than doing it yourself
  • producing hard-to-measure output (potato marketing board) rather than obvious results (potato farmer)

These aren’t bad things intrinsically, they need to happen to some extent or we’d all be stuck individually bartering potatoes all day. But they provide places for the bullshit to get in.

For the rest of this post I’m going to play with one potential taxonomy of bullshit jobs. This isn’t supposed to be a Grand Unified Theory of Bullshit Jobs, it’s just me playing around. I don’t have time to try multiple taxonomies in a notebook post like this, so if this one turns out to not be very insightful then I just have to deal with that I suppose. Anyway, it’s inspired by Frankfurt’s characterisation of bullshit in his classic On Bullshit:

It is impossible for someone to lie unless he thinks he knows the truth. Producing bullshit requires no such conviction. A person who lies is thereby responding to the truth, and he is to that extent respectful of it. When an honest man speaks, he says only what he believes to be true; and for the liar, it is correspondingly indispensable that he considers his statements to be false. For the bullshitter, however, all these bets are off: he is neither on the side of the true nor on the side of the false. His eye is not on the facts at all, as the eyes of the honest man and of the liar are, except insofar as they may be pertinent to his interest in getting away with what he says. He does not care whether the things he says describe reality correctly. He just picks them out, or makes them up, to suit his purpose.

The main feature of bullshit, as Frankfurt explains it, is that it is indifferent to the truth, rather than outright false. It’s produced as a side effect of some other self-serving process.

This already ties in quite well with the idea of bullshit seeping through abstraction layers. Abstraction layers tend to be places where caring breaks down – it’s just easier to care about potatoes than financial derivatives. In my taxonomy I’m going to explore three kinds of not caring:

  1. Nobody cares about the problem
  2. Nobody cares whether the solution could possibly fix the problem
  3. The incentives push against caring anyway
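Treating each of those as a yes/no question gives 2³ = 8 combinations, which is where the “eight options” below come from. As a toy sketch (my own scoring gimmick, nothing from Graeber — the function name and scores are invented), you can enumerate them mechanically:

```python
from itertools import product

# The three binary "caring" factors from the taxonomy above.
factors = ["cares about problem", "cares about solution", "good incentives"]

def bullshit_score(cares_problem, cares_solution, good_incentives):
    """Count how many factors hold: 3 = honest artisan, 0 = pure bullshit."""
    return sum([cares_problem, cares_solution, good_incentives])

# Enumerate all 2^3 = 8 combinations, from least to most bullshit.
for combo in product([True, False], repeat=3):
    held = ", ".join(f for f, h in zip(factors, combo) if h) or "none"
    print(f"{bullshit_score(*combo)}/3: {held}")
```

The score is just a count, so it can’t distinguish the different 1/3 or 2/3 combinations from each other — which is exactly why the prose walks through each of the eight cases separately.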

I’ll go through the eight options. First there’s the Platonic ideal of a non-bullshit job:

  • Care about problem, care about solution, good incentives. Honest artisan hand crafts a beautiful table to sell directly etc etc.

Then we get into the region of somewhat bullshit jobs:

  • Care about problem, care about solution, bad incentives. This is where teaching often ends up, for example (good teachers, I mean). Loves their subject, wants to teach it well, but also has to tick the boxes for ‘learning outcomes’ or w/e.

  • Care about problem, don’t care about solution, good incentives. Caring about the problem but not whether the solution could possibly fix it is a funny one, but something like it happens quite often once a bit of self-deception gets involved, and some bad incentives. Doing it with good incentives is harder. Maybe a widget factory boss gets infatuated with some kind of trendy-but-useless management methodology. They genuinely want to sell more widgets, the market for widgets functions well, but there’s a layer of cargo-cult stupidity in the middle. That’s the best I can do, maybe somebody else can come up with a better example.

  • Don’t care about problem, care about solution, good incentives. This is somebody putting in a solid day of work at a job they don’t particularly care for intrinsically, but with decent working conditions and high standards for what counts as a good job done.

None of these seem like the canonical bullshit job to me, but they are definitely likely to contain bullshit elements. Then we get towards the real bullshit:

  • Care about the problem, don’t care about the solution, bad incentives. This is the self-delusion thing again. Maybe this is a charity employee who genuinely cares about the cause but has some motivated reasoning going on about whether the thing they’re doing could possibly help. If the charity is able to fundraise whether the work is useful or not, you get this situation.

  • Don’t care about the problem, care about the solution, bad incentives. This is the typical academic with a very specific pet hammer, churning out papers that use it in dubious ways.

  • Don’t care about the problem, don’t care about the solution, good incentives. This is just working a job you don’t care about again, but this time without being held back by any kind of standards of professionalism or craftsmanship. Working conditions are still good.

Then finally we get to:

  • Don’t care about the problem, don’t care about the solution, incentives are terrible anyway. You can get here quite easily by switching the remaining yes to a no in the examples above: the academic doesn’t even care about whether the technique is carried out right, the charity employee is completely indifferent to the cause rather than well-meaning but self-deluded, the apathetic worker also has a horrible boss and isn’t paid well. Then you just have pure bullshit.

I’ll try it on one real example, that job I did where I walked around hospitals measuring things:

There were two hospitals being merged together on a new site, and the project management office needed to collect data on how much storage space the new hospital would need for medical supplies. I’m not sure what the best way of doing this would be, but maybe it would involve, I don’t know, some Fermi estimates based on their current storage requirements, plus some efficiencies for the single site. What they actually did was make a giant spreadsheet of every sort of item ordered by the hospital (bandages, prosthetics, tiny orthopaedic screws) and then employ EIGHT OF US to go round the hospitals with tape measures FOR WEEKS tracking down and measuring every individual item on the list, including the tiny orthopaedic screws.
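For contrast, the Fermi-estimate version would have taken one person an afternoon. Here’s a toy sketch of what I mean — every number here is invented for illustration, not anything from the actual project:

```python
# Toy Fermi estimate of merged-hospital storage needs (all numbers made up).
# Scale up current storage, with an adjustment for single-site efficiencies.

hospital_a_storage_m2 = 800   # current storage floor space, hospital A (invented)
hospital_b_storage_m2 = 600   # current storage floor space, hospital B (invented)
merger_efficiency = 0.85      # shared stock means less duplication (invented)
safety_margin = 1.2           # round up rather than down

estimate_m2 = ((hospital_a_storage_m2 + hospital_b_storage_m2)
               * merger_efficiency * safety_margin)
print(f"Rough storage estimate: {estimate_m2:.0f} m²")
```

The point isn’t the specific numbers, it’s that an order-of-magnitude answer from existing records would have served the actual decision just as well as weeks of tape-measuring tiny orthopaedic screws.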

If that doesn’t score highly on the bullshitometer then something is really up.

So… the original problem, ‘how much storage space do we need’, is a good one, and presumably somebody somewhere really cared about the answer. Once this had been filtered through a couple of management layers and subcontractors most of the caring had been lost. I’ll give it a generous half a point.

Definitely nobody cared whether the solution could possibly fix the problem. Zero points.

Incentives were also awful. The work had been contracted out through a couple of layers, and the people who had to find the answer had no particular reason to care beyond coming up with a number that kept the layer above happy. Zero points again.

So I make that 0.5/3. Definitely a bullshit job. So it passes this basic sanity check. (Not a great one, as I’d have had exactly this example at the back of my mind when I came up with the criteria!)
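
The arithmetic here is just three factors summed. As a toy sketch (the function name and the 0-to-1 partial credit scale are my own framing, not anything formal):

```python
def bullshit_score(cares_about_problem, cares_about_solution, good_incentives):
    """Sum the three care/incentive factors, each scored from 0 to 1.

    Low totals mean bullshit. Partial credit is allowed, like the
    'generous half a point' for a problem that got diluted through
    layers of management and subcontracting.
    """
    return cares_about_problem + cares_about_solution + good_incentives

# The hospital-measuring job: diluted problem, hopeless solution, bad incentives
print(bullshit_score(0.5, 0, 0))  # 0.5 out of 3
```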

So… was that particular taxonomy any good? Not especially, but it did get me thinking through the space of possibilities. I do think that the general strategy of looking for places where not-caring gets in is a good one for spotting bullshit.

“Neoliberalism”

[Written as part of Notebook Blog Month.]

Everybody hates neoliberalism, it’s the law. But what is it?

This is probably the topic I’m most ignorant about and ill-prepared-for on the whole list, and I wasn’t going to do it. But it’s good prep for the bullshit jobs post, which was a popular choice, so I’m going to try. I’m going to be trying to articulate my current thoughts, rather than attempting to say anything original. And also I’m not really talking about neoliberalism as a coherent ideology or movement. (I think I’d have to do another speedrun just to have a chance of saying something sensible.) More like “neoliberalism”, scarequoted, as a sort of diffuse cloud of associations that the term brings to mind. Here’s my cloud (very UK-centric):

  • Big amorphous companies with bland generic names like Serco or Interserve, providing an incoherent mix of services to the public sector, with no obvious specialism beyond winning government contracts
  • Public private partnerships
  • Metrics! Lots of metrics!
  • Incuriosity about specifics. E.g. management by pushing to make a number go up, rather than any deep engagement with the particulars of the specific problem
  • Food got really good over this period. I think this actually might be relevant and not just something that happened at the same time
  • Low cost short-haul airlines becoming a big thing (in Europe anyway – don’t really understand how widespread this is)
  • Thinking you’re on a public right of way but actually it’s a private street owned by some shopping centre or w/e. With private security and lots of CCTV
  • Post-industrial harbourside developments with old warehouses converted into a Giraffe and a Slug and Lettuce
  • A caricatured version of Tony Blair’s disembodied head is floating over the top of this whole scene like a barrage balloon. I don’t think this is important but I thought you’d like to know

I’ve had this topic vaguely in mind since I read a blog post by Timothy Burke, a professor of modern history, a while back. The post itself has a standard offhand ‘boo neoliberalism’ side remark, but then when challenged in the comments he backs it up with an excellent, insightful sketch of what he means. (Maybe this post should just have been a copy of this comment, instead of my ramblings.)

I’m sensitive to the complaint that “neoliberalism” is a buzz word that can mean almost everything (usually something the speaker disapproves of).

A full fleshing out is more than I can provide, though. But here’s some sketches of what I have in mind:

1) The Reagan-Thatcher assault on “government” and aligned conceptions of “the public”–these were not merely attempts to produce new efficiencies in government, but a broad, sustained philosophical rejection of the idea that government can be a major way to align values and outcomes, to tackle social problems, to restrain or dampen the power of the market to damage existing communities. “The public” is not the same, but it was an additional target: the notion that citizens have shared or collective responsibilities, that there are resources and domains which should not be owned privately but instead open to and shared by all, etc. That’s led to a conception of citizenship or social identity that is entirely individualized, privatized, self-centered, self-affirming, and which accepts no responsibility to shared truths, facts, or mechanisms of dispute and deliberation.

2) The idea of comprehensively measuring, assessing, quantifying performance in numerous domains; insisting that values which cannot be measured or quantified are of no worth or usefulness; and constantly demanding incremental improvements from all individuals and organizations within these created metrics. This really began to take off in the 1990s and is now widespread through numerous private and public institutions.

3) The simultaneous stripping bare of ordinary people to numerous systems of surveillance, measurement, disclosure, monitoring, maintenance (by both the state and private entities) while building more and more barriers to transparency protecting the powerful and their most important private and public activities. I think especially notable since the late 1990s and the rise of digital culture. A loss of workplace and civil protections for most people (especially through de-unionization) at the same time that the powerful have become increasingly untouchable and unaccountable for a variety of reasons.

4) Nearly unrestrained global mobility for capital coupled with strong restrictions on labor (both in terms of mobility and in terms of protection). Dramatically increased income inequality. Massive “shadow economies” involving illegal or unsanctioned but nevertheless highly structured movements of money, people, and commodities. Really became visible by the early 1990s.

A lot of the features in my association cloud match pretty well: metrics, surveillance, privatisation. Didn’t really pick up much from point 4. I think 2 is the one which interests me most. My read on the metric stuff is that there’s a genuinely useful tool here that really does work within its domain of application but is disastrous when applied widely to everything. The tool goes something like:

  • let go of a need for top-down control
  • fragment the system into lots of little bits, connected over an interface of numbers (money, performance metrics, whatever)
  • try to improve the system by hammering on the little bits in ways such that the numbers go in the direction you want. This could be through market forces, or through metrics-driven performance improvements.
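
As a toy illustration of those three steps (entirely my own sketch, not anything from the literature): a 'system' of five fragments, each reduced to a single number, improved only by local hammering with no top-down control:

```python
import random

def hammer_on_the_bits(metrics, steps=1000):
    """Improve a fragmented system purely through its number interface:
    pick a fragment at random, try a change, and keep it only if its
    metric goes up. Nothing ever looks at the system as a whole."""
    for _ in range(steps):
        i = random.randrange(len(metrics))              # pick a little bit
        candidate = metrics[i] + random.uniform(-1, 1)  # hammer on it
        if candidate > metrics[i]:                      # did the number go up?
            metrics[i] = candidate                      # keep the change
    return metrics

print(hammer_on_the_bits([0.0] * 5))  # five numbers, all driven upward
```

The essay's point is exactly where this sketch breaks down: it only helps when the numbers are actually the thing you care about.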

If your problem is amenable to this kind of breakdown, I think it actually works pretty well. This is why I think ‘food got good’ is actually relevant and not a coincidence. It fits this playbook quite nicely:

  • It’s a known problem. People have been selling food for a long time and have some well-tested ideas about how to cook, prep, order supplies, etc. There’s innovation on top of that, but it’s not some esoteric new research field.
  • Each individual purchase (of a meal, cake, w/e) is small and low-value. So the domain is naturally fragmented into lots of tiny bits.
  • This also means that lots of people can afford to be customers, increasing the number of tiny bits
  • Fast feedback. People know whether they like a croissant after minutes, not years.
  • Relevant feedback. People just tell you whether they like your croissants, which is the thing you care about. You don’t need to go search for some convoluted proxy measure of whether they like your croissants.
  • Lowish barriers to entry. Not especially capital-intensive to start a cafe or market stall compared with most businesses.
  • Lowish regulations. There’s rules for food safety, but it’s not like building planes or something.
  • No lock-in for customers. You can go to the donburi stall today and the pie and mash stall tomorrow.
  • All of this means that the interface layer of numbers can be an actual market, rather than some faked-up internal market of metrics to optimise. And it’s a pretty open market that most people can access in some form. People don’t go out and buy trains, but they do go out and buy sandwiches.

There’s another very important, less wonky factor that breaks you out of the dry break-it-into-numbers method I listed above. You ‘get to cheat’ by bringing in emotional energy that ‘comes along for free’. People actually like food! They start cafes because they want to, even when it’s a terrible business idea. They already intrinsically give a shit about the problem, and markets are a thin interface layer over the top rather than most of the thing. This isn’t going to carry over to, say, airport security or detergent manufacturing.

As you get further away from an idealised row of spherical burger vans things get more complicated and ambiguous. Low cost airlines are a good example. These actually did a good job of fragmenting the domain into lots of bits that had been lumped together by the older incumbents. And it’s worked pretty well, by bringing down prices to the point where far more people can afford to travel. (Of course there are also the climate change considerations. If you ignore those it seems like a very obvious Good Thing; once you include them it’s somewhat murkier, I suppose.)

The price you pay is that the experience gets subtly degraded at many points by the optimisation, and in aggregate these degradations tend to produce a very unsubtle crappiness. For a start there’s the simple overhead of buying the fragmented bits separately. You have to click through many screens of a clunky web application and decide individually whether you want food, whether you want to choose your own seat, whether you want priority queuing, etc. All the things you’d just have got as default on the old, expensive package deal. You also have to say no to the annoying ads trying to upsell you on various deals on hotels, car rentals and travel insurance.

Then there are all the ways the flight itself becomes crappier. It’s at a crap airport a long way from the city you want to get to, with crappy transport links. The flight is a cheap slot at some crappy time of the early morning. The plane is old and crappily fitted out. You’re having a crappy time lugging around the absolute maximum amount of hand luggage possible to avoid the extra hold luggage fee. (You’ve got pretty good at optimising numbers yourself.)

This is often still worth it, but can easily tip into just being plain Too Crappy. I’ve definitely over-optimised flight booking for cheapness and regretted it (normally when my alarm goes off at three in the morning).

Low cost airlines seem basically like a good idea, on balance. But then there are the true disasters, the domains that have none of the natural features that the neoliberal playbook works on. A good example is early-stage, exploratory academic research. I’ve spent too long on this post already. You can fill in the depressing details yourself.

Some rambling thoughts about visual imagery

[image: my attempted drawing of the fish brooch]

[Written as part of Notebook Blog Month.]

I’ve got some half-written drafts for topics on the original list which I want to finish soon, but for now I seem to be doing better by going off-list and rambling about whatever’s in my head. Today it’s visual imagery.

I’ve ended up reading a bunch of things vaguely connected with mnemonics in the last couple of weeks. I’m currently very bad at concentrating on books properly, but I’m still reading at a similar rate, so everything is in this weird quarter-read state. Anyway here’s the list of things I’ve started:

  • Moonwalking with Einstein by Joshua Foer. Pop book about learning to compete in memory championships. This is good and an easy read, so there is some chance I’ll actually finish it.
  • Orality and Literacy by Walter Ong. One of the references I followed up. About oral cultures in general but there is stuff on memorisation (e.g. repetitive passages in Homer being designed for easy memorisation when writing it down is not an option)
  • Brienne Yudkowsky’s posts on mnemonics
  • These two interesting posts by AllAmericanBreakfast on Less Wrong this week about experimenting with memory palaces to learn information for a chemistry exam.

Those last two posts are interesting to me because they’re written by someone in the very early stages of fiddling around with this stuff who doesn’t consider themself to naturally have a good visual imagination. I’d put myself in the same category, but probably worse. Actually I’m really confused about what ‘visual imagery’ even is. I have some sort of – stuff? – that has a sort of visual component, maybe mixed in with some spatial/proprioceptive/tactile stuff. Is that what people mean by ‘visual imagery’? I guess so? It’s very transitory and hard to pin down in my case, though, and I don’t feel like I make a lot of use out of it. The idea of using these crappy materials to make something elaborate like a memory palace sounds like a lot of work. But maybe it would work better if I spent more time on it.

The thing that jumped out of the first post for me was this bit:

I close my eyes and allow myself to picture nothing, or whatever random nonsense comes to mind. No attempt to control.

Then I invite the concept of a room into mind. I don’t picture it clearly. There’s a vague sense, though, of imagining a space of some kind. I can vaguely see fleeting shadowy walls. I don’t need to get everything crystal clear, though.

This sounded a lot more fun and approachable to me than crafting a specific memory palace to memorise specific things. I didn’t even get to the point of ‘inviting the concept of a room in’, just allowed any old stuff to come up, and that worked ok for me. I’m not sure how much of this ‘imagery’ was particularly visual, but I did find lots of detailed things floating into my head. It seems to work better if I keep a light touch and only allow some very gentle curiosity-based steering of the scene.

Here’s the one I found really surprising and cool. I was imagining an intricately carved little jade tortoise for some reason, and put some mild curiosity into what its eyes were made of. And I discovered that they were tiny yellow plastic fake gemstones that were weirdly familiar. So I asked where I recognised them from (this was quite heavy-handed questioning that dragged me out of the imagery). And it turns out that they were from a broken fish brooch I had as a kid. I prised all the fake stones off with a knife at some point to use for some project I don’t remember.

I haven’t thought about that brooch in, what, 20 years? But I remember an impressive amount of detail about it! I’ve tried to draw it above. Some details like the fins are a best guess, but the blue, green and yellow stones in diagonal stripes are definitely right. It’s interesting that this memory is still sitting there and can be brought up by the right prompt.

I think I’ll play with this exercise a bit more and see what other rubbish I can dredge up.