Negative probability: now with added equations!

OK, so this is where I go back through everything from the last post, but this time show how all the fiddling around with boxes relates back to quantum physics, and also go into some technical details like explaining what I meant by ‘half the information’ in the discussion at the end. This is unavoidably going to need more maths than the last post, and enough quantum physics knowledge to be OK with qubits and density matrices. I’ll start by translating everything into a standard physics problem.

Qubit phase space

So, first off, instead of the ‘strange machine’ of the last post we will have a qubit state – as a first example I’ll take the |0\rangle state. The three questions then become measurements on it. Specifically, these measurements are expectation values q_i of the operators Q_i = \frac{1}{2}(I-\sigma_i), where the \sigma_i are the three Pauli matrices.

For |0\rangle we get the following:

q_z = \langle 0 | Q_z | 0 \rangle = 0

q_x = \langle 0 | Q_x | 0 \rangle = \frac{1}{2}

q_y = \langle 0 | Q_y | 0 \rangle = \frac{1}{2}

This can be represented on the same sort of 2×2 grid I used in the previous post:

The |0\rangle state has a definite value of 0 for the Q_z measurement, so the probabilities in the cells where Q_z = 0 must sum to 1. For the Q_x measurement there is an equal chance of either Q_x = 0 or Q_x = 1. The third measurement, Q_y, can be shown to be associated with the diagonals of the grid, in the same way as in Piponi’s example in the previous post, and again there is an equal chance of either value. Imposing all these conditions gives the probability assignment above.
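The three expectation values quoted for |0\rangle are easy to check numerically. Here is a minimal NumPy sketch; the helper name question_values is made up for illustration, and the only physics assumed is the definition Q_i = \frac{1}{2}(I-\sigma_i) given earlier.

```python
import numpy as np

# Pauli matrices and the 2x2 identity.
I2 = np.eye(2, dtype=complex)
SIGMA = {
    "z": np.array([[1, 0], [0, -1]], dtype=complex),
    "x": np.array([[0, 1], [1, 0]], dtype=complex),
    "y": np.array([[0, -1j], [1j, 0]], dtype=complex),
}


def question_values(state):
    """Return {axis: q_i} with q_i = <state| (I - sigma_i)/2 |state>.

    `question_values` is a hypothetical helper name, not from the post.
    """
    state = np.asarray(state, dtype=complex)
    state = state / np.linalg.norm(state)  # normalise, just in case
    values = {}
    for axis, sigma in SIGMA.items():
        Q = 0.5 * (I2 - sigma)
        # np.vdot conjugates its first argument, so this is <state|Q|state>.
        values[axis] = float(np.real(np.vdot(state, Q @ state)))
    return values


ket0 = [1, 0]
q = question_values(ket0)
print(q)
```

Swapping ket0 for any other two-component vector gives the answers to the three questions for that state.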

The 2×2 grid is called the phase space of the qubit, and the function that assigns probabilities to each cell is called the Wigner function W. To save on drawing diagrams, I’ll represent this as a square-bracketed matrix from now on:

W = \begin{bmatrix} W(0,1) & W(1,1) \\ W(0,0) & W(1,0) \end{bmatrix}

For much more detail on how this all works, the best option is probably to read Wootters, who developed a lot of the ideas in the first place. There’s his original paper, which has all the technical details, and a nice follow-up paper on Picturing Qubits in Phase Space which gives a bit more intuition for what’s going on.

In the previous post I gave the following formula for the Wigner function:

W = \frac{1}{4}\Bigg( \begin{bmatrix}1 & 1 \\ 1 & 1 \end{bmatrix} + q_z\begin{bmatrix}-1 & 1 \\ -1 & 1 \end{bmatrix} + (1-q_z)\begin{bmatrix}1 & -1 \\ 1 & -1 \end{bmatrix}

\quad +q_x\begin{bmatrix}1 & 1 \\ -1 & -1 \end{bmatrix} + (1-q_x)\begin{bmatrix}-1 & -1 \\ 1 & 1 \end{bmatrix} + q_y\begin{bmatrix}1 & -1 \\ -1 & 1 \end{bmatrix} + (1-q_y)\begin{bmatrix}-1 & 1 \\ 1 & -1 \end{bmatrix} \Bigg),

which simplifies to

W = \frac{1}{2}\begin{bmatrix}-q_z + q_x + q_y & q_z + q_x - q_y \\ 2 - q_z - q_x - q_y & q_z -q_x + q_y\end{bmatrix}

This is a somewhat different form to the standard formula for the Wigner function, but I’ve checked that they’re equivalent. I’ve put the details on a separate notes page here, in a sort of blog post version of the really boring technical appendix you get at the back of papers.
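The step from the seven-term formula to the simplified matrix is pure algebra, so it can be spot-checked numerically. The sketch below (function names wigner_long and wigner_simple are made up for illustration) compares the two forms on random inputs and then evaluates the simplified form for the |0\rangle values q_z = 0, q_x = q_y = \frac{1}{2}.

```python
import numpy as np


def wigner_long(qz, qx, qy):
    """Seven-term form: a uniform grid plus one +/-1 pattern per answer."""
    ones = np.ones((2, 2))
    z1 = np.array([[-1, 1], [-1, 1]])   # pattern multiplying q_z
    x1 = np.array([[1, 1], [-1, -1]])   # pattern multiplying q_x
    y1 = np.array([[1, -1], [-1, 1]])   # pattern multiplying q_y
    # Each (1 - q_i) term carries the sign-flipped pattern.
    return 0.25 * (ones
                   + qz * z1 + (1 - qz) * (-z1)
                   + qx * x1 + (1 - qx) * (-x1)
                   + qy * y1 + (1 - qy) * (-y1))


def wigner_simple(qz, qx, qy):
    """Simplified form, laid out as [[W(0,1), W(1,1)], [W(0,0), W(1,0)]]."""
    return 0.5 * np.array([
        [-qz + qx + qy, qz + qx - qy],
        [2 - qz - qx - qy, qz - qx + qy],
    ])


# The identity is algebraic, so the random q values need not be physical.
rng = np.random.default_rng(0)
for _ in range(100):
    qz, qx, qy = rng.uniform(0, 1, size=3)
    assert np.allclose(wigner_long(qz, qx, qy), wigner_simple(qz, qx, qy))

W0 = wigner_simple(0.0, 0.5, 0.5)  # the |0> state
print(W0)
```

For |0\rangle this puts \frac{1}{2} in each cell of the left-hand column and 0 in the right-hand one, which is what the conditions on the grid require.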

Magic states

As with the example in the last blog post, it’s possible to get qubit states where some of the values of the Wigner function are negative. The numbers don’t work out so nicely this time, but as one example we can take the qubit state |\psi\rangle = \frac{1}{\sqrt{1 + (1+\sqrt{2})^2}}\begin{pmatrix} 1 + \sqrt{2} \\ 1 \end{pmatrix}. (This is the +1 eigenvector of the operator \frac{1}{\sqrt{2}}\left(\sigma_z + \sigma_x\right).)

The Wigner function for |\psi\rangle is

W_\psi = \begin{bmatrix} \frac{1}{4} & \frac{1-\sqrt{2}}{4} \\ \frac{1 + \sqrt{2}}{4} & \frac{1}{4} \end{bmatrix} \approx \begin{bmatrix} 0.25 & -0.104 \\ 0.604 & 0.25 \end{bmatrix},

with one negative entry. I learned while writing this that the states with negative values are called magic states by quantum computing people! These are the states that provide the ‘magic’ for quantum computing, in terms of giving a speed-up over classical computing. I’d like to be able to say more about this link, but I’ll never finish the post if I have to get my head around all of that too, so instead I’ll link to this post by Earl Campbell that goes into more detail and points to some references. A quick note on the geometry, though:

The six eigenvectors of the Pauli matrices form the corners of an octahedron on the Bloch sphere, as in my dubious sketch above. We’ve already seen that the |0\rangle state has no magic – all the values are nonnegative. This also holds for the other five, which have the following Wigner functions:

W_{|1\rangle} = \begin{bmatrix} 0 & \frac{1}{2} \\ 0 & \frac{1}{2} \end{bmatrix}, W_{|+\rangle} = \begin{bmatrix} 0 & 0 \\ \frac{1}{2} & \frac{1}{2} \end{bmatrix}, W_{|-\rangle} = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} \\ 0 & 0 \end{bmatrix},

W_{|y_+\rangle} = \begin{bmatrix} 0 & \frac{1}{2} \\ \frac{1}{2} & 0 \end{bmatrix}, W_{|y_-\rangle} = \begin{bmatrix} \frac{1}{2} & 0 \\ 0 & \frac{1}{2} \end{bmatrix}.
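These six Wigner functions can be reproduced by computing the three expectation values for each Pauli eigenstate and feeding them into the simplified formula for W. A rough sketch, where the function name wigner and the state labels are my own choices:

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
SIGMA_Z = np.array([[1, 0], [0, -1]], dtype=complex)
SIGMA_X = np.array([[0, 1], [1, 0]], dtype=complex)
SIGMA_Y = np.array([[0, -1j], [1j, 0]], dtype=complex)


def wigner(state):
    """Wigner function of a pure qubit state, as [[W(0,1), W(1,1)], [W(0,0), W(1,0)]]."""
    state = np.asarray(state, dtype=complex)
    state = state / np.linalg.norm(state)
    # q_i = <state| (I - sigma_i)/2 |state> for i = z, x, y.
    qz, qx, qy = (
        float(np.real(np.vdot(state, 0.5 * (I2 - s) @ state)))
        for s in (SIGMA_Z, SIGMA_X, SIGMA_Y)
    )
    return 0.5 * np.array([
        [-qz + qx + qy, qz + qx - qy],
        [2 - qz - qx - qy, qz - qx + qy],
    ])


s = 1 / np.sqrt(2)
STATES = {
    "0": [1, 0],
    "1": [0, 1],
    "+": [s, s],
    "-": [s, -s],
    "y+": [s, 1j * s],
    "y-": [s, -1j * s],
}

results = {name: wigner(ket) for name, ket in STATES.items()}
for name, W in results.items():
    print(name, np.round(W, 3).tolist())
```

Every entry comes out as either 0 or \frac{1}{2} (up to floating-point noise), so none of the six has a negative value.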

The other states on the surface of the octahedron or inside it also have no magic. The magic states are the ones outside the octahedron, and the further they are from the octahedron the more magic they are. So the most magic states are on the surface of the sphere opposite the middle of the triangular faces.
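One way to make ‘outside the octahedron’ concrete is through the Bloch vector r = (\langle\sigma_x\rangle, \langle\sigma_y\rangle, \langle\sigma_z\rangle): the octahedron with corners at the six Pauli eigenstates is the set of points with |r_x| + |r_y| + |r_z| \leq 1. The sketch below applies that test to |0\rangle and to the |\psi\rangle example; the function names are made up, and it only looks at these two states rather than trying to measure ‘how magic’ a state is.

```python
import numpy as np

SIGMA_X = np.array([[0, 1], [1, 0]], dtype=complex)
SIGMA_Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
SIGMA_Z = np.array([[1, 0], [0, -1]], dtype=complex)


def bloch_vector(state):
    """Return (<sigma_x>, <sigma_y>, <sigma_z>) for a pure qubit state."""
    state = np.asarray(state, dtype=complex)
    state = state / np.linalg.norm(state)
    return np.array([
        np.real(np.vdot(state, s @ state))
        for s in (SIGMA_X, SIGMA_Y, SIGMA_Z)
    ])


def octahedron_norm(r):
    """|r_x| + |r_y| + |r_z|; at most 1 exactly when r is in the octahedron."""
    return float(np.sum(np.abs(r)))


def wigner_from_bloch(r):
    """Simplified Wigner formula, using q_i = (1 - r_i) / 2."""
    rx, ry, rz = r
    qx, qy, qz = (1 - rx) / 2, (1 - ry) / 2, (1 - rz) / 2
    return 0.5 * np.array([
        [-qz + qx + qy, qz + qx - qy],
        [2 - qz - qx - qy, qz - qx + qy],
    ])


r_zero = bloch_vector([1, 0])
r_psi = bloch_vector([1 + np.sqrt(2), 1.0])
W_psi = wigner_from_bloch(r_psi)

print("|0>:  ", octahedron_norm(r_zero))
print("|psi>:", octahedron_norm(r_psi), "min W =", W_psi.min())
```

|0\rangle sits exactly on a corner of the octahedron, while |\psi\rangle has |r_x| + |r_y| + |r_z| = \sqrt{2} and a smallest Wigner entry of \frac{1-\sqrt{2}}{4}.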

Half the information

Why can’t we have a probability of -\frac{1}{2} as before? Well, I briefly mentioned the reason in the previous blog post, but I can go into more detail now. There are constraints on the values of W that forbid values that are this negative. First off, the values of W have to sum to 1 – this makes sense, as they are supposed to be something like probabilities.

The second constraint is more interesting. Taking the |0\rangle state as an example again, this state has a definite answer to one of the questions and no information at all about the other two. There’s redundancy in the questions, so exact answers to two of them would be enough to pin down the state precisely. So we have half of the possible information.

This turns out to be the most information you can get from any qubit state, in some sense. I say ‘in some sense’ because it’s a pretty odd definition of information.

I learned about this from a fascinating paper by van Enk, A toy model for quantum mechanics, which was actually my starting point for thinking about this whole topic. He starts with the Spekkens toy model, a very influential idea that reproduces a number of the features of quantum mechanics using a very simple model. Again, this is too big a topic to get into all the details, but the most basic system in this model maps to the six ‘non-magic’ qubit states listed above, in the corners of the octahedron. These all share the half-the-knowledge property of the |0\rangle state, where we know the answer to one question exactly and have no idea about the others.

Now van Enk’s aim is to extend this idea of ‘half the knowledge’ to more general probability distributions over the four boxes. But this requires having some kind of measure M of what half the knowledge means. He stipulates that this measure should have M = \frac{1}{2} for the six half-the-knowledge states we already have, which seems reasonable. Also, it should have M = 1 for states where we know all the information (impossible in quantum physics), and M = \frac{1}{4} for the state of total ignorance about all questions. Or to put it a bit differently,

M = 2^{-H},

where H is an entropy measure – it decreases from 2 to 1 to 0 as we learn more information about the system. There’s a parametrised family H_\alpha of entropies known as the Rényi entropies, which reproduce this behaviour for the cases above, and differ for other distributions over the boxes. (I have some rough notes about these here, which may or may not be helpful.) By far the most well-known one is the Shannon entropy H_1, used widely in information theory, but it turns out that this one doesn’t reproduce the states found in quantum physics. Instead, van Enk picks H_2, the collision entropy. This has quite a simple form:

H_2 = -\log_2 \left(\sum_i W_i^2 \right),

where the W_i are the four components of W – we’re just summing the squares of them. So then our information measure is just M_2 = \sum_i W_i^2, and the second constraint on W is that this can have a value of at most \frac{1}{2}:

\sum_i W_i^2 \leq \frac{1}{2}.
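To see these numbers in action, here is a small sketch that evaluates the Rényi entropy H_\alpha = \frac{1}{1-\alpha}\log_2 \sum_i W_i^\alpha (with the Shannon entropy as the \alpha = 1 case) on the three reference distributions, and then M_2 on a few examples. The function names are made up, and the distribution with three entries of \frac{1}{2} and one of -\frac{1}{2} is only my assumed stand-in for the -\frac{1}{2} example from the previous post. The last part checks that, if one entry is x and the rest of the weight 1 - x is spread evenly over the other three cells (the spread that minimises the sum of squares), the two constraints together allow x to go no lower than \frac{1-\sqrt{3}}{4} \approx -0.183.

```python
import numpy as np


def renyi_entropy(p, alpha):
    """Renyi entropy in bits; alpha = 1 is treated as the Shannon entropy.

    The alpha = 1 branch is only meant for non-negative distributions.
    """
    p = np.asarray(p, dtype=float).ravel()
    if alpha == 1:
        if np.any(p < 0):
            raise ValueError("Shannon branch needs non-negative entries")
        nz = p[p > 0]
        return float(-np.sum(nz * np.log2(nz)))
    return float(np.log2(np.sum(p ** alpha)) / (1 - alpha))


def m2(p):
    """M_2 = 2^(-H_2) = sum of squared entries."""
    p = np.asarray(p, dtype=float).ravel()
    return float(np.sum(p ** 2))


full_knowledge = [1, 0, 0, 0]
half_knowledge = [0.5, 0, 0.5, 0]       # e.g. the |0> state
ignorance = [0.25, 0.25, 0.25, 0.25]

for name, p in [("full", full_knowledge), ("half", half_knowledge),
                ("none", ignorance)]:
    print(name, renyi_entropy(p, 1), renyi_entropy(p, 2), m2(p))

r2 = np.sqrt(2)
W_psi = [0.25, (1 - r2) / 4, (1 + r2) / 4, 0.25]
too_negative = [0.5, 0.5, 0.5, -0.5]    # assumed stand-in, sums to 1
print("M_2(psi) =", m2(W_psi))
print("M_2(too negative) =", m2(too_negative))

# Most negative single entry allowed by sum = 1 and M_2 <= 1/2.
x_min = (1 - np.sqrt(3)) / 4
extremal = [x_min] + [(1 - x_min) / 3] * 3
print("x_min =", x_min, "M_2 =", m2(extremal))
```

The |\psi\rangle example sits right on the boundary with M_2 = \frac{1}{2}, while the distribution containing -\frac{1}{2} has M_2 = 1 and so is ruled out.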

Why this particular entropy measure? That’s something I don’t really understand. Van Enk describes it as ‘the measure of information advocated by Brukner and Zeilinger’, and links to their paper, but so far I haven’t managed to follow the argument there, either. If anyone reads this and has any insight, I’d like to know!


In some ways, I know a lot more about negative probabilities than I did when I started getting interested in this. But conceptually I’m almost as confused as I was at the start! I think the main improvement is that I have some more focussed questions to be confused about:

  • Is the way of decomposing the Wigner function that I described in these posts any use for making sense of the negative probabilities? I found it quite helpful for Piponi’s example, in giving some more insight into how the negative value connects to that particular answer being ‘especially inconsistent’. Is it also useful for thinking about qubits?
  • Any link to the idea of negative probabilities representing events ‘unhappening’? As I said at the beginning of the first post, I love this idea but have never seen it fully developed anywhere in a satisfying way.
  • What’s going on with this collision entropy measure anyway?

I’m not a quantum foundations researcher – I’m just an interested outsider trying to understand how all these ideas fit together. So I’m likely to be missing a lot of context that people in the field would have. If you read this and have pointers to things that I’m missing, please let me know in the comments!

6 thoughts on “Negative probability: now with added equations!”

  1. JM August 9, 2019 / 11:15 am

    Not related to this specific blog post, but given your interest in intuition/math you might enjoy “Proofs and refutations” by Lakatos.


    • Lucy Keer August 9, 2019 / 8:09 pm

      I bought a copy of that recently, haven’t got round to it yet! I’ve read an excerpt before – an early part of the dialogue about Euler’s theorem for polyhedra – and that was good, so I should really read the rest.

