The Bat and Ball Problem Revisited

In this post, I’m going to assume you’ve come across the Cognitive Reflection Test before and know the answers. If you haven’t, it’s only three quick questions, so go and do it now.

Bat & Ball train station, Sevenoaks [source]

One of the striking early examples in Kahneman’s Thinking, Fast and Slow is the following problem:

(1) A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball.

How much does the ball cost? _____ cents

This question first turns up informally in a paper by Kahneman and Frederick, who find that most people get it wrong:

Almost everyone we ask reports an initial tendency to answer “10 cents” because the sum $1.10 separates naturally into $1 and 10 cents, and 10 cents is about the right magnitude. Many people yield to this immediate impulse. The surprisingly high rate of errors in this easy problem illustrates how lightly System 2 monitors the output of System 1: people are not accustomed to thinking hard, and are often content to trust a plausible judgment that quickly comes to mind.

In Thinking, Fast and Slow, the bat and ball problem is used as an introduction to the major theme of the book: the distinction between fluent, spontaneous, fast ‘System 1’ mental processes, and effortful, reflective and slow ‘System 2’ ones. The explicit moral is that we are too willing to lean on System 1, and this gets us into trouble:

The bat-and-ball problem is our first encounter with an observation that will be a recurrent theme of this book: many people are overconfident, prone to place too much faith in their intuitions. They apparently find cognitive effort at least mildly unpleasant and avoid it as much as possible.

This story is very compelling in the case of the bat and ball problem. I got this problem wrong myself when I first saw it, and still find the intuitive-but-wrong answer very plausible looking. I have to consciously remind myself to apply some extra effort and get the correct answer.

However, this becomes more complicated when you start considering other tests of this fast-vs-slow distinction. Frederick later combined the bat and ball problem with two other questions to create the Cognitive Reflection Test:

(2) If it takes 5 machines 5 minutes to make 5 widgets, how long would it take 100 machines to make 100 widgets? _____ minutes

(3) In a lake, there is a patch of lily pads. Every day, the patch doubles in size. If it takes 48 days for the patch to cover the entire lake, how long would it take for the patch to cover half of the lake? _____ days

These are designed to also have an ‘intuitive-but-wrong’ answer (100 minutes, 24 days), and an ‘effortful-but-right’ answer (5 minutes, 47 days). But this time I seem to be immune to the wrong answers, in a way that just doesn’t happen with the bat and ball:

I always have the same reaction, and I don’t know if it’s common or I’m just the lone idiot with this problem. The ‘obvious wrong answers’ for 2. and 3. are completely unappealing to me (I had to look up 3. to check what the obvious answer was supposed to be). Obviously the machine-widget ratio hasn’t changed, and obviously exponential growth works like exponential growth.

When I see 1., however, I always think ‘oh it’s that bastard bat and ball question again, I know the correct answer but cannot see it’. And I have to stare at it for a minute or so to work it out, slowed down dramatically by the fact that Obvious Wrong Answer is jumping up and down trying to distract me.

If this test was really testing my propensity for effortful thought over spontaneous intuition, I ought to score zero. I hate effortful thought! As it is, I score two out of three, because I’ve trained my intuitions nicely for ratios and exponential growth. The ‘intuitive’, ‘System 1’ answer that pops into my head is, in fact, the correct answer, and the supposedly ‘intuitive-but-wrong’ answers feel bad on a visceral level. (Why the hell would the lily pads take the same amount of time to cover the second half of the lake as the first half, when the rate of growth is increasing?)

The bat and ball still gets me, though. My gut hasn’t internalised anything useful, and it’s super keen on shouting out the wrong answer in a distracting way. My dislike for effortful thought is definitely a problem here.
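For reference, the two ‘effortful-but-right’ answers above are easy to check mechanically. This is a minimal Python sketch of my own, not anything from Frederick’s paper: the function names are made up for illustration, and it assumes every machine works at the same constant rate and that the patch doubles exactly once a day.

```python
from fractions import Fraction

def minutes_needed(machines, widgets):
    # Assumption: 5 machines make 5 widgets in 5 minutes, so each machine
    # makes 1/5 of a widget per minute, however many machines there are.
    rate_per_machine = Fraction(1, 5)
    return widgets / (machines * rate_per_machine)

def day_half_covered(full_day):
    # Walk backwards from the day the lake is fully covered,
    # halving the coverage each day, until only half is covered.
    coverage, day = Fraction(1), full_day
    while coverage > Fraction(1, 2):
        coverage /= 2
        day -= 1
    return day

widget_answer = minutes_needed(100, 100)  # 5 minutes, not 100
lily_answer = day_half_covered(48)        # day 47, not day 24
```

The machine-to-widget ratio cancels out in the first function, and the second only ever takes one step backwards, which is roughly what ‘obviously exponential growth works like exponential growth’ amounts to.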

I wanted to see if others had raised the same objection, so I started doing some research into the CRT. In the process I discovered a lot of follow-up work that makes the story much more complex and interesting.

I’ve come nowhere near to doing a proper literature review. Frederick’s original paper has been cited nearly 3000 times, and dredging through that for the good bits is a lot more work than I’m willing to put in. This is just a summary of the interesting stuff I found on my limited, partial dig through the literature.

Thinking, inherently fast and inherently slow

Frederick’s original Cognitive Reflection Test paper describes the System 1/System 2 divide in the following way:

Recognizing that the face of the person entering the classroom belongs to your math teacher involves System 1 processes — it occurs instantly and effortlessly and is unaffected by intellect, alertness, motivation or the difficulty of the math problem being attempted at the time. Conversely, finding \sqrt{19163} to two decimal places without a calculator involves System 2 processes — mental operations requiring effort, motivation, concentration, and the execution of learned rules.

I find it interesting that he frames mental processes as being inherently effortless or effortful, independent of the person doing the thinking. This is not quite true even for the examples he gives — faceblind people and calculating prodigies exist.

This framing is important for interpreting the CRT. If the problem inherently has a wrong ‘System 1 solution’ and a correct ‘System 2 solution’, the CRT can work as intended, as an efficient tool to split people by their propensity to use one strategy or the other. If there are ‘System 1’ ways to get the correct answer, the whole thing gets much more muddled, and it’s hard to disentangle natural propensity to reflection from prior exposure to the right mathematical concepts.

My tentative guess is that the bat and ball problem is close to being this kind of efficient tool. Although in some ways it’s the simplest of the three problems, solving it in a ‘fast’, ‘intuitive’ way relies on seeing the problem in a way that most people’s education won’t have provided. (I think this is true, anyway – I’ll go into more detail later.) I suspect that this is less true of the other two problems – ratios and exponential growth are topics that a mathematical or scientific education is more likely to build intuition for.

(Aside: I’d like to know how these other two problems were chosen. The paper just states the following:

Motivated by this result [the answers to the bat and ball question], two other problems found to yield impulsive erroneous responses were included with the “bat and ball” problem to form a simple, three-item “Cognitive Reflection Test” (CRT), shown in Figure 1.

I have a vague suspicion that Frederick trawled through something like ‘The Bumper Book of Annoying Riddles’ to find some brainteasers that don’t require too much in the way of mathematical prerequisites. The lilypads one has a family resemblance to the classic grains-of-wheat-on-a-chessboard puzzle, for instance.)

However, I haven’t found any great evidence either way for this guess. The original paper doesn’t break down participants’ scores by question – it just gives mean scores on the test as a whole. I did however find this meta-analysis of 118 CRT studies, which shows that the bat and ball question is the most difficult on average – only 32% of all participants get it right, compared with 40% for the widgets and 48% for the lilypads. It also has the biggest jump in success rate when comparing university students with non-students. That suggests that better mathematical education does help on the bat and ball, but it doesn’t clear up how it helps. It could improve participants’ ability to intuitively see the answer. Or it could improve their ability to come up with an ‘unintuitive’ solution, like solving the corresponding simultaneous equations by a rote method.

What I’d really like is some insight into what individual people actually do when they try to solve the problems, rather than just this aggregate statistical information. I haven’t found exactly what I wanted, but I did turn up a few interesting studies on the way.

No, seriously, the answer isn’t ten cents

My favourite thing I found was this (apparently unpublished) ‘extremely rough draft’ by Meyer, Spunt and Frederick from 2013, revisiting the bat and ball problem. The intuitive-but-wrong answer turns out to be extremely sticky, and the paper is basically a series of increasingly desperate attempts to get people to actually think about the question.

One conjecture for what people are doing when they get this question wrong is the attribute substitution hypothesis. This was suggested early on by Kahneman and Frederick, and is a fancy way of saying that they are instead solving the following simpler problem:

(1) A bat and a ball cost $1.10 in total. The bat costs $1.00.

How much does the ball cost? _____ cents

Notice that this is missing the ‘more than the ball’ clause at the end, turning the question into a much simpler arithmetic problem. This simple problem does have ‘ten cents’ as the answer, so it’s very plausible that people are getting confused by it.

Meyer, Spunt and Frederick tested this hypothesis by getting respondents to recall the problem from memory. This showed a clear difference: 94% of ‘five cent’ respondents could recall the correct question, but only 61% of ‘ten cent’ respondents. It’s possible that there is a different common cause of both the ‘ten cent’ response and misremembering the question, but it at least gives some support for the substitution hypothesis.

However, getting people to actually answer the question correctly was a much more difficult problem. First they tried bolding the words ‘more than the ball’ to make this clause more salient. This made surprisingly little impact: 29% of respondents solved it, compared with 24% for the original problem. Printing both versions was slightly more successful, bumping up the correct response to 35%, but it was still a small effect.

After this, they ditched subtlety and resorted to pasting these huge warnings above the question:

Computation warning: ‘Be careful! Many people miss the following problem because they do not take the time to check their answer.’

Comprehension warning: ‘Be careful! Many people miss the following problem because they read it too quickly and actually answer a different question than the one that was asked.’

These were still only mildly effective, with the rate of correct solutions rising from 45% to 50%. People just really like the answer ‘ten cents’, it seems.

At this point they completely gave up and just flat out added “HINT: 10 cents is not the answer.” This worked reasonably well, though there was still a hard core of 13% who persisted in writing down ‘ten cents’.

That’s where they left it. At this point there’s not really any room to escalate beyond confiscating the respondents’ pens and prefilling in the answer ‘five cents’, and I worry that somebody would still try and scratch in ‘ten cents’ in their own blood. The wrong answer is just incredibly compelling.

So, what are people doing when they solve this problem?

Unfortunately, it’s hard to tell from the published literature (or at least what I found of it). What I’d really like is lots of transcripts of individuals talking through their problem solving process. The closest I found was this paper by Szaszi et al, who did carry out this sort of interview, but it doesn’t include any examples of individual responses. Instead, it gives an aggregated overview of types of responses, which doesn’t go into the kind of detail I’d like.

Still, the examples given for their response categories give a few clues. The categories are:

  • Correct answer, correct start. Example given: ‘I see. This is an equation. Thus if the ball equals to x, the bat equals to x plus 1…’
  • Correct answer, incorrect start. Example: ‘I would say 10 cents… But this cannot be true as it does not sum up to €1.10…’
  • Incorrect answer, reflective, i.e. some effort was made to reconsider the answer given, even if it was ultimately incorrect. Example: ‘… but I’m not sure… If together they cost €1.10, and the bat costs €1 more than the ball… the solution should be 10 cents. I’m done.’
  • No reflection. Example: ‘Ok. I’m done.’

These demonstrate one way to reason your way to the correct answer (solve the simultaneous equations) and one way to be wrong (just blurt out the answer). They also demonstrate one way to recover from an incorrect solution (think about the answer you blurted out and see if it actually works). Still, it’s all rather abstract and high level.

How To Solve It

However, I did manage to stumble onto another source of insight. While researching the problem I came across this article from the online magazine of the Association for Psychological Science, which discusses a variant ‘Ford and Ferrari problem’. This is quite interesting in itself, but I was most excited by the comments section. Finally some examples of how the problem is solved in the wild!

The simplest ‘analytical’, ‘System 2’ solution is to rewrite the problem as two simultaneous linear equations and plug-and-chug your way to the correct answer. For example, writing B for the bat and b for the ball, we get the two equations

B + b = 110,
B - b = 100,

which we could then solve in various standard ways, e.g.

2B = 210,
B = 105,

which then gives

b = 110 - B = 5.

There are a couple of variants of this explained in the comments. It’s a very reliable way to tackle the problem: if you already know how to do this sort of rote method, there are no surprises. This sort of method would work for any similar problem involving linear equations.
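As a rough sketch of the rote method, here is the same elimination written as a small Python function. The function name and the choice to work in cents are my own illustrative assumptions, not something taken from the comments.

```python
def solve_sum_and_difference(total, difference):
    """Solve B + b = total and B - b = difference by elimination."""
    # Adding the two equations eliminates b: 2B = total + difference.
    bigger = (total + difference) / 2
    # Substituting back into the first equation gives b = total - B.
    smaller = total - bigger
    return bigger, smaller

bat, ball = solve_sum_and_difference(110, 100)  # prices in cents
```

Nothing here depends on the particular numbers, which is the sense in which the method has ‘no surprises’: any sum and difference go through the same two steps.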

However, it’s pretty obvious that a lot of people won’t have access to this method. Plenty of people noped out of mathematics long before they got to simultaneous equations, so they won’t be able to solve it this way. What might be less obvious, at least if you mostly live in a high-maths-ability bubble, is that these people may also be missing the sort of tacit mathematical background that would even allow them to frame the problem in a useful form in the first place.

That sounds a bit abstract, so let’s look at some responses (I’ll paste all these straight in, so any typos are in the original). First, we have these two confused commenters:

The thing is, why does the ball have to be $.05? It could have been .04 0r.03 and the bat would still cost more than $1.

and

This is exactly what bothers me and resulted in me wanting to look up the question online. On the quiz the other 2 questions were definitive. This one technically could have more than one answer so this is where phycologists actually mess up when trying to give us a trick question. The ball at .4 and the bat at 1.06 doesn’t break the rule either.

These commenters don’t automatically see two equations in two variables that together are enough to constrain the problem. Instead they seem to focus mainly on the first condition (adding up to $1.10) and just use the second one as a vague check at best (‘the bat would still cost more than $1’). This means that they are unable to immediately tell that the problem has a unique solution.
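To make the difference between the two readings concrete, here is a small Python sketch of my own, assuming whole-cent prices. It compares the loose reading, where the bat merely has to cost more than $1, with the exact reading, where the difference has to be exactly $1.00.

```python
TOTAL = 110       # cents: bat + ball
DIFFERENCE = 100  # cents: bat - ball

def bat_price(ball):
    # The first condition fixes the bat once the ball is chosen.
    return TOTAL - ball

# Loose reading: the bat just has to cost more than 100 cents.
loose = [ball for ball in range(TOTAL + 1) if bat_price(ball) > 100]

# Exact reading: the bat costs exactly 100 cents more than the ball.
exact = [ball for ball in range(TOTAL + 1)
         if bat_price(ball) - ball == DIFFERENCE]
```

Under the loose reading every ball price from 0 to 9 cents passes, including the commenters’ 3 and 4 cents. Under the exact reading only 5 cents survives.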

In response, another commenter, Tony, suggests a correct solution which is an interesting mix of writing the problem out formally and then figuring out the answer by trial and error:

I hear your pain. I feel as though psychologists and psychiatrists get together every now and then to prove how stoopid I am. However, after more than a little head scratching I’ve gained an understanding of this puzzle. It can be expressed as two facts and a question A=100+B and A+B=110, so B=? If B=2 then the solution would be 100+2+2 and A+B would be 104. If B=6 then the solution would be 100+6+6 and A+B would be 112. But as be KNOW A+B=110 the only number for B on it’s own is 5.

This suggests enough half-remembered mathematical knowledge to find a sensible abstract framing, but not enough to solve it the standard way.

Finally, commenter Marlo Eugene provides an ingenious way of solving the problem without writing all the algebraic steps out:

Linguistics makes all the difference. The conceptual emphasis seems to lie within the word MORE.

X + Y = $1.10. If X = $1 MORE then that leaves $0.10 TO WORK WITH rather than automatically assign to Y

So you divide the remainder equally (assuming negative values are disqualified) and get 0.05.

So even this small sample of comments suggests a wide diversity of problem-solving methods leading to the two common answers. Further, these solutions don’t all split neatly into ‘System 1’ ‘intuitive’ and ‘System 2’ ‘analytic’. Marlo Eugene’s solution, for instance, is a mixed solution of writing the equations down in a formal way, but then finding a clever way of just seeing the answer rather than solving them by rote.

I’d still appreciate more detailed transcripts, including the time taken to solve the problem. My suspicion is still that very few people solve this problem with a fast intuitive response, in the way that I rapidly see the correct answer to the lilypad question. Even the more ‘intuitive’ responses, like Marlo Eugene’s, seem to rely on some initial careful reflection and a good initial framing of the problem.

If I’m correct about this lack of fast responses, my tentative guess for the reason is that it has something to do with the way most of us learn simultaneous equations in school. We generally learn arithmetic as young children in a fairly concrete way, with the formal numerical problems supplemented with lots of specific examples of adding up apples and bananas and so forth.

But then, for some reason, this goes completely out of the window once the unknown quantity isn’t sitting on its own on one side of the equals sign. This is instead hived off into its own separate subject, called ‘algebra’, and the rules are taught much later in a much more formalised style, without much attempt to build up intuition first.

(One exception is the sort of puzzle sheets that are often given to young kids, where the unknowns are just empty boxes to be filled in. Sometimes you get 2+3=□, sometimes it’s 2+□=5, but either way you go about the same process of using your wits to figure out the answer. Then, for some reason I’ll never understand, the worksheets get put away and the poor kids don’t see the subject again until years later, when the box is now called x for some reason and you have to find the answer by defined rules. Anyway, this is a separate rant.)

This lack of a rich background in puzzling out the answer to specific concrete problems means most of us lean hard on formal rules in this domain, even if we’re relatively mathematically sophisticated. Only a few build up the necessary repertoire of tricks to solve the problem quickly by insight. I’m reminded of a story in Feynman’s The Pleasure of Finding Things Out:

Around that time my cousin, who was three years older, was in high school. He was having considerable difficulty with his algebra, so a tutor would come. I was allowed to sit in a corner while the tutor would try to teach my cousin algebra. I’d hear him talking about x.

I said to my cousin, “What are you trying to do?”

“I’m trying to find out what x is, like in 2x + 7 = 15.”

I say, “You mean 4.”

“Yeah, but you did it by arithmetic. You have to do it by algebra.”

I learned algebra, fortunately, not by going to school, but by finding my aunt’s old schoolbook in the attic, and understanding that the whole idea was to find out what x is – it doesn’t make any difference how you do it.

I think this reliance on formal methods might be somewhat less true for exponential growth and ratios, the subjects underpinning the lilypad and widget questions. Certainly I seem to have better intuition there, without having to resort to rote calculation. But I’m not sure how general this is.

How To Visualise It

If you wanted to solve the bat and ball problem without having to ‘do it by algebra’, how would you go about it?

My original post on the problem was a pretty quick, throwaway job, but over time it picked up some truly excellent comments by anders and Kyzentun, which really start to dig into the structure of the problem and suggest ways to ‘just see’ the answer. The thread with anders in particular goes into lots of other examples of how we think through solving various problems, and is well worth reading in full. I’ll only summarise the bat-and-ball-related parts of the comments here.

We all used some variant of the method suggested by Marlo Eugene in the comments above. Writing out the basic problem again, we have:

B + b = 110,
B - b = 100.

Now, instead of immediately jumping to the standard method of eliminating one of the variables, we can just look at what these two equations are saying and solve it directly ‘by thinking’. We have a bat, B. If you add the price of the ball, b, you get 110 cents. If you instead remove the same quantity b you get 100 cents. So the bat’s price must be exactly halfway between these two numbers, at 105 cents. That leaves five for the ball.

Now that I’m thinking of the problem in this way, I directly see the equations as being ‘about a bat that’s halfway between 100 and 110 cents’, and the answer is incredibly obvious.
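The ‘halfway between’ picture can be written down almost word for word. This is only a sketch, with a hypothetical function name and prices in cents, of the reasoning in the paragraphs above rather than a different method.

```python
def bat_and_ball_by_midpoint(with_ball, without_ball):
    # B + b and B - b sit the same distance b on either side of B,
    # so the bat is the midpoint of the two numbers.
    bat = (with_ball + without_ball) / 2
    # The ball is whatever is left over from the total.
    ball = with_ball - bat
    # Sanity check: the bat really is equally far from both numbers.
    assert with_ball - bat == bat - without_ball
    return bat, ball

bat, ball = bat_and_ball_by_midpoint(110, 100)
```

The arithmetic is the same as the elimination method, but each line now corresponds to something you can picture on a number line.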

Kyzentun suggests a variant on the problem that is much less counterintuitive than the original:

A centered piece of text and its margins are 110 columns wide. The text is 100 columns wide. How wide is one margin?

Same numbers, same mathematical formula to reach the solution. But less misleading because you know there are two margins, and thus know to divide by two after subtracting.

In the original problem, the 110 units and 100 units both refer to something abstract, the sum and difference of the bat and ball. In Kyzentun’s version these become much more concrete objects, the width of the text and the total width of the margins. The work of seeing the equations as relating to something concrete has mostly been done for you.
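Here is a short Python sketch of how I read the correspondence between the two versions. The function names are invented for illustration; the point is just that both bodies are the same expression.

```python
def one_margin(total_width, text_width):
    # Two equal margins share whatever width the text does not use.
    return (total_width - text_width) / 2

def ball_price(total, difference):
    # bat = ball + difference, so total = difference + 2 * ball.
    # The difference plays the role of the text, and the two copies
    # of the ball price play the role of the two margins.
    return (total - difference) / 2

margin = one_margin(110, 100)  # columns
ball = ball_price(110, 100)    # cents
```

In the margins version the ‘divide by two’ is visible in the wording, whereas in the bat and ball version you have to notice that the ball’s price appears twice.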

Similarly, anders works the problem by ‘getting rid of the 100 cents’, and splitting the remainder in half to get at the price of the ball:

I just had an easy time with #1 which I haven’t before. What I did was take away the difference so that all the items are the same (subtract 100), evenly divide the remainder among the items (divide 10 by 2) and then add the residuals back on to get 105 and 5.

The heuristic I seem to be using is to treat objects as made up of a value plus a residual. So when they gave me the residual my next thought was “now all the objects are the same, so whatever I do to one I do to all of them”.
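A minimal sketch of this ‘value plus residual’ heuristic in Python follows. Extending it to a list of items is my own generalisation for illustration, not something anders describes, and the function name is hypothetical.

```python
def split_with_residuals(total, residuals):
    """Price each item as a shared base value plus its own known residual."""
    # Take away the residuals so that all the items are the same.
    remainder = total - sum(residuals)
    # Evenly divide what is left among the items.
    base = remainder / len(residuals)
    # Add each residual back on to get the individual prices.
    return [base + residual for residual in residuals]

# The bat carries a residual of 100 cents; the ball carries none.
prices = split_with_residuals(110, [100, 0])
```

For the bat and ball this reproduces the 105 and 5 from the quoted comment.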

I think that after reasoning my way through all these perspectives, I’m finally at the point where I have a quick, ‘intuitive’ understanding of the problem. But it’s surprising how much work it was for such a simple bit of algebra.

Final thoughts

Rather than making any big conclusions, the main thing I wanted to demonstrate in this post is how complicated the story gets when you look at one problem in detail. I’ve written about close reading recently, and this has been something like a close reading of the bat and ball problem.

Frederick’s original paper on the Cognitive Reflection Test is in that generic social science style where you define a new metric and then see how it correlates with a bunch of other macroscale factors (either big social categories like gender or education level, or the results of other statistical tests that try to measure factors like time preference or risk preference). There’s a strange indifference to the details of the test itself – at no point does he discuss why he picked those specific three questions, and there’s no attempt to model what was making the intuitive-but-wrong answer appealing.

The later paper by Meyer, Spunt and Frederick is much more interesting to me, because it really starts to pick apart the specifics of the bat and ball problem. Is an easier question getting substituted? Can participants reproduce the correct question from memory?

I learned the most from the individual responses, though, and seeing the variety of ways people go about solving the problem. It’s very strange to me that I had an easier time digging this out from an internet comment thread than the published literature! I would love to see a lot more research into what people actually do when they do mathematics, and the bat and ball problem would be a great place to start.

Questions

I’m interested in any comments on the post, but here are a few specific things I’d like to get your answers to:

  • My rapid, intuitive answer for the bat and ball question is wrong (at least until I retrained it by thinking about the problem way too much). However, for the other two I ‘just see’ the correct answer. Is this common for other people, or do you have a different split?
  • If you’re able to rapidly ‘just see’ the answer to the bat and ball question, how do you do it?
  • How do people go about designing tests like these? This isn’t at all my field and I’d be interested in any good sources. I’d kind of assumed that there’d be some kind of serious-business Test Creation Methodology, but for the CRT at least it looks like people just noticed they got surprising answers for the bat and ball question and looked around for similar questions. Is that unusual compared to other psychological tests?

[I’ve cross-posted this at LessWrong, because I thought the topic fits quite nicely – comments at either place are welcome.]

29 thoughts on “The Bat and Ball Problem Revisited”

  1. Gavin Rebeiro December 12, 2018 / 9:27 pm

    With respect to “How do people go about designing tests like these?”, I think the test is cleverly designed to be misleading on purpose.

    Notice how it says “The bat costs $1.00 more than the ball” instead of ‘The ball costs more than the bat **and** the difference is $1.00’. We can see that there’s some ‘data compression’ going on here. This can quickly lead the intuition astray: thinking something like “The bat costs $1.00”, when we are attempting to answer the question quickly, leading to a bungled answer.

    This brings me to another point. Interpreting ‘a is more than b’ quickly – and correctly – isn’t something someone without serious mathematical background can come up with – I’d say in a similar vein to what you said about setting up a set of simultaneous equations. We can settle with something like ‘(a-b)>0’ but this ‘compressed data’ can be easily overlooked even if one has a background in mathematics, due to the way the data is presented. The semantics, in this case, has been cleverly guised to be misleading (this is my conjecture).


    • Gavin Rebeiro December 12, 2018 / 9:28 pm

      By ‘the ball costs more than the bat …’ I meant ‘the bat costs more than the ball …’

      Serves me right for scoffing down dinner while writing a comment. 😛


  2. David R. MacIver December 13, 2018 / 1:12 am

    Hmm. I think I react the same way to all three of the problems (It’s hard to say how much of this is retcon as I would have first done them ages ago, but I’m pretty sure I got them right originally).

    Roughly what I think I have is a system 1 that is well trained to stick its hand up and go “Excuse me, I think this is somebody else’s problem?” and from my maths training I’m pretty good at rapid system 2 thinking on easyish problems, so in all of these cases my intuitive response *is* to step back and think slightly harder about the problem rather than to try to get an answer on pure intuition.

    FWIW my bat-and-ball solution (which again I’m not sure how much of this is retcon because it’s been a while since I first saw it) is basically “OK so b + 100 + b = 110, so 2b = 10 and the ball costs $0.05”. I’m not sure if I explicitly did the algebra in my head originally but I think I probably did.


    • drossbucket December 13, 2018 / 6:52 pm

      > Roughly what I think I have is a system 1 that is well trained to stick its hand up and go “Excuse me, I think this is somebody else’s problem?”

      I wish I could learn this! At this point I’ve pretty much given up…


  3. David Chapman December 13, 2018 / 2:49 am

    Questions 2 and 3 are different for me from question 1, but perhaps not for the same reason as you. I don’t find the answers “intuitive”; instead, I recognize them as problems of familiar types for which there are simple solution methods. I could solve them from first principles if I had to, of course, but I don’t. So it’s neither “intuitive” nor “rational,” it’s “mindless method dispatch.”

    (BTW, I have always thought the System 1/2 distinction was simplistic and actively misleading. My vague impression is that the field has recently come to the same conclusion.)

    I’ve done bat-and-ball so many times that my introspective report on what I do is probably worthless. In fact, it’s not clear that people’s introspective reports are meaningful even the first time they do it. There’s a large literature in cognitive psychology, particularly the CMU school of the 1970s-80s (Allen Newell & co) based on experiments in which they got people to narrate what they were thinking as they were solving problems. The experimenters were able to tell a lot of interesting stories on this basis, but I’m not sure they had any relationship with reality.

    Anyway, my introspective report fwiw:

    I start by recognizing that this is a pair of linear equations, and observing with a sinking feeling that the last thing on earth I want to do is solve linear equations, so I hope there’s some easier way. I figure that the problem is so simple that there probably is (whew!). So then I imagine two wooden rods with square cross-section. (I’d guess this is from exposure to [Cuisenaire rods](https://en.wikipedia.org/wiki/Cuisenaire_rods) as a child.) The two rods are placed in contact, the bat one above the ball one, right-aligned thus:

    [—————-]
    [—]

    Now the part of the top rod that is not in contact with the bottom must be 100 long:

    [—————-]
    [—]

    If we imagine the top one looping back like a snake, or boustrophedon script, the whole snake is 110 long:

    [——110—–\
    [—/

    So how long is the curly bit? Here, unlike others, I don’t “see” that the answer is to divide by two. In fact it took me a long time to figure out why that is correct (because I am bad at math). Instead, I think “Well, what’s a plausible answer? it’s probably going to be regular in some way. Five is a nice regular number in general, and especially in relationship to ten. Would five work as the answer? Check: 5+105=110? Why yes! Thank god we’re done with this stupid thing.”

    “How do people go about designing tests like these?” I have minimal expertise in this field. However, in the past couple years I’ve been following the psychology replication/reform movement as an enthusiastic fan via twitter and podcasts. (I think this is the most exciting thing happening in science other than maybe CRISPR.)

    The people I respect in the field constantly gripe about people making up half-baked tests like this without validating them in any way. Apparently there’s a huge amount known about how to do this sort of thing right, and 99% of academic psychologists ignore that, and of the 1% who bother to read the literature, 90% do it Wrong.

    I share your intuition that there’s something genuinely interesting about the bat-and-ball question, and that the other two are different. I think the other two just test “have you taken enough math courses,” whereas the bat-and-ball one requires actually thinking a bit.

    • David Chapman December 13, 2018 / 2:50 am

      Drat. WordPress screwed up my diagrams. I hope you can guess what they were supposed to show!

    • drossbucket December 13, 2018 / 6:50 pm

      > I don’t find the answers “intuitive”; instead, I recognize them as problems of familiar types for which there are simple solution methods.

      OK, I *think* that is different from me? For the widgets problem my brain just produces the answer ‘5’. I don’t know how it gets it, I just write it down. I haven’t tried producing a verbal justification more elaborate than ‘well the ratio is the same’, but I think I might struggle. Some part of me might ‘recognise it as a problem of a familiar type’ but I don’t naturally have much access to its reasons.

      Similarly, for the bat and ball my brain produces the answer ’10 cents’ and I just write it down. Not so helpful.

      For the lilypad I have a sort of vague mental image of a square being divided in half many times, as well as the answer. But as with you I’ve thought about this so much by now that I can’t really trust these stories I’m telling anyway.

      I remember from somewhere that you really don’t like the word ‘intuition’. Presumably because it squashes together way too much?

      > I don’t “see” that the answer is to divide by two. In fact it took me a long time to figure out why that is correct (because I am bad at math).

      Same!

      > I have always thought the System 1/2 distinction was simplistic and actively misleading.

      Yeah I’m very suspicious of it too. I scare quoted ‘System 1’ and ‘System 2’ a lot in the post as I was mostly finding the labels unhelpful.

      > In fact, it’s not clear that people’s introspective reports are meaningful even the first time they do it.

      So you’re saying that I might not get much useful out of these reports even if I had more of them? I feel like they tell you *something* – some people actually set it up like an algebra problem and solve the simultaneous equations, for example, which is very different to someone else just trying numbers until they get one that works.

      Does ethnomethodology instead work by recording what people do and not trying to elicit introspective accounts? (Next year I plan to finally read some ethnometho. Currently I’ve only read the Greiffenhagen study you linked before.)

      > The people I respect in the field constantly gripe about people making up half-baked tests like this without validating them in any way.

      Yeah, that does not surprise me at all.

      > Apparently there’s a huge amount known about how to do this sort of thing right

      Do you know how to find this? Just the right terms to google would help. I kept turning up really obvious ‘questionnaire design’ stuff aimed at school/undergrad projects, which isn’t what I want.

      • David Chapman December 13, 2018 / 8:14 pm

        I think our different training may lead to somewhat different quick solutions. We’ve both used math, but differently. As an engineer, I’ve solved endless problems like the widget one, thinking about factory utilization specifically. Exponential growth is second nature from both biology and finance.

        Rin’dzin’s explanation (in private email to the two of us) makes sense to me—I will follow that up later today. Her insight is that what’s hard about the ball one is that it requires mixing abstract and concrete reasoning in a way the others don’t.

        Yeah, “intuition” just means “mental process we don’t have an explanation for,” which is almost all of them.

        I think these introspective reports are extremely interesting, and probably can tell you *something*. But how they relate to what actually happens is unclear; they *might* be completely misleading.

        Yes, the fundamental principle of ethnomethodological methodology is “look at what people say and do, and don’t ever speculate about what’s happening in their head, because we can’t know.” At first that seems like a straitjacket, and highly unintuitive; but it forces you to really look, and then you see what is going on. Whereas, if you make up stories about what people are thinking, you get lost in fantasies. (Which is what most of “cognitive science” amounts to.)

        I don’t know where to find stuff on how to do things like the CRT right, but the Black Goat people would know, or would know where to go. Sanjay Srivastava (one of them) tweeted something about this in the past week or so.


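The guess-and-check step David Chapman describes earlier in this thread (pick a "regular" number, then verify that 5 + 105 = 110) can be sketched in Python. This is only an illustration, not a claim about his actual mental procedure: the function name `first_candidate_that_fits` and the list of "nice" candidate numbers are assumptions made for the sketch, and prices are in whole cents.

```python
def first_candidate_that_fits(total=110, difference=100,
                              candidates=(10, 5, 1, 2, 20, 25, 50)):
    """Return the first candidate ball price (in cents) that satisfies both constraints.

    The candidate list is a hypothetical stand-in for "nice, regular numbers".
    """
    for ball in candidates:
        bat = ball + difference      # the bat costs `difference` more than the ball
        if ball + bat == total:      # check: do the two prices add up to the total?
            return ball
    return None                      # none of the guesses worked


print(first_candidate_that_fits())   # 10 fails (10 + 110 = 120); 5 passes (5 + 105 = 110)
```

The sketch only checks guesses; it never "sees" that the answer is half of the leftover ten cents, which matches the comment's description of stopping as soon as something plausible works.
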
  4. robotpliers December 13, 2018 / 7:46 pm

    Interesting. My solution to the bat and ball problem didn’t involve simultaneous equations or explicit value/residual thinking. It’s actually hard to explain, but I probably leapt to 10 cents first, saw the error, then realized since that was $1.20 total, it had to be half, or 5 cents, to make it $1.10. That’s similar to the value/residual thinking, but the correction was much more intuitive to me than you explained it above. So I took an intuitive guess, checked that it was wrong, then intuited I was off by a factor of 2x based on the magnitude of my error. At least, that’s what I think I did!

    I agree with David Chapman above, that problems 2 and 3 were just pattern matching to ratio or exponential problems that I’ve seen before.

    Liked by 1 person

    • robotpliers December 13, 2018 / 7:48 pm

      Oh, I should add, that although I pattern matched problems 2 and 3, I did not find problem 2 intuitive and had to think through just what I was taking the ratio of. Problem 3 seemed more intuitive, but again, it’s similar to problems I’ve seen before.


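One reading of robotpliers's correction (guess 10 cents, notice the total comes to $1.20, then halve the error) is sketched below. The function name `correct_guess` is hypothetical, and the arithmetic is in cents; the point is only that the guessed ball price is counted twice in the total, so the guess needs to move by half the error.

```python
from fractions import Fraction


def correct_guess(guess, total=110, difference=100):
    """One correction step: adjust a guessed ball price (cents) by half the error in the total."""
    implied_total = guess + (guess + difference)   # ball + bat, given the guess
    error = implied_total - total                  # positive if the guess is too high
    # The guess appears twice in the total (once as the ball, once inside the bat),
    # so the total moves 2 cents for every 1 cent the guess moves.
    return guess - Fraction(error, 2)


print(correct_guess(10))   # implied total is 120, error is 10, corrected guess is 5
```

Because the relationship is linear, a single correction lands on the right price from any starting guess, not just from 10 cents.
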
  5. Ken December 13, 2018 / 8:13 pm

    I found question 2 to require the most caution.

    My immediate reaction for question 1 was to try 10 cents, but when I saw the difference was only 90 cents it seemed an obvious step to “slide” the values 5 cents in each direction to produce the correct interval. This whole process was quite fast and I didn’t find it mind-bendingly unintuitive; although 10 cents was my initial reaction, it didn’t “stick” for me.

    Question 2 I had to re-read a couple of times to make sure I understood, and although I did produce the correct answer for some reason I felt more doubt about my answer for longer, and I had to spend maybe half a minute convincing myself that I hadn’t messed it up.

    I don’t know how to produce unintuitive questions on demand, but I can offer my favourite:

    Alice the chemist has 1 kilogram of a chemical solution. This solution consists of 99% water (by weight). She turns on a fan and leaves it for a while to dry, and when she returns the solution is now 98% water. How much does it now weigh?

    The answer is 500 grams, i.e. half of what it weighed before. This is not even a little bit intuitive to me, and even writing it out now it feels strange although it’s easy enough to confirm that it’s true.

    • drossbucket December 14, 2018 / 7:37 pm

      Oh yeah, I came across that one a while back too! I know the correct answer but my gut is still REALLY unhappy with it.


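For Ken's chemist puzzle, the step that makes 500 grams less strange is that the non-water part never changes weight: it is 10 grams before and after, and only its share of the total doubles from 1% to 2%. A minimal sketch, with the hypothetical helper name `weight_after_drying` and exact fractions to avoid rounding noise:

```python
from fractions import Fraction


def weight_after_drying(initial_weight, water_before, water_after):
    """Weight of a solution after only water evaporates.

    The non-water part keeps the same weight; only its share of the total changes.
    """
    solute = initial_weight * (1 - water_before)   # 1% of 1000 g = 10 g
    return solute / (1 - water_after)              # those 10 g must now be 2% of the total


print(weight_after_drying(1000, Fraction(99, 100), Fraction(98, 100)))   # 500 (grams)
```
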
  6. Neither_Bird December 13, 2018 / 8:14 pm

    Suppose your friend comes up to you and makes one of the following statements:

    1. “I just bought a bat and a ball for $1.10. The bat cost more than $1!”

    2. “I just bought a bat and a ball for $1.10. I was going to just get the ball, but I figured it was worth spending $1 more to get the bat too.”

    3. “I just bought a bat and a ball for $1.10. The bat cost $1 more than the ball did.”

    Statements 1 and 2 are perfectly natural, everyday statements. You could understand them easily even if you weren’t paying full attention. By contrast, statement 3 is kinda weird and unnatural—why would you compare the price of a ball to the price of a bat? They’re completely different things.

    The prior probability that someone is going to make statement 3 is very low, so unless you’re being pedantic about the exact phrasing of the statement, you’ll naturally assume they meant to make statement 1 or 2 instead.

    Liked by 2 people

    • thegreenjudy August 8, 2019 / 8:37 pm

      Exactly my point! It’s a question of semantics. It’s not me being bad at maths, it’s that the person who wrote the question is bad at English 😉 Besides, English isn’t my mother tongue so maybe culture and language play a role here, too. I didn’t overlook the word “more” as is commonly suggested (jumping to conclusions by the way is not a mark of critical thinking either), nor was I being impatient. I just interpreted the question differently.

      It took me half an hour to figure out why 5 cents is the correct answer. I am a practical person and although I knew it was a trick question I still thought it was 10 cents because I didn’t realize that you were COMPARING A to B in relation to each other, rather than adding up their prices like anyone else would when they go to a shop. I wasn’t convinced I was right (as I knew it was a trick question); I just couldn’t come up with a different logical answer.

      On my journey to find a good explanation, it took me a while to find one that didn’t just present me with a wall of algebra. My question was not HOW but WHY??? Which reminded me that many highly analytical people are pretty rubbish at explaining stuff. Instead it’s being presumed that I am not smart or patient enough. But now that I get what the question wanted from me in the first place, the answer is obvious to me as well. And I don’t even need algebra for that. Ah well who cares anyway!

  7. anders December 13, 2018 / 9:43 pm

    I’ve been trying to design problems that are misleading in a similar way to the ball and bat problem, and it is taking a level of thought just barely beyond what I am capable of.
    The difficulty is adjusting problems so that I misvisualize part of them without realizing that I got it wrong, while still being self aware enough to recognize that I got it wrong.

    Some problems that I get wrong.
    If you drive to your grandma’s house at 30 mph, how fast do you need to drive on the return trip to average a speed of 60 mph?

    You have 25 grams of gold. How many grams of an alloy that is 80% nickel and 20% gold by weight do you need to add, to get an alloy that is 40% gold by weight?

    If you’d like to manipulate the level of visualizability of the ball and bat problem, this series seems to ascend in difficulty.

    You have a sheet of paper that is 8.5 inches wide.
    If you center a picture that is 8 inches wide, how wide are the margins?

    Cut a 10 foot long piece of wood into two pieces so that one piece is 2 feet longer than the other.
    How long are the pieces?

    A length of board is 10 inches shorter than another length.
    Together they are 20 inches.
    How long are the boards?

    Leah is 2 years older than Tracy. Together they are 10 years old.
    How old are they?

    A chair and a table together cost $150.
    The chair costs $90 less than the table.
    How much do they each cost?

    A ball and a bat cost $1.10.
    The bat costs $1.00 more than the ball.
    How much does the ball cost?

    Liked by 1 person

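The six problems in anders's ascending series all share one shape: two quantities with a known total and a known difference. A minimal sketch of that shared structure follows; the function name `split_total` and the problem labels are made up for illustration, and exact fractions are used so that 8.5 inches and whole cents are handled the same way.

```python
from fractions import Fraction


def split_total(total, difference):
    """Split `total` into (smaller, larger) with larger - smaller == difference."""
    total, difference = Fraction(total), Fraction(difference)
    smaller = (total - difference) / 2   # set the difference aside, then halve what is left
    return smaller, smaller + difference


# (total, difference) for each problem in the series; labels are illustrative only.
problems = {
    "one margin / picture plus one margin (inches)": ("8.5", "8"),
    "two pieces of wood (feet)": (10, 2),
    "two boards (inches)": (20, 10),
    "Tracy / Leah (years)": (10, 2),
    "chair / table (dollars)": (150, 90),
    "ball / bat (cents)": (110, 100),
}

for name, (total, difference) in problems.items():
    smaller, larger = split_total(total, difference)
    print(f"{name}: {float(smaller):g} and {float(larger):g}")
```

In the margin version the halving is visible on the page, which may be part of why it feels easier: the "difference" is the picture itself, and what is left over is literally split between two sides.
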
  8. drossbucket December 14, 2018 / 6:42 am

    Comment from @_awbery_ via email, with some good analysis:

    Here’s a guess about what happens with the bat and ball, and why it’s harder than the others:

    The problem is a ‘two things’ problem. The first sentence presents two things, a bat and a ball. The language correctly reflects there are two things we should consider. The first sentence is ‘this plus that equals $1.10’. It correctly sounds like a + b; two things. The first sentence presents the state of affairs, not the problem itself. The second sentence presents the problem. The language of the second sentence reinforces the two things idea because there’s still the bat and the ball and they’re compared against each other: ‘there’s this one and it’s more than that one’. The trickiness is that it is a two things problem, but the two things we need to consider are not the most object level single units, but the bat, and the bat-plus-ball. Our brains are pulled toward the object level division of things by the language and the visual nature of the problem. We have to think really hard to understand that the abstract construct of the problem is the same shape as the state of affairs – there are two things to consider in relation to each other – but while the bat and the ball are still involved, they’re reconfigured by a non-intuitive/non-object-like division.

    There’s no object level mirror trick in the other two problems; they’re straightforward maths mapping an object level visual representation. The widget problem presents a process which doesn’t change how the machines and widgets relate to each other in its solution. Our brains don’t have to mash up the pond and the lilies to separate the visual presentation to an abstract level. We can see that the pond is the same pond, half covered with lilies then fully covered with lilies at the next step. We don’t suddenly have some new abstract unreal configuration of lilies and pond to contend with.

    I think this is why Kyzentun’s and anders’ methods help get at the bat and ball problem intuitively – because they bypass the conflict between object level and abstract and translate it into the formal algebra realm. The problem as presented is non-intuitive because the object visualization it suggests doesn’t reflect the shape of the formal solution.

    So I think this is a particular type of problem, one in which visual shape and language of the presentation collude to obfuscate the visualization of the solution at an abstract/formal level. It’s a different type of problem to the other two in this sense, because the objects they present can be used as given in the solution.

    Maybe we need to encounter more of the same type of problem to develop an intuitive approach. I’m not sure that familiarizing and practising the intuitive answer to just the one problem is enough, though it’s probably a step towards getting better at noticing the trick.

    • David Chapman December 14, 2018 / 7:04 pm

      I think this might explain what’s hard about the bat-and-ball problem. Let me try to restate it in different language (which might or might not accurately reflect Rin’dzin’s understanding).

      Some problems you can construct a diagram for (mentally or on paper) and then pretty much read the answer off. Others, you have to apply some sort of rational procedural method to. You may or may not understand why the procedure works from first principles, but when you actually use it, you aren’t actively understanding it, just following the rules reliably. Some problems you can solve either way—if you succeed in “seeing” what the right diagram is, you can do that, but if not you back up to a formal procedure.

      What’s tricky about bat-and-ball is that it *seems* like you should be able to construct a simple mental diagram and read the answer off. But you can’t, if the diagram contains only the objects mentioned in the problem statement (i.e. the bat and the ball). Once you realize that, you can fall back on the formal procedure (simultaneous linear equations, ugh).

      Alternatively, you can construct a mental diagram that contains one or two additional entities not mentioned in the problem statement. You need to represent the price difference as a visualizable object, and then find that it naturally divides in two. That is what the diagrams I sketched, that were eaten by WordPress, did. It’s also what thinking about “page margins” does.

      A lot of high school geometry proofs involve introducing additional entities into the diagram. I bet there are students who get lost at that point—”how am I supposed to know what other lines to add?”

      It’s rare for algebra problems to need to add entities, I think? So it may not occur to even those of us who went through undergraduate math.

      Interestingly, even though my own solution method adds the virtual size-100 entity, and gets close to making the size-5 entities explicit, just before getting clear about that my brain apparently says “look I’m tired of this, it’s too complicated, let’s just try a plausible solution and see if it works” and luckily it does!

      • Rin'dzin December 15, 2018 / 6:12 am

        > I think this might explain what’s hard about the bat-and-ball problem. Let me try to restate it in different language (which might or might not accurately reflect Rin’dzin’s understanding).

        Yes, nicely translated, thank you.

  9. christianhendriks December 17, 2018 / 3:22 am

    I began reading this post and started laughing at the suggestion that some respondents would write in “ten cents” with their own blood; my mother always wants to be included in funny things and asked me what I was laughing about, so I read her and her boyfriend most of the post up to that point. However, I first made them answer the three questions, so I have some relatively fresh information about how she and her boyfriend worked through these problems.

    My mother’s boyfriend answered “Ten cents” to question one fairly quickly and did not seem to do any real thinking about it. My mother thought for a little bit and then gave the correct answer. (Or, actually, she thought a bit, asked me to read the question again, thought very briefly, then answered correctly.) It took us about twenty seconds (?) to explain the correct answer to her boyfriend.

    On the second question, my mother started reasoning out loud: “if it takes five machines to make five widgets, then for a hundred machines and a hundred widgets… can you read the question again?” And then, once I did, she immediately answered correctly. This looks, to me, like she was following a half-rote sort of process, trying to set up an equation, before realizing (either before asking me to re-read or while I was re-reading) that you don’t need to do any math at all: you just need to see that the ratio stays the same. Her boyfriend nodded at her answer, but later said he hadn’t arrived at any answer himself yet–he just recognized the right one when he heard it.

    On the third question, my mother simply asked me to repeat how many days it took to cover the whole pond and then answered correctly. Her boyfriend paused (presumably checking?) and then nodded.

    Then I read aloud up to the end of “No, Seriously, The Answer Isn’t Ten Cents” (not anything following), and we discussed it a little more. Her boyfriend said he might have done better if he’d seen it written, and we talked a bit about aural processing vs. visual processing, which is probably just noise for your purposes. And then we talked about how both my mother and I, seeing the bat and ball problem, try to model it as an equation… however, the equation I make is a bit different than hers is. I do 1.1 = x + (x + 1), simplify to 0.1 = 2x, and solve for 0.05 = x. She said something very much like, “A dollar ten equals x plus x plus one, and half of ten is five, so five cents.” I think based on speed and intonation that this is Marlo Eugene’s solution: carefully constructing the equation and then just seeing the solution. I tend to solve the equation, even if I think I see the solution.

    Background: my mother has not taken math past high school (Canadian public school in the late 70s) but loves algebra problems. My mother’s boyfriend hates math but has managed factories, so the widgets question made sense to him.

    (I have confabulated a memory of my first time encountering these questions, because I remember answering it correctly [as in, answering wrong, checking, seeing my answer is wrong, and then just doing the math] and answering it falsely [but understanding the answer immediately when I see it]. These memories are probably of no use. What I can tell you is that I know the answer and my mind provides it immediately when I see the problem–indeed, I could possibly be tricked into answering “five cents” even if you slightly rephrased the question so that “ten cents” is correct, because I already ‘know this one.’ It doesn’t feel wrong to me; again, I can’t say whether it ever did, as I don’t trust my memory on this. But then, I try to model things as equations a lot, so I have practice.)

    • christianhendriks December 17, 2018 / 3:27 am

      After I finished reading the article, I brought up some of the explanations you consider. My mother and I both agreed that Kyzentun’s abstraction idea is a better answer than the attribute substitution hypothesis. My mother didn’t explain why she found it more plausible; I think 61% is a lot of people still remembering the question correctly for that to be the largest factor. I think Neither_Bird above also makes a good point, though I hate it: I want everyone to speak precisely always.

      • christianhendriks December 17, 2018 / 3:30 am

        Final anecdote/clarification: although I model things as equations habitually, this does not mean that I don’t shy away when it gets hard. Example: I flat out refuse to even attempt several of the examples anders provides above. I can see roughly what sort of equation I would have to do and I just so badly do not want to do it.

      • drossbucket December 17, 2018 / 6:53 pm

        Haha, I admit to having the same reaction! I’ll probably get round to them eventually, but currently I just cannot be arsed to actually think about something.

        Liked by 1 person

      • David Chapman December 17, 2018 / 7:23 pm

        It seems that the one definite take-away from this discussion is that the thing STEM PhDs hate most is actually doing math.

        Liked by 1 person

  10. anders December 18, 2018 / 10:26 pm

    Me and my brother worked on the alloy problem yesterday to try to find a variant that has a satisfying wrong answer.

    What we came up with was:
    You have 100 grams of gold and 100 grams of an alloy that is 50% gold by weight.
    How many grams of the alloy do you have to add to the 100 grams of gold to get an alloy that is 95% gold by weight?

    The answer is obviously (rot13) gra tenzf bs nyybl

    Different ways of conceptualizing it are

    Adult tickets cost $1.00 and children’s tickets cost 50 cents.
    If the average ticket price was 95 cents and you admitted 100 adults, how many children did you admit?

    There is a 50 meter tower and a 95 meter tower 100 meters apart.
    If you build a 100 meter tower, how many meters past the 95 meter tower must it be if you want the tips of all three towers to be in a straight line?

    You drove for an hour and 40 minutes at 100 kilometers an hour, then the speed limit slows you down to 50 kilometers an hour. How many minutes can you drive at 50 kilometers an hour before your average speed drops below 95 kilometers per hour?


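The alloy, ticket, tower and driving versions in anders's comment are all weighted-average problems, so one small solver can check any proposed answer. This is a sketch under stated assumptions: the name `amount_to_add` and its parameter names are hypothetical, and decimal values should be passed as strings so that `Fraction` keeps them exact.

```python
from fractions import Fraction


def amount_to_add(base_amount, base_value, added_value, target):
    """Amount of a second ingredient needed to bring a weighted average to `target`.

    Solves (base_amount * base_value + x * added_value) / (base_amount + x) == target
    for x, which rearranges to x = base_amount * (base_value - target) / (target - added_value).
    """
    base_amount, base_value, added_value, target = map(
        Fraction, (base_amount, base_value, added_value, target)
    )
    low, high = sorted((base_value, added_value))
    if not low <= target <= high or target == added_value:
        raise ValueError("target must lie between the two values and differ from the added one")
    return base_amount * (base_value - target) / (target - added_value)


# 100 g of pure gold, adding 50% gold alloy, aiming for 95% gold by weight.
grams = amount_to_add(100, 1, "0.5", "0.95")
print(grams, float(grams))

# The earlier version of the puzzle: 25 g of gold, 20% gold alloy, aiming for 40% gold.
print(amount_to_add(25, 1, "0.2", "0.4"))
```

Running the first call is a quick way to see whether the tempting round-number answer survives the check; the same call with the units relabelled covers the tickets, the towers and the drive.
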
  11. Kate July 20, 2020 / 10:54 am

    I REALLY enjoyed reading your piece today!

    I instantly solved the widgets question (exponentially changed second variables were consistent with first, so outcome was the same. No biggie).

    It didn’t take me long to figure out the lily pad question either: 24 just intuitively seemed wrong, so I went back and *really* re-read the question, and the answer just “popped.”

    But the bat and ball… even when I read the correct answer, I had to spend more than a half-hour to “get” it…and even then the logic wasn’t initially sticking…I kept going stubbornly back to 10 cents in my head. I got the “MORE THAN” cue, and I was still getting stuck.

    Finally, FINALLY, it “popped” for me: “SPLIT THE DIFFERENCE” my mind finally cried! After that, I was able to think of it algebraically, and it finally settled logically into my resistant brain.

    BTW, I also found the column/margins question much easier to parse out, in part because I knew I had to “split the difference.”

    Thank you for writing this piece, truly! I thought I might be the only one stumbling over ball and bat as if it were quantum physics (!)
