067: Does the Marshmallow Test tell us anything useful?

The Marshmallow Test is one of the most famous experiments in Psychology: Dr. Walter Mischel and his colleagues presented a preschooler with a marshmallow.  The child was told that the researcher had to leave the room for a period of time and the child could either wait until the researcher returned and have two marshmallows, or if the child couldn’t wait, they could call the researcher back by ringing a bell and just have one marshmallow.  The idea was to figure out how delayed gratification develops, and, in later studies, understand its importance in our children’s lives and academic success.


Dr. Mischel and his colleagues have followed some of the children he originally studied and have made all kinds of observations about their academic, social, and coping competence, and even their health later in life.

But a new study by Dr. Tyler Watts casts some doubt on the original results.  In this episode we talk with Dr. Watts about the original work and some of its flaws (for example, did you know that the original sample consisted entirely of White children of professors and grad students, but the results were extrapolated as if they apply to all children?).  We then discuss the impact of his new work, and what parents should take away from all of this.

As a side note that you might enjoy, my almost 4YO saw me open my computer to publish this episode and asked me what I was doing.  I said I needed to publish a podcast episode and she asked me what it was about.  I told her it’s about the Marshmallow Test and asked her if she wanted to try it.

She is, as I type, sitting at our dining room table with three marshmallows on a plate in front of her, trying to hold out for 15 minutes.  We’re not doing it strictly; we are both still in the room with her, although we’re both typing and ignoring her and asking her to turn back toward the table when she asks us a question.

She keeps asking how many minutes have passed, which I imagine (as I tell her) is quite helpful to her in terms of measuring the remaining effort needed.  She seems most torn between wanting to continue building her Lego airport and the need for the three marshmallows.  She has sung a bit, and smelled the marshmallows a bit, and stacked them into a tower, but she is mostly trying to ignore them and is counting as high as she can.

14 minute update [quiet, despairing voice]: “I’ve been waiting for so long…”

She did make it to 15 minutes (that’s her devouring the third marshmallow in the picture for this episode), although I wonder if she might not have without the time updates.  We’ll have to try that another day:-)



Bembenutty, H., & Karabenick, S.A. (2004). Inherent association between academic delay of gratification, future time perspective, and self-regulated learning. Educational Psychology Review 16(1), 35-57.

Bennett, J. (2018, May 25). NYU Steinhardt Professor replicates famous Marshmallow Test, makes new observations. New York University. Retrieved from https://www.nyu.edu/about/news-publications/news/2018/may/nyu-professor-replicates-longitudinal-work-on-famous-marshmallow.html

Berman, M.G., Yourganov, G., Askren, M.K., Ayduk, O., Casey, B.J., Gotlib, I.H., Kross, E., McIntosh, A.R., Strother, S., Wilson, N.L., Zayas, V., Mischel, W., Shoda, Y., & Jonides, J. (2013). Dimensionality of brain networks linked to life-long individual differences in self-control. Nature Communications 4(1373), 1-7.

Bronfenbrenner, U. (1979). The ecology of human development: Experiments by nature and design. Cambridge, MA: Harvard University Press.

Calarco, J.M. (2018, June 1). Why rich kids are so good at the Marshmallow Test. The Atlantic. Retrieved from https://www.theatlantic.com/family/archive/2018/06/marshmallow-test/561779/?utm_source=newsletter&utm_medium=email&utm_campaign=family-weekly-newsletter&utm_content=20180602&silverid-ref=MzYwODc2MjE4MjE4S0

Carlson, S.M., Shoda, Y., Ayduk, O., Aber, L., Schaefer, C., Sethi, A., Wilson, N., Peake, P.K., & Mischel, W. (2017). Cohort effects in children’s delay of gratification. HECO Working Paper Series 2017-077.

Duckworth, A.L., Tsukayama, E., & Kirby, T.A. (2013). Is it really self-control? Examining the predictive power of the delay of gratification task. Personality and Social Psychology Bulletin 39(7), 843-855.

Imuta, K., Hayne, H., & Scarf, D. (2014). I want it all and I want it now: Delay of gratification in preschool children. Developmental Psychobiology 56, 1541-1552.

Kidd, C., Palmeri, H., & Aslin, R.N. (2012). Rational snacking: Young children’s decision-making on the marshmallow task is moderated by beliefs about environmental reliability. Cognition 126, 109-114.

Michaelson, L.E., & Munakata, Y. (2016). Trust matters: Seeing how an adult treats another person influences preschoolers’ willingness to delay gratification. Developmental Science 19(6), 1011-1019.

Mischel, W., & Ebbesen, E. (1970). Attention in delay of gratification. Journal of Personality and Social Psychology 16(2), 329-337.

Mischel, W., Ebbesen, E.B., & Zeiss, A.R. (1972). Cognitive and attentional mechanisms in delay of gratification. Journal of Personality and Social Psychology 21(2), 204-218.

Mischel, W., & Baker, N. (1975). Cognitive appraisals and transformations in delay behavior. Journal of Personality and Social Psychology 31(2), 254-261.

Mischel, W., Shoda, Y., & Peake, P.K. (1988). The nature of adolescent competences predicted by preschool delay of gratification. Journal of Personality and Social Psychology 54(4), 687-696.

Mischel, W., Ayduk, O., Berman, M., Casey, B.J., Gotlib, I.H., Jonides, J., Kross, E., Teslovich, T., Wilson, N.L., Zayas, V., & Shoda, Y. (2011). ‘Willpower’ over the life span: Decomposing self-regulation. SCAN 6, 252-256.

Schlam, T.R., Wilson, N.L., Shoda, Y., Mischel, W., & Ayduk, O. (2013). Preschoolers’ delay of gratification predicts their body mass 30 years later. The Journal of Pediatrics 162(1), 90-93.

Shoda, Y., Mischel, W., & Peake, P.K. (1990). Predicting adolescent cognitive and self-regulatory competencies from preschool delay of gratification: Identifying diagnostic conditions. Developmental Psychology 26(6), 978-986.

Tangney, J.P., Baumeister, R.F., & Boone, A.L. (2004). High self-control predicts good adjustment, less pathology, better grades, and interpersonal success. Journal of Personality 72(2), 271-324.

Watts, T.W., Duncan, G.J., & Quan, H. (2018). Revisiting the Marshmallow Test: A conceptual replication investigating links between early delay of gratification and later outcomes. Psychological Science 1-19.  DOI: https://doi.org/10.1177/0956797618761661





Jen:    [00:38]

Hello and welcome to today’s episode of the Your Parenting Mojo podcast. Following on from our recent episode on the 30 Million Word Gap, today we’re going to take another close look at a piece of classic research. This time we’re looking at The Marshmallow Study. You’ve probably heard of the study because it’s one of the most famous ones in the field of psychology. Dr. Walter Mischel and his colleagues presented a preschooler with a marshmallow. The child was told that the researcher had to leave the room for a period of time and the child could either wait until the researcher returned and have two marshmallows, or if the child couldn’t wait, they could call the researcher back by ringing a bell. But then they’d only get to have one marshmallow. The idea was to figure out how delayed gratification develops and in later studies to understand its importance in our children’s lives and academic success. I was actually surprised to find that the marshmallow study consisted of a series of studies starting in the early 1960s and continuing for over a decade, and my guest today, Dr. Tyler Watts of New York University, has just published a new study with his colleagues to try and help us understand whether the impacts of delayed gratification really are as large as that body of research indicates. Dr. Watts is a research assistant professor and postdoctoral scholar at the Steinhardt School of Culture, Education, and Human Development at New York University. He received his BA from the University of Texas at Austin and his PhD from the University of California, Irvine. Welcome, Dr. Watts.

Dr. Watts:  [02:00]

Thank you.

Jen:   [02:01]

So I wonder if you could start out just by setting a bit of context for us. Can you describe this series of experiments that’s become known as the marshmallow study and what was the basic procedure that was used and what did the researchers find?

Dr. Watts:   [02:13]

Sure. You had the exact same experience that I did.

Jen:    [02:17]

Yeah. Okay, good.

Dr. Watts:  [02:21]

I first heard about these studies when I was an undergraduate at UT, the University of Texas. I was a psychology student and I think I probably first heard about it in the intro to psych course and then some sort of developmental class. We probably covered it there too. And then when me and Greg, the second author on this paper, started kind of sniffing around to decide if we wanted to look into this, I started going back and reading Mischel’s original papers and then of course I realized the same thing. This was done over probably a decade and there were a series of different studies as he was kind of tweaking the marshmallow test and sort of figuring out what it was telling him along the way. So I think people first have to realize kind of where the state of psychology was in the sixties when Mischel first started doing this work.

Dr. Watts:  [03:10]

It was a whole nother time. We were coming out of, of course, the sort of classic psychoanalysis, Freud and Jung and those guys. So that era had kind of ended and then we had gone through this sort of behaviorist period, which is sort of the kind of rigid rules of human learning and conditioning. And then, you know, cognitive psychology was really sort of coming online and we were really starting to have a new approach to probing at people’s thinking and figuring out what are the kind of limits to human cognition, and we were really coming up with new ways to study it. So Mischel is really kind of coming into this discussion at a really interesting time, and people had, I think, assumed and predicted that being able to delay gratification was this kind of important life skill that probably set aside or differentiated what we think of as successful adults from less successful adults.

Dr. Watts:    [04:12]

And people didn’t know if children could really do this and if they could do it, they didn’t really know how to measure it. And so in psychology, you know, measurement is everything. So Mischel started coming up with this test to be able to actually produce variation in kids’ ability to delay gratification, and the test is known as the Marshmallow Test. He figured out that you could sit a four year old – a kid around the age of four – in front of a marshmallow and tell them that if they wait to eat the marshmallow or to touch the marshmallow until the experimenter returns to the room, then they’ll be rewarded with a second marshmallow. So basically the kid is given this test, right, where they have something sitting in front of them that they want and they’re told by an adult that if they can wait to engage with it or wait to eat it, they’re going to be rewarded with a second thing, right?

Dr. Watts:   [05:08]

With double the amount of that reward. And he very wisely figured out that this test illuminated all sorts of interesting stuff about the way kids think and the way they behave under a somewhat stressful situation. And he realized that from a measurement standpoint the test did exactly what he wanted, which is it produced variation, right? So some kids were better at this than other kids. There would be some kids who couldn’t wait at all, and as soon as the experimenter left the room, they’d reach out and grab the marshmallow. Then there were some kids who would be able to wait for a couple of minutes, and then there were some kids who would be able to wait for whatever length of time they were left alone.

Dr. Watts:  [05:46]

And in some of the trials I think he capped it at a pretty short amount and kids weren’t able to wait for very long, and he would kind of test longer and longer periods of time as he went along. And then he also hadn’t figured out… And I think this is one thing that a lot of people don’t realize is he kind of put a lot of different constraints on the test as he went along. So he was interested in, you know, what happens if you obscure the marshmallow from a kid’s vision, so are kids able to wait longer if you don’t force them to look at the marshmallow right in the room; what if you suggest to them before they do the task sort of strategies to help distract them from the marshmallows. So if you give them strategies to help them wait longer, are they able to do it?

Jen:  [06:37]

What kind of strategies would he use?

Dr. Watts:    [06:39]

Yeah, so I’m trying to remember exactly. But he would sort of give them, I think sort of ways to distract themselves. So he would sort of suggest sort of like cognitive sort of tricks for distracting. Think about something else.

Jen:    [06:53]

Okay. Think about something fun or something like that.

Dr. Watts:   [06:55]

Yeah, yeah, yeah. The kinds of things that we try to tell ourselves to do today. And so he kind of put the kids through all sorts of different constraints on this measure. And you know, it’s sort of similar to the research, I don’t know if the people that listen to your podcast would be familiar with Milgram’s famous obedience studies, right. We always talk about sort of one condition of that, which is where the experimenter would tell the person, keep shocking the person on the other end of the line if they’re getting these questions wrong, and that’s what they would do. But actually Milgram I think spent maybe 15 years or something like that studying all sorts of different conditions around which that experiment was given. And that’s exactly what Walter Mischel did too.

Jen:    [07:37]

Yeah, and Angela Duckworth, who we’ve actually done an episode on her book Grit, she did a paper on this I think a while back now, and I thought that the points that she pulled out about why this test was so successful were really salient. We call it the marshmallow test, but actually the child got to choose whether they had a marshmallow or a pretzel or sometimes some other food, I think in another study. So the fact that they get to choose means that if they like sugar, they get the sugary treat. If they like salt, they get a salty treat. But they only get a really small amount. We’re only talking about one or two marshmallows, one or two pretzels. And so even if the child is really hungry, they know that this isn’t going to satisfy that hunger. So it’s not like we’re seeing the impact of their hunger on the test (we hope, anyway).

Dr. Watts:   [08:22]

Yeah, and I think, you know, Angela Duckworth studied a few different samples of kids doing this and one of the things that she did was on the same sample that we actually used for our, uh, our replication too. So it’s important to point out that when Mischel was doing this, he was at Stanford.

Dr. Watts:    [08:39]

Yeah. It was already fairly successful from my perspective. And he was sampling kids from the Stanford Business school community primarily. I think there were some kids that were, that were from outside of it too, but basically it was that community, predominantly kids of professors. So they were fairly well off kids obviously, whose parents, at least one of their parents was…had presumably a high level degree and was working at one of the best academic institutions in the world. So this is a fairly selective sample of kids that, uh, that he was working with. But that’s not to deride those studies because that’s exactly what you did is that, I mean we, you know, we’ve had so much work, it’s hard to put in perspective. This was going on 50 years ago. We had so much work that’s sort of changed our thinking about how we should sample, how we should design experiments. And so Mischel was really sort of a pioneer in this, in this work. So that was completely normal at that time that you would just take the kids that were around.

Jen:     [09:37]

Yes. And I think it sort of speaks to the White middle class view that, that their way of parenting is the right way and this we should…it’s appropriate to measure White middle class children because anything that deviates from that approach to parenting is different and potentially in a deficit. And I think we’ve become better at seeing that and working with it now and trying to overcome that limitation, but I definitely didn’t realize when I started doing this research that we were essentially looking at a tiny sample of children who came from a very advantaged background, and that these results are being extrapolated out as if they are relevant to all of mankind.

Dr. Watts:   [10:16]

Yeah. And that’s one thing I want to be sure that we mention, to be fair: in the years since Mischel did this work with the Stanford kids, the marshmallow test has been given to lots of different populations and Mischel has been involved in a lot of those studies. So they have done populations with older kids, kids from lower socioeconomic backgrounds, kids of various racial and ethnic backgrounds. I think I just got a notification that there was a recent study where they were doing it in, I think, Cameroon. So, you know, they’ve done all sorts of different sampling designs. The key thing though is the longitudinal findings, which means what happens when we follow up with the kids that were given the Marshmallow Test. To my knowledge, almost everything we know about the longitudinal nature of gratification delay and the prediction between the marshmallow test and later outcomes, almost all of that is derived off of that Stanford sample.

Jen:  [11:14]

Yeah, so the researchers, they actually didn’t design it as a longitudinal study, did they? So they didn’t even bother to collect the addresses of the people that participated and then they had to go back and try and find all these people again.

Dr. Watts:   [11:26]

Yeah, and again, at that time it was really innovative to even think longitudinally. So you know, we didn’t have a lot of longitudinal data sets and the way that we think about data today is so different, largely because of the advances that we’ve made in technology. So Mischel and his colleagues at the time, and I should mention Yuichi Shoda, who is the second author or first author on a lot of these papers, so they worked really closely on many of these, and they had other collaborators; they realized that it would be really interesting to follow up with the kids who did the Marshmallow Test when they were in preschool and see if there was a correlation between the length of time that the kid waited and other stuff going on in their life. So they, you know, 15, 18 years on decided let’s follow up with the kids and let’s see who we can get in touch with.

Dr. Watts:   [12:16]

So of course they only ended up with a small fraction of the kids that originally did the test and then the really famous paper, which is the Shoda, Mischel and Rodriguez paper from 1990, that’s the one that reports on the follow up data. And that’s the one that we were really interested in sort of probing and giving a second closer look to. And so what they found there was that basically they contacted the mothers of the kids that originally participated and they gave them a survey. And among the things that they asked for were SAT scores. And then they also asked for mothers to rate their kid’s behavior and personality on a lot of different dimensions. And what they found was that they only found correlations between wait time at age four (how long did you wait on the Marshmallow Test at age four) and later outcomes for what they call the diagnostic condition, which is the kids who were in what we think of as the classic example of the Marshmallow Test: they’re put in the room with the marshmallow.

Dr. Watts:    [13:19]

They can see it or whatever other treat they wanted. They can see the treat, they’re not given strategies with which to help them delay. And the treat is in plain sight for them. It’s not obscured from their vision. And then the experimenter doesn’t tell them, I’m pretty sure, how long they’re going to be. And so then they’re just kind of left to wait. That’s what Mischel and colleagues ended up calling the diagnostic condition because they found that was the only condition in which waiting predicted later outcomes. Right? So for the kids that they were able to follow up with, and among the kids that were in the diagnostic condition, it was only about 30 to 50 kids, right? But they found that among that sample there was a really large correlation between the length of time that you waited and SAT scores in both math and verbal SAT, and then they also found a correlation between how long you waited and later measures of personality, where mothers are basically rating things like how socially adjusted is your kid; are they sort of doing what you would think of as good student behavior in school; things like that. And they found pretty sizable correlations among all of those aspects.

Jen:    [14:30]

I wonder if I could actually read some of that paper because it’s kind of mind blowing. It says, “according to parental ratings, those who delayed longer are more verbally fluent, use and respond to reason, are attentive and able to concentrate, are planful and think ahead, competent, skillful, resourceful, and initiating activities.” I mean it goes on for another 10 lines. It’s kind of mind boggling that all of this stuff is apparently correlated in some way to the length of time that somebody can wait for a marshmallow.

Dr. Watts:    [14:59]

It is. It’s really impressive. And the SAT score correlation was like 0.6…

Jen:    [15:04]

Which is pretty high.

Dr. Watts:   [15:06]

Yeah. You look at a lot of statistics, behavioral science stuff. I mean, you know, that’s a huge correlation.

Jen:  [15:11]

Yeah. Yeah. So I think, just to put that in context for anyone who doesn’t read as many papers as I do, I think 0.3 is sort of where you start to say, okay, that’s weakly correlated. Right? And 0.3 to 0.6 is more sort of okay, we’re pretty sure there’s an effect here.

Dr. Watts:     [15:28]

Yeah. There are these kind of like sort of guidelines that are sort of old for how to discuss the size of a correlation, but I think everyone’s expectations have kind of come down on that over time. So if I were to see a 0.3 correlation between something like the marshmallow test and a later SAT score, I would say it was huge.

Jen:   [15:46]

When there are so many other potential variables that could impact that SAT score to have a higher impact. Yeah.

Dr. Watts:  [15:52]

Even if you don’t adjust, even if you know that you’re getting something that we would say is sort of biased that’s not adjusting for other variables, just the fact that you’re able to get that kind of signal would be pretty impressive. So they got something twice the size of that.

Jen:   [16:05]

Yeah. And there was also another study that found that each additional minute that a preschooler delayed gratification predicted a 0.2-point reduction in BMI, body mass index, in adulthood, presumably because you’re better able to control your food intake.

Dr. Watts:  [16:21]

Well that’s probably the finding that’s the most intuitive, right?

Jen:  [16:22]

Yeah. Yeah.

Dr. Watts:  [16:25]

Yeah. So that’s important to note, that they kept following that sample of kids from Stanford into adulthood. So the study has had many lives, because I think they found them in adolescence and then when they saw those results they thought, okay, we need to stay in touch with these people. So they’ve sort of kept reporting on them into middle adulthood as far as I know. And then public interest in this study has kind of risen up at different times; I think seven or eight years ago it sort of peaked again, and I’ve been kind of speculating that that was partly due to probably YouTube, because people can start watching videos of kids taking the Marshmallow Test…

Jen:  [17:04]

Some of them are pretty funny.

Dr. Watts:   [17:06]

Yeah, and they’re great and you know, if you haven’t seen them, you absolutely need to see them; they’re wonderful. So we could start watching those videos and then I just think public interest sort of swelled yet again. And we started having these theories like Grit come online, which is probably related. Yeah. So now we’ve been in kind of a phase where, you know, the Marshmallow Test is wildly famous again. It’s really unusual for psychology studies to have that kind of attention and that sort of lifespan.

Jen:  [17:33]

Yep. Okay. Alright. So I want to sort of step back from the longitudinal stuff for a minute and go back to some of the potential criticisms and clarifications of that original study. Now we talked about the small sample size drawn from a very non-representative sample of the general population. I know that when Professor Duckworth did some work on this, she used stickers instead of marshmallows and she found that the amount of time a child could wait for more stickers was actually only very weakly related to a child’s performance in a real life delay. And of course that made me immediately think of Urie Bronfenbrenner who said that “developmental psychology is the science of strange behavior of children in strange situations with strange adults for the briefest possible period of time.” So I wonder if we could talk for a minute about the real world applicability of the Marshmallow Test.

Dr. Watts:   [18:23]

Yeah, well, and we mention in the paper that there’s been a lot of work done on the Marshmallow Test, especially in the past probably 15 or 20 years, and a lot of it is really sort of digging into the question: what does this thing measure, and how indicative is this test of a kid’s self control or willpower? Right? And you know, those are kind of different constructs, right? We say them in the same breath, but we could imagine self control and willpower not necessarily describing the same sort of characteristics in a human being. And Angela Duckworth has done a lot of really good work too, some of it with the sample that we used, trying to figure out: is what’s important about delayed gratification sort of a cognitive component? Mischel had theorized that it was sort of the kids’ ability to come up with these strategies to help them delay gratification. You know, if a kid is able to recognize that there is this impulsivity rising up within them and they’re able to quiet that down by singing a song or distracting themselves or thinking about something else or, if you see the videos, they’ll close their eyes really tight to not look at the marshmallow. He sort of suspected that maybe it was a cognitive ability that was really driving the prediction, and Duckworth has looked into that. The way we do these things statistically is with these kinds of factor analyses, and we see if the gratification delay test relates more to cognitive ability or relates more to maybe measures of kids’ personality, and she kind of found evidence that it was maybe relating to both things. And then I think another version of the study that’s been talked about a lot recently, and has been talked about in the context of our findings too, is that some researchers thought, well, maybe trust is a big element here.

Dr. Watts:  [20:11]

And if you undermine a kid’s trust in the task, like you have the experimenter tell a lie, right, and the kid views them as lying before they then tell them that they’re going to get an extra marshmallow if they wait; if you give the kid a reason not to trust the experimenter, then the kid won’t wait. Right. Which is a really important insight, right? Yeah. And I think the point of that was that, you know, there are kids that may come from environments, depending on their home life or even their life at school, where their trust in their environment and their trust in adults has been eroded, and if the future isn’t easily predictable, then it may make sense not to wait and it may actually be the rational response. Yeah. So there’s been all sorts of really interesting stuff that’s cropped up around the Marshmallow Test, and our study doesn’t directly address it. I think it’s kind of situated within that literature, but we were really after the longitudinal component. But all of that I think is really important to keep in mind.

Jen:   [21:11]

Yeah, for sure. And Professor Duckworth I think also found a relationship between shyness and the findings, because I think she hypothesized that if a child is shy, they’re going to freeze up in the face of a researcher that they don’t know, that they haven’t met before, who’s giving them instructions. And so that could be a link as well: a reason that’s completely unrelated to their ability to delay gratification that could explain some of the findings. And then another one that I found really interesting was that the way you present the rewards can impact the results. There was a study that I think presented a sticker to a child and said, here’s the sticker you can have right now if you want it. And then they put four more stickers in that same pool. Instead of having one sticker on one side and five stickers on the other, they put all five in the same pool to identify the reward you can get if you wait. And that one actually found that the three year olds outperformed the four year olds on their desire to wait for five stickers. And so, I mean, that blew my mind again; what you think you’re looking at is changes in cognitive processes as children get older. And actually what it might be is an artifact of how the study was designed.

Dr. Watts:  [22:17]

Sure. Absolutely. Absolutely.

Jen:     [22:21]

So there’s definitely a lot going on with this. Okay, so those are some of the criticisms and things we need to keep in mind as we’re looking at the results of these studies. And so I think what it’s tempting to do, and what policymakers have attempted to do, is to transfer what we’ve learned from these studies to academic outcomes, and one of the papers on this topic started out, and I’ll quote it: “An ideal student who routinely goes home after school, has a snack, studies until dinner (i.e. stays on task), then continues studying until bedtime is likely more academically successful than one who is not focused on schoolwork.” And I think that what we’re trying to do here is to take the results of the Marshmallow Study, which is pretty intrinsically motivated (I want that marshmallow), and apply them to schoolwork, which is something that is very extrinsically motivated, because either your parents are telling you you have to do your homework or you have to get the grade, because a lot of students don’t know what they want to do with their lives. And so I wonder how valid is it to take this look at what seems to be intrinsic motivation and apply it to a situation that is very much more concerned with extrinsic motivation. What do you think?

Dr. Watts:  [23:33]

Well, it’s a great point, and I think that the way that the test has often been interpreted is that if you’ve gained this skill at age four, it becomes this kind of internalized personality dimension, so that when you’re in a situation where you know that you could do something now that may be fun or may be gratifying, but you can wait a little bit longer and sort of postpone it and get back to work, you’re going to be more successful. So I think you’re right, I think it’s an important nuance to think about whether there’s an intrinsic or extrinsic reward, but I think the story about the Marshmallow Test has always been so appealing, probably especially in the United States, because it kind of speaks to this sort of pull yourself up by your bootstraps idea, right?

Dr. Watts:  [24:18]

This ability to kind of take control over your environment and your life. Sort of keep your head down and work hard, right?

Dr. Watts:  [24:27]

And you know, I don’t want to put words in Walter Mischel’s mouth, because I think sometimes he’s been a lot more careful when he’s written about this, not surprisingly, than when other people have written about it. But I’ve mentioned to a few other folks that I’ve talked to that, as I was going through the responses to the study when we released it, I found this charter school, I think in Houston. It’s a pretty large school, and there’s a part of their website that’s got information for parents, and they have sort of advice to parents, and they put on there something… I’m going to mess up the quote, but basically to the effect of: if you can teach your kid one thing, teach them delay of gratification. And then they start to talk about the results of the Walter Mischel longitudinal work. And that’s exactly the thing that we were after. That’s what our focus was on.

Jen:    [25:18]

Okay. So let’s start to get into that, then.

Dr. Watts:   [25:21]

Yeah, yeah.

Jen:   [25:21]

So you describe your study as a, quote, “conceptual replication” of the marshmallow study, although when you read about it in the media, that has usually been shortened to “replication,” for example in the press release on your study as well as in the Atlantic article that was published a few days ago. What is the difference, and why does it matter?

Dr. Watts:   [25:39]

Yeah, I think it’s crucial, and I’m learning that it’s more crucial. I think I was probably a little too blasé about that early on, or a little too casual. So your listeners may be familiar with this: there’s this phrase that goes around right now in psychology, and also in medical science too, but in the social sciences, which is where I work, it’s been a big deal. There’s this thing called the replication crisis or the reproducibility crisis. It’s hard to pin down exactly what that means, but the gist of it is that we sort of publish really flashy findings and we talk about them and make a lot out of them, and then someone else will come along and try to reproduce those findings and not be able to do it. And so that leaves you thinking that the findings from the original study were somehow inflated or big by chance, and there are a lot of reasons why that could happen. So everybody kind of agrees that replication is important. Once you start thinking about doing a replication, you realize how difficult it is to define what you’re doing, because in a sense a replication, defined in the most narrow way, could be one where you just take someone else’s data and you just analyze the statistics on your computer, right? You just do this statistical work on your computer…

Jen:    [26:56]

Which some people do do.

Dr. Watts:  [26:57]

Yeah. And it’s kind of like a “same data, same question, my computer” kind of thing, right? So that’s a really narrow version of a replication. Then you could do something where you basically do the exact same study, with the very same methods. So in this case it would be to give the very same version of the Marshmallow Test and follow up the exact same way, right? Years later, ask the very same questions of parents and report on that. That would be another kind of replication that’s also fairly narrow, but you just do it with a different sample.

Jen:    [27:31]

If the original study had happened fairly recently, then you might expect to get a pretty similar result. Although I would expect, given the distance in time from the original study, that even if you used the same script (and the Bing Nursery I think is still there), even if you pulled from that population, your results would probably be different. I mean, just for example, kids had a lot less sugar in their diets in those days, so that could potentially impact it.

Dr. Watts:  [27:54]

No, that’s exactly right. So there are all sorts of historical differences and cohort differences that you would expect. Yeah. And then what we did, I think we’re a little closer to this, which I think is probably what the field actually needs, although you can make an argument that we need all of these things, is, you know, let’s look at the conclusions of the study, or look at the way a study has been interpreted and the knowledge that’s been gleaned from it, and let’s take a different approach to trying to arrive at the same conclusion. Right? So let’s see if we can go after the same question, and maybe we vary the methods and the sample as well. So I think we’re a little closer to that than we are to a really hardcore, straight up “we just did the same thing but with different kids.” So that’s why, in the title of the paper (and I think it was one of the reviewers or the action editor of the article at Psych Science who suggested we do this), we called it a conceptual replication, right? Because really we’re trying to replicate conceptually the same thing that the original study was doing, but we had to come at it from a few different angles, both due to limitations and because we actually wanted to change the statistical methods.

Jen:    [29:13]

Okay. Alright. So let’s talk about data then, because I think you didn’t actually collect this data yourselves. It came from a government data set, and that worked really well in your favor because the government has the money to fund these massive studies that would be incredibly difficult for a few researchers to get money for. But I think that the government scientists also made some design decisions that impacted your findings, right? Can you tell us about those?

Dr. Watts:  [29:37]

That’s right. So this study was conducted, and the data collected, by a team of developmental psychologists who first went to NIH and asked for funding to follow a fairly large sample of kids from 10 different sites across the U.S. from birth into, I think originally they were planning to go into early childhood, like around age three or four…

Jen:   [29:58]

Sorry, that’s the National Institutes of Health, right?

Dr. Watts:    [30:01]

Yeah, the National Institutes of Health, which is sort of the scientific arm of the federal government, or one of the scientific arms of the federal government. And so they got the money to do this. It was mainly to study childcare, and there were a lot of debates happening at that time around whether childcare was good for kids, or whether sending your kid to daycare for the entire work day at a young age would maybe have adverse developmental effects. So they were really interested in this question.

Dr. Watts:  [30:28]

So they collected a ton of information on parents and kids at the time of the kids’ birth, and then they followed the parents and kids into early childhood and collected information on the kids’ early environments. And fortunately for us they decided to do this Marshmallow Test, probably because the early longitudinal findings from 1990 had just come out. These kids were sampled at birth in 1991, so the longitudinal findings from the Mischel studies were just getting publicized in the early nineties. So they thought it would be an important measure to collect, and they did the Marshmallow Test with kids at age four, and then they kept going back to NIH and getting additional funds to keep following the kids into adolescence. So they did many waves of data collection in between birth and age 15, which is what we ultimately looked at as our main time point for the outcome measures in adolescence.

Dr. Watts:   [31:27]

So what’s nice about this data set is, like I said, they collected a bunch of information on parents and their families, which allowed us to do the statistical controlling that we can get into in a second, which I thought was really the key contribution of this study. And because they were collecting a whole bunch of measures and the data set was not at all focused on the Marshmallow Test, they gave a sort of shorter version of the Marshmallow Test and they stopped the test at seven minutes. So if a kid waited for seven minutes, then the test ended, and we were at first really concerned about this and worried that it would preclude us from being able to do what we wanted to do, because the original children had waited much longer. Right? Some had. I mean, you can see in some of his original studies, he reports an average wait time of one minute. I don’t know if you noticed that; it’s in the early seventies. And then over time, it’s interesting to think about what may have happened, maybe he got better at giving the test, but kids were waiting a little bit longer, and then in the later versions of the test, to my knowledge, he was letting some kids wait as long as 15 minutes, maybe 20 minutes. Which, if you sit in a room alone staring at a marshmallow, oh my gosh, it’s a long period of time. Even if you’re an adult, right? If you’re an adult, you probably can’t stay off your phone for 20 minutes, right? Long enough to sit in the room. So anyway, they stopped the test at seven minutes, and when we first started analyzing the data, we wanted to figure out if we could still learn anything from this, even knowing that the test had been capped at seven minutes. And we grew more and more confident as we analyzed the data that we could, and that we could do what we wanted to do, so we can talk about that in a second.

So kids took the Marshmallow Test at age four and then they were followed periodically up to age 15, and I think they’re still planning on doing another round of data collection in adulthood. But we had data through age 15, and so at age 15 they measured math and reading achievement.

Dr. Watts:   [33:21]

It wasn’t a self report of SAT scores, as a lot of kids haven’t even taken the SAT by that age, but they did a sort of math and reading test. It’s called the Woodcock Johnson. This is a really famous cognitive battery that has been studied for a long time. And so we used that to measure academic achievement at age 15. And then there were also mother reports of kids’ behaviors, so there was a report of the sorts of things that seem like antisocial behavior, kind of acting out at school, that the mothers reported on. Then there is something called internalizing, which can be thought of broadly as depressive behavior, depressive symptoms. And then we also looked at measures of behavior reported directly by the kids themselves. So the kids reported on risky behaviors; things that we think are risky behaviors for teenagers.

Dr. Watts:  [34:12]

So drinking alcohol, smoking marijuana, sexual risk taking. And then kids also reported on their own impulsive behaviors. And a developmental psychologist suggested we look at this thing called the stoplight task, which is basically sort of like a game where a kid is trying to get from point A to point B, and they’re told they will be rewarded if they get there as fast as possible, and they encounter stoplights along the way. And the task is really looking at whether they brake and slow down or stop when they see a yellow or red light, or if they try to speed through the stoplight, right? And risk getting into a car crash. And that’s kind of a measure of impulsivity and risk taking as well. So anyway, we looked at all of that at age 15 to try to, like we said, do a conceptual replication of that original Mischel longitudinal study, where they looked at both SAT scores and then, like you read off in the results earlier, those kind of broad dimensions of behavior and personality.

Jen:   [35:14]

Yeah. And so, okay. So that’s a ton of variables.

Jen:    [35:17]

Now I am not a statistician. Fortunately I know people who are, and one of them read this study for me, and I believe that when a researcher, quote, “controls for” a lot of variables, that can impact the results by making outcomes that would otherwise appear significant look insignificant. Or actually I think it’s the other way around, isn’t it? Look insignificant? They shouldn’t be insignificant.

Dr. Watts:   [35:38]

No, you got it right.

Jen:    [35:38]

I did get it right. Okay, thank you. So in a language that a lay person like me can understand, can you help us figure out what choices did you make here and what impact did that have on the results?

Dr. Watts:   [35:49]

Yeah, so this, like I said, was I thought the key sort of innovation of our study. All of those variables that I just listed off, measured at age 15, those are the outcomes, right? That’s what we’re trying to see: does waiting longer on the Marshmallow Test impact or affect all of these outcome measures at age 15? The control variables are all things that were measured primarily before the kid took the Marshmallow Test, or measured at the same time that the kid took the Marshmallow Test, so either measured at age four or measured before. And why are these important? This is all about interpretation. It may be the case that the Marshmallow Test predicts later achievement or later outcomes, but you don’t know if it predicts later outcomes because the Marshmallow Test is kind of symptomatic of other things going on in a kid’s life. Like, say, kids that have really great, attentive, structured parenting environments are able to wait longer on the Marshmallow Test, and those same kids also have many markers of success later on in life, but it’s not really the Marshmallow Test that is driving the later success. It’s actually the parenting, right? That’s what probably everyone listening to your podcast would want to hope, right, that it’s the parenting that’s really shaping everything?

Jen:   [37:10]

Well, yeah! for sure.

Dr. Watts:    [37:13]

And so you see all these things that are sort of the result of the parenting, and you can think that those are the things that are causing the later outcomes. But actually you would be making what we call a confounding error, right? In statistics, it’s like there’s a third variable that’s really explaining everything. The third variable is what’s explaining being good at the Marshmallow Test, and it’s also explaining the markers of success later on. So what we wanted to do was try to control for as many of these third variables as we could. We were able to take measures of race and ethnicity, so the identified race and ethnicity of the kid; the gender; measures of family income; mother’s education; and measures taken of the kid at birth, like the kid’s birth weight. The mothers also reported on a very early measure of the kid’s temperament, which is like, was this kid a fussy baby or a sort of quiet, easy to please baby.
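The third-variable problem described above can be illustrated with a small simulation. This is a minimal sketch with invented numbers, not the study's model or data: the variable names, the 0.6 weights, and the sample size are all assumptions chosen for illustration. A single hypothetical "parenting" score drives both wait time and later achievement, and wait time has no direct effect at all. The raw slope still looks positive; it shrinks toward zero once parenting is held constant.

```python
import random


def slope(x, y):
    """Ordinary least-squares slope of y on a single predictor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """What is left of y after removing its linear relationship with x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = slope(x, y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]


def simulate(n=20000, seed=0):
    """Return (raw slope, slope after controlling for parenting)."""
    rng = random.Random(seed)
    # Hypothetical third variable: quality of the parenting environment.
    parenting = [rng.gauss(0, 1) for _ in range(n)]
    # Both measures depend on parenting; delay never enters achievement.
    delay = [0.6 * p + rng.gauss(0, 1) for p in parenting]
    achievement = [0.6 * p + rng.gauss(0, 1) for p in parenting]

    raw = slope(delay, achievement)
    # Controlling for parenting: strip parenting out of both variables,
    # then relate what remains. This gives the same coefficient on delay
    # as a regression that includes both delay and parenting.
    adjusted = slope(residuals(parenting, delay),
                     residuals(parenting, achievement))
    return raw, adjusted


if __name__ == "__main__":
    raw, adjusted = simulate()
    print(f"raw slope: {raw:.3f}, adjusted slope: {adjusted:.3f}")
```

In real data the confounders are many and imperfectly measured, so adjusted estimates rarely fall this cleanly to zero; the sketch only shows the direction of the logic.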

Dr. Watts:   [38:11]

We also had a pretty early measure of kids’ cognitive ability taken at age 24 months, and we had this thing that was an innovation for the data set that we used, which was a measure of the home environment, where an observer actually came into the home and observed the kid interacting with the parents. Right? And was looking for things known to be positive parenting behaviors, as well as markers of having an enriching home environment, like a lot of children’s books in the home and toys for the kids to play with, things like that. So we first looked at the relationship between delay of gratification and later outcomes not controlling for anything. Right? So we didn’t adjust for any of those other variables, and we just looked at the raw “do kids who wait longer have better outcomes?” And we saw, sure enough, that they had higher achievement scores on math and reading at age 15, but these relationships were much smaller than what was in the original paper.

Jen:     [39:07]

Oh, really?

Dr. Watts:  [39:08]

Right. Yeah. So it was… I don’t want to mess it up. I’m not looking at my paper in front of me right now…

Jen:     [39:13]

You don’t have it memorized?

Dr. Watts:   [39:14]

I know; I should have by now. It was, I’m fairly certain, less than half the size of what Mischel and Shoda had reported. Although like I said earlier, that’s still something. Right? And you know, we were still impressed with that correlation even though it was half the size. And then surprisingly… one of the really surprising findings of the paper was that we didn’t find any relationship, even without any statistical controls, for any of the behavioral outcomes. So all those behavioral outcomes that I just listed off, like the stoplight task and kids reporting on their risky behaviors and the mothers’ report of the kids’ behaviors: none of that. None of it. I think there may have been one lone variable that had a significant effect, but for the most part we found no effects.

Jen:   [39:57]

Okay. Why? What’s going on?

Dr. Watts:  [40:00]

That was really puzzling, and you know, I mean, we weren’t expecting to find that. I think our prior hypothesis going in was that we would find bigger effects for the behavioral outcomes than for the cognitive outcomes, because I think we were going in thinking of delay of gratification as being kind of a personality or behavioral thing. Economists think of these things as noncognitive skills rather than cognitive skills, which psychologists hate because they would say everything is cognitive. So we were kind of expecting that going in, but we were surprised that we didn’t find it, and that was just one of the interesting, puzzling findings of the paper. So we focused most of our analyses… We then split our sample and looked primarily at kids whose mothers had not completed college, and that was about 550-something kids who were in the sample, who had the Marshmallow Test and measures of later outcomes, and whose mothers had not completed college by the time of the birth.

Dr. Watts:  [40:58]

We did that for two reasons. One, because we thought it was a conceptually interesting sample to focus on, because all of the Mischel stuff, as we said earlier, was mainly derived from the sample of kids whose mothers were part of the Stanford community, right? So we thought that it was really complementary to look at a sample of kids whose mothers hadn’t completed college. And because, you know, when we talk about educational policy and when we talk about interventions, we’re often thinking about kids from more disadvantaged backgrounds, right? And so we thought that was an important group of kids for that reason. We also did it because of the seven minute measurement problem that I mentioned earlier. So among those kids whose mothers had completed college, almost 70 percent of them hit the seven minute mark on the measure.

Dr. Watts:   [41:48]

So that means that they waited the full length of time and the measure ended, right? The Marshmallow Test ended. So from the statistician’s point of view, that’s a major problem, because you need variation in order to do all of this. Right? And so if most of the kids are sitting at seven minutes, you don’t really know who’s better at delaying gratification among those kids, because they’re all sitting at seven. But in the lower socio-economic status sample, the kids whose mothers hadn’t completed college, they were much less likely to hit the ceiling. So I think only about 40 percent of them waited the full length of time. And then among the other 60 percent of them that didn’t wait for seven minutes, there was really nice variation in how long they waited. So there was, like we say in statistics, a nice distribution.
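The ceiling problem described above can be sketched with a toy simulation. Everything here is an assumption made for illustration: the latent "self-control" score, the mean and spread of the wait times (picked only so that roughly 70 percent and 40 percent of each hypothetical group reach the cap, echoing the shares mentioned in the conversation), and the link to the outcome. None of it comes from the study's data. The point is simply that when wait times are cut off at seven minutes, the recorded wait tracks the outcome less well than the uncapped wait would.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_group(mean_wait, n=20000, cap=7.0, seed=1):
    """Return (share at the cap, r with uncapped wait, r with capped wait)."""
    rng = random.Random(seed)
    # Hypothetical latent self-control score for each child.
    skill = [rng.gauss(0, 1) for _ in range(n)]
    # Minutes the child would wait if the test never ended (not below 0).
    true_wait = [max(0.0, mean_wait + 4.0 * s) for s in skill]
    # What gets recorded when the test stops at the cap.
    observed = [min(w, cap) for w in true_wait]
    # A later outcome that depends on the same latent score plus noise.
    outcome = [0.5 * s + rng.gauss(0, 1) for s in skill]

    share_at_cap = sum(w >= cap for w in observed) / n
    return share_at_cap, pearson(true_wait, outcome), pearson(observed, outcome)


if __name__ == "__main__":
    for label, mean_wait in [("more kids at the cap", 9.0),
                             ("fewer kids at the cap", 6.0)]:
        share, r_true, r_obs = simulate_group(mean_wait)
        print(f"{label}: {share:.0%} at cap, "
              f"r uncapped {r_true:.2f}, r capped {r_obs:.2f}")
```

The more children pile up at the cap, the more information about differences among them is lost, which is one way to read the choice to focus on the group with more spread in wait times.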

Dr. Watts:   [42:37]

So there were kids that didn’t wait at all, then kids that waited for one minute or two, three, four, five, six, so we could really get a much better gradient of gratification delay. So there was the conceptual reason that we focused on that sample, and then there was also the statistical measurement reason. What we found, like I said, was that waiting longer among those kids did predict later achievement. And so then we started introducing these control variables. What we did was we added to the model all the measures that I listed off earlier: measures of the home environment, measures of the kid’s birth weight, race, ethnicity, gender, family income measured between birth and age four, and that early cognitive measure taken at 24 months. And so what that model does is it says, let’s take two kids who have the same parenting environment, the same race and ethnicity, the same gender, the same early cognitive skills.

Dr. Watts:  [43:37]

And let’s say that one of them is able to delay gratification a little bit longer than the other. Does that difference matter? Right? Once you’ve set all those other things equal, does the difference in delayed gratification actually matter over and above all of those other factors? And what we found was, again, a much smaller prediction to academic achievement. It was still statistically significant, but it was fairly small. We have a fairly big sample, so a small effect could still be statistically significant. So the size of that correlation got much smaller than in the model where we didn’t control for anything, which means that those other factors were largely driving the prediction to later academic achievement, and we still found no prediction to the later behavioral outcomes. Then we tried one more model that was a more rigorously controlled model, where we also controlled for measures taken at preschool, so measures taken at the same time as the Marshmallow Test, of kids’ math achievement, reading achievement and a measure of their behavioral adjustment at that time.
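The remark that a small effect can still be statistically significant in a fairly big sample can be checked with a short calculation. This is a sketch under stated assumptions: the correlation of 0.1 is an invented value, not a figure from the paper; the sample size of 550 echoes the "550-something" mentioned earlier; and 1.96 is the usual large-sample two-sided 5 percent cutoff, used here as an approximation. The formula is the standard t statistic for testing whether a Pearson correlation differs from zero.

```python
import math


def t_statistic(r, n):
    """t statistic for testing a Pearson correlation r against zero."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


if __name__ == "__main__":
    cutoff = 1.96  # approximate two-sided 5% critical value for large samples
    r = 0.1        # hypothetical small correlation, chosen for illustration
    for n in (50, 550):
        t = t_statistic(r, n)
        verdict = "significant" if abs(t) > cutoff else "not significant"
        # With r = 0.1 this gives about 0.70 for n = 50 and about 2.35 for n = 550.
        print(f"n = {n}: t = {t:.2f} ({verdict})")
```

The same small correlation falls well short of the cutoff with 50 children and clears it with 550, which is why the size of an effect and its statistical significance need to be read separately.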

Dr. Watts: [44:43]

So that’s basically saying, if we also control for other characteristics of the kid at age four, then does being able to delay gratification matter over and above their general cognitive ability and their behavioral adjustment at that time? And then we found that it was a pretty well estimated zero to later academic achievement. Right? So, you know, we had a pretty decent confidence interval around that effect and it was not statistically significant. Again, it doesn’t mean that delayed gratification doesn’t predict; it just means that when you control for other characteristics of the kid, delayed gratification doesn’t seem to be uniquely important on top of those other characteristics.

Jen: [45:29]

Right. Okay. Alright. So we’re heading rapidly here towards where I want to go, which is: what message should we take home from the body of work on this? And I’m going to tell you what message I take away from it, and then I want you to tell me if I’m right. So it seems to me as though the message is that some rich children can better resist the marshmallow than others, and that of all those rich children, the ones who can resist the marshmallow may be more likely to have better life outcomes, as best we can tell from a pretty tiny sample size. But we really don’t know as much about how poor children will respond to the marshmallow, and we might not even be able to teach poor children to resist the marshmallow because of how their life experiences have shaped them. And even if we could, resisting the marshmallow is unlikely to be the key, or even an important key, in helping them to achieve better life outcomes. In contrast, I should say, to the Houston school that you mentioned that is telling parents, if you teach your child nothing else, you should teach them to resist the marshmallow. So are you taking the same message out of all of this as I am?

Dr. Watts:  [46:30]

I think much of what you said is highly plausible and could be interpreted from our study mainly, you know, and from some of the other ones that we talked about earlier. I think the main takeaway is sort of what you said at the end, which is the question of: even if we can teach this and we decide that we should, is teaching this going to make much of a difference? Right? And that’s where I think our study really has something to say, and it’s sort of saying that if you worked to create a program that, say, provides kids with strategies to delay gratification, right? You give them strategies to help them figure out how to do better at the Marshmallow Test, but you don’t change other aspects of their life. Right? Whether that be their general cognitive ability or other behavioral aspects or their parenting environment…

Jen:  [47:23]

Or poverty…

Dr. Watts:  [47:23]

Right…or their parenting environment or their socioeconomic situation, family income, mother’s education. If you don’t change any of that stuff, the fact that you changed their ability to delay gratification probably isn’t likely to have much effect.

Dr. Watts: [47:36]

Right. And so I think that, to me, is the key finding. That’s not to say that it’s not a worthwhile life skill, or that there aren’t times when you need to be able to do this. I mean, certainly any adult in the working world today who’s constantly online in front of their computer, in a space with, you know, notification after notification and email and something to click on… you’re constantly faced with this task really all day long. I’m not saying that it’s not important at all. What I’m saying is just, is it something that we should be worried about teaching four year olds in isolation if we want to really get at important developmental outcomes later on? I think that’s where we would say that’s probably not the first thing that we would choose.

Jen:    [48:22]

Gosh, I hope there are policy makers listening right now, preferably ones who have gotten growth mindset instituted in schools in California and are rating teachers on children’s growth mindsets…

Dr. Watts:  [48:34]

Right. And I also want to be careful, because we had different interpretations for the different models that had different variables controlled. So, you know, there was the model that just controlled for kids’ background characteristics from their home and family and that early measure of cognitive ability, but didn’t control for other factors measured at the same time. This is a little bit nuanced, but we still did find a statistically significant prediction to achievement for the gratification delay test, for the Marshmallow Test, with that model. And to the best we can tell, this data does not really allow for causal claims to be ironclad. So you have to say all of this with the big caveat that nothing here was experimental.

Jen:   [49:22]

Yep. So we can’t say that delay of gratification caused the better outcomes. Okay.

Dr. Watts:  [49:26]

Yeah. So even though we’re trying to scratch at that by introducing controls, and we’re certainly trying to push ourselves in a causal direction, we’re still not all the way there, right? An experimental study is really the only way to start to know. But the fact that controlling for the background characteristics didn’t kill all of the prediction to later achievement suggested that if you had an intervention that changed gratification delay, but also changed broader aspects of the kid at the same time, broader behavioral and cognitive capacities of the kid, you know, we can’t rule out that that would have an effect. Right? But then you’re probably thinking about delayed gratification as a component of something broader and larger, rather than a narrow intervention just teaching that skill. Does that make sense?

Jen:   [50:17]

Yep, yep. Absolutely. So I think the ultimate take home message then for parents is: don’t worry too much about the Marshmallow Test…

Jen:     [50:24]

It’s a good skill to have, but don’t freak out if your child can’t resist gratification yet; they’ve got a long way to go and there are many other skills that are also important.

Dr. Watts:  [50:32]

No, that’s exactly right. If you’ve got a four year old who doesn’t want to wait, you don’t need to be too concerned. Yeah.

Jen:    [50:40]

Yeah. Okay. All right. And on that note, thank you so much for helping us to understand all this and really get to the bottom of what’s going on. I’m so grateful for your time.

Dr. Watts:  [50:48]

It was fun. Thank you.

Jen:   [50:49]

And so our listeners can find all the references for today’s episode, and there are lots of them, at YourParentingMojo.com/Marshmallow, if you can resist it.


Also published on Medium.

About the author, Jen

Jen Lumanlan (M.S., M.Ed.) hosts the Your Parenting Mojo podcast (www.YourParentingMojo.com), which examines scientific research related to child development through the lens of respectful parenting.