About Nobel laureates and chocolate, correlations and unreliable data

Figure published in Messerli (2012), “Chocolate Consumption, Cognitive Function, and Nobel Laureates”, New England Journal of Medicine (full paper here).

Is the number of Nobel laureates per country correlated with chocolate consumption? This chart has been very popular on the internet since its publication in 2012 in the New England Journal of Medicine. Visually, the little flags seem to line up along an imaginary line, suggesting an amazing correlation between the number of Nobel laureates per 10 million inhabitants and chocolate consumption in these countries.

Before they realized that it was a parody, many articles, blogs and media outlets pointed out that this apparent correlation does not make sense, and the chart quickly became a “reference” for showing the limits of this kind of analysis. These critics rightly note that the number of Nobel laureates per 10 million inhabitants is also “correlated” with GDP per capita, the human development index, the consumption of all kinds of luxury goods, etc.
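The critics' point about a common cause can be illustrated with a small simulation. This is only a sketch on synthetic data: the variable names (wealth, chocolate, laureates), the coefficients and the noise levels are all invented for the illustration and have nothing to do with the real figures. Two quantities that both depend on a third one look strongly correlated, and the correlation largely disappears once the third one is accounted for.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def residuals(ys, xs):
    """What is left of ys after a least-squares straight-line fit on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]


# SYNTHETIC data: 500 imaginary countries, all numbers are made up.
rng = random.Random(0)
wealth = [rng.uniform(10, 60) for _ in range(500)]
chocolate = [0.2 * w + rng.gauss(0, 1.5) for w in wealth]   # depends on wealth only
laureates = [0.5 * w + rng.gauss(0, 4.0) for w in wealth]   # depends on wealth only

raw_r = pearson(chocolate, laureates)
# Correlation that remains once the effect of wealth is removed from both.
partial_r = pearson(residuals(chocolate, wealth), residuals(laureates, wealth))

print(f"raw correlation:             {raw_r:.2f}")
print(f"after controlling for wealth: {partial_r:.2f}")
```

In this toy setting chocolate never influences laureates, yet the raw correlation is high; it is entirely produced by the shared dependence on wealth.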

But the quest for the perfect correlation is not the only problem here! Relying on the graph alone (usually without reading the original paper), critics forgot to look at the data used. The little exercise that follows does not claim to restore “scientific truth”, since this paper is a joke; it is a discussion piece about a visualization that is now part of our “dataviz” popular culture.

Variable sources = imprecise graph

An analysis based on ill-defined data can only be imprecise. The graph below shows how the variability of these data, over a 10-year interval and across several sources, changes the result:

The data are of two types:

Nobel laureates – “A list of countries ranked in terms of Nobel laureates per capita was downloaded from Wikipedia” (Messerli 2012, 1). Here’s the Wikipedia page. Using Wikipedia data is not a problem in itself. The problem is that another page exists, with different counts: List of Nobel laureates by country! The discrepancy reflects a reality: laureates commonly change nationality, and their place of birth is often not their place of residence at the time of the award, especially among scientists, who travel a lot.

Chocolate consumption – “Data on per capita yearly chocolate consumption in 22 countries was obtained from Chocosuisse, Theobroma-cacao and Caobisco. Data were available from 2011 for 1 country, from 2010 for 15 countries, from 2004 for 5 countries and from 2002 for 1 country” (Messerli 2012, 1). Here, the joke is obvious, but it rightly parodies difficulties that statisticians often encounter. These data are very rough estimates provided by industry players; they cover different years and are highly contradictory from one source to another! Sources: theobroma-cacao.de / confectionerynews.com / chocosuisse.ch / caobisco.

What about the correlation?

In fact, the author of the study simply chose one “chocolate consumption” value per country among many possible ones. The chart below summarizes the extreme positions that could have been obtained, compared to the correlation presented in Messerli 2012:
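The idea of “extreme positions” can be sketched in a few lines of code: if each country has several candidate consumption values (one per source or year), every possible choice yields a different correlation coefficient. The example below is only an illustration under stated assumptions. The countries “A” to “E” and all the numbers are hypothetical placeholders, not the data behind the chart, and the function name correlation_range is made up for this sketch.

```python
from itertools import product
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# HYPOTHETICAL values (kg per capita per year): each fictional country has
# several candidate figures, as if they came from different sources or years.
chocolate_candidates = {
    "A": [1.5, 2.5],
    "B": [4.0, 6.5],
    "C": [6.0, 9.0],
    "D": [8.5, 11.0],
    "E": [9.0, 12.0],
}
# HYPOTHETICAL laureates per 10 million inhabitants.
laureates_per_10m = {"A": 1.0, "B": 6.0, "C": 12.0, "D": 25.0, "E": 31.0}


def correlation_range(candidates, outcome):
    """Try every way of picking one candidate per country; return (min r, max r)."""
    countries = sorted(candidates)
    ys = [outcome[c] for c in countries]
    rs = [
        pearson(list(choice), ys)
        for choice in product(*(candidates[c] for c in countries))
    ]
    return min(rs), max(rs)


low, high = correlation_range(chocolate_candidates, laureates_per_10m)
print(f"r ranges from {low:.3f} to {high:.3f} depending on the values chosen")
```

Trying every combination grows exponentially with the number of countries, so with many countries and many candidate values a random sample of combinations would be a more practical way to explore the range.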

Conclusion

The “flags” graph is not only badly thought out, its data are also wrong. Even if this paper was published as a joke for a few colleagues, it does not promote public understanding of science, statistics and medicine.

READ ALSO – If you like the “remastering” of popular data visualization, take a look at the vectorization of Minard’s historical map.

2 Comments

Thomas Lumley on 02/02/2015 at 23:15

Nice work. I pointed out some of the data problems at the time: http://www.statschat.org.nz/2012/10/12/theres-nothing-like-a-good-joke

I think the worst data problem is that the Nobel Prize data is cumulative back to 1901, so the chocolate data is decades too recent to be a good proxy for chocolate consumption at the time.

MD on 29/01/2016 at 14:42

The correlation is simply spurious. Chocolate consumption and level of research both depend on GDP:
http://jn.nutrition.org/content/143/6/931.short

