This was the provocative title of a recent Telegraph article that caught our eye. The article reports the findings of a study by the University of Westminster that examined the ability of the general public to recognise cases of depression. The study included 1,218 participants who were each presented with a paragraph describing symptoms representative of a mental health disorder. The paragraphs were all identical, except that half described the symptoms of ‘Jack’ and the other half of ‘Kate’.
The initial result that the Telegraph reported was: “Fifty-seven per cent thought that ‘Kate’ was suffering from a mental health problem, while only 52 per cent thought that ‘Jack’ was.” Looking at these headline numbers, we were quite surprised that they were sufficient to draw any definitive conclusions in the study; our gut feeling was that the difference between 57% and 52% was probably not statistically significant. To check this we can use a statistical hypothesis test. In this case we use a Binomial test for two proportions, which tests the null hypothesis that there is no difference between the two underlying proportions. As suspected, we found that there was insufficient evidence to reject this hypothesis, and we have to conclude that the two proportions are not significantly different (at least not at the usual 5% significance level, i.e. 95% confidence).
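For readers who want to try this themselves, here is a minimal sketch of such a check using the normal approximation to the two-proportion Binomial test. It is not necessarily the exact calculation we ran: the function name is our own, and the counts are assumptions reconstructed from the rounded percentages, with the 1,218 participants assumed to be split evenly between ‘Kate’ and ‘Jack’.

```python
import math

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for equal proportions, using the pooled standard error."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# ASSUMPTION: an even split of the 1,218 participants (609 per group).
n_per_group = 1218 // 2
# ASSUMPTION: counts rebuilt from the rounded 57% and 52% in the article.
kate_yes = round(0.57 * n_per_group)
jack_yes = round(0.52 * n_per_group)

z, p_value = two_proportion_z_test(kate_yes, n_per_group, jack_yes, n_per_group)
print(f"z = {z:.3f}, two-sided p-value = {p_value:.3f}")
print("significant at 5%" if p_value < 0.05 else "not significant at 5%")
```

With these assumed counts the p-value falls above 0.05, in line with the conclusion above; the study's true counts may differ slightly from our reconstruction.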
The result was close to being statistically significant, but what would it have taken to get a significant result? This would be achieved either by a greater difference between the proportions or by having more participants in the study. With a bit of reverse engineering we can work these numbers out. We find that if either 4 more participants had found ‘Kate’ to be suffering from a mental illness or 4 fewer had found ‘Jack’ to be suffering, the difference between the overall proportions would have been significant. (These would have led to sample proportions of 58.6% and 51.3% respectively.) Alternatively, if we kept the proportions constant, we would have needed 1,525 participants in the study, an increase of just over 25%, to achieve a significant result.
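The sample-size part of this reverse engineering can be sketched as follows. This is an illustration rather than our original working: it assumes two equally sized groups, the rounded proportions of 57% and 52%, and the same pooled z-test approximation, and the function name is hypothetical.

```python
import math

def required_total_sample_size(p_a, p_b, z_critical=1.959964):
    """Smallest total sample (two equal groups) at which the observed
    difference p_a - p_b reaches the two-sided critical value."""
    pooled = (p_a + p_b) / 2          # pooled proportion for equal group sizes
    difference = abs(p_a - p_b)
    # Solve z_critical = difference / sqrt(pooled * (1 - pooled) * 2 / n) for n.
    n_per_group = 2 * pooled * (1 - pooled) * (z_critical / difference) ** 2
    return math.ceil(2 * n_per_group)

# ASSUMPTION: proportions held fixed at the rounded values from the article.
required_total = required_total_sample_size(0.57, 0.52)
print(required_total)
print(f"increase over 1,218 participants: {required_total / 1218 - 1:.1%}")
```

Under these assumptions the sketch gives 1,525 participants, matching the figure quoted above.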
So how did this study become a story if the result itself was not significant? We went back to the original research paper to find out a bit more about the statistical basis for the article. What we found was that our interpretation of the Telegraph’s article was incorrect; we had assumed that respondents could specify only that ‘Jack’ or ‘Kate’ either were or were not suffering from a mental health disorder. In fact there was an additional option available: a respondent could also indicate that they were unsure. The results of the study are summarised in the following table.
With these data, we need to use a different test as we are no longer dealing with a simple yes or no response. In this case a Chi-squared test can be used to assess whether the pattern of responses (yes, no or unsure) was independent of whether the subject was male (i.e. ‘Jack’) or female (i.e. ‘Kate’). Using this test the researchers found that there was “a significant difference in responses to the question of whether the described individual suffered from a mental health disorder as a function of the gender of the described individual”.
The subtlety here is that, although the proportions of respondents who did identify suffering or were unsure do not exhibit large differences between ‘Kate’ and ‘Jack’, combining the responses leads to a significant result due to the larger differences in the third option (those who said that the subject was not suffering from mental illness). In other words, the interesting numbers are not, as the Telegraph reported, that 57% and 52% of respondents identified that ‘Kate’ and ‘Jack’ were suffering from a mental health disorder, but rather that 10% did not identify suffering in ‘Kate’ compared to 21% in ‘Jack’. Put another way, respondents were about twice as likely to rule out mental health problems in a male patient as in a female patient. By focussing on the former pair of numbers, the Telegraph introduced an element of confusion in their article, as these numbers do not themselves support the overall conclusions of the study.
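To see how the third response option drives the result, here is a rough sketch of a Chi-squared test of independence on a 2×3 table. The counts are not the study's data: they are illustrative values we rebuilt from the rounded percentages quoted above, assuming 609 respondents per vignette and taking ‘unsure’ as whatever remains. The function name is our own.

```python
import math

def chi_squared_independence(table):
    """Pearson Chi-squared statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * column_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

# ASSUMPTION: 609 respondents per vignette; counts rebuilt from rounded percentages.
n = 609
kate = [round(0.57 * n), round(0.10 * n)]   # yes, no
jack = [round(0.52 * n), round(0.21 * n)]   # yes, no
kate.append(n - sum(kate))                  # unsure (remainder, an assumption)
jack.append(n - sum(jack))                  # unsure (remainder, an assumption)

statistic, dof = chi_squared_independence([kate, jack])
# With exactly 2 degrees of freedom the upper tail probability is exp(-x / 2).
p_value = math.exp(-statistic / 2)
print(f"chi-squared = {statistic:.2f}, dof = {dof}, p-value = {p_value:.2g}")
```

Note that the closed-form p-value used here only holds for 2 degrees of freedom, which is what a 2×3 table gives; a general-purpose version would need a proper Chi-squared distribution function. With these illustrative counts most of the statistic comes from the ‘no’ column, which is the point made above.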
Overall, the write-up in the Telegraph describing the study was informative; however, we can’t help but feel that the most important statistics could have been more prominent and that the title is slightly misleading. Other articles that perhaps did a better job include: “Study Reveals Gender Gap in Spotting Depression” (Health Day) and “Depression in women identified more easily” (the British Psychological Society).
Articles such as this highlight that, whilst reporting statistical findings in the mainstream media has improved greatly in recent years (no doubt helped by the Royal Statistical Society’s getstats campaign and organisations like the Science Media Centre), inconsistencies do still remain and it’s often worth looking at the facts yourself! Looking at this article in more detail was also a good opportunity for us to go back and think about our intuition surrounding statistical significance and the importance and effect of sample size. We’ve only been able to touch on this issue briefly here, but see our “Importance and Effect of Sample Size” blog for further discussion on this topic.