More Information
Worked Example
A study compares the average capillary density in the feet of individuals with and without ulcers. A sample of 10 patients with ulcers has mean capillary density of 29, with standard deviation 7.5. A control sample of 10 individuals without ulcers has mean capillary density of 34, with standard deviation 8.0. (All measurements are in capillaries per square mm.) Using this information, the p-value is calculated as 0.167. Since this p-value is greater than 0.05, it would conventionally be interpreted as meaning that the data do not provide strong evidence of a difference in capillary density between individuals with and without ulcers.
If both sample sizes were increased to 20, the p-value would reduce to 0.048 (assuming the sample means and standard deviations remained the same). Since this is less than 0.05, it would conventionally be interpreted as evidence of a difference. Note that this result is not inconsistent with the previous result: with bigger samples we are able to detect smaller differences between populations.
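The two p-values above can be checked with a short calculation. The sketch below is not the calculator's own code: it assumes the unequal-variance form of the test described under Assumptions, and approximates the tail of the t distribution by numerical integration using only Python's standard library. The function names are illustrative.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 - 2 * integral of the density over [0, |t|].

    The integral is approximated with Simpson's rule (steps must be even).
    """
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * total * h / 3.0

def welch_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Unequal-variance t statistic, degrees of freedom and p-value
    computed from summary statistics only."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, two_sided_p(t, df)

# Ulcer group first, control group second, as in the worked example.
t10, df10, p10 = welch_from_summary(29, 7.5, 10, 34, 8.0, 10)
t20, df20, p20 = welch_from_summary(29, 7.5, 20, 34, 8.0, 20)
print(f"n = 10 per group: t = {t10:.3f}, df = {df10:.2f}, p = {p10:.3f}")
print(f"n = 20 per group: t = {t20:.3f}, df = {df20:.2f}, p = {p20:.3f}")
```

Only the sample sizes change between the two calls. The difference in means stays at 5 capillaries per square mm, but the standard error shrinks as the samples grow, so the t statistic moves further from zero and the p-value falls.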
Assumptions
This test assumes that the two populations follow normal distributions (otherwise known as Gaussian distributions). Normality of the distributions can be assessed using, for example, a Q-Q plot. An alternative test that can be used if you suspect that the data are drawn from non-normal distributions is the Mann-Whitney U test.
The version of the test used here does not assume that the two populations have the same variance (this version is often called Welch's t-test). If you think the populations have the same variance, an alternative version of the two sample t-test (two sample t-test with a pooled variance estimator) can be used. The advantage of the alternative version is that if the populations have the same variance then it has greater statistical power: there is a higher probability of detecting a difference between the population means if such a difference exists.
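For reference, the two versions are usually written as follows. This is the standard textbook formulation, offered as a sketch rather than a statement of how this calculator is implemented. The notation is introduced here for illustration: \bar{x}_j, s_j and n_j denote the sample mean, sample standard deviation and sample size of group j, and \nu is the degrees of freedom of the reference t distribution.

```latex
% Unequal-variance (Welch) version
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}
                 {\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}

% Pooled-variance version
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2},
\qquad
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
\nu = n_1 + n_2 - 2
```

When the two sample sizes are equal, the two t statistics coincide and only the degrees of freedom differ, so the two versions tend to give similar p-values in balanced designs.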
Discussion
Performing this test assesses the extent to which the difference between the sample means provides evidence of a difference between the population means. The test puts forward a “null” hypothesis that the population means are equal, and calculates the probability, under the null hypothesis, of observing a difference at least as big as that seen in the data (the p-value). If the p-value is large then the observed difference between the sample means is unsurprising and is interpreted as being consistent with the hypothesis of equal population means. If on the other hand the p-value is small then we would be surprised by the observed difference if the null hypothesis really were true. Therefore, a small p-value is interpreted as evidence that the null hypothesis is false and that there really is a difference between the population means. Typically a threshold (known as the significance level) is chosen, and a p-value less than the threshold is interpreted as indicating evidence of a difference between the population means. The most common choice of significance level is 0.05, but other values, such as 0.1 or 0.01, are also used.
Note that a large p-value (say, larger than 0.05) cannot in itself be interpreted as evidence that the populations have equal means. It may just mean that the sample size is not large enough to detect a difference. To find out how large your sample needs to be in order to detect a difference (if a difference exists), see our sample size calculator.
If evidence of a difference in the population means is found, you may wish to quantify that difference. The difference between the sample means is a point estimate of the difference between the population means, but it can be useful to assess how reliable this estimate is using a confidence interval. A confidence interval provides you with a range within which you expect the difference between the population means to lie. The p-value and the confidence interval are related and have a consistent interpretation: if the p-value is less than α then a 100(1-α)% confidence interval will not contain zero. For example, if the p-value is less than 0.05 then a 95% confidence interval will not contain zero.
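The link between the p-value and the confidence interval can be illustrated with the figures from the worked example. The sketch below assumes the unequal-variance form of the interval (difference in sample means plus or minus a t quantile times the standard error) and finds the quantile by bisection on a numerically integrated t distribution, using only Python's standard library. It is an illustration with hypothetical function names, not the code behind the confidence interval calculator.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=1000):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry about zero."""
    b = abs(x)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half = total * h / 3.0
    return 0.5 + half if x >= 0 else 0.5 - half

def t_quantile(prob, df, lo=0.0, hi=100.0):
    """Invert the CDF by bisection; intended for 0.5 < prob < 1."""
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def welch_ci(mean1, sd1, n1, mean2, sd2, n2, level=0.95):
    """Confidence interval for (population mean 1 - population mean 2)
    without assuming equal variances."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    margin = t_quantile(1 - (1 - level) / 2, df) * se
    diff = mean1 - mean2
    return diff - margin, diff + margin

# Ulcer group minus control group, as in the worked example.
ci10 = welch_ci(29, 7.5, 10, 34, 8.0, 10)
ci20 = welch_ci(29, 7.5, 20, 34, 8.0, 20)
print("n = 10 per group: 95%% CI = (%.2f, %.2f)" % ci10)
print("n = 20 per group: 95%% CI = (%.2f, %.2f)" % ci20)
```

With 10 patients per group the p-value was 0.167, so the 95% interval should straddle zero. With 20 per group the p-value was 0.048, so the interval should lie entirely below zero, although only narrowly.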
If you wish to calculate a confidence interval, our confidence interval calculator will do the work for you.
Definitions
Sample mean
The sample mean is your ‘best guess’ for what the true population mean is, given your sample of data, and is calculated as:
$$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i,$$
where $n$ is the sample size and $x_1, \ldots, x_n$ are the $n$ sample observations.
Sample standard deviation
The sample standard deviation is calculated as $s = \sqrt{\sigma^2}$, where:
$$\sigma^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \mu)^2,$$
$\mu$ is the sample mean, $n$ is the sample size and $x_1, \ldots, x_n$ are the $n$ sample observations.
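As a minimal sketch, the two definitions above translate directly into Python. The data values are invented purely for illustration, and the result is compared against `statistics.stdev` from the standard library, which also divides by n-1.

```python
import math
import statistics

def sample_mean(xs):
    """Sum of the observations divided by the sample size n."""
    return sum(xs) / len(xs)

def sample_sd(xs):
    """Square root of the sample variance, which divides by n - 1."""
    m = sample_mean(xs)
    variance = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return math.sqrt(variance)

# Hypothetical capillary densities (per square mm), made up for this example.
densities = [27.0, 31.5, 22.0, 35.0, 29.5]

mean = sample_mean(densities)
sd = sample_sd(densities)
print(f"mean = {mean:.2f}, sd = {sd:.2f}")
print(f"statistics.stdev gives {statistics.stdev(densities):.2f}")
```

Dividing by n-1 rather than n makes the variance estimate unbiased for the population variance, which is why the sample size enters the formula in two different ways.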
Sample size
This is the total number of observations randomly drawn from your population. The larger the sample size, the more certain you can be that the estimate reflects the population. Choosing a sample size is an important part of designing your study or survey. For some further information, see our blog post on The Importance and Effect of Sample Size and for guidance on how to choose your sample size, see our sample size calculator.
P-value
The p-value is the probability that the difference between the sample means is at least as large as what has been observed, under the assumption that the population means are equal. The smaller the p-value, the more surprised we would be by the observed difference in sample means if there really was no difference between the population means. Therefore, the smaller the p-value, the stronger the evidence is that the two populations have different means.