Two-Sample t-test

Use this calculator to test whether samples from two independent populations provide evidence that the populations have different means. For example, based on blood pressure measurements taken from a sample of women and a sample of men, can we conclude that women and men have different mean blood pressures?

This test is known as a two-sample (or unpaired) t-test. It produces a “p-value”, which can be used to decide whether there is evidence of a difference between the two population means.

The p-value is the probability that the difference between the sample means is at least as large as what has been observed, under the assumption that the population means are equal. The smaller the p-value, the more surprised we would be by the observed difference in sample means if there really was no difference between the population means. Therefore, the smaller the p-value, the stronger the evidence is that the two populations have different means.

Typically a threshold (known as the significance level) is chosen, and a p-value less than the threshold is interpreted as indicating evidence of a difference between the population means. The most common choice of significance level is 0.05, but other values, such as 0.1 or 0.01, are also used.

This calculator should be used when the sampling units (e.g. the sampled individuals) in the two groups are independent. If you are comparing two measurements taken on the same sampling unit (e.g. blood pressure of an individual before and after a drug is administered) then the appropriate test is the paired t-test.

Calculator

This is your estimated mean calculated using a sample of data collected from population 1.

This is your estimated mean calculated using a sample of data collected from population 2.

This is your estimated standard deviation calculated using a sample of data collected from population 1.

This is your estimated standard deviation calculated using a sample of data collected from population 2.

This is the size of the sample you have used to calculate the sample mean for population 1.

This is the size of the sample you have used to calculate the sample mean for population 2.

More Information

Worked Example

A study compares the average capillary density in the feet of individuals with and without ulcers. A sample of 10 patients with ulcers has a mean capillary density of 29, with a standard deviation of 7.5. A control sample of 10 individuals without ulcers has a mean capillary density of 34, with a standard deviation of 8.0. (All measurements are in capillaries per square mm.) Using this information, the p-value is calculated as 0.167. Since this p-value is greater than 0.05, it would conventionally be interpreted as meaning that the data do not provide strong evidence of a difference in capillary density between individuals with and without ulcers.

If both sample sizes were increased to 20, the p-value would reduce to 0.048 (assuming the sample means and standard deviations remained the same), which we would interpret as strong evidence of a difference. Note that this result is not inconsistent with the previous result: with bigger samples we are able to detect smaller differences between populations.
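The worked example can be reproduced from the summary statistics alone. The following is a minimal sketch, not the calculator's own code: it assumes the unequal-variance form of the test with the usual Welch-Satterthwaite degrees of freedom, and it evaluates the t distribution by simple numerical integration so that only the Python standard library is needed. The function names are our own choices for illustration.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|), using Simpson's rule on the density over [0, |t|]."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # approximates P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * central_area)


def welch_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df, p) for the unequal-variance two-sample t-test."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2  # squared standard errors
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, two_sided_p(t, df)


# Summary statistics from the worked example (ulcer group first).
t10, df10, p10 = welch_t_test(29, 7.5, 10, 34, 8.0, 10)
t20, df20, p20 = welch_t_test(29, 7.5, 20, 34, 8.0, 20)
print(f"n = 10 per group: t = {t10:.3f}, df = {df10:.2f}, p = {p10:.3f}")
print(f"n = 20 per group: t = {t20:.3f}, df = {df20:.2f}, p = {p20:.3f}")
```

Under these assumptions the two p-values should come out close to the 0.167 and 0.048 quoted above. Only the sample sizes change between the two calls, which is what drives the p-value down.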

Assumptions

This test assumes that the two populations follow normal distributions (otherwise known as Gaussian distributions). Normality of the distributions can be assessed using, for example, a Q-Q plot. An alternative test that can be used if you suspect that the data are drawn from non-normal distributions is the Mann-Whitney U test.

The version of the test used here does not assume that the two populations have equal variances (this version is often called Welch's t-test). If you think the populations have the same variance, an alternative version of the two-sample t-test (with a pooled variance estimator) can be used. The advantage of the alternative version is that if the populations have the same variance then it has greater statistical power; that is, there is a higher probability of detecting a difference between the population means if such a difference exists.
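To make the distinction between the two versions concrete, here is a small sketch that computes the test statistic and degrees of freedom both ways. It uses the standard textbook formulas, which we assume match what is meant here, and the function names are hypothetical. The second data set is invented purely to show an unbalanced case.

```python
import math


def welch_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """t statistic and degrees of freedom without assuming equal variances."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def pooled_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """t statistic and degrees of freedom using a pooled variance estimator."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    t = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


# Balanced case: the summary statistics from the worked example.
t_w, df_w = welch_statistic(29, 7.5, 10, 34, 8.0, 10)
t_p, df_p = pooled_statistic(29, 7.5, 10, 34, 8.0, 10)
print(f"balanced:   Welch t = {t_w:.3f} (df = {df_w:.2f}), "
      f"pooled t = {t_p:.3f} (df = {df_p})")

# Unbalanced case with very different spreads (hypothetical numbers).
u_w, udf_w = welch_statistic(29, 12.0, 5, 34, 4.0, 30)
u_p, udf_p = pooled_statistic(29, 12.0, 5, 34, 4.0, 30)
print(f"unbalanced: Welch t = {u_w:.3f} (df = {udf_w:.2f}), "
      f"pooled t = {u_p:.3f} (df = {udf_p})")
```

With equal sample sizes the two t statistics coincide and only the degrees of freedom differ slightly. When the sample sizes and standard deviations are both very different, the two versions can disagree substantially, which is why the equal-variance assumption matters.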

Discussion

Performing this test assesses the extent to which the difference between the sample means provides evidence of a difference between the population means. The test puts forward a “null” hypothesis that the population means are equal, and measures the probability of observing a difference at least as big as that seen in the data under the null hypothesis (the p-value). If the p-value is large then the observed difference between the sample means is unsurprising and is interpreted as being consistent with the hypothesis of equal population means. If on the other hand the p-value is small then we would be surprised by the observed difference if the null hypothesis really was true. Therefore, a small p-value is interpreted as evidence that the null hypothesis is false and that there really is a difference between the population means. Typically a threshold (known as the significance level) is chosen, and a p-value less than the threshold is interpreted as indicating evidence of a difference between the population means. The most common choice of significance level is 0.05, but other values, such as 0.1 or 0.01, are also used.

Note that a large p-value (say, larger than 0.05) cannot in itself be interpreted as evidence that the populations have equal means. It may just mean that the sample size is not large enough to detect a difference. To find out how large your sample needs to be in order to detect a difference (if a difference exists), see our sample size calculator.
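The effect of sample size on the chance of detecting a real difference can be illustrated by simulation. The sketch below is only an illustration: it assumes, hypothetically, that the two populations are normal with true means 29 and 34 and true standard deviations 7.5 and 8.0, draws many pairs of samples, and counts how often the unequal-variance t-test gives a p-value below 0.05. All function names and the simulation settings are our own choices.

```python
import math
import random
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=200):
    """P(|T| >= |t|), using Simpson's rule on the density over [0, |t|]."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


def welch_p(sample1, sample2):
    """Two-sided p-value of the unequal-variance t-test on raw data."""
    n1, n2 = len(sample1), len(sample2)
    v1 = statistics.variance(sample1) / n1
    v2 = statistics.variance(sample2) / n2
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return two_sided_p(t, df)


def estimated_power(n, sims=400, alpha=0.05, seed=1):
    """Fraction of simulated studies (n per group) with p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        # Assumed (hypothetical) true populations; the means really differ.
        group1 = [rng.gauss(29, 7.5) for _ in range(n)]
        group2 = [rng.gauss(34, 8.0) for _ in range(n)]
        if welch_p(group1, group2) < alpha:
            hits += 1
    return hits / sims


power_10 = estimated_power(10)
power_40 = estimated_power(40)
print(f"estimated power with 10 per group: {power_10:.2f}")
print(f"estimated power with 40 per group: {power_40:.2f}")
```

Even though the population means differ in every simulated study, the smaller studies miss the difference most of the time, so a large p-value from a small sample says little about whether the means are equal.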

If evidence of a difference in the population means is found, you may wish to quantify that difference. The difference between the sample means is a point estimate of the difference between the population means, but it can be useful to assess how reliable this estimate is using a confidence interval. A confidence interval provides you with a set of limits in which you expect the difference between the population means to lie. The p-value and the confidence interval are related and have a consistent interpretation: if the p-value is less than α then a 100(1-α)% confidence interval will not contain zero. For example, if the p-value is less than 0.05 then a 95% confidence interval will not contain zero.
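The link between the p-value and the confidence interval can be checked on the worked example. This is a sketch under stated assumptions rather than a description of our confidence interval calculator: it assumes the interval is built as the difference in sample means plus or minus a t critical value times the standard error, with the same unequal-variance degrees of freedom as the test. The critical value is found by bisection on a numerically integrated t distribution, and the function names are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def central_prob(b, df, steps=1000):
    """P(-b <= T <= b), using Simpson's rule (steps must be even)."""
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2.0 * total * h / 3


def t_critical(alpha, df):
    """Find c such that P(|T| <= c) = 1 - alpha, by bisection."""
    lo, hi = 0.0, 1.0
    while central_prob(hi, df) < 1 - alpha:
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if central_prob(mid, df) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def welch_ci(mean1, sd1, n1, mean2, sd2, n2, alpha=0.05):
    """100(1 - alpha)% confidence interval for mean1 - mean2."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    margin = t_critical(alpha, df) * se
    diff = mean1 - mean2
    return diff - margin, diff + margin


lo10, hi10 = welch_ci(29, 7.5, 10, 34, 8.0, 10)
lo20, hi20 = welch_ci(29, 7.5, 20, 34, 8.0, 20)
print(f"n = 10 per group: 95% CI = ({lo10:.2f}, {hi10:.2f})")
print(f"n = 20 per group: 95% CI = ({lo20:.2f}, {hi20:.2f})")
```

With 10 individuals per group the interval includes zero, matching a p-value above 0.05. With 20 per group the interval lies entirely below zero, matching a p-value below 0.05.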

If you wish to calculate a confidence interval, our confidence interval calculator will do the work for you.

Definitions

Sample mean

The sample mean is your ‘best guess’ for what the true population mean is given your sample of data and is calculated as:

$$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i,$$

where $n$ is the sample size and $x_1, \dots, x_n$ are the $n$ sample observations.

Sample standard deviation

The sample standard deviation is calculated as $s = \sqrt{\sigma^2}$, where:

$$\sigma^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \mu)^2,$$

$\mu$ is the sample mean, $n$ is the sample size and $x_1, \dots, x_n$ are the $n$ sample observations.
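If you have the raw observations rather than the summary statistics, the sample mean and sample standard deviation defined above can be computed directly. The short sketch below uses made-up observations purely for illustration and checks the hand-written formulas against Python's standard statistics module.

```python
import math
import statistics

# Hypothetical observations, invented for illustration only.
observations = [31.0, 27.5, 36.0, 22.5, 29.0, 33.5]

n = len(observations)
sample_mean = sum(observations) / n
# Note the divisor n - 1, as in the definition of the sample variance.
sample_variance = sum((x - sample_mean) ** 2 for x in observations) / (n - 1)
sample_sd = math.sqrt(sample_variance)

print(f"n = {n}, mean = {sample_mean:.3f}, standard deviation = {sample_sd:.3f}")

# The standard library gives the same results.
print(statistics.mean(observations), statistics.stdev(observations))
```

These three quantities (mean, standard deviation and sample size) for each group are the inputs the calculator asks for.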

Sample size

This is the total number of observations randomly drawn from your population. The larger the sample size, the more certain you can be that the estimate reflects the population. Choosing a sample size is an important aspect of designing your study or survey. For some further information, see our blog post on The Importance and Effect of Sample Size, and for guidance on how to choose your sample size, see our sample size calculator.

P-value

The p-value is the probability that the difference between the sample means is at least as large as what has been observed, under the assumption that the population means are equal. The smaller the p-value, the more surprised we would be by the observed difference in sample means if there really was no difference between the population means. Therefore, the smaller the p-value, the stronger the evidence is that the two populations have different means.