Use this calculator to determine a confidence interval for your odds ratio. An odds ratio is a measure of association between the presence or absence of two properties. For example, it could provide a measure of association between customers who are either older or younger than 25 and either have or have not claimed on their car insurance, in order to determine whether age is associated with the propensity to claim. The value of the odds ratio tells you how much higher the odds of making a claim are for someone under 25, for example, and the associated confidence interval indicates the degree of uncertainty associated with that ratio.
In 1950, the Medical Research Council conducted a case-control study of smoking and lung cancer (Doll and Hill 1950). 649 male cancer patients were included (the cases), 647 of whom were reported to be smokers. 649 men without cancer were also included (the controls), 622 of whom were reported to be smokers. The odds ratio of lung cancer for smokers compared with non-smokers can be calculated as (647*27)/(2*622) = 14.04, i.e., the odds of lung cancer in smokers are estimated to be 14 times the odds of lung cancer in non-smokers. We would like to know how reliable this estimate is. The 95% confidence interval for this odds ratio is between 3.33 and 59.3. The interval is rather wide because the numbers of non-smokers, particularly among the lung cancer cases, are very small. Increasing the confidence level to 99% would widen this interval to between 2.11 and 93.25.
Doll and Hill 1950 is a famous study from the literature and is described in further detail in the following reference book (pp. 240-243).
Martin Bland, An Introduction to Medical Statistics, Third Edition, Oxford University Press (2000).
This calculator uses the following formulae to calculate the odds ratio (or) and its confidence interval (ci). or = (a*d) / (b*c), where:
- a is the number of times both A and B are present,
- b is the number of times A is present, but B is absent,
- c is the number of times A is absent, but B is present, and
- d is the number of times both A and B are absent.
To calculate the confidence interval, we use the log odds ratio, log(or) = log((a*d)/(b*c)), and calculate its standard error:
se(log(or)) = √(1/a + 1/b + 1/c + 1/d)
The confidence interval, ci, is calculated as:
ci = exp(log(or) ± Zα/2 * √(1/a + 1/b + 1/c + 1/d)),
where Zα/2 is the critical value of the Normal distribution at α/2 (e.g. for a confidence level of 95%, α is 0.05 and the critical value is 1.96).
Note: The logarithms included in the formulae above are natural logarithms, i.e., log base e, sometimes denoted ln().
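The formulae above can be sketched in Python as follows. This is an illustrative sketch, not the calculator's own code: the function name odds_ratio_ci and its confidence parameter are assumptions made for this example, and the mapping of the worked example onto a, b, c and d (Property A = lung cancer, Property B = smoker) is likewise an assumption. The critical value is taken from the standard library's NormalDist.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, confidence=0.95):
    """Return (odds ratio, lower limit, upper limit) for a 2x2 table.

    a: A and B both present    b: A present, B absent
    c: A absent, B present     d: A and B both absent
    (Hypothetical helper written for this illustration.)
    """
    if min(a, b, c, d) <= 0:
        # The log method needs non-zero counts in every cell.
        raise ValueError("all four cell counts must be positive")

    odds_ratio = (a * d) / (b * c)
    log_or = math.log(odds_ratio)  # natural logarithm
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

    # Critical value of the standard Normal distribution at alpha/2.
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)

    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Worked example, assuming A = lung cancer and B = smoker:
# a = 647, b = 2, c = 622, d = 27.
estimate, lower, upper = odds_ratio_ci(647, 2, 622, 27)
print(f"or = {estimate:.2f}, 95% ci = ({lower:.2f}, {upper:.2f})")
```

Calling odds_ratio_ci(647, 2, 622, 27, confidence=0.99) gives the wider 99% interval for the same table.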
When the prevalence of the outcome is low, the odds ratio can be used to estimate the relative risk in a case-control study. This is useful because calculating the relative risk depends on being able to estimate the risks themselves. In a prospective study we can do this, as we know how many of the risk group develop the outcome. However, it cannot be done if we start with the outcome and work back to the risk factor, as in a case-control study.
Calculating a confidence interval gives you an indication of how reliable your odds ratio is: the wider the interval, the greater the uncertainty associated with your estimate. By changing the inputs (the contingency table and confidence level) in the Alternative Scenarios you can see how each input is related to the confidence interval.
The larger your sample size, the more certain you can be that the estimates reflect the population, and so the narrower the confidence interval. However, the relationship is not linear, e.g., doubling the sample size does not halve the width of the confidence interval. Choosing a sample size is an important part of designing your study or survey. For further information, see our blog post on The Importance and Effect of Sample Size.
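To illustrate the non-linear effect of sample size, the sketch below compares the worked-example table with a hypothetical study twice as large. It assumes every cell count doubles in the same proportion, and it measures the interval on the log scale, where its half-width is Zα/2 * se(log(or)). The helper name log_ci_half_width is made up for this example.

```python
import math


def log_ci_half_width(a, b, c, d, z=1.96):
    """Half-width of the interval for log(or): z * se(log(or))."""
    return z * math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)


table = (647, 2, 622, 27)              # worked-example counts
doubled = tuple(2 * n for n in table)  # hypothetical study, twice the size

ratio = log_ci_half_width(*doubled) / log_ci_half_width(*table)
print(round(ratio, 3))  # 0.707, i.e. 1/sqrt(2) rather than 0.5
```

Under this assumption the log-scale interval shrinks by a factor of 1/√2 when the sample doubles, so roughly four times the sample would be needed to halve it.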
Odds and odds ratio
The odds of an event occurring are calculated as the ratio of the probability of a property being present to the probability of it being absent; this is simply the number of times that the property is present divided by the number of times it is absent. In the worked example, the odds of lung cancer for smokers are calculated as 647/622=1.04, whilst the odds of lung cancer for non-smokers are 2/27=0.07. The odds ratio is calculated by dividing the odds in the first group by the odds in the second group. In the case of the worked example, it is the odds of lung cancer in smokers divided by the odds of lung cancer in non-smokers: (647/622)/(2/27)=14.04. If the odds ratio is greater than 1, then being a smoker is considered to be associated with having lung cancer, since the odds of lung cancer are higher among smokers.
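The arithmetic in this section can be reproduced with a few lines of Python. This is only a sketch of the worked example; the helper name odds is an assumption made for illustration.

```python
def odds(present, absent):
    """Odds = times the property is present / times it is absent."""
    return present / absent


odds_smokers = odds(647, 622)    # cases vs controls among smokers
odds_non_smokers = odds(2, 27)   # cases vs controls among non-smokers
odds_ratio = odds_smokers / odds_non_smokers

print(f"{odds_smokers:.2f} {odds_non_smokers:.2f} {odds_ratio:.2f}")
# prints: 1.04 0.07 14.04
```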
The contingency table summarises the outcomes of each individual sampled in terms of whether Properties A and B are absent or present. It represents the joint frequency distribution of the two properties.
The confidence level is the long-run proportion of confidence intervals, calculated in this way, that contain the true odds ratio. For a 95% confidence level, if the study were repeated many times and the interval calculated each time, you would expect the true value to lie within these intervals on 95% of occasions. The higher the confidence level, the more certain you can be that the interval contains the true odds ratio, but the wider the interval becomes.