Odds Ratio – Sample Size


Use this calculator to determine the appropriate sample size for estimating an odds ratio with a specified relative precision.

An odds ratio is a measure of association between the presence or absence of two properties. For example, it could measure the association between customers being either older or younger than 25 and either having or not having claimed on their car insurance, in order to determine whether age is associated with the propensity to claim (the outcome of interest). The value of the odds ratio tells you, for example, how many times greater the odds of making a claim are for someone under 25 than for someone over 25, and the associated confidence interval indicates the degree of uncertainty in that ratio.
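As a minimal sketch of how an odds ratio and its confidence interval are computed from data, the Python below uses Woolf's (log) method on a 2×2 table. The counts are purely hypothetical and chosen only for illustration. The function name `odds_ratio_ci` is our own and is not part of any calculator.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a: int, b: int, c: int, d: int, confidence: float = 0.95):
    """Odds ratio and Woolf (log-scale) confidence interval for a 2x2 table.

    a: presence with outcome     b: presence without outcome
    c: absence with outcome      d: absence without outcome
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) estimated from the four cell counts
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Interval is symmetric on the log scale, so back-transform with exp
    lower = odds_ratio * math.exp(-z * se_log)
    upper = odds_ratio * math.exp(z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts: 20 of 100 under-25s claimed, 5 of 100 over-25s claimed
or_, lo, hi = odds_ratio_ci(a=20, b=80, c=5, d=95)
print(f"OR = {or_:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```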

Calculator

Relative precision (%)

The relative precision dictates your margin of error. It is the percentage by which the lower limit of your confidence interval is less than the estimated odds ratio.

A smaller relative precision requires a larger sample size.

Confidence level (%)

Typical choices are 90%, 95%, or 99%.

The confidence level specifies the amount of uncertainty associated with the estimate. It is the chance that the confidence interval contains the true odds ratio.

The higher the confidence level, the larger the sample size.

Absence case prevalence (%)

This is the proportion of absence cases (for the property you are looking for an association with) that you expect to have the outcome.

Expected odds ratio

This is the odds of the outcome given presence (of the property you are looking for an association with) relative to the odds of the same outcome in the absence of that property.

Presence to absence ratio

If you are planning to sample equal numbers of presence and absence cases (for the property that you are looking for an association with), leave this as 1. Otherwise, this is the ratio of the number of presences to the number of absences that you plan to sample.

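Alternative Scenarios

Each line below varies one input and holds the others at relative precision 50%, confidence level 90%, absence case prevalence 5%, odds ratio 10 and presence to absence ratio 1.

With a relative precision of 90%, 25% or 10%, your sample size for absence cases would be 14, 833 or 6210 respectively.
With a confidence level of 90%, 95% or 99%, your sample size for absence cases would be 144, 204 or 352 respectively.
With an absence case prevalence of …, 5% or 1%, your sample size for absence cases would be 91, 144 or 637 respectively.
With an odds ratio of 2, 10 or …, your sample size for absence cases would be 184, 144 or 142 respectively.
With a presence to absence ratio of 2, 1 or 0.5, your sample size for absence cases would be 132, 144 or 169 respectively.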

More Information

Worked Example

A study aims to explore the relationship between customers' age (older or younger than 25) and whether they have made a claim on their car insurance, in order to determine whether age is associated with the propensity to claim. We would like to estimate the odds ratio with 90% confidence and a relative precision of 50%. How many customers need to be sampled? Assume that we plan to sample equal numbers of customers older and younger than 25, that the prevalence of claiming for those over 25 is 5%, and that the odds ratio is expected to be 10. Then 144 customers over 25 and 144 customers under 25 would be sufficient. At 95% confidence, 204 per group would be needed. Increasing the relative precision to 90% (i.e., accepting a much wider interval) reduces the sample size to 14 in each group. Decreasing the prevalence for customers over 25 to 1% increases the sample size to 637 per group.
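The 144 figure can be checked by hand using the formula given under Formula. Note that it corresponds to a 90% confidence level ($Z_{\alpha/2} \approx 1.645$). With a 95% level ($Z_{\alpha/2} = 1.96$), the same inputs give 204 per group. First, the presence-group prevalence is obtained from the 5% absence prevalence and the odds ratio of 10. The rest follows with $k = 1$:

$$
\begin{aligned}
\rho_p &= \frac{10 \times 0.05}{1 - 0.05 + 10 \times 0.05} = \frac{0.5}{1.45} \approx 0.3448 \\
\frac{1}{k\,\rho_p(1-\rho_p)} + \frac{1}{\rho_a(1-\rho_a)} &\approx \frac{1}{0.2259} + \frac{1}{0.0475} \approx 4.426 + 21.053 = 25.479 \\
\frac{Z_{\alpha/2}^2}{\left[\ln(1-0.5)\right]^2} &\approx \frac{1.6449^2}{0.6931^2} \approx 5.631 \\
n_a &\approx 5.631 \times 25.479 \approx 143.5 \;\Rightarrow\; 144
\end{aligned}
$$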

Formula

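This calculator uses the following formula for the sample size, $n_a$, for the absence group:

$$
n_a = \frac{Z_{\alpha/2}^2}{\left[\ln(1-RP)\right]^2}\left(\frac{1}{X} + \frac{1}{Y}\right)
$$

where

$$
X = k\,\rho_p(1-\rho_p), \qquad Y = \rho_a(1-\rho_a),
$$

and the symbols are defined as follows:

- $Z_{\alpha/2}$ is the critical value of the Normal distribution at $\alpha/2$. For example, for a confidence level of 95%, $\alpha$ is 0.05 and the critical value is 1.96.
- $RP$ is the relative precision expressed as a proportion. It is the percentage by which the lower limit of your confidence interval is less than the estimated odds ratio.
- $\rho_p$ is the prevalence of the outcome in the presence group.
- $\rho_a$ is the prevalence of the outcome in the absence group.
- $k$ is the ratio of presences to absences being sampled ($n_p/n_a$).

The presence-group prevalence follows from the absence-group prevalence and the expected odds ratio, $OR$: $\rho_p = OR\,\rho_a / (1 - \rho_a + OR\,\rho_a)$.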
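The formula translates directly into code. The Python function below is a sketch, not the calculator's own implementation. The function and parameter names are ours. The presence-group prevalence is derived from the absence prevalence and the expected odds ratio using the definition of the odds ratio. The result is rounded up to the next whole case. That rounding is an assumption on our part, but it reproduces the figures quoted on this page.

```python
import math
from statistics import NormalDist


def absence_sample_size(relative_precision: float, confidence: float,
                        absence_prevalence: float, odds_ratio: float,
                        ratio: float = 1.0) -> int:
    """Sample size n_a for the absence group (all rates as fractions, e.g. 0.05)."""
    # Two-sided critical value Z_{alpha/2}, e.g. about 1.645 for 90%
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    rho_a = absence_prevalence
    # Presence-group prevalence implied by the expected odds ratio
    rho_p = odds_ratio * rho_a / (1 - rho_a + odds_ratio * rho_a)
    x = ratio * rho_p * (1 - rho_p)   # X = k * rho_p * (1 - rho_p)
    y = rho_a * (1 - rho_a)           # Y = rho_a * (1 - rho_a)
    n_a = z ** 2 / math.log(1 - relative_precision) ** 2 * (1 / x + 1 / y)
    return math.ceil(n_a)  # round up to a whole number of cases (assumed)


# Worked example inputs at 90% and 95% confidence
print(absence_sample_size(0.50, 0.90, 0.05, 10))   # 144
print(absence_sample_size(0.50, 0.95, 0.05, 10))   # 204
```

Changing one argument at a time reproduces the alternative scenarios. For example, `absence_sample_size(0.50, 0.90, 0.05, 10, ratio=0.5)` gives 169.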

Discussion

The above sample size calculator provides you with the recommended number of samples required to estimate the true odds ratio with the required relative precision and confidence level.

Try changing the five inputs (the relative precision, confidence level, absence case prevalence, expected odds ratio and presence to absence ratio) to see how they affect the sample size. By watching what happens to the alternative scenarios, you can see how each input is related to the sample size and what would happen if you didn't use the recommended sample size. The larger the sample size, the more certain you can be that the estimates reflect the population, and so the narrower the confidence interval. However, the relationship is not linear. The half-width of the confidence interval for the log odds ratio is inversely proportional to the square root of the sample size. Doubling the sample size therefore narrows it by only about 29%, and the sample size must be quadrupled to halve it.
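To make the non-linearity concrete, the hypothetical helper below (our own) inverts the sample size formula. It computes the relative precision implied by a given absence-group size, using the worked-example inputs at 90% confidence. With these inputs, doubling the sample from 144 to 288 improves the relative precision from about 50% to only about 39%, not to 25%.

```python
import math
from statistics import NormalDist


def achieved_relative_precision(n_a: int, confidence: float = 0.90,
                                absence_prevalence: float = 0.05,
                                odds_ratio: float = 10.0, ratio: float = 1.0) -> float:
    """Relative precision implied by n_a absence cases (illustrative helper)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    rho_a = absence_prevalence
    rho_p = odds_ratio * rho_a / (1 - rho_a + odds_ratio * rho_a)
    # Variance of log(OR) per absence case: the formula's 1/X + 1/Y term
    v = 1 / (ratio * rho_p * (1 - rho_p)) + 1 / (rho_a * (1 - rho_a))
    half_width_log = z * math.sqrt(v / n_a)  # half-width of the CI for log(OR)
    return 1 - math.exp(-half_width_log)


for n in (144, 288, 576):
    print(n, round(achieved_relative_precision(n), 3))
```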

For some further information, see our blog post on The Importance and Effect of Sample Size.

Definitions

Relative precision

The relative precision is the percentage by which the lower limit of your confidence interval is less than the estimated odds ratio. It dictates your margin of error, i.e., how precise your estimate is. Together with the confidence level, it defines the range within which the true population odds ratio is estimated to lie. Note that the precision actually achieved after you collect your data may be better or worse than this target. This is because it will be based on the odds ratio and prevalence observed in the data, not on the expected values supplied to the calculator.
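The sample size formula treats the confidence interval as symmetric on the log scale. Under that assumption, a relative precision RP around an estimated odds ratio corresponds to a lower limit of OR × (1 − RP) and an upper limit of OR / (1 − RP). The short sketch below (the function name is ours) shows that the interval is therefore asymmetric on the original scale. For example, an odds ratio of 10 with 50% relative precision spans 5 to 20.

```python
def interval_from_relative_precision(odds_ratio: float, relative_precision: float):
    """Interval limits implied by a target relative precision.

    Assumes a confidence interval that is symmetric on the log scale,
    as in the sample size formula on this page.
    """
    factor = 1 - relative_precision  # equals exp(-Z * SE(log OR))
    return odds_ratio * factor, odds_ratio / factor


lower, upper = interval_from_relative_precision(10, 0.50)
print(lower, upper)  # 5.0 20.0
```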

Confidence level

The confidence level is the probability that the confidence interval (whose width is dictated by the relative precision) contains the true odds ratio. For a 95% confidence level, if the study were repeated and the interval calculated each time, you would expect the true value to lie within these intervals on 95% of occasions. The higher the confidence level, the more certain you can be that the interval contains the true odds ratio.
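The critical value $Z_{\alpha/2}$ used in the formula comes from the standard Normal quantile function. Python's standard-library `statistics.NormalDist` provides one. The following is a minimal sketch with a function name of our own choosing.

```python
from statistics import NormalDist


def critical_value(confidence: float) -> float:
    """Two-sided critical value Z_{alpha/2} for a given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: {critical_value(level):.3f}")
```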

Expected prevalence

The expected prevalence in the absence group is the proportion of cases lacking the property you are looking for an association with that you expect to have the outcome of interest.

Expected odds ratio

This is what you expect the odds ratio to be, i.e., the odds of the outcome given presence of the property you are looking for an association with, relative to the odds of the same outcome in the absence of that property. It can often be estimated from the results of a previous study or by running a small pilot study.
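The calculator asks for the absence-group prevalence and the expected odds ratio. It does not ask for the presence-group prevalence that appears in the sample size formula. Since the odds ratio is the presence odds divided by the absence odds, the presence-group prevalence follows directly, as in this Python sketch (the function names are ours):

```python
def presence_prevalence(absence_prevalence: float, odds_ratio: float) -> float:
    """Prevalence in the presence group implied by rho_a and the odds ratio."""
    absence_odds = absence_prevalence / (1 - absence_prevalence)
    presence_odds = odds_ratio * absence_odds
    return presence_odds / (1 + presence_odds)  # convert odds back to a proportion


def odds(p: float) -> float:
    """Odds corresponding to a proportion p."""
    return p / (1 - p)


rho_p = presence_prevalence(0.05, 10)
print(round(rho_p, 4))                     # 0.3448
print(round(odds(rho_p) / odds(0.05), 6))  # 10.0, the odds ratio is recovered
```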

Presence to absence ratio

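This is the sampling ratio of presences to absences (for the property that you are looking for an association with) that you plan to collect. It may be that more absence cases are available, for example, and that this group is therefore easier to sample. Note, however, that the sampling ratio affects the total sample size required. The total is smallest when the ratio equals √[ρa(1−ρa) / (ρp(1−ρp))] (in the notation of the Formula section). It grows the further the ratio moves from that value. This optimum equals 1 when the outcome is equally common in both groups, but it can be some way from 1 otherwise.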
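The effect of the sampling ratio on the total can be explored numerically. The sketch below re-implements the formula on this page in a function of our own naming. It rounds each group up to whole cases and compares totals for several ratios, using the worked-example inputs at 90% confidence. It also computes the ratio that minimises the unrounded total. With these inputs, the optimum is below 1: because the outcome is rare in the absence group, that group contributes more to the variance, so sampling relatively more absences pays off.

```python
import math
from statistics import NormalDist


def group_sizes(ratio: float, relative_precision: float = 0.50,
                confidence: float = 0.90, absence_prevalence: float = 0.05,
                odds_ratio: float = 10.0):
    """Return (n_a, n_p) for a given presence-to-absence ratio k."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    rho_a = absence_prevalence
    rho_p = odds_ratio * rho_a / (1 - rho_a + odds_ratio * rho_a)
    v = 1 / (ratio * rho_p * (1 - rho_p)) + 1 / (rho_a * (1 - rho_a))
    n_a = math.ceil(z ** 2 / math.log(1 - relative_precision) ** 2 * v)
    n_p = math.ceil(ratio * n_a)  # presence group, rounded up
    return n_a, n_p


for k in (0.5, 1.0, 2.0):
    n_a, n_p = group_sizes(k)
    print(f"k = {k}: n_a = {n_a}, n_p = {n_p}, total = {n_a + n_p}")

# Ratio minimising the unrounded total: sqrt(rho_a(1 - rho_a) / (rho_p(1 - rho_p)))
rho_p = 10 * 0.05 / (1 - 0.05 + 10 * 0.05)
k_opt = math.sqrt(0.05 * 0.95 / (rho_p * (1 - rho_p)))
print(round(k_opt, 3))
```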

Sample size

This is the minimum sample size you need in the absence group to estimate the true population odds ratio with the required relative precision and confidence level. Multiply this number by the sampling ratio to calculate the sample size for the presence group. If non-response or drop-outs are a possibility your sample size will have to be increased accordingly. In general, the higher the response rate the better the estimate, as non-response will often lead to biases in your estimate.
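The two adjustments described above can be sketched as follows. The non-response rate is a hypothetical input, and the helper names are ours. Inflating by 1/(1 − rate) is one simple convention: it restores the expected number of responders, but it does not remove any bias that non-response may introduce.

```python
import math


def presence_sample_size(n_absence: int, ratio: float) -> int:
    """Presence-group size: absence-group size times the sampling ratio, rounded up."""
    return math.ceil(ratio * n_absence)


def inflate_for_non_response(n: int, non_response_rate: float) -> int:
    """Cases to approach so that about n respond (hypothetical helper)."""
    return math.ceil(n / (1 - non_response_rate))


n_a = 144                                   # absence group, worked example
n_p = presence_sample_size(n_a, ratio=1.0)  # 144 with equal sampling
# With an assumed 25% non-response rate in both groups
print(inflate_for_non_response(n_a, 0.25), inflate_for_non_response(n_p, 0.25))  # 192 192
```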