A study aims to explore the relationship between customer age (older or younger than 25) and whether customers have made a claim on their car insurance, in order to determine whether age is associated with the propensity to claim. If we would like to estimate the odds ratio with 95% confidence and a relative precision of 50%, how many samples are required? Assuming that we plan to sample similar numbers of customers who are older and younger than 25, that the prevalence of claiming for those over 25 is 5%, and that the odds ratio is expected to be 10, then 144 customers over 25 and 144 customers under 25 would be sufficient. Increasing the relative precision to 90% (that is, accepting a lower confidence limit much further below the estimate) reduces the sample size to 14 in each group, whilst decreasing the prevalence for customers over 25 to 1% increases the sample size to 637 per group.
This calculator uses the following formula for the sample size, na, for the absence group:
$$n_a = \frac{Z_{\alpha/2}^2}{[\ln(1 - RP)]^2} \, (X + Y)$$
where
$$X = \frac{1}{k \, \rho_p (1 - \rho_p)}, \quad \text{and}$$
$$Y = \frac{1}{\rho_a (1 - \rho_a)},$$
and Zα/2 is the critical value of the Normal distribution at α/2 (e.g., for a confidence level of 95%, α is 0.05 and the critical value is 1.96), RP is the relative precision (the percentage by which the lower limit for your confidence interval is less than the estimated odds ratio), ρp is the prevalence of the outcome in the presence group (implied by ρa and the expected odds ratio), ρa is the prevalence of the outcome in the absence group, and k is the ratio of presences to absences being sampled (np/na).
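The formula can be sketched in Python as follows. This is an illustrative sketch rather than the calculator's own code: the function names and the `two_sided` flag are hypothetical, and the presence-group prevalence is assumed to be obtained from the absence-group prevalence and the expected odds ratio by converting to odds and back. The sum of the two group terms is taken to be 1/(k ρp(1-ρp)) + 1/(ρa(1-ρa)), and the logarithm is taken to be natural.

```python
import math
from statistics import NormalDist


def presence_prevalence(prev_absence: float, odds_ratio: float) -> float:
    """Prevalence in the presence group implied by the absence-group
    prevalence and the expected odds ratio (assumed derivation)."""
    odds_presence = odds_ratio * prev_absence / (1 - prev_absence)
    return odds_presence / (1 + odds_presence)


def absence_sample_size(rel_precision: float, confidence: float,
                        prev_absence: float, odds_ratio: float,
                        k: float = 1.0, two_sided: bool = True) -> int:
    """Sample size for the absence group, rounded up to a whole number.

    two_sided=True uses the critical value at alpha/2 (1.96 for 95%);
    two_sided=False uses the value at alpha (1.645 for 95%). The flag is
    an assumption added for illustration only.
    """
    alpha = 1 - confidence
    tail = alpha / 2 if two_sided else alpha
    z = NormalDist().inv_cdf(1 - tail)

    p_p = presence_prevalence(prev_absence, odds_ratio)
    p_a = prev_absence
    group_terms = 1 / (k * p_p * (1 - p_p)) + 1 / (p_a * (1 - p_a))

    n = z ** 2 * group_terms / math.log(1 - rel_precision) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Car insurance example: RP 50%, 95% confidence, 5% prevalence, OR 10.
    print(absence_sample_size(0.5, 0.95, 0.05, 10))
    print(absence_sample_size(0.5, 0.95, 0.05, 10, two_sided=False))
```

For the car insurance example this sketch gives 204 per group with the two-sided critical value of 1.96, and 144 per group with the one-sided value of 1.645. The figures of 14 and 637 quoted in the example are likewise reproduced with the one-sided value.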
The above sample size calculator provides you with the recommended number of samples required to estimate the true odds ratio with the required relative precision and confidence level.
Try changing the five inputs (the relative precision, confidence level, absence case prevalence, expected odds ratio and presence to absence ratio) to see how they affect the sample size. By watching what happens to the alternative scenarios you can see how each input is related to the sample size and what would happen if you didn’t use the recommended sample size. The larger the sample size, the more certain you can be that the estimates reflect the population, and so the narrower the confidence interval. However, the relationship is not linear, e.g., doubling the sample size does not halve the width of the confidence interval.
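To illustrate the non-linear relationship, the sample size formula can be rearranged to give the relative precision expected for a given absence-group sample size. The sketch below does this under the same assumptions as the formula; the function name is hypothetical, the presence-group prevalence is assumed to be derived from the absence-group prevalence and the odds ratio, and the critical value defaults to 1.96.

```python
import math


def expected_relative_precision(n_absence: int, prev_absence: float,
                                odds_ratio: float, k: float = 1.0,
                                z: float = 1.96) -> float:
    """Relative precision expected for a given absence-group sample size,
    obtained by solving the sample size formula for RP."""
    odds_presence = odds_ratio * prev_absence / (1 - prev_absence)
    p_p = odds_presence / (1 + odds_presence)
    p_a = prev_absence
    group_terms = 1 / (k * p_p * (1 - p_p)) + 1 / (p_a * (1 - p_a))
    # The half-width on the log odds ratio scale shrinks with sqrt(n).
    half_width_log = z * math.sqrt(group_terms / n_absence)
    return 1 - math.exp(-half_width_log)


if __name__ == "__main__":
    # Doubling n repeatedly: the relative precision improves, but by
    # less than a factor of two each time.
    for n in (204, 408, 816):
        rp = expected_relative_precision(n, prev_absence=0.05, odds_ratio=10)
        print(n, round(rp, 3))
```

Because the half-width on the log scale is proportional to one over the square root of the sample size, halving it takes roughly four times as many samples.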
For some further information, see our blog post on The Importance and Effect of Sample Size.
The relative precision is the percentage by which the lower limit for your confidence interval is less than the estimated odds ratio. The relative precision dictates your margin of error, i.e., how precise your estimate is. The margin of error is the range in which the true population odds ratio is estimated to be. Note that the actual precision achieved after you collect your data may be more or less than this target amount, because it will be based on the odds ratio and prevalence observed in the data and not the expected values supplied to the calculator.
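Once the data are in, the precision actually achieved can be checked from the observed 2x2 table. The sketch below uses the standard log-based (Woolf) confidence interval for an odds ratio, which is an assumption about how the interval would be computed and is not necessarily what the calculator uses. The function name and the counts in the usage example are hypothetical.

```python
import math


def observed_precision(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio, confidence limits and achieved relative precision.

    a, b: presence group with / without the outcome
    c, d: absence group with / without the outcome
    All four counts must be non-zero for this interval to be defined.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = odds_ratio * math.exp(-z * se_log_or)
    upper = odds_ratio * math.exp(z * se_log_or)
    # Fraction by which the lower limit falls below the estimate.
    rel_precision = 1 - lower / odds_ratio
    return odds_ratio, lower, upper, rel_precision


if __name__ == "__main__":
    # Hypothetical counts: 204 customers per group.
    estimate, low, high, rp = observed_precision(70, 134, 10, 194)
    print(f"OR {estimate:.2f}, 95% CI {low:.2f} to {high:.2f}, RP {rp:.0%}")
```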
The confidence level is the probability that the margin of error (dictated by the relative precision) contains the true odds ratio. If the study were repeated and the range calculated each time, then with a 95% confidence level you would expect the true value to lie within these ranges on 95% of occasions. The higher the confidence level, the more certain you can be that the interval contains the true odds ratio.
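The expected prevalence in the absence group is the proportion of cases without the property of interest (the property that you are looking for an association with) that have the outcome of interest. In the car insurance example, this is the proportion of customers over 25 who have made a claim.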
Expected odds ratio
This is what you expect the odds ratio to be, i.e., the odds of the outcome given the presence of the property you are looking for an association with, relative to the odds of the same outcome in the absence of that property. This can often be determined by using the results from a previous study, or by running a small pilot study.
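As a small illustration of the definition, the odds ratio can be computed from the prevalence of the outcome in each group. The function names and the prevalence values below are hypothetical, chosen to match the car insurance example.

```python
def odds(prevalence: float) -> float:
    """Convert a prevalence (a proportion between 0 and 1) to odds."""
    return prevalence / (1 - prevalence)


def odds_ratio(prev_presence: float, prev_absence: float) -> float:
    """Odds of the outcome in the presence group relative to the absence group."""
    return odds(prev_presence) / odds(prev_absence)


if __name__ == "__main__":
    # Hypothetical pilot result: about 34.5% of under-25s and 5% of
    # over-25s have claimed, giving an odds ratio of approximately 10.
    print(odds_ratio(10 / 29, 0.05))
```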
Presence to absence ratio
This is the sampling ratio of presences to absences (for the property that you are looking for an association with) that you are planning to collect. It may be that more absence cases are available, for example, and that this group is therefore easier to sample. It is important to note, however, that a heavily unbalanced sampling ratio will generally require a larger total sample size.
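The effect of the sampling ratio on the total sample size can be explored with a short sketch. It applies the sample size formula for the absence group and multiplies by the ratio to get the presence group. The function name is hypothetical, the defaults follow the car insurance example, and the critical value is assumed to be 1.96.

```python
import math


def sizes_for_ratio(k: float, rel_precision: float = 0.5,
                    prev_absence: float = 0.05, odds_ratio: float = 10.0,
                    z: float = 1.96):
    """Return (absence size, presence size, total) for sampling ratio k."""
    odds_presence = odds_ratio * prev_absence / (1 - prev_absence)
    p_p = odds_presence / (1 + odds_presence)
    p_a = prev_absence
    group_terms = 1 / (k * p_p * (1 - p_p)) + 1 / (p_a * (1 - p_a))
    n_absence = math.ceil(z ** 2 * group_terms / math.log(1 - rel_precision) ** 2)
    n_presence = math.ceil(k * n_absence)
    return n_absence, n_presence, n_absence + n_presence


if __name__ == "__main__":
    for k in (0.1, 0.5, 1, 2, 10):
        n_a, n_p, total = sizes_for_ratio(k)
        print(f"k={k}: absence={n_a}, presence={n_p}, total={total}")
```

Under these assumptions, ratios of 0.1 and 10 both need a larger total than a ratio of 1. The smallest total does not have to fall exactly at 1, though: because the two groups have different prevalences in this example, a ratio of 0.5 needs a slightly smaller total than a ratio of 1.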
The recommended sample size is the minimum number of samples you need in the absence group to estimate the true population odds ratio with the required relative precision and confidence level. Multiply this number by the sampling ratio to calculate the sample size for the presence group. If non-response or drop-outs are a possibility, your sample size will have to be increased accordingly. In general, the higher the response rate the better the estimate, as non-response will often lead to biases in your estimate.
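One common way to allow for non-response is to divide the required sample size by the expected response rate. The sketch below takes that approach, which is an assumption and not something the calculator is stated to do; the function names and the 80% response rate are hypothetical. Inflating the sample restores the number of responses but does not remove any bias caused by non-response.

```python
import math


def inflate_for_nonresponse(n_required: int, response_rate: float) -> int:
    """Number to approach so that about n_required are expected to respond."""
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")
    return math.ceil(n_required / response_rate)


def group_sizes(n_absence: int, k: float, response_rate: float = 1.0):
    """Absence and presence group sizes, each inflated for non-response."""
    n_presence = math.ceil(k * n_absence)  # multiply by the sampling ratio
    return (inflate_for_nonresponse(n_absence, response_rate),
            inflate_for_nonresponse(n_presence, response_rate))


if __name__ == "__main__":
    # 144 per group from the example, with a hypothetical 80% response rate.
    print(group_sizes(144, k=1.0, response_rate=0.8))
```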