Assessing the Accuracy of Tumour Measurements

The Challenge

To evaluate the effectiveness of anticancer therapies in clinical drug trials, a standard set of rules, known as RECIST (Response Evaluation Criteria in Solid Tumours), is widely used. RECIST defines how to determine whether cancer patients improve, stay the same or worsen during treatment, based primarily upon the lengths of tumours measured on imaging scans taken over time. What is of primary interest is the change in overall tumour size rather than in length; the RECIST protocol uses tumour length as an easily obtainable proxy for size. However, relying on length alone can give misleading results because it does not properly account for irregular tumour growth or changes in shape, for example.

Over the last 30 years, CT imaging technology has improved considerably. Increased spatial resolution of CT images, together with advanced image analysis algorithms, has now made accurate measurement of the volume of a tumour lesion feasible. However, before tumour volume measurements can be used for assessing therapy response, we need to assess their accuracy. If a radiologist were to measure the volume of a tumour lesion once and then repeat the measurement some time later under the same conditions, there would generally be some small difference in the results, i.e., there is variability in the repeated measurements.

Every measurement has an associated error, and it is important to understand the size of this error for repeated tumour volume measurements in order to determine the magnitude of change in volume that we are able to detect. If, for example, repeated tumour volume measurements differ by up to ±10% when there has been no change in the tumour size, then we will need to see a change of more than 10%, in either direction, in tumour volume measurements during treatment to be confident that there has been a real change. For it to be worthwhile using tumour volume measurements in assessing therapy response, we need to decide whether they really are more accurate in identifying tumour changes for individual patients than the one-dimensional length measurements suggested by RECIST.

The Approach

A study was performed, using scans from oncology clinical drug trials, to explore the variability of one-dimensional and volume tumour size measurements between repeated measurements on the same imaging scan, made first by one radiologist and then by two independent radiologists (see e.g., Zhao et al., 2013). During clinical care, more than one radiologist may see a particular patient over the course of their therapy. It is therefore also important to assess whether having two different radiologists make the repeat measurements, rather than the same one, affects the measurement variability.

By including the repeated measurements within a statistical model, we are able to estimate how variable both one-dimensional and volume tumour measurements are. This involves using linear mixed effects models, which account for the repeated nature of the data, to estimate components of variability. We can also estimate the level of the measurement error and construct thresholds for the percentage changes that can be detected by one-dimensional and volume measurements, by using so-called Bland-Altman methods to estimate limits of agreement.
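The limits-of-agreement idea can be illustrated with a minimal sketch. It assumes the simplest Bland-Altman setting, one pair of repeat readings per lesion, with differences expressed as a percentage of the pair mean; it is not the mixed-model version used in the study. The function name and the example readings are hypothetical.

```python
import statistics

def percent_limits_of_agreement(first, second, z=1.96):
    """Return (bias, lower, upper), in percent, for paired repeat measurements.

    Simplified Bland-Altman calculation: one pair of readings per lesion,
    no adjustment for reader or lesion effects.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length sequences with at least two pairs")
    # Difference of each pair as a percentage of the pair's mean.
    rel_diffs = [
        100.0 * (a - b) / ((a + b) / 2.0)
        for a, b in zip(first, second)
    ]
    bias = statistics.mean(rel_diffs)
    sd = statistics.stdev(rel_diffs)  # sample standard deviation
    return bias, bias - z * sd, bias + z * sd

# Hypothetical repeat volume readings (mm^3) of the same five lesions.
read_1 = [1000.0, 2500.0, 400.0, 8000.0, 1500.0]
read_2 = [1050.0, 2400.0, 420.0, 7800.0, 1530.0]
bias, lower, upper = percent_limits_of_agreement(read_1, read_2)
print(f"bias {bias:.1f}%, limits of agreement {lower:.1f}% to {upper:.1f}%")
```

A percentage change observed during treatment that falls outside the lower and upper limits would be larger than the measurement error alone is expected to produce.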

Measurement error for either method can be affected by properties of the tumour itself, for example, the magnitude of the tumour lesion length/volume, or the location of the tumour (e.g., lung, liver, or lymph nodes). By extending the model above, we can also determine how each of these factors affects the measurement error and thresholds.

The thresholds for the two methods reflect how sensitive they are in detecting change. However, as they are on different scales (1D and 3D), it is not possible to compare them directly. Instead, taking a new cohort of patients, the thresholds can be used to count how many patients are identified as having a real change in tumour size over a 6-week period by the one-dimensional measurements and how many by the volume measurements. Comparing the proportion of patients with a change in tumour size for each method allows us to assess which is the more sensitive in determining response to therapy (see e.g., Zhao et al., 2011).
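As a rough sketch of this comparison, the snippet below counts the fraction of patients whose percentage change falls outside a method's thresholds. The function name, the threshold values and the patient data are all hypothetical and chosen only for illustration; they are not results from the study.

```python
def proportion_changed(baseline, follow_up, lower_pct, upper_pct):
    """Fraction of patients whose percentage change lies outside the thresholds."""
    if len(baseline) != len(follow_up) or not baseline:
        raise ValueError("need two non-empty sequences of equal length")
    changed = 0
    for before, after in zip(baseline, follow_up):
        pct_change = 100.0 * (after - before) / before
        # A change is counted as real only if it exceeds the measurement error.
        if pct_change < lower_pct or pct_change > upper_pct:
            changed += 1
    return changed / len(baseline)

# Hypothetical lengths (mm) and volumes (mm^3) for the same four patients.
length_base = [30.0, 40.0, 25.0, 50.0]
length_follow = [28.0, 41.0, 20.0, 49.0]
volume_base = [14000.0, 33000.0, 8000.0, 65000.0]
volume_follow = [11000.0, 35000.0, 4000.0, 60000.0]

# Hypothetical thresholds: +/-10% for length, +/-20% for volume.
p_length = proportion_changed(length_base, length_follow, -10.0, 10.0)
p_volume = proportion_changed(volume_base, volume_follow, -20.0, 20.0)
print(p_length, p_volume)
```

Each method is judged against its own thresholds, so the two proportions can be compared even though the thresholds themselves cannot.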

The analyses provided evidence to suggest that volume is more sensitive than one-dimensional length in identifying tumour size changes for individual patients. Further work is required, and future studies will most likely involve more radiologists with differing levels of experience. Also, as this study used only one in-house computer segmentation algorithm to facilitate tumour volume calculations, the results warrant further validation using different image analysis algorithms.

The Value

Understanding the measurement error for one-dimensional and volume tumour measurements helps us to determine which is the better one to use in different circumstances, and therefore helps to inform imaging protocols for use in clinical drug trials and clinical care. By choosing the method with the smallest measurement variability, we can minimise the number of patients required for clinical drug trials (based on sample size calculations), consequently reducing their duration and the cost of new drug development.
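The link between variability and trial size can be sketched with the standard normal-approximation sample size formula for comparing the means of two arms. This is an assumption made for illustration: the source does not state which sample size calculation is used, and the function name and input values are hypothetical. Here sd is the total standard deviation of the endpoint, which includes measurement error.

```python
import math
from statistics import NormalDist

def patients_per_arm(sd, detectable_diff, alpha=0.05, power=0.8):
    """Approximate patients per arm for a two-sided, two-arm comparison of means.

    Uses n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / detectable_diff) ** 2,
    rounded up to a whole patient.
    """
    if sd <= 0 or detectable_diff <= 0:
        raise ValueError("sd and detectable_diff must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * sd / detectable_diff) ** 2
    return math.ceil(n)

# Hypothetical comparison: halving the standard deviation of the endpoint
# while keeping the same detectable difference.
print(patients_per_arm(sd=20.0, detectable_diff=10.0))
print(patients_per_arm(sd=10.0, detectable_diff=10.0))
```

Because the required number of patients grows with the square of the standard deviation, even a modest reduction in measurement variability can reduce the size of a trial noticeably.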