CTCA survival comparison statistical methodology
Selection of comparison samples
In 2009, CTCA implemented a project to compare the cancer survival outcomes between patients who were treated by CTCA since diagnosis and patients from the National Cancer Institute’s Surveillance, Epidemiology, and End Results (SEER) Program database.
The comparative analyses were conducted for a total of eleven cancer types. For each cancer type, the CTCA sample contained all eligible cancer patients diagnosed between 2000 and 2005 from both CTCA at Southwestern Regional Medical Center in Tulsa, OK, and CTCA at Midwestern Regional Medical Center in Zion, IL. CTCA site managers prepared the database. Basic cancer and patient characteristics such as age at initial diagnosis, year of initial diagnosis, cancer stages, cancer primary sites, and gender were first examined with the CTCA sample. These characteristics were then used to identify the comparison cancer cases from the SEER database for our analysis.
The SEER database is an authoritative source of information on cancer incidence and survival in the United States. The SEER comparison sample for this project was chosen by matching on some of the most important factors that affect survival outcomes, such as age at diagnosis, cancer severity stage, and cancer site. More specifically, the SEER comparison sample included patients from the SEER database whose age at diagnosis was within the same age range as the CTCA sample at time of diagnosis. The SEER comparison sample excluded patients whose stage of cancer severity (as staged by the SEER Summary Stages, or SSS; see http://www.seer.cancer.gov/tools/ssm/) or tumor sites (as coded by ICD-O-3) were not present in the CTCA sample. The latest SEER Limited-Use Database as of 1/29/2009 (see http://seer.cancer.gov/data/access.html) was used to select the SEER comparison sample.
For both CTCA and SEER samples, only cancer patients whose initial diagnosis occurred between 2000 and 2005 were analyzed. Cancer cases with missing information on either the date of initial diagnosis or the date of last contact were deleted from the CTCA database because the survival time or censoring time for such patients could not be computed. Cancer patients with missing SEER Summary Stages were also excluded from the analyses. For patients with multiple cancers in the SEER and CTCA database, only the first of the cancer type under analysis was used for the survival comparisons.
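The selection rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually run for the project: the record layout and the field names (`age`, `stage`, `site`, `dx_year`) are hypothetical, "same age range" is read as lying between the youngest and oldest CTCA patient, and the rule that keeps only the first cancer of a given type per patient is not shown.

```python
def select_seer_comparison(ctca, seer, first_year=2000, last_year=2005):
    """Keep SEER records that match the CTCA sample on age range, stage and site.

    Each record is a dict with hypothetical keys: age, stage, site, dx_year.
    """
    ctca_ages = [p["age"] for p in ctca]
    min_age, max_age = min(ctca_ages), max(ctca_ages)
    ctca_stages = {p["stage"] for p in ctca}   # SEER Summary Stages seen at CTCA
    ctca_sites = {p["site"] for p in ctca}     # ICD-O-3 sites seen at CTCA

    selected = []
    for p in seer:
        if p["stage"] is None:                 # missing stage: excluded
            continue
        if not first_year <= p["dx_year"] <= last_year:
            continue
        if not min_age <= p["age"] <= max_age:
            continue
        if p["stage"] not in ctca_stages or p["site"] not in ctca_sites:
            continue
        selected.append(p)
    return selected
```

Building the sets of stages and sites from the CTCA sample first keeps the matching rule explicit: a SEER record is dropped as soon as any one of its characteristics has no counterpart in the CTCA sample.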
The survival time for patients in the CTCA database was computed as the time from the initial cancer diagnosis to death, expressed in years as the difference between the date of death and the date of initial diagnosis (in days) divided by 365.25. The survival outcome from the SEER database was provided by the SEER Limited-Use Data File as the number of completed years and the number of completed months. These were converted to years by dividing the total number of survival months by 12. For patients who were still alive or lost to follow-up at the time of entering the databases, survival time was treated as statistically censored at the difference between the date of last contact and the date of initial diagnosis.
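As a minimal sketch of these conversions, the Python functions below compute a survival or censoring time in years together with an event flag. The function and argument names are hypothetical, and the SEER helper assumes that the month count is the remainder beyond the completed years.

```python
from datetime import date
from typing import Optional, Tuple

DAYS_PER_YEAR = 365.25


def ctca_survival_years(diagnosis: date,
                        death: Optional[date],
                        last_contact: date) -> Tuple[float, bool]:
    """Return (time in years, death observed) for one CTCA patient."""
    if death is not None:
        # Death observed: time from initial diagnosis to death.
        return (death - diagnosis).days / DAYS_PER_YEAR, True
    # Alive or lost to follow-up: censored at the date of last contact.
    return (last_contact - diagnosis).days / DAYS_PER_YEAR, False


def seer_survival_years(completed_years: int, completed_months: int) -> float:
    """Convert SEER completed years plus remaining months to years."""
    total_months = 12 * completed_years + completed_months
    return total_months / 12
```

For example, `seer_survival_years(2, 6)` gives 2.5 years, and a CTCA patient with no recorded date of death contributes a censored time with the event flag set to `False`.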
For each cancer type, the survival curve (defined as the probability of cancer patient survival as a function of time after the initial diagnosis) was estimated by the Kaplan-Meier nonparametric product-limit estimator. Three statistical tests were used to compare the survival curves between the CTCA database and the SEER database.
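To make the product-limit idea concrete, here is a small, self-contained Python sketch of the Kaplan-Meier estimator. It is an illustration only (the analyses themselves were run in SAS), and the function name and input layout are hypothetical. At each distinct death time, the running survival probability is multiplied by one minus the fraction of at-risk patients who died at that time; censored patients leave the risk set without changing the estimate.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct death time.

    times  : follow-up time for each patient (years)
    events : True if death was observed, False if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose time equals t.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t.
        at_risk -= leaving
    return curve
```

With times `[1, 2, 3, 4]` and a censored observation at time 2, the estimate steps down at times 1, 3 and 4 only, and the step at time 3 is computed over the two patients still at risk.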
Two of these tests, the logrank test and the Wilcoxon test, are nonparametric in the sense that they are valid for comparing survival curves of any shape. They differ, however, in their sensitivity (or power) to detect survival differences. The logrank test is generally the most sensitive or powerful when the risk or the hazard of death between CTCA and SEER samples is approximately proportional, whereas the Wilcoxon test tends to be more sensitive when the ratio of hazards of death is higher at earlier times than at later ones. The third test, the likelihood ratio test, is the most restrictive of the three in the sense that it is appropriate only for very special survival curves (called exponential distributions) whose hazards of death are constant across time.
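The difference between the three tests is easier to see in code. The sketch below is a hedged illustration, not the SAS implementation: the function names are hypothetical, the Wilcoxon test is taken to be the Gehan form that weights each death time by the number of patients at risk, and the likelihood ratio test fits one exponential hazard per group against a single pooled hazard. Each statistic is referred to a chi-square distribution with one degree of freedom.

```python
import math


def chi2_1df_pvalue(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(max(x, 0.0) / 2.0))


def weighted_logrank(times1, events1, times2, events2, wilcoxon=False):
    """Two-sample logrank test, or Gehan-Wilcoxon test if wilcoxon=True."""
    pooled = [(t, e, 0) for t, e in zip(times1, events1)]
    pooled += [(t, e, 1) for t, e in zip(times2, events2)]
    score, variance = 0.0, 0.0
    for t in sorted({row[0] for row in pooled if row[1]}):
        n1 = sum(1 for row in pooled if row[0] >= t and row[2] == 0)
        n = sum(1 for row in pooled if row[0] >= t)
        d1 = sum(1 for row in pooled if row[0] == t and row[1] and row[2] == 0)
        d = sum(1 for row in pooled if row[0] == t and row[1])
        w = float(n) if wilcoxon else 1.0   # Wilcoxon emphasises early deaths
        score += w * (d1 - d * n1 / n)      # observed minus expected in group 1
        if n > 1:
            variance += w * w * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = score ** 2 / variance            # assumes both groups overlap in time
    return chi2, chi2_1df_pvalue(chi2)


def exponential_lr_test(times1, events1, times2, events2):
    """Likelihood ratio test assuming a constant hazard in each group."""
    def max_loglik(times, events):
        deaths, total_time = sum(events), sum(times)
        if deaths == 0:
            return 0.0
        # Log-likelihood at the MLE hazard = deaths / total follow-up time.
        return deaths * math.log(deaths / total_time) - deaths
    pooled_t = list(times1) + list(times2)
    pooled_e = list(events1) + list(events2)
    stat = 2.0 * (max_loglik(times1, events1) + max_loglik(times2, events2)
                  - max_loglik(pooled_t, pooled_e))
    return stat, chi2_1df_pvalue(stat)
```

Because the Wilcoxon weights shrink as the risk set shrinks, early deaths count for more than late ones, which is why that test is more sensitive to early differences. The likelihood ratio statistic depends only on the number of deaths and the total follow-up time in each group, which is exactly why it is trustworthy only when the hazard is constant.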
Ninety-five percent confidence interval (95% CI) estimates for the individual survival rates, as well as for the difference in survival rates between the CTCA and SEER samples at specific time points after diagnosis, were based on the estimated survival curves and the relevant asymptotic normal distributions. All these analyses were implemented using the standard SAS package of statistical tests, i.e., SAS/PROC LIFETEST and SAS macros.
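A normal-approximation interval of this kind can be sketched as follows. The text does not state which variance estimator was used, so the sketch assumes Greenwood's formula for the standard error of the Kaplan-Meier estimate and treats the two samples as independent; the function names are hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95% point of the standard normal distribution


def km_with_se(times, events, t):
    """Kaplan-Meier estimate S(t) and its Greenwood standard error."""
    survival = 1.0
    greenwood = 0.0
    death_times = sorted({x for x, e in zip(times, events) if e and x <= t})
    for u in death_times:
        n = sum(1 for x in times if x >= u)                    # at risk at u
        d = sum(1 for x, e in zip(times, events) if x == u and e)
        survival *= 1 - d / n
        if n > d:
            greenwood += d / (n * (n - d))
    return survival, survival * math.sqrt(greenwood)


def rate_ci(s, se):
    """95% CI for a single survival rate."""
    return s - Z_95 * se, s + Z_95 * se


def difference_ci(s1, se1, s2, se2):
    """95% CI for the difference s1 - s2 between two independent samples."""
    diff = s1 - s2
    half_width = Z_95 * math.sqrt(se1 ** 2 + se2 ** 2)
    return diff - half_width, diff + half_width
```

If the interval for the difference excludes zero at a given time point, the two survival rates differ at the 5% level at that time point under these assumptions. With small samples the plain interval can fall outside the range 0 to 1, which is one reason transformed intervals are sometimes preferred.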
This study shares the limitations that apply to all observational studies. First, because the study was not randomized and the CTCA and SEER samples were convenience samples, no causal interpretation (i.e., cause and effect relationship) of the comparative analysis results should be attempted. Second, although some types of matching as described above were implemented to select the appropriate SEER comparison sample, many factors other than those considered in the study and available from the database could have contributed to the survival outcomes. Possible confounding of the analysis results by these factors cannot be ruled out. Third, the comparison for each cancer type was based on the rate of death from all possible causes, not solely cancer-specific death. Finally, although a large cancer sample was available from the SEER database, the sample sizes from CTCA for several cancer types were limited; therefore, a larger CTCA sample size would provide a more definitive answer for those cancer types.
- Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data. New York: John Wiley, 1980.
- Lawless JF. Statistical Models and Methods for Lifetime Data. New York: John Wiley & Sons, Inc., 1982.
- SAS Institute Inc. SAS/STAT User’s Guide, Version 6, Volume 2. Cary, NC: SAS Institute Inc., 1990.
At the time of the analyses, the most recent stage-specific data available within the SEER database was limited to patients who were diagnosed between 2000 and 2003. We extended the CTCA sample through 2005 to enlarge the sample size. The rate of survival for CTCA patients diagnosed in 2004 and 2005 is consistent with the rate of survival of CTCA patients diagnosed between 2000 and 2003, allowing for a larger CTCA sample size to achieve more accurate estimates of survival rates.