CTCA survival comparison statistical methodology
Selection of comparison samples
In 2009, Cancer Treatment Centers of America (CTCA) began a project to compare cancer survival outcomes between its analytic cases (see definition at https://www.facs.org/~/media/files/quality%20programs/cancer/coc/fords/fords%20manual%2013.ashx) and patients from the National Cancer Institute’s Surveillance, Epidemiology, and End Results (SEER) Program database. The results of the analysis have been updated periodically, most recently in 2014. Only the survival outcomes for patients with advanced-stage cancer from each database, as defined by SEER Summary Stage (see http://www.seer.cancer.gov/tools/ssm/), are presented.
The comparative analyses were conducted for eleven different types of cancer. For each cancer type, the 2014 CTCA sample contained all eligible cancer patients diagnosed between 2000 and 2011 at four CTCA hospitals: CTCA at Southwestern Regional Medical Center in Tulsa, OK; CTCA at Midwestern Regional Medical Center in Chicago, IL; CTCA at Eastern Regional Medical Center in Philadelphia, PA; and CTCA at Western Regional Medical Center in Phoenix, AZ. CTCA site managers prepared the database. Basic patient and cancer characteristics such as age at initial diagnosis, year of initial diagnosis, cancer stage, cancer primary site, and gender were first examined in the CTCA sample. These characteristics were then used to identify the comparison cancer cases from the SEER database that were used for the analyses.
The SEER database is an authoritative source of information on cancer incidence and survival in the United States. The SEER comparison sample for this project was chosen by matching on several of the most important factors that affect survival outcomes. The latest SEER Limited-Use Database as of 2014 (see http://seer.cancer.gov/data/access.html) was used to select the SEER comparison sample. The final survival analysis included only patients, from either database, whose values of the following characteristics occurred in both the CTCA and SEER databases: SEER Summary Stage (see http://www.seer.cancer.gov/tools/ssm/), primary tumor site, cancer histologic type, gender, and age at initial diagnosis. For example, if a specific SEER Summary Stage had patients in only one database, none of those patients were used in the analysis. To match the age at initial diagnosis, the range (i.e., the minimum and maximum ages) was computed for each sample. Only patients whose age at initial diagnosis fell within the overlap of the CTCA and SEER ranges were included in the comparative survival analysis.
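The matching rule described above can be illustrated with a minimal Python sketch. It is not the code used for the project: the `Case` record, its field names, and the function `restrict_to_shared` are hypothetical, and the sketch assumes that each characteristic is checked separately in a single pass over the original samples, which is only one possible reading of the description.

```python
from typing import NamedTuple

class Case(NamedTuple):
    """Hypothetical record; the field names are illustrative only."""
    stage: str       # SEER Summary Stage
    site: str        # primary tumor site
    histology: str   # histologic type code
    gender: str
    age: int         # age at initial diagnosis

CATEGORICAL = ("stage", "site", "histology", "gender")

def restrict_to_shared(ctca, seer):
    """Keep only cases whose characteristics occur in both samples."""
    # Assumption: each categorical characteristic is compared on its own.
    shared = {
        field: {getattr(c, field) for c in ctca} & {getattr(c, field) for c in seer}
        for field in CATEGORICAL
    }
    # Overlap of the two age ranges (minimum to maximum in each sample).
    low = max(min(c.age for c in ctca), min(c.age for c in seer))
    high = min(max(c.age for c in ctca), max(c.age for c in seer))

    def keep(case):
        in_both = all(getattr(case, f) in shared[f] for f in CATEGORICAL)
        return in_both and low <= case.age <= high

    return [c for c in ctca if keep(c)], [c for c in seer if keep(c)]
```

With this reading, a stage that appears in only one sample removes every case at that stage, and any case outside the common age range is dropped from whichever sample it belongs to.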
For both CTCA and SEER samples, only cancer patients whose initial diagnosis occurred between 2000 and 2011 were analyzed. Cancer cases with missing information on either the date of initial diagnosis or the date of last contact were deleted from the CTCA database because the survival time or censoring time for such patients could not be computed. Cancer patients with missing SEER Summary Stages were also excluded from the analyses. For patients with multiple cancers in the SEER and CTCA databases, only the first or primary cancer that was diagnosed was used for the survival comparisons. Patients with a histologic code (ICD-O-3) between 9590 and 9989 were excluded from the analyses because these histologic types were generally not included by SEER for any non-hematopoietic cancer types. Patients who did not receive treatment from CTCA were also excluded from the analyses.
The survival outcome from the SEER database was provided by the SEER Limited-Use Data File as the number of completed months. These numbers were then converted to years by dividing the total number of months by 12. Although the exact dates of initial diagnosis and death were available in the CTCA database, the CTCA survival outcome was computed using the same methodology as was used for the SEER database, i.e., using the number of completed months. This number was computed by first dividing the exact number of days from the initial diagnosis to death (or to last contact for those who remained alive) by 365.24 (as was done by SEER) to obtain years, then converting to months and rounding down to the number of completed months, and finally dividing that result by 12. For patients who were still alive or lost to follow-up at the time the data were entered into the databases, survival time was treated as statistically censored at the interval between the date of initial diagnosis and the date of last contact.
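As a minimal sketch of this conversion, the Python function below turns two dates into survival time in years by way of completed months. The function name and the exact order of operations are assumptions made for illustration (one month is taken as 365.24 / 12 days); the sketch is not the code used for the project.

```python
import math
from datetime import date

DAYS_PER_YEAR = 365.24  # year length stated in the text

def survival_years(diagnosis: date, last_event: date) -> float:
    """Survival time in years, based on completed months.

    `last_event` is the date of death, or the date of last contact
    for a censored patient.
    """
    days = (last_event - diagnosis).days
    # Assumption: one month is 365.24 / 12 days; partial months are dropped.
    completed_months = math.floor(days / (DAYS_PER_YEAR / 12))
    return completed_months / 12
```

Under this reading, a patient followed for exactly 365 days has 11 completed months, because 365 days is slightly less than 12 months of 365.24 / 12 days each.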
For each cancer type, the survival curve (defined as the probability of a cancer patient’s survival as a function of time after the initial diagnosis) was estimated by the Kaplan-Meier nonparametric product-limit estimator. Three statistical tests were then used to compare the survival curves between the CTCA database and the SEER database.
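For readers unfamiliar with the product-limit estimator, the following is a small, self-contained Python sketch of the calculation. The function name `kaplan_meier` and its input format are illustrative assumptions, not the implementation used for the analysis.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct death time.

    times[i]  : follow-up time of patient i
    events[i] : True if patient i died at times[i], False if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            leaving += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the conditional survival at t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t.
        at_risk -= leaving
    return curve
```

Censored patients contribute to the number at risk up to their censoring time but never cause a drop in the curve.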
Two of these tests, the logrank test and the Wilcoxon test, are nonparametric in the sense that they are valid for comparing survival curves of any shape. The tests differ, however, in their sensitivity (or power) to detect survival differences. The logrank test is generally the most sensitive or powerful when the hazards of death in the CTCA and SEER samples are approximately proportional, whereas the Wilcoxon test tends to be more sensitive when the ratio of the hazards of death is higher at earlier times than at later ones. The third test, the likelihood ratio test, is the most restrictive of the three in the sense that it is appropriate only for a very special class of survival curves (exponential distributions) whose hazards of death are constant over time.
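The difference between the two nonparametric tests can be made concrete with a short sketch. Both can be written as a weighted sum of observed minus expected deaths at each death time; the logrank test weights every death time equally, while the Gehan form of the Wilcoxon test (assumed here) weights each death time by the number of patients still at risk, which emphasizes early differences. The function name and arguments below are hypothetical, and the code is an illustration rather than the procedure run for the analysis.

```python
import math

def weighted_logrank(times1, events1, times2, events2, wilcoxon=False):
    """Two-sample chi-square statistic (1 degree of freedom) and p-value.

    wilcoxon=False: logrank test (weight 1 at every death time).
    wilcoxon=True : Gehan-Wilcoxon test (weight = number at risk).
    """
    death_times = sorted(
        {t for t, e in zip(times1, events1) if e}
        | {t for t, e in zip(times2, events2) if e}
    )
    score = 0.0
    variance = 0.0
    for t in death_times:
        n1 = sum(1 for x in times1 if x >= t)   # at risk in sample 1
        n2 = sum(1 for x in times2 if x >= t)   # at risk in sample 2
        d1 = sum(1 for x, e in zip(times1, events1) if e and x == t)
        d2 = sum(1 for x, e in zip(times2, events2) if e and x == t)
        n, d = n1 + n2, d1 + d2
        w = n if wilcoxon else 1
        # Observed minus expected deaths in sample 1 at time t.
        score += w * (d1 - n1 * d / n)
        if n > 1:
            # Hypergeometric variance of d1 given the margins.
            variance += w * w * n1 * n2 * d * (n - d) / (n * n * (n - 1))
    if variance == 0:
        raise ValueError("test statistic is undefined for these data")
    chi2 = score * score / variance
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

Calling the function twice on the same data, once with `wilcoxon=False` and once with `wilcoxon=True`, shows how the choice of weights changes the statistic.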
Ninety-five percent confidence interval (95% CI) estimates for the individual survival rates as well as for the difference in survival rates between the CTCA and SEER samples at specific time points after diagnosis were based on the estimated survival curves and the relevant asymptotic normal distributions. All these analyses were implemented using the standard SAS package of statistical tests, i.e., SAS/PROC LIFETEST. Adjusted analyses were also done (results not shown) using the stratified logrank test and the Wilcoxon test as well as Cox's proportional hazards models to compare the survival outcomes between the CTCA and SEER samples after adjusting for the effects of age at diagnosis, gender (except for breast and prostate cancers), race, and year of initial diagnosis. Technical details of these statistical analyses are available from CTCA.
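One common way to build such an interval is sketched below: the standard error of each Kaplan-Meier estimate comes from Greenwood's formula, and the two samples are treated as independent so that their variances add. The text does not state the exact variance formula or interval form that was used, so both are assumptions here, as are the function names.

```python
import math

Z95 = 1.959964  # 97.5th percentile of the standard normal distribution

def km_at(times, events, t0):
    """Kaplan-Meier estimate of S(t0) and its Greenwood standard error."""
    survival = 1.0
    greenwood_sum = 0.0
    death_times = sorted({x for x, e in zip(times, events) if e and x <= t0})
    for t in death_times:
        n = sum(1 for x in times if x >= t)                         # at risk
        d = sum(1 for x, e in zip(times, events) if e and x == t)   # deaths
        survival *= 1 - d / n
        if n > d:
            greenwood_sum += d / (n * (n - d))
        # If n == d the estimate is 0 and the standard error is taken as 0.
    return survival, survival * math.sqrt(greenwood_sum)

def difference_ci(times_a, events_a, times_b, events_b, t0):
    """Difference S_a(t0) - S_b(t0) with an approximate 95% CI.

    Assumes the two samples are independent and that the
    normal approximation is adequate at t0.
    """
    s_a, se_a = km_at(times_a, events_a, t0)
    s_b, se_b = km_at(times_b, events_b, t0)
    diff = s_a - s_b
    half_width = Z95 * math.sqrt(se_a ** 2 + se_b ** 2)
    return diff, diff - half_width, diff + half_width
```

An interval for the difference that excludes zero corresponds to a difference in survival rates at that time point that is significant at the 5% level under these assumptions.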
This study has limitations. First, although a large cancer sample was available from the SEER program across many geographic regions in the United States, both the SEER and CTCA samples are convenience samples. The nature of these convenience samples prevents a causal interpretation of the statistical inferences. Second, although some types of matching, as described above, were implemented to select the appropriate SEER and CTCA comparison samples, the distributions of important covariates such as age at initial diagnosis, race, and year of initial diagnosis were not exactly the same in the CTCA and SEER samples. Hence, even with the adjusted analyses, possible confounding of the results by these factors cannot be ruled out. Further, many factors other than those considered in the analyses and available from the databases (e.g., income, access to health care/insurance, mobility) could also have contributed to the survival outcomes, and possible confounding by these factors cannot be ruled out either. Third, the survival analyses were based on statistical comparisons of the rate of death from all possible causes, not solely cancer-specific death. Data from CTCA are not available for a statistical comparison of cancer cause-specific death rates.
- Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data. New York: John Wiley, 1980.
- Lawless JF. Statistical Models and Methods for Lifetime Data. New York: John Wiley & Sons, Inc., 1982.
- SAS Institute Inc., SAS/STAT User’s Guide, Volume 2, Version 6, 1990. Cary, NC, USA.