ABSTRACT

In epidemiological cohort studies, major clinical events, such as cancer, cardiovascular disease, and death, typically occur infrequently, so large cohorts are required to provide reliable information about the effects of exposures or other covariates on the event times (failure times). The covariates of interest often involve biomarker assays, genome sequencing, medical imaging, or extraction of detailed exposure histories and are thus prohibitively expensive to measure on all cohort members in a large study. A cost-effective solution to this problem is to measure the covariates on all cases, i.e., the subjects who have developed the event of interest during follow-up, and on a subset of controls, i.e., those who have not developed the event of interest by the end of the study. There are two commonly used sampling schemes for selecting controls: case-cohort sampling selects a random subcohort of the original cohort (Prentice, 1986), whereas nested case-control sampling selects a small number of controls, usually between 1 and 5, for each observed failure time (Thomas, 1977). Such sampling schemes can drastically reduce the cost of conducting large epidemiological cohort studies while incurring little loss of statistical efficiency relative to measuring the covariates on the entire cohort.