ABSTRACT

Large cohort (follow-up) studies, including those of patients in randomized clinical trials, provide the most definitive evidence of the effect of treatments on clinical outcomes and of exposure to risk factors on disease outcomes. The Atherosclerosis Risk in Communities (ARIC) study, for example, enrolled nearly 16,000 subjects to investigate the impact of environmental and genetic risk factors on cardiovascular disease (Williams, 1989). The Women’s Health Initiative (WHI) randomized over 26,000 women to hormone therapy or placebo in clinical trials for women with and without an intact uterus (Anderson, 2003). In view of the logistical and financial constraints associated with measuring biomarkers on tens of thousands of subjects, both ARIC and WHI selected random samples, termed a cohort random sample and a subcohort, respectively, on which routine bioassays of selected biomarkers were performed. Sampling was stratified on demographic factors in order to achieve the desired minority representation. In a series of substudies, biomarkers were also assayed, using stored serum samples, for additional subjects who developed one of several disease outcomes of interest. Data from the cohort sample and from disease cases outside the sample were combined using specialized techniques based on the Cox model to estimate disease risks (Prentice, 1986). However, this approach ignored the large quantity of information on baseline factors available for main cohort members who were neither disease cases nor members of the cohort sample.