Combining Three Cohorts of WTC Rescue/Recovery Workers for Assessing Cancer Incidence and Mortality

Three cohorts, the Fire Department of the City of New York (FDNY), the World Trade Center Health Registry (WTCHR), and the General Responder Cohort (GRC), each funded by the World Trade Center Health Program, have reported associations between WTC exposures and cancer. Results have generally been consistent, with effect estimates for excess incidence of all cancers ranging from 6 to 14% above background rates. Pooling would increase sample size and de-duplicate cases between the cohorts. Pooling required time-consuming steps: obtaining IRB approvals and legal agreements from the entities involved; establishing an honest broker for managing the data; de-duplicating the pooled cohort files; applying to State Cancer Registries (SCRs) for matched cancer cases; and finalizing analysis data files. The time to obtain SCR data use agreements ranged from 6.5 to 114.5 weeks, with six states requiring >20 weeks. Records from FDNY (n=16,221), WTCHR (n=29,372), and GRC (n=33,427) were combined and de-duplicated, resulting in 69,102 unique individuals. Overall, 7,894 cancer tumors were matched to the pooled cohort, increasing the number of cancers by as much as 58% compared to previous analyses. Pooling resulted in a coherent resource for future research on rare cancers and mortality, with better representation of occupations and WTC exposure.

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 6 January 2021 | doi:10.20944/preprints202101.0105.v1. © 2021 by the author(s). Distributed under a Creative Commons CC BY license.

providing both time of arrival and duration of work at the WTC site [on the pile (area where debris from WTC towers' collapse was most concentrated) or not on the pile], kinds of tasks performed at the site, and exposure to the dust cloud.
The FDNY has information on dates and times members were at the WTC site and on the tasks performed there. The GRC has some of the information noted above in addition to the degree of exposure to the dust cloud on 9/11 at time of the building collapses.
The current project, "Incidence, Latency, and Survival after World Trade Center Exposure," was funded by the National Institute for Occupational Safety and Health (NIOSH) in 2016 to establish the "WTC Combined Rescue/recovery Cohort" (WTC CR/RC), a study population that includes rescue/recovery workers from the three cohorts described above. The overall aim of the project was to pool and de-duplicate data from the three cohorts (FDNY, WTCHR, and GRC) for joint research on cancer incidence, latency, and survival. The study design called for identical case ascertainment methods across the cohorts using state cancer registries (SCRs). It also called for collaboration with the New York State Cancer Registry to manage and coordinate data pooling, link the pooled data to 13 SCRs, ensure data security and confidentiality, and harmonize data formats. There was also agreement to use a common exposure metric in the analysis. This paper describes the processes involved in combining data across the three WTC-exposed cohorts and linking the pooled data with state cancer registries, and the strategies for overcoming administrative challenges.

Characteristics of the Study Cohorts
All three study cohorts (FDNY, WTCHR, and the GRC) are programs funded by the NIOSH World Trade Center Health Program (WTCHP). The federal WTCHP supports 9/11-related research (WTCHR) as well as medical monitoring and treatment for 9/11-certified medical conditions for rescue/recovery workers at clinical sites, including FDNY and ISMMS, and treatment for non-rescue/recovery workers at WTCHP clinical sites.

Fire Department of the City of New York (FDNY)
The FDNY cohort consists of all firefighters and emergency medical service (EMS) providers who were employed by NYC and reported being at the WTC site at least one day between 9/11/2001 and 7/25/2002 (when the WTC site closed for FDNY recovery and clean-up efforts). WTC exposure information is obtained from self-administered health questionnaires, which began on 10/2/2002 and are completed during each routine health monitoring visit (every 12 to 18 months, even after retirement). Demographic and identifying information, including sex, race, date of birth, full social security number, and full name, is obtained from FDNY employee records. The WTC-exposed FDNY cohort is 84.5% firefighters and 15.5% EMS; on 9/11/2001 the median age was 41 for firefighters and 35 for EMS. Most of the firefighters were male (98%), compared with 80% of EMS [9]. Nearly all the firefighters (99%) were present at the WTC site sometime during the first two weeks after the disaster, with 16% present at the time of the towers' collapse.
In 2011, FDNY published the first major cancer cohort study of WTC-exposed workers [4]. While FDNY now links to nine state cancer registries (Arizona, Connecticut, Florida, North Carolina, New Jersey, New York, Pennsylvania,

World Trade Center Health Registry (WTCHR)
The WTCHR was conceptualized in October 2001 to act as a registry for long-term research on individuals who were exposed to the 9/11 disaster. Recruitment and enrollment for the WTCHR were conducted by obtaining lists of potentially exposed persons via employers, unions, schools, and government agencies and by outreach and multi-media campaigns, which encouraged pre-registration by calling a toll-free number or pre-registering online [11]. The WTCHR was composed of four populations at risk: rescue/recovery/clean-up workers and volunteers who had participated in these activities at the WTC site, Staten Island Recovery Center, or barges; residents of lower Manhattan (south of Canal Street) on 9/11; occupants of destroyed and damaged buildings or persons present south of Chambers Street on 9/11; and persons who were enrolled or working at schools in lower Manhattan on 9/11, including persons younger than 18 years on 9/11. Over 71,000 persons enrolled from 9/12/2003 to 11/15/2004, including 30,665 rescue/recovery/clean-up workers in this joint cohort, by completing an initial health survey in 2003-2004. The rescue/recovery/clean-up workers and volunteers were 80% male, with a median age of 42 years on 9/11/2001. The WTCHR has continued to monitor the health of this cohort via periodic health surveys, clinically based case-control studies, and matching with appropriate sources including cancer registries, hospitalization discharge records, and death records.
The WTCHR has published two cancer surveillance reports, in 2012 and 2016 [5,12]. Each report obtained linked cancer data from eleven state cancer registries (California, Connecticut, Florida, Massachusetts, New Jersey, New York, North Carolina, Ohio, Pennsylvania, Texas, and Washington) comprising 91% of all enrollees living in these states during the follow-up period. The 2012 study ended data collection at the close of 2008. When year of cancer diagnosis was limited to the last two years of follow-up, the standardized incidence ratio (SIR) for all cancers combined was 1.14 for rescue/recovery/clean-up workers and 0.92 for non-rescue/recovery/clean-up workers, neither statistically significant. For rescue/recovery/clean-up workers, three cancer types were significantly elevated: prostate, thyroid, and multiple myeloma. In contrast, for non-rescue/recovery/clean-up workers, no specific cancers were elevated. In the 2012 study, there were no significant hazard ratios (HR) relative to the lowest exposure for cancers that had a significant SIR [12]. The 2016 WTCHR cancer report had three years of additional follow-up and limited cancer cases to those diagnosed from 2007-2011 [5]. In the 2016 report the overall SIRs for both rescue/recovery/clean-up workers (SIR=1.11) and non-rescue/recovery/clean-up workers (SIR=1.08) were statistically significant. Similar to the earlier report [12], for rescue/recovery/clean-up workers prostate and thyroid cancers were elevated, as was skin melanoma. Non-rescue/recovery/clean-up workers also had elevated prostate cancers and skin melanomas, with additional elevation for non-Hodgkin lymphoma and female breast cancer. For the 2016 paper, a composite weighted score reflecting estimated total exposure to dust and debris during the nine-month rescue, recovery, and clean-up period was developed and used [5]; however, there were no significant cancer associations with higher vs. lower levels of exposure.
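As a reminder of what the SIRs quoted above measure, the following minimal sketch computes an SIR as observed cases divided by expected cases, with an approximate 95% confidence interval. It assumes Byar's approximation to the Poisson interval, which is a common choice but is not stated in the source as the method used in the cited reports. The function name and the example counts are hypothetical and are not taken from any of the cohorts.

```python
import math

def sir_with_byar_ci(observed: int, expected: float, z: float = 1.96):
    """Return (SIR, lower, upper) using Byar's approximation.

    observed: number of cancer cases seen in the cohort (must be > 0)
    expected: number of cases expected from reference-population rates
    z:        standard normal quantile (1.96 for a 95% interval)
    """
    if observed <= 0 or expected <= 0:
        raise ValueError("observed and expected must both be positive")
    sir = observed / expected
    # Lower limit is based on the observed count itself.
    lower = observed * (1 - 1 / (9 * observed)
                        - z / (3 * math.sqrt(observed))) ** 3 / expected
    # Upper limit is based on observed + 1.
    o1 = observed + 1
    upper = o1 * (1 - 1 / (9 * o1)
                  + z / (3 * math.sqrt(o1))) ** 3 / expected
    return sir, lower, upper

# Hypothetical counts, for illustration only.
sir, lo, hi = sir_with_byar_ci(observed=111, expected=100.0)
print(f"SIR={sir:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Under this convention an SIR is usually described as statistically significant when the interval excludes 1.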
For the current study, the WTCHR final analytic cohort consisted of 29,372 rescue/recovery/clean-up workers, some of whom are also members of other exposure groups in the WTCHR (e.g. residents and occupants of buildings in lower Manhattan).

General Responder Cohort (GRC)
The GRC consists of persons who were involved in the WTC rescue/recovery and clean-up effort and later enrolled in a predecessor of the WTCHP, which began in July 2002. Eligibility criteria for enrollment are based on having worked or volunteered in lower Manhattan, Staten Island, the Chief Medical Examiner's Office, and barge-loading piers for four hours or more between 9/11 and 9/14/2001, 24 hours or more in September 2001, or 80 hours or more from 9/11/2001 to 12/31/2001. Recruitment for this cohort consisted of outreach to unions and labor organizations and media campaigns. GRC members receive health monitoring visits every 12 to 18 months and treatment for WTC-certified conditions. The GRC includes participants from protective services (42%), construction (24%), buildings and grounds, maintenance and electrical, telecommunications and other installation and repair groups (10%), and other categories (19%) [6]. The median age on 9/11 was 38 years. Overall, 20% had high-level exposure, categorized as 3% very highly exposed and 17% highly exposed.
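The time-based part of the GRC eligibility criteria described above can be sketched as a simple rule. This is an illustrative reading only: the class and field names are hypothetical, the location requirement is not modeled, and the exact handling of the "September 2001" window is an assumption.

```python
from dataclasses import dataclass

@dataclass
class WorkHours:
    """Hours worked or volunteered at an eligible location (hypothetical fields)."""
    sep11_to_sep14: float   # hours between 9/11 and 9/14/2001
    september_2001: float   # hours in September 2001
    sep11_to_dec31: float   # hours from 9/11/2001 to 12/31/2001

def meets_grc_time_criteria(w: WorkHours) -> bool:
    """True if any one of the three time thresholds in the text is met."""
    return (
        w.sep11_to_sep14 >= 4
        or w.september_2001 >= 24
        or w.sep11_to_dec31 >= 80
    )

# A volunteer with 5 hours on 9/12/2001 qualifies through the first threshold.
print(meets_grc_time_criteria(WorkHours(5, 5, 5)))
# Someone with 60 hours spread over October-December 2001 does not.
print(meets_grc_time_criteria(WorkHours(0, 0, 60)))
```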
Two GRC cancer incidence studies have been published. The first study was limited to 20,984 persons who enrolled in the WTCHP between 7/2002 and 12/31/2008 and identified 552 individuals with cancer through linkages with four state tumor registries (Connecticut, New Jersey, New York, and Pennsylvania) [6]. In an analysis that restricted cancer cases to those that occurred at least six months after enrollment, and for any person enrolled during the observation period, a non-statistically significant SIR of 1.06 for all cancers combined was reported, but with significant SIRs for prostate and thyroid cancers. Multivariate models assessing the association between level of WTC exposure and cancer were suggestive of a trend but were not significant. The second study was an update of the earlier one, with an additional five years of follow-up through 2013 for residents of six states (Connecticut, Florida, New Jersey, New York, North Carolina, and Pennsylvania) [7]. With the additional follow-up, nearly twice as many cancers were diagnosed as in the earlier study (N=1072); there was also an overall statistically significant SIR of 1.09 and significant SIRs for prostate and thyroid cancers. Unlike the prior study, the incidence of leukemia was also elevated. However, neither cancer overall nor prostate cancer was significantly associated with 9/11 exposures. For the current study, the GRC final analytic sample consisted of 33,427 individuals.

Creating a Combined Rescue/Recovery Cohort
The process of creating a pooled dataset across these cohorts involved a number of steps, each described below.

Establish an administrative structure
A key component of the data pooling process was selection of an "honest broker", one that would receive identifiable data, de-duplicate persons in more than one cohort, conduct matching with 13 state cancer registries and return a single, de-identified analytical file to researchers at WTCHR and FDNY for analysis. The New York State Cancer Registry (NYSCR), an essential partner in prior cancer surveillance work by all three cohorts, served as the "honest broker" for this project and their role was deemed a success story by CDC's Program of Cancer Registries [13].

Identify state cancer registries for linkage
The state cancer registries (SCRs) were selected based on the distribution of addresses on file of the rescue/recovery workers of the three cohorts and on our previous experiences performing cancer linkages [4,6,12].
Arizona, California, Connecticut, Florida, Massachusetts, New Jersey, New York, North Carolina, Ohio, Pennsylvania, Texas, Virginia, and Washington were selected.
Coverage in past cancer linkages ranged from 90 to 99% of cohort participants: 99% in the FDNY cohort [14], 90% in the GRC [6], and 96% in the WTCHR cohort [5]. For all cohorts, the joint project would increase the number of state cancer registries to which rescue/recovery workers were linked and would presumably increase coverage.

Obtain required IRB approvals and legal agreements
Before any data exchange could commence, the project required executing Data Use Agreements (DUAs), or in one case a Memorandum of Understanding (MOU), between the parties involved in the project, applications to SCRs, and the completion of required Institutional Review Board (IRB) protocols and approvals (Figure 1). NYC DOHMH required that legal agreements be established with study partners in order to receive WTCHR data, including separate DUAs with the ISMMS and NYS DOH and an MOU with FDNY. It also required DUAs with all 13 SCRs (as well as the study partner NYSCR). The IRB at Albert Einstein College of Medicine (Einstein) served as the primary IRB, approving the study as minimal risk with a waiver of informed consent. It also served as the IRB for the FDNY via an IRB Authorization Agreement, and approved FDNY's IRB study submission. IRB approvals were granted from the NYSCR and NYC DOHMH, and an IRB exemption was granted from the Icahn School of Medicine (ISMMS). Each of the 13 SCRs required a study submission to their IRBs, along with a supplemental SCR data request application for review and approval, all managed by NYC DOHMH. Applications were also submitted to the National Death Index (NDI) and to NYS and NYC Vital Records for the use of their mortality data.
Overall, it took two years and seven and a half months from the start of funding to complete all the legal agreements, applications and IRB approvals, and three years and 24 days until receipt of the final linked data.
The initial DUAs were based on a NYC DOHMH legal template with specific requirements for content. The process also consumed a substantial amount of administrative time and effort, both to identify the appropriate parties for review and to obtain the appropriate signatures. For example, the first seven and least cumbersome DUAs each required on average one phone call and 24 email communications and took approximately forty hours to complete. In some cases, legal representatives of participating agencies disagreed on language, requiring more time-consuming involvement of other officials in the negotiations. [...] (Table 1). We estimated that each IRB application required up to 40 hours to prepare, not including responding to specific questions posed by IRB committees. The total estimated time expended was approximately 680 hours (17 applications at up to 40 hours each), since the project required approvals from a total of seventeen separate IRBs (13 states, 3 cohort institutions, and Einstein).
WTC exposure information was included in its entirety from the matched record that had the earliest enrollment date. It should be noted that records from FDNY typically had the earliest enrollment date, so that if an FDNY record was included, exposure information was based on FDNY information. The diagram below shows the overlapping records within the joined files (Figure 2).
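The earliest-enrollment rule described above can be illustrated with a minimal sketch. It assumes that record linkage has already assigned a shared identifier to records belonging to the same individual; the `person_id` field, the record layout, and the function name are all hypothetical and do not describe the honest broker's actual procedure.

```python
from collections import defaultdict
from datetime import date

def pool_records(records):
    """Collapse matched cohort records into one row per person.

    Exposure information is taken whole from the matched record with the
    earliest enrollment date; membership in every source cohort is kept.
    """
    by_person = defaultdict(list)
    for rec in records:
        by_person[rec["person_id"]].append(rec)

    pooled = []
    for person_id, recs in by_person.items():
        earliest = min(recs, key=lambda r: r["enrolled"])
        pooled.append({
            "person_id": person_id,
            "cohorts": sorted(r["cohort"] for r in recs),
            "exposure_source": earliest["cohort"],
            "exposure": earliest["exposure"],
        })
    return pooled

# Hypothetical records: person 1 appears in two cohorts, person 2 in one.
example = [
    {"person_id": 1, "cohort": "WTCHR", "enrolled": date(2003, 10, 1),
     "exposure": {"arrival": "9/11/2001"}},
    {"person_id": 1, "cohort": "FDNY", "enrolled": date(2002, 10, 2),
     "exposure": {"arrival": "9/11/2001", "on_pile": True}},
    {"person_id": 2, "cohort": "GRC", "enrolled": date(2004, 3, 15),
     "exposure": {"arrival": "9/12/2001"}},
]
for row in pool_records(example):
    print(row["person_id"], row["cohorts"], row["exposure_source"])
```

Taking the whole exposure block from a single record, rather than mixing fields across cohorts, avoids combining answers to differently worded questionnaires for the same person.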

WTC exposure for Joined Cohort Data
The five previously published cancer studies from these cohorts used different definitions of WTC exposures, based on the best information available for each cohort at the time [8]. There was wide variation in the activities designated as "highest exposure" for rescue/recovery/clean-up workers, ranging from the FDNY definition, which included all individuals who arrived at the WTC site on the morning of 9/11 before the buildings collapsed, to the WTCHR and GRC designations, which incorporated specific information about working on the pile, length of time working at the site (e.g., >90 days for both WTCHR and GRC), and being caught in the dust/debris cloud at the time of the towers' collapse.
Three exposure constructs are helpful in organizing available exposure information for rescue/recovery/clean-up workers: a) delineating exposure levels on 9/11; b) specifying time periods worked between 9/11 and June 25, 2002; and c) recording tasks performed. A key assumption was that arriving on site during the morning of 9/11 would place early rescue/recovery workers on-site at the time of the towers' collapse, with ensuing entrapment in the dust and debris cloud. [...] and GRC asked directly about being in the dust/debris cloud. Those who were not present during the towers' collapse but who worked on the pile at the WTC site on 9/11 were included in Type B. In this instance, WTCHR asked directly about working on the pile on 9/11. However, GRC included those who did not have the full dust cloud exposure on 9/11, and FDNY asked whether the firefighters engaged in rescue activities such as fire suppression or rescue/recovery at the site on 9/11. Type C would require being present in the vicinity of the WTC site on 9/11, but none of the other criteria of Type A or B. Type D would not be present at the WTC site on 9/11. The three cohorts varied considerably in the distribution of this category, with only 35% of FDNY arriving at the WTC site after 9/11, compared with 60% of WTCHR and 50% of GRC, likely reflecting the occupational differences between members of the cohorts and their responsibilities on and after 9/11.
A second exposure construct estimates the burden of work on the effort via earliest arrival time, time periods worked (e.g., 9/11/2001, 9/12, 9/13-9/17, 9/18 to 6/30/2002), and total number of days worked, but not all entities had the same information available, especially number of days worked on the pile or site and which days or period of time they worked. As a result of pooling data from the three cohorts, time of arrival was more evenly distributed than for individual cohorts. For instance, 62% of the FDNY cohort reported a 9/11 start date (15% of the entire pooled data) and 4.7% arrived after 9/17. After pooling of FDNY with the GRC and WTCHR, 39% arrived on 9/11 and 21% after 9/17, with 20% for each of the other two time periods (9/12 and 9/13-9/17). The summary exposure data for the pooled cohort therefore include a) dust/debris exposure on the morning of 9/11 (yes/no); b) date of arrival (9/11/2001, 9/12/2001, 9/13-9/17/2001, 9/18/2001-6/30/2002); and c) ever performed tasks on the pile (yes/no). This information is included for every individual in the pooled data file.
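The three summary exposure variables listed above can be sketched as a small record-building step. The four arrival periods are taken from the text; the function names, field names, and the choice to return `None` for dates outside the listed periods are assumptions made for illustration.

```python
from datetime import date
from typing import Optional

# Arrival periods as listed in the text: (label, first day, last day).
ARRIVAL_PERIODS = [
    ("9/11/2001", date(2001, 9, 11), date(2001, 9, 11)),
    ("9/12/2001", date(2001, 9, 12), date(2001, 9, 12)),
    ("9/13-9/17/2001", date(2001, 9, 13), date(2001, 9, 17)),
    ("9/18/2001-6/30/2002", date(2001, 9, 18), date(2002, 6, 30)),
]

def arrival_period(first_day: date) -> Optional[str]:
    """Map a first day on site to one of the four arrival periods."""
    for label, start, end in ARRIVAL_PERIODS:
        if start <= first_day <= end:
            return label
    return None  # outside the rescue/recovery period covered here

def summary_exposure(in_dust_cloud: bool, first_day: date,
                     ever_on_pile: bool) -> dict:
    """Build the three common exposure fields for one pooled record."""
    return {
        "dust_cloud_9_11": in_dust_cloud,
        "arrival_period": arrival_period(first_day),
        "ever_on_pile": ever_on_pile,
    }

print(summary_exposure(False, date(2001, 9, 15), True))
```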

Discussion
There is considerable scientific merit in joining three distinct cohorts of rescue/recovery workers for the common goal of assessing the association between cancer incidence and exposure to pollutants during and following the WTC disaster. This manuscript provides a detailed description of the process, and the great efforts, required for the realization of that merit. First, substantial administrative preparation and persistence were required. Second, complexity was added by fundamental differences across the study designs, e.g., closed versus open enrollment, and by WTC exposure levels (on 9/11, after 9/11) that were difficult to harmonize. Regardless of these initial differences, however, the final outcome of the joining process was a greatly strengthened resource for monitoring the impact of 9/11 exposures on cancer risk among rescue and recovery workers and volunteers.
Although the central task for combining the cohorts was merging the members, eliminating duplicates to create the WTC CR/RC Cohort, and matching the cohort with cancer registries, the overwhelming consumption of resources and effort went into building an administrative infrastructure before that process could even begin. An earlier report on matching a large cohort of jet engine manufacturing workers (225,000) with over 30 SCRs estimated that it required around 400 hours to complete state IRB and applications [15]. As we also experienced, the prior study [15] reported a wide variation in the responsiveness and efficiency of SCRs on application requirements and follow-up of matching and provision of cases, which some states never completed before the researchers abandoned the effort. For this project, there was also a wide range in time from application completion to receipt of matched data. In addition to the SCRs' matching, our project also required the completion of multiple data use agreements across the administrative homes of each cohort (in addition, of course, to IRB approval within each participating entity that received the data). Typically, a grant-funded project with a three- to five-year funding period would expect the administrative tasks to be completed in the first year. In the case of the WTC CR/RC, more than three years of the project's funding period passed before data were available for analysis, although this included the actual data linkages. Even so, it was fortunate that the NYSCR served as the honest broker, bringing their substantial expertise for both merging of the cohorts and [...] Health [16].
The sample size (N=69,102) of the WTC CR/RC is substantially larger than what individual cohorts have available for analysis. This increase amounts to 77% for FDNY, 57% for WTCHR, and 52% for GRC. As a result of pooling, there are a number of distinct advantages for assessing 9/11-related cancer, for instance, satisfying sample size requirements for change point analysis to estimate latency or conducting survival analysis that includes cancer stage. In addition, improved statistical power increases the capability for assessing the association of 9/11 exposure with lower-incidence cancers that have been identified as of concern in prior reports, such as multiple myeloma [17] or kidney cancer [5]. The pooled cohort also has the potential of increasing the scope of 9/11 exposures that could be used for internal comparisons. For instance, a very large proportion of FDNY responders arrived on 9/11 in comparison to the other cohorts, where arrival and duration of exposure had a greater spread. After pooling the data, we have the potential of achieving greater efficiency with more balanced numbers between exposure groups, especially with a larger reference group such as those arriving after 9/17/2001, producing more precise estimates [18]. Beyond the sample size increase, the pooling of the cohorts also takes advantage of the heterogeneity of the different cohorts, thereby increasing the representation of different populations at risk within a unified file of common elements.
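The percentages quoted above are consistent with one particular reading of "increase", namely the share of the pooled cohort that lies outside each cohort's own records, (N - n) / N. That reading is an assumption inferred from the reported counts, not a definition given in the text. The short check below uses only the cohort sizes and the pooled total reported in this paper; the variable and function names are illustrative.

```python
# Cohort sizes and pooled unique total as reported in this paper.
cohort_sizes = {"FDNY": 16_221, "WTCHR": 29_372, "GRC": 33_427}
pooled_unique = 69_102

def added_share_pct(cohort_n: int, pooled_n: int) -> float:
    """Percent of the pooled cohort not contributed by this cohort (assumed reading)."""
    return 100 * (pooled_n - cohort_n) / pooled_n

for name, n in cohort_sizes.items():
    print(f"{name}: {added_share_pct(n, pooled_unique):.1f}%")

# Records removed as duplicates across cohorts.
duplicates_removed = sum(cohort_sizes.values()) - pooled_unique
print("duplicate records removed:", duplicates_removed)
```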
The project goals set out by Boffetta et al. [8] have largely been achieved, including agreement on a common set of state cancer registries with standardized elements for matching. Also, the WTC CR/RC Cohort study group can select a reference population for the study population using a clearly stated rationale and for a specific set of years. Reference populations could be modified according to analytic objectives, but with increased power to detect signals for less common cancers. Given that the prior reports on cancer discussed above used very different 9/11 exposure schemes, the WTC CR/RC cancer analysis will use a common framework for exposure definitions. These strategies for enhanced analytic capabilities offset the limitations created by combining cohorts, which include differences in enrollment strategies, variation in the level of missing information for matching, and loss of detail in WTC exposure information.

Conclusions
The consolidation of data from three WTC-exposed rescue/recovery worker cohorts that have previously reported results on cancer association with WTC exposures has resulted in a merged file of over 69,000 unique individuals and formed the WTC CR/RC. This resource for future research has sufficient sample size for a large number of hypothesis tests in observational studies of WTC exposures in relation to outcomes such as rare cancers and mortality. In addition, the CR/RC has an infrastructure in place, including a collaborative scientific team, an honest broker (NYSCR), DUAs and IRB protocols that require minimal update effort at regular intervals of three to five years for monitoring of