EPID600 (Spring 2013) module
XIII. Multicausality: Confounding

Questions for Case Study on Esophageal cancer in China

(NOTE: For some of these questions there may not be one "right answer".)

This case study was inspired by the Nova program The Cancer Detectives of Lin Xian, which recounts a series of investigations into the very high rates of esophageal cancer in northern China (also described in Chung S. Yang. Research on esophageal cancer in China: a review. Cancer Research 1980;40:2633-2644). The wide-ranging investigations considered four types of factors and their possible interactions: physical (scratching from dried food, injury from very hot food), dietary (pickled cabbage, low vitamin intake), environmental (high nitrates in soil and water), and infectious agents (fungi). Specific exposures implicated included nitrosamines in the diet (nitrosamines have been found to be related to cancer and can form in the body when their chemical precursors are ingested), fermented and moldy foods (substances produced by microorganisms can be carcinogenic), and low levels of vitamins A and C (although firm links have been difficult to establish, numerous studies have found evidence that vitamins A and C, and other micronutrients, may reduce cancer risk).

The hypothetical study enrolled a cohort of 4,000 people aged 40-49 years and collected baseline data on three aspects of their diet: moldy bread (MB), pickled cabbage (PC), and insufficient vitamin C (LVC). The cohort was followed for 10 years, during which 205 cases of Barrett’s esophagus were detected. (The case study uses Barrett’s esophagus (BE), a precursor of adenocarcinoma of the esophagus, as the study outcome because it develops much more frequently than cancer. Note, however, that although most esophageal cancers in the U.S. are adenocarcinomas, most esophageal cancers in China are squamous cell carcinomas.)

After carefully reviewing and cleaning their data, the investigators carried out a crude analysis to examine the relation of each dietary risk factor (MB, PC, LVC) to Barrett’s esophagus (BE). The crude analysis is shown in the first set of tables at www.unc.edu/epid600/classes/2013a/modules/casestudies/13/output1.htm#table1
[The full set of tables can be downloaded as a Word document from www.unc.edu/epid600/classes/2013a/modules/casestudies/13/BarrettsEsophagusTables.doc] (Although the analysis of a study like this would ordinarily use incidence rates based on person-years of follow-up, for simplicity we will analyze incidence proportions and assume that there was no loss to follow-up.)

The bottom row of the first table [MB (Moldy bread) by BE (Barrett’s esophagus)] shows that there were 205 cases of BE (incidence proportion 5.13%, shown in red) among the 4,000 members of the cohort. The right-most column shows that 770 of the 4,000 were exposed to moldy bread, so the prevalence of exposure to moldy bread is 19.25% (shown in blue).

Among the 770 participants exposed to moldy bread, 73 (9.48%, in red) developed Barrett’s esophagus, whereas only 132 (4.09%, in red) of the 3,230 participants not exposed to moldy bread developed BE. Thus, the relative risk (cumulative incidence ratio, CIR, also called incidence proportion ratio, IPR) was 9.48% / 4.09% = 2.32 (shown in red in the subtable headed “Common Relative Risk”; the term “Common” appears because the subtable is most often used to present the results of a stratified analysis that controls for other variables, and we will see an example of that presently). (Note: these tables were created with the Statistical Analysis System [SAS], though the first two have been slightly reformatted. If you are interested, you can view the SAS program used to create the dataset and to generate the analyses for this case study at www.unc.edu/epid600/classes/2013a/modules/casestudies/13/sasprogram.htm ).
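The crude figures quoted above (prevalence 19.25%, incidence proportions 9.48% and 4.09%, IPR 2.32) follow directly from four counts given in the text: 4,000 participants, 205 cases, 770 exposed to MB, and 73 exposed cases. A minimal Python sketch of that arithmetic follows; the function name `crude_summary` is our own and is not part of the SAS program.

```python
def crude_summary(n_total, cases_total, n_exposed, cases_exposed):
    """Crude 2x2 summary for a cohort analysed with incidence proportions."""
    n_unexposed = n_total - n_exposed
    cases_unexposed = cases_total - cases_exposed
    ip_exposed = cases_exposed / n_exposed        # incidence proportion, exposed
    ip_unexposed = cases_unexposed / n_unexposed  # incidence proportion, unexposed
    return {
        "prevalence_of_exposure": n_exposed / n_total,
        "ip_overall": cases_total / n_total,
        "ip_exposed": ip_exposed,
        "ip_unexposed": ip_unexposed,
        "ipr": ip_exposed / ip_unexposed,  # crude relative risk (CIR / IPR)
    }


# Counts for moldy bread, taken from the text: 132 = 205 - 73 unexposed cases
# among 3,230 = 4,000 - 770 unexposed participants.
mb = crude_summary(n_total=4000, cases_total=205, n_exposed=770, cases_exposed=73)
for name, value in mb.items():
    print(f"{name:>24}: {value:.4f}")
```

The IPR comes out at about 2.32, matching the SAS output. The same function could be used to check answers to questions 1 and 2 by substituting the counts from tables 2 and 3.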

  1. a. What was the prevalence of exposure to pickled cabbage (PC; see table 2, labelled Pickled cabbage [by] Barrett’s esophagus, at www.unc.edu/epid600/classes/2013a/modules/casestudies/13/output1.htm#table2)?

      b. What were the incidence proportions of BE for, respectively, persons exposed to and not exposed to PC?

      c. What was the relative risk (IPR) for BE in relation to exposure to PC?

  2. a. Calculate the prevalence of exposure to low vitamin C (LVC, see table 3) and locate that number in the SAS output (at www.unc.edu/epid600/classes/2013a/modules/casestudies/13/output1.htm#table3).

      b. Calculate the incidence proportions of BE for, respectively, persons exposed to and not exposed to LVC and locate those IPs in the SAS output.

      c. Calculate the relative risk (IPR) for BE in relation to exposure to LVC and locate that IPR in the SAS output. (Note: SAS uses double precision arithmetic, so you may see slight differences between your results and those in the SAS output.)


The conceptual model on which the study was based proposed that each of the three dietary risk factors was an independent contributor to esophageal cancer. However, as might be expected, the three dietary risk factors were associated with one another in the cohort, so the investigators were concerned about the possibility of confounding. In order to gain a better understanding of the possible relations among the three exposure variables, the investigators examined tables comparing the (crude) associations of the dietary factors with each other (see the three tables Moldy bread by Pickled cabbage, Pickled cabbage by Vitamin C, and Moldy bread by Vitamin C under the heading Associations among risk factors www.unc.edu/epid600/classes/2013a/modules/casestudies/13/output2.htm ).
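Because neither dietary factor is the outcome in these exposure-by-exposure tables, the choice of association measure matters (question 3 asks you to think about this). The Python sketch below computes two candidates for a generic 2x2 table of binary exposures X and Y: the prevalence ratio of Y by X and the odds ratio. The counts are invented for illustration and are not the MB-by-PC table in the SAS output; the function name is hypothetical.

```python
def two_exposure_measures(a, b, c, d):
    """Measures of association for a 2x2 table of two binary exposures X and Y.

                 Y present   Y absent
    X present        a           b
    X absent         c           d
    """
    prevalence_ratio = (a / (a + b)) / (c / (c + d))  # prevalence of Y, X present vs absent
    odds_ratio = (a * d) / (b * c)                    # same value whichever variable is rows
    return prevalence_ratio, odds_ratio


# Hypothetical counts for illustration only (not the case-study data).
pr, odds_ratio = two_exposure_measures(a=200, b=300, c=500, d=3000)
print(f"prevalence ratio = {pr:.2f}, odds ratio = {odds_ratio:.2f}")

# Swapping which exposure defines the rows changes the prevalence ratio
# but leaves the odds ratio unchanged.
pr_t, or_t = two_exposure_measures(a=200, b=500, c=300, d=3000)
print(f"transposed: prevalence ratio = {pr_t:.2f}, odds ratio = {or_t:.2f}")
```

The symmetry of the odds ratio is one reason it is often used when there is no natural "exposure" and "outcome" among the two variables, though other measures can be defended.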

  3. Determine whether or not moldy bread (MB) and pickled cabbage (PC) are associated and quantify the strength of their association. [Note: think about what measure(s) of association will be appropriate to quantify strength of association in this context.]

  4. a. Is there an association between PC and low vitamin C intake? Between MB and low vitamin C intake? Provide a relevant measure of the strength of each of these associations.

      b. What are the implications of this descriptive analysis for the presence of confounding?

  5. The primary strategies for avoiding or controlling confounding are restriction, balancing exposed and unexposed groups by randomization or matching, stratified analysis, and regression modeling. What are the advantages and drawbacks of each method?


With stratified analysis we partition the data into subsets defined by the “covariables” and examine the exposure-outcome relation within each subset. Then we can calculate a measure of association that summarizes the associations found in the various subsets. Such an overall measure of association is usually some kind of weighted average of the stratum-specific measures.

For this case study we have partitioned the dataset into 4 subsets so that we can control for two dichotomous covariables simultaneously. For example, to examine the association between BE and MB while controlling for both PC and LVC, the subsets are: i) PC present and LVC present (low), ii) PC present and LVC absent (sufficient), iii) PC absent and LVC present (low), and iv) PC and LVC both absent. These four data subsets are presented in the first 4 tables under the page title “Stratified analysis of associations with Barrett’s esophagus” (www.unc.edu/epid600/classes/2013a/modules/casestudies/13/output3.htm). The four tables are labeled “Table x of MB by BE, Controlling for PC= ... LVC=”, where x is 1, 2, 3, or 4 for the different combinations of levels of PC and LVC.
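The partitioning described above is mechanical, and a short Python sketch makes it concrete. It assumes the data have been tabulated as counts for each combination of PC, LVC, MB, and BE; the counts, the dictionary layout, and the function names `stratify` and `stratum_rr` are all hypothetical and are not taken from the case-study dataset or the SAS program.

```python
from collections import defaultdict

# Hypothetical cell counts keyed by (PC, LVC, MB, BE); 1 = present/low, 0 = absent/sufficient.
# These numbers are invented for illustration and are not the case-study data.
counts = {
    (1, 1, 1, 1): 20, (1, 1, 1, 0): 180, (1, 1, 0, 1): 15, (1, 1, 0, 0): 185,
    (1, 0, 1, 1): 10, (1, 0, 1, 0): 190, (1, 0, 0, 1): 8,  (1, 0, 0, 0): 392,
    (0, 1, 1, 1): 9,  (0, 1, 1, 0): 141, (0, 1, 0, 1): 12, (0, 1, 0, 0): 388,
    (0, 0, 1, 1): 6,  (0, 0, 1, 0): 194, (0, 0, 0, 1): 27, (0, 0, 0, 0): 1773,
}


def stratify(counts):
    """Collapse the cells into one MB-by-BE table per (PC, LVC) stratum.

    Each stratum is returned as (a, n1, c, n0): MB-exposed cases, MB-exposed total,
    MB-unexposed cases, MB-unexposed total.
    """
    cells = defaultdict(lambda: [0, 0, 0, 0])
    for (pc, lvc, mb, be), n in counts.items():
        t = cells[(pc, lvc)]
        if mb == 1:
            t[0] += n * be  # exposed cases
            t[1] += n       # exposed total
        else:
            t[2] += n * be  # unexposed cases
            t[3] += n       # unexposed total
    return {key: tuple(t) for key, t in cells.items()}


def stratum_rr(a, n1, c, n0):
    """Stratum-specific incidence proportion ratio for MB."""
    return (a / n1) / (c / n0)


# Order i) to iv) as in the text: (PC, LVC) = (1, 1), (1, 0), (0, 1), (0, 0).
for (pc, lvc), table in sorted(stratify(counts).items(), reverse=True):
    print(f"PC={pc} LVC={lvc}: table={table}, RR(MB) = {stratum_rr(*table):.2f}")
```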

At the end of the four tables are summary estimates of the “Common Relative Risk” across the four tables. (SAS uses the term “common relative risk” to refer to a summary of the relative risks across all of the subtables. This summary is based on the assumption that the true relative risk is the same for each stratum, so that any differences across the strata arise only from random variation. A more general term is “adjusted relative risk”, which refers to any relative risk estimate that shows the relationship between the exposure and disease variables after removing the effects of one or more covariables. A standardized relative risk is an example of an adjusted relative risk that does not involve the assumption of uniformity of relative risks across the strata.)

Since we have cohort data and the disease category is column 1, we are interested in the rows labelled “Cohort (Col1 Risk)”, so the additional rows displayed by SAS have been deleted. “Mantel-Haenszel” and “Logit” refer to two different techniques for computing summary relative risks. In these data, where the numbers of observations are rather large, the two types of estimates are nearly identical. The Mantel-Haenszel estimate is often preferred when the data are sparse. We will use the Logit estimates.
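The two summary estimators named above can each be written in a few lines. This sketch follows the standard textbook formulas for the Mantel-Haenszel summary risk ratio and the inverse-variance-weighted ("logit") summary risk ratio; we believe these correspond to the "Cohort (Col1 Risk)" rows in SAS PROC FREQ, but that correspondence is an assumption to check against the SAS documentation. Strata are given as hypothetical (a, n1, c, n0) tuples: exposed cases, exposed total, unexposed cases, unexposed total.

```python
import math


def mh_rr(tables):
    """Mantel-Haenszel summary risk ratio.

    Each table is (a, n1, c, n0): exposed cases, exposed total,
    unexposed cases, unexposed total.
    """
    numerator = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in tables)
    denominator = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in tables)
    return numerator / denominator


def logit_rr(tables):
    """Inverse-variance ("logit") summary risk ratio.

    Averages ln(RR) across strata with weights 1 / Var[ln RR], where
    Var[ln RR] = 1/a - 1/n1 + 1/c - 1/n0. Requires a > 0 and c > 0 in every stratum.
    """
    sum_w = sum_w_log = 0.0
    for a, n1, c, n0 in tables:
        log_rr = math.log((a / n1) / (c / n0))
        weight = 1.0 / (1 / a - 1 / n1 + 1 / c - 1 / n0)
        sum_w += weight
        sum_w_log += weight * log_rr
    return math.exp(sum_w_log / sum_w)


# Hypothetical strata, not the case-study output.
tables = [(20, 200, 15, 200), (10, 200, 8, 400), (9, 150, 12, 400), (6, 200, 27, 1800)]
print(f"Mantel-Haenszel RR = {mh_rr(tables):.3f}")
print(f"Logit RR           = {logit_rr(tables):.3f}")
```

Both estimators are weighted averages of the stratum-specific RRs, so each falls between the smallest and largest stratum RR. The logit weights involve 1/a and 1/c, so a stratum with very few cases has an unstable variance and a zero cell makes it undefined; the Mantel-Haenszel estimator does not have that problem, which is one reason it is preferred for sparse data.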

  6. a. Examine the association between BE and MB in each of the four tables and then the common relative risk estimate. Describe what you see.

      b. Compare these results to the crude relative risk estimate. Is there confounding?

      c. Confounding arises from noncomparability of the exposed and unexposed groups, i.e., when the comparison group is not a good substitute for the counterfactual condition. In what way are the exposed (MB) and unexposed (not MB) groups not comparable?

Compare the joint distributions of the risk factors PC and LVC between those exposed to MB and those not exposed to MB by filling in the following table from the SAS output and commenting on the results.

  PC       LVC             # MB   % MB   # No MB   % No MB
  Yes (1)  Low (1)
  Yes (1)  Sufficient (0)
  No (0)   Low (1)
  No (0)   Sufficient (0)
  Total    Total                  100%             100%
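Filling in the table above amounts to computing column percentages: the share of the MB group, and separately of the no-MB group, that falls into each PC/LVC combination. A minimal Python sketch follows, using invented counts (not the case-study data) and a hypothetical function name.

```python
# Hypothetical counts (not the case-study data) of people in each PC/LVC
# combination, split by moldy-bread status: (number with MB, number without MB).
joint = {
    (1, 1): (200, 200),
    (1, 0): (200, 400),
    (0, 1): (150, 400),
    (0, 0): (200, 1800),
}


def column_percents(joint):
    """Percent of the MB group and of the no-MB group falling in each PC/LVC combination."""
    total_mb = sum(mb for mb, _ in joint.values())
    total_no_mb = sum(no_mb for _, no_mb in joint.values())
    return {
        key: (100 * mb / total_mb, 100 * no_mb / total_no_mb)
        for key, (mb, no_mb) in joint.items()
    }


for (pc, lvc), (pct_mb, pct_no_mb) in column_percents(joint).items():
    print(f"PC={pc} LVC={lvc}: {pct_mb:5.1f}% of MB   {pct_no_mb:5.1f}% of no MB")
```

In these invented numbers, 26.7% of the MB group but only 7.1% of the no-MB group have both PC and LVC. A difference of that kind in the distribution of other risk factors is what makes the exposed and unexposed groups noncomparable.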

  7. a. Examine the stratified analyses for the associations between (i) BE and PC and (ii) BE and LVC. Compare the common relative risks from the stratified analysis with the corresponding crude RRs and interpret what you find.

      b. Which measure - the adjusted RR (from the stratified analysis) or the crude RR - is a better indicator of the independent association of the risk factor and BE? For example, if the association is causal, which RR could be used to estimate the benefit from eliminating one of the risk factors?


Although stratified analysis is a powerful and easy-to-understand method of controlling for potential confounding and obtaining adjusted estimates, it becomes cumbersome when there are several covariables and/or when one or more of them has multiple levels rather than being dichotomous. In fact, since stratified analysis requires a table for each combination of covariable levels, controlling for 6 dichotomous variables would require 2^6 = 64 tables! For these reasons it is common to construct a mathematical model that, given certain assumptions, provides a more efficient summary of the relations between each covariable and the outcome. For a cohort study estimating incidence proportion ratios, the preferred model is called "relative risk regression" (or "log binomial regression"). The analyst proposes the model structure, such as

  risk of BE = (baseline risk) x (RR from MB) x (RR from PC) x (RR from LVC)

A statistical procedure is then used to estimate the RRs that best fit the data under this model structure.

Models for ratio measures are usually fit on the log scale, which makes the mathematics more tractable. So the procedure uses the algebraically equivalent model:

  ln(risk of BE) = ln(baseline risk) + ln(RR from MB) + ln(RR from PC) + ln(RR from LVC)

The results of the analysis are then converted back to the natural scale by exponentiation (i.e., by taking antilogarithms).
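The equivalence between the multiplicative model and its log-scale form can be checked numerically. In this Python sketch each exposure is coded 0/1, so a factor contributes its RR only when present; the baseline risk and RRs are hypothetical values chosen for illustration, not estimates from the study, and the function names are our own.

```python
import math


def risk_multiplicative(baseline, rrs, exposures):
    """risk = baseline x RR_k for every factor k that is present (x_k = 1)."""
    risk = baseline
    for rr, x in zip(rrs, exposures):
        risk *= rr ** x  # rr ** 0 == 1, so absent factors leave the risk unchanged
    return risk


def risk_log_scale(baseline, rrs, exposures):
    """Same model on the log scale, then exponentiated back to the risk scale."""
    log_risk = math.log(baseline) + sum(x * math.log(rr) for rr, x in zip(rrs, exposures))
    return math.exp(log_risk)


# Hypothetical baseline risk and RRs for (MB, PC, LVC); not estimates from the study.
baseline, rrs = 0.02, (2.0, 1.5, 1.8)
for exposures in [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]:
    print(exposures,
          round(risk_multiplicative(baseline, rrs, exposures), 4),
          round(risk_log_scale(baseline, rrs, exposures), 4))
```

Both functions give the same risks, for example 0.02 x 2.0 x 1.5 x 1.8 = 0.108 when all three exposures are present. Because the factors multiply, nothing in the model itself keeps a predicted risk below 1, which is one reason log-binomial models can be difficult to fit when risks are high.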

  8. A mathematical model was fit using the SAS procedure GENMOD to estimate the relative risk for each of the three dietary risk factors while controlling for the other two (see the SAS output “Relative risks controlled using relative risk regression” at www.unc.edu/epid600/classes/2013a/modules/casestudies/13/output4.htm). The relative risk estimates (on the log scale) are shown in the table called “Parameter Estimates” (www.unc.edu/epid600/classes/2013a/modules/casestudies/13/output4.htm#parameters). Find the estimated ln(RR)s for MB, PC, and LVC, compute their antilogarithms (the EXP function in MS Excel), and compare the resulting adjusted RRs to the adjusted RRs from the stratified analysis.

  9. Since the adjusted RR controls for confounding, and the crude RR does not, does the crude RR have any value when confounding is present?
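For question 8, Python's `math.exp` plays the same role as Excel's EXP. The sketch below also back-transforms Wald 95% confidence limits (log-scale estimate plus or minus 1.96 standard errors), which goes a step beyond what the question asks. All numbers are invented placeholders laid out like a parameter-estimates table, not the GENMOD results; the function name is hypothetical.

```python
import math

# Hypothetical parameter estimates (ln RR) and standard errors, laid out like a
# GENMOD "Parameter Estimates" table. Replace them with the values in the SAS output.
estimates = {
    "Intercept": (-4.00, 0.10),  # ln(baseline risk): all three exposures absent
    "MB": (0.50, 0.15),
    "PC": (0.30, 0.14),
    "LVC": (0.60, 0.16),
}


def back_transform(log_estimate, se, z=1.96):
    """Antilog of a log-scale estimate and of its Wald 95% confidence limits."""
    return (math.exp(log_estimate),
            math.exp(log_estimate - z * se),
            math.exp(log_estimate + z * se))


for name, (b, se) in estimates.items():
    point, lower, upper = back_transform(b, se)
    label = "baseline risk" if name == "Intercept" else "adjusted RR"
    print(f"{name:>9}: {label} = {point:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

Consistent with the log-scale model above, the antilog of the intercept estimates the baseline risk, and the antilog of each exposure coefficient estimates that factor's RR adjusted for the other two.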


Postscript: There has been extensive research on the problem of esophageal cancer in Linxian County since production of "The Cancer Detectives of Lin Xian", including two large intervention trials of multivitamin and mineral supplementation (Blot et al., 1993; Li et al., 1993). The larger of these trials observed reductions of 9% in all-cause mortality and 13% in cancer mortality in the group receiving supplementation with selenium, β-carotene, and vitamin E. Follow-up of these trials continues, to assess the long-term effects of vitamin/mineral supplements. A recent analysis of the relation between serum selenium and esophageal cancer (Mark et al., 2000) found a 44% reduction in esophageal cancer risk (RR = 0.56) among persons in the highest quartile of serum selenium compared with the lowest quartile.

Blot WJ, Li JY, Taylor PR, et al. Nutrition intervention trials in Linxian, China: supplementation with specific vitamin / mineral combinations, cancer incidence, and disease specific mortality in the general population. J National Cancer Institute 1993;85(18):1483-1492.

Li JY, Taylor PR, Li B, et al. Nutrition intervention trials in Linxian, China: multiple vitamin / mineral supplementation, cancer incidence, and disease specific mortality among adults with esophageal dysplasia. J National Cancer Institute 1993;85(18):1492-1498.

Mark SD, Qiao YL, Dawsey SM, et al. Prospective study of serum selenium levels and incident esophageal and gastric cancers. J National Cancer Institute 2000;92(21):1753-1763.



11/4/2008 vs, 7/22/2010 vs, 3/15/2011