University of North Carolina at Chapel Hill
                         School of Public Health
                        Department of Epidemiology

                   Fundamentals of Epidemiology (EPID 168)

                        Final Examination, Fall 1997
                               Answer Guide

 1.  C.  Analytic study of data collected to investigate the 
         hypothesized relationship 

 2.      a. A finding from a migrant study or studies:  "Studies of 
            migrants provide some evidence; for example, migrants to the
            United States from Japan experienced a rate of breast cancer 
            intermediate between the lower rate in Japan and the higher
            rate in the U.S." 

         b.  A finding from descriptive epidemiology: *Many possibilities,
             including either of these sentences:  
            "This finding implies a possible connection between the 
             trend toward increasing bottlefeeding in the postwar 
             period and current trends toward increasing incidence of 
             breast cancer.  Furthermore, it offers a partial 
             explanation of the international variation in breast 
             cancer rates, with rates considerably lower in less 
             developed than in developed nations." 

         c.  An association from an ecologic study: *"Micozzi found mean
             adult height and breast cancer incidence in 30 countries to be
             highly correlated (r=0.8)."
 3.  B.  Age is causally related to breast cancer risk and infant feeding
         practices have changed over time.

 4.  D.  Common exposure, rare endemic disease.

 5.  B. Secular changes in infant feeding practices result in an association
        between age and exposure to breastmilk. 

 6.  A. Selecting from a pool of prevalent cases would make separation of
        factors associated with risk and those with survival more difficult. 

 7.  a.  Primary -- Primary breast cancer is a tumor that originates in the
         breast, rather than a tumor in the breast that is the result of 
         metastasis from a tumor that originated in another location or 
         tissue.  In general, tumors originating in the same organ and 
         tissue are more likely to have similar etiologies than are 
         tumors that originate in different organs.

     b.  Histologically-confirmed -- histological confirmation 
         refers to the verification of the diagnosis (of breast cancer) 
         through laboratory examination of tumor tissue.  Microscopic 
         examination of tumor cells establishes the existence and type 
         of tumor with a greater degree of certainty than does a 
         clinical diagnosis alone.  Counting only histologically confirmed 
         cases reduces the potential for false positive breast cancer 
         diagnoses and the misclassification bias they would cause.

 8.  B.  The random selection of controls from the community provides a
         better estimate of breastmilk exposure among the source population. 

 9.  A.  Kappa coefficient 

10.  Table:
           Biomarker validation of women's self-report of having been breastfed

                                Breastfeeding biomarker found

                                        Yes     No      Total
           S  r    --------------------------------------------
           e  e    Breastfed            70      26       96
           l  p       
           f  o    Not breastfed        80      28      108
              r    --------------------------------------------
              t       Total            150      54      204

     Derivation:  204 cases tested (overall total), 73.5% (=150) have 
     the marker (so 54=204-150 do not), 80 are false negatives by 
     self-report (so 80 = "yes" biomarker, "no" self-report), and the 
     remaining cells and marginals are obtained from these numbers.

     a.  Sensitivity = 70 / 150 = 47%  (Answers the question, "Of women 
         who truly were breastfed, as demonstrated by the presence of the 
         biomarker for having been breastfed, what % were correctly 
         classified by self-report?")

     b.  Specificity = 28 / 54 = 52%  (Answers the question, "Of women 
         who were not breastfed, as demonstrated by the absence of the 
         biomarker, what % were correctly classified by self-report?")

     c.  Positive predictive value (PPV) = 70 / 96 = 73%  (Answers the 
         question, "Of women classified, on the basis of their self-report, 
         as 'having been breastfed', what % were correctly classified?")

     d.  Negative predictive value (NPV) = 28 / 108 = 26%  (Answers the 
         question, "Of women classified, on the basis of their self-report, 
         as 'not having been breastfed', what % were correctly classified?")
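
     The four measures in question 10 can be checked with a short
     calculation.  The following is a minimal Python sketch, assuming the
     cell counts from the table above and treating the biomarker as the
     reference standard; the function name `validity_measures` and its
     argument names are illustrative and not part of the exam.

```python
def validity_measures(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV for a 2 x 2 validation table.

    tp: self-report yes, biomarker yes    fp: self-report yes, biomarker no
    fn: self-report no,  biomarker yes    tn: self-report no,  biomarker no
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Cell counts taken from the question 10 table.
measures = validity_measures(tp=70, fp=26, fn=80, tn=28)
for name, value in measures.items():
    print(f"{name}: {value:.0%}")
```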

11.  a. Table:
             Adult breast cancer by having been breastfed as an infant,
            among premenopausal women with education beyond high school

                               Case    Control Total
             Breastfed          61      93      154

             Not breastfed      69      61      130
             Total             130     154      284

         OR = (61 x 61) / (93 x 69) = 0.58.

         Interpretation:  having been breastfed appears to be protective
         against female adult breast cancer, with a reduction in risk of
         approximately 40%.

     b.  Table:

             Adult breast cancer by having been breastfed as an infant,
             among premenopausal women with education beyond high school,
             assuming that 20% of controls who reported having been
             breastfed had in fact not been

                                  Cases   Controls  Total
                Breastfed           61       74      135

                Not breastfed       69       80      149
                Total              130      154      284

         Derivation:  20% of the 93 controls who reported having been 
         breastfed had not been, so 20% of 93 (=18.6->19) are switched from 
         "Breastfed" to "Not breastfed", being added to the 61 who reported 
         not having been breastfed.  The remaining 80% of 93 (=74.4->74) 
         remain in the upper row.

         OR = (61 x 80) / (74 x 69) = 1.0, i.e.  no association.
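
         The reclassification in part b can be sketched in Python as
         follows.  This is only an illustration under the stated 20%
         assumption; the helper `odds_ratio` and the variable names are
         hypothetical.  The unrounded corrected OR comes out near 0.96,
         which is reported above as 1.0.

```python
def odds_ratio(a, b, c, d):
    """OR for a 2 x 2 table: a, b = exposed cases, controls; c, d = unexposed cases, controls."""
    return (a * d) / (b * c)

cases_bf, cases_not = 61, 69
controls_bf, controls_not = 93, 61

observed = odds_ratio(cases_bf, controls_bf, cases_not, controls_not)

# Assume 20% of controls who reported being breastfed were in fact not breastfed.
moved = round(0.20 * controls_bf)  # 18.6 rounds to 19
corrected = odds_ratio(cases_bf, controls_bf - moved, cases_not, controls_not + moved)

print(f"observed OR  = {observed:.2f}")
print(f"corrected OR = {corrected:.2f}")
```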

     c.  B. differential misclassification of exposure

12.     TRUE or FALSE

         a.  False - matching controls to cases does not prevent the 
         matching variable (age) from being associated with the exposure 
         (having been breastfed), so the matching cannot prevent 
         confounding.  (See also d. and e.)

         b.  True - The nurse telephoned hospitals on a frequent, regular 
         basis, to identify all breast cancer cases.

         c.  False - The difference in the proportions interviewed among 
         cases and among controls provides a great deal of potential for 
         selection bias, but if nonparticipation was not related to having 
         been breastfed then selection bias will not occur.

         d.  False - The matching caused cases and controls to have the same 
         age distribution, so it did "work"; matching would not be expected 
         to eliminate an association between age and the exposure, since 
         exposure status was not known when controls were being selected and 
         in any case would not have been used in the matching procedure.

         e.  False - The matching procedure prevented an association.

         f.  False - The association between body mass index and breast 
         cancer can be assessed by estimating odds ratios from Table 2.  To 
         avoid confounding by infant feeding history we should preferably 
         assess the association separately in breastfed women and in women 
         who have not been breastfed (omitting the complexities from 
         considering body mass to be an intervening variable in the effect 
         of infant feeding history).  To avoid being misled by a possible 
         "synergism" involving infant feeding and body mass, ideally we 
         would look in the "unexposed" group.  However, although this study 
         focuses on breastfeeding, one can also consider "formula feeding" 
         as an exposure that might be "synergistic" with body mass.  So we 
         can choose either exposure group (or both).

         Here are the computations:

         From Table 2:
                                 Cases                       Controls     
                        -------------------------    -------------------------
                        Breastfed   Not breastfed    Breastfed   Not breastfed
         Body mass      ---------- --------------    ---------   -------------
         index (kg/m sq)

           16-22            48          15               89           19
           23-27           103          26              125           16
            >27             90          17               91           16

         To show the details, here is a table for estimating OR's for body mass index and breast cancer:

                              Breastfed        Not breastfed        Total
         Body mass          ---------------   ---------------   ---------------
         index (kg/m sq)    Cases  Controls   Cases  Controls   Cases  Controls
           16-22              48      89        15      19        63     108
           23-27             103     125        26      16       129     141
            >27               90      91        17      16       107     107

         and the resulting OR's are [e.g., (90 * 89) / (48 * 91) = 1.83]:

                              Breastfed        Not breastfed        Total
         Body mass            ---------        -------------       ---------
         index (kg/m sq)
           16-22 (ref. level)    1.0               1.0                1.0
           23-27                1.83               2.06               1.57
            >27                 1.83               1.34               1.71

         The OR's in the total column are shown to illustrate that in this 
         case there is some confounding by breastfeeding history, at body 
         mass index level 23-27 kg/m sq.  Within either breastfed or not 
         breastfed group there is no "dose-response" relationship.

     g.  True - Generally, an outbreak investigation begins 
         after the outbreak has begun and the investigation seeks to 
         determine what characteristics of cases might have been responsible 
         for their disease.  If the cases happened to be part of an existing 
         cohort for which the requisite exposure information was already 
         available in some form, then a retrospective cohort study would be 
         another possibility.  If cases are still occurring a prospective 
         cohort study might be initiated, but the better an idea the 
         investigators have about which exposures to assess, the more they 
         should intervene to minimize the occurrence of additional cases.

     h.  False - for a factor to be considered a confounder, it must be 
         an independent risk factor for the outcome, but this requirement 
         does not pertain to effect modification.  For example, genital 
         ulcers cannot cause HIV by themselves, but in conjunction with a 
         sex partner who is HIV infected, genital ulcers can increase 
         (modify) the risk of HIV infection.

13.  Potential confounders are factors that are known or suspected risk
     factors for breast cancer or its detection, or at least proxies for
     such factors.

14.  a.  Breast cancer risk and no previous pregnancies

                                 Cases      Controls     Total 
           No pregnancies          50          38          88

           >= 3 pregnancies       167         216         383
           Total                  217         254         471

         OR = (50 x 216) / (38 x 167) = 1.7 (for zero vs. >= 3 pregnancies)

         Interpretation:  having never been pregnant was associated with an 
         increased breast cancer rate, with an apparent 70% greater rate 
         among  nulligravidae (women who have never been pregnant).

         Other choices of a reference level produce similar results, e.g.,

                 1-2 pregnancies as the reference level:

                        OR = (50 x 102) / (38 x 82) = 1.6.

         If both groups, 1-2 pregnancies and 3+ pregnancies are combined
         and used as the reference group, then:

                        OR = (50 x 318) / (38 x 249) = 1.7 
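
         The effect of the choice of reference level can be checked with
         a small Python sketch.  The counts are those used in the three
         calculations above; the dictionary layout and the helper
         `odds_ratio` are assumptions made for illustration.

```python
def odds_ratio(exp_cases, exp_controls, ref_cases, ref_controls):
    """OR comparing an exposure level with a reference level."""
    return (exp_cases * ref_controls) / (exp_controls * ref_cases)

cases = {"0": 50, "1-2": 82, ">=3": 167}
controls = {"0": 38, "1-2": 102, ">=3": 216}

# Zero pregnancies compared with each possible reference group.
or_vs_3plus = odds_ratio(cases["0"], controls["0"], cases[">=3"], controls[">=3"])
or_vs_1to2 = odds_ratio(cases["0"], controls["0"], cases["1-2"], controls["1-2"])
or_vs_any = odds_ratio(
    cases["0"], controls["0"],
    cases["1-2"] + cases[">=3"], controls["1-2"] + controls[">=3"],
)

print(f"vs >=3: {or_vs_3plus:.1f}, vs 1-2: {or_vs_1to2:.1f}, vs 1+: {or_vs_any:.1f}")
```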

     b.  Height above 165 centimeters and having been breastfed
                      Height    > 165 cm    < 160 cm       Total 
           Breastfed              148         183           331

           Not breastfed           41          25            66
           Total                  189         208           397

         OR = (148 x 25) / (183 x 41) = 0.49.

         Interpretation:   Women who were breastfed were less likely
         to be over 165 cm. tall.

         Other possible OR's --

            > 165 vs. 160-165:  OR = (148 x 43) / (213 x 41) = 0.73

            > 165 vs. all others:  OR = (148 x 68) / (396 x 41) = 0.62

     c.  Breast cancer and having been breastfed (crude)

                                 Cases      Controls       Total 
           Breastfed              241         305           546

           Not breastfed           58          51           109
           Total                  299         356           655

         OR = (241 x 51) / (305 x 58) = 0.69 

         Interpretation:  having been breastfed was associated with lower
         risk of breast cancer

15.  D.  The statement refers to the (relative) risk of breast cancer 
         between women who were and were not breastfed, estimated using
         the odds ratio.

16.  a.  Estimate RR for Not breastfed as 1/OR for Breastfed:  1 / 0.69 = 1.45

            ARP  =  (RR - 1) / RR  =  (1.45 - 1) / 1.45  =  0.45/1.45  =  0.31

         Interpretation:  Some 31% of breast cancer in women who were not
         breastfed was attributable to their having not been breastfed.

     b.  If you know the formula (or can derive it from the diagram and the
         "grand synthesis"):

                      P(E|D) (RR-1)
            PARP  =  ---------------  and since breast cancer is rare, use OR.
                           RR

                              117
                          ----------- (1.47-1)
                           (117+112)                  (0.51) (0.47)
         Premenopausal:  -----------------------  =  ---------------  =  0.16
                                  1.47                     1.47

                               58
                          ----------- (1.45-1)
                            (58+241)                  (0.19) (0.45)
         Postmenopausal: -----------------------  =  ---------------  =  0.06
                                  1.45                     1.45

         Meaning:  Some 16% of premenopausal breast cancer and some 6% of 
         postmenopausal breast cancer were attributable to not having been 
         breastfed.

         Or, reason as follows:

           1. The proportion of exposed (not breastfed) cases that is
              attributable to not having been breastfed is:
                   ARP = (RR-1)/RR
              Since breast cancer is rare, we can estimate this with
                (OR-1)/OR  =  (1.47-1) / 1.47  =  0.3197 for premenopausal.

           2. However, this proportion applies only to cases who are exposed
              (because ARP is "proportion of exposed cases . . .").  So
              estimate the proportion of all cases who are exposed:

                =  Pr(Exposed|Case) = 117 / (117+112) = 0.51 for premenopausal

           Multiplying 1. by 2., 0.51 x 0.3197 = 16% for premenopausal
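
         The ARP and PARP arithmetic in parts a and b can be reproduced
         with the sketch below.  It assumes, as the answer does, that the
         OR approximates the RR because breast cancer is rare; the
         function names `arp` and `parp` are illustrative.

```python
def arp(rr):
    """Attributable risk proportion among the exposed: (RR - 1) / RR."""
    return (rr - 1) / rr

def parp(p_exposed_among_cases, rr):
    """Population attributable risk proportion: P(E|D) x (RR - 1) / RR."""
    return p_exposed_among_cases * arp(rr)

# "Exposed" here means not breastfed; the ORs stand in for the RRs.
arp_postmenopausal = arp(1.45)
parp_premenopausal = parp(117 / (117 + 112), 1.47)
parp_postmenopausal = parp(58 / (58 + 241), 1.45)

print(f"ARP (OR 1.45)        = {arp_postmenopausal:.2f}")
print(f"PARP, premenopausal  = {parp_premenopausal:.2f}")
print(f"PARP, postmenopausal = {parp_postmenopausal:.2f}")
```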

     c.  The PARP for premenopausal breast cancer is expected to be 
         greater due to the secular decrease in breastfeeding during the 
         decades when these women were infants.  Thus, the proportion 
         exposed to not having been breastfed is substantially greater for 
         the premenopausal breast cancer cases.  Hence, their PARP is 
         greater.

17.  Logistic model coefficients for risk factor variables are natural
     logarithms of odds ratios per one unit change in the variable. 
     So the coefficient was ln(0.70) = -0.3567

     a.  True - The odds of breast cancer vary as the product of the odds
         for age and the odds for education.

     b.  False - Only in a few special cases will the product of two odds 
         equal their sum (e.g., both odds equal zero or both odds equal two).
         The logistic model is additive in the logit (logarithm of odds), 
         multiplicative in the odds.

     c.  False - One of the reasons for using mathematical modeling is 
         that the risk factors (exposures and potential confounders) ARE 
         associated (i.e., not independently distributed)

     d.  True - Breast cancer is a rare disease.
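
     The statement that the logistic model is additive in the logit and
     multiplicative in the odds can be illustrated numerically.  In this
     sketch the coefficient ln(0.70) is the one given above; the second
     coefficient, ln(1.5), is a made-up value used only for illustration.

```python
import math

beta_breastfed = math.log(0.70)  # coefficient corresponding to OR = 0.70
beta_other = math.log(1.5)       # hypothetical coefficient for a second factor

# Adding coefficients on the logit scale multiplies the odds ratios.
combined_or = math.exp(beta_breastfed + beta_other)

print(f"coefficient = {beta_breastfed:.4f}")
print(f"combined OR = {combined_or:.2f}")
```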

18.  C.  The observed relative risk would be biased toward the null.

19.  Smaller sample sizes produce wider confidence intervals, so if the 
     point estimates for the crude and stratum-specific measures are about 
     the same, then the confidence intervals for the latter will be wider.

20.  Table:

                          AGE < 60          AGE > 60           TOTAL 
                       Breast  Bottle    Breast  Bottle    Breast  Bottle
                       ------  ------    ------  ------    ------  ------
        Cases            24      40       256     100       280    140

        Controls         79      86       204      54       280    140
        OR                 0.653             0.678             1.0

     a.  Control women in older stratum are more likely to have been 
         breastfed than control women in the younger stratum, e.g., odds of 
         having been breastfed are 0.9 (79/86) for AGE < 60 and 3.8 
         (204/54) for AGE > 60.

     b.  Age is a strong risk factor for breast cancer, so if breastfed 
         women were older than bottle-fed women, then a possible protective 
         effect of breastfeeding could have been offset by the greater risk 
         associated with older age.
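
     The stratum-specific and total odds ratios, and the odds of having
     been breastfed among controls, can be recomputed as in the sketch
     below.  It uses the cell counts exactly as printed in the table
     (including the TOTAL column); the names are illustrative.

```python
def odds_ratio(a, b, c, d):
    """a, b = breastfed and bottle-fed cases; c, d = breastfed and bottle-fed controls."""
    return (a * d) / (b * c)

# (breast cases, bottle cases, breast controls, bottle controls), as printed.
strata = {
    "AGE < 60": (24, 40, 79, 86),
    "AGE > 60": (256, 100, 204, 54),
}
total = (280, 140, 280, 140)  # TOTAL column as printed

stratum_or = {age: odds_ratio(*cells) for age, cells in strata.items()}
crude_or = odds_ratio(*total)

# Odds of having been breastfed among controls, by age stratum.
control_exposure_odds = {age: c / d for age, (_, _, c, d) in strata.items()}

for age in strata:
    print(f"{age}: OR = {stratum_or[age]:.3f}, "
          f"control exposure odds = {control_exposure_odds[age]:.1f}")
print(f"TOTAL: OR = {crude_or:.1f}")
```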

21.  An epidemiology graduate student finds evidence in the literature 
     that childhood sunlight exposure may affect adult breast cancer risk. 
     To explore this hypothesis, she obtains from the authors the place of 
     birth for all of the subjects in the present study and constructs a 
     sunlight exposure variable ("high" or "low") based on geologic and 
     meteorologic data for the years of the subject's childhood.  Her data 
     show that 56.2% of the 219 premenopausal women who were NOT breastfed 
     as infants grew up with "high" sunlight exposure.  Based on this fact 
     and the partially-completed tables below, (a) calculate the odds ratio 
     of breast cancer with respect to breastmilk exposure within each of the 
     two sunlight exposure strata, and (b) briefly describe the relationship 
     of the sunlight exposure variable to the association between breast 
     cancer and breastmilk exposure (i.e., in relation to confounding and 
     effect modification).  (4 pts)
        High Sunlight            Cases      Controls     Total
           Breastfed Yes           44          24          68
           Breastfed No            81         *42         123
           Total                  125          66         191                        

        Low Sunlight             Cases      Controls     Total 
           Breastfed Yes           67        *120         187
           Breastfed No            36         *61          97
           Total                  103         181         284

     *  Crude OR from Table 1 or Table 3 = 0.68
        High sunlight OR = (44x42)/(24x81) = 0.95
        Low sunlight OR = (67x61)/(120x36) = 0.95.
        Sunlight is a confounder of the protective effect of breastfeeding
        as an infant.  It is not an effect modifier.
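
     A short sketch of this comparison: the stratum-specific ORs are
     computed from the two tables above, and the crude OR from the table
     obtained by adding them cell by cell.  The tuple layout and the
     helper `odds_ratio` are assumptions made for illustration.

```python
def odds_ratio(a, b, c, d):
    """a, b = breastfed cases, controls; c, d = not-breastfed cases, controls."""
    return (a * d) / (b * c)

high_sunlight = (44, 24, 81, 42)
low_sunlight = (67, 120, 36, 61)

# Collapsing over sunlight gives the crude table.
crude_table = tuple(h + l for h, l in zip(high_sunlight, low_sunlight))

or_high = odds_ratio(*high_sunlight)
or_low = odds_ratio(*low_sunlight)
or_crude = odds_ratio(*crude_table)

print(f"high: {or_high:.2f}, low: {or_low:.2f}, crude: {or_crude:.2f}")
```

     Similar stratum-specific ORs that both differ from the crude OR are
     the pattern expected under confounding without effect modification.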

22.  Use the data from Table 2 (Distribution of Characteristics of 
     Postmenopausal Cases and Controls) to draw separate 2 x 2 tables for 
     women who have had: a. 0 pregnancies, b. 1-2 pregnancies, c. >=3 
     pregnancies.  Be sure to include appropriate labels. (5 pts)

                      0 pregnancies      1-2 pregnancies   >=3 pregnancies
                     Cases  Controls    Cases  Controls    Cases  Controls
        Breast         34      35         71      90        136     180
        Bottle         16       3         11      12         31      36
        Total          50      38         82     102        167     216

     a)  Calculate odds ratios for each of these three categories.
            0  pregnancies: OR  =  (34 x 3) / (16 x 35)  =  0.18   
           1-2 pregnancies: OR  =  (71 x 12) / (11 x 90)  =  0.86   
           >=3 pregnancies: OR  =  (136 x 36) / (31 x 180)  =  0.88

     b)  Assuming no effects of confounding, interpret your findings in
         part (a).
         There is effect modification.  The magnitude of the protective
         effect of having been breast-fed on development of breast cancer
         is dependent on pregnancy history.  Having been breast-fed is a 
         stronger protective factor for those women who never had a pregnancy.
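
     The three stratum-specific ORs can be recomputed as below; an OR in
     one stratum that departs markedly from the others is what suggests
     effect modification.  This is an illustrative sketch using the
     counts in the table above, with hypothetical names.

```python
def odds_ratio(a, b, c, d):
    """a, b = breastfed cases, controls; c, d = bottle-fed cases, controls."""
    return (a * d) / (b * c)

# (breast cases, breast controls, bottle cases, bottle controls) per stratum.
by_pregnancies = {
    "0": (34, 35, 16, 3),
    "1-2": (71, 90, 11, 12),
    ">=3": (136, 180, 31, 36),
}

stratum_or = {level: odds_ratio(*cells) for level, cells in by_pregnancies.items()}

for level, value in stratum_or.items():
    print(f"{level} pregnancies: OR = {value:.2f}")
```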

23.  A hypothetical cross-sectional ancillary study to this report was
     conducted.  In that study a survey of breast cancer annual incidence 
     rates in two geographically distinct areas was completed:  Region A in 
     the upper Midwest, where breast cancer mortality is high, and Region B 
     in the Southeast, where mortality from breast cancer is low.  The 
     following data were obtained.

                      Region A                           Region B
             Cases  Population  Rate/1000       Cases  Population  Rate/1000 
     < High School Education
      40-50    10     7,000        1.4            10    15,000        0.7 
      51-60    15    10,000        1.5            20     5,000        4.0 
      61-65    30     3,000       10             600    55,000       10.9

      Total    55    20,000                      630    75,000
      High School Education 
      40-50     5     1,000        5.0             6     2,000        3.0
      51-60     5     2,000        2.5            10    15,000        0.7
      61-65     4       500        8.0             4     1,000        4.0 

      Total    14     3,500                       20    18,000

 Grand Total   69    23,500                      650    93,000

                               Crude   2.9

         a.  Compute the overall Region B crude event rate: (1 pt)  =  7.0/1000

         Using the total population as a standard compute the following by the
         direct method of adjustment: 

         b.  Age and educational achievement adjusted rate for Region A (2 pts)
                 =  6.0/1000 
         c.  Age and educational achievement adjusted rate for Region B (2 pts)
                 =  6.3/1000 
         d.  Comparison of the overall crude rates with the age and educational
             achievement adjusted rates. 

         Briefly explain your findings.  (2 pts):  Much of the difference
         between the crude rates of the two regions is due to the different
         distributions of age and educational achievement.
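
         The direct adjustment in parts b and c can be sketched as
         follows, assuming that "the total population" means the sum of
         the Region A and Region B populations within each age-education
         stratum.  The data layout and function names are illustrative.

```python
# (cases A, population A, cases B, population B) for each age-education stratum.
strata = [
    (10, 7_000, 10, 15_000),   # < high school, 40-50
    (15, 10_000, 20, 5_000),   # < high school, 51-60
    (30, 3_000, 600, 55_000),  # < high school, 61-65
    (5, 1_000, 6, 2_000),      # high school, 40-50
    (5, 2_000, 10, 15_000),    # high school, 51-60
    (4, 500, 4, 1_000),        # high school, 61-65
]

def crude_rate(cases_pops):
    """Crude rate per 1000: total cases over total population."""
    cases = sum(c for c, _ in cases_pops)
    pop = sum(p for _, p in cases_pops)
    return 1000 * cases / pop

def direct_adjusted_rate(cases_pops, standard):
    """Directly standardized rate per 1000: stratum rates weighted by the standard."""
    expected = sum(std * c / p for (c, p), std in zip(cases_pops, standard))
    return 1000 * expected / sum(standard)

region_a = [(ca, pa) for ca, pa, _, _ in strata]
region_b = [(cb, pb) for _, _, cb, pb in strata]
standard = [pa + pb for _, pa, _, pb in strata]  # combined population per stratum

crude_a, crude_b = crude_rate(region_a), crude_rate(region_b)
adj_a = direct_adjusted_rate(region_a, standard)
adj_b = direct_adjusted_rate(region_b, standard)

print(f"crude:    A = {crude_a:.1f}, B = {crude_b:.1f} per 1000")
print(f"adjusted: A = {adj_a:.1f}, B = {adj_b:.1f} per 1000")
```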

24.  Causal relationship -  Comment specifically on at least two of Bradford 
     Hill's criteria for causal inference.  Include in your comments data or 
     statements from the article. (5 pts)

25.  Assuming that this relationship is causal, why might a similar study, 
     50 years from now, fail to find as strong a relationship? (2 pts)

     Formula has changed (less fat) and overfeeding has been reduced,
     reflecting recent trends.


Schoenbach, \ epid168 \ exams 1997 Final exam - answer guide;
12/10/1998, 12/12/1998