University of North Carolina at Chapel Hill
School of Public Health, Department of Epidemiology

Epidemiology 168, Fall 1998

Midterm Exam Answer Guide

  1. a. Manifestational criteria: disease definition and classification based on observable characteristics, such as symptoms, signs, history, laboratory findings, response to treatment, prognosis.
    Causal criteria: disease definition and classification based on the cause of the condition.

    b. Manifestational criteria: Examples are cancers, arthritis, cholecystitis, schizophrenia, depression, addiction, insomnia, . . .

    Causal criteria: microbial diseases for which the pathogen has been identified (syphilis, TB, malaria, yellow fever, influenza, etc.), lead poisoning, birth trauma.

  3. (C) - Other choices are incorrect because controls in case-cohort studies are not matched to cases (A), controls are selected at random with both designs (B), and cases must be selected without regard to exposure (D).
  4. New cases or events, population at risk or source population, passage of time
  5. The size of the population may have grown (number increases even though rate does not); the age distribution of the population may have changed (e.g., influx of families with small children, outmigration of families with older children), so that age-standardized rate may not change but a greater proportion of the population may be in the higher risk age range (assuming that younger children have higher injury rates).
  6. (D)- All of the above - use of prevalent cases requires that duration is not related to exposure, controls should provide estimate of exposure in study base, and rare disease assumption is required for OR to estimate RR (though not for OR to estimate IDR).
  7. (B)- In a prospective cohort study, information on exposure is obtained before the outcome (breast cancer, in this case) has occurred. Therefore recall bias - different recall by cases and noncases - is not an issue. In a case-control study, cases and noncases may recall and report exposure with different degrees of accuracy.
  8. a. A (retrospective) cohort study.

    b. CIR = (290/2,842) / (983/3,961) = 0.411
    A cumulative measure ignores possible differences in length of follow-up between groups being compared. A crude measure ignores possible differences in the age distributions between men who have been exposed and men who have not.
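
The crude cumulative incidence ratio above can be checked with a short calculation. This is only a sketch using the counts quoted in part b; the names `group_a` and `group_b` are placeholders for the numerator and denominator groups, since the guide does not label them here.

```python
# Crude cumulative incidence ratio (CIR) from the counts quoted in part b.
# group_a / group_b are placeholder names for the numerator and
# denominator groups of the ratio (an assumption, not from the guide).
deaths_group_a, total_group_a = 290, 2842
deaths_group_b, total_group_b = 983, 3961

# Cumulative incidence = deaths / persons at risk at the start of follow-up.
ci_group_a = deaths_group_a / total_group_a
ci_group_b = deaths_group_b / total_group_b

cir = ci_group_a / ci_group_b
print(f"CI (group a) = {ci_group_a:.3f}")
print(f"CI (group b) = {ci_group_b:.3f}")
print(f"CIR          = {cir:.3f}")
```

Because the measure is both cumulative and crude, the calculation uses neither follow-up time nor age, which is exactly the limitation noted above.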

    c. SMRs are an indirect method of standardization, since they are based on weighted averages for which the weights are taken from the population whose SMR is being computed rather than from a "standard" population. Unless the age (and in this case, age-calendar year interval) distributions for the populations whose SMR's are being computed are the same, then the weighted averages that make up the SMR's are based on different sets of weights and are not strictly comparable. Since age-interval distributions of exposed and unexposed workers may differ, their SMR's are not strictly comparable.

    d. Mortality rates computed with person-time denominators can be compared between exposed and unexposed person-time. These will take into account the varying amounts of follow-up for workers in different categories. Unless the person-years at risk for exposed and unexposed workers have the same age distribution, which we do not know, then adjustment for age is needed. Since there are ample numbers of deaths from any cause, mortality rates can be directly-standardized using any reasonable set of weights. Since directly-standardized rates are "strictly comparable", a ratio or difference of directly standardized rates would be a suitable measure of association.
  9. All but 85 of the 325 code 434's were correct classifications, so there were 240 (=325-85) ischemic stroke patients correctly classified by discharge code. All but 20 of the 200 (=525-325) patients without code 434 were judged not to have had an ischemic stroke, meaning that 20 were judged to have had an ischemic stroke. Thus, there were 260 (=240+20) ischemic stroke patients, of whom 240 were identified by discharge code (sensitivity=240/260). The remaining 265 (=525-260) patients did not have an ischemic stroke, and 180 (=200-20) of them were in fact not given a code 434 (specificity=180/265). Of the 325 code 434's, 240 had had an ischemic stroke (PPV=240/325). These data are summarized in the following table:

    Comparison of discharge code 434 and classification by expert panel

                              Expert panel
    Discharge code      Ischemic    Not ischemic    Total
    Code 434               240           85          325
    Other code              20          180          200
    Total                  260          265          525

    a. Sensitivity = (325-85) / (325-85+20) = 240 / 260 = 92.3%

    b. Specificity = (200-20) / (525-260) = 180 / 265 = 68%

    c. Positive predictive value of a 434 code = (325-85) / 325 = 73.8%

    d. An ROC curve plots the value of sensitivity and specificity for each case definition or cutpoint. Examining the ROC curve shows the trade-off between sensitivity and specificity that is available for the diagnostic test or measurement method. [The area between the identity diagonal (slope = 1.0) and the ROC curve serves as a measure of accuracy that takes into account both sensitivity and specificity, with the assumption that the costs of false negatives and false positives are the same.]

    e. (B) - Due to the low specificity (50%), half of hemorrhagic strokes in the patient group will be classified as ischemic strokes.

    f. Specificity and prevalence of the condition
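
The arithmetic in parts a through c can be reproduced from the four numbers given in the problem (525 patients, 325 with code 434, 85 incorrect code 434's, and 20 ischemic strokes without code 434). The sketch below is only a check of that arithmetic; the variable names are illustrative.

```python
# Validity of discharge code 434 against the expert panel (the reference standard).
total_patients = 525
code_434 = 325
false_positives = 85   # code 434 but judged not ischemic by the panel
false_negatives = 20   # no code 434 but judged ischemic by the panel

true_positives = code_434 - false_positives            # 240
without_code_434 = total_patients - code_434           # 200
true_negatives = without_code_434 - false_negatives    # 180

sensitivity = true_positives / (true_positives + false_negatives)   # 240 / 260
specificity = true_negatives / (true_negatives + false_positives)   # 180 / 265
ppv = true_positives / (true_positives + false_positives)           # 240 / 325

print(f"Sensitivity = {sensitivity:.1%}")
print(f"Specificity = {specificity:.1%}")
print(f"PPV         = {ppv:.1%}")
```

Sensitivity and specificity are computed within the panel's categories (columns of the table), whereas PPV is computed within the code 434 row, which is why PPV also depends on how common ischemic stroke is among the patients.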

  11. a. Corona del Mar has a 2.9 times higher crude accident rate than Boulder.

    Corona del Mar = 51.1/1000 and Boulder = 17.6/1000. Ratio = 2.9

    b. Adjusted rates -

    Corona del Mar: [(4579 x .0654) + (1274 x .0277) + (9399 x .0136)] / 15,252 = 29.9/1000

    Boulder: [(4579 x .0200) + (1274 x .0200) + (9399 x .0178)] / 15,252 = 18.6/1000

    The cell phone/pager adjusted auto accident rate for Corona del Mar was 1.6 times that of Boulder. A portion of the difference seen in the crude rates was due to differences in the distribution of use of cell phones and pagers between the two cities.

    The standard weights are the sum of the population sizes for the two cities. The weighted rates are the rates for each city, weighted (multiplied) by the standard weights. The total of the weighted rates is the directly standardized rate. A problem in using the directly standardized rates is that there are small numbers of cellular phone and pager users in Boulder.

    The higher crude rate in Corona del Mar reflects the much higher use of cellular phones and pagers, which is associated with a much higher accident rate. The difference is reduced for the standardized rates, since these control for the different distributions of cellular phones and pagers between the two cities. However, this is a situation where it is essential to examine the specific rates, since Boulder has lower accident rates among cellular phone and pager users but a higher rate among never-users.

    Since the rates in never-users are quite similar, Corona del Mar is likely to make its greatest impact on accident rates by getting motorists to reduce cellular phone and pager use while driving or by finding some way to make such use safer (promote the use of "designated drivers"!?).

    c.(A) Both measures obscure heterogeneity (variation) in rates across subgroups.
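
The direct standardization in part b can be sketched as a weighted average of stratum-specific rates, using the combined population of the two cities as the standard. The weights and rates below are the rounded values printed in part b; the strata are kept in that order (the third stratum being never-users) without further labels. With these rounded rates the Corona del Mar result comes out near 30.3 per 1,000 rather than the 29.9 shown above, presumably because the guide worked from unrounded stratum rates; the ratio still rounds to 1.6.

```python
# Direct standardization: weighted average of stratum-specific rates,
# with the same standard weights applied to both cities.
standard_weights = [4579, 1274, 9399]        # combined population per stratum
rates_corona_del_mar = [0.0654, 0.0277, 0.0136]
rates_boulder = [0.0200, 0.0200, 0.0178]


def directly_standardized_rate(rates, weights):
    """Return sum(weight * rate) / sum(weight)."""
    expected_events = sum(w * r for w, r in zip(weights, rates))
    return expected_events / sum(weights)


adj_cdm = directly_standardized_rate(rates_corona_del_mar, standard_weights)
adj_boulder = directly_standardized_rate(rates_boulder, standard_weights)
adj_ratio = adj_cdm / adj_boulder

print(f"Corona del Mar: {adj_cdm * 1000:.1f} per 1,000")
print(f"Boulder:        {adj_boulder * 1000:.1f} per 1,000")
print(f"Standardized rate ratio: {adj_ratio:.1f}")
```

Because both cities share one set of weights, the two standardized rates are comparable to each other, although, as noted above, the stratum-specific rates should still be examined.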

  13. (A) Community intervention trials of this type assign groups to treatments and collect measurements from individuals. The unit of analysis must be the same as the unit of assignment (GROUP) or both (i.e., using mixed models).
  14. a. T – a cohort study enrolls people who are free of the outcome and monitors them for the development of the outcome, so the cohort design can be used to estimate risk of the event;
    b. Not sure – the temporal sequence of exposure and disease can typically not be addressed in a case-control study, though in some cases it can be (e.g., for a genetic characteristic or other "exposure" that can be definitively assigned to a time prior to disease onset);

    c. F – a cohort design can readily be used to study multiple outcomes; a case-control design can readily be used to study multiple exposures;

    d. T – a randomized clinical trial often enrolls participants over a period of time, with follow-up time measured from the time of randomization;

    e. T – a cohort study begins with disease-free subjects and monitors them for development of the outcome; if the outcome is rare, many subjects must be followed to obtain an adequate number of cases;

    f. F – ecological studies use group-level variables (e.g., per capita meat consumption) and relate them to disease rates; direct assessment at the individual level is NOT made, which is the basis for the ecological fallacy (where the group data are used to infer a link at the individual level);

    g. T – correlational studies (another term for ecological studies) are often used to compare disease rates across geopolitical entities using available data;

    h. F – a case report does not involve a control group;

    i. F – cross-sectional studies measure prevalence, not risk (of a future event); they are the most statistically generalizable type of study when, as is often the case, the study population is obtained through population-sampling;

    j. F – the natural history of a disease is the process by which it develops over time; descriptive information relating to person, place, and time can at best provide only indirect information;

    k. F – as used in class, the term "attributable risk" refers to the risk difference;

    l. F – strength of association as used in epidemiology refers to the degree of change in the one variable with respect to changes in the other variable; two variables can be very strongly correlated (vary linearly or monotonically) yet a large change in one may be associated with only a small change in the other (e.g., a straight line with a modest slope has a high correlation but a small degree of change in the ordinate variable for a given change in the variable on the abscissa);

    m. T – for a rare outcome, the odds ratio (OR) closely approximates the cumulative incidence ratio (CIR) and incidence density ratio (IDR), so it indicates strength of association in the epidemiologic sense; when the outcome is not rare, the OR does not approximate but does vary with the CIR and IDR, so the OR still gives an indication of strength of association;

    n. T – an attributable risk proportion estimates the proportion of risk that is associated with an exposure in people who are exposed; attributable risk (as used in this course) is the risk difference, which indicates the amount of risk associated with an exposure in people who are exposed; attributable risk must be adjusted for the prevalence of the exposure in order to estimate the amount of risk associated with exposure in the population as a whole;

    o. F – since case-control studies begin with people who are already cases, they avoid having to study a large number of people for a long time in order to accumulate enough cases; they can also compare cases and controls in respect to many exposures; HOWEVER, they cannot readily study many outcomes, since to do so requires enrolling cases for each of the outcomes to be studied (i.e., equivalent to conducting several case-control studies that share the same control group);

    p. F – incidence density is a (relative) rate; cumulative incidence is a proportion;

    q. F – incidence density and cumulative incidence are measures of frequency of occurrence, not of strength of association;

    r. F – comparability of standardized rates and ratios across study populations requires that the standardized measures be constructed using the same set of weights; indirect standardization (e.g., via a SMR) employs the weights (the number of people in each stratum) from the study population, so measures standardized using this method are, strictly speaking, useful only for comparing a study population with the standard population used in the standardization;

    s. F – typically, general population controls will be less motivated than cases and sources of medical information for them will not be comparable to those for cases.
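
The distinction drawn in item l, between correlation and strength of association, can be illustrated with a small sketch. The two data series below are entirely hypothetical: both lie exactly on a straight line, so both have a correlation of essentially 1, yet their slopes (the change in y per unit change in x) differ by a factor of 200.

```python
from math import sqrt


def pearson_and_slope(x, y):
    """Return (Pearson correlation, least-squares slope of y on x)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy), sxy / sxx


# Hypothetical data: two exactly linear relationships with different slopes.
x = list(range(11))
y_modest = [100 + 0.01 * v for v in x]   # small change in y per unit of x
y_steep = [100 + 2.0 * v for v in x]     # large change in y per unit of x

r_modest, slope_modest = pearson_and_slope(x, y_modest)
r_steep, slope_steep = pearson_and_slope(x, y_steep)

print(f"modest slope: r = {r_modest:.3f}, slope = {slope_modest:.3f}")
print(f"steep slope:  r = {r_steep:.3f}, slope = {slope_steep:.3f}")
```

The correlation coefficient describes how closely the points follow a line, while the slope describes how much one variable changes with the other, which is closer to strength of association in the epidemiologic sense.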


  16. a. ARP = (I1 - I0) / I1 = (RR-1) / RR = (1.34-1.04) / 1.34 = 0.30 / 1.34 = 22% (after rounding), with rates expressed per 1,000 person-years (PY)
    The "I can't remember formulas" method:
            ARP = attributable cases / all exposed cases = attributable cases / 135
            Attributable cases = attributable risk * Exposed PY = [(1.34-1.04) / 1,000] * 100,800 = 30.24
            ARP = 30/135 = 22% (after rounding)

    Interpretation: Based on these data, 22% (about one in five) of strokes in people who are physically inactive can be attributed to their physical inactivity; in other words, if physically inactive people became active early enough in their lives, their stroke incidence would decrease by 22%.

    b. A key point here is that 27% is the prevalence of physically active people, whereas the exposure is physical inactivity, whose prevalence is therefore 100% - 27% = 73%

            PARP = p1(RR-1) / [1 + p1(RR-1)] = 0.73(1.286-1) / [1 + 0.73(1.286-1)]

            = (0.73 x 0.286) / (1 + 0.73 x 0.286) = 0.209 / 1.209 = 17%

    (The formula PARP = (I - I0) / I can also be used by first estimating the crude population incidence, I, as a weighted average of the incidences in exposed and unexposed, weighting by the prevalence of exposure, e.g.: I = (0.73)(1.34) + (0.27)(1.04) = 1.259, so PARP = (1.259 - 1.04) / 1.259 = 17%.)

     The "I can't remember formulas" method:

            PARP = Attributable cases / All cases

    Attributable cases are (1.34-1.04) x number of exposed person-years. Since we do not know the population size, represent it by n. Based on the NHANES data, 27% of people are physically active, so there are 0.73n physically inactive people (in one year, 0.73n person-years). So: Attributable cases = (1.34-1.04)(0.73n) = 0.219n.

    All cases are exposed cases + unexposed cases. With the same population size n, there are 0.73n physically inactive and 0.27n physically active people (or person-years, if we assume a one-year period). So the total number of cases = exposed cases + unexposed cases = 0.73n(1.34) + 0.27n(1.04) = 1.259n.

     Therefore, PARP = 0.219n / 1.259n = 17% (the unknown n cancels)

    Note that these measures can be computed more precisely by using the original number of cases and person-years and not rounding intermediate results, but two significant figures is adequate for the actual result, and in this case the answer does not change.

    Explanation: Seventeen percent of all strokes in the population are attributable to physical inactivity; if everyone were physically active, there would be 17% fewer strokes.

    c. Attributable risk measures assume that the relationship is causal (i.e., that physical inactivity does in fact cause an increase in stroke risk). Some of the above interpretations may also require that the process be reversible, so that changing to a physically active lifestyle brings risk down to the level of someone who was not inactive. Another assumption is that the rates and rate ratio observed in the cohort study hold for the entire population. Also, we have ignored the effects of other factors, most notably age.
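
The ARP and PARP calculations in parts a and b can be checked numerically. The sketch below uses the rounded rates quoted above (1.34 and 1.04, apparently per 1,000 person-years, which is consistent with 135 cases in 100,800 exposed person-years) and the 27% prevalence of physical activity. Variable names are illustrative, and the causal assumptions listed in part c still apply to any interpretation.

```python
# Attributable risk proportion (ARP) and population ARP (PARP) for
# physical inactivity and stroke, from the rounded figures quoted above.
rate_inactive = 1.34           # exposed rate, per 1,000 person-years
rate_active = 1.04             # unexposed rate, per 1,000 person-years
exposed_person_years = 100_800
exposed_cases = 135
prevalence_inactive = 1 - 0.27  # exposure is inactivity, so 73%

# ARP by formula, and by counting attributable cases among the exposed.
arp = (rate_inactive - rate_active) / rate_inactive
attributable_cases = (rate_inactive - rate_active) / 1000 * exposed_person_years
arp_from_cases = attributable_cases / exposed_cases

# PARP by the prevalence / rate-ratio formula.
rate_ratio = rate_inactive / rate_active
excess = prevalence_inactive * (rate_ratio - 1)
parp = excess / (1 + excess)

# PARP via the crude population rate (weighted average of the two rates).
rate_population = (prevalence_inactive * rate_inactive
                   + (1 - prevalence_inactive) * rate_active)
parp_from_rates = (rate_population - rate_active) / rate_population

print(f"ARP  = {arp:.0%} (from cases: {arp_from_cases:.0%})")
print(f"PARP = {parp:.0%} (from rates: {parp_from_rates:.0%})")
```

The two PARP routes are algebraically identical, so they agree to floating-point precision; the two ARP routes differ slightly only because the rates are rounded.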

  18. a. This is a retrospective cohort study (researchers developed the hypothesis in 1998).

b. High error profile: (2 + 5 + 6 + 5)/8021 = 2.24 per 1,000 women-years.

    Low error profile: (1+3+4) / 12,287 = 0.651 per 1,000 wy

    Women-years (WY) are computed as follows: [table of women-years not recoverable]

c. IDR= ID High / ID low = 2.24/0.651 = 3.4. Nuns with a high error communications profile are 3.4 times more likely to die from Alzheimer's Disease than nuns with a low error profile.



d.
                         Alzheimer’s Disease
Handwriting Profile      AD Yes      AD No
High error                 18         132
Low error                   8         192

Odds ratio = [(18)(192)] / [(8)(132)] = 3.27

 e. The two are similar because the condition is fairly rare.
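
The incidence density ratio from parts b and c and the odds ratio from part d can be computed side by side. This is a sketch using only the counts and women-years quoted above; the 2x2 cell values (18, 132, 8, 192) are taken from the odds ratio expression.

```python
# Incidence densities (events per 1,000 women-years) and their ratio.
events_high = 2 + 5 + 6 + 5     # 18 events, high error profile
events_low = 1 + 3 + 4          # 8 events, low error profile
women_years_high = 8021
women_years_low = 12287

id_high = events_high / women_years_high * 1000
id_low = events_low / women_years_low * 1000
idr = id_high / id_low

# Odds ratio from the 2x2 table: (a * d) / (b * c).
a, b = 18, 132   # high error: AD yes, AD no
c, d = 8, 192    # low error:  AD yes, AD no
odds_ratio = (a * d) / (b * c)

print(f"ID high = {id_high:.2f}, ID low = {id_low:.3f} per 1,000 women-years")
print(f"IDR = {idr:.1f}")
print(f"OR  = {odds_ratio:.2f}")
```

The two ratios are close (about 3.4 and 3.3), in line with the point in part e that the odds ratio approximates the rate ratio when the outcome is fairly rare.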





10/5/1999, 10/6/1999, 10/7/1999, 8/4/2000vs, 10/15/2000