University of North Carolina at Chapel Hill
School of Public Health, Department of Epidemiology
Epidemiology 168, Fall 1998
Midterm Exam Answer Guide
Causal criteria: disease definition and classification based on the cause of the condition,
b. Manifestational criteria: Examples are cancers, arthritis, cholecystitis, schizophrenia, depression, addiction, insomnia, . . .
Causal criteria: microbial diseases for which the pathogen has been identified (syphilis, TB, malaria, yellow fever, influenza, etc.), lead poisoning, birth trauma,
Comparison of discharge code 434 and classification by expert panel

                        Expert panel
Discharge code    Ischemic    Not ischemic    Total
Code 434             240            85          325
Other                 20           180          200
Total                260           265          525
a. Sensitivity = (325 - 85) / [(325 - 85) + 20] = 240 / 260 = 92.3%
b. Specificity = (200 - 20) / (525 - 260) = 180 / 265 = 68%
c. Positive predictive value of a 434 code = (325 - 85) / 325 = 240 / 325 = 73.8%
d. An ROC curve plots the value of sensitivity and specificity for each case definition or cutpoint. Examining the ROC curve shows the tradeoff between sensitivity and specificity that is available for the diagnostic test or measurement method. [The area between the identity diagonal (slope = 1.0) and the ROC curve serves as a measure of accuracy that takes into account both sensitivity and specificity, with the assumption that the costs of false negatives and false positives are the same.]
e. (B) Due to the low specificity (50%), half of hemorrhagic strokes in the patient group will be classified as ischemic strokes.
f. Specificity and prevalence of the condition
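The arithmetic in parts a through c can be checked with a short sketch. It treats the expert panel classification as the reference standard and a 434 discharge code as a positive test; the function name and argument names are illustrative, not part of the course material.

```python
def diagnostic_measures(tp, fp, fn, tn):
    """Return (sensitivity, specificity, positive predictive value)."""
    sensitivity = tp / (tp + fn)   # true positives / all truly ischemic
    specificity = tn / (tn + fp)   # true negatives / all truly not ischemic
    ppv = tp / (tp + fp)           # true positives / all coded 434
    return sensitivity, specificity, ppv


# Cell counts taken from the code 434 vs. expert panel table
sens, spec, ppv = diagnostic_measures(tp=240, fp=85, fn=20, tn=180)
print(f"sensitivity = {sens:.1%}")
print(f"specificity = {spec:.1%}")
print(f"PPV         = {ppv:.1%}")
```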
Corona del Mar: [(4579 x .0654) + (1274 x .0277) + (9399 x .0136)] / 15,252 = 29.9/1000
Boulder: [(4579 x .0200) + (1274 x .0200) + (9399 x .0178)] / 15,252 = 18.6/1000
The cell phone/pager adjusted auto accident rate for Corona del Mar was 1.6 times that of Boulder. A portion of the difference seen in the crude rates was due to differences in the distribution of use of cell phones and pagers between the two cities.
The standard weights are the sum of the population sizes for the two cities. The weighted rates are the rates for each city, weighted (multiplied) by the standard weights. The total of the weighted rates is the directly standardized rate. A problem in using the directly standardized rates is that there are small numbers of cellular phone and pager users in Boulder.
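The direct standardization described above can be sketched as follows. The three strata are the ones that appear in the calculation (combined population sizes 4,579, 1,274, and 9,399); the variable and function names are illustrative. Note that the stratum-specific rates are entered here as rounded in the answer guide, so the Corona del Mar result comes out near 30.3 per 1,000 rather than the printed 29.9, presumably because the printed figure used unrounded rates (an assumption, not something the source states).

```python
def directly_standardized_rate(standard_weights, stratum_rates):
    """Weighted average of stratum-specific rates, using the standard weights."""
    weighted_sum = sum(w * r for w, r in zip(standard_weights, stratum_rates))
    return weighted_sum / sum(standard_weights)


# Standard population: sum of the two cities' population sizes in each stratum
standard = [4579, 1274, 9399]

# Stratum-specific accident rates (as rounded in the answer guide)
corona_del_mar_rates = [0.0654, 0.0277, 0.0136]
boulder_rates = [0.0200, 0.0200, 0.0178]

cdm = directly_standardized_rate(standard, corona_del_mar_rates)
boulder = directly_standardized_rate(standard, boulder_rates)

print(f"Corona del Mar: {cdm * 1000:.1f} per 1,000")
print(f"Boulder:        {boulder * 1000:.1f} per 1,000")
print(f"Standardized rate ratio: {cdm / boulder:.1f}")
```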
The higher crude rate in Corona del Mar reflects the much higher use of cellular phones and pagers, which is associated with a much higher accident rate. The difference is reduced for the standardized rates, since these control for the different distributions of cellular phones and pagers between the two cities. However, this is a situation where it is essential to examine the specific rates, since Boulder has lower accident rates among cellular phone and pager users but a higher rate among never-users.
Since the rates in never-users are quite similar, Corona del Mar is likely to make its greatest impact on accident rates by getting motorists to reduce cellular phone and pager use while driving or finding some way to make such use safer (promote the use of "designated drivers"!?).
c. (A) Both measures obscure heterogeneity (variation) in rates across subgroups.
b. Not sure - the temporal sequence of exposure and disease typically cannot be addressed in a case-control study, though in some cases it can (e.g., a genetic characteristic or other "exposure" that can be definitively assigned to a time prior to disease onset);
c. F - a cohort design can readily be used to study multiple outcomes; a case-control design can readily be used to study multiple exposures;
d. T - a randomized clinical trial often enrolls participants over a period of time, with follow-up time measured from the time of randomization;
e. T - a cohort study begins with disease-free subjects and monitors them for development of the outcome; if the outcome is rare, many subjects must be followed to obtain an adequate number of cases;
f. F - ecological studies use group-level variables (e.g., per capita meat consumption) and relate them to disease rates; direct assessment at the individual level is NOT made, which is the basis for the ecological fallacy (where the group data are used to infer a link at the individual level);
g. T - correlational studies (another term for ecological studies) are often used to compare disease rates across geopolitical entities using available data;
h. F - a case report does not involve a control group;
i. F - cross-sectional studies measure prevalence, not risk (of a future event); they are the most statistically generalizable type of study when, as is often the case, the study population is obtained through population sampling;
j. F - the natural history of a disease is the process by which it develops over time; descriptive information relating to person, place, and time can at best provide only indirect information;
k. F - as used in class, the term "attributable risk" refers to the risk difference;
l. F - strength of association as used in epidemiology refers to the degree of change in one variable with respect to changes in the other variable; two variables can be very strongly correlated (vary linearly or monotonically) yet a large change in one may be associated with only a small change in the other (e.g., a straight line with a modest slope has a high correlation but a small degree of change in the ordinate variable for a given change in the variable on the abscissa);
m. T - for a rare outcome, the odds ratio (OR) closely approximates the cumulative incidence ratio (CIR) and incidence density ratio (IDR), so it indicates strength of association in the epidemiologic sense; when the outcome is not rare, the OR does not approximate but does vary with the CIR and IDR, so the OR still gives an indication of strength of association;
n. T - an attributable risk proportion estimates the proportion of risk that is associated with an exposure in people who are exposed; attributable risk (as used in this course) is the risk difference, which indicates the amount of risk associated with an exposure in people who are exposed; attributable risk must be adjusted for the prevalence of the exposure in order to estimate the amount of risk associated with exposure in the population as a whole;
o. F - since case-control studies begin with people who are already cases, they avoid having to study a large number of people for a long time in order to accumulate enough cases; they can also compare cases and controls in respect to many exposures; HOWEVER, they cannot readily study many outcomes, since to do so requires enrolling cases for each of the outcomes to be studied (i.e., equivalent to conducting several case-control studies that share the same control group);
p. F - incidence density is a (relative) rate; cumulative incidence is a proportion;
q. F - incidence density and cumulative incidence are measures of frequency of occurrence, not of strength of association;
r. F - comparability of standardized rates and ratios across study populations requires that the standardized measures be constructed using the same set of weights; indirect standardization (e.g., via an SMR) employs the weights (the number of people in each stratum) from the study population, so measures standardized using this method are, strictly speaking, useful only for comparing a study population with the standard population used in the standardization;
s. F - typically, general population controls will be less motivated than cases and sources of medical information for them will not be comparable to those for cases.
The "I can't remember formulas" method:
ARP = attributable cases / all exposed cases = attributable cases / 135
Attributable cases = attributable risk x exposed PY = [(1.34 - 1.04) / 1,000] x 100,800 = 30.24
ARP = 30/135 = 22% (after rounding)
Interpretation: Based on these data, 22% (about one in five) of strokes in people who are physically inactive can be attributed to their physical inactivity; in other words, if physically inactive people became active early enough in their lives, their stroke incidence would decrease by 22%.
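The "I can't remember formulas" route to the ARP can be sketched in a few lines. The sketch assumes the rates 1.34 and 1.04 are expressed per 1,000 person-years, which is consistent with 135 exposed cases in 100,800 exposed person-years; the variable names are illustrative.

```python
exposed_cases = 135            # strokes among physically inactive people
exposed_py = 100_800           # person-years among physically inactive people
rate_unexposed = 1.04 / 1000   # assumed to be per 1,000 PY, converted to per PY

# Rate in the exposed, about 1.34 per 1,000 PY
rate_exposed = exposed_cases / exposed_py

# Cases that would not have occurred at the unexposed rate
attributable_cases = (rate_exposed - rate_unexposed) * exposed_py

# Attributable risk proportion: attributable cases / all exposed cases
arp = attributable_cases / exposed_cases
print(f"rate in exposed    = {rate_exposed * 1000:.2f} per 1,000 PY")
print(f"attributable cases = {attributable_cases:.1f}")
print(f"ARP                = {arp:.0%}")
```

Because the exposed rate is not rounded to 1.34 here, the attributable case count differs slightly from 30.24, but the ARP still rounds to 22%.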
b. A key point here is that 27% is the prevalence of physically active people, whereas the exposure is physical inactivity, whose prevalence is therefore 100% - 27% = 73%.
PARP = p1(RR - 1) / [1 + p1(RR - 1)] = 0.73(1.286 - 1) / [1 + 0.73(1.286 - 1)]
= (0.73 x 0.286) / (1 + 0.73 x 0.286) = 0.209 / 1.209 = 17%
(The formula PARP = (I - I0) / I can also be used by first estimating the crude population incidence, I, as a weighted average of the incidences in exposed and unexposed, weighting by the prevalence of exposure, e.g.: I = (0.73)(1.34) + (0.27)(1.04) = 1.259, so PARP = (1.259 - 1.04) / 1.259 = 17%.)
The "I can't remember formulas" method:
PARP = Attributable cases / All cases
Attributable cases are (1.34 - 1.04) x number of exposed person-years. Since we do not know the population size, represent it by n. Based on the NHANES data, 27% of people are physically active, so there are 0.73n physically inactive people (in one year, 0.73n person-years). So: Attributable cases = (1.34 - 1.04)(0.73n) = 0.219n.
All cases are exposed cases + unexposed cases. Based on the prevalence of physically active people, there are 0.73n physically inactive and 0.27n physically active people (or person-years, if we assume a one-year period). So the total number of cases = exposed cases + unexposed cases = 0.73n(1.34) + 0.27n(1.04) = 1.259n.
Therefore, PARP = 0.219n / 1.259n = 17% (the unknown n cancels).
Note that these measures can be computed more precisely by using the original number of cases and personyears and not rounding intermediate results, but two significant figures is adequate for the actual result, and in this case the answer does not change.
Explanation: Seventeen percent of all strokes in the population are attributable to physical inactivity; if everyone were physically active, there would be 17% fewer strokes.
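As a check that the two routes to the PARP agree, here is a minimal sketch. The function names are illustrative, and the rates are taken as 1.34 and 1.04 in the same (unspecified) units, since only their ratio and difference relative to the overall rate matter.

```python
def parp_from_rr(p_exposed, rr):
    """PARP = p1(RR - 1) / [1 + p1(RR - 1)]."""
    excess = p_exposed * (rr - 1)
    return excess / (1 + excess)


def parp_from_rates(p_exposed, rate_exposed, rate_unexposed):
    """PARP = (I - I0) / I, with I a prevalence-weighted average of the rates."""
    overall = p_exposed * rate_exposed + (1 - p_exposed) * rate_unexposed
    return (overall - rate_unexposed) / overall


p_inactive = 1 - 0.27          # exposure is physical INactivity
rate_inactive, rate_active = 1.34, 1.04

via_rr = parp_from_rr(p_inactive, rate_inactive / rate_active)
via_rates = parp_from_rates(p_inactive, rate_inactive, rate_active)
print(f"PARP via rate ratio: {via_rr:.0%}")
print(f"PARP via rates:      {via_rates:.0%}")
```

The two functions are algebraically equivalent, so any disagreement between hand calculations comes from rounding intermediate results such as the rate ratio.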
c. Attributable risk measures assume that the relationship is causal (i.e., that physical inactivity does in fact cause an increase in stroke risk). Some of the above interpretations may also require that the process be reversible, so that changing to a physically active lifestyle brings risk down to the level of someone who was not inactive. Another assumption is that the rates and rate ratio observed in the cohort study hold for the entire population. Also, we have ignored the effects of other factors, most notably age.
b. High error profile: (2 + 5 + 6 + 5) / 8,021 = 2.24 per 1,000 women-years (WY).
Low error profile: (1 + 3 + 4) / 12,287 = 0.651 per 1,000 WY
Women-years (WY) are computed as follows:
End       Start     Years     Women     WY
1980      1930      50          2         100
1985      1930      55          5         275
1990      1930      60          6         360
1995      1930      65          5         325
1980      1930      50         10         500
1995      1930      65         15         975
1960      1930      30         25         750
1970      1930      40         30       1,200
1998      1930      68         52       3,536
Totals                        150       8,021
c. IDR = ID high / ID low = 2.24 / 0.651 = 3.4. Nuns with a high error communications profile are 3.4 times as likely to die from Alzheimer's Disease as nuns with a low error profile.
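The women-years table and the two incidence densities can be reproduced with a short sketch. The rows are the (end year, start year, number of women) entries for the high error profile group; the low error profile total of 12,287 WY is taken as given, since its breakdown is not shown. Names are illustrative.

```python
# (end year, start year, number of women) for the high error profile group
high_error_rows = [
    (1980, 1930, 2), (1985, 1930, 5), (1990, 1930, 6), (1995, 1930, 5),
    (1980, 1930, 10), (1995, 1930, 15), (1960, 1930, 25), (1970, 1930, 30),
    (1998, 1930, 52),
]


def women_years(rows):
    """Sum of (years of follow-up x number of women) over all rows."""
    return sum((end - start) * women for end, start, women in rows)


high_wy = women_years(high_error_rows)
low_wy = 12_287                      # given in the answer guide
high_deaths = 2 + 5 + 6 + 5
low_deaths = 1 + 3 + 4

id_high = high_deaths / high_wy * 1000   # per 1,000 WY
id_low = low_deaths / low_wy * 1000      # per 1,000 WY
idr = id_high / id_low

print(f"High error: {high_wy} WY, ID = {id_high:.2f} per 1,000 WY")
print(f"Low error:  {low_wy} WY, ID = {id_low:.3f} per 1,000 WY")
print(f"IDR = {idr:.1f}")
```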
d.

                          Alzheimer's Disease
Handwriting Profile       AD Yes      AD No
High error                  18         132
Low error                    8         192

Odds ratio = [(18)(192)] / [(8)(132)] = 3.27
e. The two are similar because the condition is fairly rare.
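For completeness, the odds ratio from the 2x2 table in part d can be computed as a cross-product ratio. This is a minimal sketch; the function and argument names are illustrative.

```python
def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Cross-product ratio (a x d) / (b x c) for a 2x2 table."""
    return (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)


# High error profile: 18 AD, 132 no AD; low error profile: 8 AD, 192 no AD
or_ad = odds_ratio(18, 132, 8, 192)
print(f"odds ratio = {or_ad:.2f}")
```

The result can then be set beside the incidence density ratio of 3.4 from part c, which is the comparison part e refers to.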
10/5/1999, 10/6/1999, 10/7/1999, 8/4/2000vs, 10/15/2000