Errata and comments on
Ann Aschengrau and George Seage III
Essentials of Epidemiology in Public Health
Sudbury, MA: Jones and Bartlett, 1st edition, 2003
(http://publichealth.jbpub.com/aschengrau)
Although I like this textbook a great deal,
there are a few places where I would prefer a different presentation. First, several errata.
- Erratum:
On page 417, the calculation of positive predictive
value (99.4%) has the numerator and denominator transposed. The authors
undoubtedly meant to type 9800/9860.
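To make the corrected arithmetic concrete, here is a minimal sketch in Python; the 60 false positives are implied by the corrected denominator of 9,860 rather than stated explicitly in the text.

```python
# Minimal sketch of the corrected calculation: positive predictive value (PPV)
# is true positives divided by all positive test results (TP + FP).
true_positives = 9800
false_positives = 60   # implied by the corrected denominator of 9,860
ppv = true_positives / (true_positives + false_positives)
print(f"PPV = {ppv:.1%}")   # about 99.4%, matching the value in the text
```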
- Erratum:
The formula in the textbook on page 435 should have
the cumulative incidence (CI) for delegates in the denominator, instead of the CI for non-delegates.
The formula in the text (bottom of page 64; the authors use the abbreviation
APe) is correct except that it is missing a right parenthesis.
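For reference, here is a minimal sketch of the corrected formula, using hypothetical cumulative incidences for delegates (the exposed group) and non-delegates; the point is that the delegates' CI belongs in the denominator.

```python
def attributable_proportion_exposed(ci_exposed, ci_unexposed):
    """APe: excess cumulative incidence among the exposed, divided by the
    cumulative incidence among the exposed (here, the delegates)."""
    return (ci_exposed - ci_unexposed) / ci_exposed

# Hypothetical cumulative incidences, for illustration only:
ape = attributable_proportion_exposed(ci_exposed=0.07, ci_unexposed=0.01)
print(f"APe = {ape:.1%}")   # about 85.7% of the delegates' risk attributable to exposure
```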
- Erratum:
On page 447, the answer to review question
4D in chapter 16 (pp. 429-430) should be 95,900 / 99,100 =
96.8% (i.e., the intermediate result is incorrect).
Here are some comments on several other passages in the text:
- Chapter 2 Measures of disease frequency
In the discussion of “Incidence Rate” on pages 44-46,
the authors explain that incidence rates have person-time denominators
and have units of 1/time. However, these units are omitted in several instances
in the section on
“Commonly Used Measures of Disease Frequency in Public Health” on page 51,
which has on occasion led to confusion among students as to whether the denominators
are based on person-time.
The authors correctly observe that the word “rate” is often used incorrectly
for proportions or other ratios that are not rates in the strict meaning (change in one
quantity per unit change in another quantity, most often time - see reference 11 for the
chapter). The discussion (p52) also makes clear that (in their usual usage) the terms
“attack rate”, “case fatality rate”, and “survival rate”
are cumulative incidence-type measures, i.e., proportions for which there are no
units, time is not included in the denominator, and the time interval should be
specified (e.g., the 1-year “case fatality rate”).
The description (p51) of mortality (both
crude and cause-specific) and morbidity rates correctly observes that these are
usually provided for a 1-year period. The
time period is often implicit or is indicated by the use of the word “annual”,
as in “annual lung cancer mortality rate”. What is confusing, though, is that
the presentation does not make clear that mortality and morbidity rates, as usually
constructed, are indeed rates in the Elandt-Johnson (reference 11) meaning. They
have person-time denominators and, therefore, require units of 1/time. These units
are, however, frequently omitted in conventional presentations. While the authors’
examples (e.g., “864.7/100,000 population” [p51]) follow this convention,
it is important to understand that the denominator still includes time. Written
out fully, the rate would be stated as “864.7/100,000 population per year”.
The term “1-year mortality rate”
is not precisely equivalent to “per year”. The reason is that the
“1-year” applies to the time period during which the deaths occurred (e.g., 2005), rather than to the units in which the quantity is expressed. One could, for example,
express a 2005 mortality rate of “2,400 per 100,000 per year” as
“200 per 100,000 per month”. The time period during which the deaths occurred
is still one year (calendar 2005) and the rate is identical, but the number is now
200 per 100,000 instead of 2,400 per 100,000. (The need to specify units for an annual
rate is a subtle point that I did not realize for many years, until my colleague Charlie Poole helped me to appreciate it.) But regardless of whether one regards the
conventional presentation as incorrect (or incomplete), it is important to be clear
that denominators for mortality and morbidity rates are based on person-time even when
the units of time are not explicitly stated.
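A small sketch of the unit conversion described above, using the hypothetical 2005 figures from the example:

```python
# The same calendar-2005 mortality rate, expressed in two different time units.
deaths_per_100k_per_year = 2_400
deaths_per_100k_per_month = deaths_per_100k_per_year / 12
print(deaths_per_100k_per_month)   # 200 per 100,000 per month
# The period in which the deaths occurred is still calendar 2005;
# only the time unit in the denominator has changed.
```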
- Chapter 3 Comparing disease frequencies
In the discussion of “Absolute Measures of Comparison” on page 62,
the statement is made “Interpreted narrowly, the RD is simply the excess number of
cases of disease associated with the exposure.” A more precise formulation would be
that the RD is the excess risk or rate of disease associated with the exposure,
which I assume is the authors’ intended meaning. The excess number of cases
would be the excess risk (or rate) multiplied by the number of exposed persons
(or amount of exposed person-time). (For additional explanation and formulas for
attributable risk measures, see my Evolving Text chapter “Relating risk factors to health outcomes”,
also available in Spanish; see “Measures of impact” at about page 185.)
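A minimal sketch, with hypothetical numbers, of the distinction between the risk difference (excess risk) and the excess number of cases:

```python
# Hypothetical numbers, for illustration only.
risk_exposed = 0.030      # 1-year risk among the exposed
risk_unexposed = 0.010    # 1-year risk among the unexposed
n_exposed = 5_000

risk_difference = risk_exposed - risk_unexposed   # excess risk
excess_cases = risk_difference * n_exposed        # excess number of cases
print(f"risk difference = {risk_difference:.3f}; excess cases = {excess_cases:.0f}")
# risk difference = 0.020; excess cases = 100
```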
- Chapter 10 Bias
The authors begin their discussion of Selection Bias (page 254) with a paragraph that
concludes with the statement, “Selection bias does not occur in prospective cohort
and experimental studies because only the exposure has occurred by the time of selection.”
The authors are able to make this statement because they restrict the meaning of
the phrase “selected for study” to refer only to the “selective entry
of subjects into a study.” (p270) Thus, loss to follow-up, or attrition, is
classified as observation bias (the authors’ term for what EPID160 has been referring to
as “information bias”).
Aschengrau and Seage write that
selection bias can occur in a
retrospective cohort design because the exposure and disease have occurred by the time
of subject selection. However, the first example given in the textbook is a proportional
mortality study, which the authors characterize as a type of cohort study but others have
characterized as fundamentally a case-control design. In the second example, selection bias occurred
because the authors were able to determine health outcomes for only 76% of the original
cohort. Although the process is somewhat different, the missing 24% of subjects have
the same relation to the original cohort as occurs in loss to follow-up.
It should be remembered, though, that
although a cohort study enrolls subjects who have not yet experienced the outcome, that
is not necessarily the case for the population from which the cohort is recruited.
If the population has already experienced selective forces related to the exposure and
disease, then a cohort recruited from that population may exhibit an exposure-disease
association different from a cohort recruited through similar means in a population
that has not experienced selective forces. Although not all epidemiologists would
characterize these situations as “selection bias”, they arguably qualify as
“systematic differences between those who are selected for study and those who are not.”
(quoted on p254 and cited to John Last, A dictionary of epidemiology, 3rd ed.)
For example, if an exposure leads to
spontaneous abortions - possibly even before a pregnancy
is clinically recognized - then that exposure may appear to be associated with a reduction in
adverse pregnancy outcomes if a cohort of pregnant women is defined at, say, first prenatal
visit. Another example is a study
of time to AIDS in which subjects are recruited after they have become HIV-infected, e.g.,
through an HIV seroprevalence study.
If the subjects have been seropositive for differing lengths of time, and exposure is related
to time of infection in relation to time of cohort enrollment, then the exposure will be
predictive of AIDS onset. The latter situation can be regarded as confounding bias, where
the confounder is time since infection, though a case can be made for calling it selection
bias.
- Chapter 12 Random error
A minor quibble is the explanation of sampling on page 303, since the
authors seem to imply that a sample is either random or nonrandom, according
to whether there is a probabilistic element in the selection process. I
don't recall seeing a definition of what the authors mean by a "probabilistic
element", but it seems to me that a sample can have a degree of randomness
and yet not be a "random sample".
A distinction that may be more useful is that between probability samples
and non-probability samples. A probability sample is one obtained
by random sampling methods and in which each member of the target population
has a known, non-zero probability of being included in the sample.
Many methods of obtaining a study population do not qualify as probability
sampling but do involve a degree of randomness, such as selecting people
as they arrive at a clinic. If you would like to read an excellent presentation
of sampling and epidemiology, see the article by my colleagues, Bill Kalsbeek
and Gerardo Heiss, Building bridges between populations and samples in epidemiological
studies. Annual Review of Public Health 2000; 21:1-23 (Bill Kalsbeek
teaches the sample survey course and runs the Survey Research Unit in the
Department of Biostatistics).
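As a small illustration of the distinction (a sketch with a hypothetical enumerated population, not a recipe for designing a survey):

```python
import random

# Probability sample: every member of the enumerated target population has a
# known, non-zero probability of selection (here n/N for everyone).
population = list(range(10_000))        # hypothetical sampling frame
n = 500
sample = random.sample(population, n)   # simple random sampling without replacement
inclusion_probability = n / len(population)   # known: 0.05 for every member
print(len(sample), inclusion_probability)

# By contrast, taking people as they happen to arrive at a clinic involves an
# element of chance, but the inclusion probabilities are unknown, so the
# result is not a probability sample.
```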
More problematic for me is Aschengrau and Seage's proposal that the P-value “gives a sense of the stability of the measure of association”
(p. 308). What the authors mean by “stability”
in this context isn't defined, but if they mean reproducibility of
the estimate across multiple studies, then the indicator that they want
is the confidence interval rather than the P-value. The P-value and the
confidence interval are linked, but as Aschengrau and Seage point out,
the P-value is a “confounded statistic” because it reflects both
the strength of association and the amount of data on which the estimate
is based. If the observed association is very strong, it is possible to
have a very small p-value and yet a very unstable (imprecise) estimate of
that association. So I would certainly not endorse the authors' recommendation
that “Epidemiologists should examine the P value when deciding how
much money to bet”. If the authors were to substitute “confidence
interval” for “P-value” in these paragraphs, I would feel
much more comfortable. (My colleague Jay Kaufman points out that the wagering
metaphor also assumes that the only source of error is sampling variability,
thereby ignoring error from bias. Aschengrau and Seage are under no illusions
about that, but it's always useful to remind ourselves that most discussions
of p-values and confidence intervals assume no bias.)
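To illustrate the point that a small P-value can accompany a very unstable estimate, here is a sketch with a hypothetical tiny study; it assumes SciPy is available, and the confidence interval uses a conventional Wald approximation on the log scale.

```python
from math import exp, log, sqrt
from scipy.stats import fisher_exact

# Hypothetical small study: 9/10 exposed and 1/10 unexposed develop disease.
a, b = 9, 1   # exposed: cases, non-cases
c, d = 1, 9   # unexposed: cases, non-cases

rr = (a / (a + b)) / (c / (c + d))                        # risk ratio = 9.0
se_log_rr = sqrt(b / (a * (a + b)) + d / (c * (c + d)))   # Wald SE of ln(RR)
ci = (exp(log(rr) - 1.96 * se_log_rr), exp(log(rr) + 1.96 * se_log_rr))

_, p_value = fisher_exact([[a, b], [c, d]])
print(f"RR = {rr:.1f}, p = {p_value:.4f}, 95% CI = {ci[0]:.1f} to {ci[1]:.1f}")
# p is about 0.001, yet the interval runs from roughly 1.4 to 58:
# a very small P-value, but a very imprecise (unstable) estimate.
```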
I find the textbook's treatment of confidence intervals (the real
measure of stability) more congenial, but figures 12-2 and 12-3 are a little
misleading in that they show the RR estimates as being in the center of
their confidence intervals. Confidence intervals for means, proportions,
and differences (e.g., the risk difference) are symmetric about the point
estimate, but for ratio measures the symmetry holds only on the log scale.
If you look at any of the case study articles that report ORs, RRs,
or prevalence ratios, you will see that the confidence intervals appear
lopsided. But if you take the natural log of the confidence limits and of
the point estimate, you will find that the log of the RR, OR, or PR falls
precisely in the middle of the (logged) confidence limits.
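Here is a small numerical sketch (a hypothetical RR and limits chosen to be exactly log-symmetric) that you can use to check this for yourself:

```python
from math import log

rr, lower, upper = 2.0, 1.25, 3.2   # hypothetical risk ratio and 95% limits

# Lopsided on the original scale:
print(f"{rr - lower:.2f} vs {upper - rr:.2f}")                     # 0.75 vs 1.20
# Symmetric on the log scale:
print(f"{log(rr) - log(lower):.3f} vs {log(upper) - log(rr):.3f}") # 0.470 vs 0.470
```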
If you are having trouble mastering the subtleties of p-values, statistical
significance, and confidence intervals, be reassured that you are not alone.
- Chapter 16 Screening in public health practice
The answer provided for question 4D (specificity) in chapter 16 has an error. Page 447 shows a correct table and the correct calculated value for specificity, but the calculation should read: 95,900/99,100, not 95,900/96,000. The calculation shown (but not the result) is for predictive value negative.
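A sketch of the corrected arithmetic, with the cell counts implied by the numbers on page 447:

```python
true_negatives  = 95_900
false_positives = 3_200   # implied: 99,100 without disease minus 95,900 true negatives
false_negatives = 100     # implied by the 96,000 denominator in the printed calculation

specificity = true_negatives / (true_negatives + false_positives)   # 95,900 / 99,100
pvn = true_negatives / (true_negatives + false_negatives)           # 95,900 / 96,000
print(f"specificity = {specificity:.1%}, predictive value negative = {pvn:.1%}")
# specificity is about 96.8% (the result printed in the answer); the 95,900/96,000
# calculation shown there is actually the predictive value negative.
```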
Vic Schoenbach
Return to EPID600 home page
2/7,11/2005,9/21,22/2005,5/26/2009