Errata and comments on
Ann Aschengrau and George Seage III
Essentials of Epidemiology in Public Health
Sudbury MA: Jones and Bartlett
1st edition, 2003 (http://publichealth.jbpub.com/aschengrau)

Although I like this textbook a great deal, there are a few places where I would prefer a different presentation. First, several errata.

  • Erratum: On page 417, the calculation of positive predictive value (99.4%) has the numerator and denominator transposed. The authors undoubtedly meant to type 9800/9860.
  • Erratum: The formula in the textbook on page 435 should have the CI (cumulative incidence) for delegates in the denominator, instead of the CI for non-delegates. The formula in the text (bottom of page 64 - the authors use the abbreviation APe) is correct except that it is missing a right parenthesis. (A sketch of the corrected formula appears after this list.)
  • Erratum: On page 447, the answer to review question 4D in chapter 16 (pp. 429-430) should be 95,900 / 99,100 = 96.8% (i.e., the fraction shown is incorrect, although the final result is correct).
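
As a quick check on the first and second of these corrections, here is a minimal sketch in Python. The counts come from the corrected fraction on page 417, and the APe formula shown is the standard attributable proportion among the exposed, which I take to be what the authors intend (the function name is mine):

    # Page 417: corrected positive predictive value
    ppv = 9_800 / 9_860
    print(f"PPV = {ppv:.1%}")   # 99.4%

    # Page 435: attributable proportion among the exposed (APe), with the
    # cumulative incidence (CI) among delegates in the denominator
    def ape(ci_delegates, ci_nondelegates):
        return (ci_delegates - ci_nondelegates) / ci_delegates * 100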

Here are some comments on several other passages in the text:

  • Chapter 2 – Measures of disease frequency

    In the discussion of “Incidence Rate” on pages 44-46, the authors explain that incidence rates have person-time denominators and have units of 1/time. However, these units are omitted in several instances in the section on “Commonly Used Measures of Disease Frequency in Public Health” on page 51, which has on occasion led to confusion among students as to whether the denominators are based on person-time.

    The authors correctly observe that the word “rate” is often used incorrectly for proportions or other ratios that are not rates in the strict meaning (change in one quantity per unit change in another quantity, most often time - see reference 11 for the chapter). The discussion (p52) also makes clear that (in their usual usage) the terms “attack rate”, “case fatality rate”, and “survival rate” are cumulative incidence-type measures, i.e., proportions for which there are no units, time is not included in the denominator, and the time interval should be specified (e.g., the 1-year “case fatality rate”).

    The description (p51) of mortality (both crude and cause-specific) and morbidity rates correctly observes that these are usually provided for a 1-year period. The time period is often implicit or is indicated by the use of the word “annual”, as in “annual lung cancer mortality rate”. What is confusing, though, is that the presentation does not make clear that mortality and morbidity rates, as usually constructed, are indeed rates in the Elandt-Johnson (reference 11) meaning. They have person-time denominators and, therefore, require units of 1/time. These units are, however, frequently omitted in conventional presentations. While the authors’ examples (e.g., “864.7/100,000 population” [p51]) reflect this convention, it is important to understand that the denominator does still include time. If written out in full, the rate would be stated as “864.7/100,000 population per year”.

    The term “1-year mortality rate” is not precisely equivalent to “per year”. The reason is that the “1-year” applies to the time period during which the deaths occurred (e.g., 2005), rather than to the units in which the quantity is expressed. One could, for example, express a 2005 mortality rate of “2,400 per 100,000 per year” as “200 per 100,000 per month”. The time period during which the deaths occurred is still one year (calendar 2005) and the rate is identical, but the number is now 200 per 100,000 instead of 2,400 per 100,000. (The need to specify units for an annual rate is a subtle point that I did not realize for many years, until my colleague Charlie Poole helped me to appreciate it.) But regardless of whether one regards the conventional presentation as incorrect (or incomplete), it is important to be clear that denominators for mortality and morbidity rates are based on person-time even when the units of time are not explicitly stated.
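
    To make the units point concrete, here is a minimal sketch in Python using the illustrative numbers above:

        # A 2005 mortality rate expressed in two different time units
        deaths = 2_400          # deaths occurring during calendar year 2005
        person_years = 100_000  # person-time at risk during 2005

        rate_per_year = deaths / person_years   # 0.024 per person-year
        rate_per_month = rate_per_year / 12     # 0.002 per person-month

        print(f"{rate_per_year * 100_000:,.0f} per 100,000 per year")    # 2,400
        print(f"{rate_per_month * 100_000:,.0f} per 100,000 per month")  # 200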

  • Chapter 3 – Comparing disease frequencies

    In the discussion of “Absolute Measures of Comparison” on page 62, the statement is made “Interpreted narrowly, the RD is simply the excess number of cases of disease associated with the exposure.” A more precise formulation would be that the RD is the excess risk or rate of disease associated with the exposure, which I assume is the authors’ intended meaning. The excess number of cases would be the excess risk (or rate) multiplied by the number of exposed persons (or amount of exposed person-time), as in the sketch below. (For additional explanation and formulas for attributable risk measures, see my Evolving Text chapter “Relating risk factors to health outcomes” (also available in Spanish) – see “Measures of impact” at about page 185.)
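
    Here is a minimal sketch of that distinction, with hypothetical numbers:

        # The risk difference (RD) is an excess risk, not an excess number of cases.
        risk_exposed = 0.05     # hypothetical 1-year risk in the exposed
        risk_unexposed = 0.02   # hypothetical 1-year risk in the unexposed
        n_exposed = 10_000      # hypothetical number of exposed persons

        rd = risk_exposed - risk_unexposed   # 0.03 = excess risk per exposed person
        excess_cases = rd * n_exposed        # 300  = excess number of cases
        print(f"RD = {rd:.2f}; excess cases = {excess_cases:.0f}")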

  • Chapter 10 – Bias

    The authors begin their discussion of Selection Bias (page 254) with a paragraph that concludes with the statement, “Selection bias does not occur in prospective cohort and experimental studies because only the exposure has occurred by the time of selection.” The authors are able to make this statement because they restrict the meaning of the phrase “selected for study” to refer only to the “selective entry of subjects into a study.” (p270) Thus, loss to follow-up, or attrition, is classified as observation bias (the authors' term for what EPID160 has been referring to as “information bias”).

    Aschengrau and Seage write that selection bias can occur in a retrospective cohort design because the exposure and disease have occurred by the time of subject selection. However, the first example given in the textbook is a proportional mortality study, which the authors characterize as a type of cohort study but which others have characterized as fundamentally a case-control design. In the second example, selection bias occurred because the authors were able to determine health outcomes for only 76% of the original cohort. Although the process is somewhat different, the missing 24% of subjects bear the same relation to the original cohort as subjects lost to follow-up.

    It should be remembered, though, that although a cohort study enrolls subjects who have not yet experienced the outcome, that is not necessarily the case for the population from which the cohort is recruited. If the population has already experienced selective forces related to the exposure and disease, then a cohort recruited from that population may exhibit an exposure-disease association different from a cohort recruited through similar means in a population that has not experienced selective forces. Although not all epidemiologists would characterize these situations as “selection bias,” they arguably qualify as “systematic differences between those who are selected for study and those who are not.” (quoted on p254 and cited to John Last, A dictionary of epidemiology, 3rd ed.)

    For example, if an exposure leads to spontaneous abortions - possibly even before a pregnancy is clinically recognized - then that exposure may appear to be associated with a reduction in adverse pregnancy outcomes if a cohort of pregnant women is defined at, say, first prenatal visit. Another example is a study of time to AIDS in which subjects are recruited after they have become HIV-infected, e.g., through an HIV seroprevalence study. If the subjects have been seropositive for differing lengths of time, and exposure is related to time of infection in relation to time of cohort enrollment, then the exposure will be predictive of AIDS onset. The latter situation can be regarded as confounding bias, where the confounder is time since infection, though a case can be made for calling it selection bias.

  • Chapter 12 – Random error

    A minor quibble is the explanation of sampling on page 303, since the authors seem to imply that a sample is either random or nonrandom, according to whether there is a probabilistic element in the selection process. I don't recall seeing a definition of what the authors mean by a "probabilistic element", but it seems to me that a sample can have a degree of randomness and yet not be a "random sample".

    A distinction that may be more useful is that between probability samples and non-probability samples. A probability sample is one obtained by random sampling methods and in which each member of the target population has a known, non-zero probability of being included in the sample. Many methods of obtaining a study population do not qualify as probability sampling but do involve a degree of randomness, such as selecting people as they arrive at a clinic. If you would like to read an excellent presentation of sampling and epidemiology, see the article by my colleagues Bill Kalsbeek and Gerardo Heiss, “Building bridges between populations and samples in epidemiological studies,” Annual Review of Public Health 2000; 21:1-23 (Bill Kalsbeek teaches the sample survey course and runs the Survey Research Unit in the Department of Biostatistics).
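
    A minimal sketch of the distinction, with a hypothetical sampling frame:

        import random

        population = list(range(10_000))   # hypothetical sampling frame

        # Probability sample: a simple random sample in which every member of
        # the frame has a known, non-zero inclusion probability (500/10,000 = 5%)
        srs = random.sample(population, k=500)

        # By contrast, taking whoever happens to arrive at a clinic involves an
        # element of chance, but the inclusion probabilities are unknown, so the
        # result is not a probability sample.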

    More problematic for me is Aschengrau and Seage's proposal that the P-value “gives a sense of the ‘stability’ of the measure of association” (Aschengrau and Seage, p. 308). What the authors mean by “stability” in this context isn’t defined, but if they mean reproducibility of the estimate across multiple studies, then the indicator that they want is the confidence interval rather than the P-value. The P-value and the confidence interval are linked, but as Aschengrau and Seage point out, the P-value is a “confounded statistic” because it reflects both the strength of association and the amount of data on which the estimate is based. If the observed association is very strong, it is possible to have a very small P-value and yet a very unstable (imprecise) estimate of that association (the sketch below illustrates this). So I would certainly not endorse the authors’ recommendation that “Epidemiologists should examine the P value when deciding how much money to bet”. If the authors were to substitute “confidence interval” for “P-value” in these paragraphs, I would feel much more comfortable. (My colleague Jay Kaufman points out that the wagering metaphor also assumes that the only source of error is sampling variability, thereby ignoring error from bias. Aschengrau and Seage are under no illusions about that, but it’s always useful to remind ourselves that most discussions of P-values and confidence intervals assume no bias.)
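
    The following sketch, with hypothetical 2x2 counts and Wald approximations, shows how a strong association based on very little data can yield a small P-value alongside a very wide (unstable) confidence interval:

        from math import erf, exp, log, sqrt

        # Hypothetical 2x2 table: strong association, very few subjects
        a, b = 9, 1   # exposed: cases, non-cases
        c, d = 1, 9   # unexposed: cases, non-cases

        or_hat = (a * d) / (b * c)                # odds ratio = 81
        se_log_or = sqrt(1/a + 1/b + 1/c + 1/d)   # SE of log(OR)

        z = log(or_hat) / se_log_or                             # Wald statistic
        p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))   # two-sided P

        lower = exp(log(or_hat) - 1.96 * se_log_or)
        upper = exp(log(or_hat) + 1.96 * se_log_or)

        print(f"OR = {or_hat:.0f}, P = {p_value:.3f}, 95% CI {lower:.1f} to {upper:.0f}")
        # P is about 0.003, yet the CI runs from roughly 4 to 1,500.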

    I find the textbook’s treatment of confidence intervals (the real measure of stability) more congenial, but Figures 12-2 and 12-3 are a little misleading in that they show the RR estimates as being in the center of their confidence intervals. Confidence intervals for means, proportions, and differences (e.g., the risk difference) are symmetric about the point estimate, but for ratio measures the symmetry holds only on the log scale. If you look at any of the case study articles that report ORs, RRs, or prevalence ratios, you will see that the confidence intervals appear lopsided. But if you take the natural log of the confidence limits and of the point estimate, you will find that the log of the RR, OR, or PR falls precisely in the middle of the (logged) confidence limits, as in the sketch below.
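
    A minimal sketch with an illustrative ratio estimate and confidence limits:

        from math import log

        # Hypothetical ratio estimate with its 95% confidence limits
        rr, lower, upper = 3.0, 1.5, 6.0

        # On the original scale the interval looks lopsided around the estimate
        print(rr - lower, upper - rr)    # 1.5 vs 3.0

        # On the log scale the estimate sits exactly in the middle
        print(log(rr) - log(lower), log(upper) - log(rr))   # both about 0.693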

    If you are having trouble mastering the subtleties of p-values, statistical significance, and confidence intervals, be reassured that you are not alone.

  • Chapter 16 – Screening in public health practice

    The answer provided for question 4D (specificity) in chapter 16 has an error. Page 447 shows a correct table and the correct calculated value for specificity, but the calculation should read: 95,900/99,100, not 95,900/96,000. The calculation shown (but not the result) is for predictive value negative.
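
    The distinction can be seen from the cell counts implied by the two denominators above (a minimal sketch; 99,100 - 95,900 = 3,200 false positives and 96,000 - 95,900 = 100 false negatives):

        # Cell counts implied by the denominators quoted above
        tn, fp, fn = 95_900, 3_200, 100

        specificity = tn / (tn + fp)   # 95,900 / 99,100 = 96.8%
        npv = tn / (tn + fn)           # 95,900 / 96,000 = 99.9% (predictive value negative)

        print(f"specificity = {specificity:.1%}, predictive value negative = {npv:.1%}")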

Vic Schoenbach

Return to EPID600 home page

2/7,11/2005,9/21,22/2005,5/26/2009