ABSTRACT
INTRODUCTION
From 1975 to 1997, estimated human schistosomiasis prevalence in the Philippines decreased from 17.3% to a new equilibrium of 4.7%, owing to the regular use of praziquantel from 1980 onward combined with both active and passive case detection.4 However, as documented by Aligui and Yu and others,9,10 the tests used for national schistosomiasis screening have low sensitivity, which decreases further as the intensity of infection decreases. The true national prevalence is therefore likely to be higher than reported. Regional prevalence estimates have also been shown to be very heterogeneous, varying with geography, ecology, age, and sex.11 In addition to prevalence, an accurate measure of the intensity of infection with S. japonicum is essential for evaluating the transmission dynamics of the infection, because more heavily infected individuals contaminate the environment with more eggs than lightly infected individuals. Also, hepatic morbidity associated with S. japonicum in the Philippines appears to be linked to chronic untreated infection rather than to the intensity of infection (as is observed with Schistosoma mansoni); hence the importance of accurately measuring the intensity of even light infections.12 Finally, the intensity of infection can be used to assess the effectiveness of public health interventions. In effect, the intensity of infection among those affected is an important measure of intervention effectiveness and of the endemic state of an infection; prevalence alone is not sufficient for public health decision making.13 Misclassification errors between categories at the extremes of a continuum are not unique to helminthic infections. They also pose a challenge in other areas of medicine, for example in measuring cognitive levels or in measuring birth weight to define very low and low birth weight babies.
Neuhaus14 and Paulino and others15 have adjusted for misclassification errors in research on human papillomavirus (HPV) infection, using methods very similar to those proposed here.
Our objective is to develop a statistical model that enables reliable prediction, in an endemic setting, of age/gender intensity "profiles" of schistosomiasis infection, adjusting for the measurement error of the Kato-Katz test and acknowledging potential heterogeneity in infection between communities within the same region, due in part to differing occupational water contact and ecological patterns. For each age/gender stratum, we predict the probability that an individual is in each of four contiguous egg categories; these four probabilities make up the intensity profile for that stratum. In addition to providing reliable estimates of schistosomiasis prevalence in this area, the stratum-specific profile estimates are subsequently used to inform the analysis of transmission dynamics.16
In the current paper, we adopt a Bayesian model-based approach that has the flexibility to acknowledge all sources of uncertainty. In particular, our model incorporates uncertainty 1) in the estimates of the sensitivity and specificity of the Kato-Katz test, 2) arising from missing covariate data, and 3) arising from heterogeneity in infection intensity patterns between communities.
MATERIALS AND METHODS
Parasitological examination. Each stool sample was analyzed with a duplicate 50-mg Kato-Katz thick smear method.17 Positive samples were independently reviewed by two microscopists. In addition, 10% of all slides were reviewed by an expert microscopist, with over 99% agreement. The number of eggs found on each slide was multiplied by 10 to obtain an estimate of the number of eggs per gram (epg). The results were divided into four infection groups: not infected (0 epg), lightly infected (1–100 epg), moderately infected (101–800 epg), and very heavily infected (> 800 epg). We used a modification of the classification recommended by the WHO to better reflect the impact of heavily infected individuals on the transmission dynamics of the infection.18
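The four-group classification can be written as a small lookup function. This is a minimal sketch that assumes epg values have already been computed; the function name `egg_category` and the 1–4 coding are illustrative choices, not part of the study protocol.

```python
def egg_category(epg: float) -> int:
    """Map an eggs-per-gram (epg) value to the four intensity groups.

    1 = not infected (0 epg), 2 = light (1-100 epg),
    3 = moderate (101-800 epg), 4 = heavy (> 800 epg).
    """
    if epg < 0:
        raise ValueError("epg cannot be negative")
    if epg == 0:
        return 1
    if epg <= 100:
        return 2
    if epg <= 800:
        return 3
    return 4


for value in (0, 10, 100, 110, 800, 810):
    print(value, egg_category(value))
```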
The Kato-Katz method has been shown to have low sensitivity, especially for lightly infected individuals, as reviewed by Aligui.9 Because a person who is at least moderately infected is less likely to be misclassified by the Kato-Katz test, our model accounts only for misclassification between the two lowest infection categories (no infection and light infection). Public health control programs in the Philippines still rely largely on the prevalence of infection to decide which villages should be targeted for closer control, so misclassification between the two lowest categories biases prevalence estimates in a way that has important public health consequences. Further, because egg counts follow a negative binomial distribution,13 most individuals are uninfected or very lightly infected, so most misclassification occurs in these categories. It has been shown that S. japonicum eggs tend to be more numerous at the surface of the stool than in the center, leading to additional variability if the stool is not stirred before analysis.19 In a study comparing various methods to adjust prevalence estimates of S. japonicum, the sensitivity and specificity of a single stool Kato-Katz analysis were assumed to range from 0.41 to 0.69 and from 0.90 to 1.00, respectively. These ranges were based on a review of the literature and allow for error associated with inter-microscopist variation.9
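A short calculation shows why low sensitivity biases prevalence downward. The sketch below assumes, for simplicity, that a single sensitivity applies to every infected person (unlike the model used in this paper, which restricts misclassification to light infections); the true prevalence of 0.30 is hypothetical, and Se = 0.55 and Sp = 0.95 are the prior means used in this paper.

```python
def apparent_prevalence(true_prev: float, se: float, sp: float) -> float:
    """Expected proportion testing positive, assuming (for simplicity)
    a single sensitivity `se` for all infected people and specificity `sp`."""
    infected_detected = se * true_prev
    false_positives = (1.0 - sp) * (1.0 - true_prev)
    return infected_detected + false_positives


# Hypothetical true prevalence of 0.30 with Se = 0.55 and Sp = 0.95.
observed = apparent_prevalence(0.30, se=0.55, sp=0.95)
print(f"apparent prevalence: {observed:.3f}")  # 0.165 + 0.035 = 0.200
```

With these values, a true prevalence of 30% would appear as roughly 20%.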
Statistical analysis. Individuals were categorized into three age groups: 0–6 years (Age group 1), 7–13 years (Age group 2), and > 13 years (Age group 3). These age categories were chosen to represent groups with homogeneous water contact patterns: young children, school children, and adolescents and adults. The distribution of infection intensity was similar across ages above 13 years.
Full details of the statistical model are presented in the Appendix. Briefly, our outcome is the true but unobserved egg category of each individual in their respective age/sex/village stratum. The outcome is modeled with four values corresponding to epg = 0, 0 < epg ≤ 100, 100 < epg ≤ 800, and epg > 800, respectively, so that we model the probability ($\pi_{ijk}$) that an individual is in each egg category. We used a cumulative logit model20–22 to directly model the cumulative probabilities over the four categories. For example, for individuals in each age/sex/village stratum, we first consider the probability of being at least lightly infected compared with not being infected. In a second stage, we consider the probability of being at least moderately infected compared with being lightly infected or not infected, and so on. Differencing the cumulative probabilities of adjacent categories then gives the probability of each egg category.
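The differencing step can be sketched as follows. This minimal example assumes the parameterization $\operatorname{logit}(q_k) = \mu - \theta_k$, where $q_k$ is the probability of being in a category greater than k and the cut points $\theta_1 < \theta_2 < \theta_3$ are increasing; the sign convention, the function names, and the values of `mu` and `cuts` are illustrative assumptions.

```python
import math
from typing import List, Sequence


def inv_logit(x: float) -> float:
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def category_probs(mu: float, cuts: Sequence[float]) -> List[float]:
    """Category probabilities from a cumulative logit model.

    Assumes logit(q_k) = mu - theta_k, where q_k = P(category > k) and
    the increasing cut points theta_1 < ... < theta_{K-1} are in `cuts`.
    """
    if any(b <= a for a, b in zip(cuts, cuts[1:])):
        raise ValueError("cut points must be strictly increasing")
    # q[0] = P(category > 0) = 1 and q[K] = P(category > K) = 0.
    q = [1.0] + [inv_logit(mu - c) for c in cuts] + [0.0]
    # Differencing adjacent cumulative probabilities gives P(category = k).
    return [q[k] - q[k + 1] for k in range(len(cuts) + 1)]


# Hypothetical linear predictor and cut points for one stratum.
profile = category_probs(mu=-0.5, cuts=[-1.0, 0.0, 2.0])
print([round(p, 3) for p in profile])
```

Because the cut points are increasing, each difference is positive and the four probabilities sum to one, forming an intensity profile.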
Incorporating the misclassification error. As discussed above, the Kato-Katz method can lead to misclassification of samples, and our model was extended to account for this misclassification. Details are given in the Appendix.
Missing values. Twenty individuals had no recorded age. Evidence from the field and from these individuals' nonmissing data suggests that age was missing completely at random, its absence being unrelated to the data we analyzed. We multiply imputed the missing ages in proportion to the age distribution in the remaining population.23 The posterior estimates of the intensity profiles fully incorporate the additional uncertainty arising from the missing age values.
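One simple way to generate the imputations is sketched below, assuming that each missing age group is drawn independently with probability proportional to the observed age-group counts. The age data are hypothetical and `impute_age_groups` is an illustrative name; in a Bayesian fit, such draws would typically be repeated so that the uncertainty from the missing ages propagates into the posterior.

```python
import random
from collections import Counter


def impute_age_groups(ages, rng):
    """Fill missing (None) age groups by drawing from the observed
    age-group distribution, in proportion to the observed counts."""
    counts = Counter(a for a in ages if a is not None)
    groups = sorted(counts)
    weights = [counts[g] for g in groups]
    return [a if a is not None else rng.choices(groups, weights=weights)[0]
            for a in ages]


rng = random.Random(2004)
# Hypothetical age groups (1 = 0-6, 2 = 7-13, 3 = > 13 years); None = missing.
ages = [1, 2, 2, 3, 3, 3, None, 3, 2, None]
imputations = [impute_age_groups(ages, rng) for _ in range(5)]
for completed in imputations:
    print(completed)
```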
The proportional odds assumption. Although the above model is relatively flexible in its incorporation of misclassification error and covariate information, it makes one particularly strong assumption: that of proportional odds. This means that the effect of each covariate is assumed to be the same across cumulative categories. For example, the odds ratio (OR) of being at least lightly infected (compared with not infected) in young children compared with adults would be exactly the same as the OR of being at least moderately infected (compared with being lightly infected or not infected) in young children compared with adults. In other words, being an adult rather than a child would have a constant effect across increasing intensities of infection, but this does not appear to be the case. In particular, the odds of having an egg count above a given category in young children (Age group 1, 0–6 years) relative to older children or adults appear to differ markedly with the intensity of infection. The proportional odds assumption does, however, appear reasonable with respect to gender (data not shown). The proportional odds assumption was therefore relaxed for the effect of age on the intensity of infection (see the Appendix for details).
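The difference between the two assumptions can be seen numerically. The sketch below assumes $\operatorname{logit}(q_k) = \mu - \theta_k$: with shared cut points, the log OR between a young child and an adult is the same at every cumulative split, whereas giving the child separate cut points makes it vary with k. All values are hypothetical, and tying the middle cut point ($\theta_2^* = \theta_2$) is an assumption mirroring the identifiability constraint described in the Appendix.

```python
def log_cumulative_odds(mu, cuts):
    """Log odds of being above category k (k = 1..K-1),
    assuming logit(q_k) = mu - theta_k."""
    return [mu - c for c in cuts]


def log_odds_ratios(mu_l, cuts_l, mu_m, cuts_m):
    """Log ORs (individual L vs individual M) at each cumulative split."""
    lo_l = log_cumulative_odds(mu_l, cuts_l)
    lo_m = log_cumulative_odds(mu_m, cuts_m)
    return [a - b for a, b in zip(lo_l, lo_m)]


# Hypothetical values: L is a young child, M an adult.
mu_child, mu_adult = 0.0, 1.0
shared_cuts = [-1.0, 0.0, 2.0]   # common cut points theta_k
child_cuts = [-0.5, 0.0, 1.0]    # age-specific theta_k*, with theta_2* = theta_2

print(log_odds_ratios(mu_child, shared_cuts, mu_adult, shared_cuts))  # [-1.0, -1.0, -1.0]
print(log_odds_ratios(mu_child, child_cuts, mu_adult, shared_cuts))   # [-1.5, -1.0, 0.0]
```

Under proportional odds the log OR is constant at -1.0; with age-specific cut points it changes across splits, which is the pattern the relaxed model is designed to capture.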
Prior distributions.
In any Bayesian analysis, information in the data is combined with information about parameter values known before the current data are analyzed. When little prior information is available, non-informative prior distributions can be used, which in practice means that the information in the data dominates the final inferences. We used non-informative prior distributions for almost all parameters in our model, but the study data contain little information about the sensitivity and specificity of the Kato-Katz test.24 We therefore used informative prior distributions for these two parameters and assessed the sensitivity of our results to alternative prior distributions reflecting more optimistic and more pessimistic views of the test's performance. In particular, we assumed that the sensitivity of the Kato-Katz test has a mean of 0.55 and a 99% Bayesian credible interval (BCI; the Bayesian analogue of a standard confidence interval) of (0.41, 0.69). For the specificity, we assumed a mean of 0.95 and a 99% BCI of (0.87, 1.00). As a robustness check on these prior assumptions, we used an "optimistic" prior with mean sensitivity 0.65 (99% BCI: 0.51, 0.77) and mean specificity 0.99 (99% BCI: 0.95, 1.00). In contrast, the pessimistic priors assume that the sensitivity has mean 0.45 (99% BCI: 0.32, 0.58) and the specificity has mean 0.86 (99% BCI: 0.79, 0.95). The variation in results across these three prior distributions (displayed in Figure 1) reflects how dependent our results are on the choice of prior.
Prediction and model verification. We obtained predictions of the patterns of intensity by age and gender, adjusted for known misclassification error and for the variability between villages, by plugging the MCMC samples into the relevant prediction equations from our model. In this way, we can not only predict patterns likely to be seen in the future if our model is correct, but also compare these predictions with the data we actually observed, as a further check on the validity of our model. Full details are provided in the Appendix.
RESULTS
Model validity.
The lines in Figure 4 show the median posterior predictions, with their 95% BCIs, of the epg categories stratified by age and village for females. The crosses show the corresponding crude observed values. A similar picture emerges for males (not shown). These predictions are obtained by conditioning on the cumulative logit model, the misclassification error model, and the demographic composition of the villages (through the nonproportional odds component of the model); they are, therefore, what we would expect to observe "on the ground" if our model were correct. Although some observations should be expected to fall outside the 95% BCIs because of the poor sensitivity of the single stool Kato-Katz test, most observations do not differ greatly from the predictions in practical terms. Hence, the model appears to capture the overall pattern of infection intensity across the age groups well.
DISCUSSION
The intensity profiles themselves can be derived in a number of ways. One approach is to assume a distributional form, say negative binomial, for the raw egg counts of individuals within a specific risk category, and then compute the required probabilities conditional on this assumption. This reduces to computing the proportion of the distribution between the appropriate cut-points, in our case (0, 1, 100, 800). Rather than assuming a distributional form for the egg counts, we prefer to model the probabilities directly. As well as affording a little more flexibility in the form of the underlying intensity profiles, the major advantage of this approach in the current context is the ease with which the misclassification error can be incorporated. Our model was robust to the use of different prior distributions to adjust for the measurement error of the single stool Kato-Katz examination. Although this approach is widely advocated in the statistical literature, to our knowledge this is the first time it has been adopted in the context of intensity of helminthic infections (a MEDLINE search with the MeSH terms "helminthes" and "parasite egg count" found 3,314 publications, none of which had used this approach).
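The distributional route can be sketched for comparison. This example assumes a negative binomial distribution with mean m and aggregation parameter k applied directly to epg values (a simplification, since epg values are scaled slide counts), and sums its probability mass between the cut-points; m = 60 and k = 0.2 are hypothetical values, not estimates from this study.

```python
import math


def nb_pmf(x: int, mean: float, k: float) -> float:
    """Negative binomial pmf with the given mean and aggregation parameter k."""
    p = k / (k + mean)
    log_pmf = (math.lgamma(x + k) - math.lgamma(k) - math.lgamma(x + 1)
               + k * math.log(p) + x * math.log1p(-p))
    return math.exp(log_pmf)


def nb_intensity_profile(mean: float, k: float):
    """P(0), P(1-100), P(101-800) and P(> 800) under a negative binomial."""
    p0 = nb_pmf(0, mean, k)
    light = sum(nb_pmf(x, mean, k) for x in range(1, 101))
    moderate = sum(nb_pmf(x, mean, k) for x in range(101, 801))
    heavy = max(0.0, 1.0 - p0 - light - moderate)  # upper tail
    return [p0, light, moderate, heavy]


# Hypothetical parameters: mean 60 epg, strong aggregation (k = 0.2).
profile = nb_intensity_profile(mean=60.0, k=0.2)
print([round(v, 3) for v in profile])
```

Every profile obtained this way is tied to the assumed distribution, whereas modeling the category probabilities directly avoids that assumption.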
Cumulative logit models can be fitted within the classical estimation framework (see Agresti and Natarajan for details21), but we find the Bayesian approach more attractive for several reasons. First, inferences do not rely on asymptotic arguments that break down in the face of sparse data. Second, the Bayesian approach does not require the maximization of complex likelihoods, so many models that cannot be estimated using a classical approach become feasible. Most importantly, however, it affords far greater flexibility, allowing us to extend our model to acknowledge appropriately the clustering of infection within villages and the misclassification error inherent in the Kato-Katz test. Others have also used a Bayesian generalized linear model to adjust for similar misclassification errors,15 and Neuhaus has shown that not adjusting for such errors can bias not only the prevalence estimates but also the regression coefficients themselves.14 In parasitology, adjustment for misclassification error when estimating the prevalence of helminthic infection has been used in a few human and veterinary epidemiologic studies,9,28–31 but not in combination with multivariate regression or for prediction purposes.
Our model confirms the frequently observed higher prevalence of infection in males, in agreement with what has been reported for S. japonicum in the Philippines and China.1,32–35 This is most likely because, in this population, females have less water contact than males.9 However, the odds of infection in males compared with females are the same for all intensity of infection thresholds. In other words, males consistently have 22% higher odds than females of being at least lightly infected, at least moderately infected, or heavily infected. Thus, even though males are in general more infected than females, the distribution of individuals across intensity of infection groups is similar for the two genders. It may be that males and females progress through heavier intensities of infection at a similar rate. This was not the case when age groups were compared. The intensity and prevalence of infection increased with age, which is also consistent with what has been reported from the Philippines and China.1,32–35 However, the rate at which the intensity of infection increases differs between very young children (0–6 years old) and older children and adults. Young children are less likely than older age groups to be infected or at least moderately infected, but the odds of being heavily infected (compared with not heavily infected) are similar in all age groups. This may be because there are only a very small number of heavily infected individuals in all age groups. It also shows that infection really starts occurring between the ages of 7 and 13 years and remains similar afterwards. This is consistent with the usually observed increase in intensity and prevalence of infection in teenagers followed by a plateau.1,32 Again, this age difference is most likely due to differences in the frequency of water contact but could also be linked to puberty.7,36
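To make the constant-odds-ratio statement concrete: if males' odds of exceeding each threshold are 1.22 times females' odds, each male cumulative probability follows from the corresponding female one. The female probabilities below are hypothetical and `apply_odds_ratio` is an illustrative helper.

```python
def apply_odds_ratio(prob: float, odds_ratio: float) -> float:
    """Return the probability whose odds are `odds_ratio` times those of `prob`."""
    odds = prob / (1.0 - prob) * odds_ratio
    return odds / (1.0 + odds)


# Hypothetical female probabilities of being at least lightly infected,
# at least moderately infected, and heavily infected.
female = [0.40, 0.15, 0.02]
male = [apply_odds_ratio(p, 1.22) for p in female]
print([round(p, 3) for p in male])
```

An OR of 1.22 corresponds to a smaller relative increase in probability when infection is common (here 0.40 rises to about 0.45), so "22% higher odds" should not be read as "22% higher probability".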
Despite the advantages outlined above, our analysis has a number of limitations within the specific context of the Leyte example. First, we have assumed that misclassification occurs only between egg classes 1 and 2, although the model could be extended to allow more complex misclassification mechanisms. This choice was based on experience that, in parasitological analysis, identifying a few eggs is always more challenging than counting several eggs. However, there may still be some errors between the other egg count categories, even though we believe these would be minor. A study of Schistosoma mansoni conducted in Burundi clearly showed that estimates of the prevalence of moderate and heavy infections remained very similar when stools were sampled on days 1, 3, 5, 8, 10, 32, and 37, whereas the cumulative prevalence of light infection increased from 29.5% on Day 1 to 45.9% on Day 37.37 Second, our estimation of the pattern of heterogeneity between communities is based on information from only three villages and assumes that these villages represent the range of prevalences and intensities that would be found in the field. The villages were selected based on the judgment of the principal investigator of the 1981 study (R.O.). Even though this was not a random sample, we believe our results represent a significant improvement over previous approaches in which all heterogeneity was ignored.
APPENDIX: DESCRIPTION OF OUR STATISTICAL MODEL
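The outcome is modeled with four values corresponding to epg = 0, 0 < epg ≤ 100, 100 < epg ≤ 800, and epg > 800, respectively. We model the probability ($\pi_{ijk}$) that individual j in stratum i is in egg category k given their covariate values, $X_{ij}$, represented by

$$\pi_{ijk} = \Pr(\text{egg category of individual } j \text{ in stratum } i = k \mid X_{ij}), \quad k = 1, \ldots, 4.$$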
We use a hierarchical cumulative logit model20–22 to directly model the cumulative probabilities,

$$q_{ijk} = \sum_{l=k+1}^{4} \pi_{ijl}, \quad k = 0, 1, \ldots, 4,$$
where $q_{ijk}$ denotes the probability that individual j in stratum i has an egg load greater than that defined by category k. At the first level of our hierarchical model, we assume a linear model for the logit of these cumulative probabilities,

$$\operatorname{logit}(q_{ijk}) = \mu_{ij} - \theta_k,$$

where the cut points $\theta_k$ are such that $\theta_k > \theta_{k-1}$, k = 1, …, 4, with $\theta_0 = -\infty$, and where $\mu_{ij}$ is the overall mean within each category. Because infection intensity varies with age, gender, and village, at the second level of our hierarchical model we set
![]() |
where i = 1, …, 6; a = 1, 2, 3; v = 1, 2, 3; and I{·} is an indicator function that takes the value 1 if the logical expression in braces is true, and zero otherwise. Age group 1 and Village 3 are taken as the baseline, giving $\beta_1 = 0$ and a zero effect for Village 3 ($\gamma_3 = 0$). As an example, the probability that an 8-year-old boy from Village 1 is at least lightly infected (k > 1) is $q_{ij1} = \operatorname{logit}^{-1}(\mu_{ij} - \theta_1)$, where $\mu_{ij}$ includes the Age group 2 effect $\beta_2$ and the Village 1 effect $\gamma_1$.
Incorporating the misclassification error.
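Let $y_{ij}$ denote the recorded (observed) egg category of individual j in stratum i, and let Sp and Se denote the specificity and sensitivity of the Kato-Katz test, respectively. Modeling misclassification between categories 1 and 2 only, and writing $\pi_{ijk}$ for the probability that the true category is k, we have

$$\Pr(y_{ij} = 1) = \mathrm{Sp}\,\pi_{ij1} + (1 - \mathrm{Se})\,\pi_{ij2},$$

$$\Pr(y_{ij} = 2) = (1 - \mathrm{Sp})\,\pi_{ij1} + \mathrm{Se}\,\pi_{ij2},$$

$$\Pr(y_{ij} = 3) = \pi_{ij3},$$

$$\Pr(y_{ij} = 4) = \pi_{ij4}.$$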
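The misclassification model can be applied directly to a true intensity profile to obtain the expected distribution of recorded categories. A minimal sketch; the function name and the true profile are hypothetical, and Se = 0.55 and Sp = 0.95 are the main prior means.

```python
def observed_profile(true_probs, se: float, sp: float):
    """Expected distribution of recorded categories given the true
    category probabilities (not infected, light, moderate, heavy),
    when misclassification occurs only between the first two categories."""
    p1, p2, p3, p4 = true_probs
    return [
        sp * p1 + (1.0 - se) * p2,   # recorded as not infected
        (1.0 - sp) * p1 + se * p2,   # recorded as lightly infected
        p3,                          # moderate: assumed correctly classified
        p4,                          # heavy: assumed correctly classified
    ]


# Hypothetical true profile, with Se and Sp at the main prior means.
true_profile = [0.60, 0.25, 0.12, 0.03]
recorded = observed_profile(true_profile, se=0.55, sp=0.95)
print([round(p, 4) for p in recorded])
```

Here the recorded proportion of light infections falls from 0.25 to about 0.1675, while the total probability is preserved.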
Relaxing the proportional odds assumption.
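One way to incorporate non-proportional odds is to allow the cut points ($\theta_k$) to vary with age. Specifically, we allow independent cut points $\theta_k^*$ (k = 1, 2, 3) for Age group 1 (0–6 years), where $\theta_2^* = \theta_2$ to ensure the model is identifiable. Consider individuals L and M, with L in Age group 1 and M in an older age group. Under our model, the ratio of their respective odds of being in an egg category greater than k is now a function of k, and so varies across cumulative categories:

$$\frac{q_{Lk}/(1 - q_{Lk})}{q_{Mk}/(1 - q_{Mk})} = \exp\{(\mu_L - \theta_k^*) - (\mu_M - \theta_k)\} = \exp\{\mu_L - \mu_M - (\theta_k^* - \theta_k)\}.$$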
Prior distributions.
As discussed in our "Materials and Methods" section, informative prior distributions are specified for the specificity (Sp) and sensitivity (Se), reflecting the available a priori knowledge.9 In particular, we used
![]() |
![]() |
As these priors will not be substantially updated by our data, we assess the sensitivity of our results to alternative prior hypotheses reflecting more optimistic and more pessimistic views of the performance of the Kato-Katz test.23 Specifically, the optimistic priors (specificity Sp ~ Beta[100,1], sensitivity Se ~ Beta[55,30]) reflect a view that the sensitivity has mean 0.65 and the specificity has mean 0.99. In contrast, the pessimistic priors (Sp ~ Beta[100,14], Se ~ Beta[42,51]) assume that the sensitivity has mean 0.45 and the specificity has mean 0.86. These priors are displayed in Figure 1.
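The Beta parameters above can be compared with the stated means and 99% intervals in a few lines. This sketch computes the analytic mean a/(a + b) and a Monte Carlo approximation to the central 99% interval using Python's `random.betavariate`; the number of draws and the seed are arbitrary choices.

```python
import random


def beta_summary(a: float, b: float, draws: int = 50_000, seed: int = 1):
    """Analytic mean and Monte Carlo central 99% interval of Beta(a, b)."""
    rng = random.Random(seed)
    sample = sorted(rng.betavariate(a, b) for _ in range(draws))
    lower = sample[int(0.005 * draws)]
    upper = sample[int(0.995 * draws) - 1]
    return a / (a + b), lower, upper


priors = {
    "optimistic specificity, Beta[100,1]": (100, 1),
    "optimistic sensitivity, Beta[55,30]": (55, 30),
    "pessimistic specificity, Beta[100,14]": (100, 14),
    "pessimistic sensitivity, Beta[42,51]": (42, 51),
}
for label, (a, b) in priors.items():
    mean, lower, upper = beta_summary(a, b)
    print(f"{label}: mean {mean:.3f}, 99% interval ({lower:.2f}, {upper:.2f})")
```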
To complete the model specification, we assume diffuse $N(0, 10^4)$ hyper-priors for each of the remaining model unknowns, including $\beta_1$, $\beta_2$ and the village effects $\gamma_1$ and $\gamma_2$. The cut-points are assumed a priori to be uniformly distributed over a wide range, bounded above or below (as appropriate) by zero to ensure the strict ordering $\theta_1 < \theta_2 < \theta_3$: $\theta_1, \theta_1^* \sim U[-10, 0]$ and $\theta_3, \theta_3^* \sim U[0, 10]$.
Estimation. Inferences are based on the joint posterior distribution of the model unknowns,
![]() |
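where $y = \{y_{ij}\}$, $\pi = \{\pi_{ijk}\}$, $\theta = (\theta_1, \theta_3, \theta_1^*, \theta_3^*)$, $\beta = (\beta_1, \beta_2)$, $\gamma = (\gamma_1, \gamma_2)$, and the likelihood function $p(y \mid \pi, \mathrm{Sp}, \mathrm{Se})$, with Sp and Se the specificity and sensitivity of the Kato-Katz test, is given by

$$p(y \mid \pi, \mathrm{Sp}, \mathrm{Se}) = \prod_{i}\prod_{j}\prod_{k=1}^{4} \Pr(y_{ij} = k \mid \pi_{ij}, \mathrm{Sp}, \mathrm{Se})^{I\{y_{ij} = k\}},$$

where the recorded-category probabilities $\Pr(y_{ij} = k \mid \pi_{ij}, \mathrm{Sp}, \mathrm{Se})$ are those given by the misclassification model above.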
The resulting posterior distribution is analytically intractable and so we use Markov chain Monte Carlo (MCMC) methods, specifically the Gibbs sampler, to obtain samples from the marginal posteriors of the parameters.
Prediction. The primary aim of this analysis is to provide reliable and robust prediction of patterns of intensity by age and gender, adjusted for known misclassification error and the variability between villages. One way to achieve this, and the approach taken in the current paper, is to empirically "mix" over villages when predicting from the model. Thus, at each iteration of the MCMC sampler, we draw village v = 1, 2, or 3 with equal probability and predict the intensity profile for a given age/gender stratum conditional on this drawn value of v. Averaging over iterations within a stratum provides the required predictions of the intensity profile for that stratum. This appropriately incorporates the between-village variability in infection intensity (effectively, we are integrating over the effect of village) without having to assume a specific distributional form for the between-village heterogeneity. If information were available on a greater number of villages, an alternative approach would be to assume a random village effect with, say, a normal random effects distribution, and at each iteration to draw a new village effect from this distribution. Again, this elaboration could easily be incorporated within the Bayesian estimation framework. Such predictions also allow us to compare the data the model predicts with the data that actually occurred. If our model fits well, these two data sets (i.e., the real data and predictions of future data from our model) should not be too different.
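The village-mixing step can be sketched as follows. This is a minimal illustration, assuming the MCMC output for one age/gender stratum is available as one set of village-specific intensity profiles per iteration; `fake_draws` generates hypothetical stand-in output rather than real posterior samples.

```python
import random


def mixed_profile(draws, rng):
    """Average intensity profiles over MCMC iterations, drawing one village
    with equal probability at each iteration.

    `draws` is a list with one dict per iteration, mapping
    village -> list of category probabilities for a single stratum.
    """
    totals = None
    for iteration in draws:
        village = rng.choice(sorted(iteration))  # equal-probability draw
        profile = iteration[village]
        if totals is None:
            totals = [0.0] * len(profile)
        totals = [t + p for t, p in zip(totals, profile)]
    return [t / len(draws) for t in totals]


def fake_draws(n_iter, rng):
    """Hypothetical stand-in for MCMC output: village-specific profiles
    with small random perturbations, renormalized to sum to one."""
    base = {1: [0.50, 0.30, 0.15, 0.05],
            2: [0.70, 0.20, 0.08, 0.02],
            3: [0.60, 0.25, 0.12, 0.03]}
    draws = []
    for _ in range(n_iter):
        iteration = {}
        for village, probs in base.items():
            noisy = [max(p + rng.gauss(0.0, 0.01), 1e-6) for p in probs]
            total = sum(noisy)
            iteration[village] = [p / total for p in noisy]
        draws.append(iteration)
    return draws


rng = random.Random(42)
draws = fake_draws(2000, rng)
profile = mixed_profile(draws, rng)
print([round(p, 3) for p in profile])
```

Because a village is drawn with equal probability at each iteration, the averaged profile weights the three villages equally, without assuming a distribution for the village effects.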
Received April 13, 2004. Accepted for publication July 21, 2004.
Financial support: This project was funded by the NIH/NSF Ecology of Infectious Diseases program, NIH Grant R01 TW01582. Hélène Carabin would like to thank the Wellcome Trust for support during the first part of this project. Lawrence Joseph is supported by a Senior Scientist Award from the Canadian Institutes of Health Research.
Authors' addresses: Clare M. Marshall, Department of Epidemiology and Public Health, Division of Primary Health Care and Public Health, Faculty of Medicine, Imperial College, St Mary's Campus, Norfolk Place, London, W2 1PG, UK, E-mail: clare.marshall{at}imperial.ac.uk. Hélène Carabin, Department of Biostatistics and Epidemiology, College of Public Health, University of Oklahoma Health Sciences Center, Room 303, 801 NE 13th Street, Oklahoma City, Oklahoma 73116, Telephone: (1) 405-271-2229 ext. 48083, Fax: (1) 405-271-2068, E-mail: helene-carabin{at}ouhsc.edu. Lawrence Joseph, Department of Epidemiology and Biostatistics, McGill University, and Division of Clinical Epidemiology, Montreal General Hospital, Livingston Hall, 10th Floor, 1650 Cedar Avenue, Montreal, Quebec, Canada, H3G 1A4, Telephone: (1) 514-934-1934 ext. 44713, Fax: (1) 514-934-8293, E-mail: lawrence.joseph{at}mcgill.ca. Steven Riley, Department of Infectious Disease Epidemiology, Division of Primary Health Care and Public Health, Faculty of Medicine, Imperial College, St Mary's Campus, Norfolk Place, London, W2 1PG, UK, Telephone: (44) 20-7594-3288, Fax: (44) 20-7262-3495, E-mail: s.riley{at}imperial.ac.uk. Remigio Olveda, Research Institute for Tropical Medicine, Department of Health Compound, FILINVEST Corporate City, Alabang, Muntinlupa City, 1781 Philippines, Telephone: (632) 809-7599, Fax: (632) 842-2245, E-mail: r.olveda{at}ritm.gov.ph. Stephen T. McGarvey, International Health Institute and Department of Community Health, Brown University, 171 Meeting Street, Box G-B495, Providence, RI 02912, Telephone: (1) 401-863-1354, Fax: (1) 401-863-1243, E-mail: Stephen_McGarvey{at}Brown.edu.
Reprint requests: Hélène Carabin, Department of Biostatistics and Epidemiology, College of Public Health, University of Oklahoma Health Sciences Center, Room 303, 801 NE 13th Street, Oklahoma City, Oklahoma 73116, Telephone: (1) 405-271-2229 ext. 48083, Fax: (1) 405-271-2068, E-mail: helene-carabin{at}ouhsc.edu.
REFERENCES