AJTMH Tropical Medicine and Hygiene News

Am. J. Trop. Med. Hyg., 74(5), 2006, pp. 779-785
Copyright © 2006 by The American Society of Tropical Medicine and Hygiene


IMPORTANT EXPERIMENTAL PARAMETERS FOR DETERMINING INFECTION RATES IN ARTHROPOD VECTORS USING POOL SCREENING APPROACHES

CHARLES R. KATHOLI AND THOMAS R. UNNASCH*
Department of Biostatistics; Division of Geographic Medicine, University of Alabama at Birmingham, Birmingham, Alabama


ABSTRACT
Measuring transmission of a vector-borne infection is essential to understanding infection dynamics. When infection prevalence in the vector population is low, transmission is often measured by pool screening (also referred to as group testing). Several investigators have developed statistical methods to recover infection prevalence estimates from pool screen data. These are based on models that contain certain assumptions, and a pool screening approach must be designed to take these into account if accurate estimates of infection prevalence are to be obtained. Here we describe these assumptions and discuss appropriate sampling protocols. The sources of error inherent in pool screening are described, and we show that, under most conditions in which one would want to use group testing, most of the error results from sampling and not the pooling process. Issues involved in developing a sampling protocol, including the total number of insects to be screened and optimal pool size, are explored. The meaning of confidence intervals associated with prevalence estimates and the appropriate interpretation of these intervals are discussed.


INTRODUCTION
The level of transmission of any arthropod-borne infection is an important variable for studying many factors of the biology of the infectious organism (e.g., for determining the most efficient vector species). Monitoring of infection in the vector population is also an essential tool for surveillance of these infections and for measuring the success of control programs. When transmission is intense, the rate of infection in a vector population is usually high, and one can obtain an estimate of the prevalence of infection in the vector by individually examining a relatively small number of insects. However, when the prevalence of infection is low (such as at the start of an outbreak or after a successful control program), large numbers of insects need to be examined to obtain an accurate estimate of infection levels, and examining individual insects becomes very inefficient. The rapid development in recent years of assays applicable to screening "pools" or "clusters" of insects now makes it possible to efficiently estimate the infection potential of diseases spread by a vector species by screening pools of insects. However, pool screening is complicated by the fact that it is generally impossible to determine if a positive pool contained a single infected insect or more than one infected insect. The simplest method for dealing with this complication is the calculation of the minimum infection rate (MIR), which assumes that a positive pool contains only a single infected insect.1 However, this obviously may underestimate the actual level of infection. Thus, a more accurate method for calculating prevalence of infection from pool screening is desirable.2

Several researchers have investigated methods for estimating prevalence of infection from data collected from screening pools of insects. A number of them use the alternative term "group testing." Chiang and Reeves3 considered the maximum likelihood estimator (MLE) for the case of equal pool sizes. They also presented graphs giving exact confidence intervals (CIs) for selected pool sizes. Thompson4 studied the properties of the MLE for the case of equal pool sizes and suggested an optimal choice of pool size based on minimization of the mean square error. Burrows5 produced a bias-reduced estimator and explored its properties. Katholi and others6 also considered the MLE and in addition gave formulas for the calculation of exact CIs suitable for hand calculation given only a table of the F-distribution. Barker7 extended the study of the MLE to the case where the sizes of the samples tested are no longer equal. In addition, she produced a simple expansion for the MLE suitable for hand calculation. She also studied moment estimators, a Bayesian approach to the problem, and a bias-reduced estimator for the equal pool size case that is based on the jackknife method and differs from the one given by Burrows. Finally, she considered several different CIs based on classic methods and the Bayesian approach. Hepworth8 produced similar results for clusters of pools with equal sample sizes within the clusters. He also produced exact CIs that are different from those produced by Barker.7

Each of these approaches assumes that a collection of samples (pools) is gathered and that each pool is tested with the assay to see whether it is positive. Point estimates and CIs are then calculated from the total number of pools screened, the number of positive pools observed, and the pool sizes.

Pool screening is now commonly used as a tool to monitor both parasitic9–15 and viral16–19 arthropod-borne infections. However, it is not often appreciated that each of the methods for calculating the prevalence of infection in a vector population from pool screen data is based on a model that is in turn built on certain underlying assumptions concerning the methods used for collecting the samples. Furthermore, it is often unappreciated that deriving estimates of infection prevalence from a sampling procedure (whether it be by screening individual insects collected in a trap or by screening pools) is subject to errors introduced by the sampling process and that this source of error is influenced both by the number of insects examined and by the methods used to examine them. This can lead to invalid estimates of infection rates in the vector population, because the underlying assumptions made by the statistical models may be violated. The overall goal of this study is to clarify these assumptions and indicate how different sampling protocols may be in concordance with, or in violation of, these assumptions. First, we discuss the assumptions that underlie the methods that have been developed for calculating infection rates based on the results obtained by pool screening. Second, we discuss the consequences of sampling pools on the estimates of infection rates, and in particular, how sample numbers and pool sizes affect the precision of these estimates. Finally, we discuss the correct interpretation of the CIs that accompany estimates of infection prevalence obtained from these methods.


RESULTS AND DISCUSSION
Statistical models and appropriate sampling methods. The term Bernoulli trial is used to describe any test situation with two mutually exclusive outcomes (e.g., positive or negative, true or false). Screening with assays that are commonly used in pool screening (e.g., conventional polymerase chain reaction [PCR], enzyme-linked immunosorbent assay [ELISA], or even dissection) clearly falls into this category. We can identify three different approaches that may be applied to screening for a pathogen using such assays. First, one can screen each sample individually for the presence of the pathogen. Second, one can screen pools of equal size (e.g., pools where each pool contains exactly 50 individuals). Finally, one can screen pools where each pool contains a different number of individuals. Equal pool sizes may be commonly found in situations where one is screening pools derived from samples collected from individual humans, whereas unequal pools are most likely to occur in most experimental situations involving vector-borne pathogens, where the number of insects collected may vary from time interval to time interval or from trap to trap.

If individual insects are tested, the estimate of the probability that a given insect is infected is simply the ratio of the number of positive insects observed divided by the total number of insects examined. The underlying probability model for this situation is the binomial distribution, and CIs for the probability of infection are calculated from this by standard methods.20

When pools are sampled, the probability that a pool is not infected is $(1 - p)^n$, where p is the probability that any given individual insect is infected, and n is the pool size. The probability model for the number of positive pools observed is again a binomial distribution. Again, standard methods are used to calculate point estimates and CIs. When the pools are not all the same size, denote the size of the ith pool by $n_i$; the probability that the pool is not infected is again $(1 - p)^{n_i}$. The distribution of the number of positive pools is no longer binomial, however, and the calculation of point estimators and CIs requires considerable computation.7
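As a minimal sketch of the estimators discussed so far, the Python below computes the MIR mentioned in the Introduction, the closed-form MLE for equal pools (the expression stated in the Appendix), and a numerical MLE for unequal pools. The function names and the bisection solver for the unequal-pool likelihood equation are illustrative assumptions; they are not the algorithms used in the cited programs.

```python
from math import exp, expm1, log1p

def mir(pool_sizes, results):
    """Minimum infection rate: positive pools divided by total insects screened."""
    return sum(results) / sum(pool_sizes)

def mle_equal_pools(t, m, k):
    """Closed-form MLE for m pools of equal size k, of which t tested positive."""
    return 1.0 - (1.0 - t / m) ** (1.0 / k)

def mle_unequal_pools(pool_sizes, results, iters=200):
    """Numerical MLE for pools of arbitrary size (results[i] is 1 if pool i is positive).

    With q = 1 - p, the likelihood equation can be written as
        sum over positive pools of n*q^n / (1 - q^n)  -  sum over negative pools of n  =  0.
    The left side decreases as p increases, so bisection finds the unique root.
    """
    pos = [n for n, y in zip(pool_sizes, results) if y]
    neg = [n for n, y in zip(pool_sizes, results) if not y]
    if not pos:
        return 0.0   # no positive pools: the estimate is zero
    if not neg:
        return 1.0   # every pool positive: the estimate is one

    def score(p):
        log_q = log1p(-p)
        s = 0.0
        for n in pos:
            # q^n / (1 - q^n), written with expm1 for accuracy at small p
            s += n * exp(n * log_q) / (-expm1(n * log_q))
        return s - sum(neg)

    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For equal pool sizes the numerical solution should agree with the closed form, which gives a simple check on the solver. Because the MIR counts each positive pool as a single infected insect, it cannot exceed the MLE for the same data.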

The calculation methods using either equal or unequal pools assume that the samples are collected from an essentially infinite population and that the methods used in the sampling draw a random sample from this population. In these situations, the model assumes that the sample that is collected is assayed in its entirety. As shown below, the larger the sample (i.e., the more individual insects examined), the more accurate the estimate of the prevalence of infection will be. Thus, when one plans to use a pool screening approach to screen collected vectors, it is best to collect and screen as many individual insects as possible, given the limitations imposed by time and cost.

It is also possible to take another approach to screening equal sized pools that in some cases may be more efficient than screening the entire collection. Here the investigator collects and screens pools of equal size until R positive pools, each containing K insects, are obtained. Data collected in this way may be used to provide an estimate of the infection rate in the vector population. However, it is important to note that the assumptions underlying this method are different from those of the model described above. In this case, the probability model is based on the total number of trials required to find R positive pools and the appropriate probability model is no longer the binomial; it is the negative binomial. Therefore, the methods used to calculate the point estimates of infection rate and the associated CIs using this model will not be the same as those described above, and the values for the estimate of prevalence of infection will also differ somewhat from those obtained when one has screened an entire collection of insects. It is also important to note that this negative binomial model will not support the use of unequal pool sizes. A technical report describing this method and the properties of the estimates obtained is currently in preparation by the authors. In summary, one can calculate the prevalence of infection from screens involving equal or unequal pools when screening an entire collection or by adopting a sequential collection and screening protocol if one has pools of equal size. However, it is imperative that care be taken to ensure that the collection protocol does not violate the underlying assumptions of the probability model that was used to construct the computational algorithm that one plans to use to analyze the resulting data. The most important of these assumptions when one is using the methods developed by Katholi and others,6 Hepworth,8 or Barker7 is that the collection screened must represent a truly random sample of an essentially infinite population.

What if one actually collects a number of insects that is greater than the laboratory is capable of screening? For example, if one faces a trap in which 1,500 mosquitoes have been collected, one might be tempted to just pull 500 of these from the trap to test. As described in detail in the appendix, such sub-sampling is allowable as long as one is sampling from an essentially infinite population, although at first glance the situation seems otherwise. In particular, the contents of each trap can be viewed as analogous to a bowl containing balls of two different colors: say red and green. A sub-sample is drawn from this bowl, and the question of interest is, "are there any red balls in the sub-sample?" A "red ball" detection assay is performed on the sub-sample and returns a yes or no answer. Sampling from a finite population containing only two kinds of objects follows the hypergeometric distribution. When this is done, the usual assumption is that the size of the overall population is known and that the proportion of red balls in this overall population is also known. The hypergeometric distribution gives the probabilities that a sub-sample of a given size contains X red balls. Because our test is a yes or no test, the probabilities of interest are the probability that our sample contains no red balls (i.e., is negative) and the complementary probability that there is at least one red ball. Again, our testing procedure is a Bernoulli trial; however, the probability of a negative sample conditional on there being M red balls in the original collection of size N is the complex quantity


$$\Pr(\text{negative sub-sample} \mid M) = \frac{\binom{N-M}{K}}{\binom{N}{K}}$$

where N is the number in the trap, K is the size of the sub-sample, and M is the unknown number of red balls in the trap. However, the random variable M has a binomial distribution with parameters N and p. Hence, as is shown in the Appendix, the unconditional probability that we observe a negative test is $(1 - p)^K$. This being the case, samples found and tested as sub-samples from traps or other collection methods fit the same probability model discussed by Chiang and Reeves,3 Thompson,4 Katholi and others,6 Barker,7 Burrows,5 and Hepworth.8 However, it is important to note that these models all assume that the sample (and any subsequent sub-sample) is drawn from an essentially infinite population of insects. Thus, it is necessary to ensure that any sampling scheme to be used in conjunction with pool screening will draw an overall sample that is small compared with the total insect population present.
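The sub-sampling argument can be checked numerically. The sketch below averages the hypergeometric probability of drawing no positives, C(N-M, K)/C(N, K), over a binomial number M of positives in the trap and compares the result with (1 - p)^K. The function names are hypothetical, and the log-gamma evaluation of the binomial weights is an implementation choice made only to avoid overflow for large traps.

```python
from math import comb, exp, lgamma, log, log1p

def binom_pmf(m, n, p):
    """P(M = m) for M ~ Binomial(n, p) with 0 < p < 1, evaluated in log space."""
    log_pmf = (lgamma(n + 1) - lgamma(m + 1) - lgamma(n - m + 1)
               + m * log(p) + (n - m) * log1p(-p))
    return exp(log_pmf)

def prob_negative_subsample(n_trap, k_sub, p):
    """Unconditional probability that a sub-sample of size k_sub has no positives.

    n_trap insects are in the trap; each is positive independently with
    probability p, so the number of positives in the trap is Binomial(n_trap, p).
    """
    denom = comb(n_trap, k_sub)
    total = 0.0
    # If more than n_trap - k_sub insects are positive, every sub-sample of
    # size k_sub must contain a positive, so those terms contribute zero.
    for m in range(n_trap - k_sub + 1):
        cond = comb(n_trap - m, k_sub) / denom   # hypergeometric P(X = 0 | M = m)
        total += cond * binom_pmf(m, n_trap, p)
    return total
```

For the trap example in the text (1,500 mosquitoes, 500 tested), `prob_negative_subsample(1500, 500, p)` should match `(1 - p) ** 500` to within floating-point error for any p strictly between 0 and 1.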

How many insects should be included in each pool, how many insects should be screened and how should pools be constructed? Statisticians generally consider two measures of merit when considering the quality of a point estimate. These are bias and mean square error (MSE). Because the maximum likelihood estimator of a parameter is a function of the observed random variable, it is also a random variable and hence has an expected value (mean) and a variance. Bias is a measure of the extent to which the expected value of the estimator fails to equal the true value of the parameter. Thus, if we denote the estimate of p by $\hat{p}$, $\mathrm{Bias}(\hat{p}) = E(\hat{p}) - p$. Similarly, the MSE is defined as $\mathrm{MSE}(\hat{p}) = E[(\hat{p} - p)^2]$. The ideal estimator will have zero bias and minimum MSE. Unfortunately, it is not always easy to achieve these goals. Many investigators argue that they are willing to give up some bias for a smaller MSE. It is important, therefore, when considering an estimator to have a good grasp of the bias and MSE for the estimator and how they are influenced by ancillary factors (e.g., pool size).

For the pool screening estimator (when pool sizes are equal), it is not difficult to show that it is biased when the pool size is greater than 1 and that the bias increases as the pool size increases (see Appendix for mathematical details). The bias is in an upward direction (i.e., on average, the point estimate is somewhat larger than the actual infection rate). The size of the bias is also influenced by the value of the unknown parameter, p. Again, it can be shown that the bias increases as p increases. However, the bias is quite small when the true value of p is small and becomes substantial only as p (or the infection rate) gets larger than, say, 0.1 (or 10%).
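The bias for equal pools can be examined directly, because the number of positive pools is binomial and the MLE is a known function of it. The sketch below sums the estimator 1 - (1 - T/M)^(1/K) over the binomial distribution of T to obtain its exact expected value and MSE. The function names are hypothetical, and the log-gamma form of the binomial probabilities is an implementation choice.

```python
from math import exp, lgamma, log, log1p

def binom_pmf(t, m, theta):
    """P(T = t) for T ~ Binomial(m, theta) with 0 < theta < 1, in log space."""
    log_pmf = (lgamma(m + 1) - lgamma(t + 1) - lgamma(m - t + 1)
               + t * log(theta) + (m - t) * log1p(-theta))
    return exp(log_pmf)

def mle_moments(p, k, m):
    """Return (expected value, bias, MSE) of the equal-pool MLE.

    p is the true infection rate, k the pool size, m the number of pools.
    """
    theta = 1.0 - (1.0 - p) ** k            # probability that a pool tests positive
    mean = 0.0
    mse = 0.0
    for t in range(m + 1):
        w = binom_pmf(t, m, theta)
        p_hat = 1.0 - (1.0 - t / m) ** (1.0 / k)   # equals 1 when every pool is positive
        mean += w * p_hat
        mse += w * (p_hat - p) ** 2
    return mean, mean - p, mse
```

Calling `mle_moments(1/400, 100, 40)` and `mle_moments(1/400, 1, 4000)` contrasts 40 pools of 100 insects with 4,000 individually screened insects at the same true infection rate; with a pool size of 1 the estimator reduces to the sample proportion and the computed bias should be zero up to rounding.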

The MSE also is influenced by both p and the pool size. Thompson4 showed that for any p, the MSE is approximately minimized when the pool size is taken to be


Formula

Tables 1–3 give some data for the influence of the pool size (K) on the bias and the expected values of the endpoints of a 95% CI for the cases when the infection rates are 1/50 (2.0%), 1/400 (0.25%) and 1/1,000 (0.1%), respectively. It is shown in the Appendix that the key factor in the ultimate size of the bias is a combination of the pool size (K) and the number of pools (M). As discussed below, the chemistry of the assay will generally be the limiting factor in determining the upper bound of the pool size. Thus, the main factor that one may influence in a sampling scheme will be the number of pools. As this number becomes large for any pool size (K) and any unknown p, the bias tends to zero at about the rate 1/M; the MSE gets small at the same rate as well.


TABLE 1
Effect of pool size on the point estimate of infection and associated confidence interval when screening 4,000 insects with an infection rate of 2.0% (1/50)
 

TABLE 2
Effect of pool size on the point estimate of infection and associated confidence interval when screening 4,000 insects with an infection rate of 0.25% (1/400)
 

TABLE 3
Effect of pool size on the point estimate of infection and associated confidence interval when screening 4,000 insects with an infection rate of 0.10% (1/1,000)
 
Tables 1–3 make some important points regarding the pool screening method. First of all, screening pools will result in a point estimate that is upwardly biased (i.e., somewhat larger than the actual value) for the reasons discussed above. However, this effect is generally negligible when infection rates are low. For example, as shown in Table 2, individually screening 4,000 insects in a population with a true infection rate of 1/400 will result exactly in the expected point estimate of 0.00250. However, dividing the population into 40 pools each containing 100 insects and screening the 40 pools will result in a point estimate of 0.00253. This estimate is just 1.4% above that obtained by examining all 4,000 insects individually.

Another point made by the calculations summarized in Tables 1–3 is that the range between the endpoints of a 95% CI is generally much larger than the degree of bias introduced into the point estimator by the pool screening process. For example, in Table 2, the range between the upper and lower bounds of a 95% CI is roughly 4-fold (0.001, 0.004). The 95% CI is also fairly stable, increasing by just 9% when comparing the interval obtained from screening pools of 100 insects to the interval obtained from screening a pool size of 1 (i.e., screening each insect individually; Table 2). Thus, the majority of the error surrounding the point estimate of the infection rate is associated with the random variation in the sampling process itself and not with the process of pool screening. That is, most of this error is also present when one screens individual insects (by dissection for example) and needs to be considered when calculating infection rates obtained by this method as well.

Given that the point estimates derived from pool screening will be biased to some extent, questions immediately arise concerning how large the pools should be. As pointed out by Hepworth,8 if the pools are too large, it is likely that they will all test positive, leading to the estimate $\hat{p} = 1$; if they are too small, we are likely to have a very large number of negative pools, and the costs associated with the testing will be unreasonable. The sensitivity and specificity of the assay will also impose limits on just how large the pools can be. The Bernoulli model assumes that the assay has perfect sensitivity for the size of the pool tested and that the specificity is also perfect; that is, there are never any false positives or false negatives. If this is not the case, the estimates produced will be biased upward by the presence of the false positives or downward by false negatives. In this regard, it is important to note that the specificity of the assay becomes particularly important when the number of true positives is likely to be very low.

Thus, the optimal pool size will be a function of both the statistical model and the chemistry of the assay used to screen the pools. For example, if one suspects that the infection rate is 1/400, the estimator of Thompson4 recommends pools of size 635. Chiang and Reeves3 suggest using a pool size of $K = \log(1/2)/\log(1 - p)$, which would give an equal chance of a positive or negative pool. For an infection rate of 1/400, this formula leads to a pool size of 277. Both of these are likely to be larger than can possibly be handled by the chemistry of most assays used in pool screening. For example, most PCR-based assays have been shown to be capable of handling a maximum of 100 insects in a pool while retaining an acceptable level of sensitivity and specificity.6,10–15,21 Similarly, the currently commercially available antibody-based tests may be used with a maximum pool size of 50 mosquitoes.22–26 When taken together, these findings suggest that the biochemical properties of the assay are more likely to impose limits on the optimal pool size than are the constraints imposed by the statistical analysis of the results. Therefore, when one is searching for a rare event (i.e., examining a population in which the prevalence of positive insects is low), it is probably best to use the largest pool size deemed possible by the chemical limitations of the assay. Furthermore, because the pool screening models all assume a sensitivity and specificity of 100%, it is imperative that the sensitivity and specificity of the assay be rigorously evaluated on synthetic pools containing different numbers of insects to ensure that the assay is performing optimally at any given pool size before that pool size is chosen for screening unknown samples.
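The Chiang and Reeves rule quoted above, K = log(1/2)/log(1 - p), is straightforward to evaluate. The sketch below computes it and then caps the result at an assay limit, in the spirit of the recommendation to use the largest pool size the assay chemistry allows. The function names and the `assay_max` parameter are hypothetical, and rounding up to a whole insect is an assumption.

```python
from math import ceil, log

def chiang_reeves_pool_size(p):
    """Pool size giving an equal chance of a positive or a negative pool."""
    return log(0.5) / log(1.0 - p)

def practical_pool_size(p, assay_max):
    """Statistical pool size, limited by the largest pool the assay can handle."""
    return min(ceil(chiang_reeves_pool_size(p)), assay_max)
```

With p = 1/400 the first function returns a value that rounds to 277, as stated in the text, while `practical_pool_size(1/400, 100)` returns the assay limit because the statistical optimum exceeds it.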

How many insects do I need to screen to get an accurate estimate of the prevalence of infection? Before beginning the process of estimating the occurrence rate of a rare event, it is instructive to consider exactly what you might expect while collecting specimens. Suppose that the proportion of the population being sampled that exhibit the characteristic of interest is p, 0 < p < 1 (i.e., the proportion of positive insects is between 0% and 100%). If we consider the random variable X, where X = number of specimens collected before the first one with the characteristic (i.e., a positive) is found, it is well known that X has a geometric distribution. That is,


$$\Pr(X = x) = p(1 - p)^{x - 1}, \qquad x = 1, 2, \ldots$$

From this it is easily shown that the probability that X is less than or equal to any particular value, say z, is equal to


$$\Pr(X \le z) = 1 - (1 - p)^{z}$$

This formula may be used to give estimates of how probable it would be that we will observe a specimen with the characteristic of interest (i.e., a positive insect) when we have examined z specimens. Tables 4–6 summarize the probability of detecting a positive insect at different prevalences of infection when one tests varying numbers of insects in pools of varying sizes. For example, if the event rate is 1/50, screening as few as 40 individual insects yields a better than 50% chance of finding a positive insect (Table 4). When the true prevalence is 1/400, screening 10 pools of size 25 yields a 47% chance of finding a positive pool, and when the pool size is 50, the chance is 71% after screening 10 pools (Tables 5 and 6). Tables 4–6 can be used to help plan the sampling process with respect to whether to screen individual insects or to screen pools, and if screening pools, how large the pools should be. Tables like these are easily calculated from the equation $\Pr(X \le z) = 1 - (1 - p)^{Kz}$, where K is the pool size and z is the number of pools.
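Planning tables of this kind can be generated with a few lines of code. The sketch below evaluates Pr(X ≤ z) = 1 - (1 - p)^(Kz) and also inverts it to give the number of pools needed for a chosen detection probability. The function names are hypothetical, and the inverse calculation is an added convenience that follows algebraically from the same equation.

```python
from math import ceil, expm1, log, log1p

def detection_probability(p, pool_size, n_pools):
    """Probability of at least one positive pool among n_pools pools of pool_size."""
    # 1 - (1 - p)^(K*z), written with expm1/log1p for accuracy at small p
    return -expm1(pool_size * n_pools * log1p(-p))

def pools_needed(p, pool_size, target):
    """Smallest number of pools whose detection probability reaches target (0 < target < 1)."""
    return ceil(log(1.0 - target) / (pool_size * log1p(-p)))

def detection_table(rates, pool_size, pool_counts):
    """Rows of (rate, [probability for each pool count]) for a fixed pool size."""
    return [(p, [detection_probability(p, pool_size, z) for z in pool_counts])
            for p in rates]
```

For instance, `detection_probability(1/400, 25, 10)` and `detection_probability(1/400, 50, 10)` correspond to the 47% and 71% figures quoted above, and setting `pool_size=1` covers individually screened insects.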


TABLE 4
Probability of detecting an infected insect when screening different numbers of individual insects collected from populations with infection rates varying from 1/1,000 to 1/5
 

TABLE 5
Probability of detecting an infected insect when screening different numbers of pools of size 25 collected from populations with infection rates varying from 1/1,000 to 1/10
 

TABLE 6
Probability of detecting an infected insect when screening different numbers of pools of size 50 collected from populations with infection rates varying from 1/10,000 to 1/100
 
In general, it can be shown that the precision of the point estimator increases as the number of pools tested increases for any fixed pool size. For example, Basanez and others27 have estimated that, when the infection rate in black flies is less than 0.2%, 6,000–13,000 individual parous flies will need to be screened to predict the true prevalence of infection with a precision of between 0.2 and 0.3. Thus, when the infection rate is low, it is important to examine a large number of specimens (at least several times that of the inverse of the expected infection rate) to produce meaningful data concerning infection rate. Of course, the ability to efficiently examine large numbers of individual insects is the reason for doing pool screening in the first place.

How should pools be constructed? As noted above, the sampling models on which all of the pool screening models are based assume that one is collecting a random sample from an essentially infinite population. Thus, when devising a sampling strategy, it is important to try to obtain a sample from the overall population that is as random as possible. For example, this means that traps (and therefore collections) should be distributed as randomly as possible throughout the study area to ensure that the insects collected are as representative as possible of the overall population.

A second assumption inherent in the pool screen model is that each infection in the insect population is independent of all others (i.e., that infected insects are distributed randomly throughout the overall insect population). However, it is known that this is often not the case, and temporal and spatial aggregation of infected insects is often observed.27 One way to correct for aggregation will be to combine collections from all traps into a single population and to create pools from the combined population. This approach will produce the most accurate estimates of the overall infection prevalence in the vector population. However, spatial and temporal data may be lost when using this approach.
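One way to picture the combined-collection approach is the sketch below, which merges the catches from all traps, shuffles them, and cuts the result into pools. The function name, the use of insect identifiers, and the handling of the final partial pool are illustrative assumptions rather than a prescribed laboratory protocol.

```python
import random

def make_random_pools(trap_collections, pool_size, seed=None):
    """Combine all trap collections and divide them at random into pools.

    trap_collections is a list of lists of insect identifiers, one list per trap.
    Every pool has pool_size insects except possibly the last, which holds the
    remainder (so the unequal-pool methods would be needed if it is screened).
    """
    combined = [insect for trap in trap_collections for insect in trap]
    rng = random.Random(seed)      # a fixed seed makes the allocation reproducible
    rng.shuffle(combined)
    return [combined[i:i + pool_size] for i in range(0, len(combined), pool_size)]
```

After shuffling, a pool no longer corresponds to a single trap or collection date, which is the loss of spatial and temporal information noted above.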

Finally, some mention should be made of the fact that sampling schemes are often devised to specifically target infected insects. For example, traps are often set in areas where evidence exists for ongoing transmission, or sampling strategies are used that specifically attract a sub-population of insects that are likely to be infected. These sampling strategies are therefore most effective in documenting ongoing transmission but may lead to an upwardly biased estimate of the actual prevalence of infection in the vector population.

What is the meaning of a CI in a pool screen calculation? CIs are often reported and interpreted as a measure of precision of a point estimate that has been calculated from the data. This view is inaccurate, so it is important to understand exactly what a CI is and what it says. First of all, the endpoints of a CI form a bivariate random variable that is a function of the original observations. The distribution of this bivariate random variable depends on the distribution of the original observations and the functions used to define the values. Usually this is very complex. The values computed for endpoints from the observed data are only one of possibly an infinite number of values that can result from the experiment depending on the distribution of the original data and the sample actually observed. A CI is said to "cover" the true value of the parameter of interest if the interval contains that value. Thus, for any CI derived from the experimental observations, you can never be absolutely sure that the CI actually contains the true infection rate. When we speak of a 95% CI, we mean that if we repeated the sampling process (i.e., the experiment) a large number of times (infinitely many) and calculated the interval, the proportion of times that our computed CI would cover the true value is 0.95. To put this another way, an investigator who plans to use an algorithm that produces a 95% CI knows before gathering the data that the odds are 19:1 that the algorithm, when applied to those data, will produce an interval that covers the true value. The investigator does not know, however, whether the interval actually calculated from the data covers the true value, because there is still a 1/20 chance that the true value will fall outside of the calculated interval.

Many CI procedures have coverage probabilities that vary over the range of the parameter of interest (in this case p, the rate of the event). The CI is called exact if the smallest of these coverage probabilities is at the desired level. Hence, the CI calculated will often be too conservative (i.e., have a coverage > 95%) but will never yield an interval that is too short. Clopper-Pearson intervals were included in a widely used computer program to calculate prevalence of infection from equal size pools (Pool screen v1.0).6 Barker7 obtained equivalent intervals for unequal pool sizes. These CIs are exact. However, they tend to be quite conservative for small values of p (i.e., when the infection rate is low). It should be noted that this dependence of the coverage probability of the CI for a parameter on the parameter is a feature in discrete distributions.
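The coverage behaviour described here can be computed exactly for equal pools, because the number of positive pools takes only M + 1 values. The sketch below finds Clopper-Pearson limits for the pool-positive probability by bisection, maps them to limits for p through p = 1 - (1 - θ)^(1/K), and sums the probabilities of the outcomes whose interval contains the true p. The function names, the bisection solver, and the assumption that the interval for p is obtained by this transformation are illustrative; they are not taken from the Pool screen program.

```python
from math import exp, lgamma, log, log1p

def binom_pmf(t, m, theta):
    """P(T = t) for T ~ Binomial(m, theta), including the boundary cases."""
    if theta <= 0.0:
        return 1.0 if t == 0 else 0.0
    if theta >= 1.0:
        return 1.0 if t == m else 0.0
    return exp(lgamma(m + 1) - lgamma(t + 1) - lgamma(m - t + 1)
               + t * log(theta) + (m - t) * log1p(-theta))

def binom_cdf(t, m, theta):
    return sum(binom_pmf(j, m, theta) for j in range(t + 1))

def clopper_pearson(t, m, alpha=0.05, iters=60):
    """Exact two-sided limits for a binomial proportion, found by bisection."""
    lower, upper = 0.0, 1.0
    if t > 0:
        lo, hi = 0.0, 1.0
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            # P(T >= t) increases with theta; locate where it equals alpha/2
            if 1.0 - binom_cdf(t - 1, m, mid) < alpha / 2:
                lo = mid
            else:
                hi = mid
        lower = lo          # keep the conservative (smaller) end of the bracket
    if t < m:
        lo, hi = 0.0, 1.0
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            # P(T <= t) decreases with theta; locate where it equals alpha/2
            if binom_cdf(t, m, mid) > alpha / 2:
                lo = mid
            else:
                hi = mid
        upper = hi          # keep the conservative (larger) end of the bracket
    return lower, upper

def pool_ci(t, m, k, alpha=0.05):
    """Interval for the infection rate p from t positive pools out of m of size k."""
    lo, hi = clopper_pearson(t, m, alpha)
    return 1.0 - (1.0 - lo) ** (1.0 / k), 1.0 - (1.0 - hi) ** (1.0 / k)

def coverage(p, m, k, alpha=0.05):
    """Exact probability that the interval covers the true infection rate p."""
    theta = 1.0 - (1.0 - p) ** k
    total = 0.0
    for t in range(m + 1):
        lo, hi = pool_ci(t, m, k, alpha)
        if lo <= p <= hi:
            total += binom_pmf(t, m, theta)
    return total
```

Evaluating `coverage(p, m, k)` over a grid of p values shows how the coverage probability moves with the parameter while staying at or above the nominal level, which is the conservatism described above.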

One final observation needs to be made here as well. The failure of a specific CI to contain a pre-specified value of the parameter, p, is not the same as a statistical test of a hypothesis in general. Some CIs are found by "unwinding" a test; however, this is not always the case. Thus, the CIs calculated by the methods described above should not be used as a stand-alone statistical method for testing hypotheses regarding the prevalence of infection in a vector population (e.g., have we succeeded in lowering the transmission rate below a defined cut-off with our control program?). Tests for specific hypotheses can be developed without too much effort, particularly in the case of equal pool sizes. For example, it is easy to show (see Appendix at www.ajtmh.org) that a test of whether the observed prevalence of infection is statistically equal to or lower than a defined cut-off value (i.e., $H_0: p \le p_0$ versus $H_a: p > p_0$), based on the observed number of positive pools, T, rejects $H_0$ whenever $T > t_\alpha$, where $t_\alpha$ is the critical point for a level $\alpha$ test. To clarify, recall that the usual test procedure for a statistical test (e.g., a t test) involves the calculation of some test statistic. An appropriate table is then consulted to obtain the "critical value," and the null hypothesis is rejected at the predetermined level (e.g., $\alpha = 0.05$) if the test statistic exceeds the critical value. The value $\alpha$ represents what statisticians call the level of type I error; that is, if the test were conducted a large number of times, the type I error represents the proportion of times we would incorrectly reject the null hypothesis when it is true. For the pool screen model the value of the critical point depends on the pool size (K), the number of pools (M), the value of $p_0$, and the choice of $\alpha$ and can be calculated from the formula


Formula

This test is the uniformly most powerful test for this hypothesis.28 Because of the dependence of the critical values on M, K, and p0, it is impossible to calculate a table of critical values a priori. Although this formula looks very complex, the computations can easily be carried out using the probability calculator included in almost any statistical package.
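
As a rough illustration of how such a critical value might be computed without a statistical package, the Python sketch below searches for the smallest t whose upper binomial tail probability is at most α, following the definition given in the Appendix. It assumes equal pool sizes; the function names and the example values of M, K, and p0 are hypothetical and chosen only for demonstration.

```python
from math import comb


def upper_tail(t, m, theta):
    """P(T > t) for T ~ Binomial(m, theta)."""
    return sum(comb(m, j) * theta**j * (1.0 - theta) ** (m - j) for j in range(t + 1, m + 1))


def critical_value(m_pools, pool_size, p0, alpha=0.05):
    """Smallest t in {0, ..., M} with P_{p0}(T > t) <= alpha."""
    # Probability that a single pool tests positive when the prevalence is p0.
    theta0 = 1.0 - (1.0 - p0) ** pool_size
    for t in range(m_pools + 1):
        if upper_tail(t, m_pools, theta0) <= alpha:
            return t
    return m_pools  # not reached: P(T > M) = 0 <= alpha


if __name__ == "__main__":
    t_alpha = critical_value(m_pools=100, pool_size=50, p0=0.001)
    print("Reject H0 when the number of positive pools exceeds", t_alpha)
```

Because the search depends on M, K, and p0 together, it has to be rerun for each study design, which is the practical consequence of the point made above about a priori tables.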


APPENDIX
 
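Mathematical justification for sub-sampling. Suppose that the investigator has collected a group of experimental units of size N by some method. Assume that this is a sample from an effectively infinite population in which the probability that an individual experimental unit is positive is denoted by p. Suppose that we now draw a sample of size K from this group for testing. Suppose further that M units in the original collection are positive (in this subsection M denotes the number of positive units in the collection, not the number of pools). The conditional probability that the new sample will contain X = x infected units, given M, is

\[ P(X = x \mid M) = \frac{\binom{M}{x} \binom{N-M}{K-x}}{\binom{N}{K}} . \]

Thus, the conditional probability that the sub-sample of size K will contain no infected experimental units is

\[ P(X = 0 \mid M) = \frac{\binom{N-M}{K}}{\binom{N}{K}} . \]

From this conditional probability we obtain the unconditional probability that X = 0 by using the fact that M is a random variable with a Binomial(N, p) distribution. Thus, the unconditional probability is given by

\[ P(X = 0) = \sum_{M=0}^{N} \frac{\binom{N-M}{K}}{\binom{N}{K}} \binom{N}{M} p^{M} (1 - p)^{N-M} . \]

Note that when M = N − K + l, l = 1, 2, …, K, P(X = 0 | M) = 0. Hence, the unconditional probability is given by

\[ P(X = 0) = \sum_{M=0}^{N-K} \frac{\binom{N-M}{K}}{\binom{N}{K}} \binom{N}{M} p^{M} (1 - p)^{N-M} = (1 - p)^{K} \sum_{M=0}^{N-K} \binom{N-K}{M} p^{M} (1 - p)^{N-K-M} , \]

where the second equality uses the identity $\binom{N-M}{K}\binom{N}{M} / \binom{N}{K} = \binom{N-K}{M}$. From the binomial theorem we have that

\[ \sum_{M=0}^{N-K} \binom{N-K}{M} p^{M} (1 - p)^{N-K-M} = \bigl[p + (1 - p)\bigr]^{N-K} = 1 , \]

so the unconditional probability that the sub-sample contains no positive experimental units is $P(X = 0) = (1 - p)^{K}$. This establishes that the model used for the Pool screen programs is appropriate in the case of sub-sampling as well.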
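
The sub-sampling result can also be checked numerically. The short Python sketch below is an illustration only, with a hypothetical function name and example values: it sums the hypergeometric probability of drawing no positives over the binomial distribution of the number of positives in the collection and compares the total with (1 − p)^K.

```python
from math import comb


def prob_no_positive_in_subsample(n, k, p):
    """P(X = 0) for a sub-sample of size k drawn from a collection of size n."""
    total = 0.0
    for m in range(n + 1):
        # Hypergeometric P(X = 0 | M = m); comb() returns 0 when k > n - m.
        conditional = comb(n - m, k) / comb(n, k)
        # Binomial(n, p) weight for m positives in the collection.
        weight = comb(n, m) * p**m * (1.0 - p) ** (n - m)
        total += conditional * weight
    return total


if __name__ == "__main__":
    n, k, p = 60, 25, 0.01
    print(prob_no_positive_in_subsample(n, k, p), (1.0 - p) ** k)
```

The two printed values should agree to within floating-point rounding for any collection size N ≥ K.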

Statistical properties of the pool screen estimator. Testing pools gives rise to the MLE $\hat{p} = 1 - (1 - T/M)^{1/K}$, where K is the pool size, M is the number of pools tested, and T is the number of positive pools observed. If this expression is expanded in a Taylor series about E(T) and the expectation taken, we see that the dominant term in the expansion for the bias is given by the expression

\[ \operatorname{Bias}(\hat{p}) \approx \frac{(K - 1)(1 - p)}{2 K^{2} M} \cdot \frac{1 - (1 - p)^{K}}{(1 - p)^{K}} . \]

The next term in the expansion has multiplier $1/M^{2}$, and successive terms involve increasing powers of 1/M. Some simple hand calculations show that this first term accounts for most of the bias shown in Table 1 in the body of the paper. Some comments are in order concerning the size of the bias. The quantity $[1 - (1 - p)^{K}] / (1 - p)^{K}$ is easily shown to be a strictly monotone increasing function of K for any fixed p. Similarly, it is strictly monotone increasing as a function of p, 0 < p < 1, for any fixed K. Clearly, the leading bias term can become arbitrarily large as K increases for any fixed p and as p approaches 1 for any fixed K. On the other hand, because of the restrictions placed on the size of K by the chemistry of the assay, there is a practical upper bound on the size of K. Similarly, if the infection potential is moderate to large (say > 1 in 50), one would not do pool screening. In either case, however, the bias can be made acceptably small by increasing the number of pools, M. A similar approach shows that the leading term in the MSE is given by

\[ \operatorname{MSE}(\hat{p}) \approx \frac{(1 - p)^{2}}{K^{2} M} \cdot \frac{1 - (1 - p)^{K}}{(1 - p)^{K}} . \]

This also grows under the same circumstances discussed above, and the same restrictions apply. Thus, the MSE also becomes small as M becomes large. Note that this formula for the MSE is equal to the asymptotic variance of $\hat{p}$ as calculated from Fisher's information. Thus, both the bias and the MSE are negligible when the number of pools (M) grows large for any pool size (K).
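
To see how well the leading terms describe the estimator, the Python sketch below compares them with the exact bias and MSE obtained by summing over the binomial distribution of T. It is an illustration under assumptions: the leading-term expressions are the ones obtained from a second-order Taylor expansion of the MLE about E(T)/M, and the function names and example values are hypothetical.

```python
from math import comb


def mle(t, m, k):
    """Pool screen MLE of p from t positive pools out of m pools of size k."""
    return 1.0 - (1.0 - t / m) ** (1.0 / k)


def leading_bias(p, m, k):
    """Order 1/M term of the bias from the Taylor expansion."""
    q = (1.0 - p) ** k  # probability that a pool is negative
    return (k - 1) * (1.0 - p) / (2.0 * k**2 * m) * (1.0 - q) / q


def leading_mse(p, m, k):
    """Order 1/M term of the MSE (the asymptotic variance)."""
    q = (1.0 - p) ** k
    return (1.0 - p) ** 2 / (k**2 * m) * (1.0 - q) / q


def exact_bias_and_mse(p, m, k):
    """Exact bias and MSE by enumerating T ~ Binomial(m, 1 - (1 - p)**k)."""
    theta = 1.0 - (1.0 - p) ** k
    bias = 0.0
    mse = 0.0
    for t in range(m + 1):
        weight = comb(m, t) * theta**t * (1.0 - theta) ** (m - t)
        error = mle(t, m, k) - p
        bias += weight * error
        mse += weight * error * error
    return bias, mse


if __name__ == "__main__":
    p, m, k = 0.005, 200, 25
    bias, mse = exact_bias_and_mse(p, m, k)
    print("bias: exact", bias, "leading term", leading_bias(p, m, k))
    print("MSE:  exact", mse, "leading term", leading_mse(p, m, k))
```

Increasing m in the example while holding p and k fixed shows both quantities shrinking roughly in proportion to 1/M.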

Testing a simple hypothesis. A test of the hypothesis H0: p ≤ p0 versus Ha: p > p0 can be constructed when the pool sizes are equal using the statistic T, the number of positive pools observed among the M pools tested. For a given prevalence p, T has the distribution

\[ g(t \mid p) = \binom{M}{t} \bigl[1 - (1 - p)^{K}\bigr]^{t} \bigl[(1 - p)^{K}\bigr]^{M-t} , \qquad t = 0, 1, \dots, M . \]

It is easily shown that the family of distributions {g(t | p), 0 < p < 1} has monotone likelihood ratio and that the statistic T is sufficient for p. By the Karlin-Rubin theorem,28 the test that rejects when $T > t_{\mathrm{crit}}$ is uniformly most powerful. Because the distribution of T is discrete, $t_{\mathrm{crit}}$ is defined to be the minimum t ∈ {0, 1, …, M} such that $P_{p_0}(T > t) \le \alpha$. Critical values can be calculated using the following formula:

\[ t_{\mathrm{crit}} = \min\Bigl\{\, t \in \{0, 1, \dots, M\} : \sum_{j=t+1}^{M} \binom{M}{j} \bigl[1 - (1 - p_0)^{K}\bigr]^{j} \bigl[(1 - p_0)^{K}\bigr]^{M-j} \le \alpha \Bigr\} \]
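
Because T is discrete, the attained size of the test is usually below the nominal α, and the power depends on how far the true prevalence lies above p0. The Python sketch below illustrates both quantities for equal pool sizes; it is a minimal example, and the function names and the design values used are hypothetical.

```python
from math import comb


def tail_prob(t, m, k, p):
    """P_p(T > t) when T ~ Binomial(m, 1 - (1 - p)**k)."""
    theta = 1.0 - (1.0 - p) ** k
    return sum(comb(m, j) * theta**j * (1.0 - theta) ** (m - j) for j in range(t + 1, m + 1))


def t_crit(m, k, p0, alpha=0.05):
    """Smallest t with P_{p0}(T > t) <= alpha; t = m always qualifies."""
    return next(t for t in range(m + 1) if tail_prob(t, m, k, p0) <= alpha)


def power(p, m, k, p0, alpha=0.05):
    """Probability of rejecting H0: p <= p0 when the true prevalence is p."""
    return tail_prob(t_crit(m, k, p0, alpha), m, k, p)


if __name__ == "__main__":
    m, k, p0 = 100, 50, 0.001
    print("attained size:", power(p0, m, k, p0))
    for p in (0.002, 0.003, 0.005):
        print("power at p =", p, ":", power(p, m, k, p0))
```

Evaluating the power at p0 itself gives the attained size of the test, and evaluating it at larger values of p gives points on the power curve for the chosen M and K.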


Received August 19, 2005. Accepted for publication January 13, 2006.

Acknowledgments: The authors thank Drs. Naomi Lang-Unnasch and Eddie W. Cupp for critically reading the manuscript. We also thank the two anonymous reviewers of an earlier version of this manuscript for constructive comments. In particular, we thank the reviewer who pointed out a major conceptual error that we have corrected. We also acknowledge the role of the Onchocerciasis Control Program in the Americas and the former Onchocerciasis Control Programme in West Africa in supporting our work.

* Address correspondence to Thomas R. Unnasch, University of Alabama at Birmingham, Division of Geographic Medicine, BBRB 203, 1530 3rd Avenue South, Birmingham, AL 35294-2170. E-mail: trunnasch{at}geomed.dom.uab.edu

Authors’ addresses: Charles R. Katholi, PhD, University of Alabama at Birmingham, Department of Biostatistics, Ryals 317, 1665 University Blvd., Birmingham, AL 35294-0022, E-mail: ckatholi{at}uab.edu. Thomas R. Unnasch, University of Alabama at Birmingham, Division of Geographic Medicine, BBRB 203, 1530 3rd Avenue South, Birmingham, AL 35294-2170, E-mail: trunnasch{at}geomed.dom.uab.edu.


REFERENCES
 

  1. Nasci RS, Mitchell CJ, 1996. Arbovirus titer variation in field-collected mosquitoes. J Am Mosq Control Assoc 12: 167–171.
  2. Gu W, Lampman R, Novak RJ, 2003. Problems in estimating mosquito infection rates using minimum infection rate. J Med Entomol 40: 595–596.
  3. Chiang CL, Reeves WC, 1962. Statistical estimation of virus infection rates in mosquito vector populations. Am J Hyg 75: 377–391.
  4. Thompson KH, 1962. Estimation of the proportion of vectors in a natural population of insects. Biometrics 18: 568–578.
  5. Burrows PM, 1987. Improved estimation of pathogen transmission rates by group testing. Phytopathology 77: 363–365.
  6. Katholi CR, Toé L, Merriweather A, Unnasch TR, 1995. Determining the prevalence of Onchocerca volvulus infection in vector populations by PCR screening of pools of black flies. J Infect Dis 172: 1414–1417.
  7. Barker JT, 2000. Statistical estimators of infection potential based on PCR pool screening with unequal pool sizes. PhD thesis, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL.
  8. Hepworth G, 1996. Exact confidence intervals for proportions estimated by group testing. Biometrics 52: 1134–1146.
  9. Rodríguez-Pérez MA, Danis-Lozano R, Rodríguez MH, Unnasch TR, Bradley JE, 1999. Detection of Onchocerca volvulus infection in Simulium ochraceum sensu lato: Comparison of a PCR assay and fly dissection in a Mexican hypoendemic community. Parasitology 119: 613–619.
  10. Yaméogo L, Toé L, Hougard J-M, Boatin BA, Unnasch TR, 1999. Pool screen PCR for estimating the prevalence of Onchocerca volvulus infection in Simulium damnosum sensu lato: Results of a field trial in an area subject to successful vector control. Am J Trop Med Hyg 60: 124–128.
  11. Williams SA, Laney SJ, Bierwert LA, Saunders LJ, Boakye DA, Fischer P, Goodman D, Helmy H, Hoti SL, Vasuki V, Lammie PJ, Plichart C, Ramzy RM, Ottesen EA, 2002. Development and standardization of a rapid, PCR-based method for the detection of Wuchereria bancrofti in mosquitoes, for xeno-monitoring the human prevalence of Bancroftian filariasis. Ann Trop Med Parasitol 96: S41–S46.
  12. Goodman DS, Orelus JN, Roberts JM, Lammie PJ, Streit TG, 2003. PCR and mosquito dissection as tools to monitor filarial infection levels following mass treatment. Filaria J 2: 11.
  13. Guevara AG, Vieira JC, Lilley BG, López A, Vieira N, Rumbea J, Collins R, Unnasch TR, 2003. Entomological evaluation by pool screen polymerase chain reaction of Onchocerca volvulus transmission in Ecuador following mass Mectizan distribution. Am J Trop Med Hyg 68: 222–227.
  14. Helmy H, Fischer P, Farid HA, Bradley MH, Ramzy RM, 2004. Test strip detection of Wuchereria bancrofti amplified DNA in wild-caught Culex pipiens and estimation of infection rate by a pool screen algorithm. Trop Med Int Health 9: 158–163.
  15. Vasuki V, Hoti SL, Sadanandane C, Jambulingam P, 2003. A simple and rapid DNA extraction method for the detection of Wuchereria bancrofti infection in the vector mosquito, Culex quinquefasciatus by Ssp 1 PCR assay. Acta Trop 86: 109–114.
  16. Armstrong P, Borovsky D, Shope RE, Morris CD, Mitchell CJ, Karabatsos N, Komar N, Spielman A, 1995. Sensitive and specific colorimetric dot assay to detect eastern equine encephalomyelitis viral RNA in mosquitoes (Diptera: Culicidae) after polymerase chain reaction amplification. J Med Entomol 32: 42–52.
  17. Lanciotti RS, Kerst AJ, Nasci RS, Godsey MS, Mitchell CJ, Savage HM, Komar N, Panella NA, Allen BC, Volpe KE, Davis BS, Roehrig JT, 2000. Rapid detection of West Nile virus from human clinical specimens, field-collected mosquitoes, and avian samples by a Taqman reverse transcriptase-PCR assay. J Clin Microbiol 38: 4066–4071.
  18. Hadfield TL, Turell M, Dempsey MP, David J, Park EJ, 2001. Detection of West Nile virus in mosquitoes by RT-PCR. Mol Cell Probes 15: 147–150.
  19. White DJ, Kramer LD, Backenson PB, Lukacik G, Johnson G, Oliver J, Howard JJ, Means RG, Eidson M, Gotham I, Kulasekera V, Campbell S, 2001. Mosquito surveillance and polymerase chain reaction detection of West Nile virus, New York state. Emerg Infect Dis 7: 643.
  20. Blyth CR, Still HA, 1983. Binomial confidence intervals. J Am Stat Assoc 78: 108–116.
  21. Rodríguez-Pérez MA, Lilley BG, Domínguez-Vázquez A, Segura-Arenas R, Lizarazo-Ortega C, Mendoza-Herrera A, Reyes-Villanueva F, Unnasch TR, 2004. Polymerase chain reaction monitoring of transmission of Onchocerca volvulus in two endemic states in Mexico. Am J Trop Med Hyg 70: 38–45.
  22. Ryan JR, Dave K, Collins KM, Hochberg L, Sattabongkot J, Coleman RE, Dunton RF, Bangs MJ, Mbogo CM, Cooper RD, Schoeler GB, Rubio-Palis Y, Magris M, Romer LI, Padilla N, Quakyi IA, Bigoga J, Leke RG, Akinpelu O, Evans B, Walsey M, Patterson P, Wirtz RA, Chan AS, 2002. Extensive multiple test centre evaluation of the Vectest malaria antigen panel assay. Med Vet Entomol 16: 321–327.
  23. Nasci RS, Gottfried KL, Burkhalter KL, Ryan JR, Emmerich E, Dave K, 2003. Sensitivity of the Vectest antigen assay for Eastern Equine Encephalitis and Western Equine Encephalitis viruses. J Am Mosq Control Assoc 19: 440–444.
  24. Chiles RE, Green EN, Fang Y, Goddard L, Roth A, Reisen WK, Scott TW, 2004. Blinded laboratory comparison of the in situ enzyme immunoassay, the Vectest wicking assay, and a reverse transcription-polymerase chain reaction assay to detect mosquitoes infected with West Nile and St. Louis encephalitis viruses. J Med Entomol 41: 539–544.
  25. Santos-Ciminera PD, Achee NL, Quinnan GV Jr, Roberts DR, 2004. Use of polymerase chain reaction technique to confirm Vectest screening results in Plasmodium falciparum and Plasmodium vivax vk 210 laboratory-infected Anopheles stephensi mosquitoes. J Am Mosq Control Assoc 20: 265–271.
  26. Anonymous, 2005. Ramp West Nile virus test package insert. Burnaby, British Columbia, Canada: Response Biomedical Corporation.
  27. Basáñez MG, Rodríguez-Pérez MA, Reyes-Villanueva F, Collins RC, Rodríguez MH, 1998. Determination of sample sizes for the estimation of Onchocerca volvulus (Filarioidea: Onchocercidae) infection rates in biting populations of Simulium ochraceum s.l. (Diptera: Simuliidae) and its application to ivermectin control programs. J Med Entomol 35: 745–757.
  28. Casella G, Berger RL, 2002. Statistical Inference. Duxbury, Thomson Learning Group.




