Revista mexicana de ciencias agrícolas
versión impresa ISSN 2007-0934
Rev. Mex. Cienc. Agríc vol.15 no.1 Texcoco ene./feb. 2024 Epub 25-Abr-2024
https://doi.org/10.29312/remexca.v15i1.3618
Articles
Proposal to obtain the optimal sample size of pests with an excess of zeros
1Colegio de Postgraduados-Campus Veracruz. Carretera Xalapa-Veracruz km 88.5, Manlio F. Altamirano, Veracruz, México. CP. 91963.
2Colegio de Postgraduados-Campus Montecillo. Carretera México-Texcoco km 36.5, Montecillo, Texcoco, México. CP. 56230.
In sampling of pests with low densities, it is common to obtain a large number of zeros, which are difficult to handle because the Poisson and negative binomial probability distributions are not suitable for modeling them and equations to estimate the optimal sample size are not available. The objectives of this study were to model the excess of zeros by estimating the parameters of the zero-inflated Poisson and zero-inflated negative binomial distributions through the methods of moments and maximum likelihood, and to derive equations to calculate the optimal sample size. Systematic sampling was used to select 100 trees per grove of Río Red grapefruit (Citrus paradisi Macfad) at Finca Sayula, Veracruz, Mexico (latitude 19.20722, longitude -96.35194), from June to July 2021 and in January 2022. The number of leafminers (Phyllocnistis citrella Stainton) and aphids (Toxoptera citricida Kirkaldy) present in three leaves per shoot per tree, considered as a sample unit, was counted. Simulations were performed in RStudio with different proportions of zeros (0.1, 0.4, and 0.6) to compare the parameters obtained in the field using the methods of moments and maximum likelihood. Equations were derived to estimate the optimal sample size in studies of pests with low densities, based on the zero-inflated Poisson and zero-inflated negative binomial probability distributions. The method of moments yields optimal sample sizes smaller than those obtained by maximum likelihood because it distinguishes the origin of zero, so its use is recommended.
Keywords: sampling; zero-inflated negative binomial; zero-inflated Poisson
Introduction
In the population dynamics of pest organisms, count data reflect the presence and abundance of species in a fixed period of time (Hashim et al., 2021). Samples of pest populations commonly contain an excess of zeros because of the complex interactions between biotic and abiotic components, the inherent characteristics of pest species, spatial-temporal dependencies, unexplained environmental heterogeneity (Zou et al., 2021) and agroecological control techniques (Villanueva-Jiménez et al., 2017; García-González et al., 2018).
Studying and monitoring the periods in which pest counts show excess zeros is useful because it allows the early stages of pest invasion to be recognized and preventive management methods to be applied, such as those offered by precision agriculture (Jankielsohn, 2017; Clay et al., 2018), and combat tactics to be used before pests cause damage to crops. This would prevent the abusive use of organic-synthetic pesticides and thus also reduce damage to the environment (Shannon et al., 2018; Talaviya et al., 2020).
The excess of zeros is a theoretical and practical problem that arises when the high frequency of zeros alters the probabilities expected by the discrete variable distributions of Poisson and negative binomial (Yesilova et al., 2010; Hashim et al., 2021; Haslett et al., 2022) and no attention has been paid to the mechanisms that explain the origin of zero despite its impact on the estimation of population parameters in species of pest organisms (Haslett et al., 2022).
For the study of pest populations in agroecosystems, it is proposed to analyze the excess of zeros following Mullahy (1986) and Lambert (1992); that is, to recognize two possible origins of zero, distinguishing between the structural zero (plants without susceptible shoots for the establishment of a pest) and the non-structural zero (plants with susceptible shoots that are free of the pest), to model zero by its origin with binomial distributions (Lambert, 1992; Zou et al., 2021; Haslett et al., 2022) and, depending on the observed counts greater than zero, to study the effect of overdispersion (Hall, 2000; Cheung, 2002; Doyle, 2009).
In pest counts, the optimal sample size equations for the Poisson or negative binomial distribution are used on a recurring basis, but due to the excess of zeros, the estimated optimal sample sizes are so large as to be impractical (Southwood and Henderson, 2000); however, in integrated pest management, there are no equations that estimate the optimal sample size of zero-inflated distributions, nor proposals that consider the origin of zero.
Equations for estimating the optimal sample size, adapted from those of Karandinos (1976) to zero-inflated distributions, are proposed here. The objectives of the present research were to model the excess of zeros, to estimate the parameters of the zero-inflated Poisson and zero-inflated negative binomial distributions using the methods of moments and maximum likelihood, and to derive equations to calculate the optimal sample size.
Materials and methods
For the estimation of the optimal sample size, the excess of zeros was modeled; the parameters were determined by the methods of moments and maximum likelihood of the zero-inflated Poisson and zero-inflated negative binomial distributions and the equations for calculating the sample size were derived.
Modeling excess zeros
To model the excess of zeros, the absence of plant tissue that allows the pest to be housed was included as a cause of extra zeros. In this way, zero has two origins: the ‘structural zero’, when there is no susceptible tissue in the plant that can be occupied by the pest, and the ‘non-structural zero’, when there is adequate tissue in the plant, but it is not inhabited by the pest.
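With this definition, the frequency of structural zeros was modeled using a binomial distribution (Mullahy, 1986), where X is the number of structural zeros present in a sample of size n; therefore:

$$X \sim \mathrm{Binomial}(n, p)$$

Where: $p$ is the probability that a sample unit is a structural zero.

Thus, the probability function of the random variable X, the number of structural zeros in the sample of size n, is given by:

$$P(X = x) = \binom{n}{x}\, p^{x} (1 - p)^{n - x}, \qquad x = 0, 1, \ldots, n \qquad (1)$$

If $x$ structural zeros are observed in the sample, $p$ is estimated by:

$$\hat{p} = \frac{x}{n} \qquad (2)$$

Where: $\hat{p}$ is the observed proportion of structural zeros.

The Poisson distribution is used on the sample of $n - x$ units that are not structural zeros, where Y is the number of insects in one such unit:

$$P(Y = y) = \frac{e^{-\lambda} \lambda^{y}}{y!}, \qquad y = 0, 1, 2, \ldots \qquad (3)$$

Where: λ is the mean of the number of insects in the population, excluding structural zeros (i.e., sample units without susceptible tissue are not considered).

With overdispersion, the negative binomial is used, where Y is the number of insects in a unit that is not a structural zero:

$$P(Y = y) = \frac{\Gamma(y + 1/k)}{\Gamma(1/k)\, y!} \left(\frac{1}{1 + k\lambda}\right)^{1/k} \left(\frac{k\lambda}{1 + k\lambda}\right)^{y}, \qquad y = 0, 1, 2, \ldots \qquad (4)$$

Where: λ is the mean of the number of insects in the population, excluding structural zeros; k is an overdispersion parameter, written here so that the variance is $\lambda + k\lambda^{2}$ and the Poisson distribution is recovered as k approaches zero; and Γ(·) is the gamma mathematical function. In this way, estimates are not affected by excess zeros (structural zeros).

It can be noted that, under this scheme, the probability of a non-structural zero is given by $(1 - p)e^{-\lambda}$ if it is Poisson and $(1 - p)(1 + k\lambda)^{-1/k}$ if it is negative binomial. The zero-inflated Poisson distribution of the count Z in any sample unit is therefore:

$$P(Z = z) = \begin{cases} p + (1 - p)\, e^{-\lambda}, & z = 0 \\ (1 - p)\, \dfrac{e^{-\lambda} \lambda^{z}}{z!}, & z = 1, 2, \ldots \end{cases} \qquad (5)$$

The mean of this distribution is $(1 - p)\lambda$.

Likewise, the zero-inflated negative binomial distribution is:

$$P(Z = z) = \begin{cases} p + (1 - p)\,(1 + k\lambda)^{-1/k}, & z = 0 \\ (1 - p)\, \dfrac{\Gamma(z + 1/k)}{\Gamma(1/k)\, z!} \left(\dfrac{1}{1 + k\lambda}\right)^{1/k} \left(\dfrac{k\lambda}{1 + k\lambda}\right)^{z}, & z = 1, 2, \ldots \end{cases} \qquad (6)$$

The mean of this distribution is $(1 - p)\lambda$.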
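The two-origin model described above can be checked numerically. The following is a minimal sketch in Python (not the R code used in the study), assuming a mixture in which a unit is a structural zero with probability p and otherwise follows a Poisson or negative binomial count with mean λ. The negative binomial is written with variance λ + kλ², which is an assumption about the parametrization; the function names and the numeric values are illustrative only.

```python
import math


def poisson_pmf(y, lam):
    """P(Y = y) for a Poisson count with mean lam (lam > 0)."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))


def negbin_pmf(y, lam, k):
    """P(Y = y) for a negative binomial count with mean lam and
    variance lam + k * lam**2 (assumed parametrization, k > 0)."""
    r = 1.0 / k
    log_prob = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        - r * math.log1p(k * lam)
        + y * (math.log(k * lam) - math.log1p(k * lam))
    )
    return math.exp(log_prob)


def zero_inflated_pmf(z, p, count_pmf):
    """Mixture: structural zero with probability p, otherwise a draw from count_pmf."""
    prob = (1.0 - p) * count_pmf(z)
    return p + prob if z == 0 else prob


def zip_pmf(z, p, lam):
    """Zero-inflated Poisson probability of observing z insects in a unit."""
    return zero_inflated_pmf(z, p, lambda y: poisson_pmf(y, lam))


def zinb_pmf(z, p, lam, k):
    """Zero-inflated negative binomial probability of observing z insects in a unit."""
    return zero_inflated_pmf(z, p, lambda y: negbin_pmf(y, lam, k))


# Illustrative values only: p = 0.4, lam = 1.5, k = 1.29.
zip_mean = sum(z * zip_pmf(z, 0.4, 1.5) for z in range(200))
zinb_mean = sum(z * zinb_pmf(z, 0.4, 1.5, 1.29) for z in range(2000))
print(zip_mean, zinb_mean)  # both should be close to (1 - p) * lam
```

Summing the probabilities over a long range of counts is a quick way to confirm that each function is a proper distribution and that both means agree with (1 − p)λ.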
Parameter estimation
To obtain the parameters of the i) zero-inflated Poisson and ii) zero-inflated negative binomial distributions, the methods of moments and maximum likelihood were used.

a) For the zero-inflated Poisson distribution, the moment estimators are given by equation 7 and the maximum likelihood estimators by equation 8.

b) For the zero-inflated negative binomial distribution, no moment estimators are available (equation 9); when structural zeros are excluded, the estimators of equation 10 apply. The maximum likelihood estimator is given by equation 11.

Based on the above, it is proposed to use the moment estimators of the negative binomial distribution (Banik and Kibria, 2009), but excluding structural zeros from the equation, as an approximation to the moments of the zero-inflated negative binomial distribution.
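As a hedged illustration of how moment estimators of this kind can be obtained, the derivation below equates sample moments to theoretical moments. It is a sketch under two assumptions that are not taken from the article's equations: the zero-inflated Poisson is a mixture with structural-zero probability $p$ and Poisson mean $\lambda$, and the negative binomial fitted to the $n - x$ units that are not structural zeros has variance $\lambda + k\lambda^{2}$. The symbols $\bar{z}$, $s_z^{2}$ (mean and variance of all $n$ counts) and $\bar{y}$, $s_y^{2}$ (mean and variance after excluding structural zeros) are introduced here for illustration.

```latex
% Zero-inflated Poisson: match mean and variance of all n counts
\begin{aligned}
\bar{z} &= (1 - p)\lambda, \qquad s_z^{2} = (1 - p)\lambda\,(1 + p\lambda) \\
\frac{s_z^{2}}{\bar{z}} &= 1 + p\lambda
  \quad\Longrightarrow\quad p\lambda = \frac{s_z^{2}}{\bar{z}} - 1 \\
\hat{\lambda} &= \bar{z} + \frac{s_z^{2}}{\bar{z}} - 1, \qquad
\hat{p} = \frac{s_z^{2} - \bar{z}}{s_z^{2} + \bar{z}^{2} - \bar{z}}
\end{aligned}

% Negative binomial on the n - x units that are not structural zeros
\begin{aligned}
\bar{y} &= \lambda, \qquad s_y^{2} = \lambda + k\lambda^{2} \\
\hat{\lambda} &= \bar{y}, \qquad
\hat{k} = \frac{s_y^{2} - \bar{y}}{\bar{y}^{2}}, \qquad
\hat{p} = \frac{x}{n}
\end{aligned}
```

Both sets of estimators are only meaningful when the sample variance exceeds the sample mean; otherwise $\hat{p}$ or $\hat{k}$ becomes negative, which signals that the data show no excess zeros or no overdispersion.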
Derivation of equations
To derive the equations of optimal sample size, the parameters obtained from the models iii and iv were substituted in the equations of Karandinos (1976), related to the coefficient of variation (CV), the fixed proportion of the mean (D) and half the confidence interval (h) (Table 1).
Table 1. Equations for the optimal sample size*, based on the coefficient of variation (CV), the proportion of the mean (D) and half the confidence interval (h), for the general case and for the Poisson, negative binomial, zero-inflated Poisson and zero-inflated negative binomial distributions.
*= to obtain the optimal sample size, the values of
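The three precision criteria named in Table 1 can be sketched in Python as follows. The general forms below are the standard ones obtained by solving for n in the coefficient of variation of the mean, in a confidence-interval half-width expressed as a proportion D of the mean, and in a fixed half-width h. Plugging in the mean and variance of a zero-inflated mixture is an assumption made here for illustration and may differ from the article's own equations. The function names, the normal quantile z = 1.96 and the precision levels are hypothetical.

```python
def n_by_cv(mean, variance, cv):
    """Sample size so that the coefficient of variation of the mean equals cv."""
    return variance / (cv * mean) ** 2


def n_by_proportion(mean, variance, d, z=1.96):
    """Sample size so that the confidence half-width equals d * mean."""
    return (z / d) ** 2 * variance / mean ** 2


def n_by_half_interval(variance, h, z=1.96):
    """Sample size so that the confidence half-width equals the fixed value h."""
    return (z / h) ** 2 * variance


def zip_moments(p, lam):
    """Mean and variance of a zero-inflated Poisson mixture (assumed form)."""
    mean = (1.0 - p) * lam
    return mean, mean * (1.0 + p * lam)


def zinb_moments(p, lam, k):
    """Mean and variance of a zero-inflated negative binomial mixture,
    assuming the count part has variance lam + k * lam**2."""
    mean = (1.0 - p) * lam
    return mean, mean * (1.0 + p * lam + k * lam)


# Hypothetical inputs: p = 0.4, lam = 1.5, k = 1.29, CV = 0.25, D = 0.25, h = 0.5.
for label, (m, v) in (("ZIP", zip_moments(0.4, 1.5)),
                      ("ZINB", zinb_moments(0.4, 1.5, 1.29))):
    print(label, n_by_cv(m, v, 0.25), n_by_proportion(m, v, 0.25),
          n_by_half_interval(v, 0.5))
```

Under these general forms the CV and D criteria give the same sample size whenever CV = D / z, and the h criterion grows in direct proportion to the variance, so it is the most sensitive of the three to overdispersion.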
Field samplings vs simulations
Six systematic samplings (n= 100) were carried out in three Río Red grapefruit (Citrus paradisi Macfad) groves at Finca Sayula, SPR de RL de CV, Veracruz, Mexico (latitude 19.20722, longitude -96.35194). Sampling data were direct counts in small units (three leaves per shoot per tree), conducted during the months of June and July 2021 and January 2022.
Three of the samplings were carried out to detect the presence of the citrus leafminer Phyllocnistis citrella Stainton and three more to detect the presence of the citrus tristeza virus vector aphid Toxoptera citricida Kirkaldy. In addition, three samplings were simulated with the zero-inflated Poisson distribution and three with the zero-inflated negative binomial distribution, each with n= 100 randomly generated numbers. The simulations were performed in RStudio using the functions rbinom (100, size = 1, prob = 0.1, 0.4, 0.6), rpois (100-x, 1.5), rnbinom (100, 1.5) and zeroinfl (x∼1 | 1, dist = ‘poisson’, ‘negbin’) of the vgam and pscl libraries.
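The simulation scheme described above (Bernoulli draws for structural zeros, then Poisson counts for the remaining units) can be sketched in Python with the standard library only. This is not the R code used in the study: the Poisson sampler is Knuth's multiplication method, chosen here because the standard library has no Poisson generator, and the function names and the seed are hypothetical.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) count with Knuth's multiplication method
    (adequate for small means such as 1.5)."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_zip(n, p, lam, seed=None):
    """Simulate n sample units: each is a structural zero with probability p,
    otherwise its count is drawn from Poisson(lam)."""
    rng = random.Random(seed)
    structural = [rng.random() < p for _ in range(n)]
    counts = [0 if flag else poisson_draw(rng, lam) for flag in structural]
    return structural, counts


# Three simulated samplings with the proportions of structural zeros named in the text.
for p in (0.1, 0.4, 0.6):
    structural, counts = simulate_zip(100, p, 1.5, seed=2021)
    zeros = counts.count(0)
    print(p, sum(structural), zeros - sum(structural))  # structural vs non-structural zeros
```

Keeping the structural flags alongside the counts makes it possible to separate the two origins of zero afterwards, which an analysis of the counts alone cannot do.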
For the six field samplings, three of P. citrella (Table 2) and three of T. citricida (Table 3), and for the six simulations (Table 4), the following were estimated: the simulated and observed proportion of structural zeros, the proportion of non-structural zeros, the overdispersion parameter k, the probability of structural zero and the optimal sample size, the latter using the equations for the coefficient of variation, the proportion of the mean and half the confidence interval (Table 1).
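The quantities listed above can be computed from a sample in which each unit is flagged as a structural zero or not. The sketch below is a hedged Python illustration, not the authors' procedure: it assumes moment estimators obtained by matching the sample mean and variance, with the negative binomial variance written as λ + kλ², and it uses a small invented data set purely to show the calculation.

```python
from statistics import mean, variance


def structural_zero_proportion(structural):
    """Observed proportion of structural zeros, x / n."""
    return sum(structural) / len(structural)


def negbin_moments_excluding_structural(counts, structural):
    """Moment estimates (lam, k) from the units that are not structural zeros,
    assuming variance = lam + k * lam**2."""
    kept = [c for c, flag in zip(counts, structural) if not flag]
    m, v = mean(kept), variance(kept)
    return m, (v - m) / m ** 2  # k is negative if there is no overdispersion


def zip_moment_estimates(counts):
    """Moment estimates (p, lam) of a zero-inflated Poisson from all units."""
    m, v = mean(counts), variance(counts)
    lam = m + v / m - 1.0
    p = (v - m) / (v + m * m - m)  # negative if there are no excess zeros
    return p, lam


# Invented example: 8 units, the first 3 without susceptible tissue.
counts = [0, 0, 0, 0, 0, 1, 2, 5]
structural = [True, True, True, False, False, False, False, False]

print(structural_zero_proportion(structural))
print(negbin_moments_excluding_structural(counts, structural))
print(zip_moment_estimates(counts))
```

The first two functions use the field information about which zeros are structural, whereas the last one infers the zero-inflation probability from the counts alone.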
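Table 2. Parameter estimates and optimal sample sizes for the three field samplings of P. citrella.

Sampling | Method | Probability distribution | Prsz / Prnsz | k | pe | CV | D | h |
---|---|---|---|---|---|---|---|---|
1 | log-lik | ZIP | 0.33/0.43 | 1.4e-5 | 0.67 | 81 | 81 | 51 |
1 | log-lik | ZINB | 0.33/0.43 | 1.4e-5 | 0.67 | 69 | 69 | 51 |
1 | mom | ZIP | 0.33/0.43 | 1.29 | 0.629 | 70 | 70 | - |
1 | mom | ZINB | 0.33/0.43 | 1.29 | 0.33 | 75 | 75 | 351 |
2 | log-lik | ZIP | 0.27/0.45 | 1.9e-5 | 0.537 | 53 | 53 | 41 |
2 | log-lik | ZINB | 0.27/0.45 | 1.9e-5 | 0.537 | 55 | 55 | 41 |
2 | mom | ZIP | 0.27/0.45 | 2.69 | 0.465 | 43 | 43 | - |
2 | mom | ZINB | 0.27/0.45 | 2.69 | 0.27 | 102 | 102 | 472 |
3 | log-lik | ZIP | 0.13/0.46 | 8.1e-6 | 0.543 | 54 | 54 | 148 |
3 | log-lik | ZINB | 0.13/0.46 | 8.1e-6 | 0.543 | 34 | 34 | 148 |
3 | mom | ZIP | 0.13/0.46 | 1.35 | 0.499 | 47 | 47 | 50 |
3 | mom | ZINB | 0.13/0.46 | 1.35 | 0.13 | 42 | 42 | 1151 |

log-lik= log-likelihood; mom= moments; ZIP= zero-inflated Poisson; ZINB= zero-inflated negative binomial; Prsz= proportion of structural zeros; Prnsz= proportion of non-structural zeros; k= overdispersion parameter; pe= estimated probability of structural zero; optimal sample size by: CV= coefficient of variation; D= proportion of the mean; h= half the confidence interval.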
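Table 3. Parameter estimates and optimal sample sizes for the three field samplings of T. citricida.

Sampling | Method | Probability distribution | Prsz / Prnsz | k | pe | CV | D | h |
---|---|---|---|---|---|---|---|---|
1 | log-lik | ZIP | 0.33/0.64 | 181.8 | 0.97 | 1061 | 1061 | - |
1 | log-lik | ZINB | 0.33/0.64 | 181.8 | 0.97 | 2994 | 2994 | 24686 |
1 | mom | ZIP | 0.33/0.64 | 0.02 | 0.987 | 2447 | 2447 | - |
1 | mom | ZINB | 0.33/0.64 | 0.02 | 0.33 | 18 | 18 | 1207 |
2 | log-lik | ZIP | 0.27/0.68 | 0.426 | 0.95 | 623 | 623 | 2266 |
2 | log-lik | ZINB | 0.27/0.68 | 0.426 | 0.949 | 450 | 450 | 3945 |
2 | mom | ZIP | 0.27/0.68 | 0.056 | 0.96 | 801 | 801 | - |
2 | mom | ZINB | 0.27/0.68 | 0.056 | 0.27 | 17 | 17 | 983 |
3 | log-lik | ZIP | 0.13/0.84 | 0.474 | 0.97 | 1050 | 1050 | 5738 |
3 | log-lik | ZINB | 0.13/0.84 | 0.474 | 0.969 | 779 | 779 | 8486 |
3 | mom | ZIP | 0.13/0.84 | 0.025 | 0.978 | 1475 | 1475 | - |
3 | mom | ZINB | 0.13/0.84 | 0.025 | 0.13 | 12.55 | 12.55 | 854 |

log-lik= log-likelihood; mom= moments; ZIP= zero-inflated Poisson; ZINB= zero-inflated negative binomial; Prsz= proportion of structural zeros; Prnsz= proportion of non-structural zeros; k= overdispersion parameter; pe= estimated probability of structural zero; optimal sample size by: CV= coefficient of variation; D= proportion of the mean; h= half the confidence interval.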
Table 4. Parameter estimates and optimal sample sizes for the six simulations.

Sampling | Method | Probability distribution | Prsz | k | pe | CV | D | h |
---|---|---|---|---|---|---|---|---|
ZIPS1 | log-lik | ZIP | 0.1 | 4.8e-5 | 0.089 | 19 | 19 | 29 |
ZIPS2 | log-lik | ZIP | 0.4 | 0.107 | 0.479 | 45 | 45 | 31 |
ZIPS3 | log-lik | ZIP | 0.6 | 1e-5 | 0.476 | 45 | 45 | 22 |
ZINBS1 | log-lik | ZINB | 0.1 | 2.221 | 0.005 | 39 | 39 | 664 |
ZINBS2 | log-lik | ZINB | 0.4 | 0.623 | 0.429 | 32 | 32 | 1268 |
ZINBS3 | log-lik | ZINB | 0.6 | 0.656 | 0.651 | 62 | 62 | 1935 |

ZIPS= zero-inflated Poisson simulations (1-3); ZINBS= zero-inflated negative binomial simulations (1-3); log-lik= log-likelihood; Prsz= proportion of structural zeros; k= overdispersion parameter; pe= estimated probability of structural zero; optimal sample size by: CV= coefficient of variation; D= proportion of the mean; h= half the confidence interval.
Results and discussion
Equations proposed for estimating the optimal sample size of pests with excess zeros
The equations proposed to estimate the optimal sample size of pests with excess zeros are detailed in the methodology (Table 1).
Optimal sample size
It was found that the optimal sample size calculated by the proportion of the mean (D) was the same as that calculated by the coefficient of variation (CV) (Tables 2, 3 and 4).
The optimal sample size of half the confidence interval (h) increased as the overdispersion parameter (k) increased, resulting in very large or difficult-to-estimate optimal sample sizes when pest populations have excess zeros (Tables 2, 3 and 4).
The parameter k estimated by log-likelihood for the samples of P. citrella (Table 2) indicated that the samples follow a zero-inflated Poisson distribution. The k estimated by the method of moments for the zero-inflated negative binomial distribution, which excludes structural zeros, showed that the non-structural zeros and the positive integer values were overdispersed.
This result is consistent with that reported by Banik and Kibria (2009), who indicated that, by conditioning or eliminating the structural zeros of a population modeled with a zero-inflated Poisson distribution, it can also be modeled with a negative binomial distribution, provided that the data of the non-structural component present overdispersion.
The values of pe for the methods of moments and log-likelihood for zero-inflated Poisson were similar, therefore, both methods are efficient for the estimation of the parameters. The estimated sample sizes for P. citrella are smaller when estimated by moments than by log-likelihood, even when the number of structural zeros (Prsz) is greater; however, the difference between the two estimates is not very large (< 20 units).
The effect of overdispersion significantly affected the sample size estimated
by h; for P. citrella, the results
indicate that estimation by CV or by
In the samplings of T. citricida (Table 3), an insect with a high tendency to aggregation, the k values estimated by log-likelihood indicate populations with zero-inflated negative binomial distribution. The value of k by the method of the moments resulted in low values, which indicates that, when excluding the structural component, the few sample units found with pest presented low variation.
The result is interesting since populations with zero-inflated negative
binomial distribution present random distribution at the farm level, but the
few occupied trees had a high number of individuals, indicating aggregation,
in accordance with the biology of the insect. The exclusion of structural
zero, the frequency of non-structural zeros, and the reduction of variation
in counts with positive integer values resulted in sample sizes very small
for CV and
The optimal sample size of the zero-inflated negative binomial distribution, calculated by moments, is smaller because it distinguishes the different origins of zero. By considering only the non-structural zeros and the positive integer values for the estimation of the sample size, a difference was established with the parameters estimated by log-likelihood that does not distinguish the origin of zero. Therefore, the method of moments for zero-inflated Poisson and zero-inflated negative binomial allows estimating optimal sample sizes similar to or smaller than those estimated by maximum likelihood.
In the simulations (Table 4), the sample size increased in both distributions as the number of structural zeros increased, because the simulated samples were analyzed only by the log-likelihood method, which does not distinguish the origin of zero. In addition, the estimated value of the overdispersion parameter k is consistent with the values obtained in the field.
For zero-inflated Poisson, very small k values were obtained due to the proximity of the mean and variance values, while for the simulations of the zero-inflated negative binomial, the overdispersion parameter was greater than zero, indicating overdispersion, similar to that reported by Zou et al. (2021) and Haslett et al. (2022).
Conclusions
The zero-inflated Poisson and zero-inflated negative binomial probability distributions allow modeling populations of pest organisms with low densities and excess zeros. The parameters obtained by the moment method distinguish the origin of zero and estimate optimal sample sizes equivalent to or less than those estimated by log-likelihood, which does not distinguish the origin of zero. A zero-inflated Poisson population can also be modeled with a negative binomial distribution, provided that the non-structural component is overdispersed.
The estimation of the optimal sample size in pest populations with excess zeros
can be performed equivalently with the coefficient of variation (CV) equation
and the mean proportion (
References
Banik, S. and Kibria, B. M. G. 2009. On some discrete distributions and their applications with real life data. USA. JMASM. 8(2):423-447. https://doi.org/10.22237/jmasm/1257034020.
Cheung, Y. B. 2002. Zero inflated models for regression analysis of count data: a study of growth and development. USA. Statist. Med. 21(10):1461-1469. https://doi.org/10.1002/sim.1088.
Clay, S. A.; French, B. W. and Mathew, F. M. 2018. Pest measurement and management. In: precision agriculture basics. Shannon, D. K.; Clay, D. E. and Kitchen, N. R. (eds.). Ed. ASA, CSSA, and SSSA Books. USA. 93-102 pp. https://doi.org/10.2134/precisionagbasics.2016.0090.
Doyle, S. R. 2009. Examples of computing power for zero-inflated and over dispersed count data. USA. JMASM. 8(2):360-376. https://doi.org/10.22237/jmasm/1257033720.
Fang, R.; Wagner, B. D.; Harris, J. K. and Fillon, S. A. 2016. Zero inflated negative binomial mixed models: an important application to two microbial organisms important in oesophagitis. UK. Epidemiol. Infect. 144(1):2447-2455. http://doi.org/10.1017/S0950268816000662.
García-González, J. C.; López-Collado, J.; García-García, C. G.; Villanueva-Jiménez, J. A. y Nava-Tablada, M. E. 2018. Factores bióticos, abióticos y agronómicos que afectan las poblaciones de adultos de mosca pinta (Hemiptera: Cercopidae) en cultivos de caña de azúcar en Veracruz, México. México. Acta Zool. Mex. 33(3):508-517. https://doi.org/10.21829/azm.2017.3331152.
Hall, D. B. 2000. Zero inflated Poisson and binomial regression with random effects: a case study. USA. Biometrics. 56(1):1030-1039. https://doi.org/10.1111/j.0006-341x.2000.01030.x.
Hashim, L. H.; Hashim, K. H. and Shiker, M. A. K. 2021. An application comparison of two Poisson models on zero count data. UK. Journal of Physics: Conference Series. 1818(012165):1-12. http://doi:10.1088/1742-6596/1818/1/012165.
Haslett, J.; Parnel, A. C.; Hinde, J. and de Andrade, M. R. 2022. Modelling excess of zeros in count data: a new perspective on modelling approaches. USA. International statistical review. 90(2):216-236. https://doi.org/10.1111/insr.12479.
Hilbe, J. M. 2011. Negative binomial regression. Cambridge University Press. 2a Ed. UK. 346-399 pp.
Jankielsohn, A. 2017. The redesign of suitable agricultural crop ecosystems by increasing natural ecosystem services provided by insects. Hong Kong SAR China. Advances in ecological and environmental research. 1(1):365-381. http://www.ss-pub.org/wp-content/uploads/2017/09/AEER2017040501-1.pdf.
Karandinos, M. G. 1976. Optimum sample size and comments on one published formula. USA. Bull. Entomol. Soc. Amer. 22(4):417-421. https://doi.org/10.1093/besa/22.4.417.
Lambert, D. 1992. Zero inflated Poisson regression, with an application to defects manufacturing. USA. Technometrics. 34(1):1-14. https://doi.org/10.2307/1269547.
Mullahy, J. 1986. Specification and testing of some modified count data models. Netherlands. J. Econ. 33(1):341-365. https://doi.org/10.1016/0304-4076(86)90002-3.
Ramírez, I. C.; Barrera, C. J. y Correa, J. C. 2013. Efecto del tamaño de muestra y el número de réplicas bootstrap. Colombia. Inycompe. 15(1):93-101. https://www.redalyc.org/articulo.oa?id=291329165008.
Shannon, D. K.; Clay, D. E. and Sudduth, K. A. 2018. An introduction to precision agriculture. In: precision agriculture basics. Shannon, D. K.; Clay, D. E. and Kitchen, N. R. (eds.). Ed. ASA, CSSA, and SSSA Books. USA. 1-12 pp. https://doi.org/10.2134/precisionagbasics.2016.0084.
Southwood, T. R. E. and Henderson, P. A. 2000. Ecological methods. Blackwell science. 3rd Ed. Oxford, UK. 7-66 pp. https://www.researchgate.net/publication/260051655-Ecological-Methods-3rd-edition.
Taherdoost, H. 2016. Sampling methods in research methodology, how to choose a sampling technique for research. Brazil. IJARM. 5(2):18-27. http://dx.doi.org/10.2139/ssrn.3205035.
Talaviya, T.; Shah, D.; Patel, N.; Yagnik, H. and Shah, M. 2020. Implementation of artificial intelligence in agriculture for optimization of irrigation and application of pesticides and herbicides. China. Artificial Intelligence in Agric. 4(1):58-73. https://doi.org/10.1016/j.aiia.2020.04.002.
Villanueva-Jiménez, J. A.; Reyes-Pérez, N. y Abato-Zárate, M. 2017. Manejo integrado de plagas y sostenibilidad. In: agricultura sostenible como base para los agronegocios. Jarquín, G. R. y Huerta, P. A. (coords.). 1a Ed. Universidad Autónoma de San Luis Potosí. México. 32-42 pp. https://www.researchgate.net/publication/320779257-Manejo-Integrado-de-Plagas-y-Sostenibilidad.
Yesilova, A.; Kaydan, M. B. and Kaya, Y. 2010. Modeling insect-egg data with excess zero using zero-inflated regression models. Hacettepe J. Math. Stat. 39(2):273-282. http://www.hjms.hacettepe.edu.tr/uploads/c879f14e-8c0d-4f30-8bfa-e28658a8fe0b.pdf.
Zou, Y.; Hanning, J. and Young, D. S. 2021. Generalized fiducial inference on the mean of zero inflated Poisson and Poisson hurdle models. Germany. J Statistical Distributions and Applications. 8(5):1-15. https://doi.org/10.1186/s40488-021-00117-0.
Received: November 01, 2023; Accepted: January 01, 2024