The vaginal microbiota is the community of microorganisms that reside in the vagina. For most women of reproductive age, the vaginal microbiota is dominated by one or more species of Lactobacillus.1,2 Lactobacilli have long been thought to promote vaginal health through the production of lactic acid, which acidifies the vagina and can inhibit pathogens.3,4 However, recent applications of DNA sequencing-based approaches to characterize the composition of the vaginal microbiota have revealed that different Lactobacillus species behave differently in the vagina, and that they are not all strongly associated with health.1,2 In particular, L. crispatus and L. iners are two of the most common lactobacilli found in the vaginal microbiota, and they are the species this analysis will focus on. While L. crispatus dominated vaginal microbial communities are associated with low (acidic) vaginal pH and vaginal health, L. iners dominated communities are associated with higher (more neutral) pH and may predispose women to bacterial vaginosis and accompanying sexual and reproductive health risks.1,2 This is of public health concern because bacterial vaginosis is implicated in a range of adverse outcomes, HIV and other sexually transmitted infection acquisition, pelvic inflammatory disease, infertility, adverse pregnancy outcomes, and gynecologic cancers.5-10 Studying the vaginal microbiota presents certain study design and data analysis challenges for a few reasons. First, the vaginal microbiota is a dynamic community whose composition changes over time in response to a number of different factors, including the menstrual cycle and menstruation.2,11-16 Additionally, the composition and dynamics of the vaginal microbiota can be vary widely between women.1,2 For these reasons, longitudinal studies that collect multiple vaginal samples from the same woman over time are best suited to study vaginal microbial dynamics, and the relationships between the vaginal microbiota and sexual and reproductive health outcomes. However, the applicability of ecological population POMP models to vaginal microbiota data has not been reported. The goal of this analysis is to explore variations of the Ricker discrete population model as a POMP model of L. crispatus bacterial counts from lognitudinal vaginal microbiota composition data. Compared to many other areas of health research, this field is still quite young and there is no well-agreed-upon or standard approach to data analysis. As such, now is an optimal time to explore a range of different analysis approaches in order to identify methods that perform well.
In this analysis, we will work with two vaginal microbiota composition time series that were generated as part of a larger study of the impact of tampon use on the vaginal microbiota (not yet published). This study collected biweekly vaginal swab samples from 22 women over the course of four menstrual cycles. Bacterial DNA was extracted from the swabs, and then all 16S rRNA genes in the samples were amplified by polymerase chain reaction (PCR) and sequenced.17,18 The 16S rRNA gene is present in all bacteria, and it has enough sequence dissimilarity between species that its sequence can be used to identify the bacterial species present in a sample. Moreover, because there is one copy of the 16S rRNA gene in every bacterium, the number of sequences from a species in a sample after PCR and sequencing is proportional to the number of bacteria of the species that were present in the original sample (assuming PCR and sequencing do not perform differentially between species). We will use a series of L. crispatus sequence counts and a series of L. iners sequence counts from a single subject of the study described above. Each series contains 32 observations that were collected twice-weekly over 16 weeks. The series of L. crispatus counts will be modeled, and the series of L. iners counts used as a covariate series. Both series will be transformed by a factor of 1/1000 in order to work with count data that are on a reasonable scale to simulate with the POMP models. Below is a plot of the L. crispatus and L. iners 1/1000 transformed sequence counts. In these series, each consecutive block of four weeks corresponds to a single menstrual cycle with menstruation at the end of the cycle. Considering this, it looks like there may be cyclic patterns in L. crispatus counts where, for menstrual cycles 1, 2, and 3, L. crispatus count drops at the end of the cycle during menstruation. It also looks like there may be some reciprocity in the two species’ counts: at several points, the L. iners count drops when the L. crispatus count rises, and vice versa. We will explore Ricker models incoporating both of these potential drivers of L. crispatus counts.
This model accounts for the fact that there are a finite number of niches that L. crispatus can occupy in the vagina. At each time point, the actual growth rate of the population is the potential maximum growth rate scaled by the proportion of the carrying capacity that is currently occupied (Nn / k). Ignoring process noise, when Nn < k, the exponent term will be positive and the population will grow. When Nn > k, the exponent term will be negative and the population will shrink. When Nn = k, the exponent term will equal 0 and the population will be in equilibrium. While 16S rRNA gene amplicon sequencing is a highly sensitive and specific method for measuring bacterial species counts, it is not perfect. It is reasonable to model the measured L. crispatus count as a draw from a normal distribution with mean Nn and standard deviation stdev: \[\begin{eqnarray} Y_n|N_n&\sim&\mathrm\, N[N_n,stdev]\end{eqnarray}\] With Xn being the unobserved latent process model governing L. crispatus counts, and Yn being the measurement model, the basic Ricker model can be written as a POMP model as follows: \[\begin{eqnarray} X_n=N_n\end{eqnarray}\] \[\begin{eqnarray} Y_n=Y_n\end{eqnarray}\]
The two states in this model are N and e. We will set N0 equal to 9.75, the mean observed L. crispatus count, and e0 = 0. We will assume that the variance of the process noise, \(\begin{eqnarray}\sigma^2\end{eqnarray}\), and the variance of the measurment model, stdev2, are fixed. Particle filtering the basic Ricker model over a range of possible \(\begin{eqnarray}\sigma\end{eqnarray}\) and stdev values using 1,000 particles suggests that \(\begin{eqnarray}\sigma\end{eqnarray}\) = 0.25 and stdev = 1 maximize the log likelihood of the model, so we will use these values throughout the analysis.
We will explore the two-dimensional likelihood surface of the basic Ricker model using serial particle filters (1,000 particles each) that incrementally step through a range of r and k value pairs. The following contour plot shows two peaks in the likelihood surface: one at k = 10 and r = 2, and the other at k = 10 and r = 0.2. This suggests that there may be identifiability issues for this model.
Simulating the data using the basic Ricker model with N0 = 9.75, e0 = 0, \(\begin{eqnarray}\sigma\end{eqnarray}\) = 0.25, stdev = 1, r = 2, and k = 10 appears to give mediocre approximations of the L. crispatus count series that are within a similar range as the data. However, we generally do not see decreased counts during menstruation around weeks 4, 8, 12, and 16 in the simulations. The simulations also seem to be more volatile than the data. Neither of these differences are surprising given that many factors can influence vaginal microbiota composition and constrain species counts, none of which are accounted for in the basic Ricker model.
Instead simulating with r = 0.2 also gives mediocre approximations of the data. These simulations do not stray as far outside of the range of the data, but on the other hand they do not resemble the data as well. Both of these simulations leave room for improvement, which we will expore in later models.
We will use IF2, an iterated filtering algorithm, to generate MLEs for the basic Ricker model parameters.20 We will specify a box in parameter space for IF2 to randomly draw staring parameter values from: 0.1 <u>< r <u>< 5 and 7 <u>< k <u>< 15. We will perform 300 filtering iterations with 60,000 particles each, a random walk standard deviation of 0.02, and geometric cooling with a cooling fraction of 0.5. IF2 diagnostic plots and results are below.
## Parameter set 1
## Number of iterations converging on: 85
## Average r: 2.02, Standard error: 4e-04
## Average k: 9.66, Standard error: 0.0028
## Average log likelihood: -84.36, Standard error: 0.0014
## Parameter set 2
## Number of iterations converging on: 15
## Average r: 0.19, Standard error: 8e-04
## Average k: 9.63, Standard error: 0.0098
## Average log likelihood: -83.87, Standard error: 0.0037
## t-test comparing log likelihoods: p = 3e-28
The effective sample size (eff.samplesize) of the last iteration is quite high for all time points except t=11, there are no filtering failures (nfail), and both r and k converge on one or two values early in the iterations. These features suggest that IF2 performed well and give us confidence that we have identified MLEs. The r parameter converging on two values is evidence that there may not be a unique MLE for r. It is difficult to tell from the log likelihood (loglik) convergence diagnostic plot whether either value of r is favored. Further investigation of the results shows that r = 0.19 is slightly favored over r = 2.02 according to log likelihoods (-83.87 versus -84.36), and a t-test shows that this difference in log likelihood is significant (p = 3e-28). We can conclude that parameter set 2 represents the MLEs for this basic Ricker model, that the vagina’s carrying capacity for L. crispatus is about 9,600 organisms (recall that the data are 1/1,000 transformed), and that the maximum potentail growth rate of the L. crispatus population is 0.19. In the next section of the analysis, we will attempt to improve the model and its MLEs by including the effect of menses.
As in the basic Ricker model, the actual growth rate of the population is the potential maximum growth rate scaled by the proportion of the carrying capacity that is currently occupied, but now the actual growth rate is reduced by p during menses. The measurement model will remain the same because it is reasonable to assume that menses should not impact 16S rRNA gene amplicon sequencing.
We will explore the 2D likelihood surfaces of the Ricker+menses model in the same was as we did for the basic Ricker model. The only difference is that there are three surfaces to explore for this model: the k x r surface, the k x p surface, and the the r x p surface. The k x r and k x p surfaces suggest that k = 10.25 yields near maximum log likelihood, and is in the neighborhood of the MLE. However, the r x p surface shows two peaks. One is around r = 2 and p = 0.8, which corresponds with the k x r and k x p surfaces. The second is around r = 0.3 and p = 0.2, which is not represented on the other two surfaces and may suggest an identifiability issue for r and p.
Simulating the data using the Ricker+menses model with N0 = 9.75, e0 = 0, \(\begin{eqnarray}\sigma\end{eqnarray}\) = 0.25, stdev = 1, r = 2, k = 10.25, and p = 0.8 gives somewhat better approximations of the L. crispatus count series than simulating with the basic Ricker model did. These simulations generally show decreased counts during menstruation, though not always on as large of a scale as the decreases in the data. Additionally, while some of these simulations are still more volatile than the data, they appear to be less volatile than the simulations from the basic Ricker model. Overall, these simulations suggest that the inclusion of the menses effect is an improvement over the basic Ricker model.
Simulating instead with r = 0.3 and p = 0.2 appears to further improve the Ricker+menses model. These simulations more closely resemble the data and generally capture the large peaks in L. crispauts count that occur in menstrual cycles 1 (week 2), 2 (week 6), and 3 (week 11). These simulations also appear to be less volatile than prior simulations, and do not stray outside of the range of the data as far.
We will use IF2 to generate MLEs for the Ricker+menses model as above, with the following parameter space box to randomly draw starting values from: 0.1 <u>< r <u>< 5, 7 <u>< k <u>< 15, and 0.1 <u>< p <u>< 5. IF2 diagnostic plots and results are below.
## Parameter set 1:
## Number of iterations converging on: 11
## Average r: 1.93, Standard error: 0.0028
## Average k: 10.24, Standard error: 0.0113
## Average p: 0.87, Standard error: 0.0013
## Average log likelihood: -78.75, Standard error: 0.0041
## Parameter set 2:
## Number of iterations converging on: 89
## Average r: 0.47, Standard error: 5e-04
## Average k: 11.56, Standard error: 0.0028
## Average p: 0.32, Standard error: 3e-04
## Average log likelihood: -77.07, Standard error: 0.0011
## Mann Whitney test comparing log likelihoods: p = 7.164109e-08
The effective sample size of the last iteration is quite high for all time points except t=11, filtering failures drop to 0 after a few iterations, and all parameters converge on two values early in the iterations. These features suggest that IF2 performed well and give us confidence that we have identified MLEs. Again, the fact that the IF2 iterations converged onto two parameter sets suggests that there may not be a single unique set of MLEs for r, k, and p. While parameter set 2 is only slightly favored over parameter set 1 according to log likelihoods (-77.07 versus -78.75), 89 of the 100 iterations converged on parameter set 2, and a Mann Whitney test shows that the difference in log likelihood is significant (p = 7e-8, Mann Whitney test was used because log likelihoods for parameter set 2 were non-normally distributed). The log likelihood of the MLE set for this model (parameter set 2) is greater than that from the basic Ricker model, which suggests that the inclusion of the menses effect improves the model. This is in accordance with our initial observations of the data, the simulations of these two models, and a priori knowledge that menses impacts vaginal microbiota composition. From this model, we can reasonably conclude that the vagina’s carrying capacity for L. crispauts is around 11,560 organisms (higher than suggested by the basic Ricker model), the maximum potentail growth rate of the L. crispatus population is around 0.47 (again, higher than suggested by the basic Ricker model), and that menses reduces the actual growth rate by 68% of the maximum potential growth rate. Next, we will explore whether accounting for the reciprocity between L. crispatus and L. iners counts can also improve the model.
This process model accounts for the fact that there are a finite number of niches in the vaginal microbiota, that L. crispatus and L. iners are capable of occupying the same niches, and thus that there is competition between the two species for those niches. This is one possible conceptualization of the reciprocity between L. crispatus and L. iners counts. Again, the measurement model will remain the same.
Exploring the likelihood surface of the Ricker+L. iners model as we did above again shows two peaks: one at r = 3 and k = 23, and the other at r = 0.3 and k = 20.25.
Simulating the data using the Ricker+L. iners model with N0 = 9.75, e0 = 0, \(\begin{eqnarray}\sigma\end{eqnarray}\) = 0.25, stdev = 1, r = 3, and k = 23 gives apparently worse approximations of the L. crispatus counts than simulating with the basic Ricker model did. These simulations are substantially more volatile and span a much wider ranger of count values than the data. They suggest that modeling the L. crispatus population as part of a combined L. crispatus and L. iners population does not improve the basic Ricker model.
Instead simulating with r = 0.3 and k = 20.25 gives apparently better approximations of the data that lie within a narrower range of count values, some of which capture the peaks of L. crispatus count that occurs in menstrual cycles 1 (week 2), 2 (week 6), and 3 (week 11).
We will use IF2 to generate MLEs for the Ricker+L. iners model as above, with the following parameter space box to randomly draw staring parameter values from: 0.1 <u>< r <u>< 5 and 20 <u>< k <u>< 30. IF2 diagnostic plots and results are below.
## Parameter set 1:
## Number of iterations converging on: 68
## Average r: 3.11, Standard error: 0.0012
## Average k: 22.87, Standard error: 0.0038
## Average log likelihood: -95.66, Standard error: 0.0021
## Parameter set 2:
## Number of iterations converging on smaller r: 32
## Average r: 0.30, Standard error: 7e-04
## Average k: 20.04, Standard error: 0.0080
## Average log likelihood: -84.09, Standard error: 0.0025
## t-test comparing log likelihoods: p = 8.185226e-195
Similar to the prior two models, the effective sample size of the last iteration is quite high for all time points, there were no filtering failures, and both parameters converge on two values early in the iterations. These features suggest that IF2 performed well and give us confidence that we identified MLEs. Unlike the prior models, the diagnostic plots for the Ricker+L. iners model suggest that parameter set 2 better fits the data than parameter set 1: the effective sample size and conditional log likelihood (cond.logLik) are generally higher for the iterations that converged to parameter set 1, and log likelihood for these iterations is significantly higher (-84.09 versus -95.66, p = 8e-195). The log likelihood for parameter set 2 is lower than that of the MLE set for the basic Ricker model (-83.87), suggesting that modeling L. crispauts counts as part of a combined L. crispatus and L. iners population does not improve the basic Ricker model. The next and final model will explore whether including the menses effect in the Ricker model and modeling L. crispauts counts as part of a combined L. crispatus and L. iners population can improve the basic Ricker and the Ricker+menses models.
Exploring the likelihood surfaces of the Ricker+menses+L. iners model suggests that there may be parameter identifiability issues for this model, as we saw with the Ricker+menses model. According to the k x r surface, log likelihood is maximized around k = 26.25 and r = 3.25. According to the k x p surface, log likelihood is maximized around k = 23 and p = 0.3. The r x p surface shows two log likelihood maxima: one around r = 0.5 and p = 0.2, and the other around r = 3 and p = 0.1.
Simulating the data using the Ricker+menses+L. iners model with N0 = 9.75, e0 = 0, \(\begin{eqnarray}\sigma\end{eqnarray}\) = 0.25, stdev = 1, r = 3.25, k = 26.25, and p = 0.1 gives poor approximations of the L. crispatus counts. Some of the simulations capture the decrease in L. crispatus counts that occur during menses, but they all cover a much wider range of count values than the data. On the other hand, simulating with r = 0.5, k = 23, and p = 0.3 approximates the data fairly well. These simulations capture the decrease in L. crispatus counts during menses and are much less jagged and volatile than prior simulations.
We will use IF2 to generate MLEs for the Ricker+menses+L. iners model as above, with the following parameter space box to randomly draw staring values from: 0.1 <u>< r <u>< 5, 20 <u>< k <u>< 30, and 0.01 <u>< p <u>< 3. IF2 diagnostic plots and results are below.
## Parameter set 1:
## Number of iterations converging on larger r: 71
## Average r: 3.10, Standard error: 0.0013
## Average k: 22.84, Standard error: 0.0041
## Average p: -0.03, Standard error: 0.0016
## Average log likelihood: -95.65, Standard error: 0.0024
## Parameter set 2:
## Number of iterations converging on smaller r: 29
## Average r: 0.44, Standard error: 9e-04
## Average k: 22.30, Standard error: 0.0117
## Average p: 0.18, Standard error: 5e-04
## Average log likelihood: -80.85, Standard error: 0.0019
## t-test comparing log likelihoods: p = 1.027435e-253
As before, the effective sample size of the last iteration is quite high for all time points, there were no filtering failures after the first few iterations, and all parameters converge on two values early in the iterations. These features suggest that IF2 performed well and give us confidence that we identified MLEs. The effective sample size and conditional log likelihood plots suggest that parameter set 2 better fits the data than parameter set 1. This is corroborated by the log likelihood for the two parameter sets: -80.85 versus -95.65 (p = 1e-253). The MLE set (parameter set 2) suggests that the vagina’s combined carring capacity for L. crispauts and L. iners is around 22,300 organisms, that the maximum potential growth rate of the L. crispatus population in this combined community is 0.44, and that menses reduces the actual growth rate by 41% of the maximum potential growth rate. The log likelihood for this MLE set is greater than that of the basic Ricker model MLE set (-83.87) but less than that of the Ricker+menses MLE set (-77.07), suggesting that including the menses effect in the Ricker model and modeling L. crispauts counts as part of a combined L. crispatus and L. iners population improves the basic Ricker model, but that including the menses effect and considering L. crispatus as an isolated population performs the best of these four models.
The goal of this analysis was to explore variations of the Ricker discrete population model as a POMP model of L. crispatus bacterial counts from lognitudinal vaginal microbiota composition data. Of the four models considered, the Ricker model accounting for the menses effect performed the best. This is not surprising given our observation that L. crispaus counts tend to decrease during menses, and a priori knowledge of the impact of menses on vaginal microbiota composition. More interestingly, the Ricker+menses model provides an estimate of the strength of the menses effect: menses appears to reduce the actual L. crispatus population growth rate by 68% of the maximum potential growth rate. Recent studies have found that L. crispatus dominated vaginal microbial communities are associated low vaginal pH and vaginal health,1,2 which suggests that L. crispatus performs well at providing the vagina with colonization resistance against other microbes, including pathogens. As such, a decrease in L. crispatus growth rate of this magnitude during menses may confer risk of acquiring bacterial vaginosis and/or vaginal infections during menses, which is supported by the original epidemiologic analysis of this study’s data (not yet published). It is important to note that we only applied these Ricker POMP models to a single woman’s vaginal microbiota data, and the menses effect likely differs between different women. However, this analysis has shown that these POMP models may be a valuable method for vaginal microbiota data analysis. They could easily be applied to vaginal microbiota data from a cohort of women to assess whether there is an overall trend in the menses effect on L. crisaptus population growth rate. It is somewhat dissapointing that including the menses effect in the Ricker model and modeling L. crispauts counts as part of a combined L. crispatus and L. iners population was not an improvement over the Ricker+menses model. In theory, this model should reflect the ecological processes of the vaginal microbiota better than the Ricker+menses model. It is likely that a more complicated and more effective model including the menses effect and modeling L. crispauts counts as part of a polymicrobial community can be specified, which would be of interest for future applications of POMP models to vaginal microbiota composition.
Ravel, J., et al. (2011). “Vaginal microbiome of reproductive-age women.” Proc Natl Acad Sci U S A 108(Supplement 1): 4680-4687.
Gajer, P., et al. (2012). “Temporal Dynamics of the Human Vaginal Microbiota.” Sci. Transl. Med. 4(132): 132ra152-132ra152.
Hickey, R. J., et al. (2012). “Understanding vaginal microbiome complexity from an ecological perspective.” Transl Res 160(4): 267-282.
Borges, S., et al. (2014). “The role of lactobacilli and probiotics in maintaining vaginal health.” Arch Gynecol Obstet 289(3): 479-489.
Brotman, R. M. (2011). “Vaginal microbiome and sexually transmitted infections: an epidemiologic perspective.” J. Clin. Invest. 121(12): 4610-4617.
van Oostrum, N., et al. (2013). “Risks associated with bacterial vaginosis in infertility patients: a systematic review and meta-analysis.” Human Reproduction.
Sharma, H., et al. (2014). “Microbiota and Pelvic Inflammatory Disease.” Seminars in Reproductive Medicine 32(1): 43-49.
Nelson, D. B., et al. (2016). “The role of the bacterial microbiota on reproductive and pregnancy health.” Anaerobe 42: 67-73.
Champer, M., et al. (2017). “The role of the vaginal microbiome in gynaecological cancer.” BJOG.
van de Wijgert, J. H. H. M. and V. Jespers (2017). “The global health impact of vaginal dysbiosis.” Res. Microbiol.
Lopes dos Santos Santiago, G., et al. (2011). “Longitudinal Study of the Dynamics of Vaginal Microflora during Two Consecutive Menstrual Cycles.” PLoS One 6(11): e28180.
Hickey, R. J., et al. (2013). “Effects of tampons and menses on the composition and diversity of vaginal microbial communities over time.” BJOG: An International Journal of Obstetrics & Gynaecology 120(6): 695-706.
Chaban, B., et al. (2014). “Characterization of the vaginal microbiota of healthy Canadian women through the menstrual cycle.” Microbiome 2(1): 1-12.
Priestley, C. J., et al. (1997). “What is normal vaginal flora?” Genitourinary Medicine 73(1): 23-28.
Eschenbach, D. A., et al. (2000). “Influence of the Normal Menstrual Cycle on Vaginal Tissue, Discharge, and Microflora.” Clinical Infectious Diseases 30(6): 901-907.
Srinivasan, S., et al. (2010). “Temporal Variability of Human Vaginal Bacteria and Relationship with Bacterial Vaginosis.” PLoS One 5(4): e10197.
Kozich, J. J., et al. (2013). “Development of a Dual-Index Sequencing Strategy and Curation Pipeline for Analyzing Amplicon Sequence Data on the MiSeq Illumina Sequencing Platform.” Applied and Environmental Microbiology 79(17): 5112-5120.
Seekatz, A. M., et al. (2015). “Fecal Microbiota Transplantation Eliminates Clostridium difficile in a Murine Model of Relapsing Disease.” Infect. Immun. 83(10): 3838-3846.
Ricker model (last updated 2018). https://en.wikipedia.org/wiki/Ricker_model
Ionides, E. L., et al. (2015). “Inference for dynamic and latent variable models via iterated, perturbed Bayes maps.” PNAS. 112(3): 719-724.