1. Vaginal microbiota background


The vaginal microbiota is the community of microorganisms that reside in the vagina. For most women of reproductive age, the vaginal microbiota is dominated by one or more species of Lactobacillus.1,2 Lactobacilli have long been thought to promote vaginal health through the production of lactic acid, which acidifies the vagina and can inhibit pathogens.3,4 However, recent applications of DNA sequencing-based approaches to characterize the composition of the vaginal microbiota have revealed that different Lactobacillus species behave differently in the vagina, and that they are not all strongly associated with health.1,2 In particular, L. crispatus and L. iners are two of the most common lactobacilli found in the vaginal microbiota, and they are the species this analysis will focus on. While L. crispatus-dominated vaginal microbial communities are associated with low (acidic) vaginal pH and vaginal health, L. iners-dominated communities are associated with higher (more neutral) pH and may predispose women to bacterial vaginosis and accompanying sexual and reproductive health risks.1,2 This is of public health concern because bacterial vaginosis is implicated in a range of adverse outcomes, incuding preterm birth, sexually transmitted infection acquisition, and gynecologic cancer.5-10

Studying the vaginal microbiota presents certain study design and data analysis challenges for a few reasons. First, the vaginal microbiota is a dynamic community whose composition changes over time in response to a number of different factors, including the menstrual cycle and menstruation.2,11-16 Additionally, the composition and dynamics of the vaginal microbiota can be vary widely between women.1,2 For these reasons, longitudinal studies that collect multiple vaginal samples from the same woman over time are best suited to study vaginal microbial dynamics, and the relationships between the vaginal microbiota and sexual and reproductive health outcomes. However, the applicability of ARMA models to vaginal microbiota data has not been reported. Given that some species are known to promote stable microbial communities and those species tend to persist in the vaginal microbiota, while other species are associated with more dynamic communities with many transient members,1,2 ARMA models are a promising candidate for describing vaginal microbtiota composition.

The goal of this analysis is to explore several applications of ARMA models to vaginal microbiota composition data from a longitudinal study. Compared to many other areas of health research, this field is still quite young and there is no well-agreed-upon or standard approach to data analysis. As such, now is an optimal time to explore a range of different analysis approaches now in order to identify methods that perform well.


2. Data exploration


This analysis will work with two vaginal microbiota composition time series that were generated as part of a larger study of the impact of tampon use on the vaginal microbiota (not yet published). This study collected biweekly vaginal swab samples from 22 women over the course of four menstrual cycles. Bacterial DNA was extracted from the swabs, and then all 16S rRNA genes in the samples were amplified by polymerase chain reaction (PCR) and sequenced.17,18 The 16S rRNA gene is present in all bacteria, and it has enough sequence dissimilarity between species that its sequence can be used to identify the bacterial species present in a sample. Moreover, because there is one copy of the 16S rRNA gene in every bacterium, the number of sequences from a species in a sample after PCR and sequencing is proportional to the number of bacteria of the species that were present in the original sample.

This analysis will use a series of L. crispatus sequence counts, and a series of L. iners sequence counts from a single subject of the study discussed above. Each series contains 32 observations that were collected twice-weekly over 16 weeks. Below is a plot of the L. crispatus and L. iners sequence counts for this subject through the course of the study.


At first glance, both series resemble white noise. Upon closer inspection, it looks like there may be a slight downward trend in L. crispatus counts over the 16 weeks. In these series, each consecutive block of four weeks corresponds to a single menstrual cycle with menstruation at the end of the cycle. Considering this, it looks like there may be cyclic patterns in L. crispatus and L. iners counts. For menstrual cycles 1, 2, and 3, L. crispatus count drops at the end of the cycle during menstruation. For menstrual cycles 1, 3, and 4, L. iners count rises sharply at the end of the cycle during menstruation. This suggests some periodic behavior in Lactobacillus species counts, so it might be of interest to fit SARMA models for these series.


3. Modeling Lactobacillus species counts with ARMA models


We will start by fitting ARMA models to the L. crispatus and L. iners count series, followed by SARMA models, and finally by linear regression models with ARMA errors. Given the exploratory nature of this analysis, AIC values will be used to identify ARMA specifications for the two series that might suit the data well.

L. crispatus count AICs
MA0 MA1 MA2 MA3 MA4 MA5
AR0 623.53 619.47 611.56 612.75 614.55 616.31
AR1 616.31 618.30 612.72 614.02 615.81 616.08
AR2 618.29 618.79 614.52 615.47 617.47 612.22
AR3 612.76 614.63 616.63 617.42 619.18 613.49
AR4 614.62 616.63 616.86 617.01 612.35 614.24

L. iners count AICs
MA0 MA1 MA2 MA3 MA4 MA5
AR0 633.62 634.73 636.60 636.69 636.27 637.95
AR1 634.94 636.69 636.89 636.88 638.01 638.16
AR2 636.19 638.15 636.93 637.76 639.75 640.14
AR3 638.06 638.68 637.82 638.40 640.37 637.87
AR4 638.54 639.76 639.70 640.36 641.75 638.39


An MA(2) model looks to be most suitable for the L. crispatus counts. AICs suggest the L. iners counts may be best modeled as white noise, which is not surprising based on the plot above. However, because the goal of this analysis is to explore whether time series models are viable options for describing vaginal microbiota composition, an MA(1) model of L. iners counts will be considered (it has the second lowest AIC). It should be noted that the L. crispatus AIC tables suggest that there may be numerical instability in maximizing and evaluating the likelihood function (AIC increases by more than two units when a single parameter is added to the ARMA(2,2) model to make the ARMA(3,2) model, and when a single parameter is added to the AR(4) model to make the ARMA(4,1) model). We should bear this in mind throughout the analysis, and take a conservative approach to interpretting AIC comparisons, especially when the AIC difference between models is small.


3.1 MA(2) model of L. crispatus counts

We will start by fitting a stationary, Gaussian MA(2) model of L. crispatus counts under the null hypothesis of no trend: \[\begin{eqnarray} Y_n=\epsilon_n+\psi_1\epsilon_{n-1}+\psi_2\epsilon_{n-2}\end{eqnarray}\] \[\begin{eqnarray} \epsilon_{1:N}&\sim&\mathrm{ iid }\, N[0,\sigma^2]\end{eqnarray}\]

## 
## Call:
## arima(x = crisp_count, order = c(0, 0, 2))
## 
## Coefficients:
##          ma1     ma2  intercept
##       0.6364  0.6168   9676.645
## s.e.  0.1680  0.1450   1150.957
## 
## sigma^2 estimated as 8780695:  log likelihood = -301.78,  aic = 611.56


The MA polynomial’s root’s are \(\begin{eqnarray} -0.52+1.16i \end{eqnarray}\) and \(\begin{eqnarray} -0.52-1.16i \end{eqnarray}\). This model is not invertible, which limits our ability to examine the model’s residuals to identify model misspecifications. However, the two roots have exactly the same value, with the exception that the imaginary part of the MA(1) root is positive whereas it is negative for the MA(2) root. This indicates that the second MA term may be redundant with the first, so we will fit a stationary, Gaussian MA(1) model instead: \[\begin{eqnarray} Y_n=\epsilon_n+\psi_1\epsilon_{n-1}\end{eqnarray}\] \[\begin{eqnarray} \epsilon_{1:N}&\sim&\mathrm{ iid }\, N[0,\sigma^2]\end{eqnarray}\]

## 
## Call:
## arima(x = crisp_count, order = c(0, 0, 1))
## 
## Coefficients:
##          ma1  intercept
##       0.3419  9745.5623
## s.e.  0.1228   827.1543
## 
## sigma^2 estimated as 12351640:  log likelihood = -306.74,  aic = 619.47


The MA polynomial’s root is -2.92, and this model is invertible. Below is an ACF plot of the residuals of the MA(1) model of the L.crispatus counts that we will use to diagnose any model misspecifications:

There is some oscillatory behavior about 0, suggesting that the L. crispatus counts may be better-modeled with an AR polynomial in addition to the MA polynomial. While the L. crispatus AIC table suggests that the best model including an AR polynomial would be ARMA(1,2), we’ve already seen that there are issues with including multiple MA terms. Moreover, the series is only 32 observations long so we should be concerned about the possibility of over-parameterizing and may want to opt for more parsimonious models whenever reasonable. We will fit a stationary, Gaussian ARMA(1,1) model: \[\begin{eqnarray} Y_n=\phi_1Y_{n-1}+\epsilon_n+\psi_1\epsilon_{n-1}\end{eqnarray}\] \[\begin{eqnarray} \epsilon_{1:N}&\sim&\mathrm{ iid }\, N[0,\sigma^2]\end{eqnarray}\]

## 
## Call:
## arima(x = crisp_count, order = c(1, 0, 1))
## 
## Coefficients:
##          ar1      ma1  intercept
##       0.5188  -0.0158   9708.698
## s.e.  0.2193   0.2113   1168.237
## 
## sigma^2 estimated as 11125882:  log likelihood = -305.15,  aic = 618.3


The AR root is 1.82, and the MA root is -34.68, so we have a causal, invertible model of L. crispatus counts. Additionally, this ARMA(1,1) model’s AIC is smaller than that of the MA(1) model of the same series, suggesting the ARMA(1,1) model is better able to predict the L. crispatus counts. We can also compare the models using a likelihood ratio test with the null hypothesis that the series is better-modeled as MA(1) than ARMA(1,1): \[\begin{eqnarray} LRT=-2logL_0-2logL_1\end{eqnarray}\] \[\begin{eqnarray} LRT=-2logL_{MA(1)}-2logL_{ARMA(1,1)}=613.48-610.3=3.18\end{eqnarray}\] \[\begin{eqnarray} LRT&\sim&\chi^2_1\end{eqnarray}\]

This test gives p=0.045. We can reject the null hypothesis and conclude that the ARMA(1,1) model is more suitbale for the L. crispatus counts than the MA(1) model.


3.2 MA(1) model of L. iners counts

We’ll take the same approach for modeling the L. iners counts as we did for the L. crispatus counts, starting with a stationary, Gaussian MA(1) model suggested by AIC: \[\begin{eqnarray} Y_n=\epsilon_n+\psi_1\epsilon_{n-1}\end{eqnarray}\] \[\begin{eqnarray} \epsilon_{1:N}&\sim&\mathrm{ iid }\, N[0,\sigma^2]\end{eqnarray}\]

## 
## Call:
## arima(x = iners_count, order = c(0, 0, 1))
## 
## Coefficients:
##          ma1   intercept
##       0.1874  10524.4251
## s.e.  0.1850    934.8927
## 
## sigma^2 estimated as 19950850:  log likelihood = -314.36,  aic = 634.73


The MA(1) polynomial’s root is -5.34, and the model is invertible. The sample ACF plot of this model’s residuals (below) shows no oscillatory behavior, and there are no significant correlations between residuals at any lag. From this analysis, it looks like an MA(1) model is a viable option for the L. iners counts.


4. Modeling Lactobacillus species counts with SARMA models


Recall the cyclic behavior in the L. crispatus and L. iners counts where L. iners count rapidly increased at the end of the menstrual cycle during menstruation, while L. crispatus count decreased. Examining a smoothed periodogram of each series and identifying the dominant frequency can provide insight into SARMA model specifications that might capture these cyclic patterns well:

## L. crispatus dominant frequency: 0.125 cycles per week

## L. iners dominant frequency: 0.0625 cycles per week


The dominant frequency for the L. crispatus count series corresponds to a cycle time of 4 weeks, which matches our observations and aligns perfectly with 4 week menstrual cycles. The analysis of the L. crispatus counts will move forward with a SARMA model that has 4-week cycles. The dominant frequency for the L. iners count series corresponds to a cycle time of 8 weeks. This doesn’t quite match our observations, but it does align with a whole number of menstural cycles. However, given that the series is only 16 weeks long, and that 4 week cycles in vaginal microbiota composition are more biologically plausible than 8 week cycles, the analysis of the L. iners counts will also move forward with a SARMA model that has 4-week cycles, as opposed to 8 week cycles.


4.1 SARMA models of L. crispatus counts


We will model the L. crispatus count series with a few stationary, Gaussian SARMA(1,1)x(P,Q)8 models with simple specifications for the cycle polynomial: SAR(1) and/or SMA(1). Recall that observations were taken biweekly, which is why the cyclic period here is 8. The models are given by: \[\begin{eqnarray} Y_n=\phi_1Y_{n-1}+\Phi_1Y_{n-8}+...+\Phi_PY_{n-8P}+\epsilon_n+\psi_1\epsilon_{n-1}+\Psi_1\epsilon_{n-8}+...+\Psi_Q\epsilon_{n-8Q}\end{eqnarray}\] \[\begin{eqnarray} \epsilon_{1:N}&\sim&\mathrm{ iid }\, N[0,\sigma^2]\end{eqnarray}\]

## 
## Call:
## arima(x = crisp_count, order = c(1, 0, 1), seasonal = list(order = c(1, 0, 0), 
##     period = 8))
## 
## Coefficients:
##          ar1      ma1    sar1  intercept
##       0.5993  -0.1516  0.3358   9773.440
## s.e.  0.2275   0.2492  0.1848   1531.313
## 
## sigma^2 estimated as 9840908:  log likelihood = -303.66,  aic = 617.32
## 
## Call:
## arima(x = crisp_count, order = c(1, 0, 1), seasonal = list(order = c(0, 0, 1), 
##     period = 8))
## 
## Coefficients:
##          ar1      ma1    sma1  intercept
##       0.5731  -0.0999  0.2340   9738.890
## s.e.  0.2233   0.2322  0.1678   1364.937
## 
## sigma^2 estimated as 10311485:  log likelihood = -304.16,  aic = 618.32
## 
## Call:
## arima(x = crisp_count, order = c(1, 0, 1), seasonal = list(order = c(1, 0, 1), 
##     period = 8))
## 
## Coefficients:
##          ar1      ma1    sar1    sma1  intercept
##       0.6230  -0.1709  0.8389  -0.578   9847.754
## s.e.  0.2244   0.2449  0.3707   0.558   1799.912
## 
## sigma^2 estimated as 9071065:  log likelihood = -303.06,  aic = 618.13


AIC prefers the SARMA(1,1)x(1,0)8 model over the two other SARMA models for L. crispatus counts, but the difference in AIC values between these models is modest. Given that the SARMA(1,1)x(1,0)8 and SARMA(1,1)x(0,1)8 models are not nested and cannot be compared via likelihood ratio test, and the numerical instability in evaluating the L. crispatus likelihood function, we should consider these models to have similar predictive skill. On the other hand, the SARMA(1,1)x(1,0)8 and SARMA(1,1)x(1,1)8 models can be compared via likelihood ratio test with the null hypothesis that the data are better-modeled by the SARMA(1,1)x(1,0)8 model. The test statistic is 1.2 (\(\begin{eqnarray} &\sim&\chi^2_1\end{eqnarray}\)), giving p=0.2, so the addition of the cyclic MA(1) polynomial does not significantly improve the SARMA(1,1)x(1,0)8 model.


This analysis suggests that SARMA(1,1)x(1,0)8 may be the most suitable SARMA model specification to describe the L. crispatus counts, but is this model any better than the ARMA(1,1) model? According to AIC, the SARMA(1,1)x(1,0)8 model has somewhat better predictive skill than the ARMA(1,1) model. Again, we should interpret this AIC difference with caution. We can compare the two models via likelihood ratio test with the null hypothesis that the the ARMA(1,1) performs better. The test statistic is 2.98 (\(\begin{eqnarray} &\sim&\chi^2_1\end{eqnarray}\)), giving p=0.052, just at the edge of significance. While it is biologically plausible that a SARMA model acocunting for cyclic behavior of the vaginal microbiota should suit our data better than an ARMA model, we did not lay out a corresponding a priori hypotheses before starting this analysis. As such, we should avoid interpreting this p value as “boarderline” or “nearly” or “suggestive of” significance. Instead, the p value shows that this test was unable to provide evidence in favor of the SARMA model over the ARMA model. We can speculate that this may be an artifact of the length of our series, but we should use that speculation in designing future studies so that repeat analyses are better-powered, and not to flimsily interpret the LRT result.


4.2 SARMA models of L. iners counts


Again, we will take the same approach for the L. iners series as we did for the L. crispatus series. We will fit a few stationary, Gaussian SARMA(0,1)x(P,Q)8 models with simple specifications for the cycle polynomial: \[\begin{eqnarray} Y_n=\Phi_1Y_{n-8}+...+\Phi_PY_{n-8P}+\epsilon_n+\psi_1\epsilon_{n-1}+\Psi_1\epsilon_{n-8}+...+\Psi_Q\epsilon_{n-8Q}\end{eqnarray}\] \[\begin{eqnarray} \epsilon_{1:N}&\sim&\mathrm{ iid }\, N[0,\sigma^2]\end{eqnarray}\]

## 
## Call:
## arima(x = iners_count, order = c(0, 0, 1), seasonal = list(order = c(1, 0, 0), 
##     period = 8))
## 
## Coefficients:
##          ma1     sar1   intercept
##       0.1802  -0.0290  10503.2221
## s.e.  0.1965   0.2243    923.2286
## 
## sigma^2 estimated as 19937807:  log likelihood = -314.36,  aic = 636.71
## 
## Call:
## arima(x = iners_count, order = c(0, 0, 1), seasonal = list(order = c(0, 0, 1), 
##     period = 8))
## 
## Coefficients:
##          ma1     sma1   intercept
##       0.1829  -0.0187  10511.0218
## s.e.  0.1919   0.1791    927.1007
## 
## sigma^2 estimated as 19943374:  log likelihood = -314.36,  aic = 636.72
## 
## Call:
## arima(x = iners_count, order = c(0, 0, 1), seasonal = list(order = c(1, 0, 1), 
##     period = 8))
## 
## Coefficients:
##          ma1     sar1    sma1  intercept
##       0.1567  -0.6078  0.5247  10480.227
## s.e.  0.2134   1.0223  1.0704    878.592
## 
## sigma^2 estimated as 19759876:  log likelihood = -314.26,  aic = 638.53


The SARMA(0,1)x(1,0)8 model and SARMA(0,1)x(0,1)8 model have the same log likelihood, and the same number of parmaeters, so these two SARMA specifications are equally suitbale for the L. iners count series. AIC prefers either the SARMA(0,1)x(1,0)8 model or SARMA(0,1)x(0,1)8 model to the SARMA(0,1)x(1,1)8 model. We can compare both the SARMA(0,1)x(1,0)8 model and the SARMA(0,1)x(0,1)8 model to the SARMA(0,1)x(1,1)8 model via LRT with the null hypothesis that the nested model is preferred. The test statistic is 0.81 (\(\begin{eqnarray} &\sim&\chi^2_1\end{eqnarray}\)), giving p=0.81, and we conclude that modeling the L. iners counts with cyclic AR(1) and MA(1) polynomials is not better than modeling this series with either the cyclic AR(1) or the cyclic MA(1) polynomial alone.


Of the SARMA models explored here, SARMA(0,1)x(1,0)8 or SARMA(0,1)x(0,1)8 may be suitable for describing the L. iners count series. But are these models any better than the MA(1) model? The MA(1) model is preferred by AIC, suggesting it has more predictive skill than either SARMA model. However, all three of these models have the same log likelihood, meaning that the addition of the cyclic polynomial (AR(1) or MA(1)) did not significantly improve the model.

Thus far in the analysis, it looks like the L. crispatus and L. iners count series are best-modeled by ARMA(1,1) and MA(1) models, respectively. It is a bit unsatisfying that the SARMA models did not perform better than the ARMA models, but it would be of interest to perform a similar analysis with longer series. We will examine one more set of models that may be able to account for the changes in L. crispatus and L. iners counts that occur during menstruation.


5. Modeling Lactobacillus species counts as trend with ARMA errors


The study that the Lactobacillus species count series were drawn from also collected data on whether the woman was menstruating when a vaginal swab was collected. This provides another option to account for L. crispatus and L. iners count changes that we see during menstruation in our models.


5.1 Linear regression models of L. crispatus counts with ARMA errors


The last set of models we will fit are linear regressions of species counts on time and/or menstruation, with ARMA errors. The model of L. crispatus counts is given as: \[\begin{eqnarray} Y_n=\alpha+Z_n\beta+\epsilon_n\end{eqnarray}\] \[\begin{eqnarray} \alpha=E[Y_n]\end{eqnarray}\]
\(\begin{eqnarray} Z_n \end{eqnarray}\)is a matrix of covariates, including time and/or menses

\[\begin{eqnarray} \epsilon_n=\phi\epsilon_{n-1}+\omega_n+\psi\omega_{n-1}\end{eqnarray}\] \[\begin{eqnarray} \omega_{1:N}&\sim&\mathrm{ iid }\, N[0,\sigma^2]\end{eqnarray}\]

## 
## Call:
## arima(x = crisp_count, order = c(1, 0, 1), xreg = week)
## 
## Coefficients:
##          ar1     ma1  intercept       week
##       0.3892  0.0189  12801.536  -373.2387
## s.e.  0.2500  0.2306   1828.424   191.6687
## 
## sigma^2 estimated as 10126314:  log likelihood = -303.59,  aic = 617.17
## 
## Call:
## arima(x = crisp_count, order = c(1, 0, 1), xreg = mens)
## 
## Coefficients:
##          ar1      ma1  intercept       mens
##       0.5521  -0.1379   10793.03  -4085.990
## s.e.  0.2179   0.2177    1017.46   1280.168
## 
## sigma^2 estimated as 8489067:  log likelihood = -300.79,  aic = 611.57
## 
## Call:
## arima(x = crisp_count, order = c(1, 0, 1), xreg = week_mens)
## 
## Coefficients:
##          ar1      ma1  intercept       week       mens
##       0.4224  -0.0935  13074.696  -284.9156  -3861.851
## s.e.  0.2590   0.2439   1521.737   161.6426   1254.159
## 
## sigma^2 estimated as 7867454:  log likelihood = -299.52,  aic = 611.04



According to AIC, the model with both time and menses has the best predictive skill for L. crispatus counts. The model with both time and menses can be compared to the model with only time via a likelihood ratio test with the null hypothesis that model with only time as a predictor performs better. The test statistic is 8.14 (\(\begin{eqnarray} &\sim&\chi^2_1\end{eqnarray}\)), giving p=0.002, and we can conclude that the addition of menses as a predictor significantly improves the model. We can also compare the model with both time and menses to the model with only menses via a likelihood ratio test with the null hypothesis that model with only menses as a predictor performs better. The test statistic is 2.54 (\(\begin{eqnarray} &\sim&\chi^2_1\end{eqnarray}\)), giving p=0.07, so the addition of time as a predictor does not significantly improve the menses-only model. Given the result of this LRT and that the AIC of the menses-only model is only slightly higher than the menses and time model (611.57 v 611.04, respectively), it is reasonable to move forward with the more parimonious menses-only model.


A compsrison of AIC values favors the menses-only linear regression model with ARMA errors to the ARMA(1,1) and SARMA(1,1)x(1,0)8 models of L. crispatus counts. This suggests that linear models accounting for factors known to influence vaginal microbiota composition and including ARMA errors may be well-suited to describe vaginal microbiota species count data. This finding is not surprising, given that a substantial body of research has demonatrated that changes to the vaginal environment influence the composition of the vaginal microbiota.1,2,19 While (S)ARMA models that use prior values and errors to predict future values might not perform poorly for modeling vaginal microbiota species count data, we wouldn’t expect these models to be the optimal choice for a system that depends on external factors. However, we also wouldn’t expect standard regression models to be optimal. Some species are known to promote stable microbial communities, and those species tend to persist in the vaginal microbiota.1,2 Other species are associated with more dynamic communities with many transient members.1,2 This lends itself well to modeling species counts with (S)ARMA models. So regression models accounting for factors known to influence vaginal microbiota composition as well as ARMA errors seem like a logical way to account for the vaginal microbiota’s dependence on it’s prior states as well as vaginal environment perturbations.


5.2 Linear regression models of L. iners counts with MA errors


Finally, we will fit a similar set of models for the L. iners counts. The models are given as: \[\begin{eqnarray} Y_n=\alpha+Z_n\beta+\epsilon_n\end{eqnarray}\] \[\begin{eqnarray} \alpha=E[Y_n]\end{eqnarray}\]
\(\begin{eqnarray} Z_n \end{eqnarray}\)is a matrix of covariates, including time and/or menses

\[\begin{eqnarray} \epsilon_n=\omega_n+\psi\omega_{n-1}\end{eqnarray}\] \[\begin{eqnarray} \omega_{1:N}&\sim&\mathrm{ iid }\, N[0,\sigma^2]\end{eqnarray}\]

## 
## Call:
## arima(x = iners_count, order = c(0, 0, 1), xreg = week)
## 
## Coefficients:
##          ma1  intercept      week
##       0.1852  11188.462  -80.6407
## s.e.  0.1845   1891.437  199.3641
## 
## sigma^2 estimated as 19850009:  log likelihood = -314.28,  aic = 636.57
## 
## Call:
## arima(x = iners_count, order = c(0, 0, 1), xreg = mens)
## 
## Coefficients:
##          ma1  intercept       mens
##       0.1865  10512.939    44.3618
## s.e.  0.1898   1067.276  1993.7058
## 
## sigma^2 estimated as 19950760:  log likelihood = -314.36,  aic = 636.73
## 
## Call:
## arima(x = iners_count, order = c(0, 0, 1), xreg = week_mens)
## 
## Coefficients:
##          ma1  intercept      week       mens
##       0.1808  11165.754  -84.3527   208.8452
## s.e.  0.1914   1897.674  202.0168  2023.8691
## 
## sigma^2 estimated as 19844481:  log likelihood = -314.28,  aic = 638.56


The models of L. iners counts with either time or menses as predictors are both favored by AIC over the model with both time and menses. The time-only and time and menses models have the same log likelihood, so the addition of menses to the model does not improve the model. The LRT comparing the menses-only and time and menses model gives a test statistic of 0.16 (\(\begin{eqnarray} &\sim&\chi^2_1\end{eqnarray}\)), and p=0.92. Again, the addition of the second predictor does not improve the model. The time-only and menses-only models have similar AIC values (636.57 and 636.73, respectively), so it is hard to comment on which may describe the L. iners counts better.

Comparison of AIC values of the MA(1), SARMA(0,1)x(0,1)8, and linear regression models with MA errors favors the MA(1) model of L. iners counts. Unlike the L. crispatus counts, it looks like the simplest model we fit, the MA(1) model, is best-suited to describe the L. iners series. For reasons discussed above, this is rather surprising.


6. Conclusions


This analysis explored several applications of ARMA models to time series of vaginal microbiota composition data with the goal of assessing whether these time series models might be useful for describing and analyzing vaginal microbiota composition. In general, we can conclude that ARMA models are a viable option for describing vaginal microbiota composition. More specifically, after fitting a range of ARMA models, SARMA models, and linear regression models with ARMA errors to two series of Lactobacillus species counts, we have reached different conclusions for the different species. L. crispatus counts were best-modeled by linear regression with ARMA errors that accounts for menstruation. As discussed above, this model corresponds well with current knowledge of vaginal microbiota dynamics. For future studies, this model presents a nice opportunity for exploring a range of different factors that might influence vaginal microbiota composition: as long as data on such factors are collected along with vaginal microbiota samples, they can be incorporated into the linear regression model.

On the other hand, L. iners counts were best-modeled by an MA(1) model that did not account for menstruation. This is surprising given not only the current knowledge of vaginal microbiota dynamics, but also the cyclic behavior of L. iners counts observed in our inital exploration of the data. It is possible that the SARMA models and linear regression models with ARMA errors including menses as a predictor did not perform better due to the sharp decrease in L. iners count seen during menstruation of the second menstrual cycle (week 8). This change was in the opposite direction of the that of cycles 1, 3, and 4, and it diluted the series-long effect of menstruation on L. iners counts. As such, it would be of interest to explore similar models using a longer time series that spans more menstrual cycles and would have more power to detect menstruation-associated changes in species counts.

In addition to the short length of the series used, this analysis was also limited by the fact that it only explored the vaginal microbiota composition of a single woman (constrained by the size of the original study and missing data for other women in that study). As discussed in the introduction, vaginal microbiota composition and dynamics vary widely between different women.1,2 While we can draw conclusions about the application of ARMA models to describe a single woman’s vaginal microbiota composition, we cannot generalize our findings to other women without first replicating them in a larger analysis that includes data from many women. Considering the limitations of this analysis (really, the limitations of the original study’s dataset), future studies should span a larger number of menstrual cycles, enroll a larger number of women, and incorporate design features that minimize missing data.


7. References

  1. Ravel, J., et al. (2011). “Vaginal microbiome of reproductive-age women.” Proc Natl Acad Sci U S A 108(Supplement 1): 4680-4687.

  2. Gajer, P., et al. (2012). “Temporal Dynamics of the Human Vaginal Microbiota.” Sci. Transl. Med. 4(132): 132ra152-132ra152.

  3. Hickey, R. J., et al. (2012). “Understanding vaginal microbiome complexity from an ecological perspective.” Transl Res 160(4): 267-282.

  4. Borges, S., et al. (2014). “The role of lactobacilli and probiotics in maintaining vaginal health.” Arch Gynecol Obstet 289(3): 479-489.

  5. Brotman, R. M. (2011). “Vaginal microbiome and sexually transmitted infections: an epidemiologic perspective.” J. Clin. Invest. 121(12): 4610-4617.

  6. van Oostrum, N., et al. (2013). “Risks associated with bacterial vaginosis in infertility patients: a systematic review and meta-analysis.” Human Reproduction.

  7. Sharma, H., et al. (2014). “Microbiota and Pelvic Inflammatory Disease.” Seminars in Reproductive Medicine 32(1): 43-49.

  8. Nelson, D. B., et al. (2016). “The role of the bacterial microbiota on reproductive and pregnancy health.” Anaerobe 42: 67-73.

  9. Champer, M., et al. (2017). “The role of the vaginal microbiome in gynaecological cancer.” BJOG.

  10. van de Wijgert, J. H. H. M. and V. Jespers (2017). “The global health impact of vaginal dysbiosis.” Res. Microbiol.

  11. Lopes dos Santos Santiago, G., et al. (2011). “Longitudinal Study of the Dynamics of Vaginal Microflora during Two Consecutive Menstrual Cycles.” PLoS One 6(11): e28180.

  12. Hickey, R. J., et al. (2013). “Effects of tampons and menses on the composition and diversity of vaginal microbial communities over time.” BJOG: An International Journal of Obstetrics & Gynaecology 120(6): 695-706.

  13. Chaban, B., et al. (2014). “Characterization of the vaginal microbiota of healthy Canadian women through the menstrual cycle.” Microbiome 2(1): 1-12.

  14. Priestley, C. J., et al. (1997). “What is normal vaginal flora?” Genitourinary Medicine 73(1): 23-28.

  15. Eschenbach, D. A., et al. (2000). “Influence of the Normal Menstrual Cycle on Vaginal Tissue, Discharge, and Microflora.” Clinical Infectious Diseases 30(6): 901-907.

  16. Srinivasan, S., et al. (2010). “Temporal Variability of Human Vaginal Bacteria and Relationship with Bacterial Vaginosis.” PLoS One 5(4): e10197.

  17. Kozich, J. J., et al. (2013). “Development of a Dual-Index Sequencing Strategy and Curation Pipeline for Analyzing Amplicon Sequence Data on the MiSeq Illumina Sequencing Platform.” Applied and Environmental Microbiology 79(17): 5112-5120.

  18. Seekatz, A. M., et al. (2015). “Fecal Microbiota Transplantation Eliminates Clostridium difficile in a Murine Model of Relapsing Disease.” Infect. Immun. 83(10): 3838-3846.

  19. Bautista, C. T., et al. (2016). “Bacterial vaginosis: a synthesis of the literature on etiology, prevalence, risk factors, and relationship with chlamydia and gonorrhea infections.” Military Medical Research 3: 4.