Introduction

Every year, the names US parents choose for their children seem to get more unusual and more complex. The decision of what to name your baby has become so difficult that some parents hire naming consultants to help them find unique names[1]. However, this phenomenon is mostly anecdotal and may be influenced by the number of celebrities choosing extremely unusual and complicated names for their children. In this project, we set out to explore how the complexity of US baby names has changed over time. Are names actually becoming more complex? Can we model how aspects of names change over time?

To answer these questions, we created three different time series, separated by gender, using data from the US Social Security Administration on first names given to babies born between 1880 and 2017[2]. Names given to fewer than five babies of the same gender in the same year are excluded; all others are included. As a result, we cannot observe trends in extremely unusual names; however, we can still observe general naming trends. The dataset gives us the number of babies given a particular name, divided by gender and year.

For each year and sex, we calculated three values aimed at capturing the complexity of names: the average length of a name, the average number of vowels in a name, and the average longest run of consonants in a name. These averages were weighted by name popularity: the denominator was the number of babies born that year, not the number of unique names given, so popular names carry more weight than unusual ones. The first two values are fairly self-explanatory, but the third may benefit from an example. For each name, we calculated the longest run of consecutive consonants. For example, the name “Isabella” has a longest consonant run of 2, from the “ll”. The names “Ava” and “Astrid” have longest consonant runs of 1 and 3, respectively.
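
As a sketch of how these quantities could be computed (our illustration using stringr; here “y” is treated as a consonant, and the exact implementation in our code may differ):

library(stringr)

# Longest run of consecutive consonants in a name ("y" treated as a consonant)
longest_consonant_run <- function(name) {
  pieces <- str_split(tolower(name), "[aeiou]+")[[1]]
  max(nchar(pieces))
}
longest_consonant_run("Isabella")  # 2 ("ll")
longest_consonant_run("Astrid")    # 3 ("str")

# Popularity-weighted average: values per unique name, counts = babies given that name
weighted_avg <- function(values, counts) sum(values * counts) / sum(counts)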

We chose these measures because they address the question of “complexity” while being fairly easy to calculate and intuitive to explain. “Complexity” can mean very different things when it comes to names. We would generally consider very long names to be complex, but short names with many consonants in a row might also count as “complex,” especially if the consecutive consonants make the name hard to pronounce. We initially got the idea to use average length and average vowel count from a data visualization we found[3] and came up with the idea for longest consonant run on our own.

We primarily focus on modeling these time series properly, with a secondary interest in forecasting what these trends might look like in the next few years.

Exploratory Data Analysis

We start by exploring what these time series look like visually. First, we plot the average length of the names of all babies born 1880-2017, separated by sex.

The plots look fairly similar for both sexes. There appears to be a general increase in the average length of a name, peaking around the 1990s.

Next, we plot the average number of vowels in the names of all babies, again separated by sex:

Here, the overall shape of the time series looks very similar for male and female babies. There might be an overall slight increase from 1880 to 2017, but there also appears to be a decrease around the 1960s. It is also worth noting that this plot is markedly different from the previous one. You might have initially expected that the average number of vowels would look similar to the average length of a name, since longer names likely require more vowels. However, this is not the case.

Next, we plot the average longest consonant run in the names of all babies, again separated by sex:

For both sexes, there appears to be a dramatic decrease in the average longest consonant run starting in the 1990s. Prior to that, there appears to be a slight increase for female babies while the values for male babies stay fairly constant. The peak here appears to occur around the same time as the peak for average length, which makes sense, since longer names allow for longer runs of consonants. However, the two graphs are still fairly different.

Calculating these values and plotting them required some additional sources, including documentation for the stringr and ggplot2 packages and a couple of ChatGPT queries[4–7].

Modeling

Average Name Length

We model the female and male series in the same way, since the plots do not suggest a substantial difference between them. Given the apparent non-linear trend, we first fit a quadratic trend to de-trend each series.

The graph below shows the original female and male name data versus the fitted values (dotted line) from the quadratic polynomial:
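
A minimal sketch of these trend fits, assuming ts_female and ts_male are the annual average-length series stored as ts objects (tslm is from the forecast package):

library(forecast)  # provides tslm()

# Quadratic trend fits for the female and male average-name-length series
fit_quad_female <- tslm(ts_female ~ trend + I(trend^2))
fit_quad_male   <- tslm(ts_male ~ trend + I(trend^2))
summary(fit_quad_female)
summary(fit_quad_male)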

## 
## Call:
## tslm(formula = ts_female ~ trend + I(trend^2))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.249907 -0.056198 -0.007336  0.064699  0.271726 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.353e+00  3.001e-02 178.346  < 2e-16 ***
## trend        1.079e-02  9.969e-04  10.825  < 2e-16 ***
## I(trend^2)  -3.322e-05  6.947e-06  -4.782 4.48e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1158 on 135 degrees of freedom
## Multiple R-squared:  0.8269, Adjusted R-squared:  0.8243 
## F-statistic: 322.5 on 2 and 135 DF,  p-value: < 2.2e-16
## 
## Call:
## tslm(formula = ts_male ~ trend + I(trend^2))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.16828 -0.05264 -0.00833  0.05745  0.16053 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.546e+00  2.098e-02 264.385  < 2e-16 ***
## trend        2.907e-03  6.967e-04   4.172 5.37e-05 ***
## I(trend^2)  -1.583e-06  4.855e-06  -0.326    0.745    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08095 on 135 degrees of freedom
## Multiple R-squared:  0.6413, Adjusted R-squared:  0.636 
## F-statistic: 120.7 on 2 and 135 DF,  p-value: < 2.2e-16

Since the p‐value for the I(trend^2) term is below 0.05 in the female trend model, we retain the quadratic term. In contrast, for the male data, the p‐value for I(trend^2) exceeds 0.05, so we discard the quadratic term, keep only the linear term, and refit the model.
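
Continuing the sketch above, the refit and the residuals we carry forward might look like this (object names are our own):

# Linear trend only for the male series; keep both sets of residuals for later
fit_lin_male <- tslm(ts_male ~ trend)
resid_female <- residuals(fit_quad_female)
resid_male   <- residuals(fit_lin_male)
summary(fit_lin_male)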

## 
## Call:
## tslm(formula = ts_male ~ trend)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.16601 -0.05021 -0.01047  0.05879  0.16044 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 5.5508995  0.0138115  401.90   <2e-16 ***
## trend       0.0026868  0.0001724   15.58   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08068 on 136 degrees of freedom
## Multiple R-squared:  0.641,  Adjusted R-squared:  0.6384 
## F-statistic: 242.8 on 1 and 136 DF,  p-value: < 2.2e-16

Next, we perform a spectral analysis on the residuals of the male and female trend models.
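
One way to obtain the two peak-frequency estimates reported below is to compare a smoothed periodogram with an AR-based spectrum whose order is chosen by AIC (a sketch; the smoothing spans here are an assumption):

# Smoothed periodogram vs. AIC-selected AR spectrum for the female residuals
spec_smooth <- spectrum(resid_female, spans = c(5, 5), plot = FALSE)
spec_ar     <- spectrum(resid_female, method = "ar", plot = FALSE)
cat("The peak frequency of female: ",
    spec_smooth$freq[which.max(spec_smooth$spec)], "\n")
cat("The peak frequency of female using AIC: ",
    spec_ar$freq[which.max(spec_ar$spec)], "\n")
# The male residuals are handled the same way with resid_male.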

## The peak frequency of female:  0.01388889

## The peak frequency of female using AIC:  0

## The peak frequency of male:  0.01388889

## The peak frequency of male using AIC:  0

The peak frequency in the smoothed periodogram is 0.01389 for both the female and male residuals, suggesting a period of \(T=1/\omega\approx72\) years. However, we only have 138 years of data, so within such a long 72-year cycle we could observe fewer than two complete ‘ups and downs’. Therefore, it is very difficult to verify this cycle reliably.

Inspired by a previous project[8], we also use AIC to select the best spectral estimator; the peak frequency is then 0 for both the male and female residuals, corresponding to an infinite period. Consequently, there is no evidence of periodic behavior in either the male or female residuals.

We next fit \(ARMA(p,q)\) models to the residuals from each trend model (quadratic for female, linear for male). For each case, we consider \(p \in \{0,\dots,5\}\) and \(q \in \{0,\dots,5\}\), and then calculate the AIC of every fitted model. Following the approach shown in class[9], we insert NA for any \((p,q)\) pair that fails to converge or emits a warning. We use arima2[10] to help avoid optimization pitfalls.

Female Residuals (Quadratic Trend)

The polynomial is \[ Y_n \;=\;\alpha \;+\;\beta_1\,X_n \;+\;\beta_2\,X_n^2 \;+\;\varepsilon_n, \] where we estimated \(\alpha = 5.353,\; \beta_1 = 0.01079,\; \beta_2 = -3.322\times 10^{-5}\). Here, \(\varepsilon_n\) is modeled by a Gaussian \(ARMA(p,q)\). The AIC values from a grid of \((p,q)\) reveal which ARMA specification best fits the female residuals.

Male Residuals (Linear Trend)

In contrast, for the male series we use a simpler model, \[ Y_n \;=\;\alpha \;+\;\beta_1\,X_n \;+\;\varepsilon_n, \] with \(\alpha=5.5509\) and \(\beta_1=0.0026868\). Again, \(\varepsilon_n\) follows an \(ARMA(p,q)\). We search over the same grid of \(p,q\) and record AIC values, then select the \((p,q)\) combination yielding the minimum AIC for the male residuals.

Below, we present the AIC tables for both female and male residuals. From these, we pick the ARMA model (i.e., the \((p,q)\) pair) with the lowest AIC in each case, and proceed with further analysis or diagnostics accordingly.
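
A sketch of how such a grid search could be implemented (the helper name aic_table and its exact structure are our own; the arima2::arima call matches the errors and warnings printed below):

library(arima2)

# AIC for ARMA(p, q) fits over a grid; NA recorded when fitting errors or warns
aic_table <- function(data, P = 5, Q = 5, D = 0) {
  tab <- matrix(NA, P + 1, Q + 1,
                dimnames = list(paste0("AR", 0:P), paste0("MA", 0:Q)))
  for (p in 0:P) {
    for (q in 0:Q) {
      tab[p + 1, q + 1] <- tryCatch(
        arima2::arima(data, order = c(p, D, q))$aic,
        error   = function(e) { print(c(p, q)); print(e); NA },
        warning = function(w) { print(c(p, q)); print(w); NA }
      )
    }
  }
  tab
}

aic_table(resid_female)
aic_table(resid_male)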

## [1] 1 0
## <simpleError in arima2::arima(data, order = c(p, D, q)): non-stationary AR part from CSS>
## [1] 3 1
## <simpleError in arima2::arima(data, order = c(p, D, q)): non-stationary AR part from CSS>
## [1] 3 4
## <simpleError in arima2::arima(data, order = c(p, D, q)): non-stationary AR part from CSS>
## [1] 4 2
## <simpleError in arima2::arima(data, order = c(p, D, q)): non-stationary AR part from CSS>
## [1] 5 3
## <simpleError in arima2::arima(data, order = c(p, D, q)): non-stationary AR part from CSS>
## [1] 5 5
## <simpleWarning in arima2::arima(data, order = c(p, D, q)): possible convergence problem: optim gave code = 1>
## [1] 4 3
## <simpleWarning in arima2::arima(data, order = c(p, D, q)): possible convergence problem: optim gave code = 1>
## [1] 4 5
## <simpleWarning in arima2::arima(data, order = c(p, D, q)): possible convergence problem: optim gave code = 1>
## [1] 5 4
## <simpleWarning in arima2::arima(data, order = c(p, D, q)): possible convergence problem: optim gave code = 1>
AIC Table for Detrended Female Average Name Length Data
MA0 MA1 MA2 MA3 MA4 MA5
AR0 -202.3654 -371.8514 -516.7669 -602.9879 -665.3339 -708.3743
AR1 NA -790.1621 -810.0762 -819.6741 -823.5416 -823.1413
AR2 -837.5079 -840.3016 -841.4369 -841.6460 -841.4593 -842.3246
AR3 -838.2594 NA -845.9431 -841.7014 NA -842.9953
AR4 -839.8502 -845.8986 NA -844.9930 -844.3773 -844.6008
AR5 -839.1964 -844.5868 -844.9854 NA -848.2574 NA
AIC Table for Detrended Male Average Name Length Data
MA0 MA1 MA2 MA3 MA4 MA5
AR0 -301.1391 -471.6996 -588.2962 -665.5443 -721.4425 -750.9901
AR1 -790.0772 -823.1253 -823.6537 -838.1213 -837.0358 -836.4054
AR2 -836.0624 -846.1305 -850.9766 -848.9777 -854.2596 -852.2854
AR3 -839.1522 -851.7414 -850.6629 -848.3101 -841.1471 -852.8120
AR4 -847.2840 -846.1720 -849.5344 NA -853.0144 NA
AR5 -845.5073 -852.0398 -852.7397 -851.5648 NA -850.8473

We avoid models with \(p+q\geq5\) (to reduce the risk of overfitting), and thus take the \(\text{ARMA}(2,2)\) model (AIC: -841.4369) for the female residuals and the \(\text{ARMA}(3,1)\) model (AIC: -851.7414) for the male residuals from the results above.

After checking the plots of the inverse roots, we find that for the female model all of the points lie inside the unit circle (some are close to the boundary, but we still consider them inside), whereas for the male \(\text{ARMA}(3,1)\) model some of the inverse roots lie on the boundary of the unit circle. The same is true for an \(\text{ARMA}(2,2)\) model for the male residuals.

Therefore, we instead take the \(\text{ARMA}(4,0)\) model (AIC: -847.2840) for the male residuals, since all of its inverse roots lie inside the unit circle.

The \(\text{ARMA}(2,2)\) model for the female residuals and the \(\text{ARMA}(4,0)\) model for the male residuals are both consistent with the auto.arima function from the forecast package[11]. Consequently, we take these two models as our final models for the female and male residuals.
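
The final fits and the inverse-root checks can be reproduced along these lines (a sketch; Arima, autoplot, and auto.arima are from the forecast package, and resid_female and resid_male are the detrended residuals from the earlier sketches):

library(forecast)
library(ggplot2)

# Final residual models; inverse AR/MA roots should lie inside the unit circle
fit_female <- Arima(resid_female, order = c(2, 0, 2), include.mean = FALSE)
fit_male   <- Arima(resid_male, order = c(4, 0, 0), include.mean = FALSE)
autoplot(fit_female)
autoplot(fit_male)
auto.arima(resid_female)  # also arrives at ARMA(2,2) with zero mean
auto.arima(resid_male)    # also arrives at ARMA(4,0) with zero mean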

## Series: resid_female 
## ARIMA(2,0,2) with zero mean 
## 
## Coefficients:
##          ar1      ar2      ma1      ma2
##       1.9166  -0.9304  -0.3339  -0.1777
## s.e.  0.0457   0.0451   0.1031   0.1029
## 
## sigma^2 = 0.0001191:  log likelihood = 426.71
## AIC=-843.41   AICc=-842.96   BIC=-828.78
## 
## Training set error measures:
##                         ME       RMSE         MAE      MPE     MAPE      MASE
## Training set -0.0002391276 0.01075398 0.008409678 3.633532 29.66217 0.6163374
##                     ACF1
## Training set 0.002760282
## Series: resid_male 
## ARIMA(4,0,0) with zero mean 
## 
## Coefficients:
##          ar1      ar2     ar3     ar4
##       1.3719  -0.3196  0.1884  -0.270
## s.e.  0.0825   0.1434  0.1430   0.083
## 
## sigma^2 = 0.000115:  log likelihood = 429.63
## AIC=-849.26   AICc=-848.81   BIC=-834.63
## 
## Training set error measures:
##                         ME       RMSE         MAE      MPE     MAPE      MASE
## Training set -0.0002973579 0.01056926 0.008024159 43.43996 79.35546 0.7858266
##                     ACF1
## Training set 0.006559035

Next, we plot the forecasted values for the next 20 years:
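
A sketch of how the 20-year point forecasts can be assembled, combining the extrapolated trend with the ARMA forecast of the residuals (object names continue the earlier sketches):

h <- 20
# Point forecast = extrapolated trend + ARMA forecast of the detrended residuals
fc_female <- forecast(fit_quad_female, h = h)$mean +
  as.numeric(forecast(fit_female, h = h)$mean)
fc_male <- forecast(fit_lin_male, h = h)$mean +
  as.numeric(forecast(fit_male, h = h)$mean)
plot(ts_female, xlim = c(1880, 2037), ylab = "Average name length (female)")
lines(fc_female, col = "red", lty = 2)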

From the plot, we can see that both male and female curves show a long-run rising tendency, short-term fluctuations, and a modest upward forecast beyond the last observed year.

Finally, we use the checkresiduals function to check whether we are overfitting the data or leaving out useful information.
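
For example, with the fitted objects from the earlier sketch:

checkresiduals(fit_female)  # time plot, ACF, histogram, and Ljung-Box test
checkresiduals(fit_male)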

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(2,0,2) with zero mean
## Q* = 4.6382, df = 6, p-value = 0.591
## 
## Model df: 4.   Total lags used: 10

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(4,0,0) with zero mean
## Q* = 11.902, df = 6, p-value = 0.0642
## 
## Model df: 4.   Total lags used: 10

Female Residuals from \(\text{ARMA}(2,2)\)

  • The residuals fluctuate around zero without any visible trend or large shifts over time. No obvious clustering of positive or negative errors.
  • Nearly all autocorrelations fall within the \(\pm2\) SE bounds (blue dashed lines), suggesting no strong leftover serial correlation.
  • The residual distribution appears roughly symmetric and bell‐shaped, indicating approximate normality.
  • Since p-value=0.591 in the Ljung-Box test[12], we fail to reject the null of “no autocorrelation,” implying the residuals are effectively white noise.

Therefore, the \(\text{ARMA}(2,2)\) model for the female series exhibits no major signs of leftover structure in the residuals, and they appear approximately normal. This indicates a good fit.

Male Residuals from \(\text{ARMA}(4,0)\)

  • Also mean‐reverting around zero, with no major pattern. Slightly more spiky at certain points, but nothing glaring.
  • Most lags are within the \(\pm2\) SE bounds, though a couple of marginal lags exist. Still, no glaring sign of strong correlation.
  • The residual distribution is near-normal, perhaps slightly skewed in the tails, but not too severe.
  • Since p-value=0.0642 in the Ljung-Box test, we fail to reject the null of “no autocorrelation,” implying the residuals are effectively white noise.

Therefore, the \(\text{ARMA}(4,0)\) fit for the male series also has residuals that mostly behave like white noise, although the Ljung–Box test’s p‐value is a bit borderline (0.0642). Overall, it suggests adequate fit.

Average Number of Vowels

Data Preparation and Quadratic Trend

For both females and males, the average vowel count fluctuates across the decades. There is a trough around the 1960s, followed by an increasing trend. The quadratic trend lines capture a broad “U-shaped” curve: the fitted line goes down and then back up.

Spectral Analysis

## The peak frequency of female:  0.01388889

## The peak frequency of female using AIC:  0

## The peak frequency of male:  0.01388889

## The peak frequency of male using AIC:  0

The peak frequency in the smoothed periodogram is 0.01389 for both the female and male residuals, suggesting a period of \(T=1/\omega\approx72\) years. However, we only have 138 years of data, so within such a long 72-year cycle we could observe fewer than two complete ‘ups and downs’. Therefore, it is very difficult to verify this cycle reliably.

Inspired by a previous project[8], we also use AIC to select the best spectral estimator; the peak frequency is then 0 for both the male and female residuals, corresponding to an infinite period. Consequently, there is no evidence of periodic behavior in either the male or female residuals.

ARMA Modeling of the Residuals

## [1] 2 2
## <simpleError in arima2::arima(data, order = c(p, D, q)): non-stationary AR part from CSS>
## [1] 3 3
## <simpleWarning in arima2::arima(data, order = c(p, D, q)): possible convergence problem: optim gave code = 1>
## [1] 5 3
## <simpleWarning in arima2::arima(data, order = c(p, D, q)): possible convergence problem: optim gave code = 1>
## [1] 5 2
## <simpleWarning in arima2::arima(data, order = c(p, D, q)): possible convergence problem: optim gave code = 1>
## [1] 5 3
## <simpleWarning in arima2::arima(data, order = c(p, D, q)): possible convergence problem: optim gave code = 1>
## [1] 5 4
## <simpleWarning in arima2::arima(data, order = c(p, D, q)): possible convergence problem: optim gave code = 1>
AIC Table for Detrended Female Average Vowel Count Data
MA0 MA1 MA2 MA3 MA4 MA5
AR0 -414.4337 -588.1207 -707.7614 -799.1073 -846.2668 -888.164
AR1 -948.3300 -981.6739 -985.0694 -994.3402 -998.6961 -1003.380
AR2 -994.0831 -1004.7086 NA -1005.7327 -1017.0563 -1002.682
AR3 -995.7844 -1013.7940 -1014.9779 NA -1015.6202 -1001.276
AR4 -1006.3510 -1002.7724 -1014.6662 -1015.6868 -1013.2826 -1015.474
AR5 -1004.7832 -1013.9299 -1014.7716 NA -1011.3572 -1013.394
AIC Table for Detrended Male Average Vowel Count Data
MA0 MA1 MA2 MA3 MA4 MA5
AR0 -434.1258 -614.3156 -744.3656 -850.9247 -905.7891 -958.0142
AR1 -989.0490 -1047.2033 -1068.4619 -1080.1783 -1082.1505 -1084.7668
AR2 -1094.8552 -1100.4863 -1098.7943 -1097.2726 -1094.9278 -1097.1212
AR3 -1099.1433 -1098.9816 -1102.7222 -1100.7354 -1098.0376 -1098.0724
AR4 -1098.0043 -1102.7931 -1100.8088 -1097.1156 -1100.9648 -1099.1804
AR5 -1096.2229 -1100.8132 NA NA NA -1093.1298

Based on the AIC tables, an ARMA(2,4) model appears to be a good fit for the female vowel series, while the male series is fit well by an ARMA(2,1) model.

Each circle plot shows the inverse roots of the AR and MA components for the selected ARMA model. For a causal and invertible model, the inverse AR and MA roots should lie inside the unit circle.

Female (ARMA(2,4)): several inverse roots lie close to the boundary of the unit circle but remain inside it, so the model is stationary and invertible, though only marginally so. Male (ARMA(2,1)): there are fewer points (because of the lower AR and MA orders), and they lie well within the unit circle, indicating a causal and invertible fit.

Forecasting the Future

Both female and male curves show an upward trend in the forecast region, suggesting that vowel usage might continue to increase. The magnitude of the increase is driven by the shape of the quadratic trend fit and the persistence of the ARMA residual patterns.

Average Longest Run of Consonants

We fit models for female and male names separately. From the plots, the time series of the average longest consonant run for female names appears to have a non-linear trend, so we start by fitting a quadratic polynomial to the female data to de-trend it. The graph below shows the original female data versus the fitted values (dotted line) from the quadratic polynomial:

Next, we can plot the ACF of the residuals:

# Residuals from the female trend model and their sample autocorrelation
resid_female <- residuals(lm_trend_female)
acf(resid_female, main = "Female: ACF of Residuals")

The ACF appears to exhibit some periodic behavior. However, since our data are annual rather than monthly or quarterly, we thought fitting a SARMA model would be a poor choice; there is no reason for naming patterns to exhibit seasonality.

We can also perform a spectral analysis on the residuals of the female series:

## The peak frequency of female:  0.02083333

## The peak frequency of female using AIC:  0

For the female data, the smoothed periodogram shows a mild multi-decade “peak” (\(T=1/\omega\approx48\) years), but the parametric AR approach[8] concludes that the main power is still near frequency 0. This discrepancy often happens when the peak in the smoothed periodogram is subtle, possibly mixing with leftover trend or covering only 2-3 cycles in the entire 138-year record; the AR method lumps that into a broad low-frequency band (peak at 0). Therefore, we treat the female series as having no cycle.

Now, we fit an \(ARMA(p,q)\) model to the residuals from the quadratic polynomial. To choose \(p\) and \(q\), we consider \(p \in \{0,\dots,5\}\) and \(q \in \{0,\dots,5\}\) and calculate the AIC for each resulting model. We use a modified version of the code shown in class[9] that inserts an NA value for any \((p,q)\) combination that produces an error or warning. Note that we are using arima2[10] to help avoid optimization issues. Essentially, we are fitting:

\[ Y_n = \alpha + \beta_1 X_n + \beta_2 X_n^2 + \epsilon_n \] Here, \(\alpha\), \(\beta_1\), and \(\beta_2\) are obtained from the regression model and are 1.749, 0.004247, and -0.00002508, respectively, and \(\epsilon_n\) is a Gaussian ARMA process. The table below displays the AIC values for this ARMA model across different \((p,q)\) combinations.

AIC Table for Detrended Female Max Consonant Run Data
MA0 MA1 MA2 MA3 MA4 MA5
AR0 -636.9663 -793.5698 -880.9617 -944.4379 -966.6744 -989.191
AR1 -1005.3556 -1026.3617 -1027.1444 -1028.6102 -1028.6497 -1034.389
AR2 -1032.6775 -1034.8926 -1042.6991 -1032.7871 -1031.3053 -1032.512
AR3 -1032.2641 -1042.9623 -1040.9765 -1038.8217 -1037.5865 -1030.522
AR4 -1034.1478 -1032.8214 -1039.5211 -1037.8585 NA -1028.567
AR5 -1032.7562 -1030.8427 -1037.8649 -1044.7639 NA NA

In this particular case, the NAs are generated by convergence warnings. As a result, we avoid models with \(p, q \geq 4\). The table suggests that the best model is an \(ARMA(3,1)\) model. However, if we plot the inverse roots of this model, we notice that some of them lie on the boundary of the unit circle:

The same is true for an \(ARMA(2,2)\) model. However, if we try an \(ARMA(2,1)\) model, which also had a fairly small AIC, we no longer have this issue:

Interestingly, an \(ARMA(2,1)\) model is also what the auto.arima function from the forecast package[11] suggests. Given that this model does not have the causality and invertibility problems of models with larger \(p\) and \(q\), we choose this \(ARMA(2,1)\) model, fit to the residuals of the quadratic polynomial, as our final model.

Next, we plot the forecasted values for the next 20 years:

The 20 year forecast shows a dramatic decrease in the average longest consonant run of female baby names. This is likely because our model is unable to extrapolate well, which is not surprising.

The quadratic fit used to de-trend the data indicates a large drop after 2017. However, it is unlikely that this downward trend will continue for 20 years. We suspect that the average longest consonant run might exhibit cyclical behavior beyond what we are able to observe with these 138 data points.

After discovering this, we attempted to use a LOESS smoother to de-trend the data instead. The plot below displays the observed values with the LOESS predictions. However, the LOESS fit looks very similar to the quadratic fit and exhibits the same decreasing behavior in the 2000s. Because LOESS does not extrapolate beyond the observed data, we cannot produce forecasts from it, so we choose to stick with the quadratic trend with ARMA(2,1) errors. Interestingly, the best model for the residuals from the LOESS smoother is also an ARMA(2,1), reinforcing this decision.
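
A LOESS de-trend along these lines could be produced as follows (a sketch; consonant_female is a hypothetical name for the female average-longest-consonant-run series, and the default span is used, which may differ from our actual choice):

# LOESS smoother as an alternative de-trending of the female consonant-run series
yrs <- 1880:2017
loess_fit <- loess(as.numeric(consonant_female) ~ yrs)  # default span; actual choice may differ
plot(yrs, as.numeric(consonant_female), type = "l", col = "darkcyan",
     xlab = "Year", ylab = "Average longest consonant run (female)")
lines(yrs, fitted(loess_fit), lty = 2)
resid_loess <- as.numeric(consonant_female) - fitted(loess_fit)  # residuals for ARMA fitting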

Ultimately, we need more years of data to say anything concrete about how the average longest consonant run in female baby names changes. It is interesting that both de-trending approaches treated the two separate peaks, and the valley between them, as noise; this would suggest a time series with only one peak in 138 years, which might indeed be what is happening.

Next, we fit a model to the average longest consonant run in male baby names. We start by fitting a cubic polynomial to de-trend the data.

After that, we perform a spectral analysis on the residuals of the male series.

## The peak frequency of male:  0.03472222

## The peak frequency of male using AIC:  0.03607214

For the male data, both methods point to a multi-decade cycle (\(T=1/\omega\approx28\) years). This is a more consistent result, suggesting there might be a real moderate-period fluctuation in the male longest-consonant-run measure.

Next, we fit an ARMA model to the residuals generated from this cubic polynomial.

AIC Table for Detrended Male Max Consonant Run Data
MA0 MA1 MA2 MA3 MA4 MA5
AR0 -778.5777 -940.7923 -1006.910 -1069.421 -1095.604 -1118.239
AR1 -1111.9092 -1131.6017 -1134.175 -1147.533 -1145.713 -1157.511
AR2 -1142.3752 -1126.3538 -1171.314 -1171.287 -1171.654 -1170.959
AR3 -1151.5376 -1144.3317 -1177.526 -1178.143 -1176.143 -1174.173
AR4 -1166.6604 -1163.6115 -1178.038 -1176.143 -1175.728 -1181.855
AR5 -1165.7344 -1169.2923 -1176.112 -1175.786 -1182.608 -1180.930

The AIC table suggests an ARMA(5,4) model, so we look at the inverse roots of that model:

This model has many AR and MA inverse roots right on the boundary, so it is probably not the best choice. We investigated these plots for various other combinations and settled on an \(ARMA(4,0)\) model, whose inverse roots are not on the boundary of the unit circle:

This looks much better, so we choose the ARMA(4,0) model fit to the residuals of the cubic polynomial as our best model. The plot below shows the original data (in teal) with the fitted values (dotted black) and a forecast for the next 20 years (dotted red).

Like the forecasts for the female names, these predict that the average longest run of consonants will continue to decrease. This is likely because the cubic fit to the data indicates a decrease. In general, the average longest consonant run is probably not a reliable quantity to forecast. We include these forecasts for exploratory purposes, but caution that they should not be considered accurate.

Ultimately, both of our analyses for the average maximum consonant run indicate a single peak occurring around the 1970s. We suspect the current downward trend will have to plateau or turn around eventually, but do not have data to back this up. In particular, we suspect that trends around the number of consonants occurring in a row might have a cyclical pattern not observable in the data set we have. There do not appear to be cycles of smaller frequencies.

Discussion

Our study of US baby names over 138 years reveals notable trends in naming complexity, with increases in both average name length and vowel count, suggesting a shift toward more elaborate names, while the longest consonant run exhibits a peak around the 1970s before declining. Using polynomial trends and ARMA modeling, we found that name length trends are best captured with quadratic (for females) and linear (for males) models, while vowel counts display a broad “U-shaped” pattern with no strong cyclical component. In contrast, the longest consonant run metric showed moderate evidence of a 28-year cycle for male names but remained inconclusive for female names. Spectral analysis confirmed that most fluctuations were dominated by long-term trends rather than periodic behavior.

Forecasting results indicated continued increases in name length and vowel usage, but a sharp projected decline in consonant sequences, raising concerns about the reliability of extrapolating cultural patterns from historical data. These findings suggest that naming complexity evolves in different ways across various linguistic dimensions, potentially influenced by cultural and phonetic preferences. Despite the insights gained, our study is limited by the dataset’s historical scope and the challenges of modeling sociolinguistic trends. Future research could incorporate demographic and linguistic factors to refine predictive models, exploring how societal influences shape naming conventions over time.

Limitations

Despite having data from 1880 to 2017, we likely needed more data points. We suspect that cycles of naming behavior may be very long, as naming patterns change slowly. It might also have been useful to see whether additional covariates, such as other data on linguistic behaviors and the demographics of babies being born, would have helped us build a better model.

Scholarship

We chose a very different subject for our project than previous projects. Many past projects focused on economic or environmental data; we took a different route and chose a more socially motivated time series. This may have contributed to some of our difficulties, as data arising in the environment or in financial markets generally follow more recognizable patterns and may also change more rapidly, whereas naming patterns change very slowly. We also chose to look at multiple related time series rather than a single one. Our goal in doing so was to address the “complexity” of names without having to choose a single definition of “complexity”. We also thought it would be interesting to compare the results of the different time series. In this sense, our project approaches its research question differently from most previous projects, which set out to answer a single question about a single time series. While we believe our choice led to a more interesting project, it did mean that we had less time to devote to model fitting and diagnostics.

Like many of the past projects, we primarily used ARMA models and looked at AIC tables to determine the best model. We got the idea to look at the invertibility of the roots visually using the plot functionality available in arima and arima2 from one of the past projects[13]. Unlike many of the projects, we fit a polynomial trend to detrend our data before using ARMA methods. We observed this in one other project[14], and found one other that used LOESS smoothing[15], but it did not seem to be a common choice. However, for our data it was necessary. We also did some forecasting, which did not appear in many of the past projects. Ultimately, this was not very useful. We suspect that our dataset does not extrapolate well, but it was interesting to see what it predicted.

Furthermore, we used arima2 for some of our models, whereas most past midterm projects used arima. This likely did not matter for our conclusions, but it is a useful example of how arima2 can be used successfully and fairly seamlessly in a time series analysis. We did have to use arima models for the forecasting, but we used arima2 for the modeling process of the longest consonant run time series. Other than the forecasting, there were no points where arima2 was more difficult to use than arima, and we may have avoided some optimization issues.

References

1. Have We Reached Peak Baby Name? - The New York Times. (n.d.). Retrieved February 16, 2025, from https://www.nytimes.com/2024/06/15/style/baby-name-meme-tiktok.html
2. Wickham, H. (2021). Babynames: US baby names 1880-2017. https://CRAN.R-project.org/package=babynames
3. Baby names and culture. (2020). In Joshua Ashkinaze. https://www.joshash.space/data-science/names-analysis
4. Wickham, H. (2023). Stringr: Simple, consistent wrappers for common string operations. https://CRAN.R-project.org/package=stringr
5. OpenAI. (2025). chatGPT. Note: chatGPT was asked "what’s the easiest way to count the number of vowels in a string". This was used to calculate the average number of vowels in a name. https://chatgpt.com/
6. OpenAI. (2025). chatGPT. Note: chatGPT was asked "what’s the easiest way to get the longest run of consonants in a string". The given answer was not particularly useful, but motivated the use of ‘sapply‘ and ‘str_split‘ which was how the longest consonant run was ultimately calculated. https://chatgpt.com/
7. Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org
8. Midterm project: Analysis of BTC and DOGE. (2024). https://ionides.github.io/531w24/midterm_project/project02/blinded.html
9. Ionides, E. (2025). Notes for STATS/DATASCI 531, Modeling and Analysis of Time Series Data. https://ionides.github.io/531w25/
10. Wheeler, J., McAllister, N., & Sylvertooth, D. (2024). arima2: Likelihood based inference for ARIMA modeling. https://CRAN.R-project.org/package=arima2
11. Hyndman, R. J., & Khandakar, Y. (2008). Automatic time series forecasting: The forecast package for R. Journal of Statistical Software, 27(3), 1–22. https://doi.org/10.18637/jss.v027.i03
12. Ljung, G. M., & Box, G. E. P. (1978). On a measure of lack of fit in time series models. Biometrika, 65(2), 297–303. https://doi.org/10.1093/biomet/65.2.297
13. Anonymous. (n.d.). Midterm project: Analysis of BTC and DOGE. STATS 531 Winter 2024. https://ionides.github.io/531w24/midterm_project/project02/blinded.html
14. Anonymous. (n.d.). Investigating trends in household electricity consumption. STATS 531 Winter 2024. https://ionides.github.io/531w24/midterm_project/project12/blinded.html
15. Anonymous. (n.d.). Statistical modeling and analysis of california wildfires time series. STATS 531 Winter 2022. https://ionides.github.io/531w22/midterm_project/project16/blinded.html