I. Introduction

South Korea underwent one of the most dramatic episodes of industrialization and economic development in the world in just a few decades after the Korean War. At the same time, it is going through one of the most drastic changes in demographic structure in the world, which The New York Times has described as ‘South Korea’s Most Dangerous Enemy: Demographics’.

The total fertility rate (TFR) is defined as the average number of children that would be born per woman if all women lived to the end of their childbearing years and bore children according to the given age-specific fertility rates. In South Korea, the TFR was 1.05 in 2017, down from 1.17 in 2016. A TFR of about 2.1 is needed for population replacement, that is, to maintain a stable population. South Korea ranked 200th out of 200 countries in total fertility rate (World Bank), far below the OECD average TFR of 1.68.
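The definition above can be written compactly. As a sketch, using symbols introduced here for illustration rather than taken from the source: let \(f_a\) be the age-specific fertility rate (births per woman aged \(a\) in a given year), with the sum running over single years of age in the childbearing range, commonly taken as 15 to 49.

```latex
\[
\mathrm{TFR} = \sum_{a = 15}^{49} f_a
\]
```

If rates are reported for five-year age groups instead, each group's rate is multiplied by 5 before summing.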

A low fertility rate has far-reaching consequences. At this rate, the total population would start to decrease in the mid-2020s. The median age of the population more than doubled between 1975 and 2015, from 19.6 to 41.2 years. The economy is expected to weaken, and the young will bear a heavier burden in supporting the elderly population.

One of the factors driving the decline in fertility is a decrease in the number of marriages. Given that Korea has the lowest rate of out-of-wedlock births among 42 OECD countries (1.9%), a decrease in marriages could also signal that fewer couples will have children, contributing to the decline in TFR.

In this project, we explore the monthly fertility pattern of South Korea and its relationship with the monthly marriage rate among the younger generation, in order to understand the time-series pattern of the fertility data.

II. Data

For the proposed analysis, we use data retrieved from the Korean Statistical Information Service. I use three sets of data on

from January 2000 to November 2017. All datasets are in monthly format, corresponding to 215 months. We explore the data in the following section.

III. Visual exploration

We present time plots of the series. First, we look at the main variable of interest, the monthly number of births over time.

The data show a clear decreasing trend, as mentioned in the introduction, from more than 60,000 births at the beginning of the observation period to fewer than 30,000 in the last month. There also appear to be some monthly fluctuations.

We also look at the periodogram and ACF plot of the birth series. As can be seen from the periodogram on the left, there are peaks at various frequencies, notably at around 0.15, 0.25, 0.35, and 0.45 cycles per month, which correspond roughly to periods of 6 months, 4 months, 3 months, and 2 months, respectively. Consistent with the peaks in the periodogram, the ACF shows a peak at a lag of 12 months and somewhat higher autocorrelation at 6 months, as well as at the shorter lags of 2 and 3 months.
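The mapping between periodogram frequencies and periods used above is simply period = 1 / frequency when frequency is measured in cycles per month. The following is a minimal sketch of that reading, not the code used for the plots in this report. It assumes NumPy, and the series is a synthetic stand-in (an annual cycle plus a weaker semi-annual cycle), not the actual birth data; the function name `periodogram` is introduced here for illustration.

```python
import numpy as np

def periodogram(x):
    """Return (frequencies in cycles per month, raw periodogram) for a monthly series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                     # remove the mean so frequency 0 does not dominate
    n = len(x)
    freqs = np.fft.rfftfreq(n, d=1.0)    # d=1.0: one observation per month
    power = np.abs(np.fft.rfft(x)) ** 2 / n
    return freqs, power

# Synthetic stand-in for the real data (assumption): 18 full years of months,
# chosen so that 1/12 falls exactly on a Fourier frequency.
n = 216
t = np.arange(n)
series = 2.0 * np.cos(2 * np.pi * t / 12) + 1.0 * np.cos(2 * np.pi * t / 6)

freqs, power = periodogram(series)
peak = freqs[np.argmax(power)]
print(f"dominant frequency {peak:.4f} cycles/month, period {1 / peak:.1f} months")

# Exact periods implied by the approximate frequencies read off the plots.
for f in (0.08, 0.15, 0.25, 0.35, 0.45):
    print(f"frequency {f:.2f} -> period {1 / f:.1f} months")
```

Because the frequencies in the text are read off a plot, the implied periods are only approximate; harmonics of a 12-month cycle sit at exactly 1/12, 1/6, 1/4, 1/3, and 1/2 cycles per month.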

Next, we look at the marriage series. Note that this series covers all marriages, not only those among people in their 20s and 30s.

There seems to be a decreasing trend in the marriage series (roughly quadratic in shape, starting from around index 50). More distinctively, there are systematic spikes and dips that appear to recur every year. Looking at the data year by year, there is usually a peak in May and a dip around September. This could reflect a shared preference among Korean couples for the season in which to get married, and it suggests that a seasonal component might be suitable for the analysis.
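One simple way to check a claim such as "peak in May, dip in September" is to average the series by calendar month. The sketch below illustrates the idea under stated assumptions: the helper `month_of_year_means` is hypothetical, the series is assumed to start in January, and the values are synthetic placeholders in arbitrary units, not the actual marriage counts.

```python
import numpy as np

def month_of_year_means(x, start_month=1):
    """Average a monthly series by calendar month (1 = January, ..., 12 = December)."""
    x = np.asarray(x, dtype=float)
    months = (np.arange(len(x)) + start_month - 1) % 12 + 1
    return {m: x[months == m].mean() for m in range(1, 13)}

# Synthetic placeholder series (assumption): flat level with a May bump
# and a September dip, 215 months starting in January.
n = 215
month = np.arange(n) % 12 + 1
marriages = 100.0 + 20.0 * (month == 5) - 15.0 * (month == 9)

means = month_of_year_means(marriages)
peak_month = max(means, key=means.get)
dip_month = min(means, key=means.get)
print("peak month:", peak_month, "dip month:", dip_month)
```

On real data, the trend should be removed first so that early and late years do not distort the monthly averages.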

Again, the periodogram shows peaks at various frequencies. The peak at around 0.15, which corresponds to a 6-month pattern, is the strongest, followed by another peak at around 0.08, which corresponds to a period of 12 months. As expected, in the ACF we see a peak at 12 months and somewhat higher autocorrelation at the 6-month lag. The cyclical pattern in the ACF is consistent with the time plot and is possibly driven by shared seasonal preferences.

IV. Detrending data

We now return to the birth time series. As shown in the time plot of the birth series, there is a distinct decreasing trend with a roughly cubic shape. Hence, I fit a third-order polynomial to detrend the data.

Least squares model with cubic trend
Dependent variable: Birth
Term          Estimate         (Std. error)
index         -517.820***      (34.458)
I(index^2)    4.918***         (0.370)
I(index^3)    -0.014***        (0.001)
Constant      54,829.210***    (861.441)
Observations           215
R^2                    0.695
Adjusted R^2           0.691
Residual Std. Error    3,102.883 (df = 211)
F Statistic            160.525*** (df = 3; 211)
Note: *p<0.1; **p<0.05; ***p<0.01

We see that the coefficients of the third-order polynomial trend are highly significant. Next, we look at a plot to see how well the model captures the general trend.

The time plot overlaid with the regression fit shows that the third-order polynomial seems to fit the data well. Below, I show the residuals.

The time plot of the residuals looks much closer to mean stationary. However, as can be seen from the ACF plot and periodogram, the yearly pattern and its harmonics (at 12 months, 6 months, 4 months, etc.) remain, which motivates the application of SARMA models.
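The detrending step described in this section can be sketched as follows. This is a minimal illustration assuming NumPy, not the code behind the table above. The function name `cubic_detrend` and the index convention 1, ..., n are assumptions, and the input series is a stand-in generated from the rounded coefficients in the regression table plus a hypothetical 12-month cosine, not the actual birth data.

```python
import numpy as np

def cubic_detrend(y):
    """Fit y = b0 + b1*i + b2*i^2 + b3*i^3 by least squares; return (coef, fitted, residuals)."""
    y = np.asarray(y, dtype=float)
    index = np.arange(1, len(y) + 1, dtype=float)   # 1, 2, ..., n (assumed indexing)
    X = np.column_stack([np.ones_like(index), index, index**2, index**3])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    return coef, fitted, y - fitted

# Stand-in series (assumption): cubic trend built from the rounded table
# coefficients, plus a hypothetical seasonal swing of amplitude 2000.
n = 215
i = np.arange(1, n + 1, dtype=float)
trend = 54829.210 - 517.820 * i + 4.918 * i**2 - 0.014 * i**3
births = trend + 2000.0 * np.cos(2 * np.pi * i / 12)

coef, fitted, residuals = cubic_detrend(births)
print("residual mean:", residuals.mean())
print("residual std: ", residuals.std())
```

The residuals returned here play the role of the detrended series: the trend is removed, while the seasonal component is left in place for the seasonal model to capture.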

V. Model exploration: SARMA model

V.1 Model exploration

In this section, I explore SARMA models. For expositional simplicity, I restrict attention to SARMA \((p, q) \times (1, 1)\). First, we compute and compare the AIC values of models without marriage as a covariate.

AIC MA0 MA1 MA2 MA3 MA4
AR0 3914.522 3802.817 3770.836 3714.884 3706.041
AR1 3701.615 3698.721 3696.530 3691.975 3691.022
AR2 3697.960 3699.953 3696.418 3691.816 3692.841
AR3 3699.935 3699.334 3685.673 3690.619 3683.817

Now, we compare these values with the AIC values of models with marriage as a covariate.

AIC MA0 MA1 MA2 MA3 MA4
AR0 3905.382 3786.307 3740.213 3697.805 3685.219
AR1 3670.544 3671.346 3671.565 3670.727 3671.348
AR2 3671.140 3672.994 3660.116 3671.794 3673.287
AR3 3672.489 3659.967 3671.124 3672.099 3674.096

Comparing the two tables, we can see that every model that includes marriage as a covariate has a smaller AIC value than the corresponding model without it. Among the models with marriage as a covariate, SARMA \((3, 1) \times (1, 1)\) has the lowest AIC value, followed closely by SARMA \((2, 2) \times (1, 1)\). Because the two models are similar in complexity, we do not necessarily prefer one over the other. Moreover, the results are not meaningfully different, so I focus on SARMA \((3, 1) \times (1, 1)\).
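The comparison above can be checked mechanically. The sketch below copies the AIC values from the two tables into arrays and confirms the cell-by-cell comparison and the ranking. It assumes NumPy; the helper name `best_orders` is introduced here for illustration.

```python
import numpy as np

# AIC values copied from the two tables above.
# Rows are AR orders 0-3, columns are MA orders 0-4.
aic_without = np.array([
    [3914.522, 3802.817, 3770.836, 3714.884, 3706.041],
    [3701.615, 3698.721, 3696.530, 3691.975, 3691.022],
    [3697.960, 3699.953, 3696.418, 3691.816, 3692.841],
    [3699.935, 3699.334, 3685.673, 3690.619, 3683.817],
])
aic_with = np.array([
    [3905.382, 3786.307, 3740.213, 3697.805, 3685.219],
    [3670.544, 3671.346, 3671.565, 3670.727, 3671.348],
    [3671.140, 3672.994, 3660.116, 3671.794, 3673.287],
    [3672.489, 3659.967, 3671.124, 3672.099, 3674.096],
])

def best_orders(table, k=2):
    """Return the k lowest-AIC cells as (AR order, MA order, AIC) tuples."""
    flat = np.argsort(table, axis=None)[:k]
    rows, cols = np.unravel_index(flat, table.shape)
    return [(int(p), int(q), float(table[p, q])) for p, q in zip(rows, cols)]

print("marriage lowers AIC in every cell:", bool((aic_with < aic_without).all()))
print("two best models with marriage:", best_orders(aic_with))
print("best model without marriage:  ", best_orders(aic_without, k=1))
```

The gap between the two best models with the covariate is well under one AIC unit, which is why the choice between them is treated as a matter of convenience.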

SARMA (3, 1) x (1, 1) model
Dependent variable: birth
Term         Estimate         (Std. error)
ar1          1.296**          (0.654)
ar2          -0.224           (0.581)
ar3          -0.093           (0.090)
ma1          -0.441           (0.652)
sar1         0.990***         (0.005)
sma1         -0.551***        (0.069)
intercept    32,088.610       (27,624.550)
marriage     0.202***         (0.035)
Observations         215
Log Likelihood       -1,837.783
sigma^2              1,312,534.000
Akaike Inf. Crit.    3,693.566
Note: *p<0.1; **p<0.05; ***p<0.01

We see that the marriage coefficient is positive and significant, which is consistent with the hypothesis that the monthly number of marriages is related to the monthly number of births.
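The AIC reported in the table above can be reproduced from its log likelihood using AIC = -2 × log likelihood + 2 × (number of parameters). The sketch below assumes that the innovation variance sigma^2 is counted as a parameter alongside the eight listed coefficients; this is an assumption about how the reported value was computed, although it does reproduce the figure in the table.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: -2 * log likelihood + 2 * number of parameters."""
    return -2.0 * log_likelihood + 2.0 * n_params

log_lik = -1837.783   # log likelihood from the table
n_params = 9          # ar1, ar2, ar3, ma1, sar1, sma1, intercept, marriage, sigma^2 (assumed count)

print(round(aic(log_lik, n_params), 3))
```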

V.2 Model diagnostics

In this section, we carry out three model diagnostics: 1) a time plot of the residuals to check mean stationarity, 2) a Q-Q plot to check the normality assumption of the Gaussian white noise process, and 3) an autocorrelation plot of the residuals to check the independence assumption of the Gaussian white noise process assumed in the above model.
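The Gaussian white noise assumption being checked can be made explicit by writing out the model. The following is a sketch under stated assumptions: the seasonal period is taken to be 12 months (the text does not state it), the covariate enters as a regression term with SARMA errors, and all symbols are introduced here for illustration. \(Y_n\) denotes births in month \(n\), \(z_n\) the number of marriages, and \(B\) the backshift operator.

```latex
\[
\phi(B)\,\Phi(B^{12})\,\bigl(Y_n - \mu - \beta z_n\bigr)
  = \psi(B)\,\Psi(B^{12})\,\epsilon_n,
\qquad \epsilon_n \sim \text{iid } N(0, \sigma^2),
\]
\[
\phi(x) = 1 - \phi_1 x - \dots - \phi_p x^p, \quad
\psi(x) = 1 + \psi_1 x + \dots + \psi_q x^q, \quad
\Phi(x) = 1 - \Phi_1 x, \quad
\Psi(x) = 1 + \Psi_1 x .
\]
```

The three diagnostics target the three parts of the assumption on \(\epsilon_n\): zero mean, normality, and independence.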

We can see from the time plot of the residuals of the SARMA \((3, 1)\times(1,1)\) model that the mean appears stationary around 0. In the normal Q-Q plot, the tails are somewhat long on both the left and the right, but the points do not deviate significantly from the 45 degree line. Lastly, the residuals show no significant autocorrelation, which supports the white noise assumption.
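The autocorrelation check can be sketched numerically: under white noise, sample autocorrelations at nonzero lags should mostly fall within about ±1.96/√n. The code below is an illustration assuming NumPy, not the diagnostic code used in this report. The helper names are hypothetical, and the residuals are simulated Gaussian noise standing in for the actual model residuals.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation function at lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.sum(x * x)
    return np.array([np.sum(x[: len(x) - k] * x[k:]) / denom for k in range(max_lag + 1)])

def white_noise_band(n, z=1.96):
    """Approximate 95% band for the sample ACF of white noise of length n."""
    return z / np.sqrt(n)

# Placeholder residuals (assumption): simulated Gaussian noise, 215 observations.
rng = np.random.default_rng(0)
residuals = rng.normal(size=215)

acf = sample_acf(residuals, max_lag=24)
band = white_noise_band(len(residuals))
outside = [k for k in range(1, 25) if abs(acf[k]) > band]
print(f"band: +/-{band:.3f}; lags outside the band: {outside}")
```

Even for true white noise, roughly one lag in twenty is expected to fall outside the band, so an occasional exceedance is not by itself evidence against the model.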

VI. Conclusion

We find that there is a clear decreasing trend in the raw monthly births in Korea, and that there are clear seasonal components in the series. We use SARMA models and compare models with and without the monthly number of marriages as a covariate. Models with marriage as a covariate perform better, and among the explored models, SARMA \((3, 1) \times (1, 1)\) performs best according to AIC. Marriage was positively associated with the detrended births. Given the importance of the topic, it would be interesting to explore other factors, such as unemployment, and their role in explaining the time series of the number of births.


Sources