
Introduction

Youtuber is a rising occupation. Youtubers are people who “upload, produce, or appear in videos on the video-sharing website YouTube”\(^{[1]}\). They are popular among young people and children, and some of them earn a lot of money. As a result, youtuber is one of the most desired careers for young people, even more desired than actor\(^{[2][3]}\).

Youtubers make money through ads, brand co-operations and patronage from viewers\(^{[4]}\). All of these profits depend on popularity. Therefore, a common measure of how successful a youtuber is, is the number of subscribers they have. Subscribers sign up to receive email notifications of the youtuber’s newly uploaded videos. People can subscribe for free, and can unsubscribe at any time.

For a youtuber, a critical question is how fast they can attract subscribers, which determines how much money they can earn. In this report, we analyze the time series of the number of subscribers and the estimated earnings over the most recent month for two youtubers. One is pewdiepie, the youtuber with the most subscribers, with an 8-year-long YouTube life\(^{[5]}\). The other is megamogwai, a player of the Gwent card game, who uploaded his first video in 2014\(^{[6]}\). We fit models to their statistics to see how profitable a career as a youtuber is, and whether there is any difference between well-known and less famous youtubers.

Method

Data

The two datasets were retrieved from socialblade.com\(^{[7][8]}\) on March 4, 2018. The datasets contain the statistics of pewdiepie (“pew” for short) and megamogwai (“mog” for short) from February 3rd to March 3rd.

For each dataset, the “Date” column represents the date of the record. The “Tot” column gives the total number of subscribers of the channel. The “New” column is the difference in subscriber number between that day and the previous day. The “High” column is the highest estimate of the earnings for the day given by socialblade.com. For pewdiepie, the unit of the High column is 1 thousand dollars. Each dataset contains 29 rows because the website only releases the most recent 30 days.

Models

The basic models we use are ARIMA(p,1,q) models for the Tot series. Because the New series is the first difference of the Tot series, this amounts to fitting ARMA(p,q) models to the New series. A seasonality of 7 days is also considered for the New series. For the High series, we consider ARMA models with a trend on Tot.

ARMA models\(^{[9]}\): \[ Y_n = \phi_1 Y_{n-1}+\phi_2Y_{n-2}+\dots+\phi_pY_{n-p} + \epsilon_n +\psi_1 \epsilon_{n-1} +\dots+\psi_q\epsilon_{n-q} \]

SARMA models\(^{[10]}\): \[ \phi(B)\Phi(B^i)(Y_n-\mu)=\psi(B)\Psi(B^i)\epsilon_n \] In this report we will use \(\Psi(B^i)=(1-B^7)\).

ARIMA(p,1,q) models\(^{[10]}\): \[ \phi(B)((1-B)Y_n-\mu)=\psi(B)\epsilon_n \]

ARMA model with trend\(^{[10]}\): \[ \phi(B)(Y_n-\mu-\beta t_n)=\psi(B)\epsilon_n \]
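As a rough sketch of how the ARMA and ARIMA forms above map onto R's `arima()` function (the function that appears in the output later in this report), the following fits both to simulated data. The simulated series, the variable names and the use of a drift regressor to supply the \(\mu\) term are illustrative assumptions, not the report's data or its exact code.

```r
# Illustrative only: simulated data stand in for the New and Tot columns.
set.seed(1)
new <- 20 + as.numeric(arima.sim(model = list(ma = 0.5), n = 200, sd = 3))
tot <- cumsum(new)  # Tot is the running sum of New

# ARMA(0,1) with a mean, fitted to the differenced totals.
fit_new <- arima(diff(tot), order = c(0, 0, 1))

# ARIMA(0,1,1) on the totals. arima() ignores the mean when d > 0, so the
# mu term of the ARIMA formula is supplied here as a drift regressor
# (an assumption: one common way to do it, not necessarily the report's).
drift   <- cbind(drift = seq_along(tot))
fit_tot <- arima(tot, order = c(0, 1, 1), xreg = drift)

mu_hat    <- coef(fit_new)[["intercept"]]
drift_hat <- coef(fit_tot)[["drift"]]
print(c(mu_hat = mu_hat, drift_hat = drift_hat))
```

Under these assumptions the two estimates should be close to each other, which is the sense in which an ARMA(p,q) model for New corresponds to an ARIMA(p,1,q) model for Tot.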

Analyses

Analysis 0 - plotting data

Here we plot the New, Tot and High series of pewdiepie against time. The channel is accumulating around 20-40 thousand subscribers each day, and the total number of subscribers is consistently increasing. The estimated earnings take a high value on the first day (Feb 3rd), and after that the value fluctuates around 30.

We also plot the autocorrelation function of the three series. The New series and the High series show little autocorrelation, while the Tot series shows high autocorrelation.

For megamogwai, the plots of the three series are shown below. The New series may be stationary, while the High series does not seem stationary.

The autocorrelation functions are also plotted below. The New series may be appropriate for an AR(2) model. The Tot series and the High series show high autocorrelation.

Analysis 1a - New subscriber for pewdiepie

The new subscriber series is the difference of the total subscriber series. Therefore, it is reasonable to fit an ARMA(p,q) model to the new subscriber series, which corresponds to an ARIMA(p,1,q) model for the total subscriber series. A seasonality of 7 is also considered, as viewers are likely to follow a weekly pattern.

  1. ARMA models

We tried different values of p and q, and compared the AICs to select the most appropriate ARMA model for the New series of pewdiepie. The AICs of the models are shown below.

##          MA0      MA1      MA2      MA3      MA4      MA5
## AR0 599.0053 600.8855 599.8155 598.1012 599.3494 601.0823
## AR1 600.8422 601.2517 600.5002 599.6932 601.3045 602.4116
## AR2 601.8560 602.6333 599.8704 600.8031 600.5499 602.2059
## AR3 602.3043 604.3007 599.5415 601.5388 600.0891 604.1272
## AR4 604.2965 606.2815 601.5363 603.2913 601.5354 606.8807

The table supports ARMA(0,3), which has the lowest AIC (598.10), and the simpler ARMA(0,0), whose AIC is close (599.01). To choose between them, we check the residuals of each model by plotting their autocorrelation functions and q-q plots.

From the q-q plots, the residuals of the ARMA(0,3) model are closer to normal. Therefore, an ARMA(0,3) model is more appropriate for the New subscriber series of pewdiepie: \(Y_n=\mu+\epsilon_n+\psi_1\epsilon_{n-1}+\psi_2\epsilon_{n-2}+\psi_3\epsilon_{n-3},\ \epsilon_i\sim N(0,\sigma^2)\).
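The selection procedure described above can be sketched in R roughly as follows. This is a minimal illustration on simulated data: the helper name `aic_table`, the simulated series and the chosen maximum orders are hypothetical, and the report's own code may differ.

```r
# Hypothetical helper: AIC of every ARMA(p,q) with p <= P and q <= Q.
aic_table <- function(x, P, Q) {
  tab <- matrix(NA_real_, P + 1, Q + 1,
                dimnames = list(paste0("AR", 0:P), paste0("MA", 0:Q)))
  for (p in 0:P) {
    for (q in 0:Q) {
      # Some orders may fail on short series; such cells stay NA.
      fit <- tryCatch(arima(x, order = c(p, 0, q)), error = function(e) NULL)
      if (!is.null(fit)) tab[p + 1, q + 1] <- fit$aic
    }
  }
  tab
}

# Simulated stand-in for a New series (not the report's data).
set.seed(2)
x <- 27 + as.numeric(arima.sim(model = list(ma = c(0.1, 0.7, 0.5)),
                               n = 100, sd = 6))

tab <- aic_table(x, P = 2, Q = 3)
print(round(tab, 2))

# Residual checks for one candidate, here MA(3).
fit    <- arima(x, order = c(0, 0, 3))
res    <- resid(fit)
r_acf  <- acf(res, plot = FALSE)       # drop plot = FALSE to draw the ACF
qq     <- qqnorm(res, plot.it = FALSE) # drop plot.it = FALSE to draw the q-q plot
qq_cor <- cor(qq$x, qq$y)              # near 1 when residuals look normal
print(qq_cor)
```

The q-q correlation is only a crude numeric stand-in for inspecting the plot; with 29 observations, as in this report, visual inspection is probably the more honest check.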

  1. SARMA model

If we add seasonality of a week (7 days) to the ARMA(0,3) model, the AIC is 598.67, larger than the AIC of the ARMA(0,3) model (598.10). Therefore, the ARMA(0,3) model is preferred.
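The report does not show how the weekly term was specified. As one possible sketch, assuming a single seasonal AR term at lag 7 and using simulated data in place of the New series, the comparison could look like this in R:

```r
# Simulated stand-in for a New series (not the report's data).
set.seed(3)
x <- 27 + as.numeric(arima.sim(model = list(ma = c(0.1, 0.7, 0.5)),
                               n = 100, sd = 6))

fit_arma <- arima(x, order = c(0, 0, 3))

# Assumed seasonal part: one seasonal AR coefficient with period 7.
fit_sarma <- arima(x, order = c(0, 0, 3),
                   seasonal = list(order = c(1, 0, 0), period = 7))

aics <- c(ARMA = fit_arma$aic, SARMA = fit_sarma$aic)
print(aics)

# The seasonal model has one extra parameter, so its AIC is lower only
# if it raises the log likelihood by more than 1.
print(names(which.min(aics)))
```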

  1. Result
## 
## Call:
## arima(x = pew$New, order = c(0, 0, 3))
## 
## Coefficients:
##          ma1     ma2     ma3  intercept
##       0.1067  0.6932  0.4856  27132.002
## s.e.  0.1791  0.2683  0.2257   2385.553
## 
## sigma^2 estimated as 33711284:  log likelihood = -294.05,  aic = 598.1

The model for the New subscriber number is \[ Y_n=27132+\epsilon_n+0.1067\epsilon_{n-1}+0.6932\epsilon_{n-2}+0.4856\epsilon_{n-3},\ \epsilon_i\sim N(0,3.37\times10^7) \]

Therefore, for the Total subscriber number, the model should be ARIMA(0,1,3), \[ (1-B)Y_n-27132=\epsilon_n+0.1067\epsilon_{n-1}+0.6932\epsilon_{n-2}+0.4856\epsilon_{n-3},\ \epsilon_i\sim N(0,3.37\times10^7) \]
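As a quick check on the scale of this model (a worked calculation from the fitted coefficients and the \(\sigma^2\) in the R output, not an additional result from the data), the implied marginal variance of the daily New count is \[ \mathrm{Var}(Y_n)=\sigma^2(1+\psi_1^2+\psi_2^2+\psi_3^2)=33711284\times(1+0.1067^2+0.6932^2+0.4856^2)\approx 5.82\times10^7 \] This corresponds to a standard deviation of about \(7.6\times10^3\) around a mean of about \(2.71\times10^4\), which is in line with the 20-40 thousand new subscribers per day seen in the plot.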

Analysis 1b - New subscriber for megamogwai

  1. ARMA models
##          MA0      MA1      MA2      MA3      MA4      MA5
## AR0 248.5160 248.7624 244.2462 243.2443 241.1519 241.2138
## AR1 246.7610 243.7192 242.7630 244.6596 241.3954 243.0618
## AR2 240.4580 242.4562 244.0515 245.2088 243.2505 245.0617
## AR3 242.4575 244.3637 246.0124 244.6930 244.4809 241.7261
## AR4 243.5090 245.4992 240.7020 243.3447 240.5695 242.1071

The table supports ARMA(2,0), which is consistent with the autocorrelation plot of the New series. The autocorrelation function and q-q plot of the residuals of the model are plotted below.

The model is \[ Y_n=\mu+\epsilon_n+\phi_1(Y_{n-1}-\mu)+\phi_2(Y_{n-2}-\mu),\ \epsilon_i\sim N(0,\sigma^2) \]

  1. SARMA model

If we add seasonality of a week (7 days) to the ARMA(2,0) model, the AIC is 602.15, much larger than the AIC of the ARMA(2,0) model (240.46). Therefore, the ARMA(2,0) model is preferred.

  1. Result
## 
## Call:
## arima(x = mog$New, order = c(2, 0, 0))
## 
## Coefficients:
##          ar1     ar2  intercept
##       0.1518  0.5613    18.2128
## s.e.  0.1697  0.1733     7.6623
## 
## sigma^2 estimated as 172:  log likelihood = -116.23,  aic = 240.46

The model for the New subscriber number is \[ Y_n=18.2128+\epsilon_n+0.1518(Y_{n-1}-18.2128)+0.5613(Y_{n-2}-18.2128),\ \epsilon_i\sim N(0,172) \] Therefore, for the Total subscriber number, the model should be ARIMA(2,1,0), \[ (1-0.1518B-0.5613B^2)((1-B)Y_n-18.2128)=\epsilon_n,\ \epsilon_i\sim N(0,172) \]
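A short check, using the standard stationarity conditions for an AR(2) process with the fitted coefficients, suggests that this model is stationary: \[ \phi_1+\phi_2=0.1518+0.5613=0.7131<1,\quad \phi_2-\phi_1=0.5613-0.1518=0.4095<1,\quad |\phi_2|=0.5613<1 \] All three conditions hold, so the fitted AR(2) model has a well-defined long-run mean of about 18.2 new subscribers per day.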

Analysis 2a - estimated earnings for pewdiepie

We fit ARMA models to the High series, and then consider the corresponding SARMA model and a model with a trend on the Tot series.

  1. ARMA model
##          MA0      MA1      MA2      MA3      MA4      MA5
## AR0 265.8046 263.0651 261.3072 261.4652 261.0617 261.3177
## AR1 260.3878 261.6517 262.8562 264.6369 261.8364 263.7854
## AR2 261.1330 261.6112 264.7229 266.6248 263.2380 264.7707
## AR3 262.8641 262.9209 264.1135 261.5365 263.3427 264.1098
## AR4 264.8266 264.7608 263.2371 263.4160 266.4129 265.9778

From the table we can see that the most appropriate model is ARMA(1,0).

  1. SARMA model

If we add seasonality of a week (7 days) to the ARMA(1,0) model, the AIC is 259.67, smaller than the AIC of the ARMA(1,0) model (260.39). Therefore, the ARMA(1,0) model with a seasonality of 7 is preferred.

  1. ARMA model with trend

If we add a trend on the total number of subscribers to the ARMA(1,0) model, the AIC is 259.43, smaller than the AICs of both the ARMA(1,0) model (260.39) and the seasonal model (259.67). Therefore, the ARMA(1,0) model with trend by Tot is preferred.

  1. Result

The analysis shows that the most appropriate model for the estimated income series is the ARMA(1,0) model with a trend on the total subscriber number, which is represented as \(Y_n-\mu-\beta\,Tot_n=\phi_1(Y_{n-1}-\mu-\beta\,Tot_{n-1})+\epsilon_n\).

## 
## Call:
## arima(x = pew$High, order = c(1, 0, 0), xreg = pew$Tot/1000)
## 
## Coefficients:
##          ar1  intercept  pew$Tot/1000
##       0.8478   6775.169       -0.1105
## s.e.  0.1375   4996.055        0.0819
## 
## sigma^2 estimated as 326.5:  log likelihood = -125.71,  aic = 259.43

The model turns out to be \[ Y_n-6775.169+0.1105\times \frac{Tot_n}{1000}=0.8478\left(Y_{n-1}-6775.169+0.1105\times \frac{Tot_{n-1}}{1000}\right)+\epsilon_n \]
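In R, `arima()` with an `xreg` argument fits a regression with ARMA errors, so the AR(1) structure applies to the series after the intercept and the covariate term have been subtracted. The following sketch illustrates this on simulated data; the series `tot` and `high`, the true coefficients and the sample size are all made up for the illustration.

```r
# Simulated stand-ins (not the report's data).
set.seed(4)
n    <- 300
tot  <- cumsum(rnorm(n, mean = 1, sd = 0.2))                  # covariate
eta  <- as.numeric(arima.sim(model = list(ar = 0.7), n = n, sd = 1))
high <- 5 + 0.3 * tot + eta                                   # regression + AR(1) errors

fit <- arima(high, order = c(1, 0, 0), xreg = cbind(tot = tot))
print(round(coef(fit), 3))

# Regression errors: the series minus intercept and covariate term.
err <- high - coef(fit)[["intercept"]] - coef(fit)[["tot"]] * tot

# Applying the fitted AR(1) filter to these errors gives the innovations.
innov <- err[-1] - coef(fit)[["ar1"]] * err[-n]
print(max(abs(innov - resid(fit)[-1])))
```

The last line compares the hand-computed innovations with the residuals reported by `arima()` from the second observation onward.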

Analysis 2b - estimated earnings for megamogwai

  1. ARMA model
## Warning in arima(data, order = c(p, 0, q)): possible convergence problem:
## optim gave code = 1
##          MA0      MA1      MA2      MA3      MA4      MA5
## AR0 266.8208 253.0080 252.1245 246.1005 247.3468 248.2836
## AR1 242.4631 243.6227 245.5042 246.5178 247.4774 248.2271
## AR2 243.6119 245.5869 246.0124 247.2272 249.1390 251.0127
## AR3 245.5329 245.5263 247.4684 249.1792 251.1280 252.9654
## AR4 246.4184 247.4185 246.2557 250.6716 251.6828 253.8320

From the table we can see that the most appropriate model is ARMA(1,0).

  1. SARMA model

If we add seasonality of a week (7 days) to the ARMA(1,0) model, the AIC is 242.93, larger than the AIC of the ARMA(1,0) model (242.46). Therefore, the ARMA(1,0) model without seasonality is preferred.

  1. ARMA model with trend

If we add a trend on the total number of subscribers to the ARMA(1,0) model, the AIC is 242.01, smaller than the AICs of both the ARMA(1,0) model (242.46) and the seasonal model (242.93). Therefore, the ARMA(1,0) model with trend by Tot is preferred.

  1. Result

The analysis shows that the most appropriate model for the estimated income series is the ARMA(1,0) with trend by total number model.

## 
## Call:
## arima(x = mog$High, order = c(1, 0, 0), xreg = mog$Tot)
## 
## Coefficients:
##          ar1  intercept  mog$Tot
##       0.6941  -4204.906   0.0955
## s.e.  0.1345   2333.946   0.0525
## 
## sigma^2 estimated as 182.9:  log likelihood = -117,  aic = 242.01

The model turns out to be \[ Y_n+4204.906-0.0955\times Tot_n=0.6941(Y_{n-1}+4204.906-0.0955\times Tot_{n-1})+\epsilon_n \]

Discussion

Estimated earnings

For the estimated earnings, both the data of pewdiepie and the data of megamogwai can be fitted by an AR(1) model with a trend on the total subscriber number. This suggests that both previous income and total subscriber number play an important role in the earnings of a youtuber. It is noticeable that the coefficient of the total subscriber number for pewdiepie is negative. This may result from the outlier on the first day of the income data. If we leave the first record out and fit the model again, we obtain the following fit.

## 
## Call:
## arima(x = pew$High[2:29], order = c(1, 0, 0), xreg = pew$Tot[2:29]/1000)
## 
## Coefficients:
##          ar1  intercept  pew$Tot[2:29]/1000
##       0.5302  -589.7633              0.0102
## s.e.  0.2268   884.3049              0.0145
## 
## sigma^2 estimated as 51.78:  log likelihood = -95.15,  aic = 198.31

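The new model has a lower AIC (198.31) than the model in Result (259.43), although the two are fitted to different numbers of observations, so the AICs are not directly comparable. It can be represented as \[ Y_n+589.7633-0.0102\times \frac{Tot_n}{1000}=0.5302\left(Y_{n-1}+589.7633-0.0102\times \frac{Tot_{n-1}}{1000}\right)+\epsilon_n \] The coefficient of Tot is now positive, although it is small relative to its standard error (0.0102 with s.e. 0.0145). This is consistent with the view that more subscribers generally mean more income.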

Subscriber number

For the number of new subscribers, the data of pewdiepie and the data of megamogwai show two different patterns. An MA(3) model is appropriate for pewdiepie, who is famous and productive. An AR(2) model is appropriate for megamogwai, who is a more typical youtuber. It is surprising to see pewdiepie’s subscriber number increase at such a steady pace. For typical youtubers like megamogwai, the subscriber number is still increasing, but at a relatively low speed. Because an AR model carries past values forward, it is crucial for new youtubers to gain a relatively high number of subscribers with their first several videos, or with one extremely popular video that stays popular for more than two days in a row. In contrast, according to the MA model, already famous youtubers do not need one or two popular videos to attract new subscribers; their number of subscribers keeps increasing around a steady daily mean.
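The contrast between the two fitted models can be illustrated by how long a one-off shock to New persists. The sketch below uses `ARMAtoMA()` from R's stats package with the coefficient estimates reported in the Result sections; the variable names are chosen only for this illustration.

```r
# Response of New at lags 1..10 to a one-off unit shock,
# using the fitted coefficients from the Result sections.
psi_pew <- ARMAtoMA(ma = c(0.1067, 0.6932, 0.4856), lag.max = 10)
psi_mog <- ARMAtoMA(ar = c(0.1518, 0.5613), lag.max = 10)

print(round(rbind(pewdiepie_MA3 = psi_pew, megamogwai_AR2 = psi_mog), 4))
```

For the MA(3) model the response is exactly zero after lag 3, while for the AR(2) model it stays positive and decays gradually, which is the property the argument above relies on.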

Reference

[1] https://en.oxforddictionaries.com/definition/youtuber

[2] http://metro.co.uk/2018/01/19/children-now-more-likely-to-want-to-become-youtubers-than-actors-7241396/

[3] https://www.tubefilter.com/2017/05/24/most-desired-career-young-people-youtube/

[4] http://www.bbc.co.uk/newsbeat/article/42395224/evan-edinger-the-five-ways-youtubers-make-money

[5] https://www.youtube.com/user/PewDiePie/featured

[6] https://www.youtube.com/user/MegaaMogwai

[7] https://socialblade.com/youtube/user/pewdiepie/monthly, retrieved March 4, 2018

[8] https://socialblade.com/youtube/user/megaamogwai/monthly, retrieved March 4, 2018

[9] https://ionides.github.io/531w18/04/notes04.html

[10] https://ionides.github.io/531w18/06/notes06.html