Introduction

Massive Open Online Courses at the University of Michigan

Massive Open Online Courses, or MOOCs, are large, low- or no-stakes digital courses, typically offered by institutions of higher education on platforms such as edX or Coursera. As of 2017, an estimated 58 million students have registered for or participated in at least one MOOC (Shah 2016). The University of Michigan was one of four founding partner institutions for Coursera, and the university is a leading provider of MOOCs on the platform: over 6.5 million students have signed up for a Coursera course offered by the University of Michigan (Thomas 2018).

The Coursera platform launched publicly in 2012, a year widely hailed as the “Year of the MOOC” due to the overwhelming enthusiasm and controversy generated by the launch of both Coursera and edX, a competing platform (Pappano 2012). Since that time, the University of Michigan (along with many other institutions) has launched hundreds of courses on the platform, with topics ranging from computer programming to finance and science fiction literature. These offerings have continued to expand, and courses on the platform are now offered in repeating, sometimes overlapping sessions. The University of Michigan, again along with several other institutions, has constructed programs where students can use MOOC courses for credit in actual graduate and undergraduate degree programs: for example, the University of Michigan offers several “MicroMasters” programs in topics such as User Experience Design, and the university recently announced its first fully-online degrees through Coursera (Thomas 2018). We might expect these trends to contribute to increases in MOOC enrollments.

However, it has been well-documented that the overwhelming majority of students who register for a MOOC fail to complete it, with dropout rates of over 90% being not uncommon (Jordan 2014, 2015). Furthermore, several other informal learning opportunities have emerged since the inception of the major MOOC platforms in 2012, such as interactive programming environments as well as smaller, shorter, more unstructured courses such as those offered by Udemy or LinkedIn Learning, which offer credentials directly geared toward job seekers. Finally, there is also the possibility that the “market” for MOOCs has simply become saturated: the learners most interested in taking the MOOCs on offer have already completed them. One might suspect that these trends would collectively drive MOOC participation downward.

If enrollment is waning over time, this would suggest that interest is declining and that the market for MOOCs may be saturated. However, if enrollment is consistent or increasing, this might suggest that the University of Michigan should continue to invest in its MOOC offerings, or that its current offerings are effective in generating student signups.

Finally, there are several additional questions beyond the basic question of an upward or downward trend in enrollment over time. More detailed questions about user signups can inform the University’s ability to allocate course resources most effectively. These include, most importantly, understanding periodicity and seasonality in MOOC signups, which the university can use to determine how to allocate scarce course resources (such as support staff who assist students). Furthermore, beyond the specific question of whether there is seasonality (e.g. weekly, annual) in learner signups, we might ask the broader question of whether there is any pattern in course enrollments. This would not only assist the University of Michigan in understanding MOOC participation, but could also reveal patterns of interest for MOOC research over time.

Research Questions

Based on the analysis in the previous section, this investigation explores four main research questions:

  • RQ1: How has student enrollment in MOOCs changed over the period from 2014-11-11 to 2018-03-03? Is the long-term trend in student enrollment increasing, decreasing, stable, or showing some other pattern?
  • RQ2: Does MOOC registration at the University of Michigan follow periodic patterns, such as weekly or annual seasonality? Which effects tend to dominate?
  • RQ3: Which models most effectively model the historic demand for MOOCs, and what can more detailed inspection of models tell us about the structure of the demand for MOOCs?
  • RQ4: Does demand for MOOCs in any way track larger trends in the economy (US, global)? (This is more of an exploratory question and might be considered a secondary objective, but one we might keep in the back of our minds as we investigate.)

Data

In order to access any content from a Coursera course offered by the University of Michigan, a user (hereafter referred to as a “learner”) must create an account on the Coursera platform and “sign up” for the course. Students can sign up as any of the following roles: Learner, Pre-enrolled learner, Browser, Mentor, Instructor, Teaching Staff, University Admin, Not enrolled. The exact assignment of a student’s role depends on the level of access they desire for the course (i.e. videos only, or access to graded assignments), as well as whether the account belongs to a course staff member or a student. In this analysis, we strictly evaluate learner signups; a larger analysis would be beyond the scope of this project. We note that there were nearly 6 times more learner signups than signups of any other type: a total of 3,007,083 learner signups are contained in the data.

The data used in the analysis are derived directly from the University of Michigan’s database of Coursera signups at the Office of Academic Innovation, which is itself directly administered by Coursera. The data consist of 3,885,362 signup events for 102 courses offered by the University of Michigan on the Coursera platform from 2014-11-11 to 2018-03-03. While the data are restricted by a data usage agreement with Coursera and cannot be shared publicly, the data can be made available directly to the course instructors for evaluation of this project. When aggregated over days to produce a single record per day, the dataset contains data for 1,207 daily time points between the dates specified above, with no missing days.

Sample of learner signup data used in this analysis.

membership_role_assigned_date  total_signups
2018-03-02                              1906
2018-03-01                              2114
2018-02-28                              2296
2018-02-27                              2923
2018-02-26                              4215
2018-02-25                              1997
A sample of the first few rows of the learner signup data used in the following analysis is shown above. The data consist of the total number of signups for the “learner” status for each day from 2014-11-11 to 2018-03-02. There are no missing observations in the data.
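As an illustration of the daily aggregation step, the following Python sketch counts learner signup events per day. The original pipeline used R, and the field and role names here are hypothetical stand-ins based on the table header and the role list above.

```python
from collections import Counter
from datetime import date

# Illustrative sketch (the original analysis was done in R): aggregate
# individual signup events into one total per day, as in the table above.
# "membership_role_assigned_date" follows the table header; the role string
# encoding ("Learner", "Browser", ...) is an assumption for illustration.

def daily_signup_totals(events):
    """events: iterable of (membership_role_assigned_date, role) tuples.
    Returns a dict mapping each date to its count of "Learner" signups."""
    counts = Counter(d for d, role in events if role == "Learner")
    return dict(counts)

# Example with a few synthetic signup events:
events = [
    (date(2018, 3, 2), "Learner"),
    (date(2018, 3, 2), "Learner"),
    (date(2018, 3, 2), "Browser"),   # excluded: not a learner signup
    (date(2018, 3, 1), "Learner"),
]
totals = daily_signup_totals(events)
# totals == {date(2018, 3, 2): 2, date(2018, 3, 1): 1}
```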

A time plot of the learner signups (blue line), along with the three other most common signup types (not enrolled, pre-enrolled learner, and browser), is shown above. The analysis in this paper only focuses on learner signups – by far the most common type of signup – but the plot demonstrates that there are different patterns for different types of signups, which justifies evaluating learner signups separately. Additionally, the plot shows that learner signups appear to be the most complex signup type – other types have clearer periodicity and clearer trends (“Browser” status appears to be declining over time, while “not enrolled” appears to be increasing). The learner signup data has more intermittent “spikes” followed by intermittent low values.

In the analysis that follows, we focus only on modeling and understanding the “learner” membership role, for several reasons. First, this is the largest group of participants; as shown in the table above, the total number of “learner” registrations is nearly \(6 \times\) larger than the next most common group. Second, we focus on the “learner” role because the pattern and trend of this registration type are less immediately clear than the others, which show somewhat more obvious seasonality and trend. Furthermore, the University of Michigan and Coursera both likely have a stronger interest in “learners,” as these are the most active participants in a course and the students most likely to engage deeply with course content and be invested in completing course activities. These are also the only type of participant who can pay for the course, and paid participants are of greatest interest to the university.

Learner Signups Over Time

In this section, we investigate signups of learners more deeply. In particular, we present initial evidence to answer RQ1 (long-term trends in MOOC learner enrollment), RQ2 (seasonal/periodic effects), and RQ4 (relationship between MOOC enrollment and broader economic indicators).

Based on the time plot of the learner data in the previous section, we can see that there appear to be periodic components to the underlying fluctuations in the data, with what appear to be regular weekly fluctuations but no apparent yearly patterns. This could be somewhat surprising – we might expect, for example, strong annual periodicity if many learners are following a traditional academic calendar (e.g. students who are supplementing their courses with MOOCs) or are working on a professional calendar (full-time workers who are learning during breaks, such as over an extended holiday vacation).

We might also be interested in the presence of weekly fluctuations in learner signups – for example, if higher-signup periods occur over the weekend (in which case learners might be signing up for courses to learn during their “free time”) or during the week (learners could be actually supplementing their daily work with MOOC content or using the MOOC to assist with problems encountered in regular work).

Knowing the presence of seasonality in either case – weekly or annually – would assist the University of Michigan in several ways. First, it would allow them to more intelligently allocate course resources, such as the hiring of part-time course teaching assistants who monitor course discussion forums to answer questions, engage with students, and address concerns. Second, knowing about seasonality of signups might also inform the University of Michigan’s release of future courses: releasing courses at the times with highest learner signup would ensure the maximum potential audience for a course.

Autocovariance and Frequency Domain Analysis

In this section, we evaluate the sample autocorrelation and periodogram to begin investigating periodic patterns in the data. The exploration in this section helps to inform the modeling investigation in the following sections.

Shown above, both the sample autocorrelation function (ACF) and the sample partial autocorrelation function (PACF) plots show very strong activity at weekly lags (but also strong autocorrelation at other lags, particularly in the case of the sample ACF). There is no strong evidence of a monthly cycle, which would appear around lag 30.
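As a sketch of the statistic behind these plots, the sample ACF can be computed directly. The Python example below is illustrative (the original plots were produced in R), using simulated data with a period-7 component rather than the signup series; the weekly cycle shows up as elevated autocorrelation at lags 7, 14, 21, and so on.

```python
import numpy as np

# Minimal sketch of the sample autocorrelation function (ACF).
def sample_acf(x, max_lag):
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    # lag-0 autocorrelation is 1 by definition
    return np.array([1.0] + [np.dot(x[:-k], x[k:]) / denom
                             for k in range(1, max_lag + 1)])

# Synthetic daily series with a period-7 component plus noise:
rng = np.random.default_rng(0)
t = np.arange(700)
y = 10 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 1, size=t.size)
acf = sample_acf(y, 21)
# acf[7] and acf[14] are large and positive; mid-week lags are negative
```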

Another way we can explore periodicity in the data is through frequency domain analysis. A smoothed periodogram is also shown above. The periodogram largely supports the analysis of the sample ACF and PACF above, as well as the initial views of the data. First, the largest peak in the data is at the weekly frequency, corresponding to a period of 7 days (a frequency of \(\frac{1}{7} \approx 0.143\)). This is indicated by the large “peaks” in the periodogram at \(\approx 0.14\) whose base-to-peak distance is larger than the blue line (in the upper-right corner), which corresponds to a significant deviation from random fluctuation at this frequency. Second, we see a somewhat smaller, but apparently significant, peak near a frequency of roughly 0.28. This peak is more difficult to interpret on its own, as it would correspond to a period of roughly 3.5 days; it is most likely related to the weekly cycle as well, being a harmonic at double the weekly frequency. We can also note that the periodogram does not show high power at the frequency corresponding to a period of one month (\(\frac{1}{30} = 0.0\bar{3}\)).
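The periodogram itself can be sketched via the discrete Fourier transform. The smoothed periodogram in the text was produced in R; the following Python example is an illustrative raw periodogram on simulated data with a weekly cycle (of the same length as the daily dataset), and it recovers the dominant peak near \(\frac{1}{7} \approx 0.143\).

```python
import numpy as np

# Raw periodogram via the FFT (illustrative; the text uses a smoothed
# periodogram computed in R).
def periodogram(x):
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = x.size
    spec = np.abs(np.fft.rfft(x)) ** 2 / n
    freqs = np.fft.rfftfreq(n, d=1.0)   # cycles per day
    return freqs, spec

rng = np.random.default_rng(1)
t = np.arange(1207)                      # same length as the daily dataset
y = 5 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 1, size=t.size)
freqs, spec = periodogram(y)
peak = freqs[np.argmax(spec[1:]) + 1]    # skip the zero frequency
# peak lies close to 1/7, i.e. approximately 0.143 cycles per day
```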

This periodogram, along with the ACF and PACF, points to strong evidence for weekly seasonality in the learner signup data, providing initial evidence for RQ2. It mostly does not answer whether there are longer-term trends in the data, such as quarterly or annual fluctuations. Since the data span more than 3 years, we should be able to detect at least large annual fluctuations, if they exist. In the following section, we perform a frequency-based decomposition of the data into low-, mid-, and high-frequency signals, which serves to (i) confirm the weekly seasonality finding, and (ii) explore longer-term fluctuations and trend in the data, which are not fully answered by the periodogram.

Trend Analysis and Seasonal Decomposition

In this section, we conduct an analysis that can provide evidence for RQ1, RQ2, and RQ4. In order to understand periodicity or cycles and long-term trend in the data, we model the learner signup data as a combination of three processes:

  • A low-frequency component, which represents the overall trend of the data;
  • A high-frequency component, which represents noise in the data, and might capture the “spikes” or peaks indicated in the initial time plots;
  • A mid-frequency component, which might correspond to the “business cycle” of registrations irrespective of the long-term trend and short-term random variation.

Using Loess smoothing, we model the signup data as a combination of these three frequency bands; plots of the original data (top) along with the low-, high-, and mid-frequency components are shown below. Note that the time axis corresponds to the number of days since the first observation in the dataset.
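A simplified analogue of this decomposition can be sketched as follows. The original used Loess smoothing in R at two spans; here plain moving averages stand in, with hypothetical window lengths (365 and 7 days), and by construction the three bands sum back to the original series.

```python
import numpy as np

# Illustrative three-band decomposition: a long moving average approximates
# the low-frequency trend, a short one gives trend + cycle, and the residual
# is the high-frequency component. A stand-in for the Loess decomposition.

def moving_average(x, window):
    # centered moving average with edge padding, so length is preserved
    pad = window // 2
    xp = np.pad(np.asarray(x, dtype=float), pad, mode="edge")
    kernel = np.ones(window) / window
    return np.convolve(xp, kernel, mode="valid")[:len(x)]

def decompose(x, long_window=365, short_window=7):
    x = np.asarray(x, dtype=float)
    low = moving_average(x, long_window)           # long-term trend
    mid = moving_average(x, short_window) - low    # "business cycle"
    high = x - low - mid                           # short-term noise/spikes
    return low, mid, high

# Synthetic series: slow trend plus a weekly oscillation
x = np.arange(1207) * 0.01 + np.sin(np.arange(1207) * 2 * np.pi / 7)
low, mid, high = decompose(x)
# low + mid + high reconstructs x (up to floating-point error)
```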

We see several results from modeling the data in this manner that are relevant to our initial research questions.

First, we clearly see that the high-frequency “random” component (third panel from top) captures a large portion of the variability in the data. There are several large “spikes” in the data, which likely correspond to the release dates of new or popular courses on the platform (although we were not able to verify this for the current analysis; course launch dates were not part of this dataset). Because these spikes are not part of longer-term trends, they are almost entirely captured by the high-frequency component. This suggests (if we are correct that these spikes correspond to the launch of new courses) that course launches – at least for popular courses – can drive large, but temporary, increases in learner signups. The university could therefore expect large surges of signups accompanying the launch of future courses, with signups returning to normal levels after only a few weeks.

This high-frequency component might also reflect an interesting shift that occurred over the course of this dataset: while earlier courses were typically only available during fixed time intervals, with a set “start” and “end” date, Coursera gradually transitioned most courses to an “on demand” model where the course content is available at any time. Therefore, the spikes in signups might reflect students responding to the earlier model, where they needed to begin a course at a fixed time in order to follow the prespecified schedule. Now, however, students can start and finish most courses at any time, so there is no strong incentive for large groups of students to sign up at any particular time. This appears to be at least partially reflected in the slightly lower frequency of spikes at later dates.

Second, we can see from the plot that there appears to be a general upward trend in signups, followed by a small decrease and subsequent stabilization, as demonstrated by the low-frequency component (second panel from top). This provides initial evidence for RQ1. With the high-frequency fluctuations in signups removed, it is clear that the long-term trend in signups is an increase, followed by a flattening out. This suggests that there is little evidence to support the hypotheses, considered above, that learner engagement is decreasing over time (e.g. due to diminished enthusiasm about MOOCs since their inception, market “saturation,” or other opportunities for informal learning). Instead, interest seems to remain relatively stable over time. Note, however, that the size of the long-term trend is roughly \(\frac{1}{10}\) the size of the high-frequency peaks.

Third, we see from the bottom panel of this plot that there may be at least some annual or quarterly seasonality in the cycle component. Another version of the cycle component is shown below, where 90-day quarterly intervals are plotted with dashed lines; blue lines indicate the first quarter of the year (January 1). The plot shows that there is some weak evidence of annual patterns, especially some peaks around January 1st. There are also some signs of peaks in the second quarter (shortly after April 1 each year), which would possibly indicate students completing the normal academic year beginning courses during the summer. This evidence is weak, though, and far weaker than the strong signs of weekly seasonality shown previously.

Fourth, neither the long-term low-frequency trend nor the cyclic trends closely match either US or global economic indicators, based on an initial review of such data. This provides at least preliminary insight into RQ4. If there is a connection between the global or national US economy and Coursera signups, then the relationship is complex and difficult to measure.

ARIMA Modeling

In this section, we use the insights collected from the previous sections to investigate effective ARIMA and SARIMA models for the signup data, which explores RQ2 and RQ3. Before we begin modeling, we note that the analysis of the previous sections suggested, in response to RQ2, that there is strong evidence of weekly seasonality in the data. Additionally, we showed weaker evidence of annual patterns, including weak evidence of increases near the beginning of the year, as well as during the second quarter.

We begin this section by exploring a non-seasonal ARIMA model, and evaluate the residuals to determine whether a seasonal model is appropriate or whether the non-seasonal ARIMA model is sufficient. We then proceed to explore weekly seasonal models. The ARIMA models below examine a search space of \(p \in \{1,2,3,4\}\); \(d \in \{0,1,2\}\); \(q \in \{1,2,3,4\}\), where \(p\), \(d\), and \(q\) represent the AR order, the differencing order, and the MA order, respectively. Models were fit using the method of maximum likelihood, instead of the default arima estimation method (conditional sum of squares to find starting values, then maximum likelihood), because the conditional-sum-of-squares method led to some non-stationary results which pure maximum-likelihood estimation avoids. The complete AIC results are reproduced below for reference.

##    p d q      aic
## 43 3 1 4 21653.16
## , , d = 0
## 
##    q
## p          1        2        3        4
##   1 21784.67 21771.57 21771.97 21770.96
##   2 21773.14 21775.99 21774.50 21775.36
##   3 21771.07 21775.28 21775.90 21773.59
##   4 21770.78 21774.52 21780.41 21750.78
## 
## , , d = 1
## 
##    q
## p          1        2        3        4
##   1 21754.99 21754.67 21752.37 21753.24
##   2 21753.34 21750.59 21758.13 21756.24
##   3 21753.37 21756.86 21754.57 21653.16
##   4 21749.65 21749.11 21754.44 21653.67
## 
## , , d = 2
## 
##    q
## p          1        2        3        4
##   1 22172.74 21752.72 21755.54 21754.96
##   2 22056.19 21751.54 21755.30 21755.99
##   3 21998.83 21766.73 21754.88 21757.08
##   4 21958.26 21747.32 21753.91 21758.00

This analysis suggests that an ARIMA(3,1,4) model is a strong fit for the data; this model achieved the minimal AIC value. Note that these non-seasonal ARIMA models do not incorporate seasonality, and our previous analysis suggested that a weekly seasonal model might perform best. Furthermore, we note that we cannot directly compare the AIC of models with different levels of differencing (because the likelihoods are computed on differently differenced versions of the data), so we could also potentially consider the ARMA(4,1) or ARIMA(4,2,2) models as well. Additionally, we can observe from the full results above that there may be some convergence issues with model estimation: adding a single parameter should never increase the AIC by more than 2, yet there are cases in which this does occur (for example, from ARIMA(1,1,4) to ARIMA(2,1,4)). Additionally, inspection of the AR and MA polynomials of the ARIMA(3,1,4) model suggests parameter redundancy: the roots of the AR and MA polynomials are nearly identical.
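The AIC-based order selection above can be illustrated in simplified form. The original search fit full ARIMA(p,d,q) models by maximum likelihood in R; the Python sketch below instead fits pure AR(p) models by ordinary least squares on simulated AR(2) data and compares a Gaussian AIC. This is a deliberate simplification (no differencing or MA terms, and simulated rather than signup data), but it shows the same selection logic.

```python
import numpy as np

# Simplified sketch of AIC-based order selection: fit AR(p) by OLS,
# compute a Gaussian AIC, and pick the order with the smallest value.

def ar_aic(x, p):
    x = np.asarray(x, dtype=float)
    n = len(x) - p
    # design matrix: predict x[t] from an intercept and x[t-1], ..., x[t-p]
    lags = np.column_stack([x[p - k - 1 : p - k - 1 + n] for k in range(p)])
    X = np.column_stack([np.ones(n), lags])
    y = x[p:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = np.dot(resid, resid) / n
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k = p + 2                        # AR coefficients + intercept + variance
    return -2 * loglik + 2 * k

# Simulate an AR(2) process; AIC should favor p = 2 (or slightly above).
rng = np.random.default_rng(2)
e = rng.normal(size=3000)
x = np.zeros(3000)
for t in range(2, 3000):
    x[t] = 0.75 * x[t - 1] - 0.5 * x[t - 2] + e[t]
aics = {p: ar_aic(x, p) for p in (1, 2, 3, 4)}
best_p = min(aics, key=aics.get)
```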

arima_best <- arima(learner_signups$total_signups, order = c(3, 1, 4), method = "ML")
## Warning in arima(learner_signups$total_signups, order = c(3, 1, 4), method
## = "ML"): possible convergence problem: optim gave code = 1
# Roots of the AR polynomial 1 - ar1*B - ar2*B^2 - ar3*B^3
AR_roots <- polyroot(c(1, -coef(arima_best)[c("ar1", "ar2", "ar3")])); AR_roots
## [1]  0.6240238+0.7823398i -1.0007978+0.0000000i  0.6240238-0.7823398i
# Roots of the MA polynomial 1 + ma1*B + ma2*B^2 + ma3*B^3 + ma4*B^4
MA_roots <- polyroot(c(1, coef(arima_best)[c("ma1", "ma2", "ma3", "ma4")])); MA_roots
## [1]  0.6284912+0.7878036i -1.0000098+0.0000000i  0.6284912-0.7878036i
## [4]  1.0658752+0.0000000i

Collectively, this analysis suggests some potential numerical concerns with this estimated model; while we might prefer a simpler, more stable model (such as the ARMA(4,1) model noted above), we proceed with the ARIMA(3,1,4) model for now for further inspection; other models are considered later.
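The root comparison above can be mimicked with a small numerical sketch in Python. Here the AR and MA polynomials are constructed to share the factor \(1 + B\), mirroring the near-identical roots at about \(-1.0\) in the fitted model, and the shared root is recovered numerically. The coefficients are illustrative, not those of the fitted ARIMA(3,1,4).

```python
import numpy as np

# Sketch of the parameter-redundancy check: if an AR factor and an MA factor
# share a root, they cancel and the model is over-parameterised. Both
# polynomials below share the illustrative factor (1 + B).
P = np.polynomial.polynomial

ar_poly = P.polymul([1, 1], [1, -0.5])   # (1 + B)(1 - 0.5B)
ma_poly = P.polymul([1, 1], [1, 0.3])    # (1 + B)(1 + 0.3B)
ar_roots = P.polyroots(ar_poly)
ma_roots = P.polyroots(ma_poly)
common = [r for r in ar_roots if np.any(np.isclose(r, ma_roots))]
# common contains only the shared root at B = -1
```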

Another way for us to explore the potential presence of seasonality is to examine the residuals of this non-seasonal ARIMA(3,1,4) model. Below, we show three diagnostic plots of the ARIMA(3,1,4) model suggested by the analysis above:

The ACF of the residuals shows moderate autocorrelation in the ARIMA(3,1,4) residuals, with some apparent oscillations that are difficult to interpret. While there is one lag (21) at which the sample autocorrelation estimate appears to be slightly over the significance threshold, we note that 30 different lags are shown, so with 95% confidence bands we would expect roughly 1.5 lags (30 × 0.05) to appear significant due to random variation alone.

What is most important, and most clear, from this analysis is that the residuals of the ARIMA(3,1,4) model do not appear to be Gaussian. The residuals are highly asymmetric, with extreme high values being much more common than extreme low values. The Q-Q plot in the center panel of the figure above demonstrates the skewness of these residuals: Gaussian-distributed residuals would hew closely to the diagonal line, but these residuals are clearly quite far from Gaussian. Note that when other models (such as the ARMA(4,1) model) were explored, there were similar issues with non-Gaussian residuals. In a subsequent section we consider a transformation to ameliorate this.

Seasonal ARIMA Modeling

In this section, we consider a weekly seasonal model, which might address some of the apparent autocorrelation in the ARIMA(3,1,4) residuals (note that this does not directly address the non-Gaussian nature of the errors, which we explore in the following subsection on log-transformation). A stepwise search of the ARIMA model space, specifically allowing for weekly seasonality, with \(p \leq 5\), \(d \leq 2\), \(q \leq 5\), was performed according to AIC using the auto.arima function of the forecast package in R.
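The stepwise strategy used by this kind of search can be illustrated schematically: start from a candidate order, score its immediate neighbours, move to the best one, and stop when no neighbour improves the criterion. The Python sketch below uses a toy scoring function with a known minimum as a stand-in for an actual ARIMA AIC; it is a simplified rendering of the search logic, not the forecast package's implementation.

```python
# Schematic stepwise (hill-descending) order search over (p, q), with a toy
# criterion standing in for an ARIMA AIC. Illustrative only.

def stepwise_search(score, start, max_order):
    current = start
    best = score(current)
    while True:
        p, q = current
        neighbours = [(p + dp, q + dq)
                      for dp in (-1, 0, 1) for dq in (-1, 0, 1)
                      if (dp, dq) != (0, 0)
                      and 0 <= p + dp <= max_order
                      and 0 <= q + dq <= max_order]
        cand_score, cand = min((score(n), n) for n in neighbours)
        if cand_score >= best:          # no neighbour improves: stop
            return current, best
        current, best = cand, cand_score

# Toy criterion minimised at (p, q) = (1, 1):
toy_aic = lambda pq: (pq[0] - 1) ** 2 + (pq[1] - 1) ** 2
order, aic = stepwise_search(toy_aic, start=(3, 3), max_order=5)
# order == (1, 1)
```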

The model selected according to this search was an SARIMA(1,1,1)(0,0,2)[7] model. The full coefficients of the model are shown below, along with the same diagnostic plots originally shown for the ARIMA model above. Of particular interest with respect to this model is that the seasonal component uses only MA terms, with no seasonal AR or differencing terms. Note also that the stepwise search space which led to the SARIMA(1,1,1)(0,0,2)[7] model included non-seasonal models; this suggests that the model with the seasonal component fits the data better than its non-seasonal counterpart without the weekly seasonality terms. This provides at least moderate evidence that there is a weekly component to the Coursera learner signups.

Results of the SARIMA model are shown above. While the SARIMA model might have a slightly better fit than the corresponding non-seasonal model according to AIC, the residual plots show that the SARIMA model does little to improve the fit in terms of the distribution of the residuals. The SARIMA model still has highly-skewed, non-Gaussian errors.

Two plots of the data are shown below which reinforce the conclusion that a mild, but genuine, weekly effect is present in the learner signup data. First, the left panel of the plot (the bar plot) shows the median counts of learner signups by weekday across the dataset. Monday and Tuesday have larger median signups (at least 3000 per day each), while other weekdays have markedly lower counts. Surprisingly, the days at the end of the week (Friday, Saturday, and Sunday) have the lowest signup counts, with medians nearly 30% lower than Monday and Tuesday. This suggests that learners are not signing up for courses during their “leisure” time on the weekends, but instead at the beginning of the work week.

The second plot, on the right-hand panel, adds more context by plotting the signups from each date as an individual point, and grouping the dates by weekday. This plot shows how weak the association by weekday is: the differences in medians between weekdays are small relative to the day-to-day spread, and every weekday includes a few extreme outliers (perhaps due to the launch of massively popular courses, which is also reflected in the large intermittent “spikes” in the data shown above).
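The weekday summary underlying the bar plot can be sketched as follows. This is an illustrative Python version (the analysis itself was carried out in R), using the median because it is robust to the launch-day spikes discussed above; the totals below are synthetic, not the real data.

```python
from datetime import date, timedelta
import numpy as np

# Illustrative sketch: median daily signups grouped by day of week.
def median_by_weekday(dates, totals):
    groups = {}
    for d, n in zip(dates, totals):
        groups.setdefault(d.weekday(), []).append(n)   # weekday(): 0 = Monday
    return {wd: float(np.median(v)) for wd, v in sorted(groups.items())}

# Four synthetic weeks where Mondays and Tuesdays run higher:
start = date(2014, 11, 11)                 # first date in the dataset
dates = [start + timedelta(days=i) for i in range(28)]
totals = [3200 if d.weekday() in (0, 1) else 2000 for d in dates]
med = median_by_weekday(dates, totals)
# med[0] (Monday) and med[1] (Tuesday) exceed the other weekdays' medians
```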

The monthly and yearly models considered in this analysis faced apparent instability, yielding models for which standard errors could not be computed. This is likely due either to insufficient data for estimating the many parameters in large models, or to other invertibility or identifiability issues with the model. This is somewhat unsurprising, as it was noted in class that, in practice, seasonal modeling can be unstable when SARMA models are used at the daily level. Note that the code for fitting an example SARIMA model with yearly and monthly seasonality is provided in a code block in the .Rmd source file, but is not shown here.

These results, collectively, suggest that while there appears to be at least some weekly fluctuation present in the data, a transformation of the data might improve the model fit to allow it to be better modeled by the ARIMA or SARIMA models, which assume Gaussian errors. In the following section, we consider such a transformation.

Log-Transformed Model

The ARIMA analyses above consistently demonstrated highly skewed, non-Gaussian error terms, which violated a fundamental assumption of the ARIMA models used. In this section, we consider whether a log-transformation (with base \(e\)) of the data can improve the model fit. A time plot of the log-transformed data and the corresponding ACF of this transformed data are shown below.

The time plot appears to show behavior closer to stationarity (which suggests, in response to RQ1, that while the data may appear far from stationary on a linear scale, they are more nearly stationary on a log scale), as well as a more stable variance over time. The sample ACF still shows strong autocorrelation at many lags, which diminishes slowly, as well as a clear weekly component.
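The rationale for the log-transformation can be illustrated with a small simulation: when noise scales with the level of the series (multiplicative errors), taking logs makes the spread approximately constant over time, compressing the skewed spikes. The following Python sketch uses simulated data, not the signup series.

```python
import numpy as np

# Illustrative variance-stabilization demo: a rising series with
# multiplicative (lognormal) noise, before and after a log-transform.
rng = np.random.default_rng(3)
t = np.arange(1000)
level = 1000 + 2 * t                                  # rising underlying level
y = level * rng.lognormal(mean=0.0, sigma=0.3, size=t.size)

def half_std_ratio(x):
    """Spread of the second half of a series relative to its first half."""
    h = len(x) // 2
    return float(np.std(x[h:]) / np.std(x[:h]))

raw_ratio = half_std_ratio(y)          # well above 1: spread grows with level
log_ratio = half_std_ratio(np.log(y))  # near 1: spread stabilized by the log
```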

Below, we follow the same procedure for identifying the best ARIMA model on the log-transformed data as used above. However, this time, to avoid the convergence issues which were apparent from inspection of the AIC results, we instead selected the highest-performing model with no differencing (\(d = 0\)), which was an ARMA(3,2) model. This model achieved greater numerical stability, overcoming one of the issues noted with the original ARIMA(3,1,4) model above (of course, this is not due to the log-transformation, but simply to the selection of a simpler model with fewer parameters). The results of this ARMA(3,2) model on the log-transformed data show that the log-transformation is indeed successful in generating more nearly Gaussian residuals, as demonstrated by the residual plot (which is far less skewed) and the Q-Q plot. However, the ACF plot of the residuals demonstrates that there is still autocorrelation in these residuals (which is not necessarily a problem – but is worth noting).

Finally, we combine the benefits of the log-transformation of the data with an SARIMA model with weekly seasonality. Below, we show that an SARIMA(1,1,1)(0,0,2)[7] model on the log-transformed data achieves the best AIC performance according to stepwise selection, and this model achieves residuals much closer to the Gaussian assumption of the general class of ARIMA models.

Conclusion

This work presented several different analyses to answer multiple research questions. Above all, the work demonstrated that the construction of the many variants of autoregressive moving average models is complex and delicate, and requires careful inspection of both the original data and the model that results.

Conclusions to Research Questions

In this section, we briefly summarize the findings with respect to the initial research questions outlined above.

  • RQ1: How has student enrollment in MOOCs changed over the period from 2014-11-11 to 2018-03-03? Is the long-term trend in student enrollment increasing, decreasing, stable, or showing some other pattern? We demonstrated, using a frequency-based decomposition, that most of the variability of the Coursera learner signups is due to short-term fluctuation. When this fluctuation is removed, there is some evidence of a moderate increasing trend throughout 2015 and 2016, with the data leveling off after this period. On a log-transformed scale (with base \(e\)), the data are more nearly stationary. There is weak evidence for annual or quarterly trends, but our analysis here was inconclusive and we were not able to fit monthly or annual seasonal models due to insufficient data.
  • RQ2: Does MOOC registration at the University of Michigan follow periodic patterns, such as weekly or annual seasonality? Which effects tend to dominate? Weekly effects most strongly dominate, with higher signups on Mondays and Tuesdays, and much lower signups on Friday - Sunday. There is only moderate evidence for quarterly or annual effects (as noted in reply to RQ1), and no evidence for monthly effects.
  • RQ3: Which models most effectively model the historic demand for MOOCs, and what can more detailed inspection of models tell us about the structure of the demand for MOOCs? Models fit on the log-transformed data most closely accommodate the assumption of symmetric, Gaussian residuals. In particular, an ARMA(3,2) model or an SARIMA(1,1,1)(0,0,2)[7] model, which incorporates weekly seasonality, achieves a good balance of fit to the data and numerical stability. More complex models with higher-order differencing showed signs of convergence issues.
  • RQ4: Does demand for MOOCs in any way track larger trends in the economy (US, global)? This was primarily an exploratory question, but we did not see strong signs of this during the period represented by the data (2014-11-11 to 2018-03-02). If such a relationship exists, it is more complex than we were able to detect based on an initial evaluation.

With respect to the main actionable insights from the University of Michigan’s perspective, we find the following: because learner signups are much higher on Mondays and Tuesdays, the University should concentrate its course staff and support resources on those days (or shortly after, when newly-registered learners are exploring the courses). Additionally, the university should consider continuing to launch courses early in the week (if this is indeed its current practice, as it appears to be), because this appears to draw learners into the course. However, because the low weekend signup numbers were surprising for a “leisure time” activity such as MOOCs, the university could also consider dedicating additional resources to signing up learners on the weekend, when many users may spend more time online.

Future Research

Several questions were not addressed by the current project and would be useful avenues for future research. First, the exploratory question RQ4 could be more closely examined in future work, by using time series similarity measures. Additionally, time series analysis methods could be used to understand learner behavior patterns over time in specific courses, not just in the course signups. This could include video-watching behavior, assignment and activity submission, forum posting, and other activity over time. Finally, future research could combine the data used here with demographic data to determine differences in signup behavior across different student types, levels of education, or global regions.

References