Final project outline

Find a time series dataset of your choice. Carry out a time series analysis, taking advantage of what we have learned in this course. It is expected that part of your project will involve a POMP (partially observed Markov process) analysis, using the modeling and inference approaches we have studied in the second half of this semester. You might like to consider the following points when planning your project.

  1. A common goal of POMP analysis is to connect theory to data. This means you must think about both the theory and the data. If possible, choose a dataset on a topic for which you know, or are willing to discover, some background theory. A good way to get ideas for topics to study and places to find data is to look at the past final projects from 2016 and 2018. Each of these projects should contain the data and information about where the data came from. You may want to search for your own dataset, but it is also legitimate to re-analyze data from a previous final project. If you do re-analyze data, you should explain clearly how your analysis goes beyond the previous work.

  2. Computational considerations may prevent you from analyzing as large a model, or as long a dataset, as you would ideally like. That is fine. Present a smaller, computationally feasible analysis and discuss possible extensions to it.

  3. As for the midterm project, the time series should ideally have at least 100 time points. You can have fewer if your interests demand it. Shorter time series need additional care, since model diagnostics and asymptotic approximations become more delicate on small datasets. If your data are longer than, say, 1000 time points, you can subsample if you start having problems working with too much data.

  4. You are welcome to discuss your choice of final project with your group, and in the group instructor meetings. The projects are individual, but everyone will benefit from talking to each other about what to study and how to proceed.

To submit your project, write your report as an R Markdown (Rmd) file. Submit the report by 5pm on Wednesday, April 29, as a zip file containing the Rmd file and anything else necessary to allow the grader to render the Rmd file as an HTML document. Projects will be posted anonymously, with source code and data, unless you request that some or all of the project remain confidential. After grades are assigned, you will be invited to add your name back to your project if you choose.


Some comments on choice of data and data analysis goals


Expectations for the report

The report will be graded following the same approach used for the midterm project. It will be graded on the following categories.


Methodology not covered in class

This class has focused on ARMA and POMP models, two related approaches to time domain analysis of time series.

Time series topics on which we will spend little or no time include frequency domain analysis of multivariate time series (Shumway and Stoffer, Chapter 7) and time-frequency domain analysis using wavelets (Shumway and Stoffer, Section 4.9).

If you decide that alternative approaches are particularly relevant for your data, you can use them in your project as a complementary approach to what we have covered in class.


Group meetings

You are welcome to discuss all stages of your project with your group. This is recommended; more than ever in the current situation, we should support each other and help those who hit difficulties. If you get useful feedback from your group, please acknowledge it appropriately, just as you would any other source.


Plagiarism

If material is taken directly from another source, that source must be cited and the copied material clearly attributed to the source, for example by the use of quotation marks. Failing to do this is plagiarism and will, at a minimum, result in zero credit for the scholarship category and for the section of the report in which the plagiarism occurs. Further discussion of plagiarism can be found in On Being a Scientist: A Guide to Responsible Conduct in Research, Third Edition (2009), published by The National Academies Press. Here is how the Rackham Academic and Professional Integrity Policy describes plagiarism:

11.2.2 Plagiarism

Includes:

Representing the words, ideas, or work of others as one’s own in writing or presentations, and failing to give full and proper credit to the original source.

Failing to properly acknowledge and cite language from another source, including paraphrased text.

Failing to properly cite any ideas, images, technical work, creative content, or other material taken from published or unpublished sources in any medium, including online material or oral presentations, and including the author’s own previous work.


The COVID-19 situation

Please let me know if personal hardship is affecting your ability to study for this course. The situation is fluid at the moment, and circumstances may arise that call for flexibility with regard to setting and assessing coursework.