Final project outline. Find a time series dataset of your choice. Carry out a time series analysis, taking advantage of what we have learned in this course. It is expected that part of your project will involve a POMP analysis, using the modeling and inference approaches we have studied in the second half of this semester. A common goal of POMP analysis is to connect theory to data. To do this, you must think about both the theory and the data. If possible, choose a dataset on a topic for which you know, or are willing to discover, some background theory. A good way to get ideas for topics to study and places to find data is to look at the past final projects from 2016, 2018 and 2020. Each of these projects should contain the data and information about where the data came from. You may want to search for your own dataset, but it is also legitimate to re-analyze data from a previous final project. If you do re-analyze data, you should explain clearly how your analysis goes beyond the previous work, and you should be especially careful to give proper credit to any code you reuse. Also, note that old pomp code may need modification to run on the current version of pomp. The pomp version 2 upgrade guide can be helpful for older code. The changes from pomp 2.0 to the current pomp 3.3 are comparatively minor.


Groups and subgroups. You can have any number of subgroups. If there are 5 people in your group, possible subgroup sizes are \(\{5\}\), \(\{4,1\}\), \(\{3,2\}\), \(\{3,1,1\}\), \(\{2,2,1\}\), \(\{2,1,1,1\}\), \(\{1,1,1,1,1\}\). If in doubt, subgroups of 2 or 3 are recommended. The priority is that everyone should be working on a dataset that interests them, and everyone should be in a position to contribute. Hopefully the subgroups will self-organize, but contact the instructor or GSI if issues arise. Please discuss progress on all subgroup projects during scheduled group meetings, though you are welcome to schedule additional subgroup meetings as needed.


Choice of data. As for the midterm project, the time series should ideally have at least 100 time points. You can have fewer, if your interests demand it. Shorter series need additional care, since model diagnostics and asymptotic approximations become more delicate on small datasets. If your data are longer than, say, 1000 time points, you can subsample if you start having problems working with too much data. Computational considerations may prevent you from analyzing as large a model, or as long a dataset, as you would ideally like. That is fine. Present a smaller, computationally feasible analysis and discuss possible extensions to your analysis.
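
One simple way to subsample a long series is to keep every k-th observation, as in the minimal R sketch below. The data frame `dat`, its columns, and the simulated values are hypothetical stand-ins for your own data. Whether thinning is appropriate depends on the data: for counts or totals accumulated over each interval, aggregating to a coarser time step is usually more sensible than dropping observations.

```r
# Hypothetical series: 5000 simulated observations standing in for real data.
set.seed(1)
dat <- data.frame(time = 1:5000, y = cumsum(rnorm(5000)))

# Choose the thinning interval so that roughly 1000 time points remain.
k <- ceiling(nrow(dat) / 1000)

# Keep every k-th row, starting from the first observation.
sub <- dat[seq(1, nrow(dat), by = k), ]

nrow(sub)
```

If you subsample, remember that the time step of the data changes, so any model quantities expressed per unit of observation time need to be reinterpreted accordingly.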


Data privacy and project anonymity. The projects, together with their data and source code, will be posted anonymously on the class website unless you have particular reasons why this should not be done. For example, you may have access to data with privacy concerns. After the semester is finished, you can request that your name be added to your project if you want.


Submission. To submit your project, write your report as an R markdown (Rmd) file. Submit the report by 11:59pm on Tuesday April 20, as a zip file containing the following:
  1. The main project file, called main.Rmd
  2. A blinded version of this, called blinded.Rmd. This version should be anonymous. For the group contribution section, you can either remove names or just say something like “Description of individual contributions removed for anonymity”.
  3. Any rda or rds files needed for the grader to re-run the Rmd files in less than about a minute of run time. There is a 500MB upload limit on Canvas, so you might have to edit your code so that you save only necessary results.
  4. Compiled versions, main.html and blinded.html
  5. Data files (unless your code downloads data from the internet) and any other files needed to make your Rmd files run. You can assume the grader has all necessary R packages installed.
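
The rda or rds files in item 3 can be produced with a simple caching pattern: run the slow computation once, save the result, and load the saved file on later runs. Below is a minimal base R sketch of this idea. The helper `cached()`, the file name, and the placeholder computation are hypothetical and chosen for illustration only. The pomp package provides `bake()` and `stew()` for a similar purpose.

```r
# Hypothetical helper: evaluate `expr` only if `file` does not already exist.
cached <- function(file, expr) {
  if (file.exists(file)) {
    return(readRDS(file))   # reuse the stored result; `expr` is never evaluated
  }
  result <- expr            # lazy evaluation: the computation runs here
  saveRDS(result, file)
  result
}

# A temporary directory is used so the sketch runs anywhere; in a project,
# the path would point to a file included in the submitted zip.
cache_file <- file.path(tempdir(), "slow_result.rds")

slow_result <- cached(cache_file, {
  # Stand-in for a long computation, such as a likelihood maximization.
  set.seed(531)
  mean(rnorm(1e5))
})
```

Saving only the summary objects needed for the report, rather than every intermediate result, helps keep the zip file under the upload limit.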

As for the midterm project, each member of a subgroup should submit a copy of the final project.


Some comments on choice of data and data analysis goals.


Methodology not covered in class. This class has focused on ARMA and POMP models, two related approaches to time domain analysis of time series. For example, we have not spent much time on frequency domain analysis of multivariate time series (Shumway and Stoffer, 3rd edition, Chapter 7). If you decide that alternative approaches are particularly relevant for your data analysis goal, you can use them in your project as a complementary approach to what we have covered in class. Explaining and justifying an alternative approach can be a substantial component of the project.


Expectations for the report. The final report will be graded on the following categories, the same as for the midterm project.


Plagiarism. All sources are allowed. You can access any website and talk to any human about your project, as long as the interaction is properly credited. If material is taken directly from another source, that source must be cited and the copied material clearly attributed to the source, for example by the use of quotation marks. Failing to do this is plagiarism and will, at a minimum, result in zero credit for the scholarship category. For course projects, we should be even more careful with attributions than the already high standards expected across academia require. For example, any time you discuss your project with a classmate, or you use advice from Stack Overflow, you can add this to the acknowledgements section of your project.

Further discussion of plagiarism can be found in On Being a Scientist: A Guide to Responsible Conduct in Research: Third edition (2009), by The National Academies Press. Here is how the Rackham Academic and Professional Integrity Policy describes plagiarism:

11.2.2 Plagiarism

Includes:

Representing the words, ideas, or work of others as one’s own in writing or presentations, and failing to give full and proper credit to the original source.

Failing to properly acknowledge and cite language from another source, including paraphrased text.

Failing to properly cite any ideas, images, technical work, creative content, or other material taken from published or unpublished sources in any medium, including online material or oral presentations, and including the author’s own previous work.