Work through an R tutorial

You should have R and RStudio installed (that was homework zero). If you have never used R before, the following swirl tutorial will get you started. If you have used some R before, it is likely a good idea to work through the tutorial to make sure you have the required background for this course. More experienced users can help newer users get started.

In R or RStudio, type install.packages("swirl") to download the swirl tutorial package from the internet. Every line of code must be followed by the [ENTER] key, but usually we don’t mention that. The [ENTER] key is sometimes called [RETURN].

Now that swirl is installed on your computer, type library("swirl") to load the package into your R session.

Finally, swirl() starts the tutorial program. Select 1: R Programming by typing 1 [ENTER] and then, again, hit 1 [ENTER] to start Lesson 1: Basic Building Blocks.

Repeat this with Lesson 3: Sequences of Numbers and Lesson 4: Vectors. We will not need the material in Lesson 2: Workspace and Files, but you can do that lesson too if you have time.
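The install, load, and start steps described above can be collected into a short sketch. The requireNamespace() check and the interactive() guard are additions that do not appear in the instructions above: the first skips the download if swirl is already installed, and the second keeps the block from doing anything when it is run outside an interactive R console.

```r
# swirl is an interactive tutorial, so only run these steps at the console.
if (interactive()) {
  # Download swirl from CRAN only if it is not already installed.
  if (!requireNamespace("swirl", quietly = TRUE)) {
    install.packages("swirl")
  }
  library("swirl")  # load the package into this R session
  swirl()           # start the tutorial menu
}
```

Typing the three commands one at a time, as described above, has the same effect.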

The lessons are designed to take 10-20 minutes each. In addition to introducing some knowledge, they give your fingers a chance to start practicing coding in R. You will have a chance to ask swirl questions during lab. If possible, bring your laptop to lab.

Q1. Write a brief paragraph on what you did with swirl, which tutorials you successfully completed, and the technical or conceptual obstacles (if any) that you encountered and (hopefully) overcame.


A data manipulation exercise

Here we test some basics of R data manipulation that you should already know or should have learned from the tutorials above. We will look at the data in the file femaleMiceWeights.csv, taken from a study of diabetes. The body weight of mice (in grams) was measured after around two weeks on one of two diets (chow or high fat). These data are in comma-separated values (CSV) format. You can read the data into R with

mice <- read.csv("https://ionides.github.io/401f18/hw/hw01/femaleMiceWeights.csv")
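To see what read.csv() does without touching the mouse data, here is a minimal sketch that parses a small made-up CSV string. The object name toy and its columns Group and Score are hypothetical, and the text argument is used only so the example runs without a file or an internet connection.

```r
# Made-up CSV content (not the mouse data); the first line holds the column names.
csv_text <- "Group,Score
a,1.5
a,2.0
b,3.5"

# read.csv() accepts a file path, a URL, or (via `text`) a character string.
toy <- read.csv(text = csv_text)

head(toy)  # print the first few rows
str(toy)   # show each column's name, type, and first values
```

The same inspection functions can be applied to mice once it has been read in.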

Q2. (a) Is mice an R object of class data.frame or matrix, and how can you tell? (b) What is the difference between a data frame and a matrix in R? (c) What is the dimension of the mice dataset? (d) What are the units, and what are the variables?

Q3. What is the exact name of the column containing the weights? Use the function colnames() to extract it, and report your code.

The [ and ] symbols can be used to extract specific rows and specific columns of the table.
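As a minimal sketch of bracket indexing, the example below uses a small made-up data frame rather than the mouse data. The names toy, Group, and Score are hypothetical. Inside the brackets, the row index comes first and the column index second, separated by a comma.

```r
# A made-up data frame with 4 rows and 2 columns (not the mouse data).
toy <- data.frame(Group = c("a", "a", "b", "b"),
                  Score = c(1.5, 2.0, 3.5, 4.0))

toy[2, ]           # row 2, all columns (a one-row data frame)
toy[, 2]           # all rows, column 2 (returned as a vector)
toy[3, 2]          # the single entry in row 3, column 2
toy[1:2, "Score"]  # rows 1 to 2 of the column named Score
```

Leaving an index blank, as in toy[2, ], selects everything along that dimension.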

Q4. What is the entry in the 12th row and second column? Give code to extract it.

You should have seen that the $ character can be used to extract a column from a table and return it as a vector.
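A minimal sketch of $ extraction, again on a made-up data frame (toy, Group, and Score are hypothetical names): the extracted column is an ordinary vector, so a single element can then be picked out with one index in brackets.

```r
# A made-up data frame (not the mouse data).
toy <- data.frame(Group = c("a", "a", "b", "b"),
                  Score = c(1.5, 2.0, 3.5, 4.0))

scores <- toy$Score  # extract the Score column as a numeric vector
is.vector(scores)    # check that the result is a plain vector
scores[4]            # the fourth element of that vector
```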

Q5. Use $ to extract the weight column and report the weight of the mouse in the 11th row and the code used to obtain it.

Q6. The length function returns the number of elements in a vector. Use length() to obtain the number of mice included in our dataset. Report your code.

Q7. To create a vector with the numbers 3 to 7, we can use seq(3,7) or, because they are consecutive, 3:7. View the data and determine which rows are associated with the high fat diet (coded as hf). Use a sequence of indices to extract the vector of weights for the corresponding animals, and then use the mean() function to compute their average weight. Report this mean weight and the code used to get it.
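The idea of indexing with a sequence and then averaging can be sketched on a made-up numeric vector. The names values, idx, and subset_values are hypothetical and the numbers have nothing to do with the mouse weights.

```r
seq(3, 7)  # the integers 3, 4, 5, 6, 7
3:7        # the same sequence, written with the colon operator

# A made-up vector of eight numbers (not the mouse weights).
values <- c(10, 20, 30, 40, 50, 60, 70, 80)

idx <- 3:7                    # positions 3 through 7
subset_values <- values[idx]  # the elements 30, 40, 50, 60, 70
mean(subset_values)           # their average, which is 50
```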


Thinking about datasets used in class

We expect only a fairly superficial understanding of the science behind the datasets. However, it is worth thinking about the data enough to allow common-sense reasoning about the numbers we are studying. The following questions relate to datasets and statistical questions introduced in Chapter 1 of the notes.

Q8. Is the life expectancy at birth in 2015 more or less than the lifespan you would expect for a baby born in 2015, or about the same? Explain.

Q9. Do you think the following economic variables are pro-cyclical or counter-cyclical: (i) healthcare spending; (ii) construction of new homes. Explain.


For your homework report, give answers to the questions above together with the code used to generate them. As with all homeworks for this course, don’t forget to make a statement of sources as requested on the syllabus. Homework without this statement receives zero credit.


License: This material is provided under an MIT license
Acknowledgement: This homework is derived from https://genomicsclass.github.io/book