Git has become central to collaborative computing, sharing of data and code, and open source software. A professional statistician should have at least a working knowledge of git. Git is slowly becoming incorporated into more undergraduate and graduate courses, but likely this class spans a wide range from git novices to experts. If this assignment is trivial for you, consider helping others who are new to git.
GitHub is the largest internet hosting site for git repositories, but others (such as Bitbucket) also use git. You can use git to build a local repository on your own computer, though in practice it is usually convenient to have the repository linked to an internet site.
Our tasks are
Learn some ways to think about what a git repository is and how it works.
Follow the instructions below to practice going through the process of editing a GitHub repository, making a fork, and submitting a pull request.
These instructions emphasize command-line control of git, working locally on a terminal on your laptop. If you have only used git in a web-based context, this may be new to you. We can discuss in class why command-line control is valuable for data science. Later, we will investigate command-line computing further, so if this is unfamiliar to you then please ask others for help as needed to get through this homework. Graphical user interfaces (GUI) and web applications can be useful in some situations, having different strengths and weaknesses compared to command-line tools. It is possible, and sometimes useful, to edit git repositories directly on GitHub, or to use a GUI on your own machine, but for this homework the goal is to study the command-line approach.
Use the self-teaching materials at GitHub Skills or Atlassian Tutorials to spend an hour or so advancing your knowledge of git. The Atlassian tutorials are good for learning command-line git, but they teach in the context of Bitbucket, which is currently less popular than GitHub, although both are built around the same git program. Alternatively, browse Karl Broman’s practical and minimal git/github tutorial, which this assignment draws on.
Obtain the hw08.Rmd file from the ionides/810f24 GitHub repository, compile it to html (for example, using RStudio) and submit your report via Canvas as an html file.
YOUR ANSWER HERE.
YOUR ANSWER HERE.
YOU CAN DELETE THE REMAINDER OF THIS FILE WHEN SUBMITTING YOUR HOMEWORK
Get an account on GitHub, if you do not already have one.
If git is not installed already, download and install it from git-scm.com/downloads.
Set up your local git installation with your user name and email. Open a terminal and type:
$ git config --global user.name "Your name here"
$ git config --global user.email "your_email@example.com"
Don’t type the $; that just indicates that you’re doing this at the command line. On Windows, you can run these commands in Git Bash, the Unix-style shell installed with Git for Windows. Disclaimer: I do not run a Windows machine, so please let me know if Windows instructions are incorrect or out-of-date.
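To confirm that settings like these are stored as expected, you can read them back with git config --get. The sketch below is only an illustration: it writes to a throwaway repository (the variable name scratch is hypothetical) rather than to your global configuration, so it is safe to run anywhere.

```shell
# Create a throwaway repository so that nothing global is modified.
scratch=$(mktemp -d)
git init --quiet "$scratch"

# The same settings as above, but stored only in this scratch repository.
git -C "$scratch" config user.name "Your name here"
git -C "$scratch" config user.email "your_email@example.com"

# Read the values back.
name=$(git -C "$scratch" config --get user.name)
email=$(git -C "$scratch" config --get user.email)
echo "$name <$email>"
```

On your own machine, git config --global --get user.name reads back the global value in the same way.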
repository. A representation of the current state of a collection of files, and its entire history of modifications.
commit. A commit is a change to one or many of the files in the repository. The repository therefore consists of a directed graph of all previous commits.
branch. Multiple versions of the collection of files can exist simultaneously in the repository. These versions are called branches. Branches may represent new functionality, or a bug fix, or different versions of the code with slightly different goals.
Branches have names. By convention, the main development branch is called master; repositories created more recently on GitHub usually call it main instead.
Branches can be created, deleted or merged.
Each new commit is assigned to a branch.
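The definitions above can be tried out in a sandbox. The following is a minimal sketch, assuming a throwaway directory and made-up names (notes.txt, bugfix, and the Demo User identity are all hypothetical). It makes one commit on the main development branch, creates a second branch, and adds a commit there.

```shell
# Work in a throwaway directory.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init --quiet
git config user.name "Demo User"          # hypothetical identity, local to this sandbox
git config user.email "demo@example.com"

# The initial branch is master or main, depending on your git settings.
trunk=$(git symbolic-ref --short HEAD)

# First commit, assigned to the initial branch.
echo "first line" > notes.txt
git add notes.txt
git commit --quiet -m "start notes"

# Create a branch and switch to it; new commits are now assigned to bugfix.
git branch bugfix
git checkout --quiet bugfix
echo "second line" >> notes.txt
git commit --quiet -am "extend notes on bugfix"

# Count the commits reachable from each branch.
trunk_commits=$(git rev-list --count "$trunk")
bugfix_commits=$(git rev-list --count bugfix)
echo "$trunk: $trunk_commits commit(s), bugfix: $bugfix_commits commit(s)"
```

The bugfix branch contains the first commit plus its own, while the initial branch is unchanged until the two are merged.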
We now have the pieces in place to visualize the graph of a git repository. [Picture credit: hades.github.io]
Take some time to identify the commits, branching events, and merging events on the graph.
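You can also draw such a graph for a repository of your own. The sketch below is an illustration under assumed names (the files a.txt, b.txt, c.txt and the branch feature are hypothetical). It builds a small history with one branching event and one merging event, then prints it with git log --graph.

```shell
# Work in a throwaway directory.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init --quiet
git config user.name "Demo User"          # hypothetical identity, local to this sandbox
git config user.email "demo@example.com"
trunk=$(git symbolic-ref --short HEAD)    # master or main

# A shared starting commit.
echo "a" > a.txt
git add a.txt
git commit --quiet -m "add a.txt"

# Branching event: feature diverges from the initial branch.
git checkout --quiet -b feature
echo "b" > b.txt
git add b.txt
git commit --quiet -m "add b.txt on feature"

git checkout --quiet "$trunk"
echo "c" > c.txt
git add c.txt
git commit --quiet -m "add c.txt on $trunk"

# Merging event: a new commit with two parents joins the branches.
git merge --quiet --no-edit feature

# Draw the graph of all commits.
git log --graph --oneline --all

# The merge commit lists two parents; the total history has four commits.
parent_count=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))
total_commits=$(git rev-list --count HEAD)
```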
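To get a local copy of the class repository, clone it over SSH,
git clone git@github.com:ionides/810f24
or over https,
git clone https://github.com/ionides/810f24
Cloning over SSH requires an SSH key registered with your GitHub account. GitHub no longer accepts account passwords for git operations over https, so pushing over https needs a personal access token; SSH is therefore recommended here. If all you want to do is inspect a copy of the repository locally, https is sufficient.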
You now have a local copy of the STATS 810 class materials.
The local repository remembers the address of the remote repository it was cloned from.
[ionides@doob 810f24]$ git pull
Already up-to-date.
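The way a clone remembers its remote can be inspected without a network connection. This is a minimal sketch in which a bare repository on disk stands in for GitHub; the names seed, remote.git, and local_copy are hypothetical.

```shell
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1

# Build a small repository and publish a bare copy to act as the remote.
git init --quiet seed
git -C seed config user.name "Demo User"        # hypothetical identity
git -C seed config user.email "demo@example.com"
echo "hello" > seed/README.txt
git -C seed add README.txt
git -C seed commit --quiet -m "first commit"
git clone --quiet --bare seed remote.git

# Clone it, as you would clone from GitHub.
git clone --quiet remote.git local_copy
cd local_copy || exit 1

# The clone records where it came from under the name origin.
git remote -v
origin_url=$(git config --get remote.origin.url)

# With no arguments, git pull fetches from that recorded address.
git pull --quiet
pull_status=$?
```

In a clone of the class repository, git remote -v shows the GitHub address you cloned from rather than a local path.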
If you tell me your GitHub username, I could in principle add you as a developer of the ionides/810f24 GitHub repository. Then you can commit changes directly.
However, here, let’s practice something a bit more fancy. We will follow a standard workflow for proposing a change to someone else’s GitHub repository.
Forking is making your own GitHub copy of a repository. A pull request is a way to ask the owner of the repository to pull your changes back into their version. The following steps guide you through a test example.
Go to ionides/810f24 on GitHub, for example by searching for 810f24.
Click fork at the top right-hand corner, and follow instructions to add a forked copy to your own GitHub account. It should now show up in your account as my_username/810f24.
Clone a local copy of the forked repository to your machine, e.g.,
git clone git@github.com:my_username/810f24
Move to the 810f24 directory and edit the file sign_here.html to check your own name.
It can be helpful to type
git status
regularly to check on the current state of the repository.
Compare your edited sign_here.html with the version in the most recent commit. The only difference should be the line you edited.
git diff sign_here.html
Commit the change to your local copy of 810f24,
git add sign_here.html
git commit -m "sign up for my_name"
and see how the git status has changed. Another useful command for checking on the recent action in the repository is
git log
Push the change to your fork of 810f24 on GitHub:
git push
On the GitHub page for your my_username/810f24 fork, click New pull request and follow instructions. When you have successfully placed your pull request, the owner of the parent repository (me) will be notified. I will then pull the modifications from your fork into ionides/810f24.
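The whole workflow can be rehearsed offline before trying it on GitHub. The following is a sketch under stated assumptions, not the GitHub procedure itself. Local directories stand in for the parent repository and the fork, a bare clone stands in for clicking fork, and a git pull by the owner stands in for accepting the pull request. The contents of sign_here.html and the two identities are made up for illustration.

```shell
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1

# Stand-in for the parent repository (hypothetical file contents).
git init --quiet parent
git -C parent config user.name "Repo Owner"
git -C parent config user.email "owner@example.com"
echo "[ ] my_name" > parent/sign_here.html
git -C parent add sign_here.html
git -C parent commit --quiet -m "add sign-up sheet"
trunk=$(git -C parent symbolic-ref --short HEAD)   # master or main

# Stand-in for clicking fork: a bare copy of the parent.
git clone --quiet --bare parent fork.git

# Clone the fork and edit the file.
git clone --quiet fork.git work
cd work || exit 1
git config user.name "My Name"
git config user.email "my_name@example.com"
echo "[x] my_name" > sign_here.html

# Inspect, stage, commit, and review the change.
git status --short
git diff sign_here.html
git add sign_here.html
git commit --quiet -m "sign up for my_name"
git log --oneline

# Send the commit to the fork.
git push --quiet

# Stand-in for the pull request being accepted: the owner pulls from the fork.
cd ../parent || exit 1
git pull --quiet ../fork.git "$trunk"
signed=$(cat sign_here.html)
parent_commits=$(git rev-list --count HEAD)
echo "parent now contains: $signed"
```

On GitHub, the last step happens through the pull request page, where the owner can review the diff before merging.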