README.md

This week covers:

  • An intro to Git and GitHub for sharing code
  • Command line tools
  • Exploratory data analysis with R

Day 1

Setup

Install tools: Ubuntu on Windows, GitHub for Windows, R, and RStudio

Ubuntu on Windows

  • Type bash in the Start Menu, hit enter, and then y to install Ubuntu on Windows
  • If this seems like it's hanging, hit enter
  • Create a username and password
  • Update all packages with sudo apt-get update and sudo apt-get upgrade

Git / GitHub for Windows

  • Check that you have git under bash by typing git --version in the terminal
  • Install GitHub for Windows
  • Configure git to deal with line endings in a cross-platform friendly way: git config --global core.autocrlf true

R and RStudio

  • Download and install R from a CRAN mirror
  • Download and install RStudio
  • Open RStudio and install the tidyverse package, which includes dplyr, ggplot2, and more: install.packages('tidyverse', dependencies = TRUE)

Text editor

Intro to Git(Hub)

Make your first commit and pull request

  • Complete this free online git course
  • Sign up for a free GitHub account
  • Then follow this guide to fork your own copy of the course repository
  • Clone a copy of your forked repository, which should be located at git@github.com:<yourusername>/coursework.git, to your local machine
  • Once that's done, create a new file in the week1/students directory, <yourfirstname>.txt (e.g., jake.txt)
  • Use git add to add the file to your local repository
  • Use git commit and git push to commit and push your changes to your copy of the repository
  • Then issue a pull request to send the changes back to the original course repository
  • Finally, configure a remote repository called upstream to point here, then fetch and merge its changes:
    git remote add upstream git@github.com:msr-ds3/coursework
    git fetch upstream
    git merge upstream/master
  • Note: the fetch and merge steps together are equivalent to git pull upstream master
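
To see why a fetch followed by a merge leaves you in the same place as a pull, here is a minimal sketch that runs entirely on throwaway local repositories instead of GitHub. The directory names (upstream, clone_a, clone_b) and the helper g are hypothetical and exist only for this demonstration.

```shell
#!/usr/bin/env bash
# Sketch: fetch + merge and pull end at the same commit.
# Uses scratch repositories on disk; nothing here touches GitHub.
set -euo pipefail

work=$(mktemp -d)   # scratch directory for the demo
cd "$work"

# Hypothetical helper: run git with a demo identity so commits always work.
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

# Stand-in for the course repository, with one commit on master.
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
echo "week 1" > upstream/README.md
g -C upstream add README.md
g -C upstream commit -q -m "Initial commit"

# Two clones, each with a remote named upstream, as in the steps above.
for clone in clone_a clone_b; do
    git clone -q upstream "$clone"
    git -C "$clone" remote add upstream "$work/upstream"
done

# New work lands in the stand-in course repository.
echo "week 2" >> upstream/README.md
g -C upstream commit -q -am "Add week 2"

# clone_a: two explicit steps.
git -C clone_a fetch -q upstream
g -C clone_a merge -q upstream/master

# clone_b: the one-step form.
g -C clone_b pull -q upstream master

head_a=$(git -C clone_a rev-parse HEAD)
head_b=$(git -C clone_b rev-parse HEAD)
echo "fetch + merge: $head_a"
echo "pull:          $head_b"
```

Both lines should print the same commit hash. The two-step form is useful when you want to inspect what arrived (for example with git log upstream/master) before merging it.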

Learn more (optional)

Intro to the Command Line

Learn more (optional)

Day 2

Command line exercises

  • Review intro_command_line.ipynb for an introduction to the command line
  • Download one month of the Citibike data: wget https://s3.amazonaws.com/tripdata/201402-citibike-tripdata.zip
  • Decompress it: unzip 201402-citibike-tripdata.zip
  • Rename the resulting file to get rid of ugly spaces: mv 2014-02*.csv 201402-citibike-tripdata.csv
  • See the download_trips.sh file which automates this, and can be run using bash download_trips.sh or ./download_trips.sh
  • Fill in solutions of your own under each comment in citibike.sh
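
The rename step works even though the extracted file name contains spaces, because the shell passes each glob match to mv as a single argument. The sketch below shows this offline on an empty stand-in file. The spaced file name is an assumption about what the archive contains, and this is not the contents of download_trips.sh.

```shell
#!/usr/bin/env bash
# Sketch of the rename step on a stand-in file, so it runs without a download.
set -euo pipefail

cd "$(mktemp -d)"                          # scratch directory for the demo

month=201402                               # month to fetch, as YYYYMM
prefix="${month:0:4}-${month:4:2}"         # 2014-02, the form used inside the zip
url="https://s3.amazonaws.com/tripdata/${month}-citibike-tripdata.zip"
echo "would fetch: $url"                   # a real run would use wget and unzip here

# Stand-in for the extracted CSV (assumed name, containing spaces).
touch "${prefix} - Citi Bike trip data.csv"

# The glob expands to the one matching file and is not split on its spaces.
mv "${prefix}"*.csv "${month}-citibike-tripdata.csv"
ls
```

If more than one file matched the pattern, mv would fail or treat the last argument as a directory, so the pattern needs to be specific enough to match exactly one file.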

Intro to R

Day 3

Counting

Plotting

Day 4

Plotting (cont'd)

Combining and reshaping data

Day 5

Guest lecture: Computational Complexity

More counting and plotting

Combining and reshaping data (cont'd)

Save your work

  • Make sure to save your work and push it to GitHub. Do this in three steps:
    1. git add and git commit any new files to your local repository. (Omit large data files.)
    2. git pull upstream master to grab changes from this repository, and resolve any merge conflicts, committing the final results.
    3. git push origin master to push things back up to your GitHub fork of the course repository.
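
The three steps can be wrapped in a small shell function, sketched below. The name save_work is hypothetical, and the function assumes the remotes origin and upstream are already configured as described on Day 1.

```shell
# Sketch: add/commit, pull from upstream, then push to your fork.
# Usage: save_work "commit message" file1 [file2 ...]
save_work() {
    local message="$1"
    shift
    # 1. Stage only the files you name, so large data files stay out.
    git add -- "$@" &&
    git commit -m "$message" &&
    # 2. Merge in upstream changes; the chain stops here on a conflict.
    git pull upstream master &&
    # 3. Publish the result to your GitHub fork.
    git push origin master
}
```

For example, save_work "Add jake.txt" week1/students/jake.txt. If the pull reports a conflict, resolve it by hand, commit the result, and then run git push origin master yourself.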