2 RStudio workshop day
2.1 Getting started
ANNOUNCEMENTS
Calendar
Don’t forget to check the calendar and day-by-day schedule.
MSCS community
If you’re an MSCS major / minor, or plan to be, I strongly encourage you to sign up for the MSCS community listserv. This is where information is shared about department events, internship opportunities, etc.
Pre-course surveys
Your responses do not fall into some void! I’m enjoying going through them and will absorb more and more of the info in the next few weeks.
Syllabus Q&A
I’ve added a last page to the syllabus that responds to your questions from the pre-course survey.
Switch to Slack
Going forward, I’ll be sending announcements via Slack, so it’s important to set up Slack if you haven’t already.
TODAY’S GOALS
- Get up and running with RStudio!
  Heads up: In the first few weeks of class, you’ll see most of the code that we’ll need for the rest of the semester. Thus there’s a steeper learning curve at the beginning of the semester, and then things will level out.
- Make some mistakes and be ok with that.
- Practice collaboration.
WHY R / RSTUDIO?
Beyond some simple calculations, you can’t do data analysis without software. So why RStudio?
- It’s free.
- It has a huge online community (so finding help is easier than with many other software packages).
- Fun fact: RStudio was started by Mac alum JJ Allaire and beta-tested at Mac!
- It’s used outside academia, and not just by statisticians.
- MPR journalist David Montgomery uses R for data analysis and visualizations.
- BBC uses R.
- Ahmadou Dicko discusses how humanitarians are using R to create “life saving data products.”
- Shelmith Kariuki discusses how the Kenyan government shared its census data using R, to support policy making and development.
DIRECTIONS
In-class activities are designed for your own learning – you will not hand in or be graded on the activities.
If you don’t finish an activity during class, you are expected to complete the activity outside of class.
Making mistakes
RStudio is most likely new to you. With this in mind, know this:
- this activity will be bumpy
- you will make mistakes and this is a natural part of learning any new language
- the number of mistakes you make is not a predictor of your ability to thrive in this course
Collaboration
We’re sitting in groups for a reason – you should collaborate throughout this and all future activities. In doing so:
- Remember that you all have different experiences, both personal and academic – some of you are seeing RStudio for the first time, some of you are in your first MSCS course.
- Be a good listener.
- Be supportive when others make mistakes.
- Stay in sync with one another while respecting that everybody works at different paces (you have different learning strategies, work styles, note-taking strategies, etc.). If somebody is working on exercise 10 and everybody else is working on exercise 2, that’s not good collaboration.
- Don’t rush. This activity isn’t due at the end of class – you can finish up outside of class.
2.2 Exercises
- Hello!
Take 5-10 minutes to check in with your group.
- Share your names & PGPs.
- What other classes are you taking this semester?
- Have you used RStudio before?
- Who will share their screen? (Please share now :))
Open RStudio & install handy packages
Install the tidyverse package – this bundles a set of packages whose functions share a common syntax and structure. Unless the authors of a package add updates, you only need to do this once all semester. To install:
- If you’re working with a desktop version of R:
  In the “Packages” tab (bottom right pane), click “Install”. From there, type the name of the package and click “Install”.
- If you’re working on Mac’s server:
  Check whether this package is already installed, i.e. whether it appears in the list under the “Packages” tab. If not, follow the directions above.
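If you’d rather type than click, the same check-then-install can be done from the console. This is a minimal sketch using base R’s installed.packages() and install.packages(); the variable name has_tidyverse is just an illustrative choice. The install line is left commented out so that nothing is downloaded unless you choose to run it.

```r
# installed.packages() returns a matrix with one row per installed package;
# its row names are the package names.
has_tidyverse <- "tidyverse" %in% rownames(installed.packages())
has_tidyverse   # TRUE if tidyverse is already installed, FALSE otherwise

# If that was FALSE, remove the # below and run the line once.
# It downloads tidyverse from CRAN, so it needs an internet connection.
# install.packages("tidyverse")
```

Clicking through the “Packages” tab does exactly the same job, so use whichever you find easier.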
- Get organized
Imagine walking into a convenience store where the snacks aren’t organized – snack items are just randomly placed throughout the store. This would make it frustratingly difficult to find your snack! Similarly, it’s important, both for this class and beyond, to start organizing files on your computer if you don’t already.
- Create a folder named “STAT 155” within the Documents folder on your computer. This is where you should organize all work for this course. Why create a folder and store it within Documents instead of just saving files to the desktop or Downloads folder?
  - Like the haphazard convenience store, keeping files on the desktop or Downloads will make it difficult to find your work.
  - Even worse, some computer utilities automatically delete contents from the Downloads folder without warning.
- Within your STAT 155 folder, create 3 new sub-folders named: “Homework”, “In-class activities”, “Mini-project”. You should organize your future files into the appropriate folder.
- If you’re using Mac’s RStudio server
If you plan to use RStudio on your desktop, move on to the next exercise. If you’re working on Mac’s RStudio server, you’ll want to set up the same set of folders that you did on your computer. As you create and organize files within the server, you can (and should) then download these same files to the corresponding folder on your computer. Why? Files are sometimes cleaned from the server and you’ll eventually lose access (just as with your Mac email). You have more control over the files on your computer.
- Go to the “Files” tab in the lower right corner of the RStudio server.
- Click on “New folder” and write “STAT 155” for the new folder name. Then “OK”.
- You should now see the STAT 155 folder in the list of files. Open that and create 3 new sub-folders: “Homework”, “In-class activities”, “Mini-project”.
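Optional: the folder setup above can also be done with a few lines of R, using base R’s dir.create() and list.files(). This is a sketch under one big assumption: so that it can run anywhere without touching your real files, it builds the folders inside a temporary directory (tempdir()). To use it for real, you would point course_dir at your Documents folder instead (for example "~/Documents/STAT 155" on a Mac).

```r
# Stand-in location so this sketch is safe to run anywhere.
# For real use, replace with your Documents folder, e.g.
# course_dir <- "~/Documents/STAT 155"   (on a Mac)
course_dir <- file.path(tempdir(), "STAT 155")

# The three sub-folders from the exercise
sub_folders <- c("Homework", "In-class activities", "Mini-project")

# recursive = TRUE also creates "STAT 155" itself if it doesn't exist yet;
# showWarnings = FALSE stays quiet if a folder is already there.
for (folder in sub_folders) {
  dir.create(file.path(course_dir, folder), recursive = TRUE, showWarnings = FALSE)
}

# Check the result: this lists the three sub-folders
list.files(course_dir)
```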
- Remember to check in with your group.
Are you roughly in sync? Are you checking in and supporting one another?
- Start a new Rmd
- Open a new .Rmd file (File > New File > R Markdown…).
- Remove all contents in the file and write the following at the very top:
  # RStudio workshop day
- On the first line below the title, type the following sentence, including the italics and bold:
  This is my first STAT 155 document and it’s exciting!
- Knit the document to html (click the “Knit” button at the top of your Rmd). This will require that you name and save your document.
- If working with a desktop RStudio: Save this to the “In-class activities” folder you created, not your Downloads.
- If working with Mac’s server: Save this to the “In-class activities” folder you created. (You’ll later export this to your computer, but right now it’s living on the server.)
- Create a chunk
- Within your Rmd, set up a new “R chunk” and in it:
  - calculate the sum of 52 and 49
  - use the rep() function to repeat the number “10” six times
  - use the rep() function to repeat the number “6” 10 times
- Knit again!
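If you get stuck, here is the same kind of code with different numbers, so you can see how arithmetic and rep() behave without being handed the answers. rep(x, times) repeats the value x the given number of times; the numbers below are arbitrary.

```r
# R works like a calculator
3 + 4              # returns 7

# rep(x, times) repeats x a given number of times
rep(3, times = 4)  # returns 3 3 3 3

# The second argument of rep() is `times`, so its name can be left off
rep(4, 3)          # returns 4 4 4
```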
Naming things
Create a new R chunk containing the code below and fill in the blanks (___). Afterward, knit your document again to make sure that you don’t have any errors! Note: R chunks ignore anything in a line after #. We’ll use this feature to comment our code, i.e. to communicate and remember what we’re doing!

```r
# Store your age as "my_age"
# Storing things is a good way to save & use later!
my_age <- ___

# Confirm that your age is stored correctly
my_age

# Calculate how old you will be in 10 years
my_age + ___
```
- Practice naming things
- Create a new R chunk.
- In it, name and store the results of 2 times 2. NOTE: You pick the name. This should start with a letter and not include any spaces.
- Multiply your stored result by 2 (using the name which you stored it under). The result should be 8.
- Knit!
- Remember to check in with your group.
Are you roughly in sync? Are you checking in and supporting one another?
Importing data!
Where RStudio really shines is in working with data. How we import data into RStudio depends on its format (e.g. .csv, .xls) and where it’s stored (e.g. your computer, the internet). In the next exercises you’ll use Spotify data stored in a csv format and made available on the internet. You can read about and access a codebook, i.e. a description of what variables are included in the data set, here.
Create a new R chunk. Import and store the data under the name spotify. Given its format and location, we do this by applying the read.csv() function to the url location of the data:

```r
# Import data
spotify <- read.csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-01-21/spotify_songs.csv")
```
Nothing new will appear in your document after you import the data. This isn’t because the import didn’t work, but because you stored, but didn’t print, the data. Move on to the next question!
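To see what read.csv() produces without needing the internet, you can hand it a few lines of CSV directly through its text argument (it otherwise takes a URL or a file path, as above). The tiny table and the name toy_songs are made up purely for illustration. Like spotify, the result is stored as a data frame, and nothing is printed until you type its name.

```r
# read.csv() turns comma-separated values into a data frame.
# The first line holds the column names; each later line is one row.
toy_songs <- read.csv(text = "song,popularity
Song A,66
Song B,45")

# Storing prints nothing; type the name to print the data
toy_songs
```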
- Get to know the data: Part I
  - In the upper right hand pane in RStudio, click on the Environment tab and then the spotify line. The spotify data will pop up at left – check it out.
  - In this tidy dataset, what is the unit of observation? That is, what is represented in each row of the dataset?
Get to know the data: Part II
Try out each function below. Identify what each function tells you about the spotify data and note this in place of the ???:

```r
# ??? [what do both numbers mean?]
dim(spotify)

# ???
nrow(spotify)

# ???
head(spotify)

# ???
names(spotify)
```
One more time on your own
Implement the steps below in a new R chunk. Comment your code, i.e. include a # with a description for each line.
- Import & name data on different Himalayan peaks from the url below. A codebook is here:
  https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-22/peaks.csv
- Show the first 6 rows of the dataset. NOTE: This gives us a quick glimpse without having to print out the entire dataset!
- How many peaks are included in the dataset?
- How many variables are recorded on each peak?
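For comparison, here is a similar workflow on mtcars, a small dataset that ships with R (one row per car model), so it runs without downloading anything. ncol() is the column-counting counterpart of nrow(); it isn’t used elsewhere in this activity but works the same way.

```r
# mtcars comes with R: one row per car model, one column per variable
cars_data <- mtcars

head(cars_data)    # the first 6 rows
nrow(cars_data)    # number of rows, i.e. cars: 32
ncol(cars_data)    # number of columns, i.e. variables: 11
```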
- Wrapping up
Knit your Rmd one more time. If you’re working with RStudio on your desktop, you’re done with this activity. If you’re working on Mac’s RStudio server, you have one more step that you should take at the end of each activity: export your activity files to your computer. To do so:
- Go to the Files tab in the lower right side of RStudio.
- Click the boxes next to the two activity files: the .Rmd and the .html.
- Still within the Files tab, click on the “More” button that has a gear symbol next to it.
- Click “Export” then “Download”.
- The files were likely exported from the RStudio server to the Downloads folder on your computer. It’s important to now move them to the “In-class activities” folder that you created at the beginning of class. They are now there for safekeeping :)
- If you finish early
You can get started on the video and checkpoint for our next class. You can either do this on your own or stick around to discuss it with your group.