14  Mid-semester review



Settling in

You won’t type anything today! For future reference you can access the following:




Learning goals

Review the basics of wrangling and visualization




14.1 Warm-up




Recall the major themes of our work thus far:

Getting to know your data

  • key functions: head(), dim(), nrow(), class(), str()
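
As a quick refresher, here is a minimal sketch of these functions in action. The `pets` data frame and its values are made up purely for illustration.

```r
# A tiny made-up data frame (hypothetical values, for illustration only)
pets <- data.frame(
  name   = c("Ada", "Bo", "Cy"),
  weight = c(4.2, 11.5, 7.8)
)

head(pets, 2)   # the first 2 rows
dim(pets)       # number of rows, then number of columns
nrow(pets)      # number of rows only
class(pets)     # what kind of object this is
str(pets)       # compact summary: each column's type and first few values
```

Running these one at a time on any new dataset is a reasonable first step before plotting or wrangling.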

Data visualization

  • picking an appropriate plot / evaluating the appropriateness of a plot
  • interpreting the results of a plot
  • building effective plots that are accessible and “professional”
  • key functions:
    • ggplot()
    • geom_bar(), geom_density(), geom_boxplot(), geom_histogram(), geom_point(), geom_line(), geom_smooth()
    • facet_wrap()
  • key features:
    • color, fill
    • fig.alt, fig.cap
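
To see how several of these pieces fit together, here is one possible sketch. It assumes the `mpg` dataset that ships with the ggplot2 package; the choice of variables and labels is just for illustration, not a plot from our activities.

```r
library(ggplot2)

# mpg ships with ggplot2:
#   displ = engine displacement (liters), hwy = highway miles per gallon,
#   drv = drive type, year = model year (1999 or 2008)
p <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +                            # one point per car model
  geom_smooth(method = "lm", se = FALSE) +  # one straight trend line per drive type
  facet_wrap(~ year) +                      # one panel per model year
  labs(
    x = "Engine displacement (liters)",
    y = "Highway miles per gallon",
    color = "Drive type"
  )

p
```

Note the pattern: `ggplot()` sets up the data and aesthetics, each `geom_` adds a layer, and `facet_wrap()` splits the plot into panels. Alt text and captions are set in the code chunk options rather than inside the `ggplot()` call.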

Data preparation: wrangling data

  • goals
    • get our data into the form we need for analysis
    • obtain numerical summaries of our data
  • key functions
    • arrange() our data in a meaningful order
    • subset the data: filter() to keep only the rows of interest and select() to keep only the columns of interest
    • mutate() existing variables and define new variables
    • summarize() various aspects of a variable, both overall and by group (group_by())
    • count() up the number of instances of an outcome or set of outcomes
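
The verbs above can be chained together. The sketch below uses a made-up `scores` dataset (hypothetical names and values) and the native pipe `|>`, which assumes R 4.1 or later; `%>%` would work the same way here.

```r
library(dplyr)

# Made-up data, for illustration only
scores <- tibble(
  student = c("A", "B", "C", "D"),
  section = c("1", "1", "2", "2"),
  points  = c(8, 6, 9, 7)
)

# Keep some rows and columns, define a new variable, then sort
top <- scores |>
  filter(points >= 7) |>                  # keeps students A, C, D
  select(student, points) |>              # drops the section column
  mutate(percent = points / 10 * 100) |>  # new variable (assumes 10 possible points)
  arrange(desc(points))                   # highest score first

# Numerical summaries by group
section_means <- scores |>
  group_by(section) |>
  summarize(mean_points = mean(points))

# Number of rows in each section
section_counts <- scores |>
  count(section)
```

When reading code like this on the quiz, it can help to track how the number of rows and columns changes after each line.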

Data preparation: reshaping data

  • goal: reshape our data to fit the task at hand
  • functions:
    • pivot_longer()
    • pivot_wider()
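
As a reminder of what the two pivots do, here is a minimal sketch with a made-up `wide` dataset (hypothetical names and values). One pivot undoes the other.

```r
library(tidyr)

# Made-up data: one row per student, one column per quiz
wide <- tibble::tibble(
  student = c("A", "B"),
  quiz1   = c(8, 6),
  quiz2   = c(9, 7)
)

# Longer: one row per student/quiz combination (2 students x 2 quizzes = 4 rows)
long <- wide |>
  pivot_longer(cols = c(quiz1, quiz2), names_to = "quiz", values_to = "points")

# Wider: back to one row per student
back <- long |>
  pivot_wider(names_from = quiz, values_from = points)
```

A useful check: `pivot_longer()` gives more rows and fewer columns, while `pivot_wider()` gives fewer rows and more columns.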

Data preparation: joining data

  • goal: join different datasets into one
  • functions
    • mutating joins which combine columns of different datasets:
      left_join(), inner_join(), full_join()
    • filtering joins which filter rows according to membership / non-membership in another dataset:
      semi_join(), anti_join()
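
The differences among the joins are easiest to see on a small example. The two datasets below are made up for illustration: `students` has ids 1, 2, 3 and `grades` has ids 2, 3, 4.

```r
library(dplyr)

# Made-up data, for illustration only
students <- tibble(id = c(1, 2, 3), name = c("Ada", "Bo", "Cy"))
grades   <- tibble(id = c(2, 3, 4), grade = c("A", "B", "C"))

# Mutating joins: add the grade column to students
left  <- left_join(students, grades, by = "id")   # all 3 students; id 1 gets an NA grade
inner <- inner_join(students, grades, by = "id")  # only ids in both: 2 and 3
full  <- full_join(students, grades, by = "id")   # every id in either: 1, 2, 3, 4

# Filtering joins: keep or drop rows of students, with no new columns
semi <- semi_join(students, grades, by = "id")    # students WITH a grade: ids 2 and 3
anti <- anti_join(students, grades, by = "id")    # students WITHOUT a grade: id 1
```

Notice that the filtering joins never add the `grade` column. They only use `grades` to decide which rows of `students` to keep.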

Data preparation: working with factor variables

  • goals
    • turn character variables into factor variables (when necessary)
    • turn factor variables into more meaningful factor variables
  • key functions
    • reordering categories / levels: fct_relevel(), fct_reorder()
    • change category labels: fct_recode()
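
Here is a small sketch of these functions, using a made-up `sizes` variable. The numbers passed to `fct_reorder()` are hypothetical values chosen so that the ordering is easy to follow.

```r
library(forcats)

# Turn a character vector into a factor; levels default to alphabetical order
sizes <- factor(c("med", "low", "high", "low"))
levels(sizes)  # "high" "low" "med"

# Reorder the levels by hand
releveled <- fct_relevel(sizes, "low", "med", "high")

# Reorder the levels by another variable (by default, its median within each level)
reordered <- fct_reorder(sizes, c(2, 1, 3, 1))

# Change a label: new label on the left, old label on the right
recoded <- fct_recode(sizes, medium = "med")
```

In all three cases the data values stay attached to the same observations. Only the order or the names of the categories change.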

Data preparation: working with strings

  • goal: detect, replace, or extract certain patterns from character strings
  • key functions
    • return a modified string: str_replace(), str_replace_all(), str_to_lower(), str_sub()
    • return a set of TRUE/FALSE: str_detect()
    • return a number: str_length()
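
The grouping above (modified string, TRUE/FALSE, number) can be seen in a short sketch. The strings in `x` are made up for illustration.

```r
library(stringr)

x <- c("Data Science", "data viz")

# Return a modified string
lower    <- str_to_lower(x)              # "data science" "data viz"
first_a  <- str_replace(x, "a", "A")     # first match only: "DAta Science" "dAta viz"
all_a    <- str_replace_all(x, "a", "A") # every match: "DAtA Science" "dAtA viz"
first4   <- str_sub(x, 1, 4)             # characters 1 through 4: "Data" "data"

# Return a set of TRUE/FALSE
has_sci  <- str_detect(x, "Science")     # TRUE FALSE

# Return a number
n_chars  <- str_length(x)                # 12 8 (spaces count as characters)
```

Since `str_detect()` returns TRUE/FALSE, it pairs naturally with `filter()`, while the functions that return modified strings pair naturally with `mutate()`.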




IMPORTANT

A list of these key functions will be provided to you on the quiz, without the corresponding context. That list will look something like this:

  • ggplot functions
    ggplot(), geom_bar(), geom_boxplot(), geom_density(), geom_histogram(), geom_line(), geom_point(), geom_smooth(), facet_wrap()
  • wrangling functions
    arrange(), count(), filter(), group_by(), mutate(), select(), summarize()
  • pivot_ functions
    pivot_longer(), pivot_wider()
  • _join functions
    anti_join(), full_join(), inner_join(), left_join(), semi_join()
  • fct_ functions
    fct_recode(), fct_relevel(), fct_reorder()
  • str_ functions
    str_detect(), str_length(), str_replace(), str_replace_all(), str_sub(), str_to_lower()





TODAY

We’ll practice SOME of these concepts today. Important caveats:

  • This activity is NOT an exhaustive review. It doesn’t cover every topic or every type of question you’ll be asked. For example, it overemphasizes older material, since that material is less fresh in your mind.
  • Be kind to yourself! If you haven’t started studying / reviewing yet, this might feel bumpy.





14.2 Part 1: What’s the verb?

Goal: Review some of the data preparation functions we’ve learned.

Directions: In your group, complete the provided Part 1 activity. Once you’re done, let me know. I’ll then check your answers and give you Part 2.





14.3 Part 2: Quiz practice

Goal: Practice some problems that are more in the style of the quiz questions. These give you a sense of the structure, vibe, and types of questions that might be asked so that none of that comes as a surprise.

Directions: In your group, complete Part 2 of the activity. Once you’ve completed your work, work on Homework 6 (or anything else related to this class).





14.4 Wrap-up

  • Homework 6 is due tonight by 11:59pm.

  • Quiz 2 is Tuesday (October 29).

    • Study tips:
      • Make a study sheet based on the activities. Though you can’t bring this into the quiz, it’s helpful for studying.
      • Study your study sheet!
      • Review all checkpoints, activities, and homework (in that order). Try doing the exercises without peeking at solutions. Take note of where you need to spend more time studying.
      • Review Quiz 1.
    • The list of key functions shown in the Warm-up (Section 14.1, under IMPORTANT) will be provided to you.