---
title: "Learn R Part II In-class Exercises"
format: html
editor: visual
---

Welcome to Quarto! This is where you will try out all of the hands-on exercises in the workshop. Begin by running these first two code chunks:

```{r}
# load packages
library(tidyverse)
```

```{r}
# load data
tswift <- read_csv("data/taylor_swift_spotify.csv")
nat_parks <- read_csv("data/nat_parks_visitors.csv")
```

## 00. Quick review

Using `if_else()`, create a new variable in `tswift` that indicates whether a song is "long" or "short". Name the variable `long_short`.

Songs are considered "long" if `duration_ms` is greater than 250000 (250 seconds, or just over four minutes).

```{r}
# create long_short variable
tswift <- tswift |>
  mutate(long_short = if_else(duration_ms > 250000,
                              "long",
                              "short"))
```
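To check the threshold logic before trusting it on the full data, it can help to run the same `if_else()` call on a tiny table. The sketch below uses a made-up tibble (`toy_songs` and its values are hypothetical, not taken from the workshop file). Note that a song of exactly 250000 ms comes out as "short", because the comparison is strictly greater than.

```{r}
library(dplyr)

# hypothetical toy data for illustration only (not the workshop data)
toy_songs <- tibble(
  name = c("Song A", "Song B", "Song C"),
  duration_ms = c(180000, 250000, 310000)
)

# same rule as above: strictly greater than 250000 ms counts as "long"
toy_songs <- toy_songs |>
  mutate(long_short = if_else(duration_ms > 250000, "long", "short"))

toy_songs
```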

## 01. How do I aggregate by collapsing?

Using `group_by()` and `summarise()`, calculate average `danceability` by `long_short`.

On average, are Taylor Swift's longer or shorter songs more "danceable"?

***Shorter***

```{r}
tswift |>
  group_by(long_short) |>
  summarise(avg_danceability = mean(danceability))
```
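"Collapsing" means the result has one row per group rather than one row per song. A minimal sketch on made-up values (the `toy_dance` tibble is hypothetical) shows five input rows shrinking to two:

```{r}
library(dplyr)

# hypothetical toy data for illustration only
toy_dance <- tibble(
  long_short = c("long", "long", "short", "short", "short"),
  danceability = c(0.25, 0.75, 0.5, 0.75, 1)
)

# summarise() returns one row per group: 5 rows in, 2 rows out
toy_summary <- toy_dance |>
  group_by(long_short) |>
  summarise(avg_danceability = mean(danceability))

toy_summary
```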

## 02. How do I aggregate *without* collapsing?

Alter the `tswift` data frame to add a variable that calculates average acousticness by album (without collapsing).

Bonus: Can you determine if the song "Cruel Summer" is more or less acoustic than the Lover album average?

***Less acoustic (0.12 vs. 0.33)***

```{r}
# add average acousticness by album
tswift <- tswift |>
  group_by(album) |>
  mutate(avg_acoustic = mean(acousticness)) |>
  # drop the grouping so later steps treat tswift as a plain data frame
  ungroup()

# find Cruel Summer
tswift |>
  select(name, acousticness, avg_acoustic) |>
  filter(name == "Cruel Summer")
```
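Aggregating without collapsing means every row is kept and the group average is repeated on each row, which is what makes a song-versus-album comparison possible. The sketch below illustrates this on made-up values (`toy_albums` and the `vs_album` column are hypothetical names introduced here):

```{r}
library(dplyr)

# hypothetical toy data for illustration only
toy_albums <- tibble(
  album = c("Album A", "Album A", "Album B", "Album B"),
  name = c("Song 1", "Song 2", "Song 3", "Song 4"),
  acousticness = c(0.25, 0.75, 0.125, 0.375)
)

# mutate() after group_by() keeps all 4 rows and repeats each album's mean
toy_albums <- toy_albums |>
  group_by(album) |>
  mutate(
    avg_acoustic = mean(acousticness),
    vs_album = if_else(acousticness < avg_acoustic, "less", "more")
  ) |>
  ungroup()

toy_albums
```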

## 03. How do I tidy data?

Pivot the `nat_parks` data frame longer so that year and visitors each get their own column.

Hint: Pivot the year columns only. To specify them, you can use either of these structures:

`cols = start:stop`

`cols = -c(column1, column2)`

```{r}
# pivot data long
nat_parks_long <- nat_parks |>
  pivot_longer(cols = 3:7,
               names_to = "year",
               values_to = "visitors")

nat_parks_long
```
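The second structure from the hint, excluding the columns that should *not* be pivoted, can be sketched on a small made-up table. The column names `region` and `park` and all values below are assumptions for illustration and may not match the real `nat_parks` file. Because the new `year` values come from column names, they start out as text; `names_transform` is one way to convert them to integers.

```{r}
library(tidyr)

# hypothetical toy data for illustration only
toy_parks <- tibble::tibble(
  region = c("West", "East"),
  park = c("Park A", "Park B"),
  `2019` = c(100, 200),
  `2020` = c(50, 80)
)

# pivot every column except the identifier columns
toy_parks_long <- toy_parks |>
  pivot_longer(cols = -c(region, park),
               names_to = "year",
               values_to = "visitors",
               names_transform = list(year = as.integer))

toy_parks_long
```

Selecting by name rather than by position (such as `3:7`) keeps the code working if columns are later reordered or added.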