participant_details <- here("data-raw/dime/participant_details.csv") |>
import_dime()
participant_details
participant_details |>
pivot_longer()
participant_details |>
pivot_longer(ends_with("date"))
participant_details |>
pivot_longer(ends_with("date"), names_to = NULL, values_to = "date")
participant_details |>
pivot_longer(ends_with("date"), names_to = NULL, values_to = "date") |>
group_by(pick(-date))
participant_details |>
pivot_longer(ends_with("date"), names_to = NULL, values_to = "date") |>
group_by(pick(-date)) |>
complete(
date = seq(min(date), max(date), by = "1 day")
) |>
ungroup()
usethis::use_package("tidyr")
#' Convert the participant details data to long and clean it up.
#'
#' @param data The DIME participant details data.
#'
#' @returns A data frame.
#'
clean_participant_details <- function(data) {
cleaned <- data |>
tidyr::pivot_longer(
tidyselect::ends_with("date"),
names_to = NULL, values_to = "date"
) |>
dplyr::group_by(dplyr::pick(-date)) |>
tidyr::complete(
date = seq(min(date), max(date), by = "1 day")
) |>
dplyr::ungroup()
return(cleaned)
}
clean_participant_details(participant_details)
participant_details <- here("data-raw/dime/participant_details.csv") |>
import_dime() |>
clean_participant_details()
#' Convert the sleep types to wide format.
#'
#' @param data The cleaned DIME sleep data.
#'
#' @returns A data frame.
#'
sleep_types_to_wider <- function(data) {
wider <- data |>
tidyr::pivot_wider(
names_from = sleep_type,
names_prefix = "seconds_",
values_from = seconds_sum
)
return(wider)
}21 Pivoting your data from and to long or wide
21.1 Learning objectives
- Describe the concept of “pivoting” data to convert data in a “long” format to a “wide” format and vice versa.
- Identify situations where it is more appropriate that data is in the long or wide format.
- Apply the
pivot_longer()andpivot_wider()functions from the tidyr package to pivot data. - Recognise and appreciate the impact that seeking out and using existing functions to solve problems has on how you get things done, instead of writing custom code to do the things you want. For example, looking through the functions found in packages like tidyr.
21.3 📖 Reading task: Pivoting data for easier wrangling
Time: ~6 minutes.
Rarely is our data in the exact format we need or want it to be in to effectively process, analyse, or visualize it. Often we’ll need to do some heavy reshaping. One very powerful way to reshape data is with “pivoting”. Pivoting is when you take a dataset’s rows and convert them to columns or convert columns to rows, and in many different combinations of this.
The tidyr package within the tidyverse contains two wonderful functions for pivoting: pivot_longer() and pivot_wider(). There is a well written documentation on pivoting in the tidyr website that can explain more about it that you can read at another time.
Most use cases of pivoting are to convert to the longer form, which you can do with pivot_longer() and is what we will cover first in this session. There are many reasons pivoting longer is the most common pivot direction.
An example is if you ever need to enter data. Entering data, especially in a spreadsheet tool, in the wide form is much easier and more time efficient than entering data in long form. Consider the scenario where you are entering data from a participant for glucose values that were measured every 30 minutes. So you might enter data like this:
| person_id | glucose_0 | glucose_30 | glucose_60 |
|---|---|---|---|
| 1 | 5.6 | 7.8 | 4.5 |
| 2 | 4.7 | 9.5 | 5.3 |
| 3 | 5.1 | 10.2 | 4.2 |
If working with a spreadsheet, is this is to use tab to move to the next cell and continue entering the data. However, when it comes time to analyze the data, this wide form is very inefficient and difficult to computationally and statistically work with. So, we do data entry in wide and use functions like pivot_longer() to get the data ready for analysis. See Figure 21.1 below that visually shows what happens when you pivot from wide to long.
name and value. Notice how the values in A and B columns are stacked on top of each other in the newly created V column.
If you had, for instance, an ID column for each participant, the pivoting would look like what is shown in Figure 21.2.
id column. New columns are called name and value, as well as the old id column. Notice how, unlike the previous image, the id column is excluded when pivoting into the data on the right.
Pivoting is a conceptually challenging thing to grasp, so don’t be discouraged if you can’t understand how it works yet or of the ways you could use. As you practice using it, you will slowly begin to understand it. With pivot_longer(), the first argument is the data itself. The other arguments are:
-
cols: The columns to use to convert to long form. The input is a vector made usingc()that contains the column names, like you would use inselect()(e.g. you can use theselect_helperslikestarts_with(), or-minus to exclude). -
names_to: Optional, the default isname. If provided, it will be the name of the newly created column (as a quoted character) that contains the original column names. -
values_to: Optional, the default isvalue. Likenames_to, sets the name of the new columns.
Both pivot_longer() and its opposite pivot_wider(), which we will cover later in the session, are incredibly powerful functions. We can’t show close to everything it can do in this workshop, but if you want to learn more, read up on the documentation for it.
👷 This section is being updated. Once ready, it will be shown here.
21.4 Key takeaways
- Data is usually structured to varying degrees as wide or long format. Thinking of your data in terms of shifting it around in structure is a powerful and useful skill to have when doing data wrangling and analysis.
- When needing to convert to the longer form, use
pivot_longer()from the tidyr package. - If you need to convert to the wider form, use the tidyr function
pivot_wider(). - Seek out and test different functions within the tidyr package, it has many very helpful functions for tidying up data such as the
complete()function.
21.5 Code used in session
This lists some, but not all, of the code used in the section. Some code is incorporated into Markdown content, so is harder to automatically list here in a code chunk. The code below also includes the code from the exercises.