21  Pivoting your data from and to long or wide

21.1 Learning objectives

  1. Describe the concept of “pivoting” data to convert data in a “long” format to a “wide” format and vice versa.
  2. Identify situations where it is more appropriate that data is in the long or wide format.
  3. Apply the pivot_longer() and pivot_wider() functions from the tidyr package to pivot data.
  4. Recognise and appreciate the impact that seeking out and using existing functions to solve problems has on how you get things done, instead of writing custom code to do the things you want. For example, looking through the functions found in packages like tidyr.

21.2 💬 Discussion activity: Recall and share what you’ve learned so far

Time: ~6 minutes.

A very effective way to learn is to recall and describe to someone else what you’ve learned. So before we continue this session, take some time to think about what you’ve learned in the last two days.

  1. Take 1-2 minute and try to recall as much as you can about what you’d done over the last two days. Without looking at your notes or the other sections of the website, try to remember things about importing, robust functions, functionals, regex, and split-apply-combine.
  2. Then, for 4-5 minutes, share with your neighbour what you remember and try to describe it to each other. Maybe you will each remember different things.

21.3 📖 Reading task: Pivoting data for easier wrangling

Briefly walk through it again after they’ve read it and explain it a bit more, making use of the tables and graphs. Pivoting is often a difficult concept to grasp, to it’s good to go over it a few times. Emphasise the difficulty of it, to remind them it is ok to not get it.

Time: ~6 minutes.

Rarely is our data in the exact format we need or want it to be in to effectively process, analyse, or visualize it. Often we’ll need to do some heavy reshaping. One very powerful way to reshape data is with “pivoting”. Pivoting is when you take a dataset’s rows and convert them to columns or convert columns to rows, and in many different combinations of this.

The tidyr package within the tidyverse contains two wonderful functions for pivoting: pivot_longer() and pivot_wider(). There is a well written documentation on pivoting in the tidyr website that can explain more about it that you can read at another time.

Most use cases of pivoting are to convert to the longer form, which you can do with pivot_longer() and is what we will cover first in this session. There are many reasons pivoting longer is the most common pivot direction.

An example is if you ever need to enter data. Entering data, especially in a spreadsheet tool, in the wide form is much easier and more time efficient than entering data in long form. Consider the scenario where you are entering data from a participant for glucose values that were measured every 30 minutes. So you might enter data like this:

This is a wide dataset that is useful for data entry.
person_id glucose_0 glucose_30 glucose_60
1 5.6 7.8 4.5
2 4.7 9.5 5.3
3 5.1 10.2 4.2

If working with a spreadsheet, is this is to use tab to move to the next cell and continue entering the data. However, when it comes time to analyze the data, this wide form is very inefficient and difficult to computationally and statistically work with. So, we do data entry in wide and use functions like pivot_longer() to get the data ready for analysis. See Figure 21.1 below that visually shows what happens when you pivot from wide to long.

A diagram showing the use of the pivot longer function, where a table with two columns, A and B, is converted to a table with two columns, name and value. The original columns values are stacked in the value column and the original columns names are stacked in the name column.
Figure 21.1: Pivot longer in tidyr. New columns are called name and value. Notice how the values in A and B columns are stacked on top of each other in the newly created V column.

If you had, for instance, an ID column for each participant, the pivoting would look like what is shown in Figure 21.2.

A diagram showing the use of the pivot longer function, where a table with three columns, id, A and B, is converted to a table with three columns, id, name and value. The original columns values are stacked in the value column and the original columns names are stacked in the name column, but the original id column remains, but with the ids stacked on top of each other.
Figure 21.2: Pivot longer in tidyr, excluding an id column. New columns are called name and value, as well as the old id column. Notice how, unlike the previous image, the id column is excluded when pivoting into the data on the right.

Pivoting is a conceptually challenging thing to grasp, so don’t be discouraged if you can’t understand how it works yet or of the ways you could use. As you practice using it, you will slowly begin to understand it. With pivot_longer(), the first argument is the data itself. The other arguments are:

  1. cols: The columns to use to convert to long form. The input is a vector made using c() that contains the column names, like you would use in select() (e.g. you can use the select_helpers like starts_with(), or - minus to exclude).
  2. names_to: Optional, the default is name. If provided, it will be the name of the newly created column (as a quoted character) that contains the original column names.
  3. values_to: Optional, the default is value. Like names_to, sets the name of the new columns.

Both pivot_longer() and its opposite pivot_wider(), which we will cover later in the session, are incredibly powerful functions. We can’t show close to everything it can do in this workshop, but if you want to learn more, read up on the documentation for it.

CautionSticky/hat up!

When you’re ready to continue, place the sticky/paper hat on your computer to indicate this to the teacher 👒 🎩

Caution

👷 This section is being updated. Once ready, it will be shown here.

21.4 Key takeaways

Quickly cover this and get them to do the survey. The previous exercise will likely be challenging for them, so we will not do a discussion activity this time.

  • Data is usually structured to varying degrees as wide or long format. Thinking of your data in terms of shifting it around in structure is a powerful and useful skill to have when doing data wrangling and analysis.
  • When needing to convert to the longer form, use pivot_longer() from the tidyr package.
  • If you need to convert to the wider form, use the tidyr function pivot_wider().
  • Seek out and test different functions within the tidyr package, it has many very helpful functions for tidying up data such as the complete() function.

21.5 Code used in session

This lists some, but not all, of the code used in the section. Some code is incorporated into Markdown content, so is harder to automatically list here in a code chunk. The code below also includes the code from the exercises.

participant_details <- here("data-raw/dime/participant_details.csv") |>
  import_dime()
participant_details
participant_details |>
  pivot_longer()
participant_details |>
  pivot_longer(ends_with("date"))
participant_details |>
  pivot_longer(ends_with("date"), names_to = NULL, values_to = "date")
participant_details |>
  pivot_longer(ends_with("date"), names_to = NULL, values_to = "date") |>
  group_by(pick(-date))
participant_details |>
  pivot_longer(ends_with("date"), names_to = NULL, values_to = "date") |>
  group_by(pick(-date)) |>
  complete(
    date = seq(min(date), max(date), by = "1 day")
  ) |>
  ungroup()
usethis::use_package("tidyr")
#' Convert the participant details data to long and clean it up.
#'
#' @param data The DIME participant details data.
#'
#' @returns A data frame.
#'
clean_participant_details <- function(data) {
  cleaned <- data |>
    tidyr::pivot_longer(
      tidyselect::ends_with("date"),
      names_to = NULL, values_to = "date"
    ) |>
    dplyr::group_by(dplyr::pick(-date)) |>
    tidyr::complete(
      date = seq(min(date), max(date), by = "1 day")
    ) |>
    dplyr::ungroup()
  return(cleaned)
}
clean_participant_details(participant_details)
participant_details <- here("data-raw/dime/participant_details.csv") |>
  import_dime() |>
  clean_participant_details()
#' Convert the sleep types to wide format.
#'
#' @param data The cleaned DIME sleep data.
#'
#' @returns A data frame.
#'
sleep_types_to_wider <- function(data) {
  wider <- data |>
    tidyr::pivot_wider(
      names_from = sleep_type,
      names_prefix = "seconds_",
      values_from = seconds_sum
    )
  return(wider)
}