Appendix B: Solutions

This document contains solutions, or rather possible solutions, to the exercises throughout the workshop. They are hidden by default so you don’t accidentally see the solutions to other exercises.

Pre-process data as you import it

🧑‍💻 Exercise: Import participant 101’s sleep data

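# Load the packages that provide here(), read_csv(), and to_snake_case().
library(here)
library(readr)
library(snakecase)

sleep_101 <- here("data-raw/dime/sleep/101.csv") |>
  read_csv(
    show_col_types = FALSE,
    name_repair = to_snake_case,
    n_max = 100
  )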

Bundling code into functions

🧑‍💻 Exercise: Convert the sleep code into a function

#' Import a participant's sleep data from DIME.
#'
#' @param file_path Path to the participant's sleep file.
#'
#' @return Outputs a data frame/tibble.
#'
import_sleep <- function(file_path) {
  sleep <- file_path |>
    read_csv(
      show_col_types = FALSE,
      name_repair = to_snake_case,
      n_max = 100
    )
  return(sleep)
}
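
To check what `name_repair = to_snake_case` does without touching the real data, you could try the function on a small temporary file. This is only a sketch: the file contents and the capitalised headers are made up for illustration, and the function is repeated (with explicit `readr::` and `snakecase::` prefixes) so the example runs on its own.

```r
# Same logic as import_sleep() above, with package prefixes so that
# no library() calls are needed for this example.
import_sleep <- function(file_path) {
  sleep <- file_path |>
    readr::read_csv(
      show_col_types = FALSE,
      name_repair = snakecase::to_snake_case,
      n_max = 100
    )
  return(sleep)
}

# Hypothetical toy file; the headers are invented to show the renaming.
toy_file <- tempfile(fileext = ".csv")
writeLines(
  c(
    "Date,Sleep Type,Seconds",
    "2021-03-18T01:00:00,light,1200"
  ),
  toy_file
)

toy_sleep <- import_sleep(toy_file)
# The column names are converted to snake case when the file is read.
names(toy_sleep)
```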

Making robust and general-purpose functions

🧑‍💻 Exercise: Finish setting the dependencies

Console
usethis::use_package("readr")
usethis::use_package("snakecase")
usethis::use_package("fs")
docs/learning.qmd
import_cgm <- function(file_path) {
  cgm <- file_path |>
    readr::read_csv(
      show_col_types = FALSE,
      name_repair = snakecase::to_snake_case,
      n_max = 100
    )
  return(cgm)
}

Doing many things at once with functionals

🧑‍💻 Exercise: Convert into a function to use it on the sleep data

#' Import all DIME CSV files in a folder into one data frame.
#'
#' @param folder_path The path to the folder that has the CSV files.
#'
#' @return A single data frame/tibble.
#'
import_csv_files <- function(folder_path) {
  files <- folder_path |>
    fs::dir_ls(glob = "*.csv")

  data <- files |>
    purrr::map(import_dime) |>
    purrr::list_rbind(names_to = "file_path_id")
  return(data)
}
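
The `names_to = "file_path_id"` argument works because `fs::dir_ls()` returns a vector whose names are the file paths, and `purrr::map()` keeps those names on the list it returns. A minimal sketch of that pattern, using made-up paths and a stand-in import function (`fake_import()` is hypothetical and does not read any file):

```r
# Hypothetical file paths, named by themselves like the output of fs::dir_ls().
files <- c("data-raw/dime/sleep/101.csv", "data-raw/dime/sleep/102.csv")
names(files) <- files

# Stand-in for the real import function: returns the same toy rows
# for every path instead of reading a file.
fake_import <- function(file_path) {
  tibble::tibble(value = c(1, 2))
}

combined <- files |>
  purrr::map(fake_import) |>
  # The list names (the paths) become the `file_path_id` column.
  purrr::list_rbind(names_to = "file_path_id")

combined
```

Each row of `combined` carries the path of the file it came from, which is what the later "get ID" step relies on.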

Cleaning characters and dates

🧑‍💻 Exercise: Using NSE in your function

#' Prepare the date columns in DIME CGM and sleep data for joining.
#'
#' @param data The data that has the datetime column.
#' @param column The datetime column to convert to date and hour.
#'
#' @returns A tibble/data.frame
#'
prepare_dates <- function(data, column) {
  prepared_dates <- data |>
    dplyr::mutate(
      date = lubridate::as_date({{ column }}),
      hour = lubridate::hour({{ column }}),
      .before = {{ column }}
    )
  return(prepared_dates)
}
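
As a quick check of the `{{ column }}` usage, you could call the function on a tiny made-up data frame. The values below are invented for illustration, and the function is repeated so the example runs on its own.

```r
# Same function as in the solution above.
prepare_dates <- function(data, column) {
  prepared_dates <- data |>
    dplyr::mutate(
      date = lubridate::as_date({{ column }}),
      hour = lubridate::hour({{ column }}),
      .before = {{ column }}
    )
  return(prepared_dates)
}

# Hypothetical toy data; the real DIME files have more rows and columns.
toy_cgm <- tibble::tibble(
  device_timestamp = lubridate::ymd_hms(
    c("2021-03-18 08:15:00", "2021-03-18 23:45:00")
  ),
  glucose = c(5.8, 6.1)
)

# The column name is passed unquoted, which is what {{ }} makes possible.
prepared <- prepare_dates(toy_cgm, device_timestamp)
prepared
```

The new `date` and `hour` columns are placed just before `device_timestamp` because of the `.before` argument.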

🧑‍💻 Exercise: Convert ‘get ID’ code into a function

#' Get the participant ID from the file path column.
#'
#' @param data Data with `file_path_id` column.
#'
#' @return A data.frame/tibble.
#'
get_participant_id <- function(data) {
  data_with_id <- data |>
    dplyr::mutate(
      id = stringr::str_extract(
        file_path_id,
        "[:digit:]+\\.csv$"
      ) |>
        stringr::str_remove("\\.csv$") |>
        as.integer(),
      .before = file_path_id
    ) |>
    dplyr::select(-file_path_id)
  return(data_with_id)
}
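
If the regular expression inside `get_participant_id()` is hard to follow, it can help to run the same steps on a plain character vector first. A small sketch, where the second path is made up for illustration:

```r
# Example paths; "102.csv" is a hypothetical file.
paths <- c(
  "data-raw/dime/sleep/101.csv",
  "data-raw/dime/sleep/102.csv"
)

ids <- paths |>
  # Keep only the digits directly before ".csv" at the end of the path.
  stringr::str_extract("[:digit:]+\\.csv$") |>
  # Drop the file extension so only the digits remain.
  stringr::str_remove("\\.csv$") |>
  # Convert the remaining text to an integer ID.
  as.integer()

ids
```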

Using split-apply-combine to help in processing

🧑‍💻 Exercise: Create a clean_sleep() function

#' Clean and prepare the sleep data for joining.
#'
#' @param data The sleep dataset.
#'
#' @returns A cleaner data frame.
#'
clean_sleep <- function(data) {
  cleaned <- data |>
    get_participant_id() |>
    dplyr::rename(datetime = date) |>
    prepare_dates(datetime) |>
    summarise_column(seconds, list(sum = sum))
  return(cleaned)
}

#' Clean and prepare the CGM data for joining.
#'
#' @param data The CGM dataset.
#'
#' @returns A cleaner data frame.
#'
clean_cgm <- function(data) {
  cleaned <- data |>
    get_participant_id() |>
    prepare_dates(device_timestamp) |>
    dplyr::rename(glucose = historic_glucose_mmol_l) |>
    # You can decide what functions to summarise by.
    summarise_column(glucose, list(mean = mean, sd = sd))
  return(cleaned)
}

Pivoting your data from and to long or wide

🧑‍💻 Exercise: Create a new function to pivot sleep to wider

#' Convert the sleep types to wide format.
#'
#' @param data The cleaned DIME sleep data.
#'
#' @returns A data frame.
#'
sleep_types_to_wider <- function(data) {
  wider <- data |>
    tidyr::pivot_wider(
      names_from = sleep_type,
      names_prefix = "seconds_",
      values_from = seconds_sum
    )
  return(wider)
}
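
To see the effect of the pivot, you could run the function on a tiny made-up data frame. The sleep types and values below are assumptions for illustration, not taken from the DIME data, and the function is repeated so the example runs on its own.

```r
# Same function as in the solution above.
sleep_types_to_wider <- function(data) {
  wider <- data |>
    tidyr::pivot_wider(
      names_from = sleep_type,
      names_prefix = "seconds_",
      values_from = seconds_sum
    )
  return(wider)
}

# Hypothetical cleaned sleep data: one participant, one hour, two sleep types.
toy_sleep <- tibble::tibble(
  id = c(101L, 101L),
  date = as.Date(c("2021-03-18", "2021-03-18")),
  hour = c(1L, 1L),
  sleep_type = c("light", "deep"),
  seconds_sum = c(1200, 600)
)

# The two rows collapse into one, with one column per sleep type.
wider <- sleep_types_to_wider(toy_sleep)
wider
```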