Want to help out or contribute?

If you find any typos, errors, or places where the text may be improved, please let us know by providing feedback either in the feedback survey (given during class) or by using GitLab.

On GitLab open an issue or submit a merge request by clicking the "Edit this page " button on the side of this page.

B Continued hands-on exercises

Note: We’re still testing to see what best works for this section.

This session is all about reinforcing the skills you’ve learned in this course by continuing to use them. You can either continue working on the MMASH dataset and complete the exercises below or you could try to apply these skills to another dataset (such as your own). If you continue with the MMASH, we strongly encourage you to work together with your group to ask questions and to help each other out in understanding what to do and what to code (we are also here for help).

B.1 MMASH exercises

B.1.1 Fix up some remaining data issues

There are some issues with the data still. Try to work through these tasks to clean the data up more.

  1. In the cleaned data, take a look at user 21. Notice some problems? Go into the raw data as well as the data documentation on the MMASH website and try to figure how what happened. How you might fix the problem? Try your fix within the data-raw/mmash.R cleaning script.

  2. If you do count(mmash, day) in the Console while the mmash dataset is loaded, you’ll see that the day column has a weird value of -29. Look into the data documentation on the MMASH website and see if you can find an explanation for this. How would you go about solving this issue? Write up code to fix this issue in the data-raw/mmash.R script.

B.1.2 Import and process the activity data

We have a few other datasets that we could join together, but would likely require more processing in order to appropriately join with the other datasets. Complete these tasks in the doc/lesson.Rmd file:

  1. Create a new header called ## Exercise: Importing activity data
  2. Create a new code chunk below this new header.
  3. Starting the workflow from the beginning (e.g. with the spec() process), write code that imports the Activity.csv data into R.
  4. Convert this code into a new function using the workflow you’ve used from this course:
    • Call the new function import_activity.
    • Include one argument called file_path
    • Test that it works.
    • Add Roxygen documentation and explicit package links (::) with the functions.
    • Move the newly created function into the R/functions.R.
    • Use the new function in doc/lesson.Rmd and use source() (Ctrl-Shift-S) to run it.
  5. Import all the user_ datasets with import_multiple_files() and the import_activity() function.
  6. Pipe the results into mutate() and create a new column called activity_seconds that is based on subtracting end and start.
    • Use ?mutate and check the examples in the help document that pops up if you don’t recall how to use this function.
  7. You’ll notice that the activity column is numeric. Look into the Data Description and find out what each column represents and what the numbers mean in the column activity. Then think about or complete these tasks:
    • What is the advantage and disadvantage of using numbers instead of text to describe categorical data like in the activity column?
    • Using the case_when() function (within mutate()) we learned about, convert the activity numbers into more meaningful character data.
  8. Knit the R Markdown document to see if it is reproducible.
  9. Add and commit your changes to the Git history.
Click for the (possible) solution. Click only if you are really struggling or you are out of time for the exercise.

import_activity <- function(file_path) {
    activity_data <- vroom::vroom(
        file_path,
        col_select = -1,
        col_types = vroom::cols(
            activity = vroom::col_double(),
            start = vroom::col_time(format = ""),
            end = vroom::col_time(format = ""),
            day = vroom::col_double(),
            .delim = ","
        ),
        .name_repair = snakecase::to_snake_case
    ) 
    return(activity_data)
}

activity_df <- import_multiple_files("Activity.csv", import_activity)

activity_df %>% 
    mutate(activity_duration = end - start)

B.1.3 Process and join sleep and questionnaire data

There are still a few datasets that you can join in with the current datasets like sleep.csv and questionnaire.csv. Using the workflows in Section 10.10 as a guide, start from the beginning and import, process, clean, make functions, and join these two datasets in with the others so that they get included in the data/mmash.rda final dataset. Afterwards, do some descriptive analysis using the function tidy_summarise_by_day().

Note: User 11 has no sleep data, so you will eventually have to drop user 11 from the dataset before summarizing the data.

B.1.4 Create a second dataset of only the Actigraph and RR data

The Actigraph and RR datasets contain a lot of interesting and useful data that gets destroyed when we first summarise and then join them with the other datasets. While we can’t meaningfully join all this data with the other datasets, we can join them on their own as separate datasets.

Using the workflows in Section 10.10 as a guide, start from the beginning and import, process, clean, make functions, and create a final dataset of only the Actigraph.csv and RR.csv datasets.

  • Join only these two datasets by user_id, day, and time.
  • Name the new dataset actigraph_rr and save it to data/ by using another usethis::use_data() line in the data-raw/mmash.R script.

Than think of how some of these questions might be answered, using the Data Description as a guide:

  1. What times of the day are people lying down (inclinometer_lying) most often? How long are they lying down for?
  2. When are people most likely to be sitting down (inclinometer_sitting)? Does this also correspond to a lower heart rate or a lower interbeat interval (from the RR data) compared to when standing?
  3. Are people who have more activity throughout the day also reporting better sleep quality (like number_of_awakenings)?
  4. Does the self-reported activity match the accelerometry data?

B.2 Other datasets to use

If you have completed the exercises above and still wanted to practice the skills, here are some other datasets you can use (aside from your own):