7 Save time, don’t repeat yourself: Making functions

Here we will cover the second block, “Workflow” in Figure 7.1.

Figure 7.1: Section of the overall workflow we will be covering.

Your folder and file structure should look like this (use fs::dir_tree(recurse = 2) if you want to check using R):

LearnR3
├── data/
│   └── README.md
├── data-raw/
│   ├── README.md
│   ├── mmash-data.zip
│   ├── mmash/
│   │   ├── user_1
│   │   ├── ...
│   │   └── user_22
│   └── mmash.R
├── doc/
│   ├── README.md
│   └── lesson.Rmd
├── R/
│   ├── functions.R
│   └── README.md
├── .gitignore
├── DESCRIPTION
├── LearnR3.Rproj
└── README.md

7.1 Learning objectives

  1. Learn what functions are in R and how to create and use them.
  2. Learn a workflow of using R Markdown, source() (or Ctrl-Shift-S in RStudio), and restarting R (Ctrl-Shift-F10) as a tool and process for developing functions that can be later easily (re-)used.
  3. Learn what R package dependency management is and how it can simplify your data analysis work.
  4. Continue practising Git version control to manage changes to your files.

7.2 The basics of a function

Take 5 min and read this section until it says to stop. We’ve mentioned functions multiple times, but what is a function? At its core, a function in R is anything that does an action. A function is a bundled sequence of steps that achieve a specific action. For instance, the + (to add) is a function, mean() is a function, [] (to subset or extract) is a function, and so on. In simple terms, functions are made of a function call, its arguments, and the function body: function(argument1, argument2) { ...body with R code... }.
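To make this concrete, here is a small sketch (an illustration only, not part of the lesson code) that calls + and [ with the usual function syntax by wrapping them in backticks, and checks that mean is a function object. The numbers object is made up for this example.

```r
# Operators are functions too: wrap them in backticks to call them by name.
`+`(1, 2)
#> [1] 3

# Subsetting with [ ] is also a function call underneath.
numbers <- c(10, 20, 30)
`[`(numbers, 2)
#> [1] 20

# is.function() confirms that an object is a function.
is.function(mean)
#> [1] TRUE
```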

Because R is open source, anyone can see how things work underneath. So, if we want to see what a function does underneath, we type out the function name without the () into the Console and run it. If we do it with the function sd() which calculates the standard deviation, we see:

sd
#> function (x, na.rm = FALSE) 
#> sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x), 
#>     na.rm = na.rm))
#> <bytecode: 0x55e3f1b60b90>
#> <environment: namespace:stats>

Here you see that sd() has the arguments x and na.rm. The function body shows how it calculates the standard deviation, which is the square root of the variance. In this code, var() is inside the sqrt() function, which is exactly what it should be.
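You can check this relationship yourself. The sketch below uses a handful of made-up values (chosen only for illustration) and compares the square root of the variance with the output of sd().

```r
# Example values chosen only for illustration.
values <- c(2, 4, 4, 4, 5, 5, 7, 9)

# The body of sd() boils down to the square root of the variance.
manual_sd <- sqrt(var(values))
builtin_sd <- sd(values)

all.equal(manual_sd, builtin_sd)
#> [1] TRUE
```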

So, if you learn how to create your own functions, it can make doing your work easier and more efficient because you don’t have to repeat yourself later. Making functions always has a basic structure of:

  1. Giving a name to the function (e.g. mean).
  2. Starting the function call using function(), assigning it to the name with <-. This tells R that the name is a function object.
  3. Optionally providing arguments to give to the function call, for instance function(argument1, argument2, argument3).
  4. Filling out the body of the function, with the arguments (if any) contained inside, that does some action.
  5. Optionally, use return() to indicate what you want the function to output. For learning purposes, we’ll always use return() to help show us what the final function output is.

For instructors:

Emphasize that we will be using this workflow for creating functions all the time throughout the course and that this workflow is also what you’d use in your daily work.

While there is no minimum or maximum number of arguments you can provide for a function (e.g. you could have zero or dozens of arguments), it’s good practice for yourself and for others to have as few arguments as necessary to get the job done. So, the structure is:

name <- function(argument1, argument2) {
    # body of function
    output <- ... code ....
    return(output)
}

Writing your own functions can be absolutely amazing and fun and powerful… but you also often want to pull your hair out with frustration at errors that are difficult to understand and fix. The best way to deal with this is by debugging. Due to time and to the challenge of making meaningful debugging exercises (solutions to problems are very dependent on the project), read Appendix D in your own time for some instructions on debugging and dealing with another common problem you might encounter with R.

Ok, stop here and we’ll go over it together. Let’s write a simple example. First, create a new Markdown header called ## Making a function and create a code chunk below that with Ctrl-Alt-I. Then, inside the code chunk, we’ll write this code out:

add_numbers <- function(num1, num2) {
    added <- num1 + num2
    return(added)
}

You can use the new function by running the above code and writing out your new function, with arguments to give it.

add_numbers(1, 2)
#> [1] 3
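As a small aside, the sketch below shows two other ways of calling the same function: with the arguments written out by name, and with a vector of numbers as the first argument. The function is repeated here so the example runs on its own.

```r
add_numbers <- function(num1, num2) {
    added <- num1 + num2
    return(added)
}

# Naming the arguments makes the call easier to read.
add_numbers(num1 = 1, num2 = 2)
#> [1] 3

# Because + works on whole vectors, so does add_numbers().
add_numbers(c(1, 2, 3), 10)
#> [1] 11 12 13
```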

The function name is fairly good… add_numbers is read as “add numbers”. While we generally want to write code that describes what it does by reading it, it’s also good practice to add some formal documentation to the function. Use the “Insert Roxygen Skeleton” in the “Code” menu (or by typing Ctrl-Shift-Alt-R) and you can add template documentation right above the function. It looks like:

#' Title
#'
#' @param num1 
#' @param num2 
#'
#' @return
#' @export
#'
#' @examples
add_numbers <- function(num1, num2) {
    added <- num1 + num2
    return(added)
}

In the Title area, this is where you type out a brief sentence or several words that describe the function. Creating a new paragraph below this line allows you to add a more detailed description. The other items are:

  • @param num: These lines describe what each argument (also called parameter) is for and what to give it.
  • @return: This describes what output the function gives. Is it a data.frame? A plot? What else does the output give?
  • @export: Tells R that this function should be accessible to the user of your package. Since we aren’t making packages, delete it.
  • @examples: Any lines below this are used to show examples of how to use the function. This is very useful when making packages, but not really in this case. So we’ll delete it.

Let’s write out some documentation for this function:

#' Add two numbers together.
#'
#' @param num1 A number here.
#' @param num2 A number here.
#'
#' @return Returns the sum of the two numbers.
#'
add_numbers <- function(num1, num2) {
    added <- num1 + num2
    return(added)
}

Once we’ve created that, let’s open up the Git Interface (Ctrl-Alt-M) and add and commit these changes to our history.

7.3 Exercise: Brainstorm and discuss why and what you could make as a function

Time: 15 min

You’ve learned the basics of making your own, custom function. Now, as a group, brainstorm and discuss some ways that you might make functions in your own work to help reduce repetition. What type of code might you make as a function for your own project? Do you think others, maybe in your research group, might use this function too? Afterwards, all the groups will briefly share what they thought of.

7.4 Making a function for vroom

Now that we have a basic understanding of what a function looks like, let’s apply it to something we’re doing right now: Importing our data.

Making functions is a series of steps:

  1. Write code that works and does what you want.
  2. Enclose it as a function with name <- function() { ... }, with an appropriate and descriptive name.
  3. Create arguments in the function call (function(argument1, argument2)) with appropriate and descriptive names, then replace the code with the argument names where appropriate.
  4. Rename any objects created to be more generic and include the return() function at the end to indicate what the function will output.
  5. Run the function and check that it works.
  6. Add the Roxygen documentation tags (with Ctrl-Alt-Shift-R or “Code -> Insert Roxygen Skeleton” menu item while the cursor is in the function).

For instructors:

Emphasize that we will be using this workflow for creating functions all the time throughout the course and that this workflow is also what you’d use in your daily work.

So, step one. Let’s take the code we wrote for importing the user_info data and convert it into a function:

user_1_info_data <- vroom(
    user_1_info_file,
    col_select = -1,
    col_types = cols(
        gender = col_character(),
        weight = col_double(),
        height = col_double(),
        age = col_double(),
        .delim = ","
    ),
    .name_repair = snakecase::to_snake_case
)

Next we wrap it in the function call and give it an appropriate name. In this case, import_user_info is descriptive and meaningful. Make sure to style it correctly with Ctrl-Shift-A.

import_user_info <- function() {
    user_1_info_data <- vroom(
        user_1_info_file,
        col_select = -1,
        col_types = cols(
            gender = col_character(),
            weight = col_double(),
            height = col_double(),
            age = col_double(),
            .delim = ","
        ),
        .name_repair = snakecase::to_snake_case
    )
}

Then we add arguments to the function call and use them in place of the hard-coded values in the body. Here, we have only one thing that we would change: the file path to the dataset. So, a good name might be file_path.

import_user_info <- function(file_path) {
    user_1_info_data <- vroom(
        file_path,
        col_select = -1,
        col_types = cols(
            gender = col_character(),
            weight = col_double(),
            height = col_double(),
            age = col_double(),
            .delim = ","
        ),
        .name_repair = snakecase::to_snake_case
    )
}

Then we clean things up by renaming user_1_info_data since we would like to also import more than just user_1. A nice object name would be info_data. Add the return() function at the end with the object you want your function to output.

import_user_info <- function(file_path) {
    info_data <- vroom(
        file_path,
        col_select = -1,
        col_types = cols(
            gender = col_character(),
            weight = col_double(),
            height = col_double(),
            age = col_double(),
            .delim = ","
        ),
        .name_repair = snakecase::to_snake_case
    )
    return(info_data)
}

Great! Now we need to test it out. Let’s try it on two datasets: the user_info.csv files in the user_1 and user_2 folders.

import_user_info(here("data-raw/mmash/user_1/user_info.csv"))
#> # A tibble: 1 × 4
#>   gender weight height   age
#>   <chr>   <dbl>  <dbl> <dbl>
#> 1 M          65    169    29
import_user_info(here("data-raw/mmash/user_2/user_info.csv"))
#> # A tibble: 1 × 4
#>   gender weight height   age
#>   <chr>   <dbl>  <dbl> <dbl>
#> 1 M          95    183    27

Awesome! It works. The final stage is adding the Roxygen documentation.

#' Import MMASH user info data file.
#'
#' @param file_path Path to user info data file.
#'
#' @return Outputs a data frame/tibble.
#'
import_user_info <- function(file_path) {
    info_data <- vroom(
        file_path,
        col_select = -1,
        col_types = cols(
            gender = col_character(),
            weight = col_double(),
            height = col_double(),
            age = col_double(),
            .delim = ","
        ),
        .name_repair = snakecase::to_snake_case
    )
    return(info_data)
}
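If you want to try the same pattern without the MMASH files or the vroom package, here is a minimal sketch under stated assumptions. It swaps in base R’s read.csv() for vroom(), the function name import_info_sketch is hypothetical, and the temporary file is made up to resemble a user info file (its column layout is an assumption, not taken from the real data).

```r
# Hypothetical stand-in for import_user_info(), using base R's read.csv()
# instead of vroom() so that it runs without extra packages.
import_info_sketch <- function(file_path) {
    info_data <- read.csv(file_path)
    # Drop the first column, similar to what col_select = -1 does.
    info_data <- info_data[, -1, drop = FALSE]
    return(info_data)
}

# Create a small temporary file with made-up contents.
temp_file <- tempfile(fileext = ".csv")
writeLines(
    c(",Gender,Weight,Height,Age", "0,M,65,169,29"),
    temp_file
)

# The same function now works for any file path with this layout.
info <- import_info_sketch(temp_file)
names(info)
#> [1] "Gender" "Weight" "Height" "Age"
```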

A massive advantage of using functions is that if you want to make a change to all your code, you can very easily do it by modifying the function and it will change all your other code too. Now that we have a working function, let’s add and commit it to the Git history with the RStudio Git Interface.

7.5 Exercise: Repeat with the saliva data

Time: 15 min

Take the code you created for importing the saliva data set from Exercise 6.3 (not the code related to using spec()) and make it into a function. A helpful tip: To move around an R Markdown or R script more easily, open up the “Document Outline” on the side by clicking the button in the top right corner of the R Markdown pane or by using Ctrl-Shift-O.

  1. Create a new markdown header ## Exercise for importing the saliva data as a function.
  2. Create a new code chunk below that (Ctrl-Alt-I).
  3. Paste the code you used from the exercise into the code chunk and begin converting it into a function, like we did above.
    • Wrap it with the function() {...}
    • Make a meaningful name (use import_saliva)
    • Make an argument for the file path (file_path) and replace user_1_saliva_file with file_path in the vroom() code
    • Rename the output object to saliva_data and put it in the return() function
    • Test that it works
    • Create the Roxygen documentation

Use the below code as a guide:

# Need to also add the Roxygen documentation 
# ("Code -> Insert Roxygen Skeleton")
import_saliva <- function(file_path) {
    # Paste the code to import saliva data you created 
    # from previous exercise
    saliva_data <- ___(
        ___
    )
    return(saliva_data)
}

# Test that the function works
___(here("data-raw/mmash/user_1/saliva.csv"))

Possible solution (look at it only if you are really struggling or you are out of time for the exercise):

#' Import the MMASH saliva dataset.
#'
#' @param file_path Path to the user saliva data file.
#'
#' @return Outputs a data frame/tibble.
#'
import_saliva <- function(file_path) {
    saliva_data <- vroom(
        file_path,
        col_select = -1,
        col_types = cols(
            samples = col_character(),
            cortisol_norm = col_double(),
            melatonin_norm = col_double(),
            .delim = ","
        ),
        .name_repair = snakecase::to_snake_case
    )
    return(saliva_data)
}

# Test that this works
# import_saliva(here("data-raw/mmash/user_1/saliva.csv"))

7.6 Continuing the workflow

We’ve created two functions. Now we need to move those functions from the doc/lesson.Rmd file and into the R/ folder. We do this for a few reasons:

  1. To prevent the R Markdown document from becoming too long and having a large portion of R code over other text.
  2. To make it easier to maintain and find things in your project.
  3. To make use of the source() function.
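For the third reason, here is a minimal sketch of what source() does. A temporary file stands in for R/functions.R (an assumption made only so the example runs anywhere), and it contains the add_numbers() function from earlier.

```r
# A temporary file stands in for R/functions.R in this sketch.
functions_file <- tempfile(fileext = ".R")
writeLines(
    c(
        "add_numbers <- function(num1, num2) {",
        "    added <- num1 + num2",
        "    return(added)",
        "}"
    ),
    functions_file
)

# source() runs the code in the file, which creates add_numbers()
# in the current R session.
source(functions_file)
add_numbers(1, 2)
#> [1] 3
```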

We want to store our functions in the R/functions.R script so it’s easier to source them. Cut only the import_user_info() function we created in doc/lesson.Rmd, including the Roxygen documentation, and paste it into R/functions.R.

Once we have it in there, let’s test out the workflow. Restart our R session with either Ctrl-Shift-F10 or from the “Session -> Restart R” menu item. Move back into the doc/lesson.Rmd and add source(here("R/functions.R")) to the code chunk called setup at the top. Run that line of code. Then go to where you wrote:

import_user_info(here("data-raw/mmash/user_1/user_info.csv"))

Now run this line. What happens now? You may get an error about not finding the vroom() function. If you put library(vroom) in the setup code chunk, you might not get an error. If you did get an error, that’s because R doesn’t know what the vroom() function is. This is where we start getting into package dependency management.

What is package dependency management? Whenever you use an R package, you depend on it for your code to work. The informal way to show what packages you use is by using the library() function. But if you come back to the project, or get a new computer, or someone else is working on your project too, how will they know which packages your project depends on? Do they have to search through all your files just to find all library() functions you used and then install those packages individually? A much better way here is to formally indicate your package dependency so that installing dependencies is easy. We do this by making use of the DESCRIPTION file.

Open up the DESCRIPTION file. You may or may not see something that looks like:

Package: LearnR3
Type: Project
Version: 0.0.1
Imports:
    knitr,
    rmarkdown,
    distill
Encoding: UTF-8

If it doesn’t look like this, replace all of your current text with the text above. Notice the Imports: key. This is where information about packages is added. To quickly add a package, go to the Console and type out:

usethis::use_package("vroom")

You will see a bunch of text about adding it to Imports. If you look in your DESCRIPTION file now, you’ll see something like:

Imports: 
    knitr,
    rmarkdown,
    distill,
    vroom
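If you would rather check the listed packages from R than by opening the file, one possible sketch uses base R’s read.dcf(), which reads the key-value format that DESCRIPTION files use. The temporary file below is a stand-in that copies the text shown above; in the project you would point read.dcf() at the DESCRIPTION file itself.

```r
# A temporary copy of the DESCRIPTION text, so the sketch runs anywhere.
description_file <- tempfile()
writeLines(
    c(
        "Package: LearnR3",
        "Type: Project",
        "Version: 0.0.1",
        "Imports:",
        "    knitr,",
        "    rmarkdown,",
        "    distill,",
        "    vroom",
        "Encoding: UTF-8"
    ),
    description_file
)

# Read only the Imports field as a single character string.
imports_field <- as.character(read.dcf(description_file, fields = "Imports"))

# Split the comma-separated field into one package name per element.
imports <- trimws(strsplit(imports_field, ",")[[1]])
imports
#> [1] "knitr"     "rmarkdown" "distill"   "vroom"
```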

Now, if you or someone else wants to install all the packages your project depends on, they can do that by going to the Console and running:

remotes::install_deps()

This function finds the DESCRIPTION file and installs all the packages in Imports. Let’s add the other dependencies by typing in the Console:

usethis::use_package("here")
usethis::use_package("fs")
usethis::use_package("snakecase")

Since we will also make use of the tidyverse set of packages later in the course, we’ll also add tidyverse as a dependency. Since the tidyverse is a large collection of packages, the recommended way to add this particular dependency is with:

usethis::use_package("tidyverse", type = "Depends")

If you look in the DESCRIPTION file now, you see that the new Depends field has been added with tidyverse right below it. There are fairly technical reasons why we need to put tidyverse in the Depends field that you don’t need to know about for this course, aside from the fact that it is a common practice in R projects. At least in this context, we use the Depends field for tidyverse because of one big reason: the usethis::use_package() function will complain if we try to put tidyverse in the Imports and it recommends putting it in the Depends field.

Depends: 
    tidyverse

Great! Now that we’ve formally established package dependencies in our project, we also need to formally declare which package each function comes from inside our own functions. Before getting into the correct way, we need to quickly cover the incorrect way, which you may have seen others use on websites or in script files. Sometimes people use library() or require() inside functions like:

add_numbers <- function(num1, num2) {
    library(packagename)
    ...code...
    return(added)
}

Or:

add_numbers <- function(num1, num2) {
    require(packagename)
    ...code...
    return(added)
}

This is very bad practice and can have some unintended and serious consequences that you might not notice or that won’t give any warning or error. We won’t get into the reasons why this is incorrect because it can quickly get quite technical. The correct way of indicating which package a function comes from is instead by using packagename::, which you’ve seen and used many times in this course.

For instructors:

You can also talk about why require() shouldn’t be used compared to library(). The problem with require() is that it won’t throw an error if the package can’t be loaded: it only returns FALSE with a warning and the code continues running. On the other hand, library() will throw an error if it can’t find the package, which is what you expect if your code depends on a package.
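The difference can be shown with a short sketch. The package name below is made up and assumed not to be installed; the warning from require() is suppressed and the error from library() is caught so that the example runs from start to finish.

```r
# A made-up package name that is assumed not to be installed.
missing_package <- "aPackageThatDoesNotExist"

# require() returns FALSE (with a warning) and lets the code carry on.
was_loaded <- suppressWarnings(
    require(missing_package, character.only = TRUE, quietly = TRUE)
)
was_loaded
#> [1] FALSE

# library() stops with an error, which we catch here to show that it happened.
library_result <- tryCatch(
    library(missing_package, character.only = TRUE),
    error = function(e) "library() threw an error"
)
library_result
#> [1] "library() threw an error"
```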

Another reason to use packagename:: for each function from an R package you use in your own function is that it explicitly tells R (and us the readers) where the function comes from. Because the same function name can be used by multiple packages, if you don’t explicitly state which package the function is from, R will use the function that it finds first… which isn’t always the function you meant to use.
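A small sketch of this kind of name clash, using only base R: here a deliberately misleading sd() defined in the workspace plays the role of a second package that uses the same function name as stats::sd(). The function and its output are made up for illustration.

```r
# A deliberately misleading function with the same name as stats::sd().
sd <- function(x) {
    return("not the standard deviation")
}

# R uses the sd() it finds first, which is now our own version.
masked_result <- sd(c(1, 2, 3))
masked_result
#> [1] "not the standard deviation"

# With packagename:: there is no ambiguity about which one is meant.
explicit_result <- stats::sd(c(1, 2, 3))
explicit_result
#> [1] 1

# Remove the misleading function again.
rm(sd)
```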

We also do this step at the end of making the function because doing it while we create it can be quite tedious. Alright, let’s go into R/functions.R and add vroom:: to each of the vroom functions we’ve used:

import_user_info <- function(file_path) {
    info_data <- vroom::vroom(
        file_path,
        col_select = -1,
        col_types = vroom::cols(
            gender = vroom::col_character(),
            weight = vroom::col_double(),
            height = vroom::col_double(),
            age = vroom::col_double(),
            .delim = ","
        ),
        .name_repair = snakecase::to_snake_case
    )
    return(info_data)
}

Test that it works by restarting the R session (Ctrl-Shift-F10 or “Session -> Restart R”) and sourcing the file with Ctrl-Shift-S, then go to the Console and type out:

import_user_info(here::here("data-raw/mmash/user_1/user_info.csv"))

It should work as expected! Now that we’ve done that, let’s add and commit the changes made through the Git interface.

7.7 Exercise: Discuss why tracking dependencies might help you

Time: 5 min

Before moving on to the next exercise, discuss with your group about:

  • How tracking dependencies might help you in your own work.
  • Some personal experiences, if you have had any, where the lack of explicit dependencies made your work harder and more confusing.

7.8 Exercise: Move and update the rest of the functions

Time: 20 min

Repeat this process of making functions for the rest of the code you worked on previously, which imported the RR.csv and Actigraph.csv data.

  1. Convert the importing code into functions while in the doc/lesson.Rmd file. Include the Roxygen documentation and use packagename:: to be explicit about where each function comes from.

    • Name the new functions import_rr and import_actigraph.

  2. Move the functions into R/functions.R.

  3. Restart R, source() the functions file (Ctrl-Shift-S), and test that the functions work by running them in the Console. The below code should run without a problem if you did it right:

    import_rr(here::here("data-raw/mmash/user_1/RR.csv"))
    import_actigraph(here::here("data-raw/mmash/user_1/Actigraph.csv"))

Also update the import_saliva() function you created by being explicit about where the functions come from (e.g. with the packagename::). Afterwards, add and commit the changes to the Git history.

Use this code template as a guide for making the functions.

# Insert Roxygen documentation too
___ <- function(___) {
    ___ <- ___::___(
        ___,
        col_select = ___,
        col_types = ___::cols(
            ___,
            .delim = ","
        ),
        .name_repair = snakecase::to_snake_case
    )
    return(___)
}

Possible solution (look at it only if you are really struggling or you are out of time for the exercise):

#' Import the MMASH saliva dataset.
#'
#' @param file_path Path to the user saliva data file.
#'
#' @return Outputs a data frame/tibble.
#'
import_saliva <- function(file_path) {
    saliva_data <- vroom::vroom(
        file_path,
        col_select = -1,
        col_types = vroom::cols(
            samples = vroom::col_character(),
            cortisol_norm = vroom::col_double(),
            melatonin_norm = vroom::col_double(),
            .delim = ","
        ),
        .name_repair = snakecase::to_snake_case
    )
    return(saliva_data)
}

#' Import the MMASH RR dataset (heart beat-to-beat interval).
#'
#' @param file_path Path to the user RR data file.
#'
#' @return Outputs a data frame/tibble.
#'
import_rr <- function(file_path) {
    rr_data <- vroom::vroom(
        file_path,
        col_select = -1,
        col_types = vroom::cols(
            ibi_s = vroom::col_double(),
            day = vroom::col_double(),
            # Converts to seconds
            time = vroom::col_time(format = ""),
            .delim = ","
        ),
        .name_repair = snakecase::to_snake_case
    ) 
    return(rr_data)
}

#' Import the MMASH Actigraph dataset (accelerometer).
#'
#' @param file_path Path to the user Actigraph data file.
#'
#' @return Outputs a data frame/tibble.
#'
import_actigraph <- function(file_path) {
    actigraph_data <- vroom::vroom(
        file_path,
        col_select = -1,
        col_types = vroom::cols(
            axis_1 = vroom::col_double(),
            axis_2 = vroom::col_double(),
            axis_3 = vroom::col_double(),
            steps = vroom::col_double(),
            hr = vroom::col_double(),
            inclinometer_off = vroom::col_double(),
            inclinometer_standing = vroom::col_double(),
            inclinometer_sitting = vroom::col_double(),
            inclinometer_lying = vroom::col_double(),
            vector_magnitude = vroom::col_double(),
            day = vroom::col_double(),
            time = vroom::col_time(format = ""),
            .delim = ","
        ),
        .name_repair = snakecase::to_snake_case
    )
    return(actigraph_data)
}

7.9 Summary

For instructors:

Quickly cover this before finishing the session and when starting the next session.

  • A function in R is anything that does an action
  • Functions have five components:
    • The three required ones are the function call with function() { }, the function body between the { }, and an output (usually set with return())
    • The two optional ones are assigning the function to a named object with <- and the function arguments put within function()
  • Write function documentation by using Roxygen
  • Use use_package() for the DESCRIPTION file as well as packagename::functionname() to explicitly state the packages your function depends on