6 Bundling code into functions

6.1 Learning objectives

Describe and identify the individual components of a function as well as the workflow for creating one, and then use that workflow to create a function that imports data.
Describe and apply a workflow of prototyping code into a working function in a Quarto document, moving the function into a script (called, e.g., functions.R) once prototyped and tested, and then using source() to load the functions into the R session. End this workflow with rendering the Quarto document with Ctrl-Shift-K or with the Palette (Ctrl-Shift-P, then type “render”) to ensure reproducibility.

6.2 📖 Reading task: The basics of a function

🧑‍🏫 Teacher note

Repeat and reinforce what functions are made of, their structure, that all actions are functions, and that all functions are objects (but not that all objects are functions).

Time: ~5 minutes.

The first thing to know about R is that everything is an “object” and that some objects can do an action. These objects that do an action are called functions. You’ve heard or read about functions before during the workshop, but what is a function?

A function is a bundled sequence of steps that achieves a specific action. You can usually tell that an object is a function if it has a () at the end of its name. For example, mean() is a function to calculate the mean and sd() is a function to calculate the standard deviation. It isn’t always true that functions end in () though, which you’ll read about shortly. For instance, + is a function that adds two numbers together, [] is a function that subsets or extracts an item from a list of items (like getting a column from a data frame), and <- is a function that creates a new object from some value.

All created functions have the same structure: they are assigned a name with <-, they use function() to define their parameters (also called arguments), and they have an internal sequence of steps, called the function body, that is wrapped in {}:

Console

function_name <- function(argument1, argument2) {
  # body of function with R code
}

Notice that this uses two functions to create this function_name function object:

<- is the action (function) that will create the new object function_name.
function() is the action (function) to tell R that this object is an action (function) whenever it is used with a () at the end, e.g. function_name().
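As a minimal sketch of the idea that a function is itself an object, the code below creates a small function and checks what kind of object it is. The name `say_hello` is hypothetical and only used for this illustration.

```r
# `say_hello` is a made-up name, used only for this illustration.
say_hello <- function() {
  return("Hello!")
}

# The function is an object like any other, with the class "function".
is.function(say_hello)
class(say_hello)

# A number is also an object, but it is not a function.
a_number <- 1
is.function(a_number)

# Adding `()` to the end of the name runs the action.
say_hello()
```

So all functions are objects, but not all objects are functions.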

Because R is open source, anyone can see how things work underneath. So, if you want to see what a function does underneath, you would type out the function name without the () into the Console and run it. If we do it with the function sd() which calculates the standard deviation, we see:

Console

sd

function (x, na.rm = FALSE) 
sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x), 
    na.rm = na.rm))
<bytecode: 0x56232ef2a3c0>
<environment: namespace:stats>

Here you see that sd() has the arguments x and na.rm. The function body shows how it calculates the standard deviation, which is the square root of the variance. In this code, var() is inside the sqrt() function, which is exactly what it should be if you know the math (you don’t need to).
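To check this yourself, here is a small sketch that compares sd() against taking the square root of var() by hand. The numbers are made up for illustration.

```r
# Made-up values, only for illustration.
values <- c(2, 4, 4, 4, 5, 5, 7, 9)

# The built-in function.
sd_builtin <- sd(values)

# The same calculation written out: the square root of the variance.
sd_by_hand <- sqrt(var(values))

# Check that the two give the same result.
all.equal(sd_builtin, sd_by_hand)
```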

Normally you can tell if something is a function if it has () at the end of the name. But there are special functions, like +, [], or even <-, that do an action but that don’t have () at the end. These are called operator functions, which is why <- is called the assignment operator: it assigns something to a new object. To see how they work internally, you would wrap the operator in backticks (`). So for + or <- it would be:

Console

`+`

function (e1, e2)  .Primitive("+")

`<-`

.Primitive("<-")

You’ll see something called .Primitive. Operators are often written in very low-level computer code, and these are called “primitives”. Explaining primitives is well beyond the scope of this workshop, so we won’t go into what they are or why they are used.

To show that they are a function, you can even use them with their () version like this:

1 + 2

[1] 3

`+`(1, 2)

[1] 3

`<-`(x, 1)
x

[1] 1

x <- 1
x

[1] 1

But hopefully you can see that using it with the () isn’t very nice to read or use!
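The same applies to the subsetting operator. As a small sketch with made-up values, the code below extracts an item the usual way and then by using `[` like a regular function:

```r
# Made-up values, only for illustration.
glucose <- c(5.8, 5.4, 5.1)

# The usual way to get the second item.
second_usual <- glucose[2]

# The same action, using `[` like a regular function.
second_function <- `[`(glucose, 2)

# Both give the same value.
identical(second_usual, second_function)
```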

If you can learn to make your own functions, it can help make your life and work much easier and more efficient! That’s because you can make a sequence of actions that you can then reuse again and again. And luckily, you will be making many functions throughout this workshop. Making a function always follows a basic structure:

Give a name to the function (e.g. mean).
Use function() to tell R the new object will be a function and assign it to the name with <-.
Optionally provide arguments to the function object, for example function(argument1, argument2, argument3).
Fill out the body of the function, with the arguments (if any) contained inside, that does some sequence of actions.
Optionally, use return() to indicate what final output you want the function to have. For learning purposes, we’ll always use return() to help show us what is the final function output but it isn’t necessary.

Teacher note

Emphasize that we will be using this workflow for creating functions all the time throughout the workshop and that this workflow is also what you’d use in your daily work.

While there is no minimum or maximum number of arguments you can provide for a function (e.g. you could have zero or dozens of arguments), it’s generally good practice and design to have as few arguments as necessary to get the job done. Part of making functions is to reduce your own and others’ cognitive load when working with or reading code. The fewer arguments you use, the lower the cognitive load. So, the structure is:

name <- function(argument1, argument2) {
    # body of function
    output <- ... code ....
    return(output)
}
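To illustrate that return() is optional, here is a minimal sketch of two versions of the same function. The function names are hypothetical. Without return(), R outputs the value of the last expression in the body.

```r
# With an explicit return() at the end.
add_with_return <- function(num1, num2) {
  added <- num1 + num2
  return(added)
}

# Without return(): the value of the last expression is the output.
add_without_return <- function(num1, num2) {
  num1 + num2
}

# Both versions give the same output.
identical(add_with_return(1, 2), add_without_return(1, 2))
```

In this workshop we stick to the explicit return() version, since it makes the final output easier to spot.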

Writing your own functions can be absolutely amazing and fun and powerful, but you will also often want to pull your hair out with frustration at errors that are difficult to understand and fix. One of the best ways to deal with this is by making functions that are small and simple, and testing them as you use them. The smaller they are, the lower the chance of an error or issue that you can’t figure out. There are also some formal debugging steps you can do. But because of limited time, and because making meaningful debugging exercises is challenging (solutions to problems depend heavily on the project and context), that material is in Appendix A for you to look over in your own time. It contains some instructions on debugging and dealing with some common problems you might encounter with R.
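As a rough sketch of what “small, simple, and tested” can look like, the code below defines a tiny function and checks it against values where we already know the answer. The function `seconds_to_minutes()` is a hypothetical helper and not part of the workshop code.

```r
# Hypothetical helper, not part of the workshop code.
seconds_to_minutes <- function(seconds) {
  minutes <- seconds / 60
  return(minutes)
}

# Quick checks: stopifnot() is silent when every check is TRUE
# and stops with an error as soon as one of them is not.
stopifnot(
  seconds_to_minutes(60) == 1,
  seconds_to_minutes(0) == 0,
  identical(seconds_to_minutes(c(120, 180)), c(2, 3))
)
```

Because the function only does one thing, a failing check points you directly at the one line that could be wrong.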

Sticky/hat up!

When you’re ready to continue, place the sticky/paper hat on your computer to indicate this to the teacher 👒 🎩

6.3 Creating our first function

🧑‍🏫 Teacher note

Take your time going over this slowly, especially when talking about the Roxygen documentation template.

Let’s create a really basic function to show the process. First, create a new Markdown header called ## Making a function to add numbers and create a code chunk below that with Ctrl-Alt-I or with the Palette (Ctrl-Shift-P, then type “new chunk”). Then, inside the code chunk, we’ll write this code:

docs/learning.qmd

add_numbers <- function(num1, num2) {
  added <- num1 + num2
  return(added)
}

To use the new function, first run the code above to create it, then write out the function name with the arguments you want to give it:

docs/learning.qmd

add_numbers(1, 2)

[1] 3
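Arguments can also be given by name instead of by position. As a small sketch (the function is repeated so the code runs on its own), all three calls below give the same output:

```r
add_numbers <- function(num1, num2) {
  added <- num1 + num2
  return(added)
}

# Arguments matched by position.
by_position <- add_numbers(1, 2)

# Arguments matched by name, which makes it clearer what each value is for.
by_name <- add_numbers(num1 = 1, num2 = 2)

# Named arguments can be given in any order.
by_name_reordered <- add_numbers(num2 = 2, num1 = 1)

c(by_position, by_name, by_name_reordered)
```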

The function name is fairly good; add_numbers is read as “add numbers”. While we generally want to write code that describes what it does by reading it, it’s also good practice to add some formal documentation to the function. Use the “Insert Roxygen Skeleton” in the “Code” menu, by typing Ctrl-Shift-Alt-R or with the Palette (Ctrl-Shift-P, then type “roxygen comment”), and you can add template documentation right above the function. Make sure your cursor is within the function in order for the Roxygen template to be added to your function. It looks like:

docs/learning.qmd

#' Title
#'
#' @param num1
#' @param num2
#'
#' @return
#' @export
#'
#' @examples
add_numbers <- function(num1, num2) {
  added <- num1 + num2
  return(added)
}

The Title area is where you type out a brief sentence or several words that describe the function. Creating a new paragraph below this line allows you to add a more detailed description. The other items are:

@param num: These lines describe what each argument (also called parameter) is for and what to give it.
@return: This describes what output the function gives. Is it a data.frame? A plot? What else does the output give?
@export: Tells R that this function should be accessible to the user of your package. Since we aren’t making packages, delete it.
@examples: Any lines below this are used to show examples of how to use the function. This is very useful when making packages, but not really in this case. So we’ll delete it.

Let’s write out some documentation for this function:

docs/learning.qmd

#' Add two numbers together.
#'
#' @param num1 A number here.
#' @param num2 A number here.
#'
#' @return Returns the sum of the two numbers.
#'
add_numbers <- function(num1, num2) {
  added <- num1 + num2
  return(added)
}

Once we’ve created that and before moving on, let’s style our code with the Palette (Ctrl-Shift-P, then type “style file”), render the Quarto document with Ctrl-Shift-K or with the Palette (Ctrl-Shift-P, then type “render”), and then open up the Git Interface with Ctrl-Alt-M or with the Palette (Ctrl-Shift-P, then type “commit”) to add and commit these changes to the Git history before then pushing to GitHub.

6.4 📖 Reading task: Workflow for prototyping and creating functions

🧑‍🏫 Teacher note

Highlight the workflow and diagram. Reinforce this workflow and that we will be using it all throughout this workshop.

Time: ~6 minutes.

At the level of the code, the way you prototype code is to:

Write it out in Quarto so that it does what you want.
Convert that code into a function.
Test that the function works either in the Quarto document or in the R Console.
Fix the function if it doesn’t work.
Restart the R console with Ctrl-Shift-F10 or with the Palette (Ctrl-Shift-P, then type “restart”) or render with Ctrl-Shift-K or with the Palette (Ctrl-Shift-P, then type “render”) to test that the function works.
Whenever the function works, add and commit the changes to the Git history with Ctrl-Alt-M or with the Palette (Ctrl-Shift-P, then type “commit”) (or commit after you move the function to the R/functions.R script, which we will talk about in the next session).

Figure 6.1: Workflow for prototyping code in Quarto, converting to a function, testing it, rendering or restarting, and committing to Git.

Restarting R or rendering the Quarto document are the only ways to be certain the R workspace is in a clean state. When code runs from a clean state, it improves the chances that your code and project will be reproducible.
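To see why a clean state matters, here is a small sketch of a function that accidentally depends on an object that only exists in the current workspace. The names `double_glucose` and `multiplier` are hypothetical, and rm() is used here only to mimic what a restart would do to that object.

```r
# Hypothetical function that accidentally depends on `multiplier`,
# an object that is not an argument and not created inside the function.
double_glucose <- function(values) {
  values * multiplier
}

# While `multiplier` happens to exist in the workspace, the function works.
multiplier <- 2
double_glucose(c(5.8, 5.4))

# Removing the object mimics what happens to it after a restart.
rm(multiplier)

# Now the same call fails, because `multiplier` can no longer be found.
result <- try(double_glucose(c(5.8, 5.4)), silent = TRUE)
inherits(result, "try-error")
```

Problems like this stay hidden until the workspace is clean, which is why restarting or rendering is part of the workflow.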

We use Git because it is the best way of keeping track of what was done to your files, when, and why. It helps to keep your work transparent and makes it easier for you to share your code by uploading to GitHub. Using version control should be a standard practice for doing better science since it fits with the philosophy of doing science (e.g., transparency, reproducibility, and documentation).

Sticky/hat up!

When you’re ready to continue, place the sticky/paper hat on your computer to indicate this to the teacher 👒 🎩

6.5 Making a function for importing our data

Now that we have a basic understanding of what a function looks like, let’s apply that to what we’re doing right now: Importing our data.

Teacher note

Emphasize that we will be using this workflow for creating functions all the time throughout the workshop and that it is also a common workflow when making functions in R.

While you read about the general workflow above, the more detailed steps for making a function are to:

Write code that works and does what you want.
Enclose it as a function with name <- function() { ... }, with an appropriate and descriptive name.
Create arguments in the function call (function(argument1, argument2)) with appropriate and descriptive names, then replace the code in the function body with the argument names where appropriate.
Rename any objects created to be more generic and include the return() function at the end to indicate what the function will output.
Run the function and check that it works.
Add the Roxygen documentation tags with Ctrl-Shift-Alt-R or with the Palette (Ctrl-Shift-P, then type “roxygen comment”) while the cursor is in the function.

In docs/learning.qmd, create a new Markdown header called ## Import 101's cgm data with a function and create a code chunk below that with Ctrl-Alt-I or with the Palette (Ctrl-Shift-P, then type “new chunk”).

So, step one. Let’s copy and paste the code we previously wrote for importing the cgm_101 data, so we can convert it into a function:

docs/learning.qmd

cgm_101 <- here("data-raw/dime/cgm/101.csv") |>
  read_csv(
    show_col_types = FALSE,
    name_repair = to_snake_case,
    n_max = 100
  )

Next we wrap it in the function call and give it an appropriate name. In this case, import_cgm is descriptive and meaningful.

docs/learning.qmd

import_cgm <- function() {
  cgm_101 <- here("data-raw/dime/cgm/101.csv") |>
    read_csv(
      show_col_types = FALSE,
      name_repair = to_snake_case,
      n_max = 100
    )
}

Then, we add arguments to the function and use them in place of the matching code in the body. Here, we have only one thing that we would change: The file path to the dataset. So, a good name might be file_path. It’s also good practice to not hard code the use of here() within a function. Instead, it’s good design to give a function a full file path that it can use internally. Then when we use the function, we would use here() with the correct path in the function argument. So it would be:

docs/learning.qmd

import_cgm <- function(file_path) {
  cgm_101 <- file_path |>
    read_csv(
      show_col_types = FALSE,
      name_repair = to_snake_case,
      n_max = 100
    )
}

Then we simplify things internally by renaming cgm_101 to simply cgm, since we would like to be able to import other participant CGM data later. Finally, we will add the return() function at the end with the object we want the function to output.

docs/learning.qmd

import_cgm <- function(file_path) {
  cgm <- file_path |>
    read_csv(
      show_col_types = FALSE,
      name_repair = to_snake_case,
      n_max = 100
    )
  return(cgm)
}

Great! Now we need to test it out. Let’s try it on two CGM datasets, the files for participants 101 and 102:

docs/learning.qmd

here("data-raw/dime/cgm/101.csv") |>
  import_cgm()

# A tibble: 100 × 2
   device_timestamp    historic_glucose_mmol_l
   <dttm>                                <dbl>
 1 2021-03-18 08:15:00                     5.8
 2 2021-03-18 08:30:00                     5.4
 3 2021-03-18 08:45:00                     5.1
 4 2021-03-18 09:01:00                     5.3
 5 2021-03-18 09:16:00                     5.3
 6 2021-03-18 09:31:00                     4.9
 7 2021-03-18 09:46:00                     4.7
 8 2021-03-18 10:01:00                     4.8
 9 2021-03-18 10:16:00                     5.5
10 2021-03-18 10:31:00                     5.7
# ℹ 90 more rows

here("data-raw/dime/cgm/102.csv") |>
  import_cgm()

# A tibble: 100 × 2
   device_timestamp    historic_glucose_mmol_l
   <dttm>                                <dbl>
 1 2021-03-19 09:02:00                     2.2
 2 2021-03-19 09:17:00                     2.2
 3 2021-03-19 09:32:00                     2.2
 4 2021-03-19 09:46:00                     2.2
 5 2021-03-19 17:29:00                     5.1
 6 2021-03-19 17:44:00                     4.7
 7 2021-03-19 17:59:00                     4.9
 8 2021-03-19 18:14:00                     5.6
 9 2021-03-19 18:29:00                     5.9
10 2021-03-19 18:45:00                     6.2
# ℹ 90 more rows

Awesome! It works 🎉 The final stage is to add the Roxygen documentation.

docs/learning.qmd

#' Import one participant's CGM data from the DIME dataset.
#'
#' @param file_path Path to the CGM data file.
#'
#' @return Outputs a data frame/tibble.
#'
import_cgm <- function(file_path) {
  cgm <- file_path |>
    read_csv(
      show_col_types = FALSE,
      name_repair = to_snake_case,
      n_max = 100
    )
  return(cgm)
}
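If you want to try the function without the workshop data, one option is to write a tiny CSV file to a temporary location and import that. This is only a sketch: the column names and values below are made up, and it assumes the readr and snakecase packages are installed.

```r
library(readr)
library(snakecase)

import_cgm <- function(file_path) {
  cgm <- file_path |>
    read_csv(
      show_col_types = FALSE,
      name_repair = to_snake_case,
      n_max = 100
    )
  return(cgm)
}

# Made-up file with made-up column names and values, only for testing.
temp_file <- tempfile(fileext = ".csv")
writeLines(
  c(
    "Device Timestamp,Glucose Value",
    "2021-03-18 08:15:00,5.8",
    "2021-03-18 08:30:00,5.4"
  ),
  temp_file
)

cgm_test <- import_cgm(temp_file)

# The column names are converted to snake case by `name_repair`.
names(cgm_test)
nrow(cgm_test)
```

Because the function takes a full file path as its argument, it works the same way on a temporary file as it does on a path built with here().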

A massive advantage of using functions is that if you want to make a change to what your code does, like if you fix a mistake, you only need to modify the function once and every place that uses the function picks up the change.

Now that we have a working function, run styler with the Palette (Ctrl-Shift-P, then type “style file”), render with Ctrl-Shift-K or with the Palette (Ctrl-Shift-P, then type “render”), and then add and commit the changes to the Git history with Ctrl-Alt-M or with the Palette (Ctrl-Shift-P, then type “commit”) before then pushing to GitHub.

6.6 🧑‍💻 Exercise: Convert the sleep code into a function

Time: ~10 minutes.

We’ve converted the code to import the CGM data into a function. Now, let’s do the same for the sleep data. Use the code you made from the exercise in Section 5.7 and convert that into a function using the same steps as above.

Tip

A helpful tip: To move around a Quarto or R script more easily, open up the “Document Outline” on the side by clicking the button in the top right corner of the Quarto pane or by using Ctrl-Shift-O or with the Palette (Ctrl-Shift-P, then type “outline”).

The code that you write to test that the function works should look like the below and it should output something similar:

here("data-raw/dime/sleep/101.csv") |>
  import_sleep()

# A tibble: 100 × 3
   date                sleep_type seconds
   <dttm>              <chr>        <dbl>
 1 2021-05-24 23:03:00 wake           540
 2 2021-05-24 23:12:00 light          180
 3 2021-05-24 23:15:00 deep          1440
 4 2021-05-24 23:39:00 light          240
 5 2021-05-24 23:43:00 wake           300
 6 2021-05-24 23:48:00 light          120
 7 2021-05-24 23:50:00 rem           1350
 8 2021-05-25 00:12:30 wake           870
 9 2021-05-25 00:27:00 rem            360
10 2021-05-25 00:33:00 light          210
# ℹ 90 more rows

here("data-raw/dime/sleep/102.csv") |>
  import_sleep()

# A tibble: 100 × 3
   date                sleep_type seconds
   <dttm>              <chr>        <dbl>
 1 2021-06-04 00:54:00 light          180
 2 2021-06-04 00:57:00 deep          2340
 3 2021-06-04 01:36:00 light          900
 4 2021-06-04 01:51:00 deep          1470
 5 2021-06-04 02:15:30 light          120
 6 2021-06-04 02:17:30 rem            720
 7 2021-06-04 02:29:30 light         2700
 8 2021-06-04 03:14:30 deep           300
 9 2021-06-04 03:19:30 light         1440
10 2021-06-04 03:43:30 rem           1620
# ℹ 90 more rows

Create a new Markdown header at the bottom of docs/learning.qmd called ## Exercise to make function to import sleep data.
Below the Markdown header, create a new code chunk with Ctrl-Alt-I or with the Palette (Ctrl-Shift-P, then type “new chunk”).
Paste the code you used from the exercise in Section 5.7 into the code chunk and begin converting it into a function, like we did above.
- Wrap it with the function() {...}.
- Make a meaningful name (use import_sleep).
- Make an argument for the file path (file_path) and replace the here("") code with file_path.
- Rename the output object from sleep_101 to simply sleep.
- End the function with using return() to output the sleep object.
- Use the function and run it in the code chunk to test that it works on both 101 and 102’s sleep data.
- Add the Roxygen documentation with Ctrl-Shift-Alt-R or with the Palette (Ctrl-Shift-P, then type “roxygen comment”) and then fill in the details, like we did above.
Run styler while in the docs/learning.qmd file with the Palette (Ctrl-Shift-P, then type “style file”).
Render the Quarto document with Ctrl-Shift-K or with the Palette (Ctrl-Shift-P, then type “render”).
Finally, add and commit the changes to the Git history, using Ctrl-Alt-M or with the Palette (Ctrl-Shift-P, then type “commit”). Then push to GitHub.

Sticky/hat up!

When you’re ready to continue, place the sticky/paper hat on your computer to indicate this to the teacher 👒 🎩

🧑‍🏫 Teacher note

Mention that the import_sleep() function is identical to the import_cgm() function, apart from the names. Briefly say that in the next session we will go over making more general functions.

6.7 Key takeaways

Teacher note

Quickly cover this and get them to do the survey before moving on to the discussion activity.

Everything in R is an object.
Every action in R is a function and every function is an object.
Functions contain a sequence of steps that do actions to an object.
Functions have five components:
- The three required ones are using function() { }, the code in the function body between the { }, and an output (usually set with return()).
- The two optional ones are assigning the function as a new object with <- and the function arguments put within function().
Document functions by using Roxygen.
Keep functions small and simple, so it is easier to test and fix them.
Use few arguments in functions to reduce cognitive load.

6.8 💬 Discussion activity: What are some tasks that could be functions?

Time: ~6 minutes.

As we prepare for the next session and the break, get up, walk around, and discuss with your neighbour some of the following questions:

What are some tasks you do that are repetitive or that you do multiple times with very small changes each time you do the task?
How might you use functions in your work? Can you think of specific tasks or situations where you could use one?

6.9 Code used in session

This lists some, but not all, of the code used in the section. Some code is incorporated into Markdown content, so is harder to automatically list here in a code chunk. The code below also includes the code from the exercises.

sd
`+`
`<-`
1 + 2
`+`(1, 2)

`<-`(x, 1)
x
x <- 1
x
add_numbers <- function(num1, num2) {
  added <- num1 + num2
  return(added)
}
add_numbers(1, 2)
#' Title
#'
#' @param num1
#' @param num2
#'
#' @return
#' @export
#'
#' @examples
add_numbers <- function(num1, num2) {
  added <- num1 + num2
  return(added)
}
#' Add two numbers together.
#'
#' @param num1 A number here.
#' @param num2 A number here.
#'
#' @return Returns the sum of the two numbers.
#'
add_numbers <- function(num1, num2) {
  added <- num1 + num2
  return(added)
}
cgm_101 <- here("data-raw/dime/cgm/101.csv") |>
  read_csv(
    show_col_types = FALSE,
    name_repair = to_snake_case,
    n_max = 100
  )
import_cgm <- function() {
  cgm_101 <- here("data-raw/dime/cgm/101.csv") |>
    read_csv(
      show_col_types = FALSE,
      name_repair = to_snake_case,
      n_max = 100
    )
}
import_cgm <- function(file_path) {
  cgm_101 <- file_path |>
    read_csv(
      show_col_types = FALSE,
      name_repair = to_snake_case,
      n_max = 100
    )
}
import_cgm <- function(file_path) {
  cgm <- file_path |>
    read_csv(
      show_col_types = FALSE,
      name_repair = to_snake_case,
      n_max = 100
    )
  return(cgm)
}
here("data-raw/dime/cgm/101.csv") |>
  import_cgm()
here("data-raw/dime/cgm/102.csv") |>
  import_cgm()
#' Import one participant's CGM data from the DIME dataset.
#'
#' @param file_path Path to the CGM data file.
#'
#' @return Outputs a data frame/tibble.
#'
import_cgm <- function(file_path) {
  cgm <- file_path |>
    read_csv(
      show_col_types = FALSE,
      name_repair = to_snake_case,
      n_max = 100
    )
  return(cgm)
}
#' Import a participant's sleep data from DIME.
#'
#' @param file_path Path to the participant's sleep file.
#'
#' @return Outputs a data frame/tibble.
#'
import_sleep <- function(file_path) {
  sleep <- file_path |>
    read_csv(
      show_col_types = FALSE,
      name_repair = to_snake_case,
      n_max = 100
    )
  return(sleep)
}