🚧 We are doing major changes to this workshop, so much of the content will be changed. 🚧
Describe and identify the individual components of a function as well as the workflow for creating them, and then use this workflow to make a function that imports some data.
Describe and apply a workflow of prototyping code into a working function in a Quarto document, moving the function into a script (called, e.g., functions.R) once prototyped and tested, and then using source() to load the functions into the R session. End this workflow by rendering the Quarto document with Ctrl-Shift-K or with the Palette (Ctrl-Shift-P, then type “render”) to ensure reproducibility.
Repeat and reinforce what functions are made of and how they are structured, that all actions are functions, and that all functions are objects (but not all objects are functions).
Time: ~5 minutes.
The first thing to know about R is that everything is an “object” and that some objects can do an action. These objects that do an action are called functions. You’ve heard or read about functions before during the workshop, but what is a function?
A function is a bundled sequence of steps that achieves a specific action. You can usually tell that an object is an action if it has a () at the end of its name. For example, mean() is a function to calculate the mean and sd() is a function to calculate the standard deviation. It isn’t always true that functions end in () though, which you’ll read about shortly. For instance, + is a function that adds two numbers together, [] is a function that subsets or extracts an item from a list of items (like getting a column from a data frame), and <- is a function that creates a new object from some value.
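As a small sketch of the idea that functions are objects, but not every object is a function, the base R function is.function() can check each case. The object name some_numbers is made up for this illustration, and the backticks around the operators are only there so R treats them as names.

```r
# Functions are objects that do an action.
is.function(mean)
is.function(sd)

# Operators are functions too; backticks let us refer to them by name.
is.function(`+`)
is.function(`[`)
is.function(`<-`)

# Not every object is a function (`some_numbers` is a made-up example).
some_numbers <- c(1, 2, 3)
is.function(some_numbers)
```

The first five checks give TRUE and the last one gives FALSE.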
All created functions have the same structure: they are assigned a function name with <-, they use function() to define their parameters or arguments, and they have an internal sequence of steps, called the function body, that is wrapped in {}:
Console
function_name <- function(argument1, argument2) {
  # body of function with R code
}
Notice that this uses two functions to create this function_name function object:
- <- is the action (function) that will create the new object function_name.
- function() is the action (function) to tell R that this object is an action (function) whenever it is used with a () at the end, e.g. function_name().
Because R is open source, anyone can see how things work underneath. So, if you want to see what a function does underneath, type out the function name without the () into the Console and run it. If we do that with the function sd(), which calculates the standard deviation, we see:
Console
sd
function (x, na.rm = FALSE)
sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x),
na.rm = na.rm))
<bytecode: 0x5581eb3ff598>
<environment: namespace:stats>
Here you see that sd() has the arguments x and na.rm. The function body shows how it calculates the standard deviation, which is the square root of the variance. In this code, var() is inside the sqrt() function, which is exactly what it should be if you know the math (you don’t need to).
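The same pieces can also be pulled out of the function object directly. This is only a sketch using the base R helpers formals() and body(). The vector called values is made up to check the math.

```r
# Argument names of sd(), taken from the function object itself.
names(formals(sd))

# The function body: the sqrt(var(...)) call printed in the Console.
body(sd)

# Check the math on a made-up vector (`values` is hypothetical).
values <- c(2, 4, 4, 4, 5, 5, 7, 9)
all.equal(sd(values), sqrt(var(values)))
```

The argument names come back as "x" and "na.rm", and the last line gives TRUE because sd() is the square root of var().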
Normally you can tell that something is a function if it has () at the end of its name. But there are special functions, like +, [], or even <-, that do an action but don’t have () at the end. These are called operator functions, which is why <- is called the assignment operator: it assigns something to a new object. To see how they work internally, wrap backticks (`) around the operator. So for + or <- it would be:
Console
`+`
function (e1, e2) .Primitive("+")
`<-`
.Primitive("<-")
You’ll see something called .Primitive. Operators are often written in very low-level computer code, and these functions are called “primitives”. Explaining primitives is well beyond the scope of this workshop, so we won’t go into what that means or why.
To show that they are functions, you can even use them in their () form like this:
1 + 2
[1] 3
`+`(1, 2)
[1] 3
`<-`(x, 1)
x
[1] 1
x <- 1
x
[1] 1
But hopefully you can see that using them with the () isn’t very nice to read or use!
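Where the backtick form can be handy is when an operator needs to be given to another function as an argument. The sketch below is an aside rather than part of the workshop material: Reduce() and sapply() are base R functions, and the list called measurements is made up for the illustration.

```r
# Reduce() combines values two at a time; `+` fits that two-argument shape.
Reduce(`+`, c(1, 2, 3, 4))

# sapply() applies `[` to each item in the list, extracting the first value.
# `measurements` is a made-up list for this sketch.
measurements <- list(a = c(5.8, 5.4), b = c(2.2, 5.1))
sapply(measurements, `[`, 1)
```

The first call gives 10 and the second gives the first value from each item (5.8 for a and 2.2 for b).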
If you can learn to make your own functions, it can help make your life and work much easier and more efficient! That’s because you can make a sequence of actions that you can then reuse again and again. And luckily, you will be making many functions throughout this workshop. Making a function always follows a basic structure:
1. Give the function a name (e.g. mean).
2. Use function() to tell R that the new object will be a function, and assign it to the name with <-.
3. Optionally, give the function arguments inside the parentheses, e.g. function(argument1, argument2, argument3).
4. Use return() to indicate what final output you want the function to have. For learning purposes, we’ll always use return() to help show what the final function output is, but it isn’t necessary.
Emphasize that we will be using this workflow for creating functions all the time throughout the workshop and that this workflow is also what you’d use in your daily work.
While there is no minimum or maximum number of arguments you can provide for a function (e.g. you could have zero or dozens of arguments), it’s generally good practice and design to have as few arguments as necessary to get the job done. Part of making functions is to reduce your own and others’ cognitive load when working with or reading code. The fewer arguments you use, the lower the cognitive load. So, the structure is:
name <- function(argument1, argument2) {
  # body of function
  output <- ... code ....
  return(output)
}
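Since return() isn’t strictly necessary, here is a minimal sketch comparing a function with and without it. Both function names are made up for this illustration.

```r
# With an explicit return(), as used throughout this workshop.
double_with_return <- function(number) {
  doubled <- number * 2
  return(doubled)
}

# Without return(): R outputs the value of the last evaluated expression.
double_without_return <- function(number) {
  number * 2
}

double_with_return(4)
double_without_return(4)
```

Both calls give 8. The explicit return() only makes it easier to see what the final output is.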
Writing your own functions can be absolutely amazing, fun, and powerful, but you will also often want to pull your hair out with frustration at errors that are difficult to understand and fix. One of the best ways to deal with this is to make functions that are small and simple, and to test them as you use them. The smaller they are, the lower the chance of an error or issue that you can’t figure out. There are also some formal debugging steps you can take. But because of time, and because meaningful debugging exercises are hard to make when solutions depend so heavily on the project and context, that material is in Appendix A for you to look over in your own time. It contains some instructions on debugging and dealing with some common problems you might encounter with R.
When you’re ready to continue, place the sticky/paper hat on your computer to indicate this to the teacher 👒 🎩
Take your time going over this slowly, especially when talking about the Roxygen documentation template.
Let’s create a really basic function to show the process. First, create a new Markdown header called ## Making a function to add and create a code chunk below that with Ctrl-Alt-I or with the Palette (Ctrl-Shift-P, then type “new chunk”). Then, inside the code chunk, we’ll write this code:
docs/learning.qmd
add_numbers <- function(num1, num2) {
  added <- num1 + num2
  return(added)
}
You can use the new function by running the above code and then writing out the function name with the arguments to give it.
docs/learning.qmd
add_numbers(1, 2)
[1] 3
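One way to test a small function as you use it is with a few quick checks. The sketch below uses base R’s stopifnot(). The extra inputs are made up, and this is just one possible way of checking, not a required step in the workshop.

```r
# Same function as above, repeated so this chunk runs on its own.
add_numbers <- function(num1, num2) {
  added <- num1 + num2
  return(added)
}

# stopifnot() stays silent when every check is TRUE and errors otherwise.
stopifnot(
  add_numbers(1, 2) == 3,
  add_numbers(-1, 1) == 0,
  all.equal(add_numbers(0.5, 0.25), 0.75)
)
```

If one of the checks fails, R stops with an error naming the check that was not TRUE, which points you straight at the problem.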
The function name is fairly good; add_numbers is read as “add numbers”. While we generally want to write code that describes what it does by reading it, it’s also good practice to add some formal documentation to the function. Use “Insert Roxygen Skeleton” in the “Code” menu, by typing Ctrl-Shift-Alt-R or with the Palette (Ctrl-Shift-P, then type “roxygen comment”), to add template documentation right above the function. Make sure your cursor is within the function in order for the Roxygen template to be added to your function. It looks like:
docs/learning.qmd
#' Title
#'
#' @param num1
#' @param num2
#'
#' @return
#' @export
#'
#' @examples
add_numbers <- function(num1, num2) {
  added <- num1 + num2
  return(added)
}
The Title area is where you type out a brief sentence or several words that describe the function. Creating a new paragraph below this line allows you to add a more detailed description. The other items are:
- @param num: These lines describe what each argument (also called parameter) is for and what to give it.
- @return: This describes what output the function gives. Is it a data.frame? A plot? What else does the output give?
- @export: Tells R that this function should be accessible to the user of your package. Since we aren’t making packages, delete it.
- @examples: Any lines below this are used to show examples of how to use the function. This is very useful when making packages, but not really in this case. So we’ll delete it.
Let’s write out some documentation for this function:
docs/learning.qmd
#' Add two numbers together.
#'
#' @param num1 A number here.
#' @param num2 A number here.
#'
#' @return Returns the sum of the two numbers.
#'
add_numbers <- function(num1, num2) {
  added <- num1 + num2
  return(added)
}
Once we’ve created that and before moving on, let’s style our code with the Palette (Ctrl-Shift-P, then type “style file”), render the Quarto document with Ctrl-Shift-K or with the Palette (Ctrl-Shift-P, then type “render”), and then open up the Git Interface with Ctrl-Alt-M or with the Palette (Ctrl-Shift-P, then type “commit”) to add and commit these changes to the Git history.
Now that we have a basic understanding of what a function looks like, let’s apply that to what we’re doing right now: importing our data.
Emphasize that we will be using this workflow for creating functions all the time throughout the workshop and that it is also a common workflow when making functions in R.
Making a function is a series of steps:
- Wrap the code in name <- function() { ... }, with an appropriate and descriptive name.
- Add arguments (function(argument1, argument2)) with appropriate and descriptive names, then replace the code in the function body with the argument names where appropriate.
- Add the return() function at the end to indicate what the function will output.
In docs/learning.qmd, create a new Markdown header called ## Import 101's cgm data with a function and create a code chunk below that with Ctrl-Alt-I or with the Palette (Ctrl-Shift-P, then type “new chunk”).
So, step one. Let’s copy and paste the code we previously wrote for importing the cgm_101 data and convert it into a function:
Next we wrap it in the function call and give it an appropriate name. In this case, import_cgm is descriptive and meaningful.
Then, we add arguments to the function and replace the matching values within the code. Here, we have only one thing that we would change: the file path to the dataset. So, a good name might be file_path. It’s also good practice to not hard code the use of here() within a function. Instead, it’s good design to give the function a full file path that it can use internally. Then when we use the function, we would use here() with the correct path in the function argument. So it would be:
docs/learning.qmd
import_cgm <- function(file_path) {
  cgm_101 <- file_path |>
    read_csv(
      show_col_types = FALSE,
      name_repair = to_snake_case,
      n_max = 100,
    )
}
Then we simplify things internally by renaming cgm_101 to simply cgm, since we would like to be able to import other participants’ CGM data later. Finally, we will add the return() function at the end with the object we want the function to output.
Great! Now we need to test it out. Let’s try it on two CGM datasets, the files for participants 101 and 102:
docs/learning.qmd
import_cgm(here("data-raw/dime/cgm/101.csv"))
# A tibble: 100 × 2
device_timestamp historic_glucose_mmol_l
<dttm> <dbl>
1 2021-03-18 08:15:00 5.8
2 2021-03-18 08:30:00 5.4
3 2021-03-18 08:45:00 5.1
4 2021-03-18 09:01:00 5.3
5 2021-03-18 09:16:00 5.3
6 2021-03-18 09:31:00 4.9
7 2021-03-18 09:46:00 4.7
8 2021-03-18 10:01:00 4.8
9 2021-03-18 10:16:00 5.5
10 2021-03-18 10:31:00 5.7
# ℹ 90 more rows
import_cgm(here("data-raw/dime/cgm/102.csv"))
# A tibble: 100 × 2
device_timestamp historic_glucose_mmol_l
<dttm> <dbl>
1 2021-03-19 09:02:00 2.2
2 2021-03-19 09:17:00 2.2
3 2021-03-19 09:32:00 2.2
4 2021-03-19 09:46:00 2.2
5 2021-03-19 17:29:00 5.1
6 2021-03-19 17:44:00 4.7
7 2021-03-19 17:59:00 4.9
8 2021-03-19 18:14:00 5.6
9 2021-03-19 18:29:00 5.9
10 2021-03-19 18:45:00 6.2
# ℹ 90 more rows
Awesome! It works 🎉 The final stage is to add the Roxygen documentation.
docs/learning.qmd
#' Import one participant's CGM data from the DIME dataset.
#'
#' @param file_path Path to the CGM data file.
#'
#' @return Outputs a data frame/tibble.
#'
import_cgm <- function(file_path) {
  cgm <- file_path |>
    read_csv(
      show_col_types = FALSE,
      name_repair = to_snake_case,
      n_max = 100,
    )
  return(cgm)
}
A massive advantage of using functions is that if you want to change what your code does, for example to fix a mistake, you only need to modify the function and every place that uses it picks up the change too.
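To illustrate both points (passing in a full file path and changing behaviour in one place), here is a self-contained sketch. It is not the workshop’s import_cgm(): the function import_preview(), the temporary files, and their contents are all made up, and it uses base R’s read.csv() instead of read_csv() so that it runs without extra packages or the DIME data.

```r
# Hypothetical stand-in for import_cgm(), using base R so it runs anywhere.
import_preview <- function(file_path) {
  preview <- read.csv(file_path, nrows = 100)
  return(preview)
}

# Two made-up files in a temporary folder play the role of two participants.
folder <- tempfile()
dir.create(folder)
first_path <- file.path(folder, "first.csv")
second_path <- file.path(folder, "second.csv")
write.csv(data.frame(glucose = c(5.8, 5.4, 5.1)), first_path, row.names = FALSE)
write.csv(data.frame(glucose = c(2.2, 2.2)), second_path, row.names = FALSE)

# The same function is reused; only the file path changes.
nrow(import_preview(first_path))
nrow(import_preview(second_path))
```

The two calls give 3 and 2 rows. If the number of rows to read needed to change, editing nrows inside import_preview() once would change both imports.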
Now that we have a working function, run styler with the Palette (Ctrl-Shift-P, then type “style file”), render with Ctrl-Shift-K or with the Palette (Ctrl-Shift-P, then type “render”), and then add and commit the changes to the Git history with Ctrl-Alt-M or with the Palette (Ctrl-Shift-P, then type “commit”).
Time: ~10 minutes.
We’ve converted the code to import the CGM data into a function. Now, let’s do the same for the sleep data. Use the code you made in the exercise in Section 5.5 and convert it into a function, following the same steps as above.
A helpful tip: To move around a Quarto or R script more easily, open up the “Document Outline” on the side by clicking the button in the top right corner of the Quarto pane, by using Ctrl-Shift-O, or with the Palette (Ctrl-Shift-P, then type “outline”).
- Create a new Markdown header in docs/learning.qmd called ## Exercise to make function to import sleep data.
- Wrap the code in function() {...}.
- Give the function a name (import_sleep).
- Add one argument (file_path) and replace the here("") code with file_path.
- Rename sleep_101 to simply sleep.
- Add return() to output the sleep object.
- Test the function on 101 and 102’s sleep data.
- Style the docs/learning.qmd file with the Palette (Ctrl-Shift-P, then type “style file”).
The code that you write to test that the function works, and what it outputs, should look similar to the code chunk below:
import_sleep(here("data-raw/dime/sleep/101.csv"))
# A tibble: 100 × 3
date sleep_type seconds
<dttm> <chr> <dbl>
1 2021-05-24 23:03:00 wake 540
2 2021-05-24 23:12:00 light 180
3 2021-05-24 23:15:00 deep 1440
4 2021-05-24 23:39:00 light 240
5 2021-05-24 23:43:00 wake 300
6 2021-05-24 23:48:00 light 120
7 2021-05-24 23:50:00 rem 1350
8 2021-05-25 00:12:30 wake 870
9 2021-05-25 00:27:00 rem 360
10 2021-05-25 00:33:00 light 210
# ℹ 90 more rows
import_sleep(here("data-raw/dime/sleep/102.csv"))
# A tibble: 100 × 3
date sleep_type seconds
<dttm> <chr> <dbl>
1 2021-06-04 00:54:00 light 180
2 2021-06-04 00:57:00 deep 2340
3 2021-06-04 01:36:00 light 900
4 2021-06-04 01:51:00 deep 1470
5 2021-06-04 02:15:30 light 120
6 2021-06-04 02:17:30 rem 720
7 2021-06-04 02:29:30 light 2700
8 2021-06-04 03:14:30 deep 300
9 2021-06-04 03:19:30 light 1440
10 2021-06-04 03:43:30 rem 1620
# ℹ 90 more rows
When you’re ready to continue, place the sticky/paper hat on your computer to indicate this to the teacher 👒 🎩
Mention that the import_sleep() function is identical to import_cgm(). Briefly say that in the next session we will go over making more general functions.
Time: ~6 minutes.
As we prepare for the next session, get up, walk around, and discuss with your neighbour some of the following questions: