7 Making robust and general-purpose functions
🚧 We are doing major changes to this workshop, so much of the content will be changed. 🚧
7.1 Learning objectives
- Explain what R package dependency management is and why it is necessary when writing code and ensuring reproducibility.
- Use tools like usethis::use_package() to manage dependencies in an R project and use :: to explicitly use a specific function from a specific package in any functions you make.
- Identify and describe some basic principles and patterns for writing more robust, reusable, and general-purpose code.
7.2 💬 Discussion activity: How might you make your functions more general-purpose?
Time: ~8 minutes.
One of the more powerful features of making functions is that you can easily reuse them in other sections of the file or in other files, which we will cover how to do later in this session. Part of that power comes from knowing how to make functions that are general enough to be used in other settings too.
From the previous session, we made the import_cgm() function together and in the exercises you made the import_sleep() function. These two functions basically do the same thing. So why have two? We don’t need two! So let’s refactor the import_cgm() function so it is clearer what it does and can do.
The first step is to look at the code to see what it does and see how you can modify it to be more general-purpose.
- Take 2 minutes to look over the two functions you made and think about what you could do to make it more general-purpose.
- Take 3 minutes to discuss with your neighbour what you think could be done to make it more general-purpose and try to come to a conclusion.
- Then all together and in the remaining time, everyone will share what they’ve thought of.
Try not to look ahead 😜 We won’t generalise the function yet, first we will make it more robust, and then we will generalise it.
7.3 📖 Reading task: Making your function more robust with explicit dependencies
Time: ~10 minutes.
Before we make the function more general-purpose, this is a good time to talk about package dependencies and making your function more robust and trustworthy.
So what is a package dependency and how do you manage it? Whenever you use an R package in your project, you depend on it in order for your code to work. The informal way to “manage” dependencies is by doing what you’ve already done before: using the library() function to load the package into R.
As you read others’ code online or from other researchers, you may notice that sometimes the require() function is used to load packages in the same way that library() is. The problem with require() is that if the package can’t be loaded, it doesn’t give an error. That’s because require() only checks if the package is available and will otherwise continue running the code. As we’ll cover in this course, this can be a very bad thing because if a package isn’t loaded, it can change the behaviour of some of your code and give you potentially wrong results. On the other hand, library() will give an error if it can’t find the package, which is what you expect if your code depends on a package.
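The difference between the two can be seen in a small sketch. It assumes that no package called aPackageThatDoesNotExist is installed (a made-up name used only for illustration), and it catches the error from library() so that the example can run to the end.

```r
# Hypothetical package name, assumed not to be installed anywhere.
missing_pkg <- "aPackageThatDoesNotExist"

# require() only warns and returns FALSE, so a script would keep running.
loaded <- suppressWarnings(require(missing_pkg, character.only = TRUE))
loaded

# library() stops with an error; tryCatch() captures it for inspection.
outcome <- tryCatch(
  {
    library(missing_pkg, character.only = TRUE)
    "package loaded"
  },
  error = function(e) "library() gave an error"
)
outcome
```

In a real script you would not wrap library() in tryCatch(); the error is exactly the signal you want when a dependency is missing.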
So, what happens if you come back to the project, or get a new computer, or someone else is working on your project too and they want to make sure they have the packages your project needs? How will they know what packages your project uses? What do they do to get those packages installed? Do they have to search through all your files just to find all library() functions you used and then install those packages individually and manually? A much better way here is to formally indicate your package dependency so that installing dependencies is easy! And we do this by making use of the DESCRIPTION file.
The advantage of using the DESCRIPTION file is that it is a standard file used by R projects to store metadata about the project, including which packages are needed to run the project. It also means there are many helper tools available that use this DESCRIPTION file, including tools to install all the packages you need.
So, if you or someone else wants to install all the packages your project depends on, all you or they have to do is go to the Console and type out (you don’t need to do this right now):
Console
pak::pak()
This function looks into the DESCRIPTION file and installs all the packages listed as dependencies there.
Where are these package dependencies listed in the DESCRIPTION file? Open up your DESCRIPTION file, which you can do quickly with Ctrl-., typing the file name out, and hitting enter to open it. Your file may not look exactly like the examples shown in this session. If it doesn’t, it isn’t a problem, as the examples are just to give you an idea of what it might look like.
Look for the Imports: key. This is where information about package dependencies is added. In the next section, we will go over how to add packages to this field.
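Because the DESCRIPTION file follows a standard key-value layout, tools can read it directly. As a rough illustration (this is not how pak itself works), base R’s read.dcf() can pull out the Imports field. The file contents below are written to a temporary file purely for the example.

```r
# Write an example DESCRIPTION to a temporary file (illustration only).
desc_file <- tempfile()
writeLines(
  c(
    "Type: Project",
    "Package: LearnR3",
    "Title: Analysis Project for LearnR3",
    "Version: 0.0.1",
    "Imports:",
    "    here",
    "Encoding: UTF-8"
  ),
  desc_file
)

# read.dcf() returns a character matrix with one column per field.
fields <- read.dcf(desc_file)

# Split the comma-separated Imports field into individual package names.
imports <- trimws(strsplit(fields[1, "Imports"], ",")[[1]])
imports
```

Helper tools rely on this same structure, which is why a correctly formatted DESCRIPTION file matters.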
7.4 Explicitly state a project’s package dependencies
There are a few ways to add package dependencies to the DESCRIPTION file. The most straightforward way is to manually write the package you need in the Imports: section of the DESCRIPTION file. But there are a few issues with that, mainly that you may not add it correctly. The other, better way to add dependencies is to use the usethis::use_package() function.
Since we’ve used the here package in our code, let’s add it as a dependency. Go to the Console and let’s type out how to add it. Don’t write this code in your Quarto document, since you don’t want to run it every time you render the document.
Console
usethis::use_package("here")
You will see a bunch of text about adding it to Imports. If you look in your DESCRIPTION file now, you’ll see something like:
Type: Project
Package: LearnR3
Title: Analysis Project for LearnR3
Version: 0.0.1
Imports:
    here
Encoding: UTF-8
Since we will also make use of the tidyverse set of packages later in the workshop, we’ll also add tidyverse as a dependency.
Console
usethis::use_package("tidyverse")
Error in `refuse_package()`:
✖ tidyverse is a meta-package and it is rarely a good idea to
depend on it.
Please determine the specific underlying package(s) that provide the
function(s) you need and depend on that instead.
ℹ For data analysis projects that use a package structure but do not
implement a formal R package, adding tidyverse to 'Depends' is a
reasonable compromise.
Call `use_package("tidyverse", type = "depends")` to achieve this.
This gives an error though. That’s because the tidyverse is a large collection of packages, so as stated by the message, the recommended way to add this particular dependency is with:
Console
usethis::use_package("tidyverse", type = "Depends")
If you look in the DESCRIPTION file now, you see that the new Depends field has been added with tidyverse right below it.
Type: Project
Package: LearnR3
Title: Analysis Project for LearnR3
Version: 0.0.1
Depends:
    tidyverse
Imports:
    here
Encoding: UTF-8
There are fairly technical reasons why we need to put tidyverse in the Depends field that you don’t need to know about for this workshop, aside from the fact that it is a common practice in R projects. At least in this context, we use the Depends field for tidyverse because of one big reason: the usethis::use_package() function will complain if we try to put tidyverse in the Imports field and it recommends putting it in the Depends field. The other reason is that you never directly use the tidyverse package, but rather the individual packages that it loads.
Great! Now that we’ve formally established package dependencies in our project, we also need to formally declare which package each function comes from inside our own functions.
7.5 Explicitly state which package a function comes from
One important way of making more robust functions is by stating exactly which package each function we use inside our own function comes from. That makes our function much easier to reuse, less likely to break, and more predictable each time we run it.
Regarding the use of library() and require(), you may think that one way of telling your function what package to use is to include library() or require() inside the function. This is an incorrect way to do it and can often give completely wrong results without giving any error or warning. Sometimes, on some websites and help forums, you may see code that calls library() or require() at the start of a function’s body.
This is very bad practice and can have some unintended and serious consequences without giving any warning or error. We won’t get into the reasons why this is incorrect because it can quickly get quite technical and is out of the scope of this workshop.
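For reference, the pattern being warned about has roughly the shape below. The function name and the choice of the splines package (which ships with R) are made up for illustration. The sketch doesn’t try to cover the reasons why this is bad practice. It only shows one visible side effect: the package stays attached for the rest of the R session after the function has finished.

```r
# Anti-pattern (hypothetical function): attaching a package inside a function.
fit_basis <- function(x) {
  library(splines) # don't do this
  bs(x, df = 3)
}

before <- "package:splines" %in% search()
basis <- fit_basis(1:10)
after <- "package:splines" %in% search()

# `after` is TRUE: calling the function changed the session's search path.
c(before = before, after = after)
```

Writing splines::bs(x, df = 3) inside the function instead would leave the search path untouched.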
The correct way to explicitly use a function from a package is something we’ve already used before with usethis::use_package(): the :: operator! For every function you use inside your own function, aside from functions that come from {base}, use package_name::function_name().
When we use package_name::function_name() for each function in our function, we are explicitly telling R (and us the readers) where the function comes from. This can be important because sometimes the same function name can be used by multiple packages, for example the filter() function. So if you don’t explicitly state which package the function is from, R will use the function that it finds first, which isn’t always the function you wanted to use. We also do this step at the end of making the function because doing it while we create it can be quite tedious.
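To make the name clash concrete without needing any extra packages, the sketch below defines its own filter() function as a stand-in for a package that exports the same name as stats::filter(). The stand-in function is hypothetical. The point is that the bare name picks up whichever definition R finds first, while the :: form always reaches the intended one.

```r
# Stand-in for a second package that also exports a function called filter().
filter <- function(x, ...) {
  "a different filter()"
}

values <- c(1, 2, 3, 4, 5)

# The bare name resolves to the definition R finds first (the one above).
bare_result <- filter(values, rep(1, 3))

# The :: form always uses the moving-sum filter from the stats package.
explicit_result <- stats::filter(values, rep(1, 3))

bare_result
as.numeric(explicit_result)
```

Nothing warns you that the bare call used a different function, which is why stating the package explicitly makes the result more predictable.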
Let’s start doing that with our function. We may not always know which package a function comes from, but we can easily find that out. Let’s start with the first action in our function: read_csv(). In the Console:
Console
?read_csv
This will open the help page for the read_csv() function. If you look at the top left corner, you’ll see the package name in curly brackets {}. This tells you which package the function comes from. In this case, it is readr. So, we can update our function to use readr::read_csv() instead of just read_csv().
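Based on the final version of the function listed at the end of this session, import_cgm() at this stage would look roughly like the sketch below. Only read_csv() has its package stated so far, so library(snakecase) is still needed for the rest of the function to work. The temporary CSV file and its contents are made up so the example can run on its own, and it assumes the readr and snakecase packages are installed.

```r
library(snakecase)

import_cgm <- function(file_path) {
  cgm <- file_path |>
    readr::read_csv(
      show_col_types = FALSE,
      name_repair = to_snake_case,
      n_max = 100
    )
  return(cgm)
}

# Made-up data, written to a temporary file just to try the function out.
example_file <- tempfile(fileext = ".csv")
writeLines(
  c("Device Timestamp,Historic Glucose", "2021-03-18 08:15:00,5.8"),
  example_file
)
example_cgm <- import_cgm(example_file)
names(example_cgm)
```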
There is still more to do, but now it’s your turn to try.
7.6 🧑‍💻 Exercise: Finish setting the dependencies
Time: ~10 minutes.
- While we added readr to the function, we haven’t added it to the DESCRIPTION file yet. In the Console, use usethis::use_package() to add the readr package to the DESCRIPTION file.
- There is one other function we use in the import_cgm() function. Find it, figure out what package it comes from, use :: to explicitly state the package, and add it to the DESCRIPTION file using usethis::use_package() in the Console.
- There is one other package we’ve used that we haven’t added to the DESCRIPTION file. We used the package in the data-raw/dime.R file. Open that file, which you can do with Ctrl-. and typing “dime.R” and selecting the file from the menu. In that file, find the package we used and add it to the DESCRIPTION file using usethis::use_package() in the Console.
- Finally, style the code using the Palette (Ctrl-Shift-P, then type “style file”), render the docs/learning.qmd file with Ctrl-Shift-K or with the Palette (Ctrl-Shift-P, then type “render”) to test that everything still works, and then add and commit the changes to Git with Ctrl-Alt-M or with the Palette (Ctrl-Shift-P, then type “commit”).
7.7 📖 Reading task: Principles to make general-purpose and reusable functions
Briefly reinforce what they read by slowly going through these points below about making generalised functions. Emphasise the principles below, especially the “do one thing” and “keep it small”.
Time: ~3 minutes.
Recall our discussion at the start of this session about making our import_cgm() function more general. There are a few ways we could do it, but first, let’s go over some general principles of making functions that are more general-purpose and reusable. These principles are:
- Have the function’s input (the arguments) be generic enough to take different types of objects as long as they are the same “type”. In our case, we have two functions that both take a path “type”. A path is a very general input, so we can keep it as is.
- Have the function’s output be a common “type”, like a vector or data frame. When working in R, it’s a very good practice to have functions output a data frame, since many functions, especially in the tidyverse, take a data frame as input.
- Make the first argument of the function be something that can be “pipe-able”. That way you can chain together your functions with the |> operator. In this case, always have data as the first argument to work well with piping from tidyverse functions.
- Make your function do one conceptual thing well. For example, read data from a file and make it cleaner. Or convert all columns that are characters into numbers.
- Keep the function small. It is easier to reuse, easier to test, and easier to debug when it has fewer lines of code.
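As a small illustration of these principles, here is a sketch of a function that does one conceptual thing (convert character columns into numbers), takes a data frame as its first argument so it can be piped into, and returns a data frame. The function name and the example data are made up for this illustration, and the code uses only base R.

```r
# Hypothetical helper: convert every character column to numeric.
convert_chr_to_num <- function(data) {
  data[] <- lapply(data, function(column) {
    if (is.character(column)) as.numeric(column) else column
  })
  return(data)
}

# Made-up example data with numbers stored as text.
raw <- data.frame(
  id = c("101", "102"),
  glucose = c("5.8", "5.4"),
  seconds = c(540, 180),
  stringsAsFactors = FALSE
)

# `data` is the first argument, so the function works with the pipe.
clean <- raw |> convert_chr_to_num()
str(clean)
```

Because the function is only a few lines long and has a single job, it is easy to check by eye and easy to reuse on any data frame.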
7.8 Generalising our import function
So, let’s make our import_cgm() function more general-purpose. We know that both the import_cgm() and import_sleep() functions do basically the same thing:
- Have a file path as an argument.
- Read in the CSV file at that path with readr::read_csv().
- Convert column names to snake case with snakecase::to_snake_case().
- Limit the number of rows imported with n_max.
- Quiet the message about the column types with show_col_types = FALSE.
- Output the imported data frame.
So, we can combine the two functions into one function that does all of these things. We could call the function a lot of different names (naming is really hard in coding), but let’s keep it simple and call it import_dime(). We want this function to be able to import different CSV files, for example both data-raw/dime/cgm/101.csv and data-raw/dime/sleep/101.csv.
Let’s generalise the function! Rather than internally say cgm or sleep, we can keep it simple and call it data. Create a new header at the bottom of the docs/learning.qmd file called ## Import DIME data function and create a code chunk with Ctrl-Alt-I or with the Palette (Ctrl-Shift-P, then type “new chunk”). Then we’ll write the new function from scratch:
docs/learning.qmd
import_dime <- function(file_path) {
  data <- file_path |>
    readr::read_csv(
      show_col_types = FALSE,
      name_repair = snakecase::to_snake_case,
      n_max = 100
    )
  return(data)
}
Before testing it out, let’s make the Roxygen documentation for it with Ctrl-Shift-Alt-R or with the Palette (Ctrl-Shift-P, then type “roxygen comment”):
docs/learning.qmd
#' Import data from the DIME study dataset.
#'
#' @param file_path Path to the CSV file.
#'
#' @returns A data frame.
#'
import_dime <- function(file_path) {
  data <- file_path |>
    readr::read_csv(
      show_col_types = FALSE,
      name_repair = snakecase::to_snake_case,
      n_max = 100
    )
  return(data)
}
Below the function, write out these two lines of code to test that it works:
docs/learning.qmd
import_dime(here("data-raw/dime/cgm/101.csv"))
# A tibble: 100 × 2
device_timestamp historic_glucose_mmol_l
<dttm> <dbl>
1 2021-03-18 08:15:00 5.8
2 2021-03-18 08:30:00 5.4
3 2021-03-18 08:45:00 5.1
4 2021-03-18 09:01:00 5.3
5 2021-03-18 09:16:00 5.3
6 2021-03-18 09:31:00 4.9
7 2021-03-18 09:46:00 4.7
8 2021-03-18 10:01:00 4.8
9 2021-03-18 10:16:00 5.5
10 2021-03-18 10:31:00 5.7
# ℹ 90 more rows
import_dime(here("data-raw/dime/sleep/101.csv"))
# A tibble: 100 × 3
date sleep_type seconds
<dttm> <chr> <dbl>
1 2021-05-24 23:03:00 wake 540
2 2021-05-24 23:12:00 light 180
3 2021-05-24 23:15:00 deep 1440
4 2021-05-24 23:39:00 light 240
5 2021-05-24 23:43:00 wake 300
6 2021-05-24 23:48:00 light 120
7 2021-05-24 23:50:00 rem 1350
8 2021-05-25 00:12:30 wake 870
9 2021-05-25 00:27:00 rem 360
10 2021-05-25 00:33:00 light 210
# ℹ 90 more rows
This should work without any problems 🎉 Let’s style the code with the Palette (Ctrl-Shift-P, then type “style file”) and then render the document with Ctrl-Shift-K or with the Palette (Ctrl-Shift-P, then type “render”) to test that everything works. If everything works, let’s add and commit the changes to the Git history using Ctrl-Alt-M or with the Palette (Ctrl-Shift-P, then type “commit”).
7.9 Easily reuse stable, robust functions by storing them in the R/ folder
We’ve now created one general-purpose function that we can use later to import many different types of data files. We’ve made it more robust and have tested it so we can be certain it is fairly stable now. So, let’s move the function into a location where we can re-use it in other Quarto documents (if we had more). Since we already have a file called R/functions.R, we will keep all our stable and tested functions in there.
So, in the docs/learning.qmd file, cut only the function and its Roxygen documentation, open the R/functions.R file with Ctrl-., and then paste it into this file.
The code in the R/functions.R file should now look like this:
R/functions.R
#' Import data from the DIME study dataset.
#'
#' @param file_path Path to the CSV file.
#'
#' @returns A data frame.
#'
import_dime <- function(file_path) {
  data <- file_path |>
    readr::read_csv(
      show_col_types = FALSE,
      name_repair = snakecase::to_snake_case,
      n_max = 100
    )
  return(data)
}
We move the function over into this file for a few reasons:
- To prevent the Quarto document from becoming too long and having too many different functions and code throughout it.
- To make it easier to maintain and find things in your project since you know that all stable, tested functions are in the R/ folder.
- To make use of the source() function to load the functions into any Quarto document you want to use them in.
Once we have cut and pasted it into the R/functions.R file, let’s include source() in the Quarto document. Open the docs/learning.qmd file and go to the top of the file to the setup code chunk. Add the line source(here("R/functions.R")) to the bottom of the code chunk. This will load the functions into the Quarto document when it is rendered. This means that we can use the functions in the R/functions.R file without having the actual code be in the Quarto document.
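To see what source() does in isolation, the sketch below writes a tiny function to a temporary file and then loads it. The file and the double_it() function are made up for the example. In the project, the file would be R/functions.R.

```r
# Create a throwaway script containing one (hypothetical) function.
functions_file <- tempfile(fileext = ".R")
writeLines(
  c(
    "double_it <- function(x) {",
    "  return(x * 2)",
    "}"
  ),
  functions_file
)

# source() runs the code in the file, which by default creates
# double_it() in the global environment of the current R session.
source(functions_file)
doubled <- double_it(21)
doubled
```

The function is now available even though its code was never written in the current script, which is the same idea as sourcing R/functions.R from the setup code chunk.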
Since we’ve also explicitly used the snakecase::to_snake_case() function in the import_dime() function, we don’t need the library(snakecase) line from the setup code chunk. So let’s also remove that.
The setup code chunk should look like this now:
```{r setup}
library(tidyverse)
library(here)
source(here("R/functions.R"))
```
And the bottom of the Quarto document should still have the code:
docs/learning.qmd
import_dime(here("data-raw/dime/cgm/101.csv"))
# A tibble: 100 × 2
device_timestamp historic_glucose_mmol_l
<dttm> <dbl>
1 2021-03-18 08:15:00 5.8
2 2021-03-18 08:30:00 5.4
3 2021-03-18 08:45:00 5.1
4 2021-03-18 09:01:00 5.3
5 2021-03-18 09:16:00 5.3
6 2021-03-18 09:31:00 4.9
7 2021-03-18 09:46:00 4.7
8 2021-03-18 10:01:00 4.8
9 2021-03-18 10:16:00 5.5
10 2021-03-18 10:31:00 5.7
# ℹ 90 more rows
import_dime(here("data-raw/dime/sleep/101.csv"))
# A tibble: 100 × 3
date sleep_type seconds
<dttm> <chr> <dbl>
1 2021-05-24 23:03:00 wake 540
2 2021-05-24 23:12:00 light 180
3 2021-05-24 23:15:00 deep 1440
4 2021-05-24 23:39:00 light 240
5 2021-05-24 23:43:00 wake 300
6 2021-05-24 23:48:00 light 120
7 2021-05-24 23:50:00 rem 1350
8 2021-05-25 00:12:30 wake 870
9 2021-05-25 00:27:00 rem 360
10 2021-05-25 00:33:00 light 210
# ℹ 90 more rows
But the Quarto document should no longer contain the code that creates the import_dime() function.
Let’s test that it works. Render the document with Ctrl-Shift-K or with the Palette (Ctrl-Shift-P, then type “render”) and check that it works. If it does, then we can add and commit the changes to both the docs/learning.qmd and R/functions.R files.
7.10 Key takeaways
- Make it easier to collaborate with yourself in the future and with others by explicitly setting which packages your project depends on. Use usethis::use_package() to set the dependency for you in the DESCRIPTION file.
- Create functions that are more re-usable and easier to test and debug by keeping them small (few lines of code) and having them do one (conceptual) thing at a time. Less is more!
- Make your function more robust by explicitly stating which packages the code you use in your function comes from by using package_name::function_name().
- Keep your stable, robust functions in a separate file for easier re-use across your files, for instance, in the R/functions.R file. You can re-use the functions by using source(here("R/functions.R")) in your Quarto documents.
7.11 💬 Discussion activity: How robust might your code be or code that you’ve read?
Time: ~6 minutes.
As we prepare for the next session and to help reinforce what you’ve learned in this session, get up and walk around with your neighbour and talk about some of these questions.
Part of improving your coding skills is to think about how you can improve your code and the code of others. No one writes perfect code and no one writes great code the first time. Or the second, or the third time. Often code will be refactored multiple times before it is (sufficiently) stable and robust. That is just how coding works.
Being open and receptive to constructive critique and feedback is an essential skill to have, both as a researcher and as someone who writes code. So it’s important to seek out feedback on your own code, to give feedback on others’ code, and to try to improve both.
- Think about code you’ve written or that you’ve read from others (online or colleagues). How robust do you think it was? What are some things you could do to make it more robust?
- Together with your neighbour, discuss some of these things you’ve thought about. Try to find out if you have similar thoughts or ideas on how to improve things.
7.12 Code used in this session
This lists some, but not all, of the code used in the section. Some code is incorporated into Markdown content, so is harder to automatically list here in a code chunk. The code below also includes the code from the exercises.
import_cgm <- function(file_path) {
  cgm <- file_path |>
    readr::read_csv(
      show_col_types = FALSE,
      name_repair = snakecase::to_snake_case,
      n_max = 100
    )
  return(cgm)
}
usethis::use_package("readr")
usethis::use_package("snakecase")
usethis::use_package("fs")
import_dime <- function(file_path) {
  data <- file_path |>
    readr::read_csv(
      show_col_types = FALSE,
      name_repair = snakecase::to_snake_case,
      n_max = 100
    )
  return(data)
}
#' Import data from the DIME study dataset.
#'
#' @param file_path Path to the CSV file.
#'
#' @returns A data frame.
#'
import_dime <- function(file_path) {
  data <- file_path |>
    readr::read_csv(
      show_col_types = FALSE,
      name_repair = snakecase::to_snake_case,
      n_max = 100
    )
  return(data)
}
import_dime(here("data-raw/dime/cgm/101.csv"))
import_dime(here("data-raw/dime/sleep/101.csv"))