17  Making robust and reusable functions

While we made a simple function for some simple code, reusing it to read in data from other files, or using it in other Quarto documents or R scripts, means it needs to be robust and accessible enough to be used in those places. In this session we’ll make our functions more robust, reusable, and accessible.

17.1 Learning objectives

  1. Explain what R package dependency management is and why it is necessary for writing robust code and ensuring reproducibility.
  2. Use tools like usethis::use_package() to manage dependencies in an R project and use :: to explicitly use a specific function from a specific package in any functions you make.
  3. Describe and apply a workflow of prototyping code into a working function in a Quarto document, moving the function into a script (called, e.g., functions.R) once it is robust and tested, and then using source() to allow easy reuse of functions in other R scripts or Quarto documents.

17.2 📖 Reading task: Making your function more robust with explicit dependencies

Time: ~10 minutes.

Our read() function works fine right now, but there’s a problem with it: it is fragile. That’s because it relies on us loading the packages it uses before we can use it. So this is a good time to learn about package dependencies and making your function more robust and trustworthy.

So what is a package dependency and how do you manage it? Whenever you use an R package in your project, you depend on it in order for your code to work. The informal way to “manage” dependencies is by doing what you’ve already done before: using the library() function to load the package into R.

As you read others’ code online or from other researchers, you may notice that the function require() is sometimes used to load packages in the same way as library(). The problem with require() is that if the package can’t be loaded, it doesn’t give an error. Instead, it only gives a warning and returns FALSE, and the rest of the code continues to run. As we’ll cover in this course, this can be a very bad thing because if a package isn’t loaded, it can change the behaviour of some of your code and give you potentially wrong results. On the other hand, library() will give an error if it can’t find the package, which is what you expect if your code depends on a package.
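The difference between the two functions can be sketched in a few lines. This is only a minimal illustration and not part of the workshop code: the package name aPackageThatDoesNotExist is a made-up placeholder that is assumed not to be installed on your computer.

```r
# Hypothetical package name, assumed NOT to be installed.
missing_pkg <- "aPackageThatDoesNotExist"

# require() only warns and returns FALSE, so the script keeps running.
loaded <- suppressWarnings(
  require(missing_pkg, character.only = TRUE, quietly = TRUE)
)
print(loaded)

# library() stops with an error, which we catch here so the example can finish.
outcome <- tryCatch(
  {
    library(missing_pkg, character.only = TRUE)
    "package loaded"
  },
  error = function(e) "library() gave an error"
)
print(outcome)
```

In a real script you would not catch the error: letting library() stop the script is exactly the behaviour you want when a package is missing.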

So, what happens if you come back to the project, or get a new computer, or someone else is working on your project too and they want to make sure they have the packages your project needs? How will they know what packages your project uses? What do they do to get those packages installed? Do they have to search through all your files just to find every library() call you used and then install those packages individually and manually? A much better way is to formally indicate your package dependencies so that installing them is easy! And we do this by making use of the DESCRIPTION file.

The advantage of using the DESCRIPTION file is that it is a standard file used by R projects to store metadata about the project, including which packages are needed to run the project. It also means there are many helper tools available that use this DESCRIPTION file, including tools to install all the packages you need.

So, if you or someone else wants to install all the packages your project depends on, all you or they have to do is go to the Console and type out (you don’t need to do this right now):

Console
pak::pak()

This function looks into the DESCRIPTION file and installs all the packages listed as dependencies there.

Where are these package dependencies listed in the DESCRIPTION file? Open up your DESCRIPTION file, which you can do quickly with Ctrl-., typing the file name out, and hitting enter to open it. Your file may or may not look like the below text. If it doesn’t, that’s ok as the below text is just to give you an idea of what it might look like.

Type: Project
Package: LearnR3
Title: Analysis Project for LearnR3
Version: 0.0.1
Encoding: UTF-8

While we cannot see how package dependencies are defined yet, in the next section we will create an Imports: key and go over how to add packages to this field.

Caution: Sticky/hat up!

When you’re ready to continue, place the sticky/paper hat on your computer to indicate this to the teacher 👒 🎩

17.3 Explicitly state a project’s package dependencies

There are a few ways to add package dependencies to the DESCRIPTION file. The most straightforward way is to manually write the package you need in the Imports: section of the DESCRIPTION file. But there are a few issues with that, mainly that you may not add it correctly. The other, better way to add dependencies is to use the usethis::use_package() function.

Since we’ve used the here package in our code, let’s add it as a dependency. Go to the Console and let’s type out how to add it. Don’t write this code in your Quarto document, since you don’t want to run it every time you render the document.

Console
usethis::use_package("here")

You will see a bunch of text about adding it to Imports. If you look in your DESCRIPTION file now, you’ll see something like:

Type: Project
Package: LearnR3
Title: Analysis Project for LearnR3
Version: 0.0.1
Imports:
    here
Encoding: UTF-8

Since we will also make use of the tidyverse set of packages later in the workshop, we’ll also add tidyverse as a dependency.

Console
usethis::use_package("tidyverse")
Error in `refuse_package()`:
✖ tidyverse is a meta-package and it is rarely a good idea to
  depend on it.
Please determine the specific underlying package(s) that provide the
function(s) you need and depend on that instead.
ℹ For data analysis projects that use a package structure but do not
  implement a formal R package, adding tidyverse to 'Depends' is a
  reasonable compromise.
Call `use_package("tidyverse", type = "depends")` to achieve this.

This gives an error, though. That’s because tidyverse is a meta-package (a large collection of packages), so, as the message states, the recommended way to add this particular dependency is with:

Console
usethis::use_package("tidyverse", type = "Depends")

The usethis::use_package() function still gives us a warning, but it isn’t an error so we can ignore it. If you look in the DESCRIPTION file now, you’ll see that the new Depends field has been added with tidyverse right below it.

Type: Project
Package: LearnR3
Title: Analysis Project for LearnR3
Version: 0.0.1
Depends:
    tidyverse
Imports:
    here
Encoding: UTF-8

There are fairly technical reasons for putting tidyverse in the Depends field that you don’t need to know about for this workshop, aside from the fact that it is common practice in R projects. In this context, we use the Depends field for two reasons. First, usethis::use_package() complains if we try to put tidyverse in Imports and recommends Depends instead. Second, you never directly use the tidyverse package itself, but rather the individual packages that it loads.
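To see how tools can read these fields back out of the file, here is a minimal sketch using only base R. It assumes a DESCRIPTION file like the one shown above, which it writes to a temporary location so that your real file is not touched. It is an illustration of the file format, not how pak or usethis work internally.

```r
# Write a small DESCRIPTION file to a temporary location (mirrors the text above).
desc_path <- tempfile()
writeLines(
  c(
    "Type: Project",
    "Package: LearnR3",
    "Title: Analysis Project for LearnR3",
    "Version: 0.0.1",
    "Depends:",
    "    tidyverse",
    "Imports:",
    "    here",
    "Encoding: UTF-8"
  ),
  desc_path
)

# read.dcf() is the base R reader for this "key: value" file format.
fields <- read.dcf(desc_path, fields = c("Depends", "Imports"))

# Split on commas and trim whitespace to get one package name per element.
dependencies <- trimws(unlist(strsplit(fields[1, ], ",")))
dependencies <- unname(dependencies)
print(dependencies)
```

The printed vector should contain tidyverse and here, which are the packages that anyone setting up the project would need to install.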

Great! Now that we’ve formally established package dependencies in our project, we also need to formally declare which package each function comes from inside our own functions.

17.4 Explicitly state which package a function comes from

Put on the projector the callout block below and explain why we want to make dependencies within functions more explicit and why using library() and require() is a bad idea.

One important way of making functions more robust is to state exactly which package each function used inside our own function comes from. That makes our function much easier to reuse, less likely to break, and more predictable each time we run it.

Important

Regarding the use of library() and require(), you may think that one way of telling your function what package to use is to include library() or require() inside the function. This is an incorrect way to do it and can often give completely wrong results without giving any error or warning. Sometimes, on some websites and help forums, you may see code that looks like this:

add_numbers <- function(num1, num2) {
  library(package_name)
  ...code...
  return(added)
}

Or:

add_numbers <- function(num1, num2) {
  require(package_name)
  ...code...
  return(added)
}

This is very bad practice and can have some unintended and serious consequences without giving any warning or error. We won’t get into the reasons why this is incorrect because it can quickly get quite technical and is out of the scope of this workshop.

The correct way to explicitly use a function from a package is something we’ve already used with usethis::use_package(): the :: operator! For every function inside your own function, aside from functions that come from base R, use package_name::function_name().

When we use package_name::function_name() for each function in our function, we are explicitly telling R (and us, the readers) where the function comes from. This can be important because sometimes the same function name is used by multiple packages, for example the filter() function. So if you don’t explicitly state which package the function is from, R will use the function that it finds first, which isn’t always the function you wanted to use. We also do this step at the end of making the function because doing it while we create it can be quite tedious.
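The name clash can be sketched with base R alone. In this illustration, the filter() function defined at the top is a hypothetical stand-in for a package that provides its own filter(); it is not the real function from any package. The stats::filter() function is real and comes with R.

```r
# Hypothetical stand-in for a package that also provides a `filter()`:
# here it keeps the elements of `x` where `keep` is TRUE.
filter <- function(x, keep) {
  x[keep]
}

values <- c(1, 2, 3, 4)

# Without `::`, R uses the first `filter()` it finds, which is now ours.
found_first <- filter(values, values > 2)
print(found_first)

# With `::`, we always get the moving-average filter from the stats package.
from_stats <- stats::filter(values, rep(1 / 2, 2))
print(class(from_stats))
```

The two calls do completely different things even though the function name is the same, which is why being explicit with :: gives more predictable results.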

Let’s start doing that with our function. We may not always know which package a function comes from, but we can easily find that out. Let’s start with the first action in our function: read_csv(). In the Console:

Console
?read_csv

This will open the help page for the read_csv() function. If you look at the top left corner, you’ll see the package name in curly brackets {}. This tells you which package the function comes from. In this case, it is readr. So, we can update our function to use readr::read_csv() instead of just read_csv():

read <- function(file_path, max_rows = 100) {
  data <- file_path |>
    readr::read_csv(
      show_col_types = FALSE,
      name_repair = to_snake_case,
      n_max = max_rows
    )
  return(data)
}
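Besides the help page, you can also ask R directly which package a function comes from. The sketch below uses median() from base R as a stand-in for read_csv(), so that it runs without any extra packages installed. Note that find() only looks at packages that are currently loaded with library().

```r
# median() stands in for read_csv() so the example runs without extra packages.
# find() reports where on the search path a function name is found.
print(find("median"))

# environmentName() on a function's environment gives the package name itself.
package_name <- environmentName(environment(median))
print(package_name)
```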

There is still more to do, but now it’s your turn to try.

17.5 🧑‍💻 Exercise: Finish setting the dependencies

Time: ~10 minutes.

In this exercise, you will finish setting the dependencies for the read() function as well as the DESCRIPTION file.

  1. While we added readr to the function, we haven’t added it to the DESCRIPTION file yet. In the Console, use usethis::use_package() to add the readr package to the DESCRIPTION file.
  2. There is one other function we use in the read() function. Find it, figure out what package it comes from, use :: to explicitly state the package, and add it to the DESCRIPTION file using usethis::use_package() in the Console.
  3. Finally, style the code using the Palette (Ctrl-Shift-P, then type “style file”), render the docs/learning.qmd file with Ctrl-Shift-K or with the Palette (Ctrl-Shift-P, then type “render”) to test that everything still works, and then add and commit the changes to Git with Ctrl-Alt-M or with the Palette (Ctrl-Shift-P, then type “commit”). While in the interface, push to GitHub.

Caution: Sticky/hat up!

When you’re ready to continue, place the sticky/paper hat on your computer to indicate this to the teacher 👒 🎩

17.6 📖 Reading task: Easily reuse stable, robust functions by storing them in the R/ folder

Briefly explain the workflow a bit more, highlighting the diagram.

Time: ~5 minutes.

While you use Quarto to test out and prototype code, you’ll use R scripts, like R/functions.R, to keep the code that you have already tested and are fairly confident works as intended. This workflow, of creating code and converting it into a function, is called a “function-based workflow”. This is an incredibly common workflow in R projects and forms the basis for many other workflows and tools, such as the ones covered in the advanced workshop.

So you’ll use Quarto (docs/learning.qmd) to write and test out code, convert it into a function (as we cover in this workshop), and then move it into the R/functions.R script. We have this split to create a separation, cognitively and physically, between the prototyping code and the finalized, tested code. Then, within the Quarto document, we can source() the R/functions.R script so we have access to the stable and tested code. We use source() to tell R to look in the file we give it, run all the code in that file, and include the results in our working environment. That way, you can keep code in other locations to make your project more organised. This workflow is represented below in Figure 17.1.

flowchart TD
  quarto[/"Quarto:<br>docs/learning.qmd"/] --> code(Prototyping<br>R code)
  code --> test("Testing that<br>code works")
  test -- "Cut & paste<br>Commit to Git"--> functions[/"R/functions.R"/]
  functions -- "source()" --> quarto
Figure 17.1: Workflow for prototyping code in Quarto, moving it to an R script, then sourcing the script from Quarto.
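The source() step of this workflow can be sketched in a self-contained way. Here a temporary file plays the role of R/functions.R, and add_one() is a hypothetical function made up for this illustration. In the workshop project you would source the real R/functions.R file instead.

```r
# Create a temporary script that plays the role of R/functions.R.
# `add_one()` is a hypothetical function used only for this illustration.
functions_path <- tempfile(fileext = ".R")
writeLines(
  c(
    "add_one <- function(x) {",
    "  x + 1",
    "}"
  ),
  functions_path
)

# source() runs all the code in the file, which defines add_one() for us.
source(functions_path)

result <- add_one(41)
print(result) # returns 42
```

The function is never written out in the script that uses it: it only becomes available because source() ran the file that defines it.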

Caution: Sticky/hat up!

When you’re ready to continue, place the sticky/paper hat on your computer to indicate this to the teacher 👒 🎩

17.7 Cutting and pasting functions into R/functions.R

Really emphasize to cut and paste, so that the function in the docs/learning.qmd file is deleted and no longer kept in the Quarto document.

We’ve now created one general-purpose function that we can use later to import many different types of data files. We’ve made it more robust and have tested it so we can be certain it is fairly stable now. Let’s move the function into a location so that we would be able to re-use it in other Quarto documents (if we had more). Since we already have a file called R/functions.R, we will keep all our stable and tested functions in there.

So, in the docs/learning.qmd file, cut only the function and its Roxygen documentation, open the R/functions.R file with Ctrl-., and then paste it into this file.

The code in the R/functions.R file should now look like this:

R/functions.R
#' Read in one nurses' stress data file.
#'
#' @param file_path Path to the data file.
#' @param max_rows Maximum number of rows to read.
#'
#' @returns Outputs a data frame/tibble.
#'
read <- function(file_path, max_rows = 100) {
  data <- file_path |>
    readr::read_csv(
      show_col_types = FALSE,
      name_repair = snakecase::to_snake_case,
      n_max = max_rows
    )
  return(data)
}

We move the function over into this file for a few reasons:

  1. To prevent the Quarto document from becoming too long and having too many different functions and pieces of code scattered throughout it.
  2. To make it easier to maintain and find things in your project since you know that all stable, tested functions are in the R/ folder.
  3. To make use of the source() function to load the functions into any Quarto document you want to use them in.

Once we have cut and pasted it into the R/functions.R file, let’s include source() in the Quarto document. Open the docs/learning.qmd file and go to the top of the file to the setup code chunk. Add the line source(here("R/functions.R")) to the bottom of the code chunk. This will load the functions into the Quarto document when it is rendered. This means that we can use the functions in the R/functions.R file without having the actual code be in the Quarto document.

While we’re there, let’s remove the library(snakecase) since we’ve explicitly called it in our function with snakecase::to_snake_case(). The setup code chunk should look like this now:

```{r setup}
library(tidyverse)
library(here)
source(here("R/functions.R"))
```

Then, in the code chunks in the Quarto document where you’ve written the read() function (if you didn’t cut and paste it into R/functions.R), let’s stop Quarto from running it by adding eval: false to the code chunk options. That way, you can keep the code in your docs/learning.qmd file for reference, but it won’t be run when you render the document. For those code chunks, it should look like:

```{r}
#| eval: false
# Code ...
```

Let’s test that it works. Render the document with Ctrl-Shift-K or with the Palette (Ctrl-Shift-P, then type “render”) and check that it works. If it does, then we can add and commit the changes to both the docs/learning.qmd and R/functions.R files before pushing to GitHub.

17.8 💬 Discussion activity: How robust is your code, or code that you’ve read?

Time: ~6 minutes.

Before we end for the day, help reinforce what you’ve learned in this session by discussing with your neighbour some of these questions.

Tip

Part of improving your coding skills is to think about how you can improve your code and the code of others. No one writes perfect code and no one writes great code the first time. Or the second, or the third time. Often code will be refactored multiple times before it is (sufficiently) stable and robust. That is just how coding works.

Being open and receptive to constructive critique and feedback is an essential skill, both as a researcher and as a coder. So it’s important to seek out feedback on your own code, to give feedback on others’ code, and to try to improve both.

  1. Think about code you’ve written or that you’ve read from others (online or colleagues). How robust do you think it was? What are some things you could do to make it more robust?
  2. Together with your neighbour, discuss some of these things you’ve thought about. Try to find out if you have similar thoughts or ideas on how to improve things.

17.9 Key takeaways

Quickly cover this and get them to do the survey before moving on to the discussion activity.

  • Make it easier to collaborate with yourself in the future and with others by explicitly setting which packages your project depends on. Use usethis::use_package() to set the dependency for you in the DESCRIPTION file.
  • Create functions that are more re-usable and easier to test and debug by keeping them small (a few lines of code) and having them do one (conceptual) thing at a time. Less is more!
  • Make your function more robust by explicitly stating which packages the code you use in your function comes from by using package_name::function_name().
  • Keep your stable, robust functions in a separate file for easier re-use across your files, for instance, in the R/functions.R file. You can re-use the functions by using source(here("R/functions.R")) in your Quarto documents.

17.10 Code used in session

This lists some, but not all, of the code used in the section. Some code is incorporated into Markdown content, so is harder to automatically list here in a code chunk. The code below also includes the code from the exercises.