Use usethis::use_package() to manage dependencies in an R project and use :: to explicitly use a specific function from a specific package in any functions you make.
Time: ~8 minutes.
One of the more powerful features of making functions is that you can easily reuse them in other sections of the file or in other files, which we will cover how to do later in this session. Part of that power comes from knowing how to make functions that are general enough to be used in other settings too.
From the previous session, we made the import_cgm() function together and in the exercises you made the import_sleep() function. These two functions basically do the same thing. So why have two? We don’t need two! So let’s refactor the import_cgm() function so it is clearer what it does and can do.
The first step is to look at the code to see what it does and see how you can modify it to be more general-purpose.
Try not to look ahead 😜 We won’t generalise the function yet, first we will make it more robust, and then we will generalise it.
Time: ~10 minutes.
Before we make the function more general-purpose, this is a good time to talk about package dependencies and making your function more robust and trustworthy.
So what is a package dependency and how do you manage it? Whenever you use an R package in your project, you depend on it in order for your code to work. The informal way to “manage” dependencies is by doing what you’ve already done before: using the library() function to load the package into R.
As you read others’ code online or from other researchers, you may notice that the require() function is sometimes used to load packages in the same way as library(). The problem with require() is that it doesn’t give an error if the package can’t be loaded. Instead, require() only checks whether the package is available, returns FALSE with a warning if it isn’t, and lets the rest of the code continue running. As we’ll cover in this course, this can be a very bad thing because if a package isn’t loaded, it can change the behaviour of some of your code and give you potentially wrong results. On the other hand, library() will give an error if it can’t find the package, which is what you expect if your code depends on a package.
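A minimal sketch of this difference, using a made-up package name (aPackageThatDoesNotExist is an assumption and is expected not to be installed anywhere):

```r
# A hypothetical package name that should not exist on any system.
missing_pkg <- "aPackageThatDoesNotExist"

# require() returns FALSE and lets the script carry on.
# The warning it would normally print is silenced here to keep the output tidy.
loaded <- suppressWarnings(
  require(missing_pkg, character.only = TRUE, quietly = TRUE)
)
print(loaded)

# library() stops with an error, which we catch here only to inspect it.
result <- tryCatch(
  library(missing_pkg, character.only = TRUE),
  error = function(e) "error"
)
print(result)
```

In a real script you would not wrap library() in tryCatch(); the point of using library() is that the script stops as soon as a needed package is missing.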
So, what happens if you come back to the project, or get a new computer, or someone else is working on your project too and they want to make sure they have the packages your project needs? How will they know what packages your project uses? What do they do to get those packages installed? Do they have to search through all your files just to find every library() call you used and then install those packages individually and manually? A much better way is to formally declare your package dependencies so that installing them is easy! And we do this by making use of the DESCRIPTION file.
The advantage of using the DESCRIPTION file is that it is a standard file used by R projects to store metadata about the project, including which packages are needed to run the project. It also means there are many helper tools available that use this DESCRIPTION file, including tools to install all the packages you need.
So, if you or someone else wants to install all the packages your project depends on, all you or they have to do is go to the Console and type out (you don’t need to do this right now):
Console
pak::pak()
This function looks into the DESCRIPTION file and installs all the packages listed as dependencies there.
Where are these package dependencies listed in the DESCRIPTION file? Open up your DESCRIPTION file, which you can do quickly with Ctrl-., typing the file name out, and hitting enter to open it. Your file may or may not look like the below text. If it doesn’t, it isn’t a problem as the text is just to give you an idea of what it might look like.
Type: Project
Package: LearnR3
Title: Analysis Project for LearnR3
Version: 0.0.1
Encoding: UTF-8
While we cannot yet see how package dependencies are defined, in the next session we will create an Imports: key and go over how to add packages to this field.
When you’re ready to continue, place the sticky/paper hat on your computer to indicate this to the teacher 👒 🎩
There are a few ways to add package dependencies to the DESCRIPTION file. The most straightforward way is to manually write the package you need in the Imports: section of the DESCRIPTION file. But there are a few issues with that, mainly that you may not add it correctly. The other, better way to add dependencies is to use the usethis::use_package() function.
Since we’ve used the here package in our code, let’s add it as a dependency. Go to the Console and let’s type out how to add it. Don’t write this code in your Quarto document, since you don’t want to run it every time you render the document.
Console
usethis::use_package("here")
You will see a bunch of text about adding it to Imports. If you look in your DESCRIPTION file now, you’ll see something like:
Type: Project
Package: LearnR3
Title: Analysis Project for LearnR3
Version: 0.0.1
Imports:
    here
Encoding: UTF-8
Since we will also make use of the tidyverse set of packages later in the workshop, we’ll also add tidyverse as a dependency.
Console
usethis::use_package("tidyverse")
Error in `refuse_package()`:
✖ tidyverse is a meta-package and it is rarely a good idea to
depend on it.
Please determine the specific underlying package(s) that provide the
function(s) you need and depend on that instead.
ℹ For data analysis projects that use a package structure but do not
implement a formal R package, adding tidyverse to 'Depends' is a
reasonable compromise.
Call `use_package("tidyverse", type = "depends")` to achieve this.
This gives an error though. That’s because the tidyverse is a large collection of packages, so as stated by the message, the recommended way to add this particular dependency is with:
Console
usethis::use_package("tidyverse", type = "Depends")
If you look in the DESCRIPTION file now, you see that the new Depends field has been added with tidyverse right below it.
Type: Project
Package: LearnR3
Title: Analysis Project for LearnR3
Version: 0.0.1
Depends:
    tidyverse
Imports:
    here
Encoding: UTF-8
There are fairly technical reasons why we need to put tidyverse in the Depends field that you don’t need to know about for this workshop, aside from the fact that it is a common practice in R projects. At least in this context, we use the Depends field for tidyverse for one big reason: the usethis::use_package() function will complain if we try to put tidyverse in the Imports field and it recommends putting it in the Depends field. The other reason is that you never directly use the tidyverse package, but rather the individual packages that it loads.
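To get a feel for how tools can read these fields, here is a minimal sketch that uses base R’s read.dcf() on a temporary copy of the DESCRIPTION content shown above. It only illustrates the file format; it is not a claim about how pak or usethis work internally.

```r
# Write a temporary file with the same content as the DESCRIPTION example.
description_file <- tempfile()
writeLines(
  c(
    "Type: Project",
    "Package: LearnR3",
    "Title: Analysis Project for LearnR3",
    "Version: 0.0.1",
    "Depends:",
    "    tidyverse",
    "Imports:",
    "    here",
    "Encoding: UTF-8"
  ),
  description_file
)

# read.dcf() parses the "Field: value" format into a character matrix.
dependencies <- read.dcf(description_file, fields = c("Depends", "Imports"))

# Trim any leftover whitespace around the package names.
depends <- trimws(dependencies[1, "Depends"])
imports <- trimws(dependencies[1, "Imports"])
print(c(depends, imports))
```

Because the dependencies live in named fields, a tool only has to read one file instead of searching every script for library() calls.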
Great! Now that we’ve formally established package dependencies in our project, we also need to formally declare which package each function comes from inside our own functions.
One important way of making functions more robust is to state explicitly which package each function we use inside our own function comes from. That makes our function much easier to reuse, less likely to break, and more predictable each time you run it.
Regarding the use of library() and require(), you may think that one way of telling your function what package to use is to include library() or require() inside the function. This is an incorrect way to do it and can often give completely wrong results without giving any error or warning. Sometimes, on some websites and help forums, you may see code that calls library() or require() inside the body of a function.
This is very bad practice and can have some unintended and serious consequences without giving any warning or error. We won’t get into the reasons why this is incorrect because it can quickly get quite technical and is out of the scope of this workshop.
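As a hypothetical illustration of the pattern to avoid, the sketch below shows a function that attaches a package as a side effect, next to a version that uses :: instead. The function names are made up, and the tools package is used only because it ships with R.

```r
# Anti-pattern, shown only so you can recognise it: library() inside a function.
file_extension_bad <- function(path) {
  library(tools) # Side effect: attaches tools for the whole R session.
  file_ext(path)
}

attached_before <- "package:tools" %in% search()
extension <- file_extension_bad("data-raw/dime/cgm/101.csv")
attached_after <- "package:tools" %in% search()

# Calling the function changed the session's search path.
print(c(before = attached_before, after = attached_after))

# The explicit version gives the same answer without touching the search path.
file_extension_good <- function(path) {
  tools::file_ext(path)
}
print(file_extension_good("data-raw/dime/cgm/101.csv"))
```

The first version changes which packages are attached for everything that runs after it, which is one way code elsewhere can start behaving differently without any warning.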
The correct way to explicitly use a function from a package is with something we’ve already used with usethis::use_package(): the :: operator! For every function you use inside your own functions, aside from functions that come from {base}, use package_name::function_name().
When we use package_name::function_name() for each function in our function, we are explicitly telling R (and us the readers) where the function comes from. This can be important because sometimes the same function name can be used by multiple packages, for example the filter() function. So if you don’t explicitly state which package the function is from, R will use the function that it finds first, which isn’t always the function you wanted to use. We also do this step at the end of making the function because doing it while we create it can be quite tedious.
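A small sketch of such a name clash, using only base R. The masking median() defined here is made up for the example; in practice the clash usually comes from two attached packages that share a function name.

```r
# Hypothetical clash: something in the session defines its own median().
median <- function(x) "not the median you wanted"

numbers <- c(1, 5, 9)

# Without ::, R uses whichever median() it finds first.
unqualified <- median(numbers)

# With ::, we always get the function from the stats package.
qualified <- stats::median(numbers)

print(unqualified)
print(qualified)

# Remove the made-up function so it does not linger in the session.
rm(median)
```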
Let’s start doing that with our function. We may not always know which package a function comes from, but we can easily find that out. Let’s start with the first action in our function: read_csv(). In the Console:
Console
?read_csv
This will open the help page for the read_csv() function. If you look at the top left corner, you’ll see the package name in curly brackets {}. This tells you which package the function comes from. In this case, it is readr. So, we can update our function to use readr::read_csv() instead of just read_csv():
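The function after this change, as it appears in this session’s own code listing, is shown below. Defining it works as-is, but calling it assumes that the readr package is installed and that to_snake_case() is available from an attached snakecase package, since that call has not been prefixed yet.

```r
import_cgm <- function(file_path) {
  cgm <- file_path |>
    # read_csv() now states explicitly that it comes from readr.
    readr::read_csv(
      show_col_types = FALSE,
      # Not prefixed yet, so this still relies on snakecase being attached.
      name_repair = to_snake_case,
      n_max = 100
    )
  return(cgm)
}
```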
There is still more to do, but now it’s your turn to try.
Time: ~10 minutes.
In this exercise, you will finish setting the dependencies for the import_cgm() function as well as the DESCRIPTION file.
1. The readr package hasn’t been added to the DESCRIPTION file yet. In the Console, use usethis::use_package() to add the readr package to the DESCRIPTION file.
2. There is one more function from a package inside the import_cgm() function. Find it, figure out what package it comes from, use :: to explicitly state the package, and add it to the DESCRIPTION file using usethis::use_package() in the Console.
3. There is one more package that hasn’t been added to the DESCRIPTION file. We used the package in the data-raw/dime.R file. Open that file, which you can do with Ctrl-. and typing “dime.R” and selecting the file from the menu. In that file, find the package we used and add it to the DESCRIPTION file using usethis::use_package() in the Console.
4. Render the docs/learning.qmd file with Ctrl-Shift-K or with the Palette (Ctrl-Shift-P, then type “render”) to test that everything still works, and then add and commit the changes to Git with Ctrl-Alt-M or with the Palette (Ctrl-Shift-P, then type “commit”). While in the interface, push to GitHub.
When you’re ready to continue, place the sticky/paper hat on your computer to indicate this to the teacher 👒 🎩
Briefly reinforce what they read by slowly going through these points below about making generalised functions. Emphasise the principles below, especially the “do one thing” and “keep it small”.
Time: ~3 minutes.
Recall from our discussion at the start of this session about making our import_cgm() function more general. There are a few ways we could do it, but first, let’s go over some general principles of making functions that are more general-purpose and reusable. These principles include:
- Make the function do one thing.
- Keep the function small.
- Make the function work with the |> operator. In this case, always have data as the first argument to work well with piping from tidyverse functions.
When you’re ready to continue, place the sticky/paper hat on your computer to indicate this to the teacher 👒 🎩
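As a small sketch of the piping principle, the made-up function below does one thing and takes data as its first argument, so it chains with |> (which needs R 4.1 or later). The function name and the use of the built-in mtcars dataset are assumptions for illustration only.

```r
# Hypothetical helper: keeps the first n rows, with data as the first argument.
keep_first_rows <- function(data, n = 5) {
  utils::head(data, n)
}

# Because data comes first, the function slots straight into a pipe.
result <- mtcars |>
  keep_first_rows(n = 3)

print(nrow(result))
```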
So, let’s make our import_cgm() function more general-purpose. We know that both the import_cgm() and import_sleep() functions do basically the same thing:
- Import a CSV file with readr::read_csv().
- Convert the column names to snake case with snakecase::to_snake_case().
- Limit the number of rows that are imported with n_max.
- Hide the message about column types with show_col_types = FALSE.
So, we can combine the two functions into one function that does all of these things. We could call the function a lot of different names (naming is really hard in coding), but let’s keep it simple and call it import_dime(). We want this function to be able to import different CSV files, for example the CGM and sleep data files.
Let’s generalise the function! Rather than internally say cgm or sleep, we can keep it simple and call it data. Create a new header at the bottom of the docs/learning.qmd file called ## Import DIME data function and create a code chunk with Ctrl-Alt-I or with the Palette (Ctrl-Shift-P, then type “new chunk”). Then we’ll write the new function from scratch:
docs/learning.qmd
import_dime <- function(file_path) {
  data <- file_path |>
    readr::read_csv(
      show_col_types = FALSE,
      name_repair = snakecase::to_snake_case,
      n_max = 100
    )
  return(data)
}
Before testing it out, let’s make the Roxygen documentation for it with Ctrl-Shift-Alt-R or with the Palette (Ctrl-Shift-P, then type “roxygen comment”):
docs/learning.qmd
#' Import data from the DIME study dataset.
#'
#' @param file_path Path to the CSV file.
#'
#' @returns A data frame.
#'
import_dime <- function(file_path) {
  data <- file_path |>
    readr::read_csv(
      show_col_types = FALSE,
      name_repair = snakecase::to_snake_case,
      n_max = 100
    )
  return(data)
}
Below the function, write out these two lines of code to test that it works:
docs/learning.qmd
here("data-raw/dime/cgm/101.csv") |>
  import_dime()
# A tibble: 100 × 2
device_timestamp historic_glucose_mmol_l
<dttm> <dbl>
1 2021-03-18 08:15:00 5.8
2 2021-03-18 08:30:00 5.4
3 2021-03-18 08:45:00 5.1
4 2021-03-18 09:01:00 5.3
5 2021-03-18 09:16:00 5.3
6 2021-03-18 09:31:00 4.9
7 2021-03-18 09:46:00 4.7
8 2021-03-18 10:01:00 4.8
9 2021-03-18 10:16:00 5.5
10 2021-03-18 10:31:00 5.7
# ℹ 90 more rows
here("data-raw/dime/sleep/101.csv") |>
  import_dime()
# A tibble: 100 × 3
date sleep_type seconds
<dttm> <chr> <dbl>
1 2021-05-24 23:03:00 wake 540
2 2021-05-24 23:12:00 light 180
3 2021-05-24 23:15:00 deep 1440
4 2021-05-24 23:39:00 light 240
5 2021-05-24 23:43:00 wake 300
6 2021-05-24 23:48:00 light 120
7 2021-05-24 23:50:00 rem 1350
8 2021-05-25 00:12:30 wake 870
9 2021-05-25 00:27:00 rem 360
10 2021-05-25 00:33:00 light 210
# ℹ 90 more rows
This should work without any problems 🎉 Let’s style the code with the Palette (Ctrl-Shift-P, then type “style file”) and then render the document with Ctrl-Shift-K or with the Palette (Ctrl-Shift-P, then type “render”) to test that everything works. If everything works, let’s add and commit the changes to the Git history using Ctrl-Alt-M or with the Palette (Ctrl-Shift-P, then type “commit”). Then, push to GitHub.
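To try the same generalisation idea without the DIME files or extra packages, here is a rough base R sketch. The function import_csv_sketch() and the two tiny CSV files are made up for illustration, and the snake case conversion is a simplified stand-in for snakecase::to_snake_case(), not a replacement for it.

```r
# Hypothetical base R stand-in for import_dime(); not the workshop's function.
import_csv_sketch <- function(file_path, n_max = 100) {
  data <- utils::read.csv(file_path, nrows = n_max, check.names = FALSE)
  # Rough snake case: lower case, with runs of other characters turned into "_".
  names(data) <- gsub("[^a-z0-9]+", "_", tolower(names(data)))
  return(data)
}

# Two small made-up CSV files with different columns.
cgm_file <- tempfile(fileext = ".csv")
writeLines(
  c("Device Timestamp,Historic Glucose", "2021-03-18 08:15,5.8"),
  cgm_file
)
sleep_file <- tempfile(fileext = ".csv")
writeLines(
  c("Date,Sleep Type,Seconds", "2021-05-24 23:03,wake,540"),
  sleep_file
)

# The same function handles both files because nothing inside it is file-specific.
cgm_names <- names(import_csv_sketch(cgm_file))
sleep_names <- names(import_csv_sketch(sleep_file))
print(cgm_names)
print(sleep_names)
```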
R/ folder
Briefly explain the workflow a bit more, highlighting the diagram.
Time: ~5 minutes.
While you use Quarto to test out and prototype code, you’ll use R scripts, like R/functions.R, to keep the code you have already tested and are fairly confident works as intended. This workflow, of creating code and converting it into a function, is called a “function-based workflow”. This is an incredibly common workflow in R projects and forms the basis for many other workflows and tools, such as ones that are covered in the advanced workshop.
So you’ll use Quarto (docs/learning.qmd) to write and test out code, convert it into a function (which we cover in this workshop), and then move it into the R/functions.R script. We have this split to create a separation, cognitively and physically, between the prototyping code and the finalized, tested code. Then, within the Quarto document we can source() the R/functions.R script so we have access to the stable and tested code. We use source() to tell R to look in the file we give it, run all the code in the file, and include the contents into our working environment. That way, you can keep more code in other locations to make your code more organised. This workflow is represented below in Figure 7.1.
flowchart TD
    quarto[/"Quarto:<br>docs/learning.qmd"/] --> code(Prototyping<br>R code)
    code --> test("Testing that<br>code works")
    test -- "Cut & paste<br>Commit to Git" --> functions[/"R/functions.R"/]
    functions -- "source()" --> quarto
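A minimal sketch of what source() does, using a temporary file as a stand-in for R/functions.R. The double_it() function is made up for the example.

```r
# Write a tiny stand-in for R/functions.R into a temporary file.
functions_file <- tempfile(fileext = ".R")
writeLines(
  c(
    "double_it <- function(x) {",
    "  x * 2",
    "}"
  ),
  functions_file
)

# source() runs every line in the file, so double_it() now exists in the session.
source(functions_file)
doubled <- double_it(21)
print(doubled)
```

In the project itself, the file being sourced is the real R/functions.R script rather than a temporary file.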
When you’re ready to continue, place the sticky/paper hat on your computer to indicate this to the teacher 👒 🎩
R/functions.R
Really emphasize to cut and paste, so that the function in the docs/learning.qmd file is deleted and no longer kept in the Quarto document.
We’ve now created one general-purpose function that we can use later to import many different types of data files. We’ve made it more robust and have tested it so we can be certain it is fairly stable now. Let’s move the function into a location so that we would be able to re-use it in other Quarto documents (if we had more). Since we already have a file called R/functions.R, we will keep all our stable and tested functions in there.
So, in the docs/learning.qmd file, cut only the function and its Roxygen documentation, open the R/functions.R file with Ctrl-., and then paste into this file.
The code in the R/functions.R file should now look like this:
R/functions.R
#' Import data from the DIME study dataset.
#'
#' @param file_path Path to the CSV file.
#'
#' @returns A data frame.
#'
import_dime <- function(file_path) {
  data <- file_path |>
    readr::read_csv(
      show_col_types = FALSE,
      name_repair = snakecase::to_snake_case,
      n_max = 100
    )
  return(data)
}
We move the function over into this file for a few reasons:
- To keep all the stable and tested functions together in the R/ folder.
- To be able to use the source() function to load the functions into any Quarto document you want to use them in.
Once we have cut and pasted it into the R/functions.R file, let’s include source() in the Quarto document. Open the docs/learning.qmd file and go to the top of the file to the setup code chunk. Add the line source(here("R/functions.R")) to the bottom of the code chunk. This will load the functions into the Quarto document when it is rendered. This means that we can use the functions in the R/functions.R file without having the actual code be in the Quarto document.
The setup code chunk should now end with the line source(here("R/functions.R")). The bottom of the Quarto document should still have the code that calls import_dime() on the CGM and sleep files, but not the code that creates the import_dime() function.
Let’s test that it works. Render the document with Ctrl-Shift-K or with the Palette (Ctrl-Shift-P, then type “render”) and check that it works. If it does, then we can add and commit the changes to both the docs/learning.qmd and R/functions.R files before pushing to GitHub.
Quickly cover this and get them to do the survey before moving on to the discussion activity.
- Use usethis::use_package() to set the dependency for you in the DESCRIPTION file.
- Explicitly state which package a function comes from with package_name::function_name().
- Keep stable and tested functions in the R/functions.R file. You can re-use the functions by using source(here("R/functions.R")) in your Quarto documents.
Time: ~6 minutes.
Before we end for the day, help reinforce what you’ve learned in this session by discussing with your neighbour some of these questions.
Part of improving your coding skills is to think about how you can improve your code and the code of others. No one writes perfect code and no one writes great code the first time. Or the second, or the third time. Often code will be refactored multiple times before it is (sufficiently) stable and robust. That is just how coding works.
Being open and receptive to constructive critique and feedback is an essential skill to have, both as a researcher and as a coder. So it’s important to seek out feedback and to give feedback on your own and others’ code, and try to improve it.
This lists some, but not all, of the code used in the section. Some code is incorporated into Markdown content, so is harder to automatically list here in a code chunk. The code below also includes the code from the exercises.
pak::pak()
usethis::use_package("here")
usethis::use_package("tidyverse")
usethis::use_package("tidyverse", type = "Depends")
?read_csv
import_cgm <- function(file_path) {
  cgm <- file_path |>
    readr::read_csv(
      show_col_types = FALSE,
      name_repair = to_snake_case,
      n_max = 100
    )
  return(cgm)
}
usethis::use_package("readr")
usethis::use_package("snakecase")
usethis::use_package("fs")
import_cgm <- function(file_path) {
  cgm <- file_path |>
    readr::read_csv(
      show_col_types = FALSE,
      name_repair = snakecase::to_snake_case,
      n_max = 100
    )
  return(cgm)
}
here("data-raw/dime/cgm/101.csv") |>
  import_dime()
here("data-raw/dime/sleep/101.csv") |>
  import_dime()
import_dime <- function(file_path) {
  data <- file_path |>
    readr::read_csv(
      show_col_types = FALSE,
      name_repair = snakecase::to_snake_case,
      n_max = 100
    )
  return(data)
}
#' Import data from the DIME study dataset.
#'
#' @param file_path Path to the CSV file.
#'
#' @returns A data frame.
#'
import_dime <- function(file_path) {
  data <- file_path |>
    readr::read_csv(
      show_col_types = FALSE,
      name_repair = snakecase::to_snake_case,
      n_max = 100
    )
  return(data)
}