Appendix A — Extra material

Warning

🚧 We are doing major changes to this workshop, so much of the content will be changed. 🚧

A.1 Debugging functions

Debugging is one of activities that seems really scary and difficult, but once you try it and use it, is not nearly as intimidating as it seemed. To debug, which means to find and fix problems in your code, there are several ways, the simplest of which is by inserting the browser() function into the the start of your function, re-running the function by either manually running it or using source with Ctrl-Shift-S or with the Palette (Ctrl-Shift-P, then type “source”), and using it again.

For instance, we have a function like this:

test_debugging <- function(number) {
  new_number <- number + number
  return(new_number)
}
test_debugging(5)

To start the debugger, insert the browser() function into your function:

test_debugging <- function(number) {
  browser()
  new_number <- number + number
  return(new_number)
}
test_debugging(5)

And re-run and use the function again, which will pop up a new debugging panel in RStudio. Sadly, we can’t show this on the website since it only works in RStudio. When you are in RStudio, the debugger will open up and it will show a few things:

A yellow line will highlight the code in the function, along with a green arrow on the left of the line number.
The Console will now start with Browse[1]> and will have text like debug at ....
There will be new buttons on the top of the Console like “Next”, “Continue”, and “Stop”.
The Environment pane will be empty and will say “Traceback”.

In this mode you can really investigate what is happening with your code and how to fix it. The way to figure out what’s wrong is by running the code bit by bit. This debug environment is empty except for the actions that occur within it, so it really can help figure things out.

A.2 Non-standard evaluation (NSE)

Writing your own functions that use tidyverse functions, you may eventually encounter an error that might not be very easy to figure out. Here’s an example where you want to use your own arguments in filter().

test_nse <- function(data, filter_condition) {
  data |>
    dplyr::filter(filter_condition)
}

CO2 |>
  tibble() |>
  test_nse(conc > 100)

The error occurs of something called “non-standard evaluation” (or NSE). NSE is a feature of R and is used quite a lot throughout R (e.g. library()), but is especially used in the tidyverse packages. It’s one of the first things programmers who use other languages complain about when they use R, because it is such a foreign thing in other programming languages.

But NSE is what allows you to use formulas (e.g. y ~ x + x2 in modeling) or allows you to type out select(Gender, BMI) or library(purrr). In “standard evaluation”, these would instead be select("Gender", "BMI") or library("purrr"). So NSE gives flexibility and ease of use for the user (we don’t have to type quotes every time) when doing data analysis, but can give some headaches when programming in R, like when making functions. There’s more detail about this on the dplyr website, which will give a few options to deal with NSE while programming with tidyverse packages, the simplest of which is to wrap the argument with {{}}.

test_nse <- function(data, filter_condition) {
  data |>
    dplyr::filter({{ filter_condition }})
}

CO2 |>
  tibble() |>
  test_nse(conc > 100)

CO2 |>
  tibble() |>
  test_nse(uptake < 30)

A.3 Functionals and for loops

The concept of functional programming can be difficult to grasp. As an alternative to the above explanation, one could also explain functional programming in relation to what it enables. For example, imagine that you have a vector and would like to investigate whether any given number inside this vector is above 5. This can be accomplished using a variety of programming styles. First, we could create a function to check whether a given number is above 5, and manually check each element of the vector:

# Create vector of 10 numbers
numbers <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# Function to check if a number is above 5
over_five <- function(number) {
  if (number > 5) {
    return(TRUE)
  } else {
    return(FALSE)
  }
}

# Check each element
over_five(numbers[1])
over_five(numbers[2])
over_five(numbers[3])
over_five(numbers[4])
over_five(numbers[5])
over_five(numbers[6])
over_five(numbers[7])
over_five(numbers[8])
over_five(numbers[9])
over_five(numbers[10])

Such an approach is verbose, not “functional” in style, and very inflexible. Imagine if the vector had contained 100 numbers instead of 10. In such a case, it would have been unreasonable to manually check each element. As an alternative we could accomplish the same by using a for loop:

# Initialize a vector to capture the output in the for loop
output <- vector("logical", length = length(numbers))

# Use seq_along to get for loop to start from 1 and end at the end of numbers
for (item in seq_along(numbers)) {
  output[item] <- over_five(numbers[item])
}
output

This for loop is taking each item from numbers, using over_five() on it, and then saving the output into the object output. So this loop allows us to do the same as we did manually, but more precisely and with less code. If the vector has more items or less, it doesn’t matter to the for loop since it will run no matter the length. You might notice already that there are several technical things going on, like the use of vector(), seq_along(), and output[item]. We won’t explain them here, but this is meant to highlight that loops aren’t easy to actually use properly.

For some situations, for loops are the perfect solution. However, they are not R’s strength, but rather functional programming is. In this case, we replace the for loop with the functional map(), which would make the code shorter and more robust.

library(tidyverse)
# Functional
map(numbers, over_five)

This works great as is, but sometimes you might not need to create a new function to use inside of map(). Instead we could use an anonymous function with \() (or either function() or ~).

# Anonymous function
map(numbers, \(x) if (x > 5) TRUE else FALSE)