
Appendix C — Extra material

C.1 Functionals and for loops

Functional programming can be a difficult concept to grasp. As an alternative to the explanation above, it can also be explained in terms of what it enables. For example, imagine that you have a vector and would like to check whether each number in it is above 5. This can be done in a variety of programming styles. First, we could create a function that checks whether a given number is above 5, and then manually check each element of the vector:

# Create vector of 10 numbers
numbers <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# Function to check if a number is above 5
over_five <- function(number) {
  if (number > 5) {
    return(TRUE)
  } else {
    return(FALSE)
  }
}

# Check each element
over_five(numbers[1])
#> [1] FALSE
over_five(numbers[2])
#> [1] FALSE
over_five(numbers[3])
#> [1] FALSE
over_five(numbers[4])
#> [1] FALSE
over_five(numbers[5])
#> [1] FALSE
over_five(numbers[6])
#> [1] TRUE
over_five(numbers[7])
#> [1] TRUE
over_five(numbers[8])
#> [1] TRUE
over_five(numbers[9])
#> [1] TRUE
over_five(numbers[10])
#> [1] TRUE

This approach is verbose, not “functional” in style, and very inflexible. Imagine if the vector contained 100 numbers instead of 10: checking each element manually would be unreasonable. Alternatively, we could accomplish the same thing with a for loop:

# Initialize a vector to capture the output in the for loop
output <- vector("logical", length = length(numbers))

# seq_along() gives the positions 1, 2, ..., length(numbers) to loop over
for (item in seq_along(numbers)) {
  output[item] <- over_five(numbers[item])
}
output
#>  [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE

This for loop takes each item from numbers, applies over_five() to it, and saves the result in the matching position of output. So the loop does the same as we did manually, but with much less code and no repetition. It also doesn't matter whether the vector has more or fewer items, since the loop runs over however many elements there are. You may already notice several technical details here, like the use of vector(), seq_along(), and output[item]. We won't explain them here, but they highlight that loops are not easy to use properly.
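As one small illustration of why such details matter, here is a sketch of a common pitfall: looping over 1:length(x) instead of seq_along(x). The empty vector empty_numbers is hypothetical, standing in for, say, a vector where filtering removed every element. With an empty vector, 1:length(x) counts down from 1 to 0, so the loop body runs twice with invalid positions, while seq_along(x) produces no positions at all.

```r
# Hypothetical empty vector, e.g. after filtering removed everything
empty_numbers <- numeric(0)

# 1:length() counts *down* from 1 to 0 when the vector is empty
1:length(empty_numbers)
#> [1] 1 0

# seq_along() returns zero positions, so a loop over it never runs
seq_along(empty_numbers)
#> integer(0)

# Count how many times each loop body actually runs
runs_colon <- 0
for (item in 1:length(empty_numbers)) {
  runs_colon <- runs_colon + 1
}

runs_seq_along <- 0
for (item in seq_along(empty_numbers)) {
  runs_seq_along <- runs_seq_along + 1
}

runs_colon
#> [1] 2
runs_seq_along
#> [1] 0
```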

In some situations, for loops are the right tool. However, they are not R’s strength; functional programming is. In this case, we can replace the for loop with the functional map() from the purrr package (loaded as part of the tidyverse), which makes the code shorter and more robust.

library(tidyverse)
#> ── Attaching core tidyverse packages ──────────── tidyverse 2.0.0 ──
#> ✔ dplyr     1.1.4     ✔ readr     2.1.5
#> ✔ forcats   1.0.0     ✔ stringr   1.5.1
#> ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
#> ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
#> ✔ purrr     1.0.2     
#> ── Conflicts ────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()
#> ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Functional
map(numbers, over_five)
#> [[1]]
#> [1] FALSE
#> 
#> [[2]]
#> [1] FALSE
#> 
#> [[3]]
#> [1] FALSE
#> 
#> [[4]]
#> [1] FALSE
#> 
#> [[5]]
#> [1] FALSE
#> 
#> [[6]]
#> [1] TRUE
#> 
#> [[7]]
#> [1] TRUE
#> 
#> [[8]]
#> [1] TRUE
#> 
#> [[9]]
#> [1] TRUE
#> 
#> [[10]]
#> [1] TRUE
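Note that map() always returns a list, which is why each result above is printed under [[1]], [[2]], and so on. If you want a logical vector like the output of the for loop, purrr also provides typed variants such as map_lgl(). As a side note, comparison operators like > are already vectorized in R, so numbers > 5 gives the same result directly without any iteration. A minimal sketch (numbers and over_five() are repeated from above so it runs on its own):

```r
library(purrr)

# Repeated from above so this sketch runs on its own
numbers <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
over_five <- function(number) {
  if (number > 5) {
    return(TRUE)
  } else {
    return(FALSE)
  }
}

# map_lgl() returns a logical vector instead of a list
map_lgl(numbers, over_five)
#>  [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE

# `>` is vectorized, so this gives the same result without iteration
numbers > 5
#>  [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
```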

This works well as is, but you don't always need to create a named function to use inside map(). Instead, you can use an anonymous function written with \() (or, alternatively, with function() or ~).

# Anonymous function
map(numbers, \(x) if (x > 5) TRUE else FALSE)
#> [[1]]
#> [1] FALSE
#> 
#> [[2]]
#> [1] FALSE
#> 
#> [[3]]
#> [1] FALSE
#> 
#> [[4]]
#> [1] FALSE
#> 
#> [[5]]
#> [1] FALSE
#> 
#> [[6]]
#> [1] TRUE
#> 
#> [[7]]
#> [1] TRUE
#> 
#> [[8]]
#> [1] TRUE
#> 
#> [[9]]
#> [1] TRUE
#> 
#> [[10]]
#> [1] TRUE
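The three anonymous-function styles mentioned above are interchangeable here. The sketch below assumes R 4.1.0 or later, the version that introduced the \(x) shorthand, and uses map_lgl() so the results are compact logical vectors. Since x > 5 already returns TRUE or FALSE, the if/else is not strictly needed. With the purrr formula shorthand ~, the current element is referred to as .x.

```r
library(purrr)

numbers <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# Shorthand lambda syntax (R >= 4.1.0)
with_backslash <- map_lgl(numbers, \(x) x > 5)

# Full function() syntax, works in any R version
with_function <- map_lgl(numbers, function(x) x > 5)

# purrr formula shorthand; the current element is called .x
with_formula <- map_lgl(numbers, ~ .x > 5)

# All three give the same logical vector
identical(with_backslash, with_function)
#> [1] TRUE
identical(with_backslash, with_formula)
#> [1] TRUE
```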

C.2 Debugging functions

Debugging is one of those activities that seems scary and difficult, but once you try it, it is not nearly as intimidating as it first appears. Debugging means finding and fixing problems in your code, and there are several ways to do it. The simplest is to insert the browser() function at the start of your function, re-run the code that defines the function (either by running it manually, by sourcing the file with Ctrl-Shift-S, or through the Command Palette with Ctrl-Shift-P and then typing “source”), and then call the function again.

For instance, say we have a function like this:

test_debugging <- function(number) {
  number + number
}

To start the debugger, insert the browser() function into your function:

test_debugging <- function(number) {
  browser()
  number + number
}

Then re-run the function definition and call the function again, which will open the debugger in RStudio. Unfortunately, we can’t show this on the website since it only works in an interactive RStudio session (we may add a video at some point). When the debugger opens, it will show a few things:

  • A yellow line will highlight the code in the function, along with a green arrow on the left of the line number.
  • The Console will now start with Browse[1]> and will have text like debug at ....
  • There will be new buttons on the top of the Console like “Next”, “Continue”, and “Stop”.
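  • The Environment pane will switch to the function’s own environment, which contains little besides the function’s arguments, and will show a “Traceback” section.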

In this mode, you can investigate what is happening in your code and how to fix it. The way to figure out what’s wrong is to run the code bit by bit and inspect the objects as you go. Because the debugging environment contains only what exists inside the function, it is much easier to isolate where things go wrong.
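If you would rather not edit the function to add browser(), base R’s debug() can flag a function so that the debugger opens every time it is called, until you remove the flag with undebug(); debugonce() does the same for only the next call. While in the debugger, you can also type short commands at the Browse[1]> prompt, such as n (run the next line), c (continue running without stopping), and Q (quit the debugger). A minimal sketch, using test_debugging() from above without the browser() line:

```r
test_debugging <- function(number) {
  number + number
}

# Flag the function; calling it interactively now opens the debugger
debug(test_debugging)
isdebugged(test_debugging)
#> [1] TRUE

# test_debugging(5)  # run this in the Console to step through the code

# Remove the flag so the function runs normally again
undebug(test_debugging)
isdebugged(test_debugging)
#> [1] FALSE
```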

C.3 Non-standard evaluation (NSE)

When writing your own functions that use tidyverse functions, you may eventually encounter an error that is not easy to figure out. Here’s an example where you want to pass your own argument to filter():

test_nse <- function(data, filter_condition) {
  data |>
    dplyr::filter(filter_condition)
}

CO2 |>
  tibble() |>
  test_nse(conc > 100)
#> Error in `dplyr::filter()`:
#> ℹ In argument: `filter_condition`.
#> Caused by error:
#> ! object 'conc' not found

The error occurs because of something called “non-standard evaluation” (or NSE). dplyr::filter() does not evaluate its argument right away; instead, it captures the expression you typed and evaluates it using the columns of the data. Inside test_nse(), the expression it captures is literally filter_condition, not conc > 100. Since filter_condition is not a column in the data, R finds the function’s argument instead and evaluates conc > 100 in the environment the function was called from, where no object called conc exists.

NSE is a feature of R that is used quite a lot throughout R (e.g. library()), but it is especially common in the tidyverse packages. It’s one of the first things computer scientists complain about when they use R, because it is so unusual in other programming languages. But NSE is what allows you to use formulas (e.g. y ~ x + x2 in modeling) or to type select(Gender, BMI) or library(purrr). With “standard evaluation”, these would instead be select("Gender", "BMI") or library("purrr"). So NSE gives flexibility and ease of use (we don’t have to type quotes every time) when doing data analysis, but it can cause headaches when programming in R, like when writing functions. The “Programming with dplyr” article on the dplyr website describes a few options for dealing with NSE when programming with tidyverse packages, the simplest of which is to wrap the argument in {{ }} (called “embracing”).

test_nse <- function(data, filter_condition) {
  data |>
    dplyr::filter({{ filter_condition }})
}

CO2 |>
  tibble() |>
  test_nse(conc > 100)
#> # A tibble: 72 × 5
#>    Plant Type   Treatment   conc uptake
#>    <ord> <fct>  <fct>      <dbl>  <dbl>
#>  1 Qn1   Quebec nonchilled   175   30.4
#>  2 Qn1   Quebec nonchilled   250   34.8
#>  3 Qn1   Quebec nonchilled   350   37.2
#>  4 Qn1   Quebec nonchilled   500   35.3
#>  5 Qn1   Quebec nonchilled   675   39.2
#>  6 Qn1   Quebec nonchilled  1000   39.7
#>  7 Qn2   Quebec nonchilled   175   27.3
#>  8 Qn2   Quebec nonchilled   250   37.1
#>  9 Qn2   Quebec nonchilled   350   41.8
#> 10 Qn2   Quebec nonchilled   500   40.6
#> # ℹ 62 more rows
CO2 |>
  tibble() |>
  test_nse(uptake < 30)
#> # A tibble: 43 × 5
#>    Plant Type   Treatment   conc uptake
#>    <ord> <fct>  <fct>      <dbl>  <dbl>
#>  1 Qn1   Quebec nonchilled    95   16  
#>  2 Qn2   Quebec nonchilled    95   13.6
#>  3 Qn2   Quebec nonchilled   175   27.3
#>  4 Qn3   Quebec nonchilled    95   16.2
#>  5 Qc1   Quebec chilled       95   14.2
#>  6 Qc1   Quebec chilled      175   24.1
#>  7 Qc2   Quebec chilled       95    9.3
#>  8 Qc2   Quebec chilled      175   27.3
#>  9 Qc3   Quebec chilled       95   15.1
#> 10 Qc3   Quebec chilled      175   21  
#> # ℹ 33 more rows
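The same {{ }} approach works for other tidyverse verbs that take column names. The sketch below uses a hypothetical helper, mean_of_column(), that forwards an unquoted column name to summarise(). When the column name instead arrives as a character string (for example, when looping over a vector of names), the dplyr documentation describes the .data pronoun, as in .data[[column_name]], which uses standard evaluation; the second hypothetical helper, filter_above(), shows this. Its call below uses the same condition as test_nse(conc > 100), so it returns the same 72 rows.

```r
library(dplyr)

# Hypothetical helper: {{ }} forwards an unquoted column name to summarise()
mean_of_column <- function(data, column) {
  data |>
    summarise(mean = mean({{ column }}))
}

# Hypothetical helper: .data[[ ]] looks up a column given as a string
filter_above <- function(data, column_name, value) {
  data |>
    filter(.data[[column_name]] > value)
}

co2_tibble <- as_tibble(CO2)

# Unquoted column name (non-standard evaluation)
mean_of_column(co2_tibble, uptake)

# Column name as a string (standard evaluation)
filter_above(co2_tibble, "conc", 100)
```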