4  Introduction to workshop

Warning

🚧 We are making major changes to this workshop, so much of the content will change. 🚧

Introduction slides

The slides contain speaking notes that you can view by pressing ‘S’ on the keyboard.

4.1 📖 Reading task: The big picture

Go over the text again, to reinforce the workflows shown in the diagrams and, briefly, the reasons why we do certain things. Especially reinforce why we have the separation between learning.qmd and functions.R.

Time: ~15 minutes.

This section provides a bigger-picture view of what we will be doing, why we want to do it, how we will go about doing it, and what it will look like in the end. The overall practical aim of this workshop is:

To apply a reproducible and programmatic approach to efficiently downloading a dataset that has multiple data files, then processing and cleaning them all, and saving them as a single data file to use for later analysis.

In our case, during the workshop we want to process the DIME data so that we have a single dataset to work with for later (hypothetical) analyses. This can be visually summarised as:

data-raw/cgm/101.csv through data-raw/cgm/127.csv (data-raw/cgm/*.csv) → R code to process data → Dataframe for analysis

Figure 4.1: Big picture aim of the workshop.
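As a rough illustration of this aim, the sketch below combines several files into one data frame. It makes some loud assumptions: it uses a few small made-up CSV files in a temporary folder rather than the real DIME files, it uses base R rather than the packages the workshop may use later, and the column names `time` and `glucose` and the helper `read_with_id()` are hypothetical.

```r
# Create a temporary stand-in for data-raw/cgm/ with made-up files.
cgm_dir <- file.path(tempdir(), "cgm-sketch")
dir.create(cgm_dir, showWarnings = FALSE)
for (id in c("101", "102", "103")) {
  fake <- data.frame(time = 1:2, glucose = c(5.1, 5.4))
  write.csv(fake, file.path(cgm_dir, paste0(id, ".csv")), row.names = FALSE)
}

# Find every CSV file in the folder.
cgm_files <- list.files(cgm_dir, pattern = "\\.csv$", full.names = TRUE)

# Hypothetical helper: read one file and record which file it came from.
read_with_id <- function(path) {
  data <- read.csv(path)
  data$id <- tools::file_path_sans_ext(basename(path))
  data
}

# Read all files and stack them into a single data frame.
cgm_data <- do.call(rbind, lapply(cgm_files, read_with_id))
```

The point is the shape of the task (list files, read each one the same way, stack the results), not the specific functions used here.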

We also want to make sure that whatever processing we do to the data is reproducible, so we’ll be writing the code in a way that enables and enforces reproducibility of the code.

So how specifically will we do this? At the highest level, everything revolves around files and folders. Below is what the project currently looks like and what it will look like at the end of the workshop. Notice, the only differences here are the addition of a new file in the data/ folder and a new HTML file in the docs/ folder. The new file in the data/ folder is the final dataset we will be creating, and the new HTML file in the docs/ folder is the rendered Quarto document that we will be creating.

Currently looks like:

LearnR3
├── data/
│   └── README.md
├── data-raw/
│   ├── README.md
│   ├── dime.zip
│   ├── dime/
│   │  ├── cgm/
│   │  ├── sleep/
│   │  └── participant_details.csv
│   └── dime.R
├── docs/
│   ├── README.md
│   └── learning.qmd
├── R/
│   ├── functions.R
│   └── README.md
├── .gitignore
├── DESCRIPTION
├── LearnR3.Rproj
├── README.md
└── TODO.md

End of workshop:

LearnR3
├── data/
│   ├── README.md
│   └── dime.csv <- Added
├── data-raw/
│   ├── README.md
│   ├── dime.zip
│   ├── dime/
│   │  ├── cgm/
│   │  ├── sleep/
│   │  └── participant_details.csv
│   └── dime.R
├── docs/
│   ├── README.md
│   ├── learning.html <- Added
│   └── learning.qmd
├── R/
│   ├── functions.R
│   └── README.md
├── .gitignore
├── DESCRIPTION
├── LearnR3.Rproj
├── README.md
└── TODO.md

This structure, particularly the use of the .Rproj file and the DESCRIPTION file (that we will cover in this workshop), is part of a commonly used “project-based workflow”. We use this type of workflow because we want to follow some standard conventions in R, like having a DESCRIPTION metadata file, keeping raw data in the data-raw/ folder, and keeping (most) R scripts in the R/ folder. We also want to keep things structured to make it easier for others and ourselves to reproduce the work. Below are sections for the different workflows we will be using in the workshop.

Download or copy data to data-raw/, only change with R

In general, you want to store the original raw data for your project within a folder in your R Project and never change it. Keeping the original data intact means you can always go back to it if you need to. Since the raw data is rarely used directly for analyses, you’d use R to process the data and save the processed data to a new file that you’d then use for your analysis. This workflow is visually represented below in Figure 4.2.

In R Projects, the common convention is to keep the raw data in the data-raw/ folder, which is what you’ve already done during the pre-workshop tasks. This is where you would download or copy the data files you need for your project.

Original data file(s) → (download or copy) → data-raw/ folder → (read only) → Process data with R, raw data not changed → (save) → data/ folder

Figure 4.2: Workflow for downloading or copying data to the data-raw/ folder and processing it into the data/ folder.
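The workflow in Figure 4.2 can be sketched in a few lines of R. This is only an illustration under stated assumptions: the project folder is a temporary stand-in, the file `example.csv` and its columns `id` and `value` are made up, and dropping rows with missing values stands in for whatever processing a real project needs. The checksum comparison is just one way to confirm that the raw file was left untouched.

```r
# Temporary stand-in for an R Project with data-raw/ and data/ folders.
project <- file.path(tempdir(), "raw-data-sketch")
dir.create(file.path(project, "data-raw"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project, "data"), showWarnings = FALSE)

# Pretend this (made-up) file was downloaded or copied into data-raw/.
raw_path <- file.path(project, "data-raw", "example.csv")
write.csv(
  data.frame(id = c(1, 2, 3), value = c(10, NA, 30)),
  raw_path,
  row.names = FALSE
)
checksum_before <- tools::md5sum(raw_path)

# Read only: load the raw data into R and process it in memory.
raw <- read.csv(raw_path)
processed <- raw[!is.na(raw$value), ]

# Save the processed data to data/, never back to data-raw/.
processed_path <- file.path(project, "data", "example.csv")
write.csv(processed, processed_path, row.names = FALSE)

# Recompute the checksum to confirm the raw file did not change.
checksum_after <- tools::md5sum(raw_path)
```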

Use Quarto as a sandbox to prototype code

Quarto is an incredibly powerful tool when working to ensure your code is reproducible. For this workshop, the Quarto document we use is at docs/learning.qmd.

With Quarto, it’s easy to quickly test code out and make sure it works. Think of the Quarto file as a sandbox to test out and play with code, without fear of messing things up. When you render the document to, for example, HTML, Quarto runs all the code from beginning to end in a new, completely clean environment. This means you can test part of the reproducibility of your code. Because rendering lets you detect errors and issues with the code, Quarto is an excellent prototyping tool. The workflow is represented below in Figure 4.3.

Quarto: docs/learning.qmd (prototyping R code) → (render) → HTML file: docs/learning.html → (check reproducibility) → continue or fix issues in docs/learning.qmd

Figure 4.3: Workflow for using Quarto to prototype code, assess reproducibility, and fix issues.
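To see why running code in a clean environment catches problems, here is a small base R sketch. It is an analogy only, not how Quarto works internally: it evaluates the same code once in an environment where a leftover object exists and once in a fresh environment where it does not. The object names (`threshold`, `values`) are made up for illustration.

```r
# Code as it might appear in a document. It uses `threshold`,
# which the document itself never creates.
document_code <- "values <- c(2, 8, 5); values[values > threshold]"

# In an interactive session, `threshold` may be left over from earlier
# typing, so running the code there works.
interactive_env <- new.env()
assign("threshold", 4, envir = interactive_env)
interactive_result <- eval(parse(text = document_code), envir = interactive_env)

# In a fresh environment where `threshold` was never created,
# the hidden dependency shows up as an error.
clean_env <- new.env()
clean_result <- tryCatch(
  eval(parse(text = document_code), envir = clean_env),
  error = function(e) conditionMessage(e)
)
```

Here `interactive_result` holds the filtered values, while `clean_result` holds an error message about the missing object. Rendering surfaces the same kind of hidden dependency.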

For this workshop, we also use Quarto because it’s easier from a teaching perspective. You as the learner can easily mix together text and notes to yourself as you learn, alongside the code you write out during the code-alongs and the exercises. We also use it to force you to practice working within Quarto documents, which is a key component to reproducibility, in a learning and supportive environment.

Use R/functions.R to keep stable, tested code

While you use Quarto to test out and prototype code, you’ll use R scripts like R/functions.R to keep the code that you have already tested and are fairly confident works as you want it to. This workflow, of creating code and converting it into a function, is called a “function-based workflow”. This is an incredibly common workflow in R projects and forms the basis for many other workflows and tools, such as ones that are covered in the advanced workshop.

So you’ll use Quarto (docs/learning.qmd) to write and test out code, convert it into a function (that we will cover in this workshop), and then move it into the R/functions.R script. We have this split so that prototyping code is both cognitively and physically separate from finalized, tested code. Then, within the Quarto document we can source() the R/functions.R script so we have access to the stable and tested code. This workflow is represented below in Figure 4.4.

Quarto: docs/learning.qmd (prototyping R code, testing that code works) → (cut & paste, commit to Git) → R/functions.R → (source()) → Quarto: docs/learning.qmd

Figure 4.4: Workflow for prototyping code in Quarto, moving it to an R script, then sourcing the script from Quarto.
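The source() step in Figure 4.4 can be sketched as follows. To keep the example self-contained, it writes a stand-in for R/functions.R to a temporary file. In the actual project, the source() call would point at the project's R/functions.R instead. The function `celsius_to_fahrenheit()` is a hypothetical example, not a function from the workshop.

```r
# Temporary stand-in for R/functions.R holding one stable, tested function.
# `celsius_to_fahrenheit()` is a hypothetical example function.
functions_path <- file.path(tempdir(), "functions.R")
writeLines(
  c(
    "celsius_to_fahrenheit <- function(celsius) {",
    "  celsius * 9 / 5 + 32",
    "}"
  ),
  functions_path
)

# In docs/learning.qmd this step would point at the project's
# R/functions.R instead of a temporary file.
source(functions_path)

# The function is now available to the rest of the document.
boiling <- celsius_to_fahrenheit(100)
```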

Workflow for prototyping code

At the level of the code, the way you prototype code is to:

  1. Write it out in Quarto so that it does what you want.
  2. Convert that code into a function.
  3. Test that the function works either in the Quarto document or in the R Console.
  4. Fix the function if it doesn’t work.
  5. Restart the R session with Ctrl-Shift-F10 (or with the Palette: Ctrl-Shift-P, then type “restart”), or render the document with Ctrl-Shift-K (or with the Palette: Ctrl-Shift-P, then type “render”), to test that the function works from a clean state.
  6. Whenever the function works, add and commit the changes to the Git history with Ctrl-Alt-M or with the Palette (Ctrl-Shift-P, then type “commit”).

Write code in Quarto → Convert to function → Test function in Quarto or Console → (as needed) Fix function → either Restart R session or Render Quarto → (commit) Git history

Figure 4.5: Workflow for prototyping code in Quarto, converting to a function, testing it, rendering or restarting, and committing to Git.
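The first three steps of this workflow might look like the sketch below. The data values and the function name `mean_without_missing()` are made up for illustration. The restart, render, and commit steps happen in the editor and so are not shown.

```r
# Step 1: prototype code written directly in the Quarto document.
glucose <- c(5.2, NA, 6.1, 5.7)
mean(glucose, na.rm = TRUE)

# Step 2: the same logic converted into a function.
# `mean_without_missing()` is a hypothetical name.
mean_without_missing <- function(values) {
  mean(values, na.rm = TRUE)
}

# Step 3: test that the function gives the same result as the prototype.
prototype_result <- mean(glucose, na.rm = TRUE)
function_result <- mean_without_missing(glucose)
stopifnot(isTRUE(all.equal(prototype_result, function_result)))
```

Once a check like this passes after a restart or render, the function is a candidate for moving into R/functions.R.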

Restarting R or rendering the Quarto document is the only way to be certain the R workspace is in a clean state. Running your code from a clean state improves the chances that your code and project will be reproducible.

We use Git because it is the best way of keeping track of what was done to your files, when, and why. It helps to keep your work transparent and makes it easier for you to share your code by uploading to GitHub. Using version control should be a standard practice for doing better science, since it fits with the philosophy of doing science (e.g., transparency, reproducibility, and documentation).

Note

While we covered GitHub in the introductory workshop, we can’t assume everyone will have taken that workshop. Because of that, we won’t be using GitHub in this workshop.

Sticky/hat up!

When you’re ready to continue, place the sticky/paper hat on your computer to indicate this to the teacher 👒 🎩

4.2 🧑‍💻 Exercise: What types of workflows do you use?

Time: ~8 minutes.

The process of learning is about taking the content you just learned and trying to integrate it into your own context and situation, as well as talking about it to reinforce it in your memory. So:

  • Take 2 minutes to think about the workflows you use in your own work and how they compare to the ones you’ve read about above and that we all went over briefly. Try to be exact and specific in what you do and how you do it.
  • Then, with your neighbour, take 6 minutes where each of you share what you’ve thought about and discuss it. How do both of your ways of working compare to each other and to what was described above?

Sticky/hat up!

When you’re ready to continue, place the sticky/paper hat on your computer to indicate this to the teacher 👒 🎩