10  Create an R Project

One of the first, basic steps toward a reproducible and modern data analysis workflow is to keep everything self-contained in a single location. In RStudio, this is done with R Projects. Read all of this reading task from the introduction workshop to learn about R Projects and how they help keep things self-contained. You don’t need to do any of the exercises or activities.

Caution

Depending on your institution’s policies or procedures, you may have some type of file synching software like OneDrive or Dropbox installed on your computer, and your IT department may have set it up so that (nearly) all your files and folders are synched through this software. Sometimes IT has set it up so that it is fairly straightforward to not use it, but often it can be extremely difficult to find a folder that isn’t being synched. This is more common for Windows users than for macOS users, though macOS users sometimes have these programs installed and set up for them too.

Unfortunately, these types of software can cause very subtle, small issues when working with Git, R, and RStudio that are very difficult to troubleshoot, for both you and for us as the teachers. They cause these issues because they are constantly working to synchronize files and folders. This can sometimes cause conflicts with files when using Git, and especially when doing more data-intensive work that often creates and deletes many files. You may also notice that your computer is slower when working with R and RStudio in these synched folders. That isn’t the fault of R or RStudio; it is caused by the constant synching of files.

A related issue arises when you store your files on a shared drive, like the H:, E:, or U: drives on Windows. Your institution may have a procedure for using these drives to “more easily share with colleagues” or to keep everything on their servers. But these shared drives have a very similar issue to Dropbox or OneDrive, because they also have to continually upload and download over the internet between your computer and your work’s remote server. Any time your computer needs to regularly communicate over the internet, like when using files stored on a server in the shared drives, things will be slower. So, you may feel like RStudio is slow, but it is actually because of the way your computer has been set up by your IT department.

So in general, we strongly recommend not creating your R Projects in these synched folders or on these shared drives. We’ve learned that many common problems we encounter during the workshops are caused by these synched folders or shared drives. That’s why we will be getting you to create your R Project on the Desktop of your computer (not on a shared drive). Usually, but not always, this folder is not tracked by these synching programs.

So, find the path to your Desktop folder, which is C:\Users\yourusername\Desktop\ for Windows users and /Users/yourusername/Desktop/ for macOS users, and use that as the location for your R Project.
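If you are unsure whether your Desktop is at one of these locations, or whether it is being synched, a rough check from the R Console can help. This is only a minimal sketch: the helper name `looks_synched` is hypothetical, and matching the words "onedrive" or "dropbox" in the path is an assumption that will not catch every synching setup.

```r
# Hypothetical helper: flag paths that mention a common synching program.
looks_synched <- function(path) {
  grepl("onedrive|dropbox", path, ignore.case = TRUE)
}

# On Windows, base R may expand `~` to the Documents folder, so build the
# path from the user profile folder there instead.
home <- if (.Platform$OS.type == "windows") {
  Sys.getenv("USERPROFILE")
} else {
  path.expand("~")
}
desktop <- normalizePath(file.path(home, "Desktop"), mustWork = FALSE)

# FALSE can mean the Desktop was moved elsewhere, for example into OneDrive.
dir.exists(desktop)

# TRUE suggests the folder sits inside a synched location.
looks_synched(desktop)
```

If the first check returns FALSE or the second returns TRUE, it may be worth asking your IT department where a local, non-synched folder can be found.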

There are several ways to organise a project folder. You’ll be using the structure from the package prodigenr. The project setup can be done by either:

  1. Using RStudio’s New Project menu item: “File -> New Project -> New Directory”, scroll down to “Scientific Analysis Project using prodigenr” and name the project “LearnR3” in the Directory Name, saving it to the “Desktop” as described above with Browse. When you use this method, the new RStudio Project will automatically open up for you.
  2. Or, running the function prodigenr::setup_project("~/Desktop/LearnR3") in the R Console. The ~ is a shortcut for your home directory, so this function will create a new folder called LearnR3 on your Desktop. When you use this method, you will need to use your file manager to navigate to the Desktop folder and open the LearnR3.Rproj file inside of the newly created LearnR3/ folder.

After opening the new RStudio Project, run this next function in the R Console to create a new R script file called functions.R in the R/ folder. This is where you will be moving all the functions that you will create during the workshop.

Console
usethis::use_r("functions", open = FALSE)

Here you use the usethis package to help set things up. usethis is an extremely useful package for managing R Projects, and we highly recommend exploring it further to see how you can use it in your own work.
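To confirm that the script was created where you expect, you can check for it from the Console. This is a small sketch that assumes your working directory is the root of the LearnR3 project (which is the case when the project is open in RStudio); the helper name `has_project_file` is hypothetical.

```r
# Hypothetical helper: check for a file relative to a project root,
# which defaults to the current working directory.
has_project_file <- function(..., root = getwd()) {
  file.exists(file.path(root, ...))
}

# Should return TRUE once the R/functions.R file exists.
has_project_file("R", "functions.R")
```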

10.1 Quarto

We teach and use Quarto (which is a more powerful, next generation version of R Markdown) because it is one of the very first steps to being reproducible and because it is a very powerful tool for doing data analysis. If you haven’t used Quarto before or would like a refresher, please read over the Quarto section of the introduction workshop.

If you are not already in the LearnR3 project, open it up by either clicking the LearnR3.Rproj file in the LearnR3/ folder or by using the “File -> Open Project” menu. Then, run the two functions below in the Console when RStudio is in the LearnR3 project, which will create two new files called learning.qmd and cleaning.qmd in the docs/ folder. The learning.qmd file is where you will be building up your code, prototyping functions, writing notes to yourself, and playing around. The cleaning.qmd file is where you will be moving code that works and using the functions you’ll make during the workshop, to ultimately make a final working dataset.
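The workshop's own function calls are not reproduced in this text. As a stand-in only, here is a minimal base R sketch that produces the same two files. It assumes your working directory is the root of the LearnR3 project; the helper name `create_qmd` and the minimal YAML header it writes are hypothetical and are not the workshop's method.

```r
# Hypothetical helper (not from the workshop): create a Quarto document
# with a minimal YAML header, without overwriting an existing file.
create_qmd <- function(name, dir = "docs") {
  # Create the folder if it does not exist yet.
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  path <- file.path(dir, paste0(name, ".qmd"))
  if (!file.exists(path)) {
    writeLines(c("---", paste0("title: \"", name, "\""), "---", ""), path)
  }
  invisible(path)
}

create_qmd("learning")
create_qmd("cleaning")
```

Because the helper skips files that already exist, running it a second time will not erase anything you have written in either document.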

Throughout the workshop, you will use the docs/learning.qmd document to build up and run your code, as well as to prototype the functions you’ll create before you move them into the R/functions.R file. Ultimately, you will create the main file for processing and cleaning the dataset in the docs/cleaning.qmd document.