Want to help out or contribute?

If you find any typos, errors, or places where the text may be improved, please let us know by providing feedback either in the feedback survey (given during class), by using GitLab, or directly in this document with hypothes.is annotations.

  • Open an issue or submit a merge request on GitLab.
  • Add an annotation using hypothes.is. To add an annotation, select some text and then click the icon on the pop-up menu. To see the annotations of others, click the icon in the upper right-hand corner of the page.

2 Course syllabus

Reproducibility and open scientific practices are increasingly demanded of, and needed by, scientists and researchers in our modern research environments. We increasingly produce larger and more complex datasets that often need to be heavily cleaned, reorganized, and processed before they can be analyzed. This data processing often consumes the majority of the time spent coding and doing data analysis. Yet even though this stage of data analysis is so time-consuming, little to no training or support is provided for it. This has led to minimal attention, scrutiny, and rigour in describing, detailing, and reviewing these procedures in studies, and contributes to the systemic lack of code sharing among researchers. Altogether, this aspect of research is often completely hidden and is likely to be the source of many irreproducible results.

With this course, we aim to begin addressing this gap. Using a highly practical approach that revolves around code-along sessions (instructor and learner coding together), hands-on exercises, and group work, participants of the course will be able to:

  1. Learn and demonstrate what an open and reproducible data analysis workflow looks like.
  2. Learn and apply the fundamental tools and skills for conducting a reproducible and modern analysis for a research project.
  3. Apply programming techniques to process and manage data in a reproducible and well-documented way.
  4. Learn where to go to get help and to continue learning modern data analysis skills.

The course will enable participants to answer questions such as:

  • What does a modern data analysis setup and workflow look like?
  • How can I ensure that my data analysis project is reproducible?
  • How can I create pipelines that get, process, and clean my data quickly and that work regardless of whether there is one data file or hundreds of data files (i.e. that scale well)?
  • How can I write code that is more reproducible, more readable, and easy to re-use, both by my future self and by my collaborators and colleagues?

By the end of the course, participants will: have improved their competency in processing and wrangling datasets; have improved their proficiency in using the R statistical computing language; know how to write re-usable and well-documented code; and know how to make modern and reproducible data analysis projects.

2.1 Is this course for you?

This course is designed in a specific way and is ideal for you if:

  • You are a researcher, preferably working in the biomedical field (ranging from experimental to epidemiological). Specifically, this course targets those working in diabetes and metabolism.
  • You currently do, or will soon do, some quantitative data analysis.
  • You either:
    • have taken the introduction to R course offered in June by the DDA, since this course is a natural extension to that one;
    • know a little to a moderate amount of R (or computing in general);
    • know how to use R and have some familiarity with the tidyverse and RStudio.

Considering that this is a natural extension of the introduction to R course, I will be incorporating tools learned in that course, including basic Git usage as well as use of RStudio R projects. If you do not have familiarity with these tools, you will need to go over the material from the introduction course beforehand (more details about pre-course tasks will be sent out a couple of weeks before the course).

While I make these assumptions to help focus the content of the course, if you have an interest in learning R but don’t fit any of the above assumptions, you are still welcome to attend the course! We welcome everyone until the course capacity is reached.

During the course, we will:

  • learn how to use R, specifically at the mid-beginner to early-intermediate level
  • focus only on the data processing and cleaning stage of a data analysis project
  • teach from a reproducible research and open scientific perspective (e.g. by making use of Git)
  • use practical, applied, and hands-on lessons and exercises

And we will not learn:

  • the basics of using R and RStudio
  • statistics (this is already covered by most university curricula)

2.2 Schedule

The workshop is structured as a series of participatory live-coding sessions (instructor and learner coding together) interspersed with hands-on exercises and group work, using either a practice dataset or some other real-world dataset. A few lectures are also given, mainly at the start and end of the workshop. The general schedule outline is shown in the table below.

Table 2.1: Overall course schedule for the 2 days.

Date and time  Session topic

Day 1
9:30           Arrival; coffee and snacks
10:00          Introduction to the course
11:00          Importing data, fast! (with short break)
12:30          Lunch
13:30          Save time, don’t repeat yourself
14:45          Coffee break and snacks
15:00          Save time, don’t repeat yourself (with short break)
17:30          End of day survey
TBA            Dinner together

Day 2
8:30           Processing datasets for cleaning
10:00          Coffee break and snacks
10:15          Processing datasets for cleaning (with short break)
12:00          Lunch
13:00          Workflow to analyzing your tidy data
14:45          Coffee break and snacks
15:00          Workflow to analyzing your tidy data (with short break)
16:15          What next? Applying open and reproducible practices in real-life
16:30          Closing remarks and end of day survey
16:45          Farewell

2.3 Location

The course will take place at Aarhus University in two different buildings:

  • Day 1, Sept. 8th: Building 1264, room 209

  • Day 2, Sept. 9th: Building 1231, room 228