If you find any typos, errors, or places where the text may be improved, please let us know by providing feedback either in the feedback survey (given during class) or by using GitHub.
On GitHub, open an issue or submit a pull request by clicking the "Edit this page" link at the side of this page.
Intermediate Reproducible Research in R
An intermediate workshop on modern approaches and workflows to processing data
Welcome!
Reproducibility and open scientific practices are increasingly demanded of, and needed by, scientists and researchers in modern research environments. As our tools for generating data become more sophisticated and powerful, we also need more sophisticated and powerful tools for processing that data. Yet training on how to use these tools and build modern data analysis skills is lacking for researchers, even though this work is highly time-consuming and technical. Because the need for these skills often goes unrecognized, how exactly data is processed is described poorly, if at all, in scientific studies. This hidden aspect of research could have major impacts on the reproducibility of studies. This course was created specifically to start addressing these types of problems.
The course is designed as a series of participatory live-coding lessons, where the instructor and learners code together, interspersed with hands-on exercises and group work using real-world datasets. This website contains all of the material for the course, from reading material to exercises to images. It is structured as a book, with “chapters” as lessons, given in order of appearance. We make heavy use of the website throughout the course: the code-along sessions follow the material on the website nearly exactly (with slight modifications for time or for more detailed explanations).
The course material was created using Quarto to write the lessons and create the book format, GitHub to host the Git repository of the material, and GitHub Actions with Netlify to build and host the website. The original source material for this course is found on the r-cubed-intermediate GitHub repository.
Want to contribute to this course? Check out the README file as well as the CONTRIBUTING file on the GitHub repository for more details. The main way to contribute is by creating a new issue on GitHub to comment and give feedback on the material.
Target audiences
This website and its content are targeted to three groups:
- For the learners to use during the course, both to follow along in case they get lost and as a reference after the course ends. The learner is someone who is currently doing research or will soon be (e.g. a PhD student or postdoc), who is likely in biomedical research, and who has little or no knowledge of coding in R. A more detailed description of who the learner is can be found in section 1.1, “Is this course for you?”
- For the instructors to use as a guide for when they do the code-along sessions and lectures.
- For those who are interested in teaching, who may not have much experience or may not know where to start, to use this website as a guide to running and instructing their own workshops.
Re-use and licensing
The course is licensed under the Creative Commons Attribution 4.0 International License so the material can be used, re-used, and modified, as long as there is attribution to this source.
Acknowledgements
The course material draws inspiration from these excellent resources:
- R for Data Science
- Advanced R
- R Packages
- UofTCoders Reproducible Quantitative Methods for EEB
- Software and Data Carpentry workshop material
The Danish Diabetes and Endocrinology Academy hosted, organized, and sponsored this course. A huge thanks to them for their involvement, support, and sponsorship! Steno Diabetes Center Aarhus and Aarhus University employ Luke, who is the lead instructor and curriculum developer.