Welcome to the Intermediate R3 workshop!

  • ✔️ Pick a group from the basket and go to that table
  • ✔️ Introduce yourself to your table mates
  • ✔️ Open your LearnR3.Rproj RStudio project

Introduce teachers and helpers.

Motivation for this workshop

How exactly do you do data analysis? What’s the workflow? 🤔

When I started out doing research during my Master’s, I always wondered how researchers go about doing data analysis: what exactly was their workflow like? No one ever really taught that. Even online tutorials mostly teach pieces of code or how to use code, but never the bigger picture. How do researchers write R code and do their data analysis in their daily work?

This question is why the overall workflow is the primary focus of this workshop, and partly of the beginner one. I try to focus on the bigger picture and the overall workflow you would follow when doing data analysis.

So, why is it that there isn’t much information on how researchers do data analysis?

🤷😕…Because code sharing is almost non-existent in science

Very few papers provide code (2)

It’s because code sharing basically doesn’t exist in the vast majority of scientific fields. You’ve probably read a methods section in a paper and wondered how exactly they did it.

Code sharing: From the scientific principle of “reproducibility”

…often confused with “replicability” (3) (see also American Statistical Association statement)

Replicability

  • Repeating a study by independently performing another identical study
  • Linked to the “irreproducibility crisis” (aka “irreplicability crisis”)

Reproducibility

  • Generating the exact same results when using the same data and code
  • Question: If we can’t even reproduce a study’s results, how can we expect to replicate it?

How can we check reproducibility if no code is given?

This is a little bit of a rhetorical question 😝

Very low reproducibility in most of science (4)

Even in institutional code and data archives, executability is low! (5)

  • Code taken from Harvard Dataverse Project data repositories
  • Only 25% could be executed without some “cleaning up”
  • After some automatic cleaning, ~50% could execute

Recent large study on general reproducibility of projects that shared code.

Initially, only 25% of the R scripts could be executed (which doesn’t mean the results were reproduced, though). After automatic and some manual code cleaning, about half could be executed. That’s not bad.

Since the scripts were taken from Dataverse.org, the researchers who upload their code and projects to it are probably a bit more aware of and knowledgeable about general reproducibility and coding than the average researcher, so the results are a bit biased.

Scientific culture is not well-prepared for the analytic and computational era

These issues can be fixed by creating and nurturing a culture of openness

All of this is because of a problem with our culture in research. We aren’t open, we don’t really share, and we often don’t follow basic principles of science. To fix this, we need to start creating and nurturing a better and healthier culture. We can all be involved in that; we all have the power to do something, even if it’s a small thing.

Goal of this course: Start changing the culture by providing the training

References

1. Leek JT, Jager LR. Is most published research really false? Annual Review of Statistics and Its Application. 2017 Mar;4(1):109–22.
2. Considine EC, Thomas G, Boulesteix AL, Khashan AS, Kenny LC. Critical review of reporting of the data analysis step in metabolomics. Metabolomics. 2017 Dec;14(1).
3. Plesser HE. Reproducibility vs. Replicability: A brief history of a confused terminology. Frontiers in Neuroinformatics. 2018 Jan;11.
4. Samuel S, Mietchen D. Computational reproducibility of Jupyter notebooks from biomedical publications. GigaScience. 2024;13.
5. Trisovic A, Lau MK, Pasquier T, Crosas M. A large-scale study on research code quality and execution. Scientific Data. 2022 Feb;9(1).