Welcome to the Intermediate R3 workshop!

  • ✔️ Accept the GitHub Organization invite
  • ✔️ Pick a group from the basket and go to that table
  • ✔️ Introduce yourself to your table mates
  • ✔️ Open your LearnR3.Rproj RStudio project

Motivation for this workshop

How exactly do you do data processing and analysis? What’s the actual workflow? 🤔

🤷😕…Because code sharing is almost non-existent in science

Very few papers provide code (Considine et al. 2017)

Code sharing stems from the scientific principle of “reproducibility”

…which is often confused with “replicability” (Plesser 2018; see also the American Statistical Association statement)

Replicability

  • Repeating a study by independently performing another identical study
  • Linked to the “irreproducibility crisis” (aka “irreplicability crisis”)

Reproducibility

  • Generating the exact same results when using the same data and code
  • Question: If we can’t even reproduce a study’s results, how can we expect to replicate it?

How can we check reproducibility if no code is given?

This is a little bit of a rhetorical question 😝

Very low reproducibility across most of science (Trisovic et al. 2022)

Scientific culture is not well prepared for the analytic and computational era

Especially with the rise of “AI” tools

With “AI” tools, generating (or “vibing”) lots of code is easy… but…

Writing code has never been the hard part

Designing, knowing what to do, planning, and making decisions with trade-offs have always been the hard parts.

LLM tools are just tools that predict an output based on an average of their training data

If you aren’t experienced with code, how can you assess the code they generate?

These issues can be fixed by creating and nurturing a culture of openness

Goal of this course: Start changing the culture by providing the training

Workshop overview

Where does this course fit in the research lifecycle?

flowchart LR
  Collection --> Processing:::focus --> Analysis --> Writing --> Publishing
  classDef focus stroke:#f96,stroke-width:4px;

Setup and layout

  • The workshop is a mix of:
    • “Code-alongs” (we type and explain, you type along)
    • Hands-on coding, discussing, and reading exercises
    • Extra hands-on exercises (we’ll move on once everyone has finished the main ones)
    • A team project at the end
  • Less about coding, more about:
    • Connecting and collaborating with others
    • Thinking about designing first, coding second
  • Networking and connecting:
    • Draw a table name from the bowl each morning
    • Introduce yourself to those at your table
    • During lunch, sit beside someone you don’t know
    • Discussion activities
  • Feedback survey for every session

Getting or asking for help 🙋‍♀️

  • Put the origami hat 🎩 on your laptop to get help
  • There are lots of helpers
  • Table mates, help each other out too
  • We’re all learning here!
  • This is a supportive and safe environment
  • Remember our Code of Conduct

Practice using origami hats: Have you opened RStudio to LearnR3.Rproj?

Activities

🚶‍♀️🚶‍♂️ How do you perceive your skill in R?

🚶‍♀️🚶‍♂️ What percent of time do you spend doing data processing (if you do any)?

🚶‍♀️🚶‍♂️ Do you wish you did more or less data processing?

Alright, return to your seats! 🪑

References

Considine, E. C., G. Thomas, A. L. Boulesteix, A. S. Khashan, and L. C. Kenny. 2017. “Critical Review of Reporting of the Data Analysis Step in Metabolomics.” Metabolomics 14 (1). https://doi.org/10.1007/s11306-017-1299-3.
Leek, Jeffrey T., and Leah R. Jager. 2017. “Is Most Published Research Really False?” Annual Review of Statistics and Its Application 4 (1): 109–22. https://doi.org/10.1146/annurev-statistics-060116-054104.
Plesser, Hans E. 2018. “Reproducibility Vs. Replicability: A Brief History of a Confused Terminology.” Frontiers in Neuroinformatics 11 (January). https://doi.org/10.3389/fninf.2017.00076.
Samuel, Sheeba, and Daniel Mietchen. 2024. “Computational Reproducibility of Jupyter Notebooks from Biomedical Publications.” GigaScience 13. https://doi.org/10.1093/gigascience/giad113.
Trisovic, Ana, Matthew K. Lau, Thomas Pasquier, and Mercè Crosas. 2022. “A Large-Scale Study on Research Code Quality and Execution.” Scientific Data 9 (1). https://doi.org/10.1038/s41597-022-01143-6.