Welcome to the Intermediate R3 course!

  • ✔️ Pick a group from the basket and go to that table
  • ✔️ Introduce yourself
  • ✔️ Open your LearnR3.Rproj RStudio project
  • ✔️ Check that your data-raw/ folder looks like the one on the right
data-raw/
├── README.md
├── mmash
│   ├── user_1
│   ├── user_10
│   ├── user_11
│   ├── ...
│   ├── user_7
│   ├── user_8
│   └── user_9
├── mmash-data.zip
└── mmash.R

Motivation for this course

How exactly do you do data analysis? What’s the workflow? 🤔

🤷😕…Because code sharing is almost non-existent in science

Very few papers provide code (2)

Code sharing: From the scientific principle of “reproducibility”

…often confused with “replicability” (3) (see also the American Statistical Association statement)

Replicability

  • Repeating a study by independently performing another identical study
  • Linked to the “irreproducibility crisis” (aka “irreplicability crisis”)

Reproducibility

  • Generating the exact same results when using the same data and code
  • Question: If we can’t even reproduce a study’s results, how can we expect to replicate it?

These issues can be fixed by creating and nurturing a culture of openness

Goal of this course? Start changing the culture by providing the training

Course details

Setup and layout

  • Course is a mix of:
    • “Code-alongs” (we type and explain, you type along)
    • Hands-on coding, discussing, and reading exercises
    • Dedicated practice time (potentially on your own data)
  • All material is online (and openly licensed)
  • Resources Appendix
    • Material for further learning
  • Reading tasks are “callout” blocks marked by the blue line on the left side of the text
  • The listed schedule is a guide only; some sessions run longer, others shorter
  • Less about coding, more about connecting with others
    • During lunch, try to sit beside someone you don’t know
    • Several networking activities (usually after lunch)
  • Feedback is collected at the end of every day; it is so helpful and important for improving the course 😁

Getting or asking for help 🙋‍♀️🙋‍♂️

  • Put the sticky on your laptop to get help
    • There are lots of helpers
    • Team members, help out too
  • We’re all learning here!

Practice using stickies: Does your data-raw/ look like this?

data-raw/
├── README.md
├── mmash
│   ├── user_1
│   ├── user_10
│   ├── user_11
│   ├── ...
│   ├── user_7
│   ├── user_8
│   └── user_9
├── mmash-data.zip
└── mmash.R

Activities

🚶🚶‍♀️ How do you perceive your skill in R?

🚶🚶‍♀️ Who has had formal training in coding, either specifically in R or in general?

🚶‍♀️🚶 Who has struggled with using R?

🙋 Who has seen or worked with “true” data analysis pipelines (e.g. run a single command and everything gets re-done)?

💬 For those who have done or are doing data processing, does it take up a lot of time? How much?

Return to your seats! 🪑

References

1. Leek JT, Jager LR. Is most published research really false? Annual Review of Statistics and Its Application. 2017 Mar;4(1):109–22.
2. Considine EC, Thomas G, Boulesteix AL, Khashan AS, Kenny LC. Critical review of reporting of the data analysis step in metabolomics. Metabolomics. 2017 Dec;14(1).
3. Plesser HE. Reproducibility vs. Replicability: A brief history of a confused terminology. Frontiers in Neuroinformatics. 2018 Jan;11.