What next? Reproducibility in research

First thing: True reproducibility is basically impossible

Better: “reproducibility at dissemination / archiving” or “inspectability” (1)

Ask if learners can put their laptop lids down, so you have their full attention.

True reproducibility means being able to re-create the results on any computer at any point in the near future. But with different software versions, unmaintained or difficult to install dependencies, constant updates, differences between operating systems, and data sharing (which is a bit of a joke in health research). Reproducibility quickly becomes impossible. Hence why focusing on being reproducible at the time of publication or at the very least focus on ensuring your data analysis is “inspectable”, meaning that someone can come and read your code and understand the logic of what you are doing, even if they can’t directly reproduce the results.

Few share code within health sciences (8)

  • Estimating the reproducibility of scientific studies is currently very difficult because of:
    • Nearly non-existent publishing of code/data
    • General lack of awareness of and training in it

How can we check reproducibility if no code is given?

Possible role models as research groups: Jeff Leek and Ben Marwick. Or Steno Aarhus’ GitHub account!

But that doesn’t even matter, because we can’t have reproducibility if no one shares their code!

And this is no joke. Getting data on this is difficult, but the research that has been done shows that almost no one is sharing their code. The estimates range between fields in health science from zero to maybe five percent of published studies. The only area that is doing pretty well is bioinformatics, at about 60% of published studies.

We as researchers try to find our niche, make our own space in the research world. Sometimes its a real struggle to find that niche.. but guys! No one is doing this, no one is sharing their code! You start doing the simplest thing of sharing your code and you will be one of very very few people who do. And this isn’t a niche, this is a gaping hole in our modern scientific process. A huge hole.

Multiple benefits, from personal to philosophical

It’s a core principle of the scientific method: Verification

Learning more from others: For PhD students to senior researchers

More exposure and visibility: More output to show and be seen

So few are doing open science, this is a great niche!

Easier and quicker collaboration (aside from the learning part)

More productive as you can more easily work on your own project

Finding better opportunities outside of academia

And the last one is that one reason you don’t see a lot of researchers sharing their code or being more reproducible is.. they end up getting picked up by industry and paid really well or decide to leave academia because of the barriers.

Just as an example, I found a Norwegian group who had a really inefficient workflow and decided to re-build their workflow to make use of programming, to be reproducible, to have a pipeline. I looked up the lead author as well as several other of the co-authors and guess what… many of them now work in really great companies as data scientists or software engineers, probably making a lot of money and having potentially a less stressful life.

Strong institutional barriers, such as …

You will encounter a lot of resistance, a lot of barriers and hardship.

Lack of adequate awareness, support, infrastructure, training

At the institutional level, there is no real awareness of this, no support or infrastructure. You’re basically doing this on your own. Which probably isn’t that uncommon anyway.

Your organization is moving in the right direction to resolve this issue, but actions tell more than words.

Research culture values publications over all else

What would you spend your time on if we didn’t have this publication-obsession?

Research culture and incentives pretty much only care about publishing journal articles. Creating software tools or datasets to be shared, meh. Making teaching materials to help other researchers, meh. Communicating your science to the public and doing outreach, meh. Doing actual science that might take years and not lead to any “hard papers”, meh.

Imagine if the number of publications and where you published didn’t matter for getting funding or getting a research job. What would you spend your time on? What would you do differently compared to now?

Legal and privacy concerns about sharing data, intellectual property protection, patents

Legal and privacy concerns are big topics that institutions in particular focus on a lot, about ownership and so on, since research can lead to commercialization and the potential for profit. For individual researchers, we often worry about these concerns too much and sometimes stops us from doing work because we’re afraid we’re doing something wrong

Strong personal barriers like …

Fear of …

  • Fear of being scooped or ideas being stolen
  • Errors and public humiliation

And there aren’t just institutional barriers. We as researchers have fears of being scooped, of embarrassment and humiliation for your methods being gasp wrong. Which is actually just part of science.

Overwhelmed with everything that we should do better

Strongly rewarded to get things done, not get things right.

It is also really overwhelming, having so many things to think about to make sure you’re doing solid science. No researcher in the past had to consider and think and know as much as we have to know and to do. Another reason why we need more team science, to distribute the tasks and skills.

Need to constantly stay updated

You also have to constantly stay updated, and that can be tiring.

So… what you can do right now to be more open and reproducible?

Follow some core principles

  • Use open source tools wherever possible

  • Use plain text as often as possible

  • Upload and share publicly early and often (e.g. to GitHub or Zenodo)

  • Upload and share publicly as many things as possible

Use social actions to be more open

  • Do code/paper reviews through through GitHub

  • Require writing everything in Markdown

  • Agree on a standard folder and file structure for projects

Teach others!

It’s also a great way to learn 😉 😉

Lastly.. you can teach. Teach others. Use these teaching materials. Or get involved with this workshop next year. Or now. Actually, several participants from the beginner workshop are or will soon be helping to improve the material for next year. There are also so many other things you can get involved in, aside from this workshop. Let me know if you’re interested!

References

1.
Gil Y, David CH, Demir I, Essawy BT, Fulweiler RW, Goodall JL, et al. Toward the geoscience paper of the future: Best practices for documenting and sharing research from data to software to provenance. Earth and Space Science. 2016 Oct;3(10):388–415.
2.
Considine EC, Thomas G, Boulesteix AL, Khashan AS, Kenny LC. Critical review of reporting of the data analysis step in metabolomics. Metabolomics. 2017 Dec;14(1).
3.
Rauh S, Torgerson T, Johnson AL, Pollard J, Tritz D, Vassar M. Reproducible and transparent research practices in published neurology research. bioRxiv. 2019 Sep;
4.
Evans S, Fladie IA, Anderson JM, Tritz D, Vassar M. Evaluation of reproducible and transparent research practices in sports medicine research: A cross-sectional study. bioRxiv. 2019 Sep;
5.
Rauh SL, Johnson BS, Bowers A, Tritz D, Vassar BM. Evaluation of reproducibility in urology publications. bioRxiv. 2019 Sep;
6.
Hughes T, Niemann A, Tritz D, Boyer K, Robbins H, Vassar M. Transparent and reproducible research practices in the surgical literature. bioRxiv. 2019 Oct;
7.
Peng RD, Dominici F, Zeger SL. Reproducible epidemiologic research. American Journal of Epidemiology. 2006 Mar;163(9):783–9.
8.
Seibold H, Czerny S, Decke S, Dieterle R, Eder T, Fohr S, et al. A computational reproducibility study of PLOS ONE articles featuring longitudinal data analyses. Wicherts JM, editor. PLOS ONE. 2021 Jun;16(6):e0251194.
What next? Reproducibility in research

  1. Slides

  2. Tools

  3. Close
  • What next? Reproducibility in research
  • First thing: True reproducibility is basically impossible
  • Better: “reproducibility at dissemination / archiving” or “inspectability” (1)
  • Few share code within health sciences (8)
  • How can we check reproducibility if no code is given?
  • Multiple benefits, from personal to philosophical
  • It’s a core principle of the scientific method: Verification
  • Learning more from others: For PhD students to senior researchers
  • More exposure and visibility: More output to show and be seen
  • So few are doing open science, this is a great niche!
  • Easier and quicker collaboration (aside from the learning part)
  • More productive as you can more easily work on your own project
  • Finding better opportunities outside of academia
  • Strong institutional barriers, such as …
  • Lack of adequate awareness, support, infrastructure, training
  • Research culture values publications over all else
  • Legal and privacy concerns about sharing data, intellectual property protection, patents
  • Strong personal barriers like …
  • Fear of …
  • Overwhelmed with everything that we should do better
  • Need to constantly stay updated
  • So… what you can do right now to be more open and reproducible?
  • Follow some core principles
  • Use social actions to be more open
  • Teach others!
  • References
  • f Fullscreen
  • s Speaker View
  • o Slide Overview
  • e PDF Export Mode
  • r Scroll View Mode
  • ? Keyboard Help