class: center, middle, inverse, title-slide

.title[
# Welcome to the Intermediate R course!
]

---

# Welcome! Please do this as we get ready:

<!-- Introduce instructors and helpers. -->

.pull-left[

- ✅ Go to your assigned table and group (see list)
- ✅ Introduce yourself to your group members
- ✅ Open your `LearnR3.Rproj` RStudio project
- ✅ Make sure your `data-raw/` looks like the listing on the side
    - **NO** other files should be there besides these (some of you have other files and folders)
    - If not, re-do the pre-course download data tasks and ask us for help

]

.pull-right[

```text
data-raw/
├── README.md
├── mmash
│   ├── user_1
│   ├── user_10
│   ├── user_11
│   ├── ...
│   ├── user_7
│   ├── user_8
│   └── user_9
├── mmash-data.zip
└── mmash.R
```

]

---
class: middle

# Motivation for this course

## How do you *exactly* do data analysis? What's the workflow? 🤔

???

When I started out doing research during my Masters, I always wondered *how* researchers go about doing data analysis... what their workflow was *exactly* like. No one ever really taught that. Even online tutorials mostly teach pieces of code or how to use code... but never the bigger picture. How do researchers write R code and do their data analysis in their daily work?

This question is the reason why the overall workflow is the primary focus of this course and, in part, of the beginner one. I try to focus on the bigger picture and the overall workflow you would follow when doing data analysis.

So, why is it that there isn't much information on how researchers do data analysis?

---
class: middle

# 🤷...Because code sharing is almost non-existent in science

.footnote[Very few papers provide code <a name=cite-Leek2017a></a><a name=cite-Considine2017a></a>[(https://doi.org/10.1146/annurev-statistics-060116-054104); (https://doi.org/10.1007/s11306-017-1299-3)]]

???

It's because code sharing basically doesn't exist in the vast majority of scientific fields. You've probably read a methods section in a paper and wondered how exactly they did it.
---

## Code sharing: From the scientific principle of "reproducibility"

...often confused with "replicability" <a name=cite-Plesser2018a></a>[(https://doi.org/10.3389/fninf.2017.00076)]<sup>1</sup>

--

.pull-left[

### Replicability

- Repeating a study by *independently* performing another identical study
- Linked to the "irreproducibility crisis"<sup>2</sup>

]

--

.pull-right[

### Reproducibility

- Generating the exact same results when using the same data and code
- *Question*: If we can't even *reproduce* a study's results, how can we expect to replicate it?

]

.footnote[
1. Also from an American Statistical Association [statement](https://www.amstat.org/asa/files/pdfs/POL-ReproducibleResearchRecommendations.pdf).
2. Or rather "irreplicability crisis".
]

---
class: middle

# These issues can be fixed by creating and nurturing a culture of openness

---
class: middle

# Goal of this course? Start changing the culture by providing the training

---

# Course setup and layout

- Course is a mix of:
    - "Code-alongs" (we type and explain, you type along)
    - Hands-on coding, discussing, and reading exercises
    - Dedicated practice time (potentially on your own data)
- Use the r3 package to help with learning
- True to our mission, material is publicly accessible and [openly licensed](https://r-cubed-intermediate.rostools.org/license.html) (<https://r-cubed-intermediate.rostools.org/>)
- [Resources Appendix](https://r-cubed-intermediate.rostools.org/resources.html)
    - Material for further learning
    - Useful R packages to use

???

Explain more about why we do the reading exercises...

---

## Getting or asking for help 🙋

.pull-left[

- Put the sticky on your laptop to get help
- There are lots of helpers
- Team members, help out too

]

.pull-right[

- We're all learning here!
- This is a supportive and safe environment
- Remember our [Code of Conduct](conduct.html)

]

---

## Practice using stickies: Does your `data-raw/` look like this?

```text
data-raw/
├── README.md
├── mmash
│   ├── user_1
│   ├── user_10
│   ├── user_11
│   ├── ...
│   ├── user_7
│   ├── user_8
│   └── user_9
├── mmash-data.zip
└── mmash.R
```

---
class: middle, center

## Activity: Stand and arrange based on question

???

We're going to do a "stand and re-arrange yourself" activity based on some questions I ask.

---
class: middle, center

## How do you perceive your skill in R?

???

Arrange yourselves along the wall: one side is "novice/basic" and the other side is "advanced".

---
class: middle, center

## Who has had formal training specifically in *coding*, in R or in general?

???

Staying where you are, raise your hand if you would answer yes to this question. If it was part of a statistics course, it doesn't really count.

---
class: middle, center

## Who has struggled with using R?

???

Again, staying where you are, raise your hand if you've struggled or still struggle with R.

---
class: middle, center

# Now back to your seat

---
class: middle, center

## Who has seen or worked with "true" data analysis pipelines (e.g. run a single command and *everything* gets re-done)?

???

More for my personal interest. The word pipeline gets thrown around a lot, but how many have encountered, made, or used true pipelines? Automated from beginning to end.

---
class: middle, center

## For those that have done or are doing data processing, does it take up a lot of time? How much?

???

Question, mostly for my own information...

---
class: middle, center

# We are always looking to improve the course

## So your feedback is very helpful and important 😁

???

Your feedback will improve the course, and we're working on it together. You're learning, but also helping to improve the material and how we teach it. What makes sense, what is confusing, and so on.

---

# References, 1

<a name=bib-Leek2017a></a>[](#cite-Leek2017a) J. T. Leek and L. R. Jager. "Is Most Published Research Really False?"
In: _Annual Review of Statistics and Its Application_ 4.1 (Mar. 2017), pp. 109-122. DOI: [10.1146/annurev-statistics-060116-054104](https://doi.org/10.1146%2Fannurev-statistics-060116-054104).

<a name=bib-Considine2017a></a>[](#cite-Considine2017a) E. C. Considine, G. Thomas, et al. "Critical Review of Reporting of the Data Analysis Step in Metabolomics". In: _Metabolomics_ 14.1 (Dec. 2017). DOI: [10.1007/s11306-017-1299-3](https://doi.org/10.1007%2Fs11306-017-1299-3).

<a name=bib-Plesser2018a></a>[](#cite-Plesser2018a) H. E. Plesser. "Reproducibility Vs. Replicability: A Brief History of a Confused Terminology". In: _Frontiers in Neuroinformatics_ 11 (Jan. 2018). DOI: [10.3389/fninf.2017.00076](https://doi.org/10.3389%2Ffninf.2017.00076).