Since I recorded my previous online course three years ago, one of the topics that I didn't really appreciate at the time, but now think is quite important, is computational reproducibility. The goal of computational reproducibility is to make sure that if you use the same data and the same code as someone else, you can reproduce their results. So if somebody has published an article and made their data and code available, you can use that data and code to actually reproduce the results reported in the article. We can compare this with replication. In a replication study, we collect new data and then see if we can find the same results as reported in earlier research. With computational reproducibility, the goal really is just to take the same data and the same code and observe the same results. So in many ways, I think that making sure someone else can computationally reproduce the results you report in a scientific article should be a goal in any scientific workflow, a sort of minimum standard that you should aim for. But it's actually not that easy, and it requires quite a lot of training to make sure that you share data and code in a way that someone else can reuse. Making sure that other people can reuse your data and code is quite important, both for verifying the results that you report in your article and for reusing the data that you collected for other purposes. It's important to realize that there is no single standard workflow to implement computational reproducibility. It strongly depends on the kind of data that you collect and the software tools that you're currently using. Moreover, novel software solutions continuously emerge. This is a very active, strongly technology-driven area, so it's likely that things will work in slightly different ways in the future.
For now, I'll try to give you some recommendations based on what I see around me. This is necessarily a little biased towards the workflows that I'm most familiar with, but I'll try to make it as broadly relevant as possible. First of all, if you have a choice between different software packages or other tools to perform your analysis with, it's useful to consider open source software. Open source software is licensed in a way that makes it free for anyone to use, which in turn makes sure that it's widely accessible. If you want other people to be able to computationally reproduce your work, they need the software packages that you used. Especially for people who don't have a lot of financial resources, freely available software is an important factor in making sure that they can reproduce and reuse your analysis. It's also sensible to incorporate a version control system in your workflow. Version control allows you to track all changes to files over time. You can store each version and always go back to a previous one. This is really useful in cases where you made some sort of mistake or introduced an undesirable change: you can always go back and see exactly where the change was introduced. You can also review individual changes before accepting them, to see if they are really the changes you want to incorporate. If you collaborate with others, you can also keep track of who made changes to which files. There are two widely used platforms for hosting version-controlled projects: GitLab, which is open source, and GitHub, which is currently owned by Microsoft. So there are some choices to be made here, although I have to say that personally I use GitHub, because it integrates nicely with many other packages that I use. But there are some downsides. For example, some scholars from specific countries recently lost access to GitHub due to boycotts.
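To make the version control idea concrete, here is a minimal sketch of the workflow on the command line using Git, the system that both GitHub and GitLab are built around. The directory, file, and commit names here are made-up examples, not part of any real project:

```shell
# Create a project directory and turn it into a Git repository (hypothetical names).
mkdir my-analysis && cd my-analysis
git init

# Tell Git who is making changes (stored locally for this repository).
git config user.name "A. Researcher"
git config user.email "a.researcher@example.com"

# Save the first version of an analysis script.
echo 'round(mean(data$rt), 2)' > analysis.R
git add analysis.R
git commit -m "Add first version of analysis script"

# Later you can see what changed since the last recorded version,
# and browse the full history of versions you stored.
git status
git log --oneline
```

Every commit is a version you can always return to, and `git log` shows who changed what, and when.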
By now you've probably noticed that in this course we use R for statistical computations, together with RStudio as a user interface. Personally I like this workflow, but other solutions exist; for example, many people use Python for statistical computations. A benefit of the R environment is that it integrates nicely with, for example, GitHub as a version control platform. There is also R Markdown, which makes it possible to write completely reproducible manuscripts. In these manuscripts, you combine text and data analysis. You keep these side by side, and the moment you compile the document to create the PDF file that you want to submit to a journal, the document actually takes the raw data, performs the calculations, and incorporates the results into the manuscript. So you never copy-paste; you generate all the analysis within the document. Here is an example from an R Markdown file. You see some normal readable text, for example, the mean reaction time (in seconds) of participants in the congruent condition. You can read this normally, and then there's a little bit of R code: we tell R to round the mean of a specific variable. This will be incorporated in the final document not as code, but just as a single number that you can read normally. Whenever you have access to the R Markdown document, you can always see how each individual number has been calculated from the raw data. This makes the manuscript, the final report, perfectly reproducible: as long as you have access to the raw data and the R Markdown script, you can always see how specific numbers in the final manuscript were calculated. It also means that any change in the raw data will lead to updates in the text, the tables, and even the figures that you generate with your code based on the raw data. This can be useful, for example, if you're planning to perform a sequential analysis.
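As a minimal sketch of what such a fragment of an R Markdown document can look like (the data file and variable names here are hypothetical, chosen to match the reaction-time example above):

````markdown
---
title: "Stroop analysis"
output: pdf_document
---

```{r setup, include=FALSE}
# Read the raw data when the document is compiled (hypothetical file).
data <- read.csv("stroop_data.csv")
```

The mean reaction time (in seconds) of participants in the congruent
condition was `r round(mean(data$rt_congruent), 2)`.
````

When the document is compiled, the inline `` `r ...` `` expression is replaced by the computed number, so the reported value always comes straight from the raw data.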
If you plan to analyze your data after 50 participants, and then, if the result is not statistically significant, collect another 50 participants, you can simply recompile the analysis after 50 and after 100 people, and all the results will be automatically updated. It's important to realize that even with a reproducible workflow like this, using R Markdown, you can still make errors. You might not make copy-paste errors, but now you're making coding errors. These will happen; I've made them myself. The difference is that these errors are now perfectly reproducible, which is nice, because the moment you try to figure out why there is an error, you can actually trace back and see what introduced it, what went wrong. This increased level of transparency also means that your errors will become more visible, and there's no way around that. If you hide all the calculations and the raw data, it's very difficult to identify mistakes that were made; if you make everything transparently available, it will be much easier for people to find errors in your work. There's not much you can do about making errors. You can try to limit them, but it's almost impossible for the entire scientific enterprise to be completely error free. People will still make some errors, and you will be confronted with the possibility that somebody points out errors in your work. This is fine. These errors are becoming more visible, and we will have to learn to deal with that. We all make mistakes, we're all fallible, and if you're willing to correct your mistakes whenever somebody points them out, that's perfectly fine. A more specific, but I think very nice, benefit of writing your manuscript in R Markdown is that software packages such as papaja exist that allow you to quite easily format your manuscript according to specific layout rules.
In psychology we very often use the APA layout; you can simply specify this layout in your R Markdown document and papaja will turn it into a nice-looking PDF file that adheres to all these rules. You will also need to keep track of your references if you write in R Markdown and want to create a completely reproducible manuscript. Zotero is an example of an open source reference manager: it allows you to keep track of the literature you read. You can use Better BibTeX and citr to also incorporate citations in R Markdown. So in the end, you really have one package that contains the data, the references, and the analysis script, and that can be compiled into the final manuscript. If you want to make it possible for other people to completely reproduce the analyses that you report in your manuscript, you of course need to be able to share the raw data used to generate these results. It is therefore important to ask participants for permission to share the data in the informed consent form they sign before you collect data. Make sure that you do this whenever possible (it's not always the case), so that you're able to share the raw data and the code. There are some tools online that can help you with this. One example is the research data management support website of Utrecht University in the Netherlands, which I think does a pretty good job of helping you implement this in practice. They suggest, for example, adding a statement such as "I understand that the research data without any personal information that could identify me, may be shared with others." Sharing a completely reproducible manuscript and the underlying data can already be done when you're submitting a manuscript for peer review. I quite often receive such manuscripts nowadays, where I have access not just to the raw data, but also to the code, so I can really go in and see what's going on.
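As a sketch of what this looks like in practice, the top of a papaja manuscript is a YAML header that ties everything together, and citations in the text use keys from the .bib file that Zotero and Better BibTeX export. The title, author, and citation key below are made up, and papaja's full template includes more author metadata than shown here:

````markdown
---
title        : "The Stroop effect revisited"   # hypothetical title
author       : "A. Researcher"
bibliography : "references.bib"                # exported from Zotero via Better BibTeX
output       : papaja::apa6_pdf                # APA-formatted PDF
---

Earlier work has reported a similar interference effect [@stroop1935].
````

On compilation, `[@stroop1935]` is replaced by a properly formatted APA citation, and the matching entry is added to the reference list automatically.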
I sometimes find this really helpful in figuring out what people are actually doing or how certain results were calculated. In my opinion, sharing the raw data and the analysis script with the manuscript you're submitting can increase the quality of peer review. Of course, the peer review process is typically not very transparent, so there's no way to know for sure; I can just tell you from personal experience that it really helps me to figure out what people are doing. If you share data and code, and maybe even the materials and stimuli that you used in your research, it's very important to add a license to this work. A license communicates to other people how they can actually use your data, code, and materials. So apply some licensing system whenever you upload your materials to a repository. If you upload your data with a manuscript, it's important to upload the data to a repository that guarantees long-term storage. Although it's very difficult to predict the future, this typically means that they guarantee your data will still be there 50 years from now, which seems quite reasonable. Also make sure that you add a digital object identifier (DOI): a stable link that will keep working. Don't link to data on your own website, which might change in three years. A DOI makes sure that if you link to something in a published manuscript, it will still be available at that link in the future. It would also be very considerate to add a codebook to the data that you share. This is useful for yourself six months from now, and definitely for other people who are trying to reuse your data. Your dataset might make sense to you at this moment, even without clear labels or descriptions of the variables, but it won't a couple of years from now, and other people won't be able to reuse your data without a codebook. We actually tried to reproduce quite a number of scientific publications.
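To make the idea of a codebook concrete: it doesn't need to be complicated. A plain-text table stored alongside the data file is often enough. For a hypothetical reaction-time dataset (all variable names made up), it might look like this:

```
variable     description                                     units / values
-----------  ----------------------------------------------  ----------------------------
subject_id   Anonymous participant identifier                integer, 1-100
condition    Stroop condition of the trial                   "congruent" / "incongruent"
rt           Reaction time from stimulus onset to response   seconds
accuracy     Whether the response was correct                1 = correct, 0 = error
```

The point is simply that every variable name in the raw data file is explained, including its units and possible values.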
And we noticed that many of these publications lacked something like a codebook, when it would have been especially helpful for us in reproducing the original results. Try to make sure your data is FAIR: Findable, Accessible, Interoperable, and Reusable. This is an increasingly common requirement. Some of the more modern technological solutions to guarantee computational reproducibility are Docker and Code Ocean. The idea here is to capture the entire environment in which the analyses were performed. In the case of Docker, you create an image of the entire operating system and all the software that you used to analyze your data. This is a rather large file, and it requires a bit more technical sophistication to get it to work. Code Ocean is based on a similar idea, but here these capsules of a software environment live online, in a web browser. You can go to this browser, create some sort of operating environment such as R, upload your data and your code, and make sure that everything is reproducible online. Then you can store this capsule and make sure that it is always available for people in the future, so that they can reproduce your results just by visiting this online website. My practical advice would be to try to reproduce your own results before you share them with others, or, if possible, to ask a collaborator to check the computational reproducibility of your code. You might just want to reproduce your own analysis on a different computer; that's already a very good starting point. But if a collaborator manages to run the same analysis script on the data and gets the results as you report them in the manuscript, that's a really good test to do before you send your data, results, and manuscript to a scientific journal. Doing computationally reproducible research is a skill that you need to train.
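Returning briefly to Docker: a minimal Dockerfile for an R-based analysis might look like the sketch below. The image tag, package version, and file names are assumptions you would adapt to your own project:

```dockerfile
# Start from a versioned R image from the Rocker project, so the R version is fixed.
FROM rocker/r-ver:4.3.1

# Install the exact package versions the analysis needs.
RUN R -e "install.packages('remotes'); remotes::install_version('dplyr', version = '1.1.2')"

# Copy the raw data and the analysis script into the image (hypothetical paths).
COPY data/ /analysis/data/
COPY analysis.R /analysis/
WORKDIR /analysis

# Running the container reruns the full analysis from the raw data.
CMD ["Rscript", "analysis.R"]
```

Anyone who builds and runs this image gets the same operating system, the same R version, and the same package versions, which is exactly the environment capture the text describes.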
For many of us, myself included, this is not something that we were ever explicitly trained in, so there's a lot that we'll need to learn. Don't feel overwhelmed. Pick something that you would like to incorporate in your research workflow, take one step at a time, and you'll get there in the end.