Unit 13: Reproducing Research
Unit 13: Assignment #1 (due before 11:59 pm Central on MON JUL 27):
- To begin learning about reproducible research:
- First, read CK-12’s (no date) definition of “Replication” and make sure you understand the following:
- “The results of an investigation are not likely to be well accepted unless the investigation is repeated … and the same result is always obtained.”
- “Getting the same result when an experiment is repeated is called replication.”
- “Similarly, getting the same result when a data analysis is repeated is also called replication.”
- Second, read Professor Gernsbacher’s (no date) “Replication (and Replicability) versus Reproduction (and Reproducibility)” and make sure you understand the following:
- “Some scientists use the term replication (and replicability) as synonyms of reproduction (and reproducibility).”
- Other scientists distinguish between the terms replicate/replication/replicability versus reproduce/reproduction/reproducibility.
- For example, Peng (2018) defines “reproducibility as [a group of independent] researchers analyzing the [previously collected] data” of a study previously conducted by another group of researchers.
- Therefore, to researchers like Peng, reproducible research means research whose data other researchers have analyzed again (and hopefully gotten the same results!).
- In this course, you may use the terms replicate/replication/replicability and reproduce/reproduction/reproducibility interchangeably; however, be aware that some scientists do make a distinction among the terms.
- Next, to learn ways that you and other scientists can ensure your data and data analyses are reproducible:
- First, read Poldrack’s (2020) Chapter 13 “Doing Reproducible Data Analysis” and make sure you understand the following:
- Doing reproducible data analyses means making both your data and your data analysis code available to other researchers.
- Your data analysis code means the formulas and functions you used to analyze your data.
- Writing formulas and functions to analyze data, rather than using “point-and-click” software (such as SPSS), allows other researchers to more easily reproduce your data analyses.
- Similarly, using free and open-source software (such as Google Sheets), rather than commercial software packages, allows other researchers to more easily reproduce your data analyses.
- Second, read again a section from Professor Gernsbacher’s (2020) chapter “Teaching Research Transparency in Psychological Science” (which you read back in Unit 5) and make sure you understand the following:
- Making your data and your data analyses available to other people is an important component of research transparency.
- Swap-checking your data and data analyses with other people is a great way to ensure your data and their analyses are reproducible.
- Prior to sharing your data and data analyses, it’s important to ensure that your data and analyses are
- comprehensive (all data and analyses that contributed to the results are included);
- self-explanatory (all labels and formulas are interpretable to someone else);
- self-contained (all data and analyses are available in one place or, if needed, with links to other places); and
- organized.
- Now, choose THREE of the following brief news reports, each of which reports the occurrence of an error in data analysis.
- Alabama Reporter’s (2020) “Alabama COVID Hospitalizations Are at an All-Time High after Data Error Correction”
- Alois’s (2020) “Oops. Data Error Causes Up to $125 Million Bitcoin Loss at FCoin Exchange”
- BBC News’ (2017) “UK Schoolboy Corrects NASA Data Error”
- Bloomberg News’ (2019) “Data Error Undermines Study about CRISPR Babies’ Lifespans”
- Bloomberg News’ (2020) “Data Error Mars U.S. Jobless-Claims Report for Second Week” [NOTE: “to mar” means to damage or spoil in some way]
- CNBC News’ (2020) “Mea Culpa — Twilio CEO Assures Jim Cramer ‘Simple Math Error’ Won’t Happen Again” [NOTE: “mea culpa” means acknowledging one’s error]
- Cycling News’ (2020) “Lachlan Morton’s Everesting Record Attempt Comes Up Short”
- Houston Chronicle’s (2019) “Calpine Admits Error that Sent Power Prices Soaring”
- CBS Miami’s (2020) “Error Causes Medicaid Shortfall for Florida Hospitals”
- Radio New Zealand’s (2020) “Data Error Leaves 300 Renal Patients with Missed Follow-Ups” [NOTE: “renal” means relating to the kidneys]
- Saskatoon Star’s (2020) “City Hall Blames ‘Highly Manual’ Processes, Staff Inexperience for Tax Error”
- Tanzania’s National Media (date) “Boxrec Admits Data Error in Mwakinyo’s Latest Ranking”
- For fun, read xkcd’s (2019) cartoon “Data Error.”
- Although the cartoon, like all of xkcd’s cartoons, is meant to be funny, the first option suggested to Meghan, “Redo your analysis [after correcting your error] and share whatever results you can, whether positive or negative. It’s disappointing, but these things happen” is excellent advice.
- As illustrated by the news reports you just read, data analysis errors happen.
- Go to the Unit 13: Assignment #1 Discussion Board and create a new post of at least 200 words in which you answer the following questions for EACH of the THREE news reports you chose:
- First, which article did you choose?
- Second, why did you choose this article (e.g., do you have any personal experience in this area; it’s okay if you don’t)?
- Third, to the best of your understanding, what was the data analysis error?
- Fourth, speculate (which means to guesstimate) whether the data analysis error could have been avoided by doing any of the following:
- swap-checking data analyses with a colleague
- making data and data analyses available to other people
- using transparent formulas and functions to analyze data, rather than using more opaque apps or the like.
Unit 13: Assignment #2 (due before 11:59 pm Central on MON JUL 27):
- In this assignment, you’re going to learn what p-hacking is.
- First, read WIRED’s (2019) article, “We Are All ‘p-Hacking’ Now.” While reading this article, make sure you understand the following:
- p-hacking refers to researchers cherry-picking only the results that have p < .05.
- p-hacking often comes about because a researcher conducts more analyses than they should, and they only report those analyses with p < .05.
- “This kind of fiddling around” allows researchers to continue to analyze their data (usually in unexpected or unwarranted ways) “until they get the answer that they want.”
- “p-hacking can allow researchers to get most studies to reveal significant relationships between truly unrelated variables.”
- Second, read xkcd’s (2011) cartoon “Significance.”
- A transcript of the cartoon is here.
- Be sure to read each frame of the cartoon closely. (Spoiler Alert: One of the frames shows a p < .05 result rather than a p > .05 result.)
- Third, read thoroughly through the explanation of xkcd’s (2011) cartoon “Significance,” which is on the last page of the document. While reading this explanation, make sure you understand the following:
- The cartoon illustrates how if someone conducts multiple experiments, they can cherry-pick (p-hack) only the result they want.
- The scientists tried 20 different colors, and only one time (1 out of 20 times, or 5% of the time) did they find a result that was considered significant at p < .05.
- “By testing so many different colors without adjusting their p-value, they are likely to find a false positive.”
- “If the probability that each trial gives a false positive result is 1 in 20, then by testing 20 different colors it is likely that at least one jelly bean test will give a false positive.”
- If you’re unclear about what false positives, false negatives, true positives, and true negatives mean, looking through this handout should help.
- Fourth, to see some real-life examples of the dilemma addressed by the xkcd cartoon, look at this tweet and this tweet.
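If you would like to check the arithmetic behind the jelly bean explanation yourself, here is a minimal sketch in Python. It assumes that the 20 tests are independent of one another and that each test has a 5% chance of a false positive; the variable names are illustrative and do not come from the cartoon or its explanation.

```python
# Chance of at least one false positive across several independent tests.
# Assumption: every test has the same false-positive rate (alpha), and the
# tests are independent of one another.
alpha = 0.05      # false-positive rate of a single test (p < .05)
num_tests = 20    # one test per jelly bean color

# Probability that NONE of the tests gives a false positive
prob_no_false_positive = (1 - alpha) ** num_tests

# Probability that AT LEAST ONE test gives a false positive
prob_at_least_one = 1 - prob_no_false_positive

print(f"P(at least one false positive in {num_tests} tests) = {prob_at_least_one:.3f}")
```

Under these assumptions, the chance of at least one false positive is about 64%, which is why finding one "significant" color out of 20 is not surprising.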
- Now, to get a feel for how easy it is to p-hack (though, of course, beyond this assignment, you shouldn’t p-hack!):
- First, go to NRICH’s “Hypothesis Testing” simulator.
- Second, set the number of Red balls to 4 and the number of Green balls to 4. (In other words, create an experiment that has 4 red balls and 4 green balls, and, therefore, there should not be a significant difference in the number of red versus green balls selected.)
- Third, keep the Null Hypothesis at .500 (an even probability of selecting as many red balls as green balls), which means the Alternative Hypothesis is ≠ .500 (an uneven probability of selecting red versus green balls).
- Fourth, set the Number of trials in the experiment to 20.
- Fifth, click “Repeat Experiment” as many times as you need to click (i.e., as many experiments as you need to run) until you can p-hack a p-value < .05.
- Take a screenshot of your simulation, which should look something like this, and save the screenshot as YourLastName_PSY-210_Unit13_p-Hacking_20Trials.xxx (where xxx is the file type, for example, .jpg, .png, .jpeg, and the like).
- Sixth, click “Reset” and keep all the settings the same EXCEPT change the Number of trials in the experiment to 100.
- Seventh, click “Repeat Experiment” as many times as you need to click (i.e., as many experiments as you need to run) until you can p-hack a p-value < .05.
- Take a screenshot of your simulation and save the screenshot as YourLastName_PSY-210_Unit13_p-Hacking_100Trials.xxx (where xxx is the file type, for example, .jpg, .png, .jpeg, and the like).
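The same "repeat until significant" procedure can also be written as code, which makes every step visible to someone else. The sketch below is only an approximation of the simulator: it assumes each trial is a fair 50/50 draw of red versus green and uses an exact two-sided binomial test, which may not be how NRICH computes its p-value. The function names are hypothetical.

```python
import random
from math import comb


def two_sided_binomial_p(successes, trials, prob=0.5):
    """Exact two-sided p-value: the total probability of every outcome that
    is no more likely than the observed one under the null hypothesis."""
    def pmf(k):
        return comb(trials, k) * prob**k * (1 - prob) ** (trials - k)

    observed = pmf(successes)
    # The small tolerance guards against floating-point ties.
    total = sum(pmf(k) for k in range(trials + 1) if pmf(k) <= observed * (1 + 1e-9))
    return min(1.0, total)


def experiments_until_significant(trials, alpha=0.05, seed=None, max_experiments=100_000):
    """Repeat a fair red/green experiment until one run has p < alpha.
    Returns (number of experiments run, red count, p-value), or None if no
    run was significant within max_experiments."""
    rng = random.Random(seed)
    for experiment in range(1, max_experiments + 1):
        reds = sum(rng.random() < 0.5 for _ in range(trials))
        p_value = two_sided_binomial_p(reds, trials)
        if p_value < alpha:
            return experiment, reds, p_value
    return None


for trials in (20, 100):
    print(trials, "trials per experiment:", experiments_until_significant(trials))
```

Because the balls are evenly split, every "significant" result this code finds is a false positive; repeating the experiment simply gives chance more opportunities to produce one.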
- Go to the Unit 13: Assignment #2 Discussion Board and create a new post of at least 200 words in which you do the following:
- First, embed both of the screenshots you saved (YourLastName_PSY-210_Unit13_p-Hacking_20Trials.xxx and YourLastName_PSY-210_Unit13_p-Hacking_100Trials.xxx).
- Second, answer the following four questions:
- Had you heard of p-hacking before?
- Do you understand how p-hacking works?
- Do you understand why researchers might be motivated to p-hack?
- Are you surprised how easy it is to p-hack?
- Third, given all you know now, pinky swear that you won’t try to p-hack in the future.
Unit 13: Assignment #3 (due before 11:59 pm Central on TUE JUL 28):
- In this assignment, you’re going to learn about other Questionable Research Practices (in addition to p-hacking).
- First, watch Veritasium’s (2016) video, “Is Most Published Research Wrong?”
- Second, read Schimmack’s (2015) article, “Questionable Research Practices: Definition, Detect, and Recommendations for Better Practices.”
- Third, read the Association for Psychological Science (2012) article, “Questionable Research Practices Surprisingly Common.”
- Go to the Unit 13: Assignment #3 Discussion Board and make a new post of at least 200 words in which you answer the following questions:
- What are Questionable Research Practices?
- Why do you think so many researchers engage in Questionable Research Practices?
- Of the seven Questionable Research Practices listed by Schimmack (2015), which one do you consider to be the most ethical?
- Of the seven Questionable Research Practices listed by Schimmack (2015), which one do you consider to be the least ethical?
Unit 13: Assignment #4 (due before 11:59 pm Central on TUE JUL 28):
- To demonstrate what you have learned about reproducible research, create a Teaching Document that can be used to teach other Psychology majors. You do not need to teach the information to other Psych majors, but you do need to create a Teaching Document.
- First, choose your medium. Your choices are (1) a PowerPoint; (2) an Infographic; or (3) a comic strip (e.g., this comic strip from The Nib).
- Second, remember that your Teaching Document needs to capture as much of the information that you have learned in this Unit as possible, including information from each of the first three assignments.
- Third, save your teaching document as a PDF named YourLastname_PSY-210_Unit13_Reproducibility.pdf (no .pptx, .ppt, .doc, .docx, or any other file types except for .pdf will be graded).
- Fourth, learn how to test the size of your PDF by reading through this handout.
- Then, test the size of your PDF.
- If the size of your PDF is too large to email to yourself, reduce the size of your PDF by following the suggestions in this handout.
- Go to the Unit 13: Assignment #4 Discussion Board and make a new Discussion Board post to which you should attach — not embed, but attach — your Teaching Document (in PDF).
- First, look underneath the textbox where you typically type (or paste into) the Discussion Board, and you will see the “Attach” tool; it is the word “Attach” preceded by a paperclip icon.
- Second, click on the “Attach” tool. Browse to the .pdf file on your computer and select your .pdf file.
- Third, upload your .pdf file.
- Fourth, click on “Post Reply.”
- Fifth, make sure that you do not attach your .pdf file by using the “Files” menu option on the left-hand side of the Discussion Board. Instead, use only the “Attach” tool that is found underneath the Discussion Board text box.
- Sixth, make sure that the PDF you attached is named YourLastname_PSY-210_Unit13_Reproducibility.pdf
Unit 13: Assignment #5 (due before 11:59 pm Central on SUN AUG 2):
- Meet online with your NEW Chat Group (which you formed during Unit 8) for a one-hour text-based Group Chat at a time/date that your Chat Group previously arranged.
- BEFORE you meet with your Chat Group, each member of the Chat Group must do ALL of the following:
- First, meet the last family member of the General Linear Model family that we will cover in this course: the t-test.
- Very simply: The t-test is similar to a one-way ANOVA, which you learned about in Unit 12.
- For both a one-way ANOVA and a t-test, the y variable is continuous, and the x variable is discrete (categorical).
- The discrete (categorical) x variable in a one-way ANOVA can have three categories (or levels), such as painting houses, being a barista, and nannying; it can have four categories (levels), such as Psychology, Political Science, Sociology, and History majors; and it can have even more than four categories (levels).
- However, in a t-test the discrete (categorical) x variable can have only two categories.
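One way to see how closely the two tests are related: when there are exactly two categories, the F-ratio from a one-way ANOVA equals the square of the t-value from a t-test that pools the two categories' variances. The sketch below checks this on made-up numbers. The data and function names are hypothetical, and the pooled-variance t used here is the classic Student's version rather than the approximation formula given in this assignment.

```python
from statistics import mean, variance


def pooled_t(group1, group2):
    """Student's t-value for two independent groups, pooling their variances."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / (pooled_var * (1 / n1 + 1 / n2)) ** 0.5


def one_way_f(*groups):
    """F-ratio for a one-way ANOVA with any number of groups."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Made-up data for two categories (not from any course data set)
group_a = [4.0, 5.0, 6.0, 7.0, 8.0]
group_b = [6.0, 7.0, 9.0, 10.0, 11.0]

t_value = pooled_t(group_a, group_b)
f_ratio = one_way_f(group_a, group_b)
print(f"t squared = {t_value ** 2:.4f}, F = {f_ratio:.4f}")
```

The two printed numbers agree, which is the sense in which a t-test is the two-category member of the same family as the one-way ANOVA.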
- Second, given what you’ve learned in this Unit — namely, that analyzing data by writing formulas and functions rather than using “point-and-click” software (such as online calculators) facilitates other researchers reproducing your data analyses — you’re going to write formulas (with your chosen data management platform) to calculate t-tests.
- A formula for approximating a t-test in Microsoft Excel, Google Sheets, and Apple Numbers is the following:
- = (Mean1-Mean2) / (SQRT ( (StDev1^2 / N1) + (StDev2^2 / N2) ) )
- Mean1 and Mean2 refer to the Means of each of the two categories you are analyzing.
- StDev1 and StDev2 refer to the Standard Deviations of each of the two categories.
- N1 and N2 refer to the number of observations in each of the two categories.
- SQRT is the function for calculating a square root; ^2 is the operator for squaring a value.
- NOTE: When you use the above formula, you’ll need to close the gaps by taking out the extra spaces, which are included in the above formula only to make it more readable on this screen.
- Your formula should look something like this (depending, of course, on which cells contain your Mean1, Mean2, StDev1, StDev2, N1, and N2).
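For comparison, here is the same formula written as a short Python function. This is a sketch for checking your spreadsheet result, not a replacement for it; the function name and the example numbers are hypothetical and do not come from the Unit 9 data sets.

```python
from math import sqrt


def t_from_summary(mean1, stdev1, n1, mean2, stdev2, n2):
    """t-value from each category's Mean, Standard Deviation, and N,
    mirroring (Mean1-Mean2)/SQRT((StDev1^2/N1)+(StDev2^2/N2))."""
    standard_error = sqrt(stdev1**2 / n1 + stdev2**2 / n2)
    return (mean1 - mean2) / standard_error


# Made-up summary statistics for two categories
t_value = t_from_summary(mean1=70.0, stdev1=3.0, n1=25,
                         mean2=65.0, stdev2=4.0, n2=25)
print(f"t = {t_value:.2f}")  # prints "t = 5.00"
```

In this example the standard error is SQRT(9/25 + 16/25) = 1, so the t-value is simply the difference between the two Means, 5.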
- Third, calculate some t-tests using your chosen data management platform and using data for which you’ve already calculated Means, Standard Deviations, and Ns during Unit 9 (therefore, look in your Unit 9 folder for these data and your previous calculations).
- If your Lastname comes FIRST alphabetically in your NEW Chat Group:
- Calculate a t-test to examine whether the dependent, continuous y-variable, Actors’ Heights, is affected by (i.e., depends on) these two categories of the independent, discrete x-variable: Men Actors versus Women Actors.
- Calculate a t-test to examine whether the dependent, continuous y-variable, Batting Averages, is affected by (i.e., depends on) these two categories of the independent, discrete x-variable: Milwaukee Brewers baseball team versus the Chicago Cubs baseball team.
- If your Lastname comes LAST alphabetically in your NEW Chat Group:
- Calculate a t-test to examine whether the dependent, continuous y-variable, Actors’ Age at their First Oscar Award, is affected by (i.e., depends on) these two categories of the independent, discrete x-variable: Men Actors versus Women Actors.
- Calculate a t-test to examine whether the dependent, continuous y-variable, NCAA Coaches’ Salaries, is affected by (i.e., depends on) these two categories of the independent, discrete x-variable: Football Coaches versus Basketball Coaches.
- If your Lastname comes NEITHER first nor last alphabetically in your NEW Chat Group:
- Calculate a t-test to examine whether the dependent, continuous y-variable, Actors’ Heights, is affected by (i.e., depends on) these two categories of the independent, discrete x-variable: Men Actors versus Women Actors.
- Calculate a t-test to examine whether the dependent, continuous y-variable, NCAA Coaches’ Salaries, is affected by (i.e., depends on) these two categories of the independent, discrete x-variable: Football Coaches versus Basketball Coaches.
- Again, we realize that some of the baseball teams’ batting averages are artificially inflated because the players were rookies with late-season call-ups, and we apologize that the actor data accentuate a gender binary.
- Fourth, for each of the t-tests you calculate, also calculate a p-value using your chosen data management platform.
- If you’re using Microsoft Excel, use this formula.
- If you’re using Google Sheets, use this formula.
- If you’re using Apple Numbers, use this formula.
- For all three data management platforms, the part of the formula in parentheses refers to the following:
- all the data values in the first category, all the data values in the second category, the number 2 (to indicate a two-tailed test), and the number 2 (to indicate two independent samples/categories that are assumed to have equal variances).
- Remember from Unit 8:
- If the p-value is low enough (e.g., p < .050), you can reject the null hypothesis that the dependent variable is NOT affected by (does not depend on) the two categories of the independent variable.
- If the p-value is not low enough (e.g., p ≥ .050), you cannot reject the null hypothesis that the dependent variable is NOT affected by (does not depend on) the two categories of the independent variable.
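The p-value calculation and the decision rule can also be sketched in Python. The spreadsheet formulas linked above are not reproduced here, so this sketch assumes they behave like a two-tailed test of two independent samples with pooled (equal) variances. The function names and data are hypothetical, and the p-value is approximated by numerical integration rather than taken from a statistics library.

```python
from math import exp, lgamma, log, pi, sqrt
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coeff = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_coeff - (df + 1) / 2 * log(1 + x * x / df))


def two_tailed_p(t_value, df, steps=10_000):
    """Two-tailed p-value, found by integrating the density from 0 to |t|
    with Simpson's rule (steps must be even)."""
    upper = abs(t_value)
    if upper == 0:
        return 1.0
    width = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * width, df)
    area_zero_to_t = total * width / 3
    return max(0.0, 1 - 2 * area_zero_to_t)


def equal_variance_t_test(sample1, sample2):
    """Return (t-value, degrees of freedom, two-tailed p-value)."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / df
    t_value = (mean(sample1) - mean(sample2)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t_value, df, two_tailed_p(t_value, df)


# Made-up data for two categories (not from any course data set)
category_1 = [4.0, 5.0, 6.0, 7.0, 8.0]
category_2 = [6.0, 7.0, 9.0, 10.0, 11.0]

t_value, df, p_value = equal_variance_t_test(category_1, category_2)
decision = "reject" if p_value < 0.05 else "cannot reject"
print(f"t({df}) = {t_value:.3f}, p = {p_value:.3f}: {decision} the null hypothesis")
```

When the two categories have the same N, the pooled t-value here matches the approximation formula given earlier in this assignment; when the Ns differ, the two can differ slightly.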
- Other than the above steps, DO NOT begin working on any of the steps listed below until your Chat Group begins their one-hour Group Chat.
- DURING your one-hour Group Chat, each Chat Group member is going to swap-check data analyses with other Chat Group members because swap-checking is a great way to ensure your data analyses are reproducible.
- First, remember, as you learned in Professor Gernsbacher’s (2020) chapter “Teaching Research Transparency in Psychological Science,” that swap-checking involves two researchers swapping their data analyses so that Researcher A can try to replicate Researcher B’s analyses, and Researcher B can try to replicate Researcher A’s analyses.
- Second, swap-check the one-way ANOVAs you calculated in Unit 12: Assignment #4 with three categories/levels.
- Remember that the Chat Group member whose Lastname comes FIRST alphabetically in your Chat Group previously analyzed the Food Trucks Data Set and the Summer Jobs Data Set.
- The Chat Group member whose Lastname comes LAST alphabetically previously analyzed the Internet Providers Data Set and the Sick Leave Data Set.
- The Chat Group member whose Lastname comes NEITHER first nor last previously analyzed the Food Trucks Data Set and the Sick Leave Data Set.
- IMPORTANT: Each analysis MUST be swap-checked by a Chat Group member who did NOT previously analyze that data set for Unit 12: Assignment #4.
- Therefore, for each analysis, the Chat Group member who did NOT previously analyze that data set will, during the Group Chat, analyze that data set.
- For each data set, you need to check the Means, the Standard Deviations, the Ns, the F-ratio, and the p-value.
- For each ANOVA, you also need to check whether you can reject the null hypothesis.
- Third, swap-check the one-way ANOVAs you calculated in Unit 12: Assignment #4 with four categories/levels.
- Remember that the Chat Group member whose Lastname comes FIRST alphabetically previously analyzed the Dog Obedience Data Set and the Textbook Costs Data Set.
- The Chat Group member whose Lastname comes LAST alphabetically previously analyzed the Family Vacations Data Set and the Wedding Gifts Data Set.
- The Chat Group member whose Lastname comes NEITHER first nor last previously analyzed the Dog Obedience Data Set and the Wedding Gifts Data Set.
- IMPORTANT: Each analysis MUST be swap-checked by a Chat Group member who did NOT previously analyze that data set for Unit 12: Assignment #4.
- Therefore, for each analysis, the Chat Group member who did NOT previously analyze that data set will, during the Group Chat, analyze that data set.
- For each data set, you need to check the Means, the Standard Deviations, the Ns, the F-ratio, and the p-value.
- For each ANOVA, you also need to check whether you can reject the null hypothesis.
- Fourth, swap-check the t-tests you calculated for the current Unit 13: Assignment #5 before you met as a Chat Group.
- Remember that the Chat Group member whose Lastname comes FIRST alphabetically previously analyzed the Actors’ Heights Data Set and the Batting Average Data Set.
- The Chat Group member whose Lastname comes LAST alphabetically previously analyzed the Age at First Oscar Data Set and the NCAA Coaches’ Salaries Data Set.
- The Chat Group member whose Lastname comes NEITHER first nor last previously analyzed the Actors’ Heights Data Set and the NCAA Coaches’ Salaries Data Set.
- IMPORTANT: Each analysis MUST be swap-checked by a Chat Group member who did NOT previously analyze that data set before the Chat Group met.
- Therefore, for each analysis, the Chat Group member who did NOT previously analyze that data set will, during the Group Chat, analyze that data set.
- For each data set, you need to check the Means, the Standard Deviations, the Ns, the t-value, and the p-value.
- For each t-test, you also need to check whether you can reject the null hypothesis.
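If both researchers record their results in the same labeled form, the comparison step of a swap-check can itself be made reproducible. The sketch below is one hypothetical way to do that: it compares two sets of results and lists any statistic that is missing or that differs by more than a small rounding tolerance. The labels, numbers, and tolerance are assumptions chosen for illustration.

```python
from math import isclose


def swap_check(original, replication, abs_tol=0.0005):
    """Return the names of statistics that are missing from either set of
    results or that differ by more than abs_tol."""
    mismatches = []
    for name in sorted(set(original) | set(replication)):
        if name not in original or name not in replication:
            mismatches.append(name)
        elif not isclose(original[name], replication[name], rel_tol=0.0, abs_tol=abs_tol):
            mismatches.append(name)
    return mismatches


# Made-up results from two researchers analyzing the same data set
researcher_a = {"Mean1": 70.0, "Mean2": 65.0, "StDev1": 3.0, "StDev2": 4.0,
                "N1": 25, "N2": 25, "t": 5.0}
researcher_b = {"Mean1": 70.0, "Mean2": 65.0, "StDev1": 3.0, "StDev2": 4.2,
                "N1": 25, "N2": 25, "t": 4.84}

print(swap_check(researcher_a, researcher_b))  # prints ['StDev2', 't']
```

A mismatch in an early statistic (here, a Standard Deviation) usually explains a mismatch in a later one (the t-value), so it helps to resolve disagreements in the order the statistics were calculated.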
- Fifth, end your one-hour Group Chat by saying goodbye to your Chat Group because this is your last Group Chat of the course!
- AT THE END of your one-hour Group Chat do the following:
- Nominate one member of your Chat Group (who participated in the Chat) to make a post on the Unit 13: Assignment #5 Discussion Board that summarizes your Group Chat in at least 200 words.
- Be sure to report in your 200-word summary who swap-checked each analysis.
- Nominate a second member of your Chat Group (who participated in the Group Chat using the browser Chrome on their laptop, rather than on their mobile device) to save the Chat transcript, as described in the Course How To (under the topic, “How To Save and Attach a Chat Transcript”).
- This Chat Group member needs to make a post on the Unit 13: Assignment #5 Discussion Board and attach the Chat transcript, saved as a PDF, to that Discussion Board post.
- Remember to attach the Chat transcript by clicking on the word “Attach.” (Do not click on the sidebar menu “Files.”)
- Nominate a third member of your Chat Group (who also participated in the Chat) to make another post on the Unit 13: Assignment #5 Discussion Board that states the name of your Chat Group, the names of the Chat Group members who participated in the Chat, the date of your Chat, and the start and stop time of your Group Chat.
- If only two students participated in the Group Chat, then one of those two students needs to do two of the above three tasks.
- Record a typical Unit entry in your own Course Journal for the current Unit, Unit 13.
Congratulations, you have finished Unit 13! Onward to Unit 14!