Unit 11: Categorical vs Continuous Relationships
Unit 11: Assignment #1 (due before 11:59 pm Central on MON JUL 20):
- To begin this Unit, you’re going to learn how to evaluate the difference between expected and observed categorical (discrete) frequencies.
- First, read the first two paragraphs of Poldrack’s (2020) Chapter 12, “Chi-Square Test for Goodness of Fit.” While reading these two paragraphs, make sure you understand the following:
- When we talk about categorical relationships, we are talking about relationships between discrete measurements.
- Remember, as you learned back in Unit 2, that discrete measurements cannot be subdivided into parts. For example, the total number of children in a class is a discrete measurement because there can be a total of 12 children or 14 children, but there can’t be a total of 12.563 children.
- If you’re still unclear about discrete measurements, be sure to review Unit 2.
- Second, because you’ll be learning to use chi-square tests throughout Unit 11, learn the following:
- The word “chi” is the English name of the Greek letter χ, which looks like a fancy lowercase x.
- In spoken English, the word “chi” is pronounced “khi” (like “hi” with a “k” sound first).
- Next, you’ll learn how to use a chi-square test to assess what’s known as “goodness of fit.”
- First, read the remaining paragraphs in Poldrack’s (2020) Chapter 12, “Chi-Square Test for Goodness of Fit.” While reading this chapter section, make sure you understand the following:
- A chi-square goodness-of-fit test allows us to test whether the frequencies of discrete data that we observed differ from the frequencies of discrete data that we expected under the null hypothesis.
- The key points to remember are that we’re comparing what we observed with what we expected, and, for a goodness-of-fit test, what we expected is based on the null hypothesis.
- If, as in the candy bag example Poldrack gives, we expected an even split of three types of candy, then that even split is our null hypothesis.
- Second, to cement your understanding of using a chi-square goodness-of-fit test, read an excerpt from StatisticsSolution’s (no date) article, “Chi-Square Goodness of Fit Test.”
- Remember from Unit 6 that observed frequencies (and probabilities) are often called empirical frequencies (and probabilities), because we have empirically observed them.
- Therefore, a chi-square goodness-of-fit test determines how well empirical (or OBSERVED) distributions fit theoretical (or EXPECTED) distributions.
- When calculating a chi-square goodness-of-fit test:
- the null hypothesis predicts that the OBSERVED frequencies will not differ from the EXPECTED frequencies, and
- the alternative hypothesis predicts that the OBSERVED frequencies will differ from the EXPECTED frequencies.
- Third, note that both Poldrack’s (2020) Chapter 12, “Chi-Square Test for Goodness of Fit” and StatisticsSolution’s (no date) article, “Chi-Square Goodness of Fit Test” tell us the following:
- To begin calculating a chi-square goodness-of-fit test, we need to first calculate the observed frequencies and the expected frequencies.
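- To finish calculating a chi-square goodness-of-fit test ourselves, we would also need to subtract each expected frequency from its observed frequency, square each difference, divide each squared difference by its expected frequency, and then sum those values.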
- However, for this Unit, we will use online calculators, which means we only need to calculate the observed frequencies and the expected frequencies.
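For reference, the calculation both readings describe is the standard chi-square formula, where O_i is the observed frequency and E_i the expected frequency of category i, and k is the number of categories. Each squared difference is divided by its expected frequency before summing, and a goodness-of-fit test has k - 1 degrees of freedom. The second line is a worked example with made-up observed counts (110, 95, 90, 105 out of 400), not anyone's assigned data:

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, \qquad df = k - 1

% Hypothetical example: k = 4 categories, 400 values, so every E_i = 400 / 4 = 100
\chi^2 = \frac{(110-100)^2 + (95-100)^2 + (90-100)^2 + (105-100)^2}{100} = \frac{250}{100} = 2.5, \qquad df = 3
```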
- Now, you’re going to get some experience conducting a chi-square goodness-of-fit test.
- First, imagine the following scenario (which was created by Professor Richard Landers of the University of Minnesota):
- You run a small business with four employees: Albert, Camilla, Jimmy, and Susan. Because you need three employees at work at any given time, only one employee at a time has the day off.
- Of course, everyone wants Saturdays off. One of your employees has confronted you and said that you favor some employees over others in giving them Saturdays off.
- To investigate this, you pull up a long list of which employees have had Saturdays off each week, for the past two years, and you calculate a chi-square goodness-of-fit test to investigate the employee’s concern.
- Second, remembering your unique data set number, which you were given in Unit 3, download your unique Saturday Off Data Set.
- If you’re using the browser Chrome or the browser Firefox, click on the link for your data set, below. When prompted, save the file to your PSY-210_Summer2020_Unit11 folder.
- If you’re using the browser Safari, right-click on the link for your data set, below, and select “Download Linked File.”
- Third, import your unique Saturday Off Data Set .csv file into a blank spreadsheet in your chosen data management platform.
- Remember to follow Andrews’ (2020) Import Data how-to article.
- Save the new spreadsheet in which you have imported your unique Saturday Off Data Set, naming the file, YourLastName_PSY-210_Unit11_SaturdayOff_Data
- Fourth, using your newly created Saturday Off Data Set spreadsheet, create a Frequency Distribution Table for Discrete Data.
- Fifth, create three additional columns in your Saturday Off Data Set Frequency Distribution Table, so that it resembles one of the example screenshots (your frequencies will differ from the examples because of the unique data set you were assigned).
- The first additional column you’ll create is another list of your Categories.
- The second additional column you’ll create is another copy of your Absolute Frequencies, only now you’ll call that column Observed Frequency, because your absolute frequencies are the frequencies you observed in this data set.
- The third additional column you’ll create is your Null Expected Frequency.
- To calculate each category’s Null Expected Frequency, write a formula that divides your Observed Frequency Total (e.g., 400) by the total number of categories, which for the Saturday Off Data Set is 4.
- The total number of categories in this data set is 4 because there are four employees: Albert, Camilla, Jimmy, and Susan.
- Each Null Expected Frequency is the total number of data values (e.g., 400) divided by the total number of categories (e.g., 4) because the null hypothesis predicts an even split.
- Be sure to calculate a Total of your Null Expected Frequencies and ensure that total equals 400 (because you were given 400 data values).
- Sixth, take a screenshot of your final Saturday Off Frequency Distribution Table and save the screenshot as YourLastName_PSY-210_Unit11_SaturdayOff_Frequency.xxx (where xxx is the file type, for example, .jpg, .png, .jpeg, and the like).
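If you’d like to double-check your spreadsheet formulas, the same counting and even-split division can be sketched in Python using only the standard library. This is a minimal illustration, not the required method: the function name `frequency_table_with_expected` and the raw data are hypothetical. The optional `categories` argument lets a category with zero observations still count toward the number of categories.

```python
from collections import Counter

def frequency_table_with_expected(values, categories=None):
    """Return (category, observed, null expected) rows plus both column totals."""
    observed = Counter(values)                      # absolute (observed) frequency per category
    if categories is None:
        categories = sorted(observed)               # only categories that actually appear
    total = sum(observed[c] for c in categories)    # Observed Frequency Total
    expected_each = total / len(categories)         # even split predicted by the null hypothesis
    rows = [(c, observed[c], expected_each) for c in categories]
    return rows, total, expected_each * len(categories)

# Hypothetical raw data: one employee name per Saturday off (not a real assigned data set)
saturdays_off = ["Albert"] * 110 + ["Camilla"] * 95 + ["Jimmy"] * 90 + ["Susan"] * 105
rows, observed_total, expected_total = frequency_table_with_expected(saturdays_off)
for category, obs, exp in rows:
    print(f"{category:<8} observed = {obs:>3}  null expected = {exp:.1f}")
print(f"Totals: observed = {observed_total}, null expected = {expected_total:.1f}")
```

As in the spreadsheet, the null expected total should equal the observed total (here 400).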
- Last step!
- First, choose ONE of these online chi-square goodness-of-fit calculators:
- Second, using the values in your Saturday Off Frequency Distribution Table, fill in the online chi-square goodness-of-fit calculator with the following:
- your Categories (if required),
- your Observed Frequencies (required), and
- your Null Expected Frequencies (required).
- Third, after clicking the “calculate” button, take a screenshot of the chi-square statistic and p-value (and degrees of freedom if provided), and name the screenshot YourLastName_PSY-210_Unit11_SaturdayOff_Chi-Square.xxx (where xxx is the file type, for example, .jpg, .png, .jpeg, and the like).
- From the chi-square statistic and p-value you calculated on your Saturday Off Data Set, can you reject the null hypothesis that the observed frequencies did not differ from the expected frequencies (an even split)?
- Remember from Unit 8, if the p-value is low enough (e.g., p < .050), we can reject the null hypothesis (of an even split).
- If the p-value is not low enough (e.g., p ≥ .050), we cannot reject the null hypothesis (of an even split).
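The online calculator does this work for you, but a minimal standard-library Python sketch can show what happens behind the scenes and help you check its output. It assumes an even-split null hypothesis and a positive integer number of degrees of freedom; the function names (`chi_square_sf`, `goodness_of_fit`) and the counts are hypothetical. The p-value uses the closed-form upper-tail probability of the chi-square distribution that exists for integer degrees of freedom.

```python
import math

def chi_square_sf(x, df):
    """P(X >= x) for a chi-square variable with a positive integer df (closed forms)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + 2*phi(sqrt(x)) * sum_{i=1}^{(df-1)/2} x^(i - 1/2) / (1*3*...*(2i-1))
    series, term = 0.0, math.sqrt(x)
    for i in range(1, (df - 1) // 2 + 1):
        if i > 1:
            term *= x / (2 * i - 1)
        series += term
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)   # standard normal density at sqrt(x)
    return math.erfc(math.sqrt(x / 2)) + 2 * phi * series

def goodness_of_fit(observed, alpha=0.05):
    """Chi-square goodness-of-fit test against an even split (the null hypothesis)."""
    k = len(observed)
    expected = sum(observed) / k                      # null expected frequency per category
    chi2 = sum((o - expected) ** 2 / expected for o in observed)
    df = k - 1
    p = chi_square_sf(chi2, df)
    return chi2, df, p, p < alpha                     # last value: True means "reject the null"

# Hypothetical observed counts (not from an assigned data set)
chi2, df, p, reject = goodness_of_fit([110, 95, 90, 105])
print(f"chi-square({df}) = {chi2:.3f}, p = {p:.3f}, reject null: {reject}")
```

Small differences from an online calculator’s result usually come from rounding.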
- To get more experience calculating chi-square goodness-of-fit tests:
- First, imagine the following scenario (which was also created by Professor Richard Landers of the University of Minnesota):
- You run a successful store at which you’re always eager to introduce new products.
- Therefore, you recently offered samples of three new products to every customer who entered your store.
- You then asked your customers to choose which product they preferred. You recorded these preferences for Product A, Product B, and Product C.
- To examine whether any of the products are more likely to be chosen, you will conduct a chi-square goodness-of-fit test.
- Second, remembering your unique data set number, which you were given in Unit 3, download your unique Product ABC Data Set.
- If you’re using the browser Chrome or the browser Firefox, click on the link for your data set, below. When prompted, save the file to your PSY-210_Summer2020_Unit11 folder.
- If you’re using the browser Safari, right-click on the link for your data set, below, and select “Download Linked File.”
- Third, import your unique Product ABC Data Set .csv file into a blank spreadsheet in your chosen data management platform.
- Save the new spreadsheet in which you have imported your unique Product ABC Data Set, naming the file, YourLastName_PSY-210_Unit11_ProductABC_Data
- Fourth, using your newly created Product ABC Data Set spreadsheet, create a Frequency Distribution Table for Discrete Data.
- Initially, your Product ABC Data Set Frequency Distribution Table should resemble one of the example screenshots, although your frequencies will differ from the examples because of the unique data set you were assigned.
- Fifth, create three additional columns in your Product ABC Data Set Frequency Distribution Table, so that it resembles one of the example screenshots (again, your frequencies will differ from the examples because of the unique data set you were assigned).
- As before, the first additional column you’ll create is another list of your Categories.
- As before, the second additional column you’ll create is another copy of your Absolute Frequencies, now labeled Observed Frequency, because your absolute frequencies are the frequencies you observed in this data set.
- As before, the third additional column you’ll create is your Null Expected Frequency.
- To calculate each category’s Null Expected Frequency, you’ll again write a formula that divides your Observed Frequency Total (e.g., 300) by the total number of categories; however, the total number of categories for the Product ABC Data Set is 3.
- The total number of categories in this data set is 3 because there are three products: Product A, Product B, and Product C.
- As before, each Null Expected Frequency is the total number of data values (e.g., 300) divided by the total number of categories (e.g., 3) because the null hypothesis predicts an even split.
- Be sure to calculate a Total of your Null Expected Frequencies and ensure that total equals 300 (because you were given 300 data values).
- Sixth, take a screenshot of your final Product ABC Frequency Distribution Table and save the screenshot as YourLastName_PSY-210_Unit11_ProductABC_Frequency.xxx (where xxx is the file type, for example, .jpg, .png, .jpeg, and the like).
- Calculate the chi-square goodness-of-fit statistic for your Product ABC observed versus expected frequencies using ONE of the (above listed) online chi-square goodness-of-fit calculators.
- First, you must use a different online calculator than you used before (for your Saturday Off data).
- Second, after clicking the “calculate” button, take a screenshot of the chi-square statistic and p-value (and degrees of freedom, if provided), and name the screenshot YourLastName_PSY-210_Unit11_ProductABC_Chi-Square.xxx (where xxx is the file type, for example, .jpg, .png, .jpeg, and the like).
- Third, from the chi-square statistic and p-value you calculated on your Product ABC Data Set, can you reject the null hypothesis that the observed frequencies did not differ from the expected frequencies (an even split)?
- Remember from Unit 8, if the p-value is low enough (e.g., p < .050), we can reject the null hypothesis (of an even split).
- If the p-value is not low enough (e.g., p ≥ .050), we cannot reject the null hypothesis (of an even split).
- Go to Unit 11: Assignment #1 Discussion Board and create a new post in which you do the following:
- First, in the first sentence of your Discussion Board post, state your unique data set number (e.g., “My unique data set number is 001”).
- Second, embed the screenshot of your final Saturday Off Frequency Distribution Table (the screenshot you named YourLastName_PSY-210_Unit11_SaturdayOff_Frequency.xxx).
- Third, embed the screenshot of the Saturday Off chi-square goodness-of-fit calculation you made with the online calculator (the screenshot you named YourLastName_PSY-210_Unit11_SaturdayOff_Chi-Square.xxx).
- Fourth, report the chi-square statistic and p-value.
- Can you reject the null hypothesis of an even split?
- Fifth, embed the screenshot of your final Product ABC Frequency Distribution Table (the screenshot you named YourLastName_PSY-210_Unit11_ProductABC_Frequency.xxx).
- Sixth, embed the screenshot of the Product ABC chi-square goodness-of-fit calculation you made with the online calculator (the screenshot you named YourLastName_PSY-210_Unit11_ProductABC_Chi-Square.xxx).
- Seventh, report the chi-square statistic and p-value.
- Can you reject the null hypothesis of an even split?
- Eighth, name three specific instances in your past, present, or future in which you think it would have been, or will be, useful to conduct a chi-square goodness-of-fit test.
Unit 11: Assignment #2 (due before 11:59 pm Central on MON JUL 20):
- In this assignment, you’ll learn how to use chi-square to conduct what’s known as a “test of independence.”
- First, read an excerpt from Frost’s (no date) article, “How the Chi-Square Test of Independence Works.” While reading this excerpt, make sure you understand the following:
- “A chi-square test of independence determines whether a relationship exists between two discrete (categorical) variables.”
- “If the two discrete variables are dependent, then the frequencies of one variable will depend upon the frequencies of the other variable.”
- “If the two variables are independent, then the frequencies of one variable do not depend on the frequencies of the other variable.”
- Second, read Poldrack’s (2020) Chapter 12, “Contingency Tables and the Chi-Square Test of Independence.” While reading this chapter section, make sure you understand the following:
- A chi-square test of independence allows us to test whether two discrete measures are related to, or contingent on, one another.
- The null hypothesis of a chi-square test of independence predicts that the two measures will be independent.
- The standard way to prepare data for a chi-square test of independence is by creating a Contingency Frequency Table, which presents the frequency of observations that fall into each possible combination of categories (each contingency).
- To compute the degrees of freedom of a chi-square test of independence we use the formula df = (the number of Rows in our Contingency Frequency Table minus 1) * (the number of Columns in our Contingency Frequency Table minus 1).
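The df formula above can be written compactly alongside the standard way expected frequencies are computed for a test of independence. Here r and c are the numbers of rows and columns, O_ij and E_ij are the observed and expected frequencies in row i and column j, R_i and C_j are the row and column totals, and N is the grand total. The worked df values use the table shapes from this Unit’s two scenarios:

```latex
E_{ij} = \frac{R_i \times C_j}{N}, \qquad
\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
df = (r - 1)(c - 1)

% Clothing Sales: 3 store locations x 3 price ranges
df = (3 - 1)(3 - 1) = 4
% Employee Turnover: 2 statuses (current, former) x 3 departments
df = (2 - 1)(3 - 1) = 2
```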
- Now, you’ll learn how to make a Contingency Frequency Table.
- First, complete Andrews’ (2020) tutorial “Using Excel’s [Google Sheets’, and Numbers’] COUNTIFS Function to Make a Contingency Frequency Table for Discrete Data.”
- Although you aren’t required to take a screenshot of the Contingency Frequency Table you create while working through this tutorial, it’s definitely in your best interest to make sure you work through the entire tutorial.
- You’ll need to know how to make a Contingency Frequency Table to complete the rest of this assignment.
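A spreadsheet COUNTIFS formula counts the rows that meet several criteria at once, so a grid of them yields one count per combination of categories. As a rough standard-library Python analogue (a sketch, not part of Andrews’ tutorial), the same grid can be built by counting (row category, column category) pairs. The function name `contingency_table`, the field names `"store"` and `"price"`, and the records are hypothetical:

```python
from collections import Counter

def contingency_table(records, row_key, col_key):
    """Count records in each (row category, column category) pair, like a grid of COUNTIFS."""
    counts = Counter((r[row_key], r[col_key]) for r in records)
    rows = sorted({r[row_key] for r in records})
    cols = sorted({r[col_key] for r in records})
    table = [[counts[(row, col)] for col in cols] for row in rows]   # missing pairs count as 0
    return rows, cols, table

# Hypothetical records; the field names are illustrative only
records = [
    {"store": "East", "price": "Budget"},
    {"store": "East", "price": "Budget"},
    {"store": "West", "price": "High Fashion"},
    {"store": "South", "price": "Mid-Range"},
    {"store": "West", "price": "Budget"},
]
rows, cols, table = contingency_table(records, "store", "price")
print("", *cols, sep="\t")
for name, counts in zip(rows, table):
    print(name, *counts, sep="\t")
```

The cells of a correct table always add up to the number of records.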
- Next, you’ll get some experience calculating a chi-square test of independence, which, as you know from reading Poldrack’s chapter, requires making a Contingency Frequency Table.
- First, imagine the following scenario (which was also created by Professor Richard Landers of the University of Minnesota):
- You own three clothing stores at three locations: your East Store, your South Store, and your West Store.
- At each of your three stores’ locations, you sell three price ranges of clothes: Budget Items, Mid-Range Items, and High Fashion Items.
- You’d like to know whether sales of these differently priced clothes depend on the locations of the stores; therefore, you conduct a chi-square test of independence.
- Second, remembering your unique data set number, which you were given in Unit 3, download your unique Clothing Sales Data Set.
- If you’re using the browser Chrome or the browser Firefox, click on the link for your data set, below. When prompted, save the file to your PSY-210_Summer2020_Unit11 folder.
- If you’re using the browser Safari, right-click on the link for your data set, below, and select “Download Linked File.”
- Third, import your unique Clothing Sales Data Set .csv file into a blank spreadsheet in your chosen data management platform.
- Save the new spreadsheet in which you have imported your unique Clothing Sales Data Set, naming the file, YourLastName_PSY-210_Unit11_ClothingSales_Data
- Fourth, using your newly created Clothing Sales Data Set spreadsheet and based on what you learned in Andrews’ (2020) tutorial, create a Contingency Frequency Table for your Clothing Sales Data Set.
- Your Clothing Sales Contingency Frequency Table should resemble the example screenshot, although your frequencies will differ from the example because of the unique data set you were assigned.
- Fifth, take a screenshot of your Clothing Sales Contingency Frequency Table and save the screenshot as YourLastName_PSY-210_Unit11_ClothingSales_Frequency.xxx (where xxx is the file type, for example, .jpg, .png, .jpeg, and the like).
- Now, calculate your chi-square test of independence.
- First, choose ONE of these online chi-square test of independence calculators:
- Second, using the Absolute Frequencies in your Clothing Sales Contingency Frequency Table, fill in the online chi-square test of independence calculator:
- Third, after clicking the “calculate” button, take a screenshot of the chi-square statistic and p-value (and degrees of freedom if provided), and name the screenshot YourLastName_PSY-210_Unit11_ClothingSales_Chi-Square.xxx (where xxx is the file type, for example, .jpg, .png, .jpeg, and the like).
- From the chi-square statistic and p-value you calculated on your Clothing Sales Data Set, can you reject the null hypothesis that the two variables (store location and clothing price range) are independent?
- Remember from Unit 8, if the p-value is low enough (e.g., p < .050), we can reject the null hypothesis (of independence).
- If the p-value is not low enough (e.g., p ≥ .050), we cannot reject the null hypothesis (of independence).
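Again, the online calculator does the work, but a minimal standard-library Python sketch can help you double-check it. It computes each expected frequency as row total × column total / grand total, sums the chi-square terms, and uses df = (rows - 1) × (columns - 1); the p-value helper uses the closed-form chi-square tail for integer df. The function names (`chi_square_sf`, `independence_test`) and the 3 × 3 counts are hypothetical, not from an assigned data set:

```python
import math

def chi_square_sf(x, df):
    """P(X >= x) for a chi-square variable with a positive integer df (closed forms)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    series, term = 0.0, math.sqrt(x)
    for i in range(1, (df - 1) // 2 + 1):
        if i > 1:
            term *= x / (2 * i - 1)
        series += term
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    return math.erfc(math.sqrt(x / 2)) + 2 * phi * series

def independence_test(table, alpha=0.05):
    """Chi-square test of independence on observed counts given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # E = row total * column total / N
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    p = chi_square_sf(chi2, df)
    return chi2, df, p, p < alpha                         # last value: True means "reject the null"

# Hypothetical counts: rows = East, South, West; columns = Budget, Mid-Range, High Fashion
clothing = [
    [40, 30, 30],
    [35, 35, 30],
    [25, 35, 40],
]
chi2, df, p, reject = independence_test(clothing)
print(f"chi-square({df}) = {chi2:.3f}, p = {p:.3f}, reject null: {reject}")
```

A common rule of thumb is that this chi-square approximation is trustworthy when every expected frequency is at least about 5.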
- Again, you’re going to get some experience calculating a chi-square test of independence.
- First, imagine the following scenario (which was also created by Professor Richard Landers of the University of Minnesota):
- You are the CEO of a large company. To reduce employee turnover (which means employees leaving your corporation), you implemented a new company-wide training program two years ago.
- However, you’re not sure if the training is equally effective in reducing employee turnover among employees who work in your service department, sales department, and warehouse.
- Therefore, you retrieved a list of all current and former employees who received the training. Your list also includes whether each current or former employee works or used to work in the service department, the sales department, or the warehouse.
- What you want to know is whether being a current versus former employee is contingent on working in the service department, the sales department, or the warehouse.
- In other words, you want to know whether employee turnover depends on the department in which the employee works (or used to work). Therefore, you conduct a chi-square test of independence.
- Second, remembering your unique data set number, which you were given in Unit 3, download your unique Employee Turnover Data Set.
- If you’re using the browser Chrome or the browser Firefox, click on the link for your data set, below. When prompted, save the file to your PSY-210_Summer2020_Unit11 folder.
- If you’re using the browser Safari, right-click on the link for your data set, below, and select “Download Linked File.”
- Third, import your unique Employee Turnover Data Set .csv file into a blank spreadsheet in your chosen data management platform.
- Save the new spreadsheet in which you have imported your unique Employee Turnover Data Set, naming the file, YourLastName_PSY-210_Unit11_EmployeeTurnover_Data
- Fourth, using your newly created Employee Turnover Data Set spreadsheet and based on what you learned in Andrews’ (2020) tutorial, create a Contingency Frequency Table for your Employee Turnover Data Set.
- Your Employee Turnover Contingency Frequency Table should resemble the example screenshot, although your frequencies will differ from the example because of the unique data set you were assigned.
- Fifth, take a screenshot of your Employee Turnover Contingency Frequency Table and save the screenshot as YourLastName_PSY-210_Unit11_EmployeeTurnover_Frequency.xxx (where xxx is the file type, for example, .jpg, .png, .jpeg, and the like).
- Calculate the chi-square test of independence statistic for your Employee Turnover Data Set using ONE of the (above listed) online chi-square test of independence calculators.
- First, you must use a different online calculator than you used before (for your Clothing Sales Data Set).
- Second, using the Absolute Frequencies in your Employee Turnover Contingency Frequency Table, fill in the online chi-square test of independence calculator.
- Third, after clicking the “calculate” button, take a screenshot of the chi-square statistic and p-value (and degrees of freedom, if provided), and name the screenshot YourLastName_PSY-210_Unit11_EmployeeTurnover_Chi-Square.xxx (where xxx is the file type, for example, .jpg, .png, .jpeg, and the like).
- Fourth, from the chi-square statistic and p-value you calculated on your Employee Turnover Data Set, can you reject the null hypothesis that the two variables (employee turnover and the department in which the employee works or used to work) are independent?
- Remember from Unit 8, if the p-value is low enough (e.g., p < .050), we can reject the null hypothesis (of independence).
- If the p-value is not low enough (e.g., p ≥ .050), we cannot reject the null hypothesis (of independence).
- Go to the Unit 11: Assignment #2 Discussion Board and do the following:
- First, in the first sentence of your Discussion Board post, state your unique data set number (e.g., “My unique data set number is 001”).
- Second, embed the screenshot of your final Clothing Sales Contingency Frequency Table (the screenshot you named YourLastName_PSY-210_Unit11_ClothingSales_Frequency.xxx).
- Third, embed the screenshot of the Clothing Sales chi-square test of independence calculation you made with the online calculator (the screenshot you named YourLastName_PSY-210_Unit11_ClothingSales_Chi-Square.xxx).
- Fourth, report the chi-square statistic and p-value.
- Can you reject the null hypothesis of independence?
- Fifth, embed the screenshot of your final Employee Turnover Contingency Frequency Table (the screenshot you named YourLastName_PSY-210_Unit11_EmployeeTurnover_Frequency.xxx).
- Sixth, embed the screenshot of the Employee Turnover chi-square test of independence calculation you made with the online calculator (the screenshot you named YourLastName_PSY-210_Unit11_EmployeeTurnover_Chi-Square.xxx).
- Seventh, report the chi-square statistic and p-value.
- Can you reject the null hypothesis of independence?
- Eighth, name three specific instances in your past, present, or future in which you think it would have been, or will be, useful to conduct a chi-square test of independence.
Unit 11: Assignment #3 (due before 11:59 pm Central on TUE JUL 21):
- In the second half of this Unit, you’ll learn how to evaluate relationships between continuous variables.
- First, remember, as you learned back in Unit 2, that continuous measurements can fall anywhere in an infinite range of values. For example, your height, the length of your foot, and the amount of sleep you got last night are all continuous measurements.
- Second, refresh your memory about correlations by re-reading Investopedia’s (no date) article, “Correlation Coefficient.” While reading this excerpt, make sure you understand the following:
- “The correlation coefficient is a statistical measure of the strength of the relationship between two continuous variables.”
- “The values of a Pearson correlation coefficient range between -1.000 and 1.000.”
- “A correlation of -1.000 shows a perfect negative correlation, while a correlation of 1.000 shows a perfect positive correlation. A correlation of 0.000 shows no linear relationship between the two variables.”
- “The strength of a relationship is indicated by the magnitude of the correlation coefficient.”
- Third, read Poldrack’s (2020) Chapter 13 “Modeling Continuous Relationships.” While reading this excerpt, make sure you understand the following:
- “One way to quantify the relationship between two continuous variables is by calculating their covariance.”
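- The variance measures how far one variable’s values deviate from their mean; the covariance measures how two variables’ deviations from their respective means vary together.
- We don’t usually use the covariance itself to describe the relationship between two variables, because its size depends on the variables’ units of measurement. Instead, we use the correlation coefficient, which divides the covariance by the product of the two variables’ standard deviations so that it always falls between -1.000 and 1.000.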
- After calculating a correlation coefficient, we can test the null hypothesis, which predicts that the correlation coefficient is 0.000.
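To make the link between covariance and correlation concrete, here is a minimal standard-library Python sketch. It assumes sample statistics (dividing by N - 1), and the function names and the height and foot-length values are hypothetical. Pearson r is computed as the covariance divided by the product of the two standard deviations:

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def covariance(xs, ys):
    """Sample covariance: sum of cross-products of deviations, divided by N - 1."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def pearson_r(xs, ys):
    """Pearson r: covariance divided by the product of the two standard deviations."""
    sx = math.sqrt(covariance(xs, xs))   # a variable's covariance with itself is its variance
    sy = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sx * sy)

# Hypothetical heights (feet) and foot lengths (inches)
height = [5.2, 5.5, 5.8, 6.0, 6.3]
foot = [9.0, 9.6, 10.1, 10.4, 11.2]
print(f"covariance = {covariance(height, foot):.3f}")
print(f"r = {pearson_r(height, foot):.3f}")
```

Converting the foot lengths from inches to centimeters multiplies the covariance by 2.54 but leaves r unchanged, which is why correlation coefficients, not covariances, are usually reported.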
- Now, you’ll learn how to compute a correlation coefficient.
- First, search the Internet for a tutorial or how-to guide to teach you how to calculate a Pearson correlation coefficient using your chosen data management platform.
- The how-to guide you find can be in any format (e.g., video, written text, figures, or a combination of formats).
- However, the how-to guide you find must be from the Internet and not from other sources (e.g., textbooks or friends).
- Remember that it’s important to learn to use Google to find out how to do something you don’t know how to do (most data scientists frequently use Google to learn, or remind themselves, how to do things).
- Be sure to write down the URL of the tutorial or how-to guide you find and use.
- Second, download your classmates’ Height-Foot Length Data Set.
- If you’re using the browser Chrome or the browser Firefox, click on the link for your data set. When prompted, save the file to your PSY-210_Summer2020_Unit11 folder.
- If you’re using the browser Safari, right-click on the link for your data set and select “Download Linked File.”
- Third, import your classmates’ Height-Foot Length Data Set .csv file into a blank spreadsheet in your chosen data management platform.
- Remember to follow Andrews’ (2020) Import Data how-to article.
- Save the new spreadsheet in which you have imported your classmates’ Height-Foot Length Data Set, naming the file, YourLastName_PSY-210_Unit11_HeightFootLength_Data
- Fourth, using your classmates’ Height-Foot Length Data Set, calculate the following:
- the mean of your classmates’ Height (in feet)
- the standard deviation of your classmates’ Height (in feet)
- the N, meaning the sample size, which is the number of students in your class who reported their Height (in feet)
- the mean of your classmates’ Foot Length (in inches)
- the standard deviation of your classmates’ Foot Length (in inches)
- the N, meaning the sample size, which is the number of students in your class who reported their Foot Length (in inches)
- the Pearson correlation coefficient between your classmates’ Height (in feet) and their Foot Length (in inches)
- Fifth, take a screenshot of your classmates’ Height-Foot Length Data Set means, standard deviations, Ns, and Pearson correlation coefficient and save the screenshot as YourLastName_PSY-210_Unit11_HeightFootLength_Stats.xxx (where xxx is the file type, for example, .jpg, .png, .jpeg, and the like).
- To get more practice computing correlation coefficients:
- First, download your classmates’ Height-Sleep Data Set.
- If you’re using the browser Chrome or the browser Firefox, click on the link for your data set. When prompted, save the file to your PSY-210_Summer2020_Unit11 folder.
- If you’re using the browser Safari, right-click on the link for your data set and select “Download Linked File.”
- Second, import your classmates’ Height-Sleep Data Set .csv file into a blank spreadsheet in your chosen data management platform.
- Save the new spreadsheet in which you have imported your classmates’ Height-Sleep Data Set, naming the file, YourLastName_PSY-210_Unit11_HeightSleep_Data
- Third, using your classmates’ Height-Sleep Data Set, calculate the following:
- the mean of your classmates’ Height (in feet); you can use this calculation to check the mean you previously calculated for this variable
- the standard deviation of your classmates’ Height (in feet); you can use this calculation to check the standard deviation you previously calculated for this variable
- the N, meaning the sample size, which is the number of students in your class who reported their Height (in feet); you can use this calculation to check the N you previously calculated for this variable.
- the mean of your classmates’ previous night of Sleep (in hours)
- the standard deviation of your classmates’ previous night of Sleep (in hours)
- the N, meaning the sample size, which is the number of students in your class who reported their previous night of Sleep (in hours)
- the Pearson correlation coefficient between your classmates’ Height (in feet) and their previous night of Sleep (in hours)
- Fourth, take a screenshot of your classmates’ Height-Sleep Data Set means, standard deviations, Ns, and Pearson correlation coefficient and save the screenshot as YourLastName_PSY-210_Unit11_HeightSleep_Stats.xxx (where xxx is the file type, for example, .jpg, .png, .jpeg, and the like)
- To get even more practice calculating correlation coefficients:
- First, download this Temp-Coffee-Juice Data Set, which comprises the Daily Maximum Temperature (in Fahrenheit), the daily Coffee Sales (in US dollars), and the daily Juice Sales (in US dollars) at another university’s student-run cafe (made available by Penn State Online Statistics).
- If you’re using the browser Chrome or the browser Firefox, click on the link for your data set. When prompted, save the file to your PSY-210_Summer2020_Unit11 folder.
- If you’re using the browser Safari, right-click on the link for your data set and select “Download Linked File.”
- Second, import the Temp-Coffee-Juice Data Set .csv file into a blank spreadsheet in your chosen data management platform.
- Save the new spreadsheet in which you have imported the Temp-Coffee-Juice Data Set, naming the file, YourLastName_PSY-210_Unit11_Temp-Coffee-Juice_Data
- Third, using the Temp-Coffee-Juice Data Set, calculate the following:
- the mean of the Daily Maximum Temperature (in Fahrenheit)
- the standard deviation of the Daily Maximum Temperature (in Fahrenheit)
- the N, meaning the sample size, which is the number of days for which the Daily Maximum Temperature (in Fahrenheit) was recorded
- the mean of the daily Coffee Sales (in US dollars)
- the standard deviation of the daily Coffee Sales (in US dollars)
- the N, meaning the sample size, which is the number of days for which the daily Coffee Sales (in US dollars) were recorded
- the mean of the daily Juice Sales (in US dollars)
- the standard deviation of the daily Juice Sales (in US dollars)
- the N, meaning the sample size, which is the number of days for which the daily Juice Sales (in US dollars) were recorded
- the Pearson correlation coefficient between the Daily Maximum Temperature (in Fahrenheit) and the daily Coffee Sales (in US dollars)
- the Pearson correlation coefficient between the Daily Maximum Temperature (in Fahrenheit) and the daily Juice Sales (in US dollars)
- Fourth, take a screenshot of the Temp-Coffee-Juice Data Set means, standard deviations, Ns, and Pearson correlation coefficients and save the screenshot as YourLastName_PSY-210_Unit11_Temp-Coffee-Juice_Stats.xxx (where xxx is the file type, for example, .jpg, .png, .jpeg, and the like).
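The calculations above are meant to be done in your chosen data management platform. If you want a way to cross-check them, here is a minimal Python sketch of the same quantities. It assumes the sample standard deviation (dividing by N - 1, as spreadsheet functions such as STDEV.S do), and the numbers in it are small hypothetical values, not the actual Temp-Coffee-Juice Data Set; the helper names `pearson_r` and `describe` are illustrative.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, N >= 2")
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    # Sum of cross-products of each value's deviation from its own mean
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def describe(values):
    """Mean, sample standard deviation (N - 1 denominator), and N."""
    return statistics.mean(values), statistics.stdev(values), len(values)


# Hypothetical example values (NOT the real data set)
max_temp = [40, 55, 62, 70, 85]        # degrees Fahrenheit
coffee_sales = [120, 100, 95, 80, 60]  # US dollars
juice_sales = [10, 18, 22, 30, 41]     # US dollars

for name, column in [("Temperature", max_temp), ("Coffee", coffee_sales), ("Juice", juice_sales)]:
    mean, sd, n = describe(column)
    print(f"{name}: mean = {mean:.3f}, SD = {sd:.3f}, N = {n}")

print(f"r(temperature, coffee) = {pearson_r(max_temp, coffee_sales):.3f}")
print(f"r(temperature, juice)  = {pearson_r(max_temp, juice_sales):.3f}")
```

If your spreadsheet reports a slightly different standard deviation, check whether it is dividing by N (a population standard deviation) instead of N - 1.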
- Now, it’s time to learn how to test the null hypothesis associated with each of the correlation coefficients you calculated.
- First, make sure you have calculated FOUR correlation coefficients:
- the correlation between your classmates’ Height (in feet) and their Foot Length (in inches)
- the correlation between your classmates’ Height (in feet) and their previous night of Sleep (in hours)
- the correlation between the Daily Maximum Temperature (in Fahrenheit) and the daily Coffee Sales (in US dollars)
- the correlation between the Daily Maximum Temperature (in Fahrenheit) and the daily Juice Sales (in US dollars)
- Second, remember from Poldrack’s (2020) chapter that for any correlation coefficient, the null hypothesis is that the correlation in the population is 0.000.
- Third, using ONE of the below calculators, find the p-value of EACH of the four correlation coefficients you have calculated:
- Fourth, when using the above calculators:
- The N or Sample Size is based on the number of students (classmates) who reported their height, foot length, and sleep OR the number of days for which the temperature, the coffee sales, and the juice sales were recorded.
- If you have the choice, choose a two-sided test (also called a two-tailed test), because we did not have a directional alternative hypothesis.
- Record the p-value for EACH correlation.
- Fifth, for EACH p-value you recorded, decide whether you can reject the null hypothesis that the correlation is 0.000.
- Remember from Unit 8 that if the p-value is low enough (e.g., p < .050), we can reject the null hypothesis that the correlation is 0.000 (that is, the null hypothesis that the variables are not related).
- If the p-value is not low enough (e.g., p ≥ .050), we cannot reject the null hypothesis of a 0.000 correlation.
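The calculators do this step for you. As a hedged sketch of what they most likely compute (an assumption, because the specific calculators are not named here): the standard test converts r into t = r * sqrt((N - 2) / (1 - r^2)) with N - 2 degrees of freedom and reads the two-tailed p-value from Student's t distribution. The Python below uses only the standard library; `two_tailed_p` uses the closed-form series for integer degrees of freedom given in Abramowitz and Stegun's Handbook of Mathematical Functions (section 26.7), and both function names are illustrative.

```python
import math


def two_tailed_p(t, df):
    """Exact two-tailed p-value for Student's t with an integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    theta = math.atan(abs(t) / math.sqrt(df))
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    cos2 = cos_t * cos_t
    series = 0.0
    if df % 2 == 1:
        # Odd df: series terms cos, (2/3)cos^3, (2*4)/(3*5)cos^5, ...
        term = cos_t
        for j in range(1, (df - 1) // 2 + 1):
            series += term
            term *= cos2 * (2 * j) / (2 * j + 1)
        inside = (2 / math.pi) * (theta + sin_t * series)
    else:
        # Even df: series terms 1, (1/2)cos^2, (1*3)/(2*4)cos^4, ...
        term = 1.0
        for j in range(1, df // 2 + 1):
            series += term
            term *= cos2 * (2 * j - 1) / (2 * j)
        inside = sin_t * series
    # 'inside' is P(-|t| < T < |t|); the two tails together hold the rest
    return max(0.0, 1.0 - inside)


def correlation_p_value(r, n):
    """Two-tailed p-value for H0: the population correlation is 0 (df = n - 2)."""
    if n < 3:
        raise ValueError("need at least 3 pairs")
    if abs(r) >= 1:
        return 0.0  # t is infinite when |r| = 1, so p is 0
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return two_tailed_p(t, n - 2)


# Hypothetical example: r = 0.500 from N = 12 pairs
p = correlation_p_value(0.500, 12)
print(f"p = {p:.3f}; reject H0 at .050? {p < 0.050}")
```

Notice that the same r can be significant with a large N and not significant with a small N, which is why the calculators ask for the sample size.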
- Go to the Unit 11: Assignment #3 Discussion Board and do the following:
- First, embed the screenshot of your classmates’ Height-Foot Length Data Set means, standard deviations, Ns, and Pearson correlation coefficient (YourLastName_PSY-210_Unit11_HeightFootLength_Stats.xxx).
- Report the Pearson correlation coefficient between your classmates’ Height and Foot Length.
- Report the Pearson correlation coefficient’s p-value.
- State whether you can reject the null hypothesis of a zero correlation.
- Second, embed the screenshot of your classmates’ Height-Sleep Data Set means, standard deviations, Ns, and Pearson correlation coefficient (YourLastName_PSY-210_Unit11_HeightSleep_Stats.xxx).
- Report the Pearson correlation coefficient between your classmates’ Height and Sleep.
- Report the Pearson correlation coefficient’s p-value.
- State whether you can reject the null hypothesis of a zero correlation.
- Third, embed the screenshot of the Temp-Coffee-Juice Data Set means, standard deviations, Ns, and Pearson correlation coefficients (YourLastName_PSY-210_Unit11_Temp-Coffee-Juice_Stats.xxx).
- Report the Pearson correlation coefficient between the Daily Maximum Temperature and the daily Coffee Sales.
- Report the Pearson correlation coefficient’s p-value.
- State whether you can reject the null hypothesis of a zero correlation.
- Report the Pearson correlation coefficient between the Daily Maximum Temperature and the daily Juice Sales.
- Report the Pearson correlation coefficient’s p-value.
- State whether you can reject the null hypothesis of a zero correlation.
Unit 11: Assignment #4 (due before 11:59 pm Central on TUE JUL 21):
- In this assignment, you’re going to learn how to make Scatter Plots.
- First, to learn what a Scatter Plot is, read an excerpt from Khan Academy’s (No Date) article, “Scatter Plots.” While reading this excerpt, make sure you understand the following:
- In a Scatter Plot, each pair of values in the data set gets plotted as one point whose x-coordinate represents one variable’s value and whose y-coordinate represents the other variable’s value.
- For example, in the example Scatter Plot in Khan Academy’s article, each dot represents one of the 23 students, plotting that student’s quiz score against that same student’s shoe size.
- Second, to learn more about Scatter Plots, read an excerpt from Math Is Fun’s (2017) article, “Scatter Plots.” While reading this excerpt, make sure you understand the following:
- Scatter Plots are also called X-Y Plots because they display the relationship between two sets of data, which are plotted using Cartesian (x,y) coordinates.
- For example, in the first example Scatter Plot in Math Is Fun’s article, each dot represents one of the 11 students, plotting that student’s height against that same student’s weight.
- As another example, in the second example Scatter Plot in Math Is Fun’s article, each dot represents one of the 12 days, plotting that day’s temperature against that same day’s ice cream sales.
- Now, search the Internet for a tutorial or how-to guide to teach you how to make Scatter Plots (often called X-Y Plots) using your chosen data management platform.
- The how-to guide you find can be in any format (e.g., video, written text, figures, or the like — or a combination of formats).
- However, the how-to guide you find must be from the Internet and not from other sources (e.g., textbooks or friends).
- Next, create a Scatter Plot for each of FOUR data pairs for which you calculated Pearson correlation coefficients in Unit 11: Assignment #3:
- ONE: the correlation between your classmates’ Height (in feet) and their Foot Length (in inches)
- TWO: the correlation between your classmates’ Height (in feet) and their previous night of Sleep (in hours)
- THREE: the correlation between the Daily Maximum Temperature (in Fahrenheit) and daily Coffee Sales (in US dollars)
- FOUR: the correlation between the Daily Maximum Temperature (in Fahrenheit) and daily Juice Sales (in US dollars)
- Remember that a “good graph” includes these four major components:
- a Graph Title
- Axis Labels
- Graph Units
- Graph Data
- For the two Scatter Plots that present your classmates’ Height (in feet):
- Use the y-axis to represent your classmates’ Height (in feet), and adjust the y-axis to a minimum of 4.000 feet and a maximum of 7.000 feet.
- You can refresh your memory on how to change the y-axis by reading this handout.
- For one Scatter Plot, use the x-axis to represent your classmates’ Foot Length (in inches), and adjust the x-axis to a minimum of 7.000 inches and a maximum of 13.000 inches.
- For the other Scatter Plot, use the x-axis to represent your classmates’ Sleep (in hours), and adjust the x-axis to a minimum of 0.000 hours and a maximum of 16.000 hours.
- For the two Scatter Plots that present Daily Maximum Temperature (in Fahrenheit):
- Use the y-axis to represent the Daily Maximum Temperature (in Fahrenheit), and adjust the y-axis to a minimum of 0.000 (degrees) Fahrenheit and a maximum of 90.000 (degrees) Fahrenheit.
- For one Scatter Plot, use the x-axis to represent daily Coffee Sales (in US dollars), and adjust the x-axis to a minimum of 0.000 dollars and a maximum of 140.000 dollars.
- For the other Scatter Plot, use the x-axis to represent daily Juice Sales (in US dollars), and adjust the x-axis to a minimum of 0.000 dollars and a maximum of 45.000 dollars.
- Save each of the four Scatter Plots you created as a screenshot named YourLastName_PSY-210_Unit11_YYY_ScatterPlot.xxx (where YYY is the data set and xxx is the file type, for example, .jpg, .png, .jpeg, and the like).
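The tutorial you find will use your platform's own chart tools. To make the underlying idea concrete (each pair of values becomes one point, and the axis minimum and maximum are fixed in advance), here is a small text-only Python sketch. `text_scatter` is a hypothetical helper, not a feature of any spreadsheet program, and the Height-Foot Length values in it are made up.

```python
def text_scatter(xs, ys, x_min, x_max, y_min, y_max, width=40, height=12):
    """Return a character-grid scatter plot; each (x, y) pair becomes one '*'."""
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # points outside the fixed axis limits are not drawn
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        # Row 0 is the top of the plot, so larger y values get smaller row numbers
        row = round((y_max - y) / (y_max - y_min) * (height - 1))
        grid[row][col] = "*"
    lines = ["|" + "".join(cells) for cells in grid]  # left edge is the y-axis
    lines.append("+" + "-" * width)                   # bottom edge is the x-axis
    return "\n".join(lines)


# Hypothetical Height-Foot Length pairs (NOT real class data)
foot_length_in = [9.0, 10.5, 11.0, 12.0]  # x-axis: 7.000 to 13.000 inches
height_ft = [5.0, 5.5, 5.8, 6.2]          # y-axis: 4.000 to 7.000 feet
print(text_scatter(foot_length_in, height_ft, x_min=7.0, x_max=13.0, y_min=4.0, y_max=7.0))
```

A real chart adds the Graph Title, Axis Labels, and Graph Units that this bare sketch leaves out.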
- Return to the excerpt from Math Is Fun’s (2017) article, “Scatter Plots,” and study the last page that presents seven idealized Scatter Plots.
- First, notice that the data points in Scatter Plots that represent positive correlations tilt up left-to-right.
- In contrast, the data points in Scatter Plots that represent negative correlations tilt down left-to-right.
- Second, notice that the data points in Scatter Plots that represent high correlations are much more tightly clustered around an imaginary diagonal line.
- In contrast, the data points in Scatter Plots that represent low correlations are much more scattered.
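The two patterns above (direction of tilt and tightness of clustering) can also be checked numerically. This Python sketch builds hypothetical data around the same upward line, once with small scatter and once with large scatter, plus a downward version with small scatter. A tighter cloud should give an r closer to +1 or -1, and a downward tilt should give a negative r. All values are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = list(range(1, 11))                                  # hypothetical x values 1..10
wobble = [1 if i % 2 == 0 else -1 for i in range(10)]   # +1, -1, +1, -1, ...

up_tight = [xi + 1 * w for xi, w in zip(x, wobble)]     # rising, small scatter
up_loose = [xi + 5 * w for xi, w in zip(x, wobble)]     # rising, large scatter
down_tight = [-xi + 1 * w for xi, w in zip(x, wobble)]  # falling, small scatter

for label, ys in [("up, tight", up_tight), ("up, loose", up_loose), ("down, tight", down_tight)]:
    print(f"{label:12s} r = {pearson_r(x, ys):+.3f}")
```

Plotting the three lists against x in your data management platform should show the same contrast visually.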
- Go to the Unit 11: Assignment #4 Discussion Board and do the following:
- First, embed each of the four Scatter Plots that you created.
- Second, for each Scatter Plot, identify whether it is a positive or negative correlation and whether it is a high, moderate, low, or no correlation.
Unit 11: Assignment #5 (due before 11:59 pm Central on WED JUL 22):
- Meet online with your NEW Chat Group (which you formed during Unit 8) for a one-hour text-based Group Chat at a time/date that your Chat Group previously arranged.
- BEFORE YOU MEET WITH YOUR CHAT GROUP, each member of the Chat Group must do ALL of the following:
- First, to sharpen your ability to interpret Scatter Plots, read Math Boot Camp’s (2017) article, “Reading Scatter Plots.” While reading this article, make sure you understand the following:
- The shape in a Scatter Plot can be either linear or curvilinear.
- Scatter Plots with a linear shape have points that seem to fall along a line (hence, the term linear).
- In a positive linear pattern, the imaginary line slopes UP from left to right.
- In a negative linear pattern, the imaginary line slopes DOWN from left to right.
- The strength of a correlation is shown by how tightly the points cluster around that imaginary line.
- Second, to better understand what correlations can and cannot tell us:
- Third, while watching these videos, do the following to make sure you understand why correlation cannot be used to prove causation:
- Write down at least four examples of correlation not proving causation from the videos you watched.
- One example you can write down: the correlation between the amount of ice cream purchased during each month of the year and the number of drowning deaths during each month of the year does not prove that ice cream causes drowning.
- Write down at least two examples of a correlation that might be caused by another variable.
- One example you can write down is the correlation between ice cream sales per month and drowning deaths per month; the correlation might be caused by another variable, season of the year. In such cases, the other variable is called a confounding variable.
- Write down at least two examples of a correlation that might be due to coincidence.
- One example you can write down is the correlation between pool drownings per year and Nicolas Cage films per year. That correlation is most likely simply due to coincidence.
- Fourth, other than completing the above reading and video-watching assignments and writing down your examples, DO NOT begin working on any of the steps listed below until your Chat Group begins their one-hour Group Chat.
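The ice cream and drowning example of a confounding variable can be reproduced with made-up numbers. In this Python sketch (all values hypothetical), both series are generated only from temperature plus their own unrelated noise, so by construction ice cream has no effect on drownings. The two series still correlate strongly with each other, because both follow temperature.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for four time periods (NOT real data)
temperature = [30, 50, 70, 90]  # the third variable (season)
ice_wobble = [5, -5, -5, 5]     # made-up noise for ice cream sales
drown_wobble = [1, -3, 3, -1]   # made-up noise for drownings

# Each series is built ONLY from temperature plus its own noise:
# ice cream sales never appear in the drowning formula, and vice versa.
ice_cream = [2 * t + w for t, w in zip(temperature, ice_wobble)]
drownings = [0.5 * t + w for t, w in zip(temperature, drown_wobble)]

print(f"r(ice cream, drownings)   = {pearson_r(ice_cream, drownings):.3f}")
print(f"r(temperature, ice cream) = {pearson_r(temperature, ice_cream):.3f}")
print(f"r(temperature, drownings) = {pearson_r(temperature, drownings):.3f}")
```

Looking only at the first correlation, one might wrongly conclude that ice cream and drowning are causally linked; the construction shows the link comes entirely from the confounding variable.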
- During your one-hour Group Chat:
- First, play “Guess the Correlation Coefficient Based on the Scatter Plot.”
- Instructions are included on the first page of the game.
- Every member of your Chat Group MUST make guesses about every set of four Scatter Plots before you, as a group, scroll to see the answers.
- As for who gets to make their four guesses first:
- Sum your birthdate (e.g., if you were born on March 18, 1999, your birthdate sum is 03 [March] + 18 [18th] + 99 [1999] = 120):
- For three-student Chat Groups:
- Rotate in the order of highest birthdate sum, lowest birthdate sum, neither highest nor lowest birthdate sum (i.e., Trial One: highest, lowest, neither; Trial Two: lowest, neither, highest; Trial Three: neither, highest, lowest; and so forth).
- For two-student Chat Groups:
- Rotate in the order of highest birthdate sum, lowest birthdate sum (i.e., Trial One: highest, lowest; Trial Two: lowest, highest; Trial Three: highest, lowest; and so forth).
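The birthdate-sum rule and the rotation can be sketched in a few lines of Python. Assumptions to note: the year contributes only its last two digits (so 2000 counts as 00), ties are broken by the order the names are listed (the instructions do not say how to break ties), and the group members shown are hypothetical.

```python
def birthdate_sum(month, day, year):
    """Month + day + last two digits of year, e.g. March 18, 1999 -> 3 + 18 + 99."""
    return month + day + year % 100


def guessing_order(birthdates, trial):
    """Order in which members guess on a given trial (trial 1, 2, 3, ...)."""
    # Highest sum first; sorted() is stable, so ties keep their listed order
    ranked = sorted(birthdates, key=lambda name: birthdate_sum(*birthdates[name]), reverse=True)
    if len(ranked) == 3:
        base = [ranked[0], ranked[2], ranked[1]]  # highest, lowest, neither
    elif len(ranked) == 2:
        base = ranked                             # highest, lowest
    else:
        raise ValueError("Chat Groups have two or three members")
    # Each new trial moves the previous first guesser to the end of the line
    shift = (trial - 1) % len(base)
    return base[shift:] + base[:shift]


# Hypothetical group members and birthdates (month, day, year)
group = {"Ana": (3, 18, 1999), "Ben": (12, 25, 2000), "Cy": (7, 4, 1998)}
for trial in (1, 2, 3, 4):
    print(f"Trial {trial}: {guessing_order(group, trial)}")
```

Here Ana's sum is 120, Cy's is 109, and Ben's is 37, so Trial One runs Ana, Ben, Cy (highest, lowest, neither), and the order then rotates.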
- Second, as a group rather than individually, look through the homepage of Tyler Vigen’s “Spurious Correlation” website.
- Spurious means “apparent but not actually valid.”
- The spurious correlations on Tyler Vigen’s homepage are like the correlations you learned about in AsapScience’s (2017) YouTube video, “This ≠ That.”
- Identify your group’s THREE favorite spurious correlations from Tyler Vigen’s homepage.
- Third, and again as a group rather than individually, visit Tyler Vigen’s “Spurious Correlation Generator” and generate several spurious correlations by following these instructions:
- First, as a group, choose a topic (e.g., “Sunlight by state”) by pulling down on the “interesting variable(s)” menu.
- Second, as a group, choose your specific variable (e.g., “Sunlight in Florida”) by clicking “View variables” and selecting the variable you want with the pull-down menu.
- Third, click “Correlate.”
- Fourth, as a group, choose your second variable (e.g., “Lawyers in South Carolina”).
- Fifth, click “Graph,” and you’ll see the graph of your spurious correlation.
- Play around with this site enough to be able to generate your group’s SIX favorite spurious correlations. Save a screenshot of each of these six graphs.
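How the generator works internally is not described here, so this is only an illustration of one general way spurious correlations turn up: compare enough unrelated variables and some pair will correlate strongly by chance. The Python sketch below makes 40 unrelated random series (each a hypothetical stand-in for yearly counts that drift over time, built as a running total of random steps) and reports the strongest correlation among all 780 pairs. None of the series has anything to do with any other.

```python
import itertools
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def random_walk(rng, length):
    """A made-up series that drifts over time: a running total of random steps."""
    total, values = 0.0, []
    for _ in range(length):
        total += rng.uniform(-1, 1)
        values.append(total)
    return values


N_SERIES, YEARS = 40, 10
rng = random.Random(2020)  # fixed seed so the result is repeatable
series = [random_walk(rng, YEARS) for _ in range(N_SERIES)]

best_pair, best_r = None, 0.0
for i, j in itertools.combinations(range(N_SERIES), 2):
    r = pearson_r(series[i], series[j])
    if abs(r) > abs(best_r):
        best_pair, best_r = (i, j), r

print(f"Strongest of {N_SERIES * (N_SERIES - 1) // 2} pairs: series {best_pair}, r = {best_r:+.3f}")
```

Changing the seed will change which pair wins, but with this many comparisons and only ten "years" per series, a strong-looking correlation between unrelated series is easy to find.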
- AT THE END OF YOUR ONE-HOUR GROUP CHAT do the following:
- Nominate one member of your Chat Group (who participated in the Chat) to make a post on the Unit 11: Assignment #5 Discussion Board that summarizes your Group Chat in at least 200 words.
- Nominate a second member of your Chat Group (who participated in the Group Chat using the browser Chrome on their laptop, rather than on their mobile device) to save the Chat transcript, as described in the Course How To (under the topic, “How To Save and Attach a Chat Transcript”).
- This Chat Group member needs to make a post on the Unit 11: Assignment #5 Discussion Board and attach the Chat transcript, saved as a PDF, to that Discussion Board post.
- Remember to attach the Chat transcript by clicking on the word “Attach.” (Do not click on the sidebar menu “Files.”)
- Nominate a third member of your Chat Group (who also participated in the Chat) to make another post on the Unit 11: Assignment #5 Discussion Board that states the name of your Chat Group, the names of the Chat Group members who participated in the Chat, the date of your Chat, and the start and end times of your Group Chat.
- This Chat Group member also needs to embed the six screenshots of your Chat Group’s six favorite spurious correlations.
- If only two students participated in the Group Chat, then one of those two students needs to do two of the above three tasks.
- Before ending the Group Chat, arrange the date and time for the Group Chat you will need to hold during the next Unit (Unit 12: Assignment #5).
- Record a typical Unit entry in your own Course Journal for the current Unit, Unit 11.
Congratulations, you have finished Unit 11! Onward to Unit 12!