Statistics

Write the first one (option 1 below).

Written Assignment 3 (Group Assignment)

Due: uploaded as a single .doc, .docx, or .pdf file to Canvas by 11:45 PM on Friday, November 20th (note new due date).

There are three data sets with four quantitative variables for this assignment. Each group member is to pick one of the data sets and one quantitative variable. Each group member must use a different variable. The options are:

1. Federal Election Commission Independent Expenditures for the 2012 and 2016 (to date) Presidential primaries. This is a rather large data set, so I have created a new data set that includes only expenditures ≥ $10 and < $5,000. The data set is PresCandIndExpenditures.csv and the variable of interest is “Expenditure Amount.” There are other variables in the file, but you will use only “Expenditure Amount” for this assignment. The data were downloaded from http://www.fec.gov/data/DataCatalog.do.
2. Heights and weights of Major League Baseball players. This data set includes two quantitative variables. You pick either height or weight. The data set is MLBHeightWeightData.xlsx. Two students from a group can use this data set, provided they use different variables. These data were downloaded from http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_MLB_HeightsWeights.
3. Time to a major breakdown for used cars sold by “shady” used car dealerships. These data were simulated. This data set includes only a single variable and is named UsedCarBreakDowns.csv.
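For anyone working outside StatCrunch, a minimal Python sketch of pulling the variable of interest out of the CSV might look like the following. It is only an illustration: it assumes the column header is exactly “Expenditure Amount,” that amounts may carry a dollar sign or thousands separators, and that the stated range means at least $10 and under $5,000. None of these is confirmed by the assignment, and the function names are made up for this example.

```python
import csv


def read_amounts(lines, column="Expenditure Amount"):
    """Return the numeric values of one column from CSV text.

    `lines` is any iterable of CSV lines, e.g. an open file object.
    The default column name is an assumption about the file's header row.
    """
    reader = csv.DictReader(lines)
    if column not in (reader.fieldnames or []):
        raise KeyError(f"column {column!r} not found in header")
    amounts = []
    for row in reader:
        raw = (row.get(column) or "").strip()
        if not raw:
            continue  # skip rows with no recorded amount
        # Assumed formatting: drop a dollar sign and thousands separators.
        amounts.append(float(raw.replace("$", "").replace(",", "")))
    return amounts


def in_stated_range(amounts, low=10.0, high=5000.0):
    """Check that every amount satisfies low <= amount < high (assumed range)."""
    return all(low <= a < high for a in amounts)
```

With the file in the working directory, opening it with `open("PresCandIndExpenditures.csv", newline="")` and passing the file object to `read_amounts` would give the list of values to treat as the population.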
Assignment

We are treating the data for each variable as a population. Each group member is responsible for completing Steps 1–4 for their variable. Step 5 is to be completed as a single group response.

1. Describe your population. Your descriptions should be limited to your single variable and should include visual, numerical, and verbal descriptions. Since we are treating the data as the population, make sure you use the correct formula for finding a population standard deviation. Many software packages default to a sample SD. In StatCrunch you can get a population SD by requesting “Unadj. std. dev.” from the selection of Summary statistics.
2. Use the mean and SD of your variable to create a normal model. Compare the normal model to the distribution of your variable and explain why you think the model is or is not useful.
3. Draw 100 simple random samples (sampling with replacement) from your population for each of n = 10, n = 25, and n = 50. For each sample, calculate the mean, then create a histogram of the sampling distribution of the mean. This will result in three histograms, one for each sample size. Also calculate the mean and SD for each sampling distribution. Then calculate the theoretical mean and SD for each of the three sampling distributions based on the mean and SD of your population. You might end up with a table that looks something like Table 1 shown below (be sure to adjust your caption as appropriate). Comment on how the means and SDs from the three sampling distributions you simulated compare to the theoretical means and SDs. Also comment on any differences you observe between samples of size n = 10, n = 25, and n = 50.

Table 1. Means and SDs for sampling distributions of the mean.

Sample Size    Mean from Samples    SD from Samples    Theoretical Mean    Theoretical SD
10
25
50

4. Repeat Step 2 for your sample of size n = 50 only.
5.
As a group, come together with your individual analyses and write a summary that synthesizes the results. Specifically, comment on the types of distributions you observed for the individual population variables, focusing on shapes and spreads, and on how the distributions compare with the sampling distributions for the mean created in Step 3. Are there any similarities in the sampling distributions of the means for the various variables? We are looking for qualitative comparisons for this step. Your sampling distributions were based on 100 means. Do you think you would have gotten the same results if the sampling distributions were based on 1,000 means or 10,000 means? How about 50 means?

You are to submit one report per group. Make sure your group number and the names of the individual group members are included on the report. Also indicate which variable each group member was responsible for. Your report should be organized as follows:
• Page 1: Group summary, limited to one page.
• Pages 2-7 for groups of three or pages 2-9 for groups of four: the population summaries, limited to a maximum of 2 pages per variable. See bolded directions on page 3 regarding overlaying normal distributions onto histograms.

StatCrunch has all the capabilities you need to do this assignment. To select simple random samples from your population, go to Data > Sample.
• In “Select columns:” click on your variable name.
• In “Sample size:” enter 10, 25, or 50 (note you will have to do this three times).
• In “Number of samples:” enter 100.
• In “Sampling options:” check “Sample with replacement.”
• In “Store samples:” select the middle radio button, “Stacked with a sample id.”
• Leave everything else and click “Compute!”
  o You may get a warning window that pops up saying “Whoa!! Lots of unique numeric values for Sample. Want to turn on binning for this procedure?” If this happens, click “Cancel.”
  o This will add two columns of data to your data set.
    One column will be the values of your variable that were selected for the samples. The second column will be a sample identifier; e.g., sample 1, sample 2, …, sample 100.

You can now get the means for each of your samples by going to Stat > Summary Stats.
• In “Select column(s):” choose the column for the sample values.
• In “Grouping by:” choose the column that has the sample numbers (1, …, 100).
• In “Statistics:” choose “Mean.”
• In “Output:” check “Store in data table.”
• Click “Compute!” This will create a new column with the means from each of your samples. These are the values that will be used to describe the sampling distribution for the given sample size.

You can then use Stat > Summary Stats again with this new column to get the mean and SD for the sampling distribution based on the 100 samples.

I recommend that you create the data for all three sampling distributions first. Then you can get the summary stats (means and SDs) at once. You can also create the histograms at once and use the “For multiple graphs:” option to get all the histograms in a single figure with same-scaled axes.

To save on space and the number of figures you have to create, overlay normal distributions onto the histograms. This way you can combine your figures for Steps 1 and 2, and for Steps 3 and 4. You can get overlays of normal distributions by selecting Stat > Histogram.
• In “Display options:” > “Overlay distrib:” choose Normal and enter an appropriate Mean and Std. Dev.
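The StatCrunch steps above can also be sketched in Python as a way to check the numbers in your table. This is an illustration under stated assumptions, not part of the assignment: the function names are made up, the population is any list of numbers, and the simulated means are summarized with the sample SD (swap in `statistics.pstdev` if a different convention is expected). The theoretical columns use the standard results that, when sampling with replacement, the sampling distribution of the mean has mean μ and SD σ/√n, where μ and σ are the population mean and population SD.

```python
import math
import random
import statistics


def sample_means(population, n, num_samples=100, rng=None):
    """Draw `num_samples` samples of size n with replacement; return their means."""
    rng = rng or random.Random()
    return [
        statistics.fmean(rng.choices(population, k=n))
        for _ in range(num_samples)
    ]


def sampling_table(population, sizes=(10, 25, 50), num_samples=100, seed=None):
    """Build one row per sample size with simulated and theoretical values."""
    rng = random.Random(seed)  # pass a seed to make the simulation repeatable
    mu = statistics.fmean(population)
    sigma = statistics.pstdev(population)  # population ("unadjusted") SD
    rows = []
    for n in sizes:
        means = sample_means(population, n, num_samples, rng)
        rows.append({
            "n": n,
            "mean_from_samples": statistics.fmean(means),
            # Assumption: the simulated means are summarized with the sample SD.
            "sd_from_samples": statistics.stdev(means),
            "theoretical_mean": mu,
            "theoretical_sd": sigma / math.sqrt(n),
        })
    return rows
```

Calling `sampling_table(values)` on the list of population values returns three rows, one each for n = 10, 25, and 50. The `theoretical_mean` and `theoretical_sd` entries are also one reasonable choice of Mean and Std. Dev. for the normal overlay on each sampling-distribution histogram. Raising `num_samples` to 1,000 or 10,000, or lowering it to 50, gives a quick way to explore the Step 5 question about the number of means.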