AUT University Certificate in Foundation Studies

Delivered by ACG Norton College

FOUNDATION STATISTICS
ASSIGNMENT 2

NAME:………………………………….  ID:………………………..

Due: Thursday 26th November, 2015

Mark Scheme
Assignment questions   75
R Test                 15
TOTAL                  /90   ………………………%

1. All parts of your assignment MUST be word processed. Any part written in ink or pencil will be ignored!
2. Label all graphs appropriately, and give each graph a suitable main title.
3. Show ALL workings and R output.
4. Round all calculations sensibly.
5. Assignments handed in late will receive 0%.

Section A. This section is to be completed using ONLY your calculator for the required workings.

Question 1 [10]

(a) The ages of a group of athletes are approximately Normally distributed, X ~ N(33.1 yrs, 2.3 yrs). Apply the 68-95-99.7 Rule to determine the interval in which the middle 99.7% of all ages will fall. [2]

Using Z tables:

(b) Determine what percentage of athletes’ ages will be below 31 years. [2]

(c) If a sample of 1700 ages is taken from the athletes, determine how many athletes would be expected to have an age above 34 years. [3]

(d) Find the upper quartile for the athletes’ ages. [3]

Question 2 [25]

(a) Explain the terms non-response bias and response bias in sampling. Give an example of each that is different from those in your notes. [4]

(b) Name some factors that make a questionnaire successful. [2]

(c) Define the term “sampling frame”. [1]

(d) What is an undercount in a census? [2]

(e)
1. Describe how to use a calculator to randomly select numbers in the range 01 to 60. [2]

2. The heights, in cm, of a community of 60 people are collected in the table below.
i) Use Table B of random digits to take a simple random sample of 8 people’s heights from the table. Start at the beginning of row 105 and read the digits continuously from left to right across the row. Place your results below. [2]
ii) Calculate your sample mean height. [2]

3. In the population of 60 heights, 25 are from females.
Fully describe how you would complete a stratified random sample of size 20 with respect to gender. You do not need to do the sample. [5]

[Table: Person # | Height, cm]

4. A systematic sample of size 6 is to be taken from the population of heights. Explain fully how the systematic sampling procedure is conducted if the height numbered 16 is chosen as the random starting position. List the six heights selected. [3]

5. State one advantage and one disadvantage of a census of all 60 people in the community. [2]
Advantage:
Disadvantage:

Question 3 [10]

The variable self.concept can be found in the data set EduData (Blackboard - RData - EduData.txt).

(a) Explain fully in what way the distribution of self.concept is non-Normal. You must use and include a Normal Quantile plot. [4]

(b) What is granularity? Explain whether granularity is present in this data set. [2]

(c) Use R Commander to provide proof that the observations of self.concept were taken from a non-Normal distribution. Produce another graph as well as summary statistics to form part of the proof. The graph must have suitable labels and titles. Write a short paragraph on your findings. [4]

Section B. This section is to be completed using ONLY R Commander for the required workings.

Question 4 [30]

(a) A researcher surveyed a random sample of 78 Year Five students at a large school and recorded the values of several variables for each student. A linear relationship between the two variables shown below was investigated:

Variable   Description
NSL        National Standard Literacy: a numeric academic measure.
IQ         Intelligence Quotient: a numeric intelligence measure.

Linear regression output from R:

    Call:
    lm(formula = NSL ~ IQ, data = NSL)

    Residuals:
        Min      1Q  Median      3Q     Max
    -6.3182 -0.5377  0.2178  1.0268  3.5785

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) -3.55706    1.55176  -2.292   0.0247 *
    IQ           0.10102    0.01414   7.142 4.74e-10 ***
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 1.635 on 76 degrees of freedom
    Multiple R-squared:  0.4016,  Adjusted R-squared:  0.3937
    F-statistic: 51.01 on 1 and 76 DF,  p-value: 4.737e-10

i) Identify the response variable from the regression output. [1]

ii) The minimum residual value from the output is -6.3182. Indicate which data point this is by circling it on the scatterplot. [1]

iii) Calculate the correlation coefficient for the relationship. [2]

iv) Describe the relationship between NSL level and IQ. Include any unusual features. [4]

v) The linear regression equation for estimating the NSL level of a student from IQ is
NSL = 0.101IQ - 3.557 (coefficients are rounded to 3 decimal places).
State a limitation of this model equation in predicting the NSL levels of students. [2]

vi) Interpret the gradient of the regression line equation. Include units. [3]

vii) Use the model equation, NSL = 0.101IQ - 3.557, to estimate the NSL level of a student with an IQ of 115. [3]

viii) The mean of the variable IQ is 108.9. Use this result to find the mean of the variable NSL. Explain your method. [3]

ix) If the two variables, NSL and IQ, were interchanged (swapped), explain in general the effect on the equation of the regression line and on the R-squared value for the new relationship. [3]
Regression equation:
R-squared:

x)
1) State the value of the coefficient of determination by referring to the R output. [1]
2) Explain the meaning of this value in the context of these variables. [2]

xi) A pilot survey of the Year Five students was undertaken before the main sampling exercise. The results for six individuals are in the table:

IQ    90    100   105   107   112   126
NSL   5.3   6     8.4   7.2   8     9.1

1) Use your calculator to find the correlation coefficient for the relationship. [1]
2) Assuming a linear model with IQ as the explanatory variable, the equation of the least squares regression line for the relationship is:
? = ?. ????? - ?. ??? (coefficients rounded to 4SF)
Find the residual (prediction error) for the observed value (105, 8.4). [2]
3) Produce a scatterplot for the relationship. [2]
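
The pilot data in part xi) can also be entered into R to check calculator work. The following is a minimal sketch in base R, not a model answer: the object names pilot, fit, predicted and residual_105 are illustrative and do not come from the assignment. It fits its own least-squares line to the six points, so its coefficients are whatever R computes from that data.

```r
# Pilot survey data for six Year Five students (from the table in part xi).
pilot <- data.frame(
  IQ  = c(90, 100, 105, 107, 112, 126),
  NSL = c(5.3, 6, 8.4, 7.2, 8, 9.1)
)

# Correlation coefficient between IQ and NSL.
r <- cor(pilot$IQ, pilot$NSL)

# Least-squares regression line with IQ as the explanatory variable.
fit <- lm(NSL ~ IQ, data = pilot)

# Residual = observed NSL minus the NSL predicted by the fitted line,
# evaluated here for the observed point (105, 8.4).
predicted <- predict(fit, newdata = data.frame(IQ = 105))
residual_105 <- 8.4 - unname(predicted)

# Scatterplot with labelled axes, a main title, and the fitted line.
plot(pilot$IQ, pilot$NSL,
     xlab = "IQ", ylab = "NSL",
     main = "NSL against IQ for the pilot sample")
abline(fit)

print(r)
print(coef(fit))
print(residual_105)
```

A residual is always observed minus predicted, so a positive value means the observed point lies above the fitted line.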