STAT6214, FALL 2015, PRACTICE EXAM

NAME:

1. Consider the simple regression model $y_i = 5 + \beta_1 x_i + \varepsilon_i$ with $E(\varepsilon_i) = 0$, $\mathrm{var}(\varepsilon_i) = \sigma^2$, and $\mathrm{cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$. Answer the following questions:
(a) Prove that the least squares estimate of $\beta_1$ is $\hat\beta_1 = \frac{\sum_i (y_i - 5)\, x_i}{\sum_i x_i^2}$.
(b) Find $\mathrm{var}(\hat\beta_1)$.
(c) Find $E(y_i - \hat\beta_1 x_i)$.
(d) Find $\mathrm{cov}(y_i - \hat y_i,\; y_j - \hat y_j)$.

2. Consider the simple regression model $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, where the $\varepsilon_i$ are Gaussian with $E(\varepsilon_i) = 0$, $\mathrm{var}(\varepsilon_i) = \sigma^2$, and $\mathrm{cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$.
(a) Recall that $\hat y_i = \sum_{j=1}^{n} w_{ij}\, y_j$, where $w_{ij} = \frac{1}{n} + \frac{(x_i - \bar x)(x_j - \bar x)}{\sum_{k=1}^{n} (x_k - \bar x)^2}$, $i = 1, \dots, n$.
i. Show that $\hat y_i = \bar y$ if $x_i = \bar x$; in other words, the regression line goes through the point $(\bar x, \bar y)$.
ii. Use the expression above to derive $\mathrm{var}(\hat y_i)$.
iii. Show that $\mathrm{var}(\hat y_i) \geq \mathrm{var}(\bar y)$.

3. For the following residual plots, discuss the possible violations of the regression model assumptions and suggest remedial measures.
[Figure: two sets of regression diagnostic plots, each with the panels Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage (with Cook's distance contours). Set 1 flags observations 9, 46 and 51; set 2 flags observations 25, 70 and 77 (25, 39 and 77 in the Residuals vs Leverage panel).]

4. True or False.
(a) Statistically significant correlation or evidence of a statistically significant effect always implies a causal relationship.
(b) Consideration must always be given to the size of the data set, as this is related to the power of the analysis to detect differences of a given size.
(c) When working with a categorical covariate, the reference category has to be chosen. There are two considerations in selecting a reference category: the ease of interpretation and the number of data points in the category.
(d) To allow a fair comparison between different model fits, it is important that the models are fitted to the same data set.
(e) A principal components analysis is done on the explanatory variables to identify vectors (i.e., linear combinations of the variables) that account, successively, for the smallest variation in the observations of the explanatory variables.
(f) The principal components analysis is done in complete disregard of the observed variability in the response.
(g) The main limitation of principal components regression lies in the difficulty of interpreting the principal components.
(h) Aside from designing manipulative experiments to break correlations among explanatory variables, no technique exists that allows researchers to infer the different functional relationships between the response and the explanatory variables.
(i) In addition to fundamental shortcomings with regard to finding the best model, stepwise procedures are known to suffer from a multiple-testing problem.
(j) Significance tests based on stepwise procedures lead to decreased Type I error rates.
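For Questions 1 and 2, a Monte Carlo sketch can serve as an informal check on hand-derived answers. It assumes NumPy and uses hypothetical values for the design points x, the true slope, sigma and the number of replications (none of these are given in the exam). It checks the estimator in 1(a) against a numerical least-squares fit, checks that the weights w_ij reproduce the ordinary least-squares fitted values, and prints empirical versions of the quantities asked for in 1(b) to 1(d) and 2(a)ii to iii for comparison with the derived formulas.

```python
import numpy as np

# Illustrative settings (assumptions, not given in the exam): design points,
# true slope, noise level and number of Monte Carlo replications.
rng = np.random.default_rng(0)
n = 20
x = np.linspace(-2.0, 3.0, n)
beta0, beta1, sigma = 5.0, 1.5, 2.0  # beta0 = 5 is the known intercept of Question 1
reps = 20_000


def slope_fixed_intercept(x, y, intercept=5.0):
    """Question 1(a) estimator: sum((y_i - 5) * x_i) / sum(x_i ** 2)."""
    return np.sum((y - intercept) * x) / np.sum(x ** 2)


def hat_weights(x):
    """Question 2(a) weights w_ij = 1/n + (x_i - xbar)(x_j - xbar) / Sxx."""
    d = x - x.mean()
    return 1.0 / len(x) + np.outer(d, d) / np.sum(d ** 2)


# A single simulated data set for the exact (non-random) checks.
y = beta0 + beta1 * x + rng.normal(0.0, sigma, n)

# 1(a): the closed form should equal a numerical least-squares fit of
# (y - 5) on x with no intercept column.
b_closed = slope_fixed_intercept(x, y)
b_lstsq = np.linalg.lstsq(x[:, None], y - 5.0, rcond=None)[0][0]

# 2(a): W @ y should reproduce the ordinary least-squares fitted values,
# and the fitted line evaluated at xbar should equal ybar (2(a)i).
W = hat_weights(x)
coef = np.polyfit(x, y, 1)
fitted_ols = np.polyval(coef, x)
line_at_xbar = np.polyval(coef, x.mean())

# Monte Carlo: each row of Y is an independent data set.
Y = beta0 + beta1 * x + rng.normal(0.0, sigma, (reps, n))
b_hats = (Y - 5.0) @ x / np.sum(x ** 2)            # 1(a) estimator per row
var_b_hat = b_hats.var(ddof=1)                     # compare with 1(b)
mean_c = (Y - np.outer(b_hats, x)).mean(axis=0)    # compare with 1(c)
resid_q1 = Y - (5.0 + np.outer(b_hats, x))         # y_i - yhat_i under Question 1
cov_resid_q1 = np.cov(resid_q1, rowvar=False)      # compare with 1(d)

Yhat = Y @ W.T                                     # row r holds W @ Y[r]
var_yhat = Yhat.var(axis=0, ddof=1)                # compare with 2(a)ii
var_ybar = Y.mean(axis=1).var(ddof=1)              # var(ybar), for 2(a)iii

print("1(a) closed form vs lstsq:", b_closed, b_lstsq)
print("2(a) W @ y equals OLS fit:", np.allclose(W @ y, fitted_ols))
print("2(a)i line at xbar vs ybar:", line_at_xbar, y.mean())
print("1(b) var(beta1_hat):", var_b_hat)
print("1(c) E(y_i - beta1_hat x_i):", mean_c.round(2))
print("1(d) cov of residuals, first 3x3 block:\n", cov_resid_q1[:3, :3].round(3))
print("2(a)ii var(yhat_i):", var_yhat.round(3))
print("2(a)iii var(ybar):", var_ybar)
```

Because the errors are Gaussian, the empirical variances at 20,000 replications carry a relative Monte Carlo error of roughly 1%, so agreement with a derived formula should be judged to about that precision.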
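Statements (e) to (g) concern principal components analysis of the explanatory variables. As a minimal sketch, assuming NumPy and a hypothetical simulated matrix X of three correlated explanatory variables (not data from the exam), the components can be computed from the singular value decomposition of the column-centred X. The sketch prints the loading vectors and the share of total variance that each successive component accounts for; note that only X enters the computation.

```python
import numpy as np

# Hypothetical explanatory variables: three correlated columns built from
# two latent factors plus noise (illustrative data, not from the exam).
rng = np.random.default_rng(1)
n = 200
z1, z2 = rng.normal(size=(2, n))
X = np.column_stack([
    z1 + 0.1 * rng.normal(size=n),
    z1 + 0.3 * z2 + 0.1 * rng.normal(size=n),
    z2 + 0.1 * rng.normal(size=n),
])


def principal_components(X):
    """PCA of the explanatory variables via SVD of the centred matrix.

    Returns the loading vectors (as columns), the component scores and the
    sample variance of each component. No response variable is used.
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)  # s is sorted descending
    scores = Xc @ Vt.T                  # linear combinations of the columns of X
    variances = s ** 2 / (len(X) - 1)   # sample variance of each component
    return Vt.T, scores, variances


loadings, scores, variances = principal_components(X)
print("loadings (columns = components):\n", loadings.round(3))
print("share of total variance:", (variances / variances.sum()).round(3))
```

When the explanatory variables are measured on different scales they are usually standardized before the decomposition; this sketch only centres them.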