ReMA | Quantitative Foundations | Biostatistics
Homework 10
POSTED: 12/11/2015
DUE DATE: 12/18/2015 at 4:45pm (place your homework in the box at the front desk on ARB floor 7)
Please note that you will NOT get your graded homework before the final. If you would like to compare your homework to the solutions (posted 12/18/2015 at 5pm) please make a copy to keep. You will be able to collect your graded homework assignment in January.
• Please write your name on each page and staple the pages of your assignment together. Do not use paper clips; they fall off too easily.
• Please SHOW ALL YOUR WORK for problems requiring hand calculations. You will receive partial credit for showing the steps along the way. A final answer with no work shown is not enough for full credit.
• Some hints on making the most of homework as a learning opportunity:
o You can work in groups or discuss the problems with your classmates, but only in a spirit of learning. Do not simply “cut and paste” from others’ work. Your final submission must be strictly your own, though informed by collaborative group work.
o If you do join a group to work on homework assignments, be sure to try all the homework problems on your own first, before meeting with your group. This way, you will have the opportunity to try to devise solutions on your own, without input from others. Then, when you get together, you can compare approaches.
Problem 1
Researchers followed 481 subjects in a study of heart disease. They computed overall survival (defined as time from diagnosis of heart disease until death from any cause) for subjects over a period of 16 years. Of the 481 subjects, 249 died and 232 were censored. These data have been read into Stata and analyzed to assess the relationship between biological sex and survival. In these analyses, follow-up time (lenfolyr) is recorded in years (ranging from 0.003 to 15.99), survival status (fstat) is coded as 1 for dead and 0 for alive, and biological sex (sex) is coded as 1 for women and 0 for men. Some of the output is provided below. Use the output and your knowledge of survival methods to answer the questions that follow.
. tab fstat sex, chi

 Status as |
   of Last |          Sex
 Follow-up |      Male     Female |     Total
-----------+----------------------+----------
     Alive |       154         78 |       232
      Dead |       133        116 |       249
-----------+----------------------+----------
     Total |       287        194 |       481

          Pearson chi2(1) =   8.3895   Pr = 0.004
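For readers who want to see where the Pearson statistic in the output above comes from, the following is a minimal sketch in Python (standard library only). It assumes the usual Pearson formula, the sum over cells of (observed - expected)^2 / expected, and uses the fact that a chi-squared variable with 1 degree of freedom is a squared standard normal to get the p-value. The variable names are illustrative and not part of the course materials.

```python
from math import erfc, sqrt

# Observed counts copied from the Stata table above.
observed = [
    [154, 78],   # Alive: male, female
    [133, 116],  # Dead:  male, female
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson statistic: sum over cells of (observed - expected)^2 / expected,
# where expected = (row total * column total) / grand total.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (count - expected) ** 2 / expected

# Upper-tail probability for 1 degree of freedom:
# P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
p_value = erfc(sqrt(chi2 / 2))

print(f"Pearson chi2(1) = {chi2:.4f}, Pr = {p_value:.3f}")
```

The printed values should agree with the Stata output (8.3895 and 0.004). The sketch only reproduces the reported statistic; it does not address whether this test is the right tool for these data.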
a) Define “censoring” in survival studies. What is meant by the term “non-informative censoring” and why is it important in survival analysis?
b) What proportion of men die during the study follow-up period? What proportion of women? Estimate the relative risk of death for women versus men from these data, and interpret. Write your answers using probability notation.
c) Is the 2x2 table chi-squared statistic the best approach for evaluating the difference described in part (b)? Why or why not?
These data were analyzed using survival methods as well. Part of the output appears below.
. stcox sex

Cox regression -- Breslow method for ties

No. of subjects =          481                  Number of obs    =         481
No. of failures =          249
Time at risk    =  2284.887068
                                                LR chi2(X)       =        X.XX
Log likelihood  =   -1416.0443                  Prob > chi2      =      0.0025

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sex |   1.472518   .1872124    X.XX   X.XXX      1.147733     1.88921
------------------------------------------------------------------------------
d) What is the estimated median survival in men? In women? Which group seems to do better, based only on this comparison?
e) What is the estimated proportion surviving to 5 years in men? In women?
[Figure: Kaplan-Meier survival estimates, with separate curves for sex = Male and sex = Female. Horizontal axis: analysis time (0 to 15). Vertical axis: 0.00 to 1.00.]
f) Suppose a colleague computes the logrank statistic for these data, and finds the value to be equal to 9.43. Help your colleague formally test whether the survival curves for men and women are significantly different. Remember to state the null and alternative hypotheses, note the value of the test statistic and p-value, state your decision, and interpret your decision using the wording of the problem. Use an alpha of 0.05.
g) Provide an estimate of the incidence rate ratio of death for women versus men; interpret. Provide a 95% confidence interval for this value; interpret. How does the IRR compare to the RR you computed in (b)?
Problem 2
Residents of three villages, each with its own water supply, were asked to participate in a survey to identify cholera carriers. Virtually all residents present in the villages during the study period underwent examination. The proportion of residents who were carriers was computed for each village, and the proportions were compared.
a) What type of study did the researchers conduct? (Choose the correct answer).
i. Case-control study
ii. Cross sectional study
iii. Cohort study
iv. Randomized Controlled Trial
v. Ecologic Study
b) The researchers want to test the null hypothesis that prevalence of cholera does not differ by village. What kind of test statistic can they use? Explain your thinking.
c) Below are the data. Formally test the hypothesis that village is associated with cholera colonization. Remember to state the null and alternative hypotheses, compute the value of the test statistic, df, and critical value. State your decision and interpret your decision using the wording of the problem. Use an alpha of 0.05.
             Cholera Carriers   Non-Carriers   Total
Village 1                  47            109     156
Village 2                 136            721     857
Village 3                 108            305     413
Total                     291           1135    1426
Problem 3
The Public Health Service studied the relationship between smoking and health, in a large sample of representative households. For men and for women in each age group, those who had never smoked were on average somewhat healthier than the current smokers, but the current smokers were on average much healthier than those who had recently stopped smoking.
a) What type of observational study design is this likely to be?
b) Why did they study men and women and the different age groups separately? Be brief in your response.
c) The lesson seems to be that you shouldn’t start smoking, but once you’ve started, don’t stop. Please comment on this conclusion. Do you endorse it? Why or why not? Provide a concise response in 3-5 sentences.
Problem 4
Please fill in the blanks below. Note that some blanks may require more than one-word responses.
a) The odds ratio provides a good estimate of the relative risk when the disease/outcome in question is _____________.
b) A random sample from a given population represents one in which all members of that population have _____________ chance of being chosen.
c) When testing for association between a binary exposure and a binary outcome in a 2x2 table, it is probably best to use the chi-squared test only when all of the _____________ cell counts are greater than or equal to 5.
d) A case-control study is planned to evaluate a protective exposure (one that decreases the risk of disease). The study has been designed to assure at least 80% power for detecting an odds ratio of 0.5 for a specified probability of exposure among controls. If the true odds ratio is 0.6, power will be _____________ 80%, assuming all else remains the same (e.g., sample size, exposure probability among controls, etc.).
Problem 5
a) Define the ecological fallacy and provide an example, in 3-5 sentences.
b) Define the atomistic fallacy and provide an example, in 3-5 sentences.
Problem 6
If you conduct a study, check and control for confounding, use the proper statistical tests, and interpret your p-values correctly, are you able to say your study does not suffer from bias? Briefly explain your thinking.
Problem 7
Researchers are interested in studying the relationship between self-characterization as a “night owl” (stays up late) versus an “early bird” (goes to bed early) and IQ among adults aged 18-50. Previous research suggests that the mean IQ is 100 with a standard deviation of 15 among early birds (you may assume the same SD among night owls). Pilot data suggest that the true mean difference in IQ between night owls and early birds is 2.5 IQ points.
a) Assuming an alpha level of 0.05, if the researchers would like to assure a level of power no lower than 80%, what is the minimum sample size required per group? Use the Stata output below to answer this question.
. sampsi 17.5 15, sd(15) power(.8)

Estimated sample size for two-sample comparison of means

Test Ho: m1 = m2, where m1 is the mean in population 1
                    and m2 is the mean in population 2
Assumptions:

         alpha =   0.0500  (two-sided)
         power =   0.8000
            m1 =     17.5
            m2 =       15
           sd1 =       15
           sd2 =       15
         n2/n1 =     1.00

Estimated required sample sizes:

            n1 =      566
            n2 =      566
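As a rough check on the output above, here is a minimal Python sketch of the standard normal-approximation formula for comparing two means with equal group sizes and a common SD: n per group = 2 * (sd * (z_{1-alpha/2} + z_{power}) / (m1 - m2))^2, rounded up. It is an assumption that this matches what sampsi computes internally, although it does give the same result for these inputs. The function name n_per_group is hypothetical and used only for illustration.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(m1, m2, sd, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided, two-sample comparison of means.

    Assumes equal group sizes, a common standard deviation, and the
    normal approximation (hypothetical helper, not a Stata routine).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for two-sided alpha
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    n = 2 * (sd * (z_alpha + z_power) / (m1 - m2)) ** 2
    return ceil(n)  # round up to a whole number of subjects


# Same inputs as the sampsi command above.
print(n_per_group(17.5, 15, sd=15))
```

With the inputs from the sampsi command, the sketch returns 566 per group, in agreement with the Stata output. Note that only the difference between the two means enters the formula, not their absolute values.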
b) How would we expect the required sample size to change from part a) if a new pilot study suggests that the mean difference in IQ between night owls and early birds is 5 IQ points, all else remaining constant? You need not do any calculations to answer this question.
c) How would we expect the required sample size to change from part a) if the researchers decide they would like to have a minimum power of 90%, all else remaining constant? You need not do any calculations to answer this question.
d) How would we expect the required sample size to change from part a) if the researchers decide to decrease the alpha level to 0.01, all else remaining constant? You need not do any calculations to answer this question.