Data Analytics & Business Intelligence.
The AIS collects a variety of data on its athletes. In one publicly available data set, measurements were made on 202 athletes of various body and blood characteristics. The sum of skin folds (ssf), a measure of body fat, was recorded. Interest is now turning to extracting intelligence about the relationship between ssf and pbfat.
(a) What is the correlation between ssf and pbfat?
(b) Explain what your value of the correlation means in one or two sentences.
(c) Write down the equation of the regression line relating ssf (a fairly simple measure to take) to pbfat (a much more complex measure to take).
(d) Explain what your value of the intercept means in one or two sentences.
(e) Explain what your value of the slope means in one or two sentences.
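The correlation and regression line asked for above could be obtained along the following lines. This is only a sketch: the helper name `describe_ssf_pbfat` and the data frame name `athletes` are hypothetical, and the column names `ssf` and `pbfat` are taken from the question text, so they may differ in your copy of the data.

```r
# Hypothetical helper: `athletes` is assumed to be a data frame with
# numeric columns `ssf` and `pbfat` (names taken from the question text).
describe_ssf_pbfat <- function(athletes) {
  # Pearson correlation between the two measurements
  r <- cor(athletes$ssf, athletes$pbfat)

  # Least-squares line predicting pbfat from ssf: pbfat = b0 + b1 * ssf
  fit <- lm(pbfat ~ ssf, data = athletes)

  list(
    correlation = r,
    intercept   = unname(coef(fit)["(Intercept)"]),
    slope       = unname(coef(fit)["ssf"])
  )
}
```

Calling `describe_ssf_pbfat(athletes)` returns the three numbers needed for parts (a) and (c); the interpretation in parts (b), (d) and (e) still has to be written in words.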
QUESTION 2 10 marks
(a) What is the predicted percentage of pbfat for an ssf of 70?
(b) Carry out a hypothesis test to check whether there is evidence in the sample of a non-zero slope for the line relating ssf to pbfat. Your answer should include null and alternative hypotheses, test statistic, p-value and conclusion.
(c) Write down a 95% confidence interval for the slope for the line relating ssf to pbfat in all
(d) Explain what the confidence interval in (c) means in one or two sentences.
(e) Produce a scatter plot of residuals versus predicted values. Use the plot to comment on whether the regression inference conditions of constant variance and no strong outliers have been met by this data set.
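One possible way to get the numbers for this question in R is sketched below. The function names `slope_inference` and `plot_residuals`, the data frame name `athletes`, and the column names `ssf` and `pbfat` are all assumptions made for illustration.

```r
# Hypothetical helper: fits pbfat on ssf and collects the quantities
# needed for the prediction, the slope test and the confidence interval.
slope_inference <- function(athletes, new_ssf = 70) {
  fit <- lm(pbfat ~ ssf, data = athletes)

  # Row of the coefficient table for the slope:
  # Estimate, Std. Error, t value, Pr(>|t|)
  slope_row <- summary(fit)$coefficients["ssf", ]

  list(
    fit       = fit,
    predicted = unname(predict(fit, newdata = data.frame(ssf = new_ssf))),
    t_value   = unname(slope_row["t value"]),
    p_value   = unname(slope_row["Pr(>|t|)"]),   # two-sided test of slope = 0
    ci_95     = confint(fit, "ssf", level = 0.95)
  )
}

# Residuals against predicted (fitted) values, with a reference line at zero.
plot_residuals <- function(fit) {
  plot(fitted(fit), resid(fit),
       xlab = "Predicted pbfat", ylab = "Residual",
       main = "Residuals versus predicted values")
  abline(h = 0, lty = 2)
}
```

A typical call would be `res <- slope_inference(athletes)` followed by `plot_residuals(res$fit)`. A fan shape in the plot would suggest non-constant variance, and isolated points far from zero would suggest outliers.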
QUESTION 3 10 marks
Coaches would now like to extract intelligence about whether there is evidence
of a difference in average haematocrit levels (a blood marker, denoted hc) between male and female athletes, on the basis of the data
(a) Write down the null and alternative hypotheses for the researchers.
(b) Why should this be an independent-samples test and not a paired-samples test? Answer in one or two sentences.
(c) Use R to find the value of the test statistic, and the p-value, for
(d) At the 5% level, is the null hypothesis rejected or not? Explain your answer in one or two sentences.
(e) Write a conclusion to the test for the researchers.
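As a rough sketch, an independent-samples comparison of this kind can be run with `t.test`. The helper name `compare_hc_by_sex` and the data frame name `athletes` are hypothetical, and the columns `hc` and `sex` are assumed to exist under those names. Note that `t.test` uses the Welch (unequal-variance) version by default; whether the course expects the pooled version instead is not stated here.

```r
# Hypothetical helper: two-sided independent-samples t-test of hc by sex.
# `athletes` is assumed to have a numeric column `hc` and a two-level
# grouping column `sex`.
compare_hc_by_sex <- function(athletes) {
  result <- t.test(hc ~ sex, data = athletes)   # Welch test by default

  list(
    t_value = unname(result$statistic),
    df      = unname(result$parameter),
    p_value = result$p.value
  )
}
```

The sign of the test statistic depends on the order of the levels of `sex`, so check which group R treats as the first one before interpreting it.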
QUESTION 4 10 marks
The relationship between sport and gender is also of interest.
Use R to carry out a chi-squared test for the researchers. Your answer should include null and alternative hypotheses, a test statistic, p-value, decision and conclusion that can be reported to the researchers.
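A chi-squared test of independence between two categorical variables might be set up as follows. The helper name `sport_sex_test` and the data frame name `athletes` are hypothetical, and the column names `sport` and `sex` are assumptions based on the question wording.

```r
# Hypothetical helper: chi-squared test of independence between sport and sex.
sport_sex_test <- function(athletes) {
  counts <- table(athletes$sport, athletes$sex)   # contingency table of counts
  result <- chisq.test(counts)

  list(
    observed  = counts,
    expected  = result$expected,    # useful for checking for small expected counts
    statistic = unname(result$statistic),
    df        = unname(result$parameter),
    p_value   = result$p.value
  )
}
```

If R warns that the chi-squared approximation may be incorrect, inspect the `expected` table for cells with small expected counts before reporting the result.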
Sports managers are interested in whether ssf depends on the sport the athletes play.
(a) Produce a well-labelled boxplot of ssf by sport.
(b) Calculate the mean and the standard deviation of the tennis players’ ssf. Compare the location and spread of the ten sports in a short paragraph.
(c) Check whether the condition of equal standard deviations between the ten groups has been met. Your answer should include some calculations.
(d) Use an ANOVA to test the hypothesis of no difference in mean ssf between the ten sports. Your answer should include a null and alternative hypothesis, test statistic, p-value and conclusion. Use α = 0.0
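The boxplot, the group summaries and the ANOVA could be approached roughly as below. All function names and the data frame name `athletes` are hypothetical, and the columns `ssf` and `sport` are assumed. The ratio of largest to smallest standard deviation is one commonly taught rule-of-thumb check (often with a cut-off of about 2); the course may expect a different one.

```r
# Well-labelled boxplot of ssf for each sport.
plot_ssf_by_sport <- function(athletes) {
  boxplot(ssf ~ sport, data = athletes,
          xlab = "Sport", ylab = "Sum of skin folds (ssf)",
          main = "Sum of skin folds by sport")
}

# Mean and standard deviation of ssf within each sport, plus the
# ratio of the largest to the smallest standard deviation.
summarise_ssf_by_sport <- function(athletes) {
  means <- tapply(athletes$ssf, athletes$sport, mean)
  sds   <- tapply(athletes$ssf, athletes$sport, sd)
  list(means = means, sds = sds, sd_ratio = max(sds) / min(sds))
}

# One-way ANOVA of ssf on sport.
anova_ssf_by_sport <- function(athletes) {
  fit <- aov(ssf ~ sport, data = athletes)
  tab <- summary(fit)[[1]]          # ANOVA table; first row is the sport effect
  list(
    f_value = tab[["F value"]][1],
    df      = tab[["Df"]],          # between-groups and residual degrees of freedom
    p_value = tab[["Pr(>F)"]][1]
  )
}
```

For the tennis players in part (b), the relevant entries of `means` and `sds` can be picked out by name once you know how tennis is labelled in the `sport` column.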