NO.PZ2021061603000047
Question:
An economist collected the monthly returns
for KDL's portfolio and a diversified stock index. The data collected are shown
in the following table:
The economist calculated the correlation
between the two returns and found it to be 0.996. The regression results with
the KDL return as the dependent variable and the index return as the
independent variable are given as follows:
When reviewing the results, Andrea Fusilier
suspected that they were unreliable. She found that the returns for Month 2
should have been 7.21% and 6.49%, instead of the large values shown in the
first table. Correcting these values resulted in a revised correlation of 0.824
and the following revised regression results:
Explain how the bad data affected the
results.
Options:
Explanation:
The Month 2 data point is an outlier, lying
far away from the other data values.
Because this outlier was caused by a data
entry error, correcting the outlier improves the validity and reliability of
the regression. In this case, the revised R² is lower (down from 0.9921 to 0.6784). The
outlier created the illusion of a better fit through the higher R², and it also
altered the estimate of the slope. The standard error of the estimate is lower
when the data error is corrected (from 2.861 to 2.0624), as a result of the
lower mean square error. However, at a 0.05 level of significance, both models
fit well. The difference in the fit is illustrated in Exhibit 1:
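
As a rough check on the figures quoted in the explanation: in a simple regression with one independent variable, R² equals the squared correlation, so the quoted correlations (0.996 and 0.824) should roughly reproduce the quoted R² values (0.9921 and 0.6784) up to rounding. The sketch below also shows, on made-up monthly returns, how a single mistyped pair can push the correlation toward 1. The return series and the function name pearson_r are hypothetical; the original data table is not reproduced here.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# In a one-variable regression, R^2 is the squared correlation.
# The correlations are the rounded values quoted in the question.
r2_with_error = 0.996 ** 2
r2_corrected = 0.824 ** 2

# Hypothetical monthly returns in percent (NOT the data from the question).
index_clean = [1.2, 6.5, -2.0, 3.1, 0.4, 4.8]
fund_clean = [2.5, 7.2, -1.0, 2.0, 2.2, 4.0]

# Same series with Month 2 mistyped by a factor of ten (assumed error pattern).
index_bad = list(index_clean)
fund_bad = list(fund_clean)
index_bad[1] = 64.9
fund_bad[1] = 72.1

r_clean = pearson_r(index_clean, fund_clean)
r_bad = pearson_r(index_bad, fund_bad)

print(f"R^2 from r = 0.996: {r2_with_error:.4f}")
print(f"R^2 from r = 0.824: {r2_corrected:.4f}")
print(f"hypothetical r, clean data:   {r_clean:.3f}")
print(f"hypothetical r, with outlier: {r_bad:.3f}")
```

The extreme point dominates both the covariance and the two variances, so the line is effectively drawn through that one point and the cluster of ordinary months, which is why the fit looks better than it is.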
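Instructor He always says that when a p-value is given there is no need to look at the t-statistic, and also that the p-value is the area that gets cut off, so that area is a percentage to be compared with α? But the p-values in this question are all numbers, and isn't the t-statistic also just a computed value of the test statistic? What is the difference between the two? Could the instructor please explain? Thank you!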
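
One way to see the relationship the question asks about, as a sketch only: the t-statistic is a point on the horizontal axis of the t-distribution, while the two-sided p-value is the tail area beyond ±|t|, which is a probability and is therefore compared directly with α. The code below converts a t-statistic into that area by numerically integrating the Student's t density with Simpson's rule. The degrees of freedom and the t-values are illustrative assumptions, because the regression tables from the question are not reproduced here, and the function names are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t_stat, df, steps=10000):
    """Two-sided p-value: the area in both tails beyond |t_stat|.

    Integrates the density from 0 to |t_stat| with Simpson's rule
    (steps must be even), then uses symmetry: p = 1 - 2 * that area.
    """
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3
    return max(0.0, 1 - 2 * central_half)


alpha = 0.05
df = 4  # assumed: e.g. 6 observations minus 2 estimated coefficients

for t_stat in (1.5, 2.776, 5.0):  # illustrative t-statistics
    p = two_sided_p_value(t_stat, df)
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(f"t = {t_stat:5.3f}  ->  p = {p:.4f}  ->  {decision}")
```

Comparing |t| with the critical t-value and comparing the p-value with α lead to the same decision; the p-value simply saves the step of looking up a critical value.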