叶赫那拉坤坤 · September 8, 2020

Question: NO.PZ2017092702000117

The question is as follows:

Which sampling bias is most likely investigated with an out-of-sample test?

Options:

A. Look-ahead bias

B. Data-mining bias

C. Sample selection bias

Explanation:

B is correct.

An out-of-sample test is used to investigate the presence of data-mining bias. Such a test uses a sample that does not overlap the time period of the sample on which a variable, strategy, or model was developed.

Could the instructor please explain the difference between B and C?

1 answer

星星_品职助教 · September 8, 2020

Hello,

These biases can be told apart once you know their definitions:

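Data-mining bias: Data mining relates to overuse of the same or related data. Data-mining bias refers to the errors that arise from such misuse of data. In other words, the analyst searches the same data over and over and forces a "pattern" out of it even when no real pattern exists. Because a "pattern" mined this way was found only by straining, it holds only for that data set. It does not carry over to other data sets, so it cannot be used for prediction.

That is why an out-of-sample test can be used to check for data-mining bias: if the pattern found in the original data set fails completely when tested on a different data set, data-mining bias is very likely present.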
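
The idea can be illustrated with a small simulation. This is only a sketch under stated assumptions, not part of the original question: the returns and all candidate signals are pure random noise, and the function name `mine_best_signal` and its parameters are hypothetical. The code "mines" the signal with the strongest in-sample correlation, then checks the same signal on a later, non-overlapping period.

```python
import random


def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def mine_best_signal(n_signals=200, n_periods=120, seed=0):
    """Pick the best of many noise signals in-sample, then test it out-of-sample.

    Assumption for illustration: returns and signals are independent noise,
    so any "pattern" found in-sample is spurious by construction.
    """
    rng = random.Random(seed)
    returns = [rng.gauss(0, 1) for _ in range(n_periods)]
    signals = [[rng.gauss(0, 1) for _ in range(n_periods)]
               for _ in range(n_signals)]

    # First half is the development (in-sample) period; the second half is a
    # non-overlapping out-of-sample period.
    split = n_periods // 2
    in_corrs = [correlation(s[:split], returns[:split]) for s in signals]

    # "Data mining": keep whichever signal looks strongest in-sample.
    best = max(range(n_signals), key=lambda i: abs(in_corrs[i]))

    out_corr = correlation(signals[best][split:], returns[split:])
    return in_corrs[best], out_corr


in_sample, out_of_sample = mine_best_signal()
print(f"in-sample correlation of mined signal:  {in_sample:.3f}")
print(f"out-of-sample correlation, same signal: {out_of_sample:.3f}")
```

Because the best of 200 noise signals is selected after the fact, its in-sample correlation tends to look impressive, while its out-of-sample correlation typically falls back toward zero. That gap is the symptom an out-of-sample test is looking for.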

--------

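Sample selection bias: When data availability leads to certain assets being excluded from the analysis, we call the resulting problem sample selection bias. In other words, some data are left out because, for example, they are hard to obtain, so the data used in the analysis are incomplete, and that introduces bias. Survivorship bias is one type of sample selection bias.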
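
A rough sketch of the survivorship case, again under assumptions that are not in the original answer: a hypothetical universe of funds with identical true expected returns, where the worst performers drop out of the database. The function name `survivorship_demo` and all parameter values are made up for illustration.

```python
import random


def survivorship_demo(n_funds=500, n_periods=60, drop_fraction=0.3, seed=1):
    """Compare the average return of all funds with that of survivors only.

    Assumption for illustration: every fund has the same true mean return
    (0.5% per period), so any difference between the two averages comes
    purely from which funds are excluded.
    """
    rng = random.Random(seed)
    avg_returns = []
    for _ in range(n_funds):
        rets = [rng.gauss(0.005, 0.04) for _ in range(n_periods)]
        avg_returns.append(sum(rets) / n_periods)

    # The worst performers are missing from the database, so the analyst
    # only ever sees the survivors.
    ranked = sorted(avg_returns)
    n_dropped = int(n_funds * drop_fraction)
    survivors = ranked[n_dropped:]

    full_mean = sum(avg_returns) / n_funds
    survivor_mean = sum(survivors) / len(survivors)
    return full_mean, survivor_mean


full_mean, survivor_mean = survivorship_demo()
print(f"mean return, all funds:      {full_mean:.4%}")
print(f"mean return, survivors only: {survivor_mean:.4%}")
```

The survivors-only average is higher than the full-universe average even though no fund is truly better than another. The bias comes from which data were available, not from repeated searching of the data, which is what separates it from data-mining bias.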