买买要当学霸 · January 31, 2020

Question: NO.PZ2015120204000031

The question is as follows:

Paul suggests the following step which would be repeated every quarter.

Step 2: We utilize ML techniques to divide our investable universe of about 10,000 stocks into 20 different groups, based on a wide variety of the most relevant financial and non-financial characteristics. The idea is to prevent unintended portfolio concentration by selecting stocks from each of these distinct groups.

Which of the following machine learning techniques is most appropriate for executing Step 2?

Options:

A. K-Means Clustering

B. Principal Components Analysis (PCA)

C. Classification and Regression Trees (CART)

Explanation:

A is correct. K-Means clustering is an unsupervised machine learning algorithm which repeatedly partitions observations into a fixed number, k, of nonoverlapping clusters (i.e., groups).
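To make the partitioning idea concrete, here is a minimal pure-Python sketch of K-means. It is an illustration only, not the method from the question: the nine toy "stocks" with two made-up characteristics, the function names, and the deterministic farthest-point initialization (used instead of the usual random start so the result is reproducible) are all assumptions.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_init(points, k):
    """Assumed, deterministic initialization: start from the first point,
    then repeatedly add the point farthest from the centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, max_iters=100):
    """Alternate assignment and update steps until the labels stop changing."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each observation joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical data: 9 stocks, 2 characteristics each, 3 visibly separate groups.
stocks = [
    (0.1, 0.2), (0.0, 0.1), (0.2, 0.0),
    (5.0, 5.1), (5.2, 4.9), (4.9, 5.0),
    (10.0, 0.1), (9.8, 0.0), (10.1, 0.2),
]
labels, centroids = kmeans(stocks, k=3)
print(labels)  # one group label per stock
```

The output has one label per stock, so the rows (stocks) are what get grouped while the characteristics are left as they are. Step 2 asks for the same thing with 10,000 stocks and k = 20.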

B is incorrect. Principal Components Analysis is a long-established statistical method for dimension reduction, not clustering. PCA aims to summarize or reduce highly correlated features of data into a few main, uncorrelated composite variables.
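For contrast, the following is a rough pure-Python sketch of what PCA does to a data set, assuming a toy table of five stocks with three highly correlated, made-up features. The function name and the use of power iteration to find the first principal component are choices made for this illustration; a real workflow would normally rely on a linear algebra library.

```python
def first_principal_component(rows, iters=200):
    """Return the first principal direction and each row's score on it."""
    n, d = len(rows), len(rows[0])
    # Center every feature (column) at zero.
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    # Sample covariance matrix of the features (d x d).
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeatedly multiply by cov and renormalize.
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Project each centered row onto the direction: one composite value per row.
    scores = [sum(a * b for a, b in zip(row, v)) for row in centered]
    return v, scores


# Hypothetical data: 5 stocks (rows) x 3 correlated features (columns).
features = [
    [1.0, 2.1, 2.9],
    [2.0, 3.9, 6.1],
    [3.0, 6.0, 9.0],
    [4.0, 8.1, 11.9],
    [5.0, 9.9, 15.1],
]
direction, scores = first_principal_component(features)
print(len(features), "stocks in,", len(scores), "stocks out")
print("features per stock: 3 in, 1 composite variable out")
```

The number of stocks is unchanged; only the number of columns shrinks, from three features to one composite variable. PCA therefore does not sort the stocks into groups, which is what Step 2 requires.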

C is incorrect. CART is a supervised machine learning technique that is most commonly applied to binary classification or regression.

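Why isn't B the answer?

Can I understand it this way: with PCA, the objects before and after dimension reduction are at the same level, e.g., 100 groups reduced to 20 groups, where both are at the group level; with K-means, the objects are not at the same level, e.g., 10,000 stocks divided into 20 groups, where stocks and groups are at different levels.

Is this understanding correct?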

1 answer

星星_品职助教 · February 3, 2020

Hello,

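PCA is used when there are too many features and their number needs to be reduced. The question stem is about dividing 10,000 stocks into 20 groups, which does not involve reducing the number of features.

You can think of it this way: PCA reduces the number of features, while K-means groups the observations (here, the stocks) into clusters.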