NO.PZ202304050200007001
The question is as follows:
Omar Khan is investigating the potential benefits of incorporating
non-financial data, specifically weather and social media posts, to improve
their stock selection process in the retail industry.
Dataset 1: A database from a large national weather provider that
contains detailed weather data (temperature, humidity, rainfall, atmospheric
pressure, etc.) at a very localized geographic level or zone recorded by GPS
coordinates for the past 36 months.
In reviewing Dataset 1, Khan notices that many of the data fields
included would likely be irrelevant to the analysis and begins
the process of selecting a subset of data fields that he believes are
applicable.
Khan identifies a data field called
Khan’s selection of a subset of data from the weather dataset is best described as:
Options:
A. trimming
B. feature selection
C. feature engineering
Explanation:
B is correct. The process of identifying and removing unneeded,
irrelevant, or redundant features in a dataset is known as feature selection.
A is incorrect. Trimming is a process for handling outliers in a dataset
by simply removing the extreme values and is also known as truncation.
C is incorrect. Feature engineering is the process of combining, consolidating,
or creating new features that do not exist in the current weather dataset.
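The three terms in the explanation can be contrasted with a minimal sketch in plain Python. Everything here is an assumption made for illustration: the records, the field names (`temp_max`, `temp_min`, `pressure`, `rainfall`), and the fixed cutoffs used for trimming are hypothetical and are not taken from Dataset 1. In practice trimming is often done with percentile cutoffs rather than fixed bounds.

```python
# Hypothetical weather records; field names and values are illustrative only.
records = [
    {"zone": "Z1", "temp_max": 30.0, "temp_min": 20.0, "pressure": 1012.0, "rainfall": 0.0},
    {"zone": "Z2", "temp_max": 25.0, "temp_min": 18.0, "pressure": 1009.0, "rainfall": 4.0},
    {"zone": "Z3", "temp_max": 28.0, "temp_min": 21.0, "pressure": 1011.0, "rainfall": 250.0},
    {"zone": "Z4", "temp_max": 27.0, "temp_min": 19.0, "pressure": 1010.0, "rainfall": 2.0},
]


def select_features(rows, keep):
    """Feature selection: keep only a chosen subset of the existing fields."""
    return [{name: row[name] for name in keep} for row in rows]


def trim(rows, field, lower, upper):
    """Trimming: drop whole rows whose value in `field` lies outside [lower, upper]."""
    return [row for row in rows if lower <= row[field] <= upper]


def add_temp_range(rows):
    """Feature engineering: create a new field that did not exist before."""
    return [{**row, "temp_range": row["temp_max"] - row["temp_min"]} for row in rows]


selected = select_features(records, ["zone", "temp_max", "rainfall"])  # fewer columns
trimmed = trim(records, "rainfall", 0.0, 100.0)                        # fewer rows
engineered = add_temp_range(records)                                   # one extra column
```

Feature selection reduces the number of columns, trimming reduces the number of rows, and feature engineering adds a column that was not in the original data.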
“Khan identifies a data field called
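Isn't this just the process of One Hot Encoding? That is part of Feature Engineering, isn't it?
Or is the question only asking about the act of separating out, from the full dataset, the subset of data that is irrelevant to the analysis?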
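To make the term used in this question concrete, here is a minimal sketch of one-hot encoding in plain Python. The field name `precip_type` and its values are hypothetical and are not taken from the dataset in the question. The sketch only shows what the technique does to a categorical field.

```python
def one_hot(rows, field):
    """Replace a categorical field with one 0/1 indicator column per category."""
    categories = sorted({row[field] for row in rows})
    encoded = []
    for row in rows:
        # Copy every field except the categorical one being encoded.
        new_row = {name: value for name, value in row.items() if name != field}
        for category in categories:
            new_row[f"{field}_{category}"] = 1 if row[field] == category else 0
        encoded.append(new_row)
    return encoded


# Hypothetical records with a single categorical field.
rows = [
    {"zone": "Z1", "precip_type": "rain"},
    {"zone": "Z2", "precip_type": "snow"},
    {"zone": "Z3", "precip_type": "none"},
]
encoded = one_hot(rows, "precip_type")
```

Each output row keeps `zone` and gains three new indicator columns, so the operation creates fields that were not in the input rather than choosing among existing ones.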