Carolyne · April 21, 2024

Could you explain this one?

NO.PZ2023040502000075

The question is as follows:

To assist in feature selection, Steele wants to create a visualization that shows the most informative words in the dataset based on their term frequency (TF) values. After creating and analyzing the visualization, she is concerned that some tokens are likely to be noise features for ML model training; therefore, she wants to remove them.

To address her concern in her exploratory data analysis, Steele should focus on those tokens that have:

Options:

A. low chi-square statistics

B. low mutual information (MI) values

C. very low and very high term frequency (TF) values

Explanation:

C is correct. Frequency measures can be used for vocabulary pruning to remove noise features by filtering the tokens with very high and low TF values across all the texts. Noise features are both the most frequent and most sparse (or rare) tokens in the dataset. On one end, noise features can be stop words that are typically present frequently in all the texts across the dataset. On the other end, noise features can be sparse terms that are present in only a few text files. Text classification involves dividing text documents into assigned classes. The frequent tokens strain the ML model to choose a decision boundary among the texts as the terms are present across all the texts (an example of underfitting). The rare tokens mislead the ML model into classifying texts containing the rare terms into a specific class (an example of overfitting). Thus, identifying and removing noise features are critical steps for text classification applications.
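The pruning idea in the explanation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: TF is taken here as a token's count divided by the total number of tokens in the dataset, tokenization is a plain whitespace split, and the function names, sample texts, and the `low`/`high` cutoffs are all hypothetical choices rather than anything given in the question.

```python
from collections import Counter


def term_frequencies(docs):
    """Assumed TF definition: token count across all texts / total tokens."""
    counts = Counter(tok for doc in docs for tok in doc.lower().split())
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}


def prune_vocabulary(tf, low, high):
    """Keep tokens with low <= TF <= high; both tails are treated as noise."""
    return {tok for tok, value in tf.items() if low <= value <= high}


# Hypothetical toy dataset: "the" appears in every text, "zeppelin" in only one.
docs = [
    "the market rose on the earnings news",
    "the market fell on the inflation news",
    "the bond market rose on the zeppelin rumor",
]

tf = term_frequencies(docs)
# Cutoffs are arbitrary illustration values, not recommended settings.
kept = prune_vocabulary(tf, low=0.05, high=0.15)
print(sorted(kept))
```

With these made-up cutoffs, the very frequent token "the" and the one-off tokens such as "zeppelin" both fall outside the band, which mirrors option C: the tokens to focus on as noise candidates sit at the two extremes of the TF distribution.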

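Is this question asking us to pick the useful features or the noise features? If it is the useful ones, then for the first two options isn't higher better, and for TF aren't we supposed to pick intermediate values? In that case none of the three is right. If it is the noise features, all three seem right.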

1 answer

品职助教_七七 · April 22, 2024

Hi, hard-working PZer:


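The question stem states that Steele is "concerned that some tokens are likely to be noise features", so it is asking about noise features. Tokens with "very low and very high term frequency (TF) values" are exactly those noise features, so these are the values she should "focus on".

The stem also specifies that the analysis is "based on their term frequency (TF) values", so options A and B do not need to be considered.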

----------------------------------------------
It is hard work right now, but the feeling of having put in the effort is truly great. Keep going!
