NO.PZ2021083101000006
The question is as follows:
Bector turns his attention to Dataset XYZ, containing 84,000 tokens and 10,000 sentences. Bector chooses an appropriate feature selection method to identify and remove unnecessary tokens from the dataset and then focuses on model training.
For performance evaluation purposes, Dataset XYZ is split into a training set, a cross-validation (CV) set, and a test set. Each of the sentences has already been labeled as either a positive sentiment (Class “1”) or a negative sentiment (Class “0”) sentence.
There is an unequal class distribution between the positive sentiment and negative sentiment sentences in Dataset XYZ. Simple random sampling is applied within levels of the sentiment class labels to balance the class distributions within the splits.
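Simple random sampling within each class label is stratified sampling. The Python sketch below illustrates it under assumptions not stated in the question: the helper name `stratified_split` is hypothetical, and the 60/20/20 split fractions and the 3,000/7,000 label counts are made up for illustration. It shows that stratification keeps the class proportions the same in every split. It does not make the two classes equal in size, so the data remain imbalanced.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split sample indices into train/CV/test by simple random sampling
    within each class label (hypothetical helper; fractions are assumed)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    splits = ([], [], [])
    for idxs in by_class.values():
        rng.shuffle(idxs)  # simple random sampling without replacement within this class
        n = len(idxs)
        n_train = round(n * fractions[0])
        n_cv = round(n * fractions[1])
        splits[0].extend(idxs[:n_train])
        splits[1].extend(idxs[n_train:n_train + n_cv])
        splits[2].extend(idxs[n_train + n_cv:])
    return splits


# Usage with hypothetical labels: 3,000 positive and 7,000 negative sentences.
labels = [1] * 3000 + [0] * 7000
train, cv, test = stratified_split(labels)
for name, split in [("train", train), ("CV", cv), ("test", test)]:
    share = sum(labels[i] for i in split) / len(split)
    print(f"{name}: {len(split)} sentences, {share:.0%} positive")
```

Every split ends up about 30% positive. The splits mirror the original imbalance; they do not remove it.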
Bector’s view is that the false positive and false negative evaluation metrics should be given equal weight.
Based only on Dataset XYZ’s composition and Bector’s view regarding false positive and false negative evaluation metrics, which performance measure is most appropriate?
Options:
A. Recall
B. F1 score
C. Precision
Explanation:
B is correct.
F1 score is the most appropriate performance measure for Dataset XYZ. Bector gives equal weight to false positives and false negatives. Accuracy and F1 score are overall performance measures that give equal weight to false positives and false negatives.
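The measures below are written in standard confusion-matrix notation: TP, FP, TN, and FN stand for true positives, false positives, true negatives, and false negatives. The original explanation does not use this notation. In both F1 and accuracy, FP and FN appear with the same coefficient. Precision, by contrast, contains only FP, and recall contains only FN:

```latex
\begin{aligned}
\text{Precision} &= \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN} \\
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
     = \frac{2\,TP}{2\,TP + FP + FN} \\
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN}
\end{aligned}
```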
Accuracy is considered an appropriate performance measure for balanced datasets, where the numbers of class “1” and class “0” observations are equal.
F1 score is considered more appropriate than accuracy when the class distribution in the dataset is unequal and a balance between precision and recall needs to be measured.
Since Dataset XYZ contains an unequal class distribution between positive and negative sentiment sentences, F1 score is the most appropriate performance measure.
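The Python sketch below shows why accuracy can mislead on an imbalanced set. The confusion-matrix counts are hypothetical and not taken from Dataset XYZ, and the helper name `metrics` is also hypothetical. A model that labels every sentence negative still scores high accuracy, while its F1 score is zero:

```python
def metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, recall, f1) for the positive class.

    Precision, recall, and F1 fall back to 0.0 when their denominator is 0.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 0.0
    return accuracy, precision, recall, f1


# Hypothetical imbalanced test set: 100 positive and 900 negative sentences.
always_negative = metrics(tp=0, fp=0, tn=900, fn=100)  # predicts "0" for everything
useful_model = metrics(tp=70, fp=30, tn=870, fn=30)    # finds 70 of the 100 positives

for name, (acc, _, _, f1) in [("always negative", always_negative),
                              ("useful model", useful_model)]:
    print(f"{name}: accuracy={acc:.2f}, F1={f1:.2f}")
```

Accuracy separates the two models by only 0.04 (0.90 vs. 0.94). F1 separates them by 0.70, which exposes the model that never finds a positive sentence.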
Precision is the ratio of correctly predicted positives to all predicted positives and is useful in situations where the cost of false positives, or Type I errors, is high. Because it reflects only false positives, it does not give equal weight to false negatives.
Recall is the ratio of correctly predicted positives to all actual positives and is useful in situations where the cost of false negatives, or Type II errors, is high. Because it reflects only false negatives, it does not give equal weight to false positives.
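The Python sketch below checks this one-sidedness on hypothetical counts, starting from a baseline of TP = 80, FP = 20, FN = 20. It adds 40 extra errors of one type at a time and compares precision, recall, and F1:

```python
def precision(tp, fp):
    # Only false positives (Type I errors) appear in the denominator.
    return tp / (tp + fp)


def recall(tp, fn):
    # Only false negatives (Type II errors) appear in the denominator.
    return tp / (tp + fn)


def f1(tp, fp, fn):
    # Harmonic mean of precision and recall, written in count form.
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical counts: add 40 extra errors of one type to a baseline.
base = dict(tp=80, fp=20, fn=20)
more_fn = dict(tp=80, fp=20, fn=60)  # 40 extra missed positives (Type II)
more_fp = dict(tp=80, fp=60, fn=20)  # 40 extra false alarms (Type I)

for name, c in [("base", base), ("more FN", more_fn), ("more FP", more_fp)]:
    p = precision(c["tp"], c["fp"])
    r = recall(c["tp"], c["fn"])
    print(f"{name}: precision={p:.3f}, recall={r:.3f}, F1={f1(**c):.3f}")
```

The three measures respond differently:
- Precision does not move when false negatives pile up.
- Recall does not move when false positives pile up.
- F1 falls to the same value in both cases, which is what giving the two error types equal weight means here.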
Topic tested: Model Training - Performance Evaluation
Could you explain this question? I don't quite understand why the other metrics can't be chosen.