NO.PZ2021083101000017
Question:
Achler splits the DTM into training, cross-validation, and test datasets. Achler uses a supervised learning approach to train the logistic regression model to predict sentiment. Applying the receiver operating characteristic (ROC) technique and the area under the curve (AUC) metric, Achler evaluates model performance on both the training and the cross-validation datasets. The trained model's performance at three different threshold p-values of the logistic regression is presented in Exhibit 3.
Rivera suggests adjusting the model’s hyperparameters to improve performance.
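Each threshold p-value turns the logistic regression's predicted probabilities into class labels, so each threshold yields one (false positive rate, true positive rate) point on the ROC curve. The sketch below is a hypothetical illustration in plain Python with invented labels and probabilities, not the data behind Exhibit 3. The helper `roc_point` is an assumed name; it labels an observation positive when its probability is at or above the threshold (treating "at" as positive is itself an assumed convention).

```python
def roc_point(labels, probs, threshold):
    """Return (FPR, TPR) when observations with probability >= threshold are classified positive."""
    tp = fp = tn = fn = 0
    for y, p in zip(labels, probs):
        predicted_positive = p >= threshold
        if predicted_positive and y == 1:
            tp += 1  # correctly flagged positive sentiment
        elif predicted_positive and y == 0:
            fp += 1  # negative sentiment wrongly flagged as positive
        elif y == 1:
            fn += 1  # positive sentiment missed
        else:
            tn += 1  # negative sentiment correctly left unflagged
    return fp / (fp + tn), tp / (tp + fn)


# Hypothetical sentiment labels (1 = positive) and model probabilities
labels = [1, 1, 1, 0, 0, 0]
probs = [0.95, 0.90, 0.60, 0.85, 0.30, 0.10]

for threshold in (0.50, 0.84, 0.95):
    fpr, tpr = roc_point(labels, probs, threshold)
    print(f"threshold={threshold:.2f}  FPR={fpr:.3f}  TPR={tpr:.3f}")
```

Raising the threshold flags fewer observations as positive, which lowers both rates; sweeping the threshold from 1 down to 0 traces out the whole ROC curve.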
Based on Exhibit 3, if Achler wants to improve model performance at the threshold p-value of 0.84, he should:
Options:
A. tune the model to lower the AUC
B. adjust model parameters to decrease ROC convexity
C. apply LASSO regularization to the logistic regression
Explanation:
C is correct.
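At the threshold p-value of 0.84, the AUC is 98.4% for the training dataset but only 87.1% for the cross-validation dataset. A model that performs much better on the data it was trained on than on data it has not seen (here a gap of 11.3 percentage points) has learned noise specific to the training set, which indicates that the model is currently overfitted. Least absolute shrinkage and selection operator (LASSO) regularization can be applied to the logistic regression to reduce overfitting: it adds a penalty on the sum of the absolute values of the coefficients, which shrinks them and can set some of them to exactly zero.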
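The diagnosis and the remedy can be sketched in plain Python. This is a minimal illustration, not the curriculum's or any library's method. `auc_gap` simply subtracts the two AUCs quoted above. `fit_lasso_logistic` is a hypothetical helper that fits an L1-penalized (LASSO) logistic regression by proximal gradient descent: a gradient step on the mean log-loss followed by soft-thresholding of the coefficients. The six observations, the learning rate, and the penalty strengths `lam` are invented for illustration.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def auc_gap(train_auc, cv_auc):
    # A large positive gap means the model does much better on data it was trained on
    return train_auc - cv_auc


def soft_threshold(w, t):
    # Proximal operator of t*|w|: shrinks w toward zero, returning exactly 0.0 when |w| <= t
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def fit_lasso_logistic(X, y, lam, lr=0.1, epochs=2000):
    """Hypothetical helper: minimize mean log-loss + lam * sum(|w_j|) by proximal gradient."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this observation
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # Gradient step on the log-loss, then soft-threshold (the intercept is not penalized)
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


print(round(auc_gap(0.984, 0.871), 3))  # 0.113, the gap quoted in the explanation

# Invented data: feature 0 separates the classes, feature 1 is small noise
X = [[2.0, 0.1], [1.5, -0.3], [1.0, 0.2], [-1.0, 0.1], [-1.5, -0.2], [-2.0, 0.3]]
y = [1, 1, 1, 0, 0, 0]

w_none, _ = fit_lasso_logistic(X, y, lam=0.0)    # no penalty
w_strong, _ = fit_lasso_logistic(X, y, lam=5.0)  # heavy penalty
print(w_none, w_strong)
```

With the heavy penalty every coefficient is shrunk to exactly zero, while with no penalty the informative first feature keeps a positive weight. In practice the penalty strength is a hyperparameter, typically chosen by comparing performance on the cross-validation set.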
A is incorrect because a higher AUC indicates better model performance, so tuning the model to lower the AUC would make it worse.
B is incorrect because the more convex the ROC curve and the higher the AUC, the better the model performance. Adjusting model parameters with the aim of achieving lower ROC convexity would result in worse model performance on the cross-validation dataset.
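The link between the shape of the ROC curve and the AUC can be illustrated with a short Python sketch. The helper `trapezoid_auc` is a hypothetical name and the ROC points are invented. It approximates the area under each curve with the trapezoidal rule, showing that a curve bowed further toward the top-left corner (what the explanation calls more convex) encloses a larger area. The diagonal, which corresponds to random guessing, gives an AUC of 0.5.

```python
def trapezoid_auc(fpr, tpr):
    """Area under an ROC curve given points sorted by increasing FPR, via the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(fpr)):
        # Width of the segment times the average height of its two endpoints
        area += (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2.0
    return area


# Invented ROC points, each running from (0, 0) to (1, 1)
diagonal = trapezoid_auc([0.0, 1.0], [0.0, 1.0])                 # random classifier
flatter = trapezoid_auc([0.0, 0.3, 0.6, 1.0], [0.0, 0.5, 0.8, 1.0])
bowed = trapezoid_auc([0.0, 0.1, 0.3, 1.0], [0.0, 0.6, 0.9, 1.0])

print(round(diagonal, 3), round(flatter, 3), round(bowed, 3))  # 0.5 0.63 0.845
```

Flattening the curve toward the diagonal, which is what option B would do, can only shrink this area.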
Topic tested: Model Training: Tuning
How should I understand that overfitting is evident at the threshold p-value of 0.84?