NO.PZ2021083101000004
Question:
Bector then computes TF–IDF (term frequency–inverse document frequency) for several words in the collection and tells Azarov the following:
Statement 2 TF at the collection level is multiplied by IDF to calculate TF–IDF.
Statement 3 TF–IDF values vary by the number of documents in the dataset, and therefore, model performance can vary when applied to a dataset with just a few documents.
Which of Bector’s statements regarding TF, IDF, and TF–IDF is correct?
Options:
A. Statement 1
B. Statement 2
C. Statement 3
Explanation:
C is correct.
Statement 3 is correct. TF–IDF values vary by the number of documents in the dataset, and therefore, the model performance can vary when applied to a dataset with just a few documents.
A is incorrect because IDF is calculated as the log of the inverse, or reciprocal, of the document frequency (DF) measure.
B is incorrect because TF at the sentence (not collection) level is multiplied by IDF to calculate TF–IDF.
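A small worked example may make the explanation more concrete. The sketch below assumes TF is the share of a sentence's tokens that match the term, DF is the share of sentences containing the term, and IDF is the natural log of 1/DF; the function name `tf_idf`, the whitespace tokenization, and the sample sentences are illustrative assumptions, not part of the question.

```python
import math

def tf_idf(sentences, token):
    """Per-sentence TF-IDF for `token` (hypothetical helper, natural log assumed)."""
    tokenized = [s.lower().split() for s in sentences]
    n = len(tokenized)
    containing = sum(1 for words in tokenized if token in words)
    if containing == 0:
        return [0.0] * n              # term never appears, so every score is 0
    df = containing / n               # document frequency: share of sentences with the term
    idf = math.log(1 / df)            # IDF = log of the reciprocal of DF
    # TF is measured per sentence (not over the whole collection), then scaled by IDF
    return [(words.count(token) / len(words)) * idf if words else 0.0
            for words in tokenized]

# Made-up data: the same term scored in a 2-sentence and a 4-sentence collection
small = ["rates rose sharply", "rates fell"]
large = small + ["earnings beat forecasts", "the bond market rallied"]

print(tf_idf(small, "rates"))   # DF = 1, so IDF = log(1) = 0 and every score is 0
print(tf_idf(large, "rates"))   # DF = 0.5, so IDF = log(2) and scores become nonzero
```

With only two sentences the term appears in every one, so IDF collapses to zero; adding two unrelated sentences changes the scores for the very same sentences. That sensitivity to collection size is what Statement 3 describes.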
Topic: Unstructured Data Exploration - Feature Selection - Different TF measures
I didn't really understand this.