NO.PZ2021083101000004
Question:
Bector then computes TF–IDF (term frequency–inverse document frequency) for several words in the collection and tells Azarov the following:
Statement 2: TF at the collection level is multiplied by IDF to calculate TF–IDF.
Statement 3: TF–IDF values vary by the number of documents in the dataset, and therefore, model performance can vary when applied to a dataset with just a few documents.
Which of Bector’s statements regarding TF, IDF, and TF–IDF is correct?
Options:
A. Statement 1
B. Statement 2
C. Statement 3
Explanation:
C is correct.
Statement 3 is correct. TF–IDF values vary by the number of documents in the dataset, and therefore, the model performance can vary when applied to a dataset with just a few documents.
A is incorrect because IDF is calculated as the log of the inverse, or reciprocal, of the document frequency (DF) measure.
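The difference between the plain reciprocal of DF and IDF can be seen in a minimal sketch. It assumes DF is the share of documents containing the token and that the natural log is used (the log base is an assumption; the explanation above does not specify one). The function names and sample documents are hypothetical.

```python
import math

def document_frequency(token, documents):
    """Share of documents that contain the token at least once."""
    containing = sum(1 for doc in documents if token in doc.lower().split())
    return containing / len(documents)

def inverse_of_df(df):
    """Plain reciprocal of DF; this is NOT the IDF measure."""
    return 1.0 / df

def idf(df):
    """IDF as the log of the reciprocal of DF (natural log assumed)."""
    return math.log(1.0 / df)

# Hypothetical four-document collection.
docs = [
    "the market rallied today",
    "the bond market fell",
    "earnings beat expectations",
    "the fed held rates",
]

df_market = document_frequency("market", docs)  # 2 of 4 documents
print(df_market, inverse_of_df(df_market), idf(df_market))
```

Here the reciprocal of DF is 2, while IDF is log(2), so the two quantities differ even though both are built from the same DF value.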
B is incorrect because TF at the sentence (not collection) level is multiplied by IDF to calculate TF–IDF.
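As a rough illustration of the point about B, the sketch below multiplies TF computed within each sentence by an IDF computed across the whole collection. It assumes whitespace tokenization, a natural log, and that the token appears in at least one sentence; the helper names and sample sentences are hypothetical.

```python
import math

def tokenize(text):
    """Lowercase and split on whitespace (a simplifying assumption)."""
    return text.lower().split()

def tf_sentence(token, sentence):
    """TF at the sentence level: token count over tokens in that sentence."""
    tokens = tokenize(sentence)
    return tokens.count(token) / len(tokens)

def idf(token, sentences):
    """log(N / n), i.e. log(1 / DF); assumes the token occurs somewhere."""
    containing = sum(1 for s in sentences if token in tokenize(s))
    return math.log(len(sentences) / containing)

def tf_idf(token, sentence, sentences):
    """Sentence-level TF multiplied by collection-wide IDF."""
    return tf_sentence(token, sentence) * idf(token, sentences)

# Hypothetical four-sentence collection.
sentences = [
    "the market rallied today",
    "the bond market fell",
    "earnings beat expectations",
    "the fed held rates",
]

scores = [tf_idf("market", s, sentences) for s in sentences]
print(scores)
```

Each sentence gets its own TF–IDF value for the token, and because the IDF term depends on the number of sentences in the collection, adding or removing sentences changes every score. This is also the dependence that Statement 3 describes.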
Topic tested: Unstructured Data Exploration - Feature Selection - Different TF measures
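A question about the English wording: although IDF stands for "inverse document frequency", the phrase "inverse of document frequency" refers specifically to the reciprocal of DF (1/DF) without the log, so the statement is incorrect. Is this understanding right?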