NO.PZ202304050200007701
Question:
Based on the text exploration method used for Dataset ABC, tokens that potentially carry important information useful for differentiating the sentiment embedded in the text are most likely to have values that are:
Options:
A. low
B. intermediate
C. high
Explanation:
B is correct. When analyzing term frequency at the corpus level, also known as collection frequency, tokens with intermediate term frequency (TF) values potentially carry important information useful for differentiating the sentiment embedded in the text. Tokens with the highest TF values are mostly stop words that do not contribute to differentiating the sentiment embedded in the text, and tokens with the lowest TF values are mostly proper nouns or sparse terms that are also not important to the meaning of the text.
A is incorrect because tokens with the lowest TF values are mostly proper nouns or sparse terms (noisy terms) that are not important to the meaning of the text.
C is incorrect because tokens with the highest TF values are mostly stop words (noisy terms) that do not contribute to differentiating the sentiment embedded in the text.
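To make the collection-frequency argument concrete, here is a minimal sketch in Python. It is only an illustration: the toy corpus, the function names `collection_frequency` and `split_by_frequency`, and the cutoffs (1 and 4) are all assumptions chosen for this example, not anything taken from Dataset ABC or the curriculum.

```python
from collections import Counter

# Hypothetical toy corpus (an assumption for illustration; not Dataset ABC).
corpus = [
    "the profit of the firm rose and the outlook is good",
    "the loss of the firm grew and the outlook is bad",
    "the profit of the fund rose and the outlook is good",
    "the loss of the fund grew and the view of acme is bad",
]

def collection_frequency(docs):
    """Count how often each token appears across the whole corpus."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts

def split_by_frequency(counts, low_cutoff, high_cutoff):
    """Bucket tokens into low / intermediate / high frequency sets.

    The cutoffs are arbitrary assumptions; in practice they would be
    chosen by inspecting the frequency distribution of the corpus.
    """
    low = {t for t, c in counts.items() if c <= low_cutoff}
    high = {t for t, c in counts.items() if c >= high_cutoff}
    mid = set(counts) - low - high
    return low, mid, high

counts = collection_frequency(corpus)
low, mid, high = split_by_frequency(counts, low_cutoff=1, high_cutoff=4)

print("high:", sorted(high))
print("intermediate:", sorted(mid))
print("low:", sorted(low))
```

In this toy corpus the high bucket holds stop words such as "the" and "of", the low bucket holds a proper noun and a sparse term, and the sentiment-bearing words ("good", "bad", "profit", "loss") land in the intermediate bucket, which matches the reasoning for answer B.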
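Regarding the explanation above: my understanding is that the higher a word's TF-IDF, the more important the word is. But the answer to this question seems to say that intermediate values are better? Also, isn't the question asking about using TF-IDF to judge importance? The answer seems to use only TF.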