Question:
The output created in Steele’s Step 3 can be best described as a:
Options:
A. bag-of-words.
B. set of n-grams.
C. document term matrix.
Explanation:
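A is correct. After the cleansed text is normalized, a bag-of-words is created. A bag-of-words (BOW) is the collection of distinct tokens drawn from all the texts in a sample dataset. It records which tokens occur, but not their order or position in the text. B and C describe other representations. N-grams are multi-word token sequences. A document term matrix is built in a later step and uses the BOW tokens as its columns.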
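The following minimal Python sketch illustrates the difference between options A and C. It assumes the texts have already been cleansed and normalized (lowercased, stop words removed, stemmed), so splitting on whitespace is enough to tokenize them. The function names `build_bag_of_words` and `build_document_term_matrix` and the sample texts are hypothetical and only serve as illustration. The document term matrix is included only to contrast it with the BOW.

```python
from collections import Counter


def build_bag_of_words(normalized_texts: list[str]) -> list[str]:
    """Return the distinct tokens across all texts (the BOW).

    Assumes each text is already normalized, so whitespace splitting
    tokenizes it. Word order and position are discarded; the result is
    sorted only to give the columns a stable order later.
    """
    vocabulary: set[str] = set()
    for text in normalized_texts:
        vocabulary.update(text.split())
    return sorted(vocabulary)


def build_document_term_matrix(
    normalized_texts: list[str], bag_of_words: list[str]
) -> list[list[int]]:
    """Later step (option C, not the BOW): one row per text, one column
    per BOW token, each cell counting that token's occurrences in the text."""
    rows = []
    for text in normalized_texts:
        counts = Counter(text.split())
        rows.append([counts[token] for token in bag_of_words])
    return rows


# Hypothetical, already-normalized sample texts.
texts = ["stock price rise", "stock price fall price"]
bow = build_bag_of_words(texts)
print(bow)  # ['fall', 'price', 'rise', 'stock']
dtm = build_document_term_matrix(texts, bow)
print(dtm)  # [[0, 1, 1, 1], [1, 2, 0, 1]]
```

The BOW contains each token only once, no matter how often it appears. The counts per document appear only when the matrix is constructed from it.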