
TZXคิดถึง · March 22, 2022

Could you explain these terms?

NO.PZ2015120204000046

Question:

Steele and Schultz then discuss how to preprocess the raw text data. Steele tells Schultz that the process can be completed in the following three steps:

Step 1 Cleanse the raw text data.

Step 2 Split the cleansed data into a collection of words for them to be normalized.

Step 3 Normalize the collection of words from Step 2 and create a distinct set of tokens from the normalized words.

Steele’s Step 2 can be best described as:

Options:

A.

tokenization.

B.

lemmatization.

C.

standardization.

Explanation:

A is correct. Tokenization is the process of splitting a given text into separate tokens. This step takes place after cleansing the raw text data (removing HTML tags, numbers, extra white spaces, etc.). The tokens are then normalized to create the bag-of-words (BOW).
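The three steps can be sketched in code. This is a minimal illustration using only Python's standard library; the regex patterns and the sample sentence are illustrative assumptions, not the curriculum's exact procedure:

```python
import re
from collections import Counter

def cleanse(raw_html: str) -> str:
    """Step 1: remove HTML tags, numbers, and extra white spaces."""
    text = re.sub(r"<[^>]+>", " ", raw_html)   # strip HTML tags
    text = re.sub(r"\d+", " ", text)           # strip numbers
    return re.sub(r"\s+", " ", text).strip()   # collapse white space

def tokenize(text: str) -> list:
    """Step 2: split the cleansed text into separate tokens."""
    return re.findall(r"[A-Za-z']+", text)

def normalize(tokens: list) -> list:
    """Step 3 (simplified): lowercase each token before building the BOW."""
    return [t.lower() for t in tokens]

raw = "<p>Prices rose 12% in 2021.  Rising   prices worry investors.</p>"
tokens = tokenize(cleanse(raw))
bow = Counter(normalize(tokens))   # bag-of-words over the distinct tokens
print(sorted(bow))
```

Running this prints the distinct normalized tokens: `['in', 'investors', 'prices', 'rising', 'rose', 'worry']`, and `bow` records each token's count (e.g. `prices` appears twice).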

What do these three concepts mean, and which step does each belong to?

1 answer
Accepted answer

星星_品职助教 · March 22, 2022

Hi,

The definitions of the three terms, along with screenshots of where they appear in the handout, are attached below. Step 2 matches the definition of tokenization. Option B (lemmatization) belongs to Step 3, and Option C (standardization) does not appear in this question.
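To make the assistant's distinction concrete: tokenization (Step 2) splits text into tokens, while lemmatization happens during normalization (Step 3) and maps each token to its dictionary form. A minimal sketch, using a hand-written lemma table as an illustrative assumption (real pipelines would use a proper lemmatizer such as NLTK's `WordNetLemmatizer`):

```python
# Hypothetical lemma table for illustration only.
LEMMAS = {"rising": "rise", "rose": "rise", "prices": "price", "investors": "investor"}

def lemmatize(token: str) -> str:
    """Map a token to its lemma; unknown tokens pass through unchanged."""
    return LEMMAS.get(token, token)

# Output of Step 2 (tokenization), already lowercased.
tokens = ["prices", "rose", "rising", "prices", "worry", "investors"]

lemmatized = [lemmatize(t) for t in tokens]
distinct_tokens = sorted(set(lemmatized))   # the distinct set of tokens in Step 3
print(lemmatized)
print(distinct_tokens)
```

Note how `rose` and `rising` collapse to the single lemma `rise`, which is exactly why lemmatization belongs to Step 3 (normalization) rather than Step 2 (splitting).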

