AWS Certified Machine Learning – Specialty MLS-C01 – Question027

A Machine Learning Specialist is creating a new natural language processing application that processes a dataset comprised of 1 million sentences. The aim is to then run Word2Vec to generate embeddings of the sentences and enable different types of predictions.
Here is an example from the dataset: "The quck BROWN FOX jumps over the lazy dog."
Which of the following are the operations the Specialist needs to perform to correctly sanitize and prepare the data in a repeatable manner? (Choose three.)

A.
Perform part-of-speech tagging and keep the action verb and the nouns only.
B. Normalize all words by making the sentence lowercase.
C. Remove stop words using an English stopword dictionary.
D. Correct the typography on "quck" to "quick."
E. One-hot encode all words in the sentence.
F. Tokenize the sentence into words.

Correct Answer: BCF