AWS Certified Machine Learning – Specialty MLS-C01 – Question200

A company is building a machine learning (ML) model to classify images of plants. An ML specialist has trained the model using the Amazon SageMaker built-in Image Classification algorithm. The model is hosted using a SageMaker endpoint on an ml.m5.xlarge instance for real-time inference. When used by researchers in the field, the inference has greater latency than is acceptable. The latency gets worse when multiple researchers perform inference at the same time on their devices. Using Amazon CloudWatch metrics, the ML specialist notices that the ModelLatency metric shows a high value and is responsible for most of the response latency.
The ML specialist needs to fix the performance issue so that researchers can experience less latency when performing inference from their devices.
Which action should the ML specialist take to meet this requirement?

A.
Change the endpoint instance to an ml.t3 burstable instance with the same vCPU number as the ml.m5.xlarge instance has.
B. Attach an Amazon Elastic Inference ml.eia2.medium accelerator to the endpoint instance.
C. Enable Amazon SageMaker Autopilot to automatically tune performance of the model.
D. Change the endpoint instance to use a memory optimized ML instance.

Correct Answer: A