AWS Certified Data Analytics – Specialty DAS-C01 – Question103

A healthcare company ingests patient data from multiple data sources and stores it in an Amazon S3 staging bucket. An AWS Glue ETL job transforms the data, which is written to an S3-based data lake to be queried using Amazon Athena. The company wants to match patient records even when the records do not have a common unique identifier.
Which solution meets this requirement?

A. Use Amazon Macie pattern matching as part of the ETL job.
B. Train and use the AWS Glue PySpark filter class in the ETL job.
C. Partition tables and use the ETL job to partition the data on patient name.
D. Train and use the AWS Glue FindMatches ML transform in the ETL job.

Correct Answer: D

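Explanation:
The AWS Glue FindMatches ML transform identifies records that refer to the same entity even when the records do not have a common unique identifier and no fields match exactly. You train it by labeling sample pairs of matching and non-matching records, then run the trained transform inside the Glue ETL job before the data is written to the data lake. The other options do not match records. Amazon Macie discovers and classifies sensitive data such as PII in S3. The PySpark filter class only keeps or drops records based on a predicate and cannot be trained. Partitioning on patient name only organizes how data is stored and would not link records whose names are spelled differently.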
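FindMatches learns its matching model from the record pairs you label, so its internals are not reproduced here. To show the problem it solves, here is a toy sketch that scores pairs of patient records with a hand-weighted fuzzy-similarity rule. It uses Python's `difflib.SequenceMatcher`, which is a stand-in and not the FindMatches algorithm. The records, field weights, and 0.8 threshold are all invented for illustration. In a real Glue job, AWS documentation shows the trained transform applied from the job script with `FindMatches.apply(frame=..., transformId=...)` imported from `awsglueml.transforms`.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical patient records from different source systems.
# There is no shared patient ID, and names are spelled inconsistently.
records = [
    {"source": "clinic", "name": "Jonathan Smith", "dob": "1980-04-12", "zip": "98101"},
    {"source": "lab", "name": "Jon Smith", "dob": "1980-04-12", "zip": "98101"},
    {"source": "clinic", "name": "Maria Garcia", "dob": "1975-09-30", "zip": "10001"},
    {"source": "pharmacy", "name": "Mary Garsia", "dob": "1975-09-30", "zip": "10001"},
    {"source": "lab", "name": "Maria Garcia", "dob": "1990-01-05", "zip": "60601"},
]


def normalize(name):
    """Lowercase a name and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def similarity(a, b):
    """Score two records from 0 to 1 using hand-picked weights (an assumption,
    not something FindMatches exposes). Name similarity is fuzzy; date of
    birth and ZIP code must match exactly to contribute."""
    name_score = SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()
    dob_score = 1.0 if a["dob"] == b["dob"] else 0.0
    zip_score = 1.0 if a["zip"] == b["zip"] else 0.0
    return 0.5 * name_score + 0.3 * dob_score + 0.2 * zip_score


def find_matches(records, threshold=0.8):
    """Return index pairs of records whose similarity meets the threshold."""
    return [
        (i, j)
        for i, j in combinations(range(len(records)), 2)
        if similarity(records[i], records[j]) >= threshold
    ]


if __name__ == "__main__":
    for i, j in find_matches(records):
        print(records[i]["source"], records[i]["name"], "<->",
              records[j]["source"], records[j]["name"])
```

The sketch pairs "Jonathan Smith" with "Jon Smith" and "Maria Garcia" with "Mary Garsia", while the second "Maria Garcia" stays unmatched because her date of birth and ZIP code differ. A fixed rule like this is brittle. That is why the question favors an ML transform trained on labeled examples over hand-written matching logic.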