AWS Certified Data Analytics – Specialty DAS-C01 – Question113 - AWS Certified Data Analytics

A hospital uses an electronic health records (EHR) system to collect two types of data:
– Patient information, which includes a patient's name and address.
– Diagnostic tests conducted and the results of these tests.
Patient information is expected to change periodically. Existing diagnostic test data never changes and only new records are added.
The hospital runs an Amazon Redshift cluster with four dc2.large nodes and wants to automate the ingestion of the patient information and diagnostic test data into respective Amazon Redshift tables for analysis. The EHR system exports data as CSV files to an Amazon S3 bucket on a daily basis. Two sets of CSV files are generated. One set of files is for patient information with updates, deletes, and inserts. The other set of files is for new diagnostic test data only.
What is the MOST cost-effective solution to meet these requirements?

A. Use Amazon EMR with Apache Hudi. Run daily ETL jobs using Apache Spark and the Amazon Redshift JDBC driver.
B. Use an AWS Glue crawler to catalog the data in Amazon S3. Use Amazon Redshift Spectrum to perform scheduled queries of the data in Amazon S3 and ingest the data into the patient information table and the diagnostic tests table.
C. Use an AWS Lambda function to run a COPY command that appends new diagnostic test data to the diagnostic tests table. Run another COPY command to load the patient information data into the staging tables. Use a stored procedure to handle create, update, and delete operations for the patient information table.
D. Use AWS Database Migration Service (AWS DMS) to collect and process change data capture (CDC) records. Use the COPY command to load patient information data into the staging tables. Use a stored procedure to handle create, update, and delete operations for the patient information table.

Correct Answer: B

Explanation:

Reference: https://docs.aws.amazon.com/prescriptive-guidance/latest/serverless-etl-aws-glue/aws-glue-data-catalog.html
https://docs.aws.amazon.com/redshift/latest/mgmt/query-editor-schedule-query.html

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.