AWS Certified Data Analytics – Specialty DAS-C01 – Question070

A web retail company wants to implement a near-real-time clickstream analytics solution. The company wants to analyze the data with an open-source package. The analytics application will process the raw data only once, but other applications will need immediate access to the raw data for up to 1 year.
Which solution meets these requirements with the LEAST amount of operational effort?

A. Use Amazon Kinesis Data Streams to collect the data. Use Amazon EMR with Apache Flink to consume and process the data from the Kinesis data stream. Set the retention period of the Kinesis data stream to 8,760 hours.
B. Use Amazon Kinesis Data Streams to collect the data. Use Amazon Kinesis Data Analytics with Apache Flink to process the data in real time. Set the retention period of the Kinesis data stream to 8,760 hours.
C. Use Amazon Managed Streaming for Apache Kafka (Amazon MSK) to collect the data. Use Amazon EMR with Apache Flink to consume and process the data from the Amazon MSK stream. Set the log retention hours to 8,760.
D. Use Amazon Kinesis Data Streams to collect the data. Use Amazon EMR with Apache Flink to consume and process the data from the Kinesis data stream. Create an Amazon Kinesis Data Firehose delivery stream to store the data in Amazon S3. Set an S3 Lifecycle policy to delete the data after 365 days.

Correct Answer: B

Explanation:
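Kinesis Data Analytics for Apache Flink runs the open-source Flink runtime as a fully managed service, so there is no EMR cluster to operate, and the Kinesis data stream's retention period can be raised to 8,760 hours (365 days) so other applications can re-read the raw records for a year. A minimal sketch of the retention change with boto3, assuming a hypothetical stream name:

```python
import boto3

kinesis = boto3.client("kinesis")

# Raise retention to the 365-day maximum (8,760 hours) so other consumers
# can re-read the raw clickstream records for up to a year.
kinesis.increase_stream_retention_period(
    StreamName="clickstream-raw",  # assumed stream name
    RetentionPeriodHours=8760,
)
```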

AWS Certified Data Analytics – Specialty DAS-C01 – Question069

A company using Amazon QuickSight Enterprise edition has thousands of dashboards, analyses, and datasets.
The company struggles to manage and assign permissions for granting users access to various items within QuickSight. The company wants to make it easier to implement sharing and permissions management.
Which solution should the company implement to simplify permissions management?

A. Use QuickSight folders to organize dashboards, analyses, and datasets. Assign individual users permissions to these folders.
B. Use QuickSight folders to organize dashboards, analyses, and datasets. Assign group permissions by using these folders.
C. Use AWS IAM resource-based policies to assign group permissions to QuickSight items.
D. Use QuickSight user management APIs to provision group permissions based on dashboard naming conventions.
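
With option B, access is granted once per folder to a QuickSight group rather than to thousands of individual users. A minimal sketch of granting a group access to a folder through the QuickSight API, assuming hypothetical account, folder, and group names and an illustrative action list:

```python
import boto3

quicksight = boto3.client("quicksight")

account_id = "111122223333"  # assumed account ID
group_arn = f"arn:aws:quicksight:us-east-1:{account_id}:group/default/marketing-analysts"

# Grant the group access to everything organized under one folder
# (folder ID and the action list are illustrative).
quicksight.update_folder_permissions(
    AwsAccountId=account_id,
    FolderId="marketing-dashboards",
    GrantPermissions=[
        {
            "Principal": group_arn,
            "Actions": ["quicksight:DescribeFolder"],
        }
    ],
)
```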

AWS Certified Data Analytics – Specialty DAS-C01 – Question068

A company leverages Amazon Athena for ad-hoc queries against data stored in Amazon S3. The company wants to implement additional controls to separate query execution and query history among users, teams, or applications running in the same AWS account to comply with internal security policies.
Which solution meets these requirements?

A. Create an S3 bucket for each given use case, create an S3 bucket policy that grants permissions to appropriate individual IAM users, and apply the S3 bucket policy to the S3 bucket.
B. Create an Athena workgroup for each given use case, apply tags to the workgroup, and create an IAM policy using the tags to apply appropriate permissions to the workgroup.
C. Create an IAM role for each given use case, assign appropriate permissions to the role for the given use case, and associate the role with Athena.
D. Create an AWS Glue Data Catalog resource policy for each given use case that grants permissions to appropriate individual IAM users, and apply the resource policy to the specific tables used by Athena.

Correct Answer: B

Explanation:
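Athena workgroups isolate query execution, query history, and metrics per team or application within the same account, and tagging the workgroup lets an IAM policy scope access with an aws:ResourceTag condition. A minimal sketch of creating a tagged workgroup (option B), assuming hypothetical workgroup, tag, and results-bucket names:

```python
import boto3

athena = boto3.client("athena")

# One workgroup per team or application; query history and metrics stay
# separated per workgroup, and the tag can be referenced in IAM policy
# conditions (names, tag values, and the results bucket are illustrative).
athena.create_work_group(
    Name="team-marketing",
    Configuration={
        "ResultConfiguration": {
            "OutputLocation": "s3://example-athena-results/team-marketing/"
        },
        "EnforceWorkGroupConfiguration": True,
        "PublishCloudWatchMetricsEnabled": True,
    },
    Tags=[{"Key": "team", "Value": "marketing"}],
)
```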

AWS Certified Data Analytics – Specialty DAS-C01 – Question067

A healthcare company uses AWS data and analytics tools to collect, ingest, and store electronic health record (EHR) data about its patients. The raw EHR data is stored in Amazon S3 in JSON format partitioned by hour, day, and year and is updated every hour. The company wants to maintain the data catalog and metadata in an AWS Glue Data Catalog to be able to access the data using Amazon Athena or Amazon Redshift Spectrum for analytics.
When defining tables in the Data Catalog, the company has the following requirements:
– Choose the catalog table name and do not rely on the catalog table naming algorithm.
– Keep the table updated with new partitions loaded in the respective S3 bucket prefixes.
Which solution meets these requirements with minimal effort?

A. Run an AWS Glue crawler that connects to one or more data stores, determines the data structures, and writes tables in the Data Catalog.
B. Use the AWS Glue console to manually create a table in the Data Catalog and schedule an AWS Lambda function to update the table partitions hourly.
C. Use the AWS Glue API CreateTable operation to create a table in the Data Catalog. Create an AWS Glue crawler and specify the table as the source.
D. Create an Apache Hive catalog in Amazon EMR with the table schema definition in Amazon S3, and update the table partition with a scheduled job. Migrate the Hive catalog to the Data Catalog.

Correct Answer: C
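
Creating the table through the CreateTable API keeps the table name the company chooses, and a crawler that uses the existing catalog table as its target only discovers and adds new partitions on each scheduled run. A minimal sketch with boto3, assuming hypothetical database, table, S3 path, and IAM role names and an illustrative schema:

```python
import boto3

glue = boto3.client("glue")

# Create the catalog table explicitly so its name is chosen, not generated.
glue.create_table(
    DatabaseName="ehr_db",
    TableInput={
        "Name": "ehr_raw",
        "TableType": "EXTERNAL_TABLE",
        "PartitionKeys": [
            {"Name": "year", "Type": "string"},
            {"Name": "day", "Type": "string"},
            {"Name": "hour", "Type": "string"},
        ],
        "StorageDescriptor": {
            "Location": "s3://example-ehr-bucket/raw/",
            "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
            "SerdeInfo": {"SerializationLibrary": "org.openx.data.jsonserde.JsonSerDe"},
        },
    },
)

# Point an hourly crawler at the existing catalog table so it only adds
# new partitions; catalog targets require DeleteBehavior set to LOG.
glue.create_crawler(
    Name="ehr-partition-crawler",
    Role="arn:aws:iam::111122223333:role/GlueCrawlerRole",  # assumed role
    Targets={"CatalogTargets": [{"DatabaseName": "ehr_db", "Tables": ["ehr_raw"]}]},
    SchemaChangePolicy={"DeleteBehavior": "LOG"},
    Schedule="cron(5 * * * ? *)",  # hourly
)
```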

AWS Certified Data Analytics – Specialty DAS-C01 – Question066

A company is reading data from various customer databases that run on Amazon RDS. The databases contain many inconsistent fields. For example, a customer record field that is place_id in one database is location_id in another database. The company wants to link customer records across different databases, even when many customer record fields do not match exactly.
Which solution will meet these requirements with the LEAST operational overhead?

A. Create an Amazon EMR cluster to process and analyze data in the databases. Connect to the Apache Zeppelin notebook, and use the FindMatches transform to find duplicate records in the data.
B. Create an AWS Glue crawler to crawl the databases. Use the FindMatches transform to find duplicate records in the data. Evaluate and tune the transform by evaluating performance and results of finding matches.
C. Create an AWS Glue crawler to crawl the data in the databases. Use Amazon SageMaker to construct Apache Spark ML pipelines to find duplicate records in the data.
D. Create an Amazon EMR cluster to process and analyze data in the databases. Connect to the Apache Zeppelin notebook, and use Apache Spark ML to find duplicate records in the data. Evaluate and tune the model by evaluating performance and results of finding duplicates.

Correct Answer: B
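
The AWS Glue FindMatches ML transform (option B) links records that do not match exactly without provisioning or managing an EMR cluster. A minimal sketch of creating the transform over a crawled customer table, assuming hypothetical database, table, primary-key, and role names:

```python
import boto3

glue = boto3.client("glue")

# Create a FindMatches ML transform over the crawled customer table
# (database, table, primary key, and role are illustrative).
glue.create_ml_transform(
    Name="customer-record-linkage",
    Role="arn:aws:iam::111122223333:role/GlueFindMatchesRole",  # assumed role
    InputRecordTables=[
        {"DatabaseName": "customers_db", "TableName": "customer_records"}
    ],
    Parameters={
        "TransformType": "FIND_MATCHES",
        "FindMatchesParameters": {
            "PrimaryKeyColumnName": "customer_id",
            "PrecisionRecallTradeoff": 0.5,
            "EnforceProvidedLabels": False,
        },
    },
    MaxCapacity=10.0,
)
```

After labeling and training, the transform is tuned by reviewing its precision/recall metrics and match results, as the option describes.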

AWS Certified Data Analytics – Specialty DAS-C01 – Question065

A data engineer is using AWS Glue ETL jobs to process data at frequent intervals. The processed data is then copied into Amazon S3. The ETL jobs run every 15 minutes. The AWS Glue Data Catalog partitions need to be updated automatically after the completion of each job.
Which solution will meet these requirements MOST cost-effectively?

A. Use the AWS Glue Data Catalog to manage the data catalog. Define an AWS Glue workflow for the ETL process. Define a trigger within the workflow that can start the crawler when an ETL job run is complete.
B. Use the AWS Glue Data Catalog to manage the data catalog. Use AWS Glue Studio to manage ETL jobs. Use the AWS Glue Studio feature that supports updates to the AWS Glue Data Catalog during job runs.
C. Use an Apache Hive metastore to manage the data catalog. Update the AWS Glue ETL code to include the enableUpdateCatalog and partitionKeys arguments.
D. Use the AWS Glue Data Catalog to manage the data catalog. Update the AWS Glue ETL code to include the enableUpdateCatalog and partitionKeys arguments.

Correct Answer: A

Explanation:
In the AWS Glue workflow example in the documentation, upon successful completion of both jobs, an event trigger ("Fix/De-dupe succeeded") starts a crawler ("Update schema").
Reference: https://docs.aws.amazon.com/glue/latest/dg/workflows_overview.html
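
In a Glue workflow, a conditional trigger can start the crawler as soon as the ETL job run succeeds, as in the referenced example. A minimal sketch of such a trigger, assuming hypothetical workflow, job, and crawler names:

```python
import boto3

glue = boto3.client("glue")

# Conditional trigger inside an existing workflow: when the 15-minute ETL
# job succeeds, start the crawler that refreshes the catalog partitions
# (workflow, job, and crawler names are illustrative).
glue.create_trigger(
    Name="start-crawler-after-etl",
    WorkflowName="fifteen-minute-etl-workflow",
    Type="CONDITIONAL",
    Predicate={
        "Logical": "ANY",
        "Conditions": [
            {
                "LogicalOperator": "EQUALS",
                "JobName": "fifteen-minute-etl-job",
                "State": "SUCCEEDED",
            }
        ],
    },
    Actions=[{"CrawlerName": "partition-update-crawler"}],
    StartOnCreation=True,
)
```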

AWS Certified Data Analytics – Specialty DAS-C01 – Question064

A company uses Amazon Kinesis Data Streams to ingest and process customer behavior information from application users each day. A data analytics specialist notices that the data stream is being throttled. The specialist has turned on enhanced monitoring for the Kinesis data stream and has verified that the data stream did not exceed the data limits. The specialist discovers that there are hot shards.
Which solution will resolve this issue?

A. Use a random partition key to ingest the records.
B. Increase the number of shards. Split the size of the log records.
C. Limit the number of records that are sent each second by the producer to match the capacity of the stream.
D. Decrease the size of the records that are sent from the producer to match the capacity of the stream.

Correct Answer: A
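
Hot shards come from a skewed partition key, so switching the producer to a high-cardinality (for example random) partition key spreads records evenly across shards. A minimal producer sketch with boto3, assuming a hypothetical stream name:

```python
import json
import uuid

import boto3

kinesis = boto3.client("kinesis")

def put_behavior_event(event: dict) -> None:
    """Write one event with a random partition key so records spread
    evenly across shards instead of concentrating on a hot shard."""
    kinesis.put_record(
        StreamName="customer-behavior",  # assumed stream name
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=str(uuid.uuid4()),
    )
```

Note that a fully random key gives up per-customer ordering within a shard, which is usually acceptable when the goal is even throughput.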

AWS Certified Data Analytics – Specialty DAS-C01 – Question063

An energy company collects voltage data in real time from sensors that are attached to buildings. The company wants to receive notifications when a sequence of two voltage drops is detected within 10 minutes of a sudden voltage increase at the same building. All notifications must be delivered as quickly as possible. The system must be highly available. The company needs a solution that will automatically scale when this monitoring feature is implemented in other cities. The notification system is subscribed to an Amazon Simple Notification Service (Amazon SNS) topic for remediation.
Which solution will meet these requirements?

A. Create an Amazon Managed Streaming for Apache Kafka cluster to ingest the data. Use Apache Spark Streaming with the Apache Kafka consumer API in an automatically scaled Amazon EMR cluster to process the incoming data. Use the Spark Streaming application to detect the known event sequence and send the SNS message.
B. Create a REST-based web service by using Amazon API Gateway in front of an AWS Lambda function. Create an Amazon RDS for PostgreSQL database with sufficient Provisioned IOPS to meet current demand. Configure the Lambda function to store incoming events in the RDS for PostgreSQL database, query the latest data to detect the known event sequence, and send the SNS message.
C. Create an Amazon Kinesis Data Firehose delivery stream to capture the incoming sensor data. Use an AWS Lambda transformation function to detect the known event sequence and send the SNS message.
D. Create an Amazon Kinesis data stream to capture the incoming sensor data. Create another stream for notifications. Set up AWS Application Auto Scaling on both streams. Create an Amazon Kinesis Data Analytics for Java application to detect the known event sequence, and add a message to the message stream. Configure an AWS Lambda function to poll the message stream and publish to the SNS topic.

Correct Answer: D

Explanation:
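In option D, the Kinesis Data Analytics for Java (Apache Flink) application detects the voltage pattern and writes a record to the notification stream; a Lambda function with that stream as its event source then publishes to the SNS topic. A minimal sketch of the Lambda handler, assuming the topic ARN is passed in as an environment variable:

```python
import base64
import os

import boto3

sns = boto3.client("sns")
TOPIC_ARN = os.environ["SNS_TOPIC_ARN"]  # assumed environment variable

def handler(event, context):
    """Publish every detection record from the notification stream to SNS.
    The Kinesis event source mapping invokes this handler with batches of
    records written by the Flink application."""
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"]).decode("utf-8")
        sns.publish(TopicArn=TOPIC_ARN, Message=payload)
```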

AWS Certified Data Analytics – Specialty DAS-C01 – Question062

A business intelligence (BI) engineer must create a dashboard to visualize how often certain keywords are used in relation to others in social media posts about a public figure. The BI engineer extracts the keywords from the posts and loads them into an Amazon Redshift table. The table displays the keywords and the count corresponding to each keyword.
The BI engineer needs to display the top keywords with more emphasis on the most frequently used keywords.
Which visual type in Amazon QuickSight meets these requirements?

A. Bar charts
B. Word clouds
C. Circle packing with words
D. Heat maps

Correct Answer: B

AWS Certified Data Analytics – Specialty DAS-C01 – Question061

A healthcare company uses Amazon Redshift for data analysis. During an annual security audit, the company's security team determines that the environment is not using encryption at rest. The security team recommends that the company turn on encryption of private information as soon as possible.
Which solution meets these requirements with the LEAST operational overhead?

A. Use the ALTER TABLE command with the ENCODE option to update existing private information columns in the Amazon Redshift tables to use LZO encoding.
B. Export data from the existing Amazon Redshift cluster to Amazon S3 by using the UNLOAD command with the ENCRYPTED option. Create a new Amazon Redshift cluster with encryption enabled. Load data into the new cluster by using the COPY command.
C. Create a manual snapshot of the existing Amazon Redshift cluster. Restore the snapshot into a new Amazon Redshift cluster with encryption enabled.
D. Modify the existing Amazon Redshift cluster to use AWS Key Management Service (AWS KMS) encryption.

Correct Answer: D
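
Option D changes the cluster's encryption with a single modify operation; Amazon Redshift then migrates the data to an encrypted cluster in the background. A minimal sketch with boto3, assuming a hypothetical cluster identifier and KMS key ARN:

```python
import boto3

redshift = boto3.client("redshift")

# Turn on KMS encryption at rest for the existing cluster
# (cluster identifier and key ARN are illustrative).
redshift.modify_cluster(
    ClusterIdentifier="analytics-cluster",
    Encrypted=True,
    KmsKeyId="arn:aws:kms:us-east-1:111122223333:key/example-key-id",
)
```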