AWS Certified Data Analytics – Specialty DAS-C01 – Question100

A company has a producer application that collects device log data. The producer application writes to an Amazon Kinesis Data Firehose delivery stream that delivers data to an Amazon S3 bucket. The company needs to build a series of dashboards to display real-time trends of the metrics in the log data.
Which solution will meet these requirements?

A. Update the Kinesis Data Firehose delivery stream to add an Amazon OpenSearch Service (Amazon Elasticsearch Service) cluster as another destination. Use OpenSearch Dashboards (Kibana) for log data visualization.
B. Update the Kinesis Data Firehose delivery stream to add an Amazon Kinesis Data Analytics application as an additional destination. Use Amazon QuickSight to display the output of the Kinesis Data Analytics application.
C. Create another Kinesis Data Firehose delivery stream. Update the producer application to write a copy of the log data into the new delivery stream. Set the new delivery stream to deliver data into an Amazon QuickSight dashboard.
D. Update the producer application to write the log data to an Amazon Kinesis data stream. Deliver this data stream to the original Kinesis Data Firehose delivery stream and a new Kinesis Data Firehose delivery stream. Set the new delivery stream to deliver data into an Amazon OpenSearch Service (Amazon Elasticsearch Service) cluster. Use OpenSearch Dashboards (Kibana) for log data visualization.

Correct Answer: B

Explanation:

Reference: https://docs.aws.amazon.com/firehose/latest/dev/what-is-this-service.html

AWS Certified Data Analytics – Specialty DAS-C01 – Question099

A media company wants to perform machine learning and analytics on the data residing in its Amazon S3 data lake. There are two data transformation requirements that will enable the consumers within the company to create reports:
– Daily transformations of 300 GB of data with different file formats landing in Amazon S3 at a scheduled time.
– One-time transformations of terabytes of archived data residing in the S3 data lake.
Which combination of solutions cost-effectively meets the company's requirements for transforming the data? (Choose three.)

A. For daily incoming data, use AWS Glue crawlers to scan and identify the schema.
B. For daily incoming data, use Amazon Athena to scan and identify the schema.
C. For daily incoming data, use Amazon Redshift to perform transformations.
D. For daily incoming data, use AWS Glue workflows with AWS Glue jobs to perform transformations.
E. For archived data, use Amazon EMR to perform data transformations.
F. For archived data, use Amazon SageMaker to perform data transformations.

Correct Answer: ADE

Explanation:

Reference: https://aws.amazon.com/glue/?whats-new-cards.sort-by=item.additionalFields.postDateTime&whats-new-cards.sort-order=desc
https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-overview-arch.html#emr-arch-processing-frameworks

AWS Certified Data Analytics – Specialty DAS-C01 – Question098

A bank is using Amazon Managed Streaming for Apache Kafka (Amazon MSK) to populate real-time data into a data lake. The data lake is built on Amazon S3, and data must be accessible from the data lake within 24 hours. Different microservices produce messages to different topics in the cluster. The cluster is created with 8 TB of Amazon Elastic Block Store (Amazon EBS) storage and a retention period of 7 days.
The customer transaction volume has tripled recently, and disk monitoring has provided an alert that the cluster is almost out of storage capacity.
What should a data analytics specialist do to prevent the cluster from running out of disk space?

A. Use the Amazon MSK console to triple the broker storage and restart the cluster.
B. Create an Amazon CloudWatch alarm that monitors the KafkaDataLogsDiskUsed metric. Automatically flush the oldest messages when the value of this metric exceeds 85%.
C. Create a custom Amazon MSK configuration. Set the log.retention.hours parameter to 48. Update the cluster with the new configuration file.
D. Triple the number of consumers to ensure that data is consumed as soon as it is added to a topic.

Correct Answer: B

Explanation:

Reference: https://docs.aws.amazon.com/msk/latest/developerguide/bestpractices.html

AWS Certified Data Analytics – Specialty DAS-C01 – Question097

A software company wants to use instrumentation data to detect and resolve errors to improve application recovery time. The company requires API usage anomalies, like error rate and response time spikes, to be detected in near-real time (NRT). The company also requires that data analysts have access to dashboards for log analysis in NRT.
Which solution meets these requirements?

A. Use Amazon Kinesis Data Firehose as the data transport layer for logging data. Use Amazon Kinesis Data Analytics to uncover the NRT API usage anomalies. Use Kinesis Data Firehose to deliver log data to Amazon OpenSearch Service (Amazon Elasticsearch Service) for search, log analytics, and application monitoring. Use OpenSearch Dashboards (Kibana) in Amazon OpenSearch Service (Amazon Elasticsearch Service) for the dashboards.
B. Use Amazon Kinesis Data Analytics as the data transport layer for logging data. Use Amazon Kinesis Data Streams to uncover NRT monitoring metrics. Use Amazon Kinesis Data Firehose to deliver log data to Amazon OpenSearch Service (Amazon Elasticsearch Service) for search, log analytics, and application monitoring. Use Amazon QuickSight for the dashboards.
C. Use Amazon Kinesis Data Analytics as the data transport layer for logging data and to uncover NRT monitoring metrics. Use Amazon Kinesis Data Firehose to deliver log data to Amazon OpenSearch Service (Amazon Elasticsearch Service) for search, log analytics, and application monitoring. Use OpenSearch Dashboards (Kibana) in Amazon OpenSearch Service (Amazon Elasticsearch Service) for the dashboards.
D. Use Amazon Kinesis Data Firehose as the data transport layer for logging data. Use Amazon Kinesis Data Analytics to uncover NRT monitoring metrics. Use Amazon Kinesis Data Streams to deliver log data to Amazon OpenSearch Service (Amazon Elasticsearch Service) for search, log analytics, and application monitoring. Use Amazon QuickSight for the dashboards.

Correct Answer: C

Explanation:

Reference: https://docs.aws.amazon.com/opensearch-service/latest/developerguide/integrations.html

AWS Certified Data Analytics – Specialty DAS-C01 – Question096

A company has an encrypted Amazon Redshift cluster. The company recently enabled Amazon Redshift audit logs and needs to ensure that the audit logs are also encrypted at rest. The logs are retained for 1 year. The auditor queries the logs once a month.
What is the MOST cost-effective way to meet these requirements?

A. Encrypt the Amazon S3 bucket where the logs are stored by using AWS Key Management Service (AWS KMS). Copy the data into the Amazon Redshift cluster from Amazon S3 on a daily basis. Query the data as required.
B. Disable encryption on the Amazon Redshift cluster, configure audit logging, and encrypt the Amazon Redshift cluster. Use Amazon Redshift Spectrum to query the data as required.
C. Enable default encryption on the Amazon S3 bucket where the logs are stored by using AES-256 encryption. Copy the data into the Amazon Redshift cluster from Amazon S3 on a daily basis. Query the data as required.
D. Enable default encryption on the Amazon S3 bucket where the logs are stored by using AES-256 encryption. Use Amazon Redshift Spectrum to query the data as required.

Correct Answer: D

Explanation:

Reference: https://docs.aws.amazon.com/redshift/latest/mgmt/db-auditing.html

AWS Certified Data Analytics – Specialty DAS-C01 – Question095

A large telecommunications company is planning to set up a data catalog and metadata management for multiple data sources running on AWS. The catalog will be used to maintain the metadata of all the objects stored in the data stores. The data stores are composed of structured sources like Amazon RDS and Amazon Redshift, and semistructured sources like JSON and XML files stored in Amazon S3. The catalog must be updated on a regular basis, be able to detect the changes to object metadata, and require the least possible administration.
Which solution meets these requirements?

A. Use Amazon Aurora as the data catalog. Create AWS Lambda functions that will connect and gather the metadata information from multiple sources and update the data catalog in Aurora. Schedule the Lambda functions periodically.
B. Use the AWS Glue Data Catalog as the central metadata repository. Use AWS Glue crawlers to connect to multiple data stores and update the Data Catalog with metadata changes. Schedule the crawlers periodically to update the metadata catalog.
C. Use Amazon DynamoDB as the data catalog. Create AWS Lambda functions that will connect and gather the metadata information from multiple sources and update the DynamoDB catalog. Schedule the Lambda functions periodically.
D. Use the AWS Glue Data Catalog as the central metadata repository. Extract the schema for RDS and Amazon Redshift sources and build the Data Catalog. Use AWS Glue crawlers for data stored in Amazon S3 to infer the schema and automatically update the Data Catalog.

Correct Answer: D

Explanation:

Reference: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hive-metastore-glue.html

AWS Certified Data Analytics – Specialty DAS-C01 – Question094

A marketing company collects clickstream data. The company sends the data to Amazon Kinesis Data Firehose and stores the data in Amazon S3. The company wants to build a series of dashboards that will be used by hundreds of users across different departments. The company will use Amazon QuickSight to develop these dashboards. The company has limited resources and wants a solution that could scale and provide daily updates about clickstream activity.
Which combination of options will provide the MOST cost-effective solution? (Choose two.)

A. Use Amazon Redshift to store and query the clickstream data.
B. Use QuickSight with a direct SQL query.
C. Use Amazon Athena to query the clickstream data in Amazon S3.
D. Use S3 analytics to query the clickstream data.
E. Use the QuickSight SPICE engine with a daily refresh.

Correct Answer: BD

Explanation:

Reference: https://aws.amazon.com/blogs/big-data/create-real-time-clickstream-sessions-and-run-analytics-with-amazon-kinesis-data-analytics-aws-glue-and-amazon-athena/

AWS Certified Data Analytics – Specialty DAS-C01 – Question093

A company stores its sales and marketing data that includes personally identifiable information (PII) in Amazon S3. The company allows its analysts to launch their own Amazon EMR cluster and run analytics reports with the data. To meet compliance requirements, the company must ensure the data is not publicly accessible throughout this process. A data engineer has secured Amazon S3 but must ensure the individual EMR clusters created by the analysts are not exposed to the public internet.
Which solution should the data engineer to meet these compliance requirements with LEAST amount of effort?

A. Create an EMR security configuration and ensure the security configuration is associated with the EMR clusters when they are created.
B. Check the security group of the EMR clusters regularly to ensure it does not allow inbound traffic from IPv4 0.0.0.0/0 or IPv6 ::/0.
C. Enable the block public access setting for Amazon EMR at the account level before any EMR cluster is created.
D. Use AWS WAF to block public internet access to the EMR clusters across the board.

Correct Answer: C

Explanation:

Reference: https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-block-public-access.html

AWS Certified Data Analytics – Specialty DAS-C01 – Question092

A media company is using Amazon QuickSight dashboards to visualize its national sales data. The dashboard is using a dataset with these fields: ID, date, time_zone, city, state, country, longitude, latitude, sales_volume, and number_of_items.
To modify ongoing campaigns, the company wants an interactive and intuitive visualization of which states across the country recorded a significantly lower sales volume compared to the national average.
Which addition to the company's QuickSight dashboard will meet this requirement?

A. A geospatial color-coded chart of sales volume data across the country.
B. A pivot table of sales volume data summed up at the state level.
C. A drill-down layer for state-level sales volume data.
D. A drill through to other dashboards containing state-level sales volume data.

Correct Answer: B

Explanation:

Reference: https://docs.aws.amazon.com/quicksight/latest/user/pivot-table.html

AWS Certified Data Analytics – Specialty DAS-C01 – Question091

A large energy company is using Amazon QuickSight to build dashboards and report the historical usage data of its customers. This data is hosted in Amazon Redshift. The reports need access to all the fact tables' billions of records to create aggregation in real time, grouping by multiple dimensions.
A data analyst created the dataset in QuickSight by using a SQL query and not SPICE. Business users have noted that the response time is not fast enough to meet their needs.
Which action would speed up the response time for the reports with the LEAST implementation effort?

A. Use QuickSight to modify the current dataset to use SPICE.
B. Use AWS Glue to create an Apache Spark job that joins the fact table with the dimensions. Load the data into a new table.
C. Use Amazon Redshift to create a materialized view that joins the fact table with the dimensions.
D. Use Amazon Redshift to create a stored procedure that joins the fact table with the dimensions. Load the data into a new table.

Correct Answer: A

Explanation:

Reference: https://docs.aws.amazon.com/quicksight/latest/user/spice.html

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.