AWS Certified Data Analytics – Specialty DAS-C01 – Question140

A large media company is looking for a cost-effective storage and analysis solution for its daily media recordings formatted with embedded metadata. Daily data sizes range from 10 TB to 12 TB, with stream analysis required on timestamps, video resolutions, file sizes, closed captioning, audio languages, and more. Based on the analysis, processing the datasets is estimated to take between 30 and 180 minutes, depending on the underlying framework selection. The analysis will be done by using business intelligence (BI) tools that can be connected to data sources with AWS or Java Database Connectivity (JDBC) connectors.
Which solution meets these requirements?

A. Store the video files in Amazon DynamoDB and use AWS Lambda to extract the metadata from the files and load it to DynamoDB. Use DynamoDB to provide the data to be analyzed by the BI tools.
B. Store the video files in Amazon S3 and use AWS Lambda to extract the metadata from the files and load it to Amazon S3. Use Amazon Athena to provide the data to be analyzed by the BI tools.
C. Store the video files in Amazon DynamoDB and use Amazon EMR to extract the metadata from the files and load it to Apache Hive. Use Apache Hive to provide the data to be analyzed by the BI tools.
D. Store the video files in Amazon S3 and use AWS Glue to extract the metadata from the files and load it to Amazon Redshift. Use Amazon Redshift to provide the data to be analyzed by the BI tools.

Correct Answer: D
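
A minimal boto3 sketch of the S3 + AWS Glue + Amazon Redshift pattern in option D (the job name, cluster identifier, database, user, and table names are hypothetical): trigger the metadata-extraction Glue job, then run a sample analysis query through the Redshift Data API; BI tools would issue similar SQL over JDBC.

import boto3

glue = boto3.client("glue")
redshift_data = boto3.client("redshift-data")

# Start the (hypothetical) Glue ETL job that extracts the embedded metadata
# from the media files in S3 and loads it into Redshift.
glue.start_job_run(JobName="extract-media-metadata")

# Sample analysis query against the loaded metadata table.
response = redshift_data.execute_statement(
    ClusterIdentifier="media-analytics-cluster",
    Database="media",
    DbUser="analyst",
    Sql="SELECT video_resolution, COUNT(*) FROM media_metadata GROUP BY video_resolution;",
)
print(response["Id"])  # statement ID; results are fetched with get_statement_result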

AWS Certified Data Analytics – Specialty DAS-C01 – Question139

A large ecommerce company uses Amazon DynamoDB with provisioned read capacity and auto-scaled write capacity to store its product catalog. The company uses Apache HiveQL statements on an Amazon EMR cluster to query the DynamoDB table. After the company announced a sale on all of its products, wait times for each query increased. The data analyst has determined that the longer wait times are caused by throttling when querying the table.
Which solution will solve this issue?

A. Increase the size of the EMR nodes that are provisioned.
B. Increase the number of EMR nodes that are in the cluster.
C. Increase the DynamoDB table's provisioned write throughput.
D. Increase the DynamoDB table's provisioned read throughput.

Correct Answer: D
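
A minimal boto3 sketch of the remediation in option D (the table name and capacity values are placeholders): read the table's current provisioned throughput and raise only the read capacity, since the Hive queries are read-throttled while writes are already auto scaled.

import boto3

dynamodb = boto3.client("dynamodb")

# Placeholder table name; the real product catalog table would differ.
table = dynamodb.describe_table(TableName="product-catalog")["Table"]
current = table["ProvisionedThroughput"]

dynamodb.update_table(
    TableName="product-catalog",
    ProvisionedThroughput={
        "ReadCapacityUnits": current["ReadCapacityUnits"] * 2,  # raise reads
        "WriteCapacityUnits": current["WriteCapacityUnits"],    # writes unchanged
    },
)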

AWS Certified Data Analytics – Specialty DAS-C01 – Question138

An invoice tracking application stores invoice images within an Amazon S3 bucket. After invoice images are uploaded, they are accessed often by application users for 30 days. After 30 days, the invoice images are rarely accessed. The application guarantees that uploaded images will never be deleted and will be immediately available upon request by users. The application has 1 million users and receives 20,000 read requests each second during peak usage.
Which combination of storage solutions MOST cost-effectively meet these requirements? (Choose two.)

A. Store the invoice images by using the S3 Standard storage class. Apply a lifecycle policy to transition the images to the S3 Standard-Infrequent Access (S3 Standard-IA) storage class 30 days after upload.
B. Create one S3 key prefix for each user in the S3 bucket and store the invoice images under the user-specific prefix.
C. Store the invoice images by using the S3 Standard storage class. Apply a lifecycle policy to transition the images to the S3 Glacier Instant Retrieval storage class 30 days after upload.
D. Store the invoice images by using the S3 Standard storage class. Apply a lifecycle policy to transition the images to the S3 One Zone-Infrequent Access (S3 One Zone-IA) storage class 30 days after upload.
E. Create one S3 key prefix for each day in the S3 bucket and store the invoice images under the upload date-specific prefix.

Correct Answer: BD
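
The lifecycle-policy options (A, C, and D) all rely on the same mechanism; a minimal boto3 sketch (the bucket name is a placeholder, and the target storage class depends on the option chosen) of a rule that transitions invoice images 30 days after upload:

import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="invoice-images",  # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "transition-after-30-days",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to every object in the bucket
                "Transitions": [
                    # STANDARD_IA shown here; ONEZONE_IA or GLACIER_IR would be
                    # substituted for the other storage-class options.
                    {"Days": 30, "StorageClass": "STANDARD_IA"}
                ],
            }
        ]
    },
)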

AWS Certified Data Analytics – Specialty DAS-C01 – Question137

An educational technology company is running an online assessment application that allows thousands of students to concurrently take assessments on the company's platform. The application uses a combination of relational databases running on an Amazon Aurora PostgreSQL DB cluster and Amazon DynamoDB tables for storing data. Users reported issues with application performance during a recent large-scale online assessment. As a result, the company wants to design a solution that captures metrics from all databases in a centralized location and queries the metrics to identify issues with performance.
How can this solution be designed with the LEAST operational overhead?

A. Configure AWS Database Migration Service (AWS DMS) to copy the database logs to an Amazon S3 bucket. Schedule an AWS Glue crawler to periodically populate an AWS Glue table. Query the AWS Glue table with Amazon Athena.
B. Configure an Amazon CloudWatch metric stream with an Amazon Kinesis Data Firehose delivery stream destination that stores the data in an Amazon S3 bucket. Schedule an AWS Glue crawler to periodically populate an AWS Glue table. Query the AWS Glue table with Amazon Athena.
C. Create an Apache Kafka cluster on Amazon EC2. Configure a Java Database Connectivity (JDBC) connector for Kafka Connect on each database to capture and stream the logs to a single Amazon CloudWatch log group. Query the CloudWatch log group with Amazon Athena.
D. Install a server on Amazon EC2 to capture logs from Amazon RDS and DynamoDB by using Java Database Connectivity (JDBC) connectors. Stream the logs to an Amazon Kinesis Data Firehose delivery stream that stores the data in an Amazon S3 bucket. Query the output logs in the S3 bucket by using Amazon Athena.

Correct Answer: B
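
A minimal boto3 sketch of the core of option B (the stream name, ARNs, and role are placeholders): a CloudWatch metric stream that sends RDS and DynamoDB metrics to an existing Kinesis Data Firehose delivery stream, which in turn writes to S3 for the Glue crawler and Athena.

import boto3

cloudwatch = boto3.client("cloudwatch")
cloudwatch.put_metric_stream(
    Name="db-metrics-stream",
    # Placeholder ARNs for the Firehose delivery stream and its IAM role.
    FirehoseArn="arn:aws:firehose:us-east-1:123456789012:deliverystream/db-metrics-to-s3",
    RoleArn="arn:aws:iam::123456789012:role/metric-stream-role",
    OutputFormat="json",
    # Limit the stream to the database namespaces of interest.
    IncludeFilters=[{"Namespace": "AWS/RDS"}, {"Namespace": "AWS/DynamoDB"}],
)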

AWS Certified Data Analytics – Specialty DAS-C01 – Question136

A data analyst is using Amazon QuickSight for data visualization across multiple datasets that are generated by applications. Each application stores files within a separate Amazon S3 bucket. The data analyst is using the AWS Glue Data Catalog as a central catalog across all application data in Amazon S3.
A new application stores its data in a separate S3 bucket. After updating the Data Catalog to include the new application data source, the data analyst creates a new QuickSight data source from an Amazon Athena table.
However, the import into SPICE does not complete.
How can the data analyst resolve this issue?

A. Edit the permissions for the Data Catalog from within the QuickSight console.
B. Edit the permissions for the new S3 bucket from within the QuickSight console.
C. Edit the permissions for the Data Catalog from within the AWS Glue console.
D. Edit the permissions for the Athena table from within the QuickSight console.

Correct Answer: B
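
The fix itself is a console action (granting QuickSight access to the new bucket under Security & permissions), but a quick way to confirm the fault is on the QuickSight side is to run the same query directly in Athena; if it succeeds while the SPICE import fails, the missing bucket permission in QuickSight is the likely cause. The table, database, and bucket names in this sketch are placeholders.

import time
import boto3

athena = boto3.client("athena")
qid = athena.start_query_execution(
    QueryString="SELECT * FROM new_app_table LIMIT 10",
    QueryExecutionContext={"Database": "analytics_catalog"},
    ResultConfiguration={"OutputLocation": "s3://athena-query-results-bucket/"},
)["QueryExecutionId"]

# Poll until the query finishes, then print its final state.
while True:
    state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)
print(state)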

AWS Certified Data Analytics – Specialty DAS-C01 – Question135

A retail company's ecommerce website recently experienced performance issues when there was a one-day sale. The site reliability engineer wants to query all the web logs from the time of the sale to troubleshoot the performance issues. The web logs are stored in an Amazon S3 bucket.
Which solution MOST cost-effectively meets these requirements?

A. Create an Amazon Redshift cluster and use Amazon Redshift Spectrum to query the web logs.
B. Use Amazon S3 Select to query the web logs.
C. Load the web logs from Amazon S3 to an Amazon DynamoDB table and query the table.
D. Use Amazon Athena to query the web logs.

Correct Answer: D
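
A minimal boto3 sketch of the serverless approach in option D (the database, bucket, column names, and dates for the sale window are placeholders): define an external table over the log bucket and query the sale window, with no cluster to provision.

import boto3

athena = boto3.client("athena")

ddl = """
CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
    request_time string,
    method string,
    uri string,
    status int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
LOCATION 's3://retail-web-logs/'
"""

query = "SELECT * FROM web_logs WHERE request_time BETWEEN '2024-06-01' AND '2024-06-02'"

# In practice, poll get_query_execution after each statement so the DDL
# completes before the SELECT starts.
for sql in (ddl, query):
    athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": "default"},
        ResultConfiguration={"OutputLocation": "s3://athena-results-bucket/"},
    )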

AWS Certified Data Analytics – Specialty DAS-C01 – Question134

A data architect at a large financial institution is building a data platform on AWS with the intent of implementing fraud detection by identifying duplicate customer accounts. The fraud detection algorithm will run in batch mode to identify when a newly created account matches an account belonging to a user who was previously flagged as fraudulent.
Which approach MOST cost-effectively meets these requirements?

A. Build a custom deduplication script by using Apache Spark on an Amazon EMR cluster. Use PySpark to compare the data frames that represent the new customers and the fraudulent customer set to identify matches.
B. Load the data to an Amazon Redshift cluster. Use custom SQL to build deduplication logic.
C. Load the data to Amazon S3 to form the basis of a data lake. Use Amazon Athena to build a deduplication script.
D. Load the data to Amazon S3. Use the AWS Glue FindMatches transform to implement deduplication logic.

Correct Answer: D
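
A minimal boto3 sketch of option D (the role ARN, database, table, key column, and tuning value are placeholders): register a Glue FindMatches ML transform over the customer-accounts table in the Data Catalog; the transform is then trained with labeled examples and run as part of the batch job.

import boto3

glue = boto3.client("glue")
glue.create_ml_transform(
    Name="dedupe-customer-accounts",
    Role="arn:aws:iam::123456789012:role/glue-findmatches-role",  # placeholder role
    InputRecordTables=[
        {"DatabaseName": "fraud_platform", "TableName": "customer_accounts"}
    ],
    Parameters={
        "TransformType": "FIND_MATCHES",
        "FindMatchesParameters": {
            "PrimaryKeyColumnName": "account_id",
            "PrecisionRecallTradeoff": 0.9,  # favor precision for fraud matching
        },
    },
)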

AWS Certified Data Analytics – Specialty DAS-C01 – Question133

A retail company stores order invoices in an Amazon OpenSearch Service (Amazon Elasticsearch Service) cluster. Indices on the cluster are created monthly. Once a new month begins, no new writes are made to any of the indices from the previous months. The company has been expanding the storage on the Amazon OpenSearch Service (Amazon Elasticsearch Service) cluster to avoid running out of space, but the company wants to reduce costs. Most searches on the cluster are on the most recent 3 months of data, while the audit team requires infrequent access to older data to generate periodic reports. The most recent 3 months of data must be quickly available for queries, but the audit team can tolerate slower queries if the solution saves on cluster costs.
Which of the following is the MOST operationally efficient solution to meet these requirements?

A. Archive indices that are older than 3 months by using Index State Management (ISM) to create a policy to store the indices in Amazon S3 Glacier Instant Retrieval. When the audit team requires the archived data, restore the archived indices back to the Amazon OpenSearch Service (Amazon Elasticsearch Service) cluster.
B. Archive indices that are older than 3 months by taking manual snapshots and storing the snapshots in Amazon S3. When the audit team requires the archived data, restore the archived indices back to the Amazon OpenSearch Service (Amazon Elasticsearch Service) cluster.
C. Archive indices that are older than 3 months by using Index State Management (ISM) to create a policy to migrate the indices to Amazon OpenSearch Service (Amazon Elasticsearch Service) UltraWarm storage.
D. Archive indices that are older than 3 months by using Index State Management (ISM) to create a policy to migrate the indices to Amazon OpenSearch Service (Amazon Elasticsearch Service) UltraWarm storage. When the audit team requires the older data, migrate the indices in UltraWarm storage back to hot storage.

Correct Answer: C
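
A minimal sketch of the ISM policy behind option C (the domain endpoint, index pattern, policy name, and credentials are placeholders): indices start in a hot state and migrate to UltraWarm once they are older than roughly three months, where they remain queryable for the audit team.

import json
import requests

policy = {
    "policy": {
        "description": "Move invoice indices older than 90 days to UltraWarm",
        "default_state": "hot",
        "states": [
            {
                "name": "hot",
                "actions": [],
                "transitions": [
                    {"state_name": "warm", "conditions": {"min_index_age": "90d"}}
                ],
            },
            {"name": "warm", "actions": [{"warm_migration": {}}], "transitions": []},
        ],
        "ism_template": {"index_patterns": ["invoices-*"]},
    }
}

# Placeholder domain endpoint and credentials.
requests.put(
    "https://search-invoices.us-east-1.es.amazonaws.com/_plugins/_ism/policies/ultrawarm-after-90d",
    auth=("admin", "example-password"),
    headers={"Content-Type": "application/json"},
    data=json.dumps(policy),
)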

AWS Certified Data Analytics – Specialty DAS-C01 – Question132

An online advertising company wants to perform sentiment analysis of social media data to measure the success of online advertisements. The company wants to implement an end-to-end streaming solution to continuously ingest data from various social networks, clean and transform the streaming data in near-real time, and make the data available for analytics and visualization with Amazon QuickSight. The company wants a solution that is easy to implement and manage so it can design better analytics solutions instead of provisioning and maintaining infrastructure.
Which solution meets these requirements with the LEAST amount of operational effort?

A. Use Amazon Kinesis Data Firehose to ingest the data. Author an AWS Glue streaming ETL job to transform the ingested data. Load the transformed data into an Amazon Redshift table.
B. Use Apache Kafka running on Amazon EC2 instances to ingest the data. Create an Amazon EMR Spark job to transform the ingested data. Use the COPY command to load the transformed data into an Amazon Redshift table.
C. Use Amazon Managed Streaming for Apache Kafka (Amazon MSK) to ingest the data. Create an Amazon EMR Spark job to transform the ingested data. Use the COPY command to load the transformed data into an Amazon Redshift table.
D. Use Amazon Kinesis Data Streams to ingest the data. Author an AWS Glue streaming ETL job to transform the ingested data. Load the transformed data into an Amazon Redshift table.

Correct Answer: D
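
A minimal boto3 sketch of the serverless pieces in option D (the stream name, role, script location, and Glue version are placeholders): create the ingest stream and register a Glue streaming ETL job that will clean and transform the records before they are loaded into Redshift.

import boto3

kinesis = boto3.client("kinesis")
glue = boto3.client("glue")

# Placeholder stream for social media ingestion.
kinesis.create_stream(StreamName="social-media-posts", ShardCount=4)

glue.create_job(
    Name="clean-social-stream",
    Role="arn:aws:iam::123456789012:role/glue-streaming-role",  # placeholder role
    Command={
        "Name": "gluestreaming",  # marks this as a streaming ETL job
        "ScriptLocation": "s3://etl-scripts-bucket/clean_social_stream.py",
        "PythonVersion": "3",
    },
    GlueVersion="4.0",
)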

AWS Certified Data Analytics – Specialty DAS-C01 – Question131

A company is using a single master node in an Amazon OpenSearch Service (Amazon Elasticsearch Service) cluster to provide a search API to its front-end applications. The company configured an automated process on AWS that monitors the OpenSearch Service cluster and automatically adds data nodes to scale the cluster when needed. During initial load testing, the system reacted to scaling events properly by adding data nodes, but every time a new data node is added, the company experiences a blue/green deployment that creates a disruption in service. The company wants to create a highly available solution that will prevent these service disruptions.
Which solution meets these requirements?

A. Increase the number of OpenSearch Service master nodes from one to two.
B. Configure multi-zone awareness on the OpenSearch Service cluster.
C. Configure the OpenSearch Service cluster to use three dedicated master nodes.
D. Disable OpenSearch Service Auto-Tune and roll back its changes.

Correct Answer: C
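
A minimal boto3 sketch of option C (the domain name and instance type are placeholders): enable three dedicated master nodes, after which data-node count changes no longer require a disruptive blue/green deployment and the cluster can tolerate the loss of a master node.

import boto3

opensearch = boto3.client("opensearch")
opensearch.update_domain_config(
    DomainName="search-api-domain",  # placeholder domain name
    ClusterConfig={
        "DedicatedMasterEnabled": True,
        "DedicatedMasterCount": 3,  # three masters provide quorum-based availability
        "DedicatedMasterType": "m6g.large.search",
    },
)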