AWS Certified Data Analytics – Specialty DAS-C01 – Question080

A company is using Amazon EMR clusters for its workloads. The company manually installs third-party libraries on the clusters by logging in to the primary nodes. A data analyst needs to create an automated solution to replace the manual process.
Which solutions meet these requirements? (Choose two.)

A. Place the required installation scripts in Amazon S3. Initiate the scripts by using custom bootstrap actions.
B. Place the required installation scripts in Amazon S3. Initiate the scripts through Apache Spark on Amazon EMR.
C. Install the required third-party libraries in the existing EMR primary node. Create an AMI out of that primary node. Use that custom AMI to recreate the EMR cluster.
D. Use an Amazon DynamoDB table to store the list of required applications. Initiate an AWS Lambda function with DynamoDB Streams to install the software.
E. Launch an Amazon EC2 instance with Amazon Linux. Install the required third-party libraries on the instance. Create an AMI. Use that AMI to create the EMR cluster.
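
For illustration, a minimal sketch of the bootstrap-action approach described in option A, using boto3. The bucket, script path, instance types, and role names are hypothetical placeholders, not values from the scenario.

import boto3

emr = boto3.client("emr")

# Launch an EMR cluster that runs an install script from S3 on every node
# before applications start (bucket and script names are hypothetical).
response = emr.run_job_flow(
    Name="analytics-cluster",
    ReleaseLabel="emr-6.9.0",
    Applications=[{"Name": "Spark"}],
    Instances={
        "InstanceGroups": [
            {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    BootstrapActions=[
        {
            "Name": "install-third-party-libraries",
            "ScriptBootstrapAction": {
                "Path": "s3://example-bucket/bootstrap/install_libs.sh"
            },
        }
    ],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print(response["JobFlowId"])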

AWS Certified Data Analytics – Specialty DAS-C01 – Question079

A company with a video streaming website wants to analyze user behavior to make recommendations to users in real time. Clickstream data is being sent to Amazon Kinesis Data Streams, and reference data is stored in Amazon S3. The company wants a solution that can use standard SQL queries. The solution must also provide a way to look up pre-calculated reference data while making recommendations.
Which solution meets these requirements?

A. Use an AWS Glue Python shell job to process incoming data from Kinesis Data Streams. Use the Boto3 library to write data to Amazon Redshift.
B. Use AWS Glue streaming and Scala to process incoming data from Kinesis Data Streams. Use the AWS Glue connector to write data to Amazon Redshift.
C. Use Amazon Kinesis Data Analytics to create an in-application table based upon the reference data. Process incoming data from Kinesis Data Streams. Use a data stream to write results to Amazon Redshift.
D. Use Amazon Kinesis Data Analytics to create an in-application table based upon the reference data. Process incoming data from Kinesis Data Streams. Use an Amazon Kinesis Data Firehose delivery stream to write results to Amazon Redshift.
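
As a rough sketch of the in-application reference table mentioned in options C and D, the call below attaches S3 reference data to an existing Kinesis Data Analytics (SQL) application with boto3. The application name, version, bucket, role, and column schema are all hypothetical.

import boto3

kda = boto3.client("kinesisanalytics")

# Attach pre-calculated reference data in S3 as an in-application table
# that streaming SQL can join against (names and schema are hypothetical).
kda.add_application_reference_data_source(
    ApplicationName="recommendations-app",
    CurrentApplicationVersionId=1,
    ReferenceDataSource={
        "TableName": "REFERENCE_DATA",
        "S3ReferenceDataSource": {
            "BucketARN": "arn:aws:s3:::example-reference-bucket",
            "FileKey": "reference/products.csv",
            "ReferenceRoleARN": "arn:aws:iam::123456789012:role/kda-reference-role",
        },
        "ReferenceSchema": {
            "RecordFormat": {
                "RecordFormatType": "CSV",
                "MappingParameters": {
                    "CSVMappingParameters": {
                        "RecordRowDelimiter": "\n",
                        "RecordColumnDelimiter": ",",
                    }
                },
            },
            "RecordColumns": [
                {"Name": "product_id", "SqlType": "VARCHAR(32)"},
                {"Name": "category", "SqlType": "VARCHAR(64)"},
            ],
        },
    },
)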

AWS Certified Data Analytics – Specialty DAS-C01 – Question078

A bank wants to migrate a Teradata data warehouse to the AWS Cloud. The bank needs a solution for reading large amounts of data and requires the highest possible performance. The solution also must maintain the separation of storage and compute.
Which solution meets these requirements?

A. Use Amazon Athena to query the data in Amazon S3.
B. Use Amazon Redshift with dense compute nodes to query the data in Amazon Redshift managed storage.
C. Use Amazon Redshift with RA3 nodes to query the data in Amazon Redshift managed storage.
D. Use PrestoDB on Amazon EMR to query the data in Amazon S3.

Correct Answer: C

Explanation:
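RA3 node types keep compute on the cluster while the data lives in Redshift Managed Storage, which is what preserves the separation of storage and compute that the bank requires. A minimal, hypothetical provisioning sketch with boto3 (all identifiers and credentials are placeholders):

import boto3

redshift = boto3.client("redshift")

# Provision a cluster on RA3 nodes so compute scales independently of the
# data held in Redshift Managed Storage (all values are placeholders).
redshift.create_cluster(
    ClusterIdentifier="bank-dw",
    NodeType="ra3.4xlarge",
    NumberOfNodes=4,
    MasterUsername="admin",
    MasterUserPassword="REPLACE_ME_Str0ngPassw0rd",
    DBName="analytics",
    PubliclyAccessible=False,
)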

AWS Certified Data Analytics – Specialty DAS-C01 – Question077

A retail company is using an Amazon S3 bucket to host an ecommerce data lake. The company is using AWS Lake Formation to manage the data lake.
A data analytics specialist must provide access to a new business analyst team. The team will use Amazon Athena from the AWS Management Console to query data from existing web_sales and customer tables in the ecommerce database. The team needs read-only access and the ability to uniquely identify customers by using first and last names. However, the team must not be able to see any other personally identifiable data. The table structure is as follows:

Which combination of steps should the data analytics specialist take to provide the required permission by using the principle of least privilege? (Choose three.)

A. In AWS Lake Formation, grant the business_analyst group SELECT and ALTER permissions for the web_sales table.
B. In AWS Lake Formation, grant the business_analyst group the SELECT permission for the web_sales table.
C. In AWS Lake Formation, grant the business_analyst group the SELECT permission for the customer table. Under columns, choose filter type "Include columns" with columns first_name, last_name, and customer_id.
D. In AWS Lake Formation, grant the business_analyst group SELECT and ALTER permissions for the customer table. Under columns, choose filter type "Include columns" with columns first_name and last_name.
E. Create users under a business_analyst IAM group. Create a policy that allows the lakeformation:GetDataAccess action, the athena:* action, and the glue:Get* action.
F. Create users under a business_analyst IAM group. Create a policy that allows the lakeformation:GetDataAccess action, the athena:* action, and the glue:Get* action. In addition, allow the s3:GetObject action, the s3:PutObject action, and the s3:GetBucketLocation action for the Athena query results S3 bucket.

Correct Answer: BCF

Explanation:
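For orientation, the column-filtered grant described in option C maps to the Lake Formation GrantPermissions API. A hedged boto3 sketch is below; the principal ARN is a placeholder (Lake Formation principals are typically IAM users or roles), and the database name follows the scenario.

import boto3

lf = boto3.client("lakeformation")

# Read-only access to only the allowed columns of the customer table.
lf.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/business_analyst"},
    Resource={
        "TableWithColumns": {
            "DatabaseName": "ecommerce",
            "Name": "customer",
            "ColumnNames": ["first_name", "last_name", "customer_id"],
        }
    },
    Permissions=["SELECT"],
)

# Read-only access to the full web_sales table.
lf.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/business_analyst"},
    Resource={"Table": {"DatabaseName": "ecommerce", "Name": "web_sales"}},
    Permissions=["SELECT"],
)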

AWS Certified Data Analytics – Specialty DAS-C01 – Question076

A retail company's data analytics team recently created multiple product sales analysis dashboards for the average selling price per product using Amazon QuickSight. The dashboards were created from .csv files uploaded to Amazon S3. The team is now planning to share the dashboards with the respective external product owners by creating individual users in Amazon QuickSight. For compliance and governance reasons, restricting access is a key requirement. The product owners should view only their respective product analysis in the dashboard reports.
Which approach should the data analytics team take to allow product owners to view only their products in the dashboard?

A. Separate the data by product and use S3 bucket policies for authorization.
B. Separate the data by product and use IAM policies for authorization.
C. Create a manifest file with row-level security.
D. Create dataset rules with row-level security.
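
For reference on option D, QuickSight row-level security is driven by a separate rules dataset that maps users (or groups) to the field values they may see; that rules dataset is then attached to the main dataset as its row-level permission dataset. A hypothetical rules file for this scenario might look like the sketch below (user and product names are made up).

# Hypothetical rules dataset: each row grants a QuickSight user access only
# to rows whose product_name matches. The file is uploaded, registered as its
# own QuickSight dataset, and attached to the sales dataset as its
# row-level permission dataset.
rules_csv = """UserName,product_name
product_owner_a,Widget A
product_owner_b,Widget B
"""

with open("rls_rules.csv", "w") as f:
    f.write(rules_csv)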

AWS Certified Data Analytics – Specialty DAS-C01 – Question075

An ecommerce company stores customer purchase data in Amazon RDS. The company wants a solution to store and analyze historical data. The most recent 6 months of data will be queried frequently for analytics workloads. This data set is several terabytes in size. Once a month, historical data for the last 5 years must be accessible and will be joined with the more recent data. The company wants to optimize performance and cost.
Which storage solution will meet these requirements?

A. Create a read replica of the RDS database to store the most recent 6 months of data. Copy the historical data into Amazon S3. Create an AWS Glue Data Catalog of the data in Amazon S3 and Amazon RDS. Run historical queries using Amazon Athena.
B. Use an ETL tool to incrementally load the most recent 6 months of data into an Amazon Redshift cluster. Run more frequent queries against this cluster. Create a read replica of the RDS database to run queries on the historical data.
C. Incrementally copy data from Amazon RDS to Amazon S3. Create an AWS Glue Data Catalog of the data in Amazon S3. Use Amazon Athena to query the data.
D. Incrementally copy data from Amazon RDS to Amazon S3. Load and store the most recent 6 months of data in Amazon Redshift. Configure an Amazon Redshift Spectrum table to connect to all historical data.

Correct Answer: D

Explanation:

The cost-effective way to query recent data loaded into Amazon Redshift together with historical data kept in Amazon S3 is Amazon Redshift Spectrum.
Reference: https://docs.aws.amazon.com/redshift/latest/dg/c-using-spectrum.html
https://www.upsolver.com/blog/aws-athena-pricing-redshift-comparison
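
To make option D concrete, the Spectrum setup is typically an external schema that points at the Glue Data Catalog, after which recent (local) and historical (S3) tables can be combined in one query. A hedged sketch using the Redshift Data API; the cluster, database, role, schema, and table names are placeholders.

import boto3

rsd = boto3.client("redshift-data")

# External schema backed by the Glue Data Catalog; historical data stays in S3.
create_schema_sql = """
CREATE EXTERNAL SCHEMA IF NOT EXISTS spectrum_hist
FROM DATA CATALOG DATABASE 'sales_history'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-spectrum-role';
"""

# Monthly query combining recent data in Redshift with historical data in S3.
monthly_query_sql = """
SELECT customer_id, SUM(amount) AS lifetime_spend
FROM (
    SELECT customer_id, amount FROM recent_purchases
    UNION ALL
    SELECT customer_id, amount FROM spectrum_hist.purchases
) combined
GROUP BY customer_id;
"""

for sql in (create_schema_sql, monthly_query_sql):
    rsd.execute_statement(
        ClusterIdentifier="analytics-cluster",
        Database="dev",
        DbUser="admin",
        Sql=sql,
    )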

AWS Certified Data Analytics – Specialty DAS-C01 – Question074

A utility company wants to visualize data for energy usage on a daily basis in Amazon QuickSight. A data analytics specialist at the company has built a data pipeline to collect and ingest the data into Amazon S3. Each day, the data is stored in an individual .csv file in an S3 bucket. This is an example of the naming structure:
20210707_data.csv
20210708_data.csv

To allow for data querying in QuickSight through Amazon Athena, the specialist used an AWS Glue crawler to create a table with the path "s3://powertransformer/20210707_data.csv." However, when the data is queried, it returns zero rows.
How can this issue be resolved?

A. Modify the IAM policy for the AWS Glue crawler to access Amazon S3.
B. Ingest the files again.
C. Store the files in Apache Parquet format.
D. Update the table path to "s3://powertransformer/".
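
Athena resolves a table's location as a prefix (folder), so a table whose location is a single object key returns no rows; option D describes pointing the crawler at the bucket prefix instead. A hypothetical boto3 sketch of a crawler configured that way (the crawler name, role, and database are placeholders; the bucket path follows the scenario):

import boto3

glue = boto3.client("glue")

# Crawl the bucket prefix rather than an individual object key, so the
# resulting table's location covers every daily .csv file.
glue.create_crawler(
    Name="powertransformer-daily",
    Role="AWSGlueServiceRole-powertransformer",
    DatabaseName="energy_usage",
    Targets={"S3Targets": [{"Path": "s3://powertransformer/"}]},
)
glue.start_crawler(Name="powertransformer-daily")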

AWS Certified Data Analytics – Specialty DAS-C01 – Question073

A data analytics specialist has a 50 GB data file in .csv format and wants to perform a data transformation task.
The data analytics specialist is using the Amazon Athena CREATE TABLE AS SELECT (CTAS) statement to perform the transformation. The resulting output will be used to query the data from Amazon Redshift Spectrum.
Which CTAS statement should the data analytics specialist use to provide the MOST efficient performance?

A. (CTAS statement not reproduced in this text version)

B. (CTAS statement not reproduced in this text version)

C. (CTAS statement not reproduced in this text version)

D. (CTAS statement not reproduced in this text version)
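
Since the four answer choices are CTAS statements that do not appear in this text version, here is a generic, hypothetical example of the kind of CTAS usually favored for this scenario: columnar Parquet output with compression and partitioning, which both Athena and Redshift Spectrum can scan efficiently. All table, bucket, and column names are made up.

import boto3

athena = boto3.client("athena")

# CTAS that rewrites the CSV source as compressed, partitioned Parquet so
# downstream queries scan less data (all names are hypothetical).
ctas_sql = """
CREATE TABLE sales_parquet
WITH (
    format = 'PARQUET',
    parquet_compression = 'SNAPPY',
    external_location = 's3://example-bucket/ctas/sales_parquet/',
    partitioned_by = ARRAY['sale_date']
) AS
SELECT product_id, quantity, price, sale_date
FROM sales_csv;
"""

athena.start_query_execution(
    QueryString=ctas_sql,
    QueryExecutionContext={"Database": "analytics"},
    ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
)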

AWS Certified Data Analytics – Specialty DAS-C01 – Question072

A manufacturing company is storing data from its operational systems in Amazon S3. The company's business analysts need to perform one-time queries of the data in Amazon S3 with Amazon Athena. The company needs to access Athena from the on-premises network by using a JDBC connection. The company has created a VPC. Security policies mandate that requests to AWS services cannot traverse the internet.
Which combination of steps should a data analytics specialist take to meet these requirements? (Choose two.)

A. Establish an AWS Direct Connect connection between the on-premises network and the VPC.
B. Configure the JDBC connection to connect to Athena through Amazon API Gateway.
C. Configure the JDBC connection to use a gateway VPC endpoint for Amazon S3.
D. Configure the JDBC connection to use an interface VPC endpoint for Athena.
E. Deploy Athena within a private subnet.

Correct Answer: AE

Explanation:

AWS Direct Connect makes it easy to establish a dedicated connection from an on-premises network to one or more VPCs in the same region.
Reference: https://docs.aws.amazon.com/whitepapers/latest/aws-vpc-connectivity-options/aws-direct-connect.html
https://stackoverflow.com/questions/68798311/aws-athena-connect-from-lambda
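
For reference, options C and D refer to VPC endpoints. A hypothetical boto3 sketch of creating an interface endpoint for Athena and a gateway endpoint for Amazon S3 is below; the VPC, subnet, security group, and route table IDs, and the Region embedded in the service names, are placeholders.

import boto3

ec2 = boto3.client("ec2")

# Interface endpoint so JDBC calls to Athena stay on the AWS network.
ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.athena",
    SubnetIds=["subnet-0123456789abcdef0"],
    SecurityGroupIds=["sg-0123456789abcdef0"],
    PrivateDnsEnabled=True,
)

# Gateway endpoint so reads of query data and results in S3 avoid the internet.
ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.s3",
    RouteTableIds=["rtb-0123456789abcdef0"],
)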

AWS Certified Data Analytics – Specialty DAS-C01 – Question071

A transportation company uses IoT sensors attached to trucks to collect vehicle data for its global delivery fleet.
The company currently sends the sensor data in small .csv files to Amazon S3. The files are then loaded into a 10-node Amazon Redshift cluster with two slices per node and queried using both Amazon Athena and Amazon Redshift. The company wants to optimize the files to reduce the cost of querying and also improve the speed of data loading into the Amazon Redshift cluster.
Which solution meets these requirements?

A. Use AWS Glue to convert all the files from .csv to a single large Apache Parquet file. COPY the file into Amazon Redshift and query the file with Athena from Amazon S3.
B. Use Amazon EMR to convert each .csv file to Apache Avro. COPY the files into Amazon Redshift and query the file with Athena from Amazon S3.
C. Use AWS Glue to convert the files from .csv to a single large Apache ORC file. COPY the file into Amazon Redshift and query the file with Athena from Amazon S3.
D. Use AWS Glue to convert the files from .csv to Apache Parquet to create 20 Parquet files. COPY the files into Amazon Redshift and query the files with Athena from Amazon S3.

Correct Answer: D
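
As a rough illustration of the conversion in answer D: an AWS Glue job runs on Spark, so the transformation can be sketched in PySpark. Writing 20 Parquet files matches the cluster's 20 slices (10 nodes with 2 slices each), which lets the Redshift COPY load in parallel. The S3 paths are hypothetical.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-to-parquet").getOrCreate()

# Read the small .csv sensor files and rewrite them as 20 Parquet files so
# each Redshift slice loads one file in parallel (paths are hypothetical).
df = spark.read.option("header", "true").csv("s3://example-bucket/raw/")
df.repartition(20).write.mode("overwrite").parquet("s3://example-bucket/parquet/")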