AWS Certified Data Analytics – Specialty DAS-C01 – Question090

A technology company is creating a dashboard that will visualize and analyze time-sensitive data. The data will come in through Amazon Kinesis Data Firehose with the buffer interval set to 60 seconds. The dashboard must support near-real-time data.
Which visualization solution will meet these requirements?

A. Select Amazon OpenSearch Service (Amazon Elasticsearch Service) as the endpoint for Kinesis Data Firehose. Set up an OpenSearch Dashboards (Kibana) dashboard using the data in Amazon OpenSearch Service (Amazon Elasticsearch Service) with the desired analyses and visualizations.
B. Select Amazon S3 as the endpoint for Kinesis Data Firehose. Read data into an Amazon SageMaker Jupyter notebook and carry out the desired analyses and visualizations.
C. Select Amazon Redshift as the endpoint for Kinesis Data Firehose. Connect Amazon QuickSight with SPICE to Amazon Redshift to create the desired analyses and visualizations.
D. Select Amazon S3 as the endpoint for Kinesis Data Firehose. Use AWS Glue to catalog the data and Amazon Athena to query it. Connect Amazon QuickSight with SPICE to Athena to create the desired analyses and visualizations.

Correct Answer: A

Explanation:

Reference: https://aws.amazon.com/blogs/big-data/ingest-streaming-data-into-amazon-elasticsearch-service-within-the-privacy-of-your-vpc-with-amazon-kinesis-data-firehose/

AWS Certified Data Analytics – Specialty DAS-C01 – Question089

A retail company leverages Amazon Athena for ad-hoc queries against an AWS Glue Data Catalog. The data analytics team manages the data catalog and data access for the company. The data analytics team wants to separate queries and manage the cost of running those queries by different workloads and teams. Ideally, the data analysts want to group the queries run by different users within a team, store the query results in individual Amazon S3 buckets specific to each team, and enforce cost constraints on the queries run against the Data Catalog.
Which solution meets these requirements?

A. Create IAM groups and resource tags for each team within the company. Set up 1AM policies that control user access and actions on the Data Catalog resources.
B. Create Athena resource groups for each team within the company and assign users to these groups. Add S3 bucket names and other query configurations to the properties list for the resource groups.
C. Create Athena workgroups for each team within the company. Set up IAM workgroup policies that control user access and actions on the workgroup resources.
D. Create Athena query groups for each team within the company and assign users to the groups.

Correct Answer: C

Explanation:

Reference: https://docs.aws.amazon.com/athena/latest/ug/workgroups.html

AWS Certified Data Analytics – Specialty DAS-C01 – Question088

A power utility company is deploying thousands of smart meters to obtain real-time updates about power consumption. The company is using Amazon Kinesis Data Streams to collect the data streams from smart meters. The consumer application uses the Kinesis Client Library (KCL) to retrieve the stream data. The company has only one consumer application.
The company observes an average of 1 second of latency from the moment that a record is written to the stream until the record is read by a consumer application. The company must reduce this latency to 500 milliseconds.
Which solution meets these requirements?

A. Use enhanced fan-out in Kinesis Data Streams.
B. Increase the number of shards for the Kinesis data stream.
C. Reduce the propagation delay by overriding the KCL default settings.
D. Develop consumers by using Amazon Kinesis Data Firehose.

Correct Answer: C

Explanation:

Reference: https://docs.aws.amazon.com/streams/latest/dev/kinesis-low-latency.html

AWS Certified Data Analytics – Specialty DAS-C01 – Question087

A retail company is building its data warehouse solution using Amazon Redshift. As a part of that effort, the company is loading hundreds of files into the fact table created in its Amazon Redshift cluster. The company wants the solution to achieve the highest throughput and optimally use cluster resources when loading data into the company's fact table.
How should the company meet these requirements?

A. Use multiple COPY commands to load the data into the Amazon Redshift cluster.
B. Use S3DistCp to load multiple files into the Hadoop Distributed File System (HDFS) and use an HDFS connector to ingest the data into the Amazon Redshift cluster.
C. Use LOAD commands equal to the number of Amazon Redshift cluster nodes and load the data in parallel into each node.
D. Use a single COPY command to load the data into the Amazon Redshift cluster.

Correct Answer: D

Explanation:

Reference: https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-single-copy-command.html

AWS Certified Data Analytics – Specialty DAS-C01 – Question086

Three teams of data analysts use Apache Hive on an Amazon EMR cluster with the EMR File System (EMRFS) to query data stored within each team's Amazon S3 bucket. The EMR cluster has Kerberos enabled and is configured to authenticate users from the corporate Active Directory. The data is highly sensitive, so access must be limited to the members of each team.
Which steps will satisfy the security requirements?

A. For the EMR cluster Amazon EC2 instances, create a service role that grants no access to Amazon S3. Create three additional IAM roles, each granting access to each team's specific bucket. Add the additional IAM roles to the cluster's EMR role for the EC2 trust policy. Create a security configuration mapping for the additional IAM roles to Active Directory user groups for each team.
B. For the EMR cluster Amazon EC2 instances, create a service role that grants no access to Amazon S3. Create three additional IAM roles, each granting access to each team's specific bucket. Add the service role for the EMR cluster EC2 instances to the trust policies for the additional IAM roles. Create a security configuration mapping for the additional IAM roles to Active Directory user groups for each team.
C. For the EMR cluster Amazon EC2 instances, create a service role that grants full access to Amazon S3. Create three additional IAM roles, each granting access to each team's specific bucket. Add the service role for the EMR cluster EC2 instances to the trust policies for the additional IAM roles. Create a security configuration mapping for the additional IAM roles to Active Directory user groups for each team.
D. For the EMR cluster Amazon EC2 instances, create a service role that grants full access to Amazon S3. Create three additional IAM roles, each granting access to each team's specific bucket. Add the service role for the EMR cluster EC2 instances to the trust policies for the base IAM roles. Create a security configuration mapping for the additional IAM roles to Active Directory user groups for each team.

Correct Answer: B

Explanation:

Explanation:
When a cluster application makes a request to Amazon S3 through EMRFS, EMRFS evaluates role mappings in the top-down order that they appear in the security configuration. If a request made through EMRFS doesn’t match any identifier, EMRFS falls back to using the service role for cluster EC2 instances. For this reason, we recommend that the policies attached to this role limit permissions to Amazon S3.
Reference: https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-emrfs-iam-roles.html
https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-iam-role-for-ec2.html

AWS Certified Data Analytics – Specialty DAS-C01 – Question085

A company is storing millions of sales transaction records in Amazon Redshift. A data analyst must perform an analysis on sales data. The analysis depends on a subset of customer record data that resides in a Salesforce application. The company wants to transfer the data from Salesforce with the least possible infrastructure setup, coding, and operational effort.
Which solution meets these requirements?

A. Use AWS Glue and the SpringML library to connect Apache Spark with Salesforce and extract the data as a table to Amazon S3 in Apache Parquet format. Query the data by using Amazon Redshift Spectrum.
B. Use Amazon AppFlow to create a flow. Establish a connection and a flow trigger to transfer customer record data from Salesforce to an Amazon Redshift table.
C. Use Amazon API Gateway to configure a Salesforce customer data flow subscription to AWS Lambda events and create tables in Amazon S3 in Apache Parquet format. Query the data by using Amazon Redshift Spectrum.
D. Use Salesforce Data Loader to export the Salesforce customer data as a .csv file and load it into Amazon S3. Query the data by using Amazon Redshift Spectrum.

Correct Answer: B

Explanation:

Reference: https://docs.aws.amazon.com/appflow/latest/userguide/connectors-salesforce-marketing-cloud.html

AWS Certified Data Analytics – Specialty DAS-C01 – Question084

A company is building an analytical solution that includes Amazon S3 as data lake storage and Amazon Redshift for data warehousing. The company wants to use Amazon Redshift Spectrum to query the data that is stored in Amazon S3.
Which steps should the company take to improve performance when the company uses Amazon Redshift Spectrum to query the S3 data files? (Choose three.)

A. Use gzip compression with individual file sizes of 1-5 GB.
B. Use a columnar storage file format.
C. Partition the data based on the most common query predicates.
D. Split the data into KB-sized files.
E. Keep all files about the same size.
F. Use file formats that are not splittable.

Correct Answer: CDE

Explanation:

Reference: https://aws.amazon.com/blogs/big-data/10-best-practices-for-amazon-redshift-spectrum/

AWS Certified Data Analytics – Specialty DAS-C01 – Question083

A retail company that is based in the United States has launched a global website. The website's historic transaction data is stored in an Amazon Redshift cluster in a VPC in the us-east-1 Region. The company's business intelligence (BI) team wants to enhance user experience by providing a dashboard to visualize trends.
The BI team decides to use Amazon QuickSight to render the dashboards. During development, a team in Japan provisioned QuickSight in the ap-northeast-1 Region. However, the team cannot connect from QuickSight in ap-northeast-1 to the Amazon Redshift cluster in us-east-1.
Which solution will resolve this issue MOST cost-effectively?

A. In the Amazon Redshift console, configure Cross-Region snapshots. Set the destination Region as ap-northeast-1. Restore the Amazon Redshift cluster from the snapshot. Connect to QuickSight in ap-northeast-1.
B. Create a VPC endpoint from the QuickSight VPC to the Amazon Redshift VPC.
C. Create an Amazon Redshift endpoint connection string with Region information in the string. Use this connection string in QuickSight to connect to Amazon Redshift.
D. Create a new security group for the Amazon Redshift cluster in us-east-1. Add an inbound rule that allows access from the appropriate IP address range for the QuickSight servers in ap-northeast-1.

Correct Answer: D

Explanation:

Reference: https://docs.aws.amazon.com/quicksight/latest/user/enabling-access-redshift.html
https://docs.aws.amazon.com/quicksight/latest/user/regions.html

AWS Certified Data Analytics – Specialty DAS-C01 – Question082

A company that monitors weather conditions from remote construction sites is setting up a solution to collect temperature data from the following two weather stations.
– Station A, which has 10 sensors
– Station B, which has five sensors
These weather stations were placed by onsite subject-matter experts.
Each sensor has a unique ID. The data collected from each sensor will be collected using Amazon Kinesis Data Streams.
Based on the total incoming and outgoing data throughput, a single Amazon Kinesis data stream with two shards is created. Two partition keys are created based on the station names. During testing, there is a bottleneck on data coming from Station A, but not from Station B. Upon review, it is confirmed that the total stream throughput is still less than the allocated Kinesis Data Streams throughput.
How can this bottleneck be resolved without increasing the overall cost and complexity of the solution, while retaining the data collection quality requirements?

A. Increase the number of shards in Kinesis Data Streams to increase the level of parallelism.
B. Create a separate Kinesis data stream for Station A with two shards, and stream Station A sensor data to the new stream.
C. Modify the partition key to use the sensor ID instead of the station name.
D. Reduce the number of sensors in Station A from 10 to 5 sensors.

Correct Answer: C

Explanation:

Reference: https://docs.aws.amazon.com/streams/latest/dev/kinesis-using-sdk-java-resharding.html

AWS Certified Data Analytics – Specialty DAS-C01 – Question081

A company has a business unit uploading .csv files to an Amazon S3 bucket. The company's data platform team has set up an AWS Glue crawler to do discovery, and create tables and schemas. An AWS Glue job writes processed data from the created tables to an Amazon Redshift database. The AWS Glue job handles column mapping and creating the Amazon Redshift table appropriately. When the AWS Glue job is rerun for any reason in a day. duplicate records are introduced into the Amazon Redshift table.
Which solution will update the Redshift table without duplicates when jobs are rerun?

A. Modify the AWS Glue job to copy the rows into a staging table. Add SQL commands to replace the existing rows in the main table as postactions in the DynamicFrameWriter class.
B. Load the previously inserted data into a MySQL database in the AWS Glue job. Perform an upsert operation in MySQL, and copy the results to the Amazon Redshift table.
C. Use Apache Spark's DataFrame dropDuplicates() API to eliminate duplicates and then write the data to Amazon Redshift.
D. Use the AWS Glue ResolveChoice built-in transform to select the most recent value of the column.

Correct Answer: A

Explanation:

Reference: https://aws.amazon.com/premiumsupport/knowledge-center/sql-commands-redshift-glue-job/

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.