A transport company wants to track vehicular movements by capturing geolocation records. The records are 10 B in size and up to 10,000 records are captured each second. Data transmission delays of a few minutes are acceptable, considering unreliable network conditions. The transport company decided to use Amazon Kinesis Data Streams to ingest the data. The company is looking for a reliable mechanism to send data to Kinesis Data Streams while maximizing the throughput efficiency of the Kinesis shards.
Which solution will meet the company's requirements?
A. Kinesis Agent
B. Kinesis Producer Library (KPL)
C. Kinesis Data Firehose
D. Kinesis SDK
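The shard math behind this scenario is worth working through. The sketch below is only back-of-the-envelope arithmetic (not an AWS API call), using the published per-shard write limits of 1 MiB per second and 1,000 records per second:

```python
# Back-of-the-envelope shard math for this scenario (a sketch, not an AWS API call).
# Per-shard write limits: 1 MiB/s and 1,000 records/s.
RECORD_SIZE_BYTES = 10
RECORDS_PER_SECOND = 10_000

bytes_per_second = RECORD_SIZE_BYTES * RECORDS_PER_SECOND        # 100 KB/s of payload
shards_by_throughput = bytes_per_second / (1024 * 1024)          # ~0.1 shard by volume
shards_by_record_count = RECORDS_PER_SECOND / 1_000              # 10 shards without aggregation

# Record aggregation (as the KPL performs) packs many 10 B user records into a
# single Kinesis record, so the 1,000 records/s limit stops being the bottleneck
# and a single shard's 1 MiB/s easily absorbs the 100 KB/s of data.
print(shards_by_throughput, shards_by_record_count)
```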
A company owns manufacturing facilities with Internet of Things (IoT) devices installed to monitor safety data.
The company has configured an Amazon Kinesis data stream as a source for an Amazon Kinesis Data Firehose delivery stream, which outputs data to Amazon S3. The company's operations team wants to gain insights from the IoT data to monitor data quality at ingestion. The insights need to be derived in near-real time, and the output must be logged to Amazon DynamoDB for further analysis.
Which solution meets these requirements?
A. Create an Amazon Kinesis Data Analytics for SQL application to read and analyze the data in the data stream. Add an output configuration so that everything written to an in-application stream persists in a DynamoDB table.
B. Create an Amazon Kinesis Data Analytics for SQL application to read and analyze the data in the data stream. Add an output configuration so that everything written to an in-application stream is passed to an AWS Lambda function that saves the data in a DynamoDB table as persistent data.
C. Configure an AWS Lambda function to analyze the data in the Kinesis Data Firehose delivery stream. Save the output to a DynamoDB table.
D. Configure an AWS Lambda function to analyze the data in the Kinesis Data Firehose delivery stream and save the output to an S3 bucket. Schedule an AWS Glue job to periodically copy the data from the bucket to a DynamoDB table.
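As a rough illustration of the Lambda output stage described in option B, here is a minimal sketch of a function that persists Kinesis Data Analytics output records to DynamoDB. The table name and payload fields are hypothetical, and the event shape assumes the Kinesis Data Analytics Lambda-output contract (base64-encoded records that must each be acknowledged by recordId):

```python
import base64
import json
from decimal import Decimal

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("iot-data-quality")  # hypothetical table name


def handler(event, context):
    """Persist Kinesis Data Analytics output records to DynamoDB."""
    results = []
    for record in event["records"]:
        # Records arrive base64-encoded; parse floats as Decimal for DynamoDB.
        payload = json.loads(base64.b64decode(record["data"]), parse_float=Decimal)
        table.put_item(Item=payload)  # assumes payload keys match the table's key schema
        results.append({"recordId": record["recordId"], "result": "Ok"})
    # Every recordId must be acknowledged back to the analytics application.
    return {"records": results}
```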
A public sector organization ingests large datasets from various relational databases into an Amazon S3 data lake on a daily basis. Data analysts need a mechanism to profile the data and diagnose data quality issues after the data is ingested into Amazon S3. The solution should allow the data analysts to visualize and explore the data quality metrics through a user interface.
Which set of steps provides a solution that meets these requirements?
A. Create a new AWS Glue DataBrew dataset for each dataset in the S3 data lake. Create a new DataBrew project for each dataset. Create a profile job for each project and schedule it to run daily. Instruct the data analysts to explore the data quality metrics by using the DataBrew console.
B. Create a new AWS Glue ETL job that uses the Deequ Spark library for data validation and schedule the ETL job to run daily. Store the output of the ETL job within an S3 bucket. Instruct the data analysts to query and visualize the data quality metrics by using the Amazon Athena console.
C. Schedule an AWS Lambda function to run daily by using Amazon EventBridge (Amazon CloudWatch Events). Configure the Lambda function to test the data quality of each object and store the results in an S3 bucket. Create an Amazon QuickSight dashboard to query and visualize the results. Instruct the data analysts to explore the data quality metrics by using QuickSight.
D. Schedule an AWS Step Functions workflow to run daily by using Amazon EventBridge (Amazon CloudWatch Events). Configure the steps by using AWS Lambda functions to perform the data quality checks and update the catalog tags in the AWS Glue Data Catalog with the results. Instruct the data analysts to explore the data quality metrics by using the Data Catalog console.
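For context on option A, the following is a hedged boto3 sketch of wiring up a DataBrew dataset, a daily-scheduled profile job, and its S3 output location. All names, ARNs, and paths are placeholders, and the exact parameter shapes should be checked against the current DataBrew API:

```python
import boto3

databrew = boto3.client("databrew")

# Register one dataset per S3 data-lake prefix (placeholder bucket/prefix).
databrew.create_dataset(
    Name="sales-raw",
    Input={"S3InputDefinition": {"Bucket": "my-data-lake", "Key": "sales/"}},
)

# Profile job that computes data quality metrics and writes them to S3.
databrew.create_profile_job(
    Name="sales-raw-profile",
    DatasetName="sales-raw",
    RoleArn="arn:aws:iam::111122223333:role/DataBrewServiceRole",  # placeholder role
    OutputLocation={"Bucket": "my-databrew-results", "Key": "profiles/sales/"},
)

# Run the profile daily; analysts explore the metrics in the DataBrew console.
databrew.create_schedule(
    Name="daily-profile",
    JobNames=["sales-raw-profile"],
    CronExpression="cron(0 3 * * ? *)",
)
```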
A company wants to improve user satisfaction for its smart home system by adding more features to its recommendation engine. Each sensor asynchronously pushes its nested JSON data into Amazon Kinesis Data Streams using the Kinesis Producer Library (KPL) in Java. Statistics from a set of failed sensors showed that, when a sensor is malfunctioning, its recorded data is not always sent to the cloud.
The company needs a solution that offers near-real-time analytics on the data from the most updated sensors.
Which solution enables the company to meet these requirements?
A. Set the RecordMaxBufferedTime property of the KPL to "-1" to disable buffering on the sensor side. Use Kinesis Data Analytics to enrich the data based on a company-developed anomaly detection SQL script. Push the enriched data to a fleet of Kinesis data streams and enable the data transformation feature to flatten the JSON file. Instantiate a dense storage Amazon Redshift cluster and use it as the destination for the Kinesis Data Firehose delivery stream.
B. Update the sensors' code to use the PutRecord/PutRecords call from the Kinesis Data Streams API with the AWS SDK for Java. Use Kinesis Data Analytics to enrich the data based on a company-developed anomaly detection SQL script. Direct the output of the Kinesis Data Analytics application to a Kinesis Data Firehose delivery stream, enable the data transformation feature to flatten the JSON file, and set the Kinesis Data Firehose destination to an Amazon OpenSearch Service (Amazon Elasticsearch Service) cluster.
C. Set the RecordMaxBufferedTime property of the KPL to "0" to disable buffering on the sensor side. Connect a dedicated Kinesis Data Firehose delivery stream to each stream and enable the data transformation feature to flatten the JSON file before sending it to an Amazon S3 bucket. Load the S3 data into an Amazon Redshift cluster.
D. Update the sensors' code to use the PutRecord/PutRecords call from the Kinesis Data Streams API with the AWS SDK for Java. Use AWS Glue to fetch and process data from the stream by using the Kinesis Client Library (KCL). Instantiate an Amazon OpenSearch Service (Amazon Elasticsearch Service) cluster and use AWS Lambda to push data directly into it.
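The options that mention PutRecord/PutRecords refer to the AWS SDK for Java; a minimal boto3 sketch of the equivalent direct call is shown below for brevity. The stream name and payload fields are placeholders, and the point is simply that the producer sends records immediately rather than letting the KPL buffer them:

```python
import json

import boto3

kinesis = boto3.client("kinesis")

# Hypothetical sensor readings; direct PutRecords avoids KPL-side buffering.
readings = [{"sensor_id": "s-001", "temp": {"value": 21.4, "unit": "C"}}]

kinesis.put_records(
    StreamName="smart-home-sensors",  # placeholder stream name
    Records=[
        {"Data": json.dumps(r).encode("utf-8"), "PartitionKey": r["sensor_id"]}
        for r in readings
    ],
)
```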
A banking company wants to collect large volumes of transactional data using Amazon Kinesis Data Streams for real-time analytics. The company uses PutRecord to send data to Amazon Kinesis, and has observed network outages during certain times of the day. The company wants to obtain exactly once semantics for the entire processing pipeline.
What should the company do to obtain these characteristics?
A. Design the application so it can remove duplicates during processing by embedding a unique ID in each record.
B. Rely on the processing semantics of Amazon Kinesis Data Analytics to avoid duplicate processing of events.
C. Design the data producer so events are not ingested into Kinesis Data Streams multiple times.
D. Rely on the exactly once processing semantics of Apache Flink and Apache Spark Streaming included in Amazon EMR.
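To make the deduplication idea in option A concrete, here is a hedged consumer-side sketch: the producer embeds a unique ID in each record, and the consumer uses a DynamoDB conditional write so a record retried after a network outage is processed only once. The table and attribute names are hypothetical:

```python
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource("dynamodb")
processed = dynamodb.Table("processed-transactions")  # hypothetical dedupe table


def process_once(record_id: str, payload: dict) -> bool:
    """Accept the record only if its unique ID has not been seen before."""
    try:
        processed.put_item(
            Item={"record_id": record_id, **payload},  # payload values must be DynamoDB-compatible
            ConditionExpression="attribute_not_exists(record_id)",
        )
        return True  # first time this ID was seen; continue downstream processing
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # duplicate caused by a producer retry; skip it
        raise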
An ecommerce company uses Amazon Aurora PostgreSQL to process and store live transactional data and uses Amazon Redshift for its data warehouse solution. A nightly ETL job has been implemented to update the Redshift cluster with new data from the PostgreSQL database. The business has grown rapidly and so has the size and cost of the Redshift cluster. The company's data analytics team needs to create a solution to archive historical data and only keep the most recent 12 months of data in Amazon Redshift to reduce costs. Data analysts should also be able to run analytics queries that effectively combine data from live transactional data in PostgreSQL, current data in Redshift, and archived historical data.
Which combination of tasks will meet these requirements? (Choose three.)
A. Configure the Amazon Redshift Federated Query feature to query live transactional data in the PostgreSQL database.
B. Configure Amazon Redshift Spectrum to query live transactional data in the PostgreSQL database.
C. Schedule a monthly job to copy data older than 12 months to Amazon S3 by using the UNLOAD command, and then delete that data from the Redshift cluster. Configure Amazon Redshift Spectrum to access historical data in Amazon S3.
D. Schedule a monthly job to copy data older than 12 months to Amazon S3 Glacier Flexible Retrieval by using the UNLOAD command, and then delete that data from the Redshift cluster. Configure Redshift Spectrum to access historical data with S3 Glacier Flexible Retrieval.
E. Create a late-binding view in Amazon Redshift that combines live, current, and historical data from different sources.
F. Create a materialized view in Amazon Redshift that combines live, current, and historical data from different sources.
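As an illustration of how the tasks in options A, C, and E fit together, the sketch below issues the SQL through the Redshift Data API. The cluster, database, role ARN, bucket, and schema names ("aurora_live" for the federated-query external schema, "spectrum_archive" for the Spectrum external schema) are all placeholders:

```python
import boto3

redshift_data = boto3.client("redshift-data")


def run_sql(sql: str) -> None:
    # Cluster, database, and user are placeholders for illustration.
    redshift_data.execute_statement(
        ClusterIdentifier="analytics-cluster",
        Database="dev",
        DbUser="awsuser",
        Sql=sql,
    )


# Archive rows older than 12 months to S3, then delete them from the cluster
# (deletion omitted here for brevity).
run_sql("""
    UNLOAD ('SELECT * FROM sales WHERE sale_date < DATEADD(month, -12, CURRENT_DATE)')
    TO 's3://my-archive-bucket/sales/'
    IAM_ROLE 'arn:aws:iam::111122223333:role/RedshiftUnloadRole'
    FORMAT AS PARQUET;
""")

# Late-binding view spanning live PostgreSQL data (federated query), current
# data in Redshift, and the archived data exposed through Redshift Spectrum.
run_sql("""
    CREATE VIEW all_sales AS
    SELECT * FROM aurora_live.sales
    UNION ALL SELECT * FROM public.sales
    UNION ALL SELECT * FROM spectrum_archive.sales
    WITH NO SCHEMA BINDING;
""")
```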
A company that produces network devices has millions of users. Data is collected from the devices on an hourly basis and stored in an Amazon S3 data lake.
The company runs analyses on the last 24 hours of data flow logs for abnormality detection and to troubleshoot and resolve user issues. The company also analyzes historical logs dating back 2 years to discover patterns and look for improvement opportunities.
The data flow logs contain many metrics, such as date, timestamp, source IP, and target IP. There are about 10 billion events every day.
How should this data be stored for optimal performance?
A. In Apache ORC partitioned by date and sorted by source IP
B. In compressed .csv partitioned by date and sorted by source IP
C. In Apache Parquet partitioned by source IP and sorted by date
D. In compressed nested JSON partitioned by source IP and sorted by date
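For reference, this is roughly what the layout described in option A looks like when written with PySpark; the column names and S3 paths are placeholders. Partitioning by date lets the daily 24-hour analyses prune every other partition, and sorting by source IP within each partition tightens the min/max statistics ORC stores per stripe:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("flow-log-layout").getOrCreate()

# Placeholder input path; flow logs land here hourly.
logs = spark.read.json("s3://flow-logs-raw/2024/")

# Columnar ORC files, partitioned by date and sorted by source IP within
# each partition.
(
    logs.repartition("event_date")
    .sortWithinPartitions("source_ip")
    .write.mode("overwrite")
    .partitionBy("event_date")
    .orc("s3://flow-logs-curated/")
)
```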
A data analytics specialist is building an automated ETL ingestion pipeline using AWS Glue to ingest compressed files that have been uploaded to an Amazon S3 bucket. The ingestion pipeline should support incremental data processing.
Which AWS Glue feature should the data analytics specialist use to meet this requirement?
A. Workflows
B. Triggers
C. Job bookmarks
D. Classifiers
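Job bookmarks (option C) are enabled through a job argument; a minimal boto3 sketch is shown below, with the job name, role ARN, and script location as placeholders. With the bookmark enabled, each run processes only the S3 objects added since the previous run:

```python
import boto3

glue = boto3.client("glue")

glue.create_job(
    Name="incremental-ingest",  # placeholder job name
    Role="arn:aws:iam::111122223333:role/GlueJobRole",  # placeholder role
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://my-glue-scripts/ingest.py",  # placeholder script
        "PythonVersion": "3",
    },
    # Job bookmarks track which source objects have already been processed.
    DefaultArguments={"--job-bookmark-option": "job-bookmark-enable"},
    GlueVersion="4.0",
)
```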
An online food delivery company wants to optimize its storage costs. The company has been collecting operational data for the last 10 years in a data lake that was built on Amazon S3 by using a Standard storage class. The company does not keep data that is older than 7 years. The data analytics team frequently uses data from the past 6 months for reporting and runs queries on data from the last 2 years about once a month. Data that is more than 2 years old is rarely accessed and is only used for audit purposes.
Which combination of solutions will optimize the company's storage costs? (Choose two.)
A. Create an S3 Lifecycle configuration rule to transition data that is older than 6 months to the S3 Standard-Infrequent Access (S3 Standard-IA) storage class. Create another S3 Lifecycle configuration rule to transition data that is older than 2 years to the S3 Glacier Deep Archive storage class.
B. Create an S3 Lifecycle configuration rule to transition data that is older than 6 months to the S3 One Zone-Infrequent Access (S3 One Zone-IA) storage class. Create another S3 Lifecycle configuration rule to transition data that is older than 2 years to the S3 Glacier Flexible Retrieval storage class.
C. Use the S3 Intelligent-Tiering storage class to store data instead of the S3 Standard storage class.
D. Create an S3 Lifecycle expiration rule to delete data that is older than 7 years.
E. Create an S3 Lifecycle configuration rule to transition data that is older than 7 years to the S3 Glacier Deep Archive storage class.
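A hedged sketch of what the lifecycle rules described in options A and D look like as a single configuration follows; the bucket name is a placeholder, and 180, 730, and 2,555 days approximate 6 months, 2 years, and 7 years:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="food-delivery-data-lake",  # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-and-expire-operational-data",
                "Filter": {"Prefix": ""},  # apply to the whole bucket
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 180, "StorageClass": "STANDARD_IA"},   # ~6 months
                    {"Days": 730, "StorageClass": "DEEP_ARCHIVE"},  # ~2 years
                ],
                "Expiration": {"Days": 2555},  # ~7 years, then delete
            }
        ]
    },
)
```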
A financial company uses Amazon Athena to query data from an Amazon S3 data lake. Files are stored in the S3 data lake in Apache ORC format. Data analysts recently introduced nested fields in the data lake ORC files and noticed that queries are taking longer to run in Athena. A data analyst discovered that more data than required is being scanned for the queries.
What is the MOST operationally efficient solution to improve query performance?
A. Flatten nested data and create separate files for each nested dataset.
B. Use the Athena query engine V2 and push the query filter down to the source ORC file.
C. Use Apache Parquet format instead of ORC format.
D. Recreate the data partition strategy and further narrow down the data filter criteria.
Correct Answer: C
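One way to apply answer C is an Athena CTAS statement that rewrites the ORC table as Parquet; a minimal boto3 sketch is below. The database, table, and bucket names are placeholders:

```python
import boto3

athena = boto3.client("athena")

# Rewrite the ORC table as Parquet with a CTAS query so Athena scans only the
# columns (including nested fields) that the analysts' queries actually touch.
athena.start_query_execution(
    QueryString="""
        CREATE TABLE analytics.transactions_parquet
        WITH (
            format = 'PARQUET',
            external_location = 's3://finance-data-lake/transactions_parquet/'
        ) AS
        SELECT * FROM analytics.transactions_orc
    """,
    QueryExecutionContext={"Database": "analytics"},
    ResultConfiguration={"OutputLocation": "s3://finance-athena-results/"},
)
```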