- [Morgan] Time for another sample question for this domain. This time around, we are focusing on the topic: choose high-performing database solutions for a workload. The question reads: A company is building a distributed application that will send IoT sensor data, including weather conditions and wind speed from wind turbines, to the AWS Cloud for further processing. Because the data is spiky in nature, the application needs to be able to scale. It is important to store the streaming data in a key-value database and then send it to a centralized data lake, where it can be transformed, analyzed, and combined with diverse organizational datasets to derive meaningful insights and make predictions. Which combination of solutions would accomplish the business need with minimal operational overhead? Select two.

I immediately zero in on the phrase "data is spiky," as well as the requirement that the data be stored in a key-value database. Then, I'm focused on the fact that we need to send this data to a data lake for analysis. Finally, I'm seeing here that we need a solution with minimal operational overhead, and that we need to select two answers. Okay, let's take a look at the responses.

A. Configure Amazon Kinesis, delivering streaming data to an Amazon S3 data lake.
B. Use Amazon DocumentDB to store IoT sensor data.
C. Write AWS Lambda functions delivering streaming data to Amazon S3.
D. Use Amazon DynamoDB to store the IoT sensor data and enable DynamoDB Streams.
E. Use Amazon Kinesis, delivering streaming data to Amazon Redshift, and enable Redshift Spectrum.

We will now pause to allow you to review the stem and the responses. All right, you have until I count down from three. Pause now if you need more time. Three, two, one.

The keys are A and D. Let's discuss these responses and why they are correct. First off, knowing that DynamoDB is a key-value database that can scale automatically and handle spiky access patterns is essential to getting this question correct. DynamoDB is a great choice for storing the sensor data, as it scales with minimal overhead and lets you store key-value data easily in tables. The second thing to think about is how each entry written to the DynamoDB table will also be sent to the data lake. This is where DynamoDB Streams comes in. DynamoDB Streams captures item-level changes to data in DynamoDB tables in a stream, and once the changes are in the stream, you can build solutions that process those changes, or replicate them to other data stores, like a data lake, for example. So, response D, use Amazon DynamoDB to store the IoT sensor data and enable DynamoDB Streams, is a correct response.

This doesn't solve the entire problem, though. In our use case, we still need a solution to send that data to a data lake. That is where answer A comes in: configure Amazon Kinesis, delivering streaming data to an Amazon S3 data lake. You can configure Amazon Kinesis to consume the item-level changes coming out of the DynamoDB table, and Kinesis can then deliver that streaming data to Amazon S3. Amazon S3 is a very common service to use to host a data lake. So, knowing that these three services integrate with each other, and are also all serverless, this solution not only solves the problem, but will also require the least amount of operational overhead. In a moment, after we knock out the first incorrect response, I'll include a rough sketch of what wiring this up could look like in code.

Now, to review the incorrect responses. B, use Amazon DocumentDB to store IoT sensor data. This is incorrect because DocumentDB is a document database, not a key-value database. Knowing that alone, you can disqualify this answer.
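As promised, here is a minimal sketch of the DynamoDB half of the correct solution, using Python and boto3. Keep in mind the table name, key schema, and attribute names are all made up for illustration; the question doesn't give us a schema.

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Create a key-value table for turbine readings, with a stream enabled
# so item-level changes can be picked up downstream. On-demand billing
# lets the table absorb spiky traffic without capacity planning.
dynamodb.create_table(
    TableName="TurbineSensorData",  # hypothetical name
    AttributeDefinitions=[
        {"AttributeName": "TurbineId", "AttributeType": "S"},
        {"AttributeName": "ReadingTime", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "TurbineId", "KeyType": "HASH"},    # partition key
        {"AttributeName": "ReadingTime", "KeyType": "RANGE"},  # sort key
    ],
    BillingMode="PAY_PER_REQUEST",
    StreamSpecification={
        "StreamEnabled": True,
        "StreamViewType": "NEW_IMAGE",  # stream each new item as written
    },
)

# Each sensor reading is then a simple key-value item.
dynamodb.put_item(
    TableName="TurbineSensorData",
    Item={
        "TurbineId": {"S": "turbine-042"},
        "ReadingTime": {"S": "2024-01-15T10:30:00Z"},
        "WindSpeedMps": {"N": "12.4"},
        "TemperatureC": {"N": "3.1"},
    },
)
```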
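And here is a sketch of the delivery side from answer A: a Kinesis Data Firehose delivery stream that lands records in the S3 data lake. The stream name, bucket, and IAM role ARN are placeholders you would replace with your own.

```python
import boto3

firehose = boto3.client("firehose")

# A fully managed, serverless delivery stream: records sent to Firehose
# are batched and written to the S3 data lake with no servers to run.
firehose.create_delivery_stream(
    DeliveryStreamName="turbine-to-data-lake",  # placeholder name
    DeliveryStreamType="DirectPut",
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/example-firehose-role",
        "BucketARN": "arn:aws:s3:::example-data-lake-bucket",
        "Prefix": "sensor-data/",  # keeps data lake objects organized
    },
)
```

In a real build, you would point the table's change stream at a Kinesis data stream and have Firehose read from that as its source rather than direct puts, but the shape is the same: a managed pipeline end to end, with nothing for you to operate.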
Next is C, write AWS Lambda functions delivering streaming data to Amazon S3. Now, this is a bit tricky, because you could do this: invoke a Lambda function whenever data is added to the DynamoDB stream. This could totally work. The issue, however, is that it requires more operational overhead than using Kinesis for this task, since you would have to write and maintain custom code in the Lambda function, so this is incorrect. I'll tack a quick sketch of that kind of Lambda function onto the end of this breakdown, just to show the custom code you'd be signing up for.

Finally, there is E, use Amazon Kinesis delivering streaming data to Amazon Redshift and enable Redshift Spectrum. This is also incorrect, because although you could use Kinesis to deliver the data to Redshift, Amazon S3 is a better choice for a data lake in this scenario: in Amazon S3, you can easily combine data from many different sources for analysis, with less operational overhead. All right, that is it for this one. Hope you did well, and again, make sure you follow up on any knowledge gaps you found while attempting to break down this sample question.
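As mentioned back in the discussion of response C, here is a rough sketch of the kind of custom Lambda code that option would have you write and maintain. The bucket name and key layout are invented for the example.

```python
import json
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # Triggered by a DynamoDB stream: forward each newly inserted
    # item to the S3 data lake. Bucket and key layout are made up.
    for record in event["Records"]:
        if record["eventName"] == "INSERT":
            new_image = record["dynamodb"]["NewImage"]
            key = f"sensor-data/{record['eventID']}.json"
            s3.put_object(
                Bucket="example-data-lake-bucket",
                Key=key,
                Body=json.dumps(new_image),
            )
    # Everything in here is now your responsibility: retries, error
    # handling, batching, file sizing, and partitioning the data lake
    # layout. That is exactly the operational overhead the question
    # asks you to avoid, which is why C loses out to Kinesis.
```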