Why not prepare for Amazon’s AWS Certified Data Analytics – Specialty certification with these questions and test your knowledge of Amazon Redshift?
Find questions on the following topic(s) –
AWS Data Analytics – Quiz 3: Redshift
1 / 6
At the end of the month, data analyst teams run end-of-month reporting and ad hoc analysis consisting of long, complex queries, creating a spike in read usage. Queries are running slowly. Automatic Workload Management (WLM) and Short Query Acceleration (SQA) are in place but have not fixed the problem.
Which of the following are the most cost-effective and least disruptive means of scaling to meet demand?
We are scaling for READ, not WRITE, so Elastic and Classic resize, which scale both READ and WRITE capacity, are unnecessary.
A) Incorrect – We’re scaling for READ, not WRITE; Elastic resize scales both READ and WRITE capacity, and there will be some service interruption.
B) Correct – “With the Concurrency Scaling feature, you can support virtually unlimited concurrent users and concurrent queries, with consistently fast query performance. When concurrency scaling is enabled, Amazon Redshift automatically adds additional cluster capacity when you need it to process an increase in concurrent read queries. Write operations continue as normal on your main cluster.” https://docs.aws.amazon.com/redshift/latest/dg/concurrency-scaling.html
C) Incorrect – Classic resize would be for scaling to meet an ongoing increase in read and write capacity needs. While the snapshot approach reduces downtime, the new cluster won’t be available for hours to days.
D) Correct – “With Amazon Redshift, you can already scale quickly in three ways. First, you can query data in your Amazon S3 data lakes in place using Amazon Redshift Spectrum, without needing to load it into the cluster. This flexibility lets you analyze growing data volumes without waiting for extract, transform, and load (ETL) jobs or adding more storage capacity” – https://aws.amazon.com/blogs/big-data/scale-your-amazon-redshift-clusters-up-and-down-in-minutes-to-get-the-performance-you-need-when-you-need-it/
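Concurrency scaling (option B) is enabled per WLM queue through the cluster parameter group’s `wlm_json_configuration` parameter. As a rough sketch of a manual WLM queue with concurrency scaling turned on (the concurrency value is illustrative only):

```json
[
  {
    "query_concurrency": 5,
    "concurrency_scaling": "auto"
  }
]
```

With `"concurrency_scaling": "auto"`, eligible read queries routed to this queue are sent to scaling clusters when the queue is full, while writes continue on the main cluster.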
2 / 6
Which of the following are best practices when loading data into Redshift?
A) Incorrect – “When you load all the data from a single large file, Amazon Redshift is forced to perform a serialized load, which is much slower. Split your load data files so that the files are about equal size, between 1 MB and 1 GB after compression. For optimum parallelism, the ideal size is between 1 MB and 125 MB after compression. The number of files should be a multiple of the number of slices in your cluster.”
B) Correct
C) Incorrect – “Use a single COPY command to load data in parallel from S3, EMR, DynamoDB…”
D) Correct
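The file-splitting guidance quoted in A) can be sketched in Python. The byte-level split below is illustrative only (real pipelines split on record boundaries and compress each part before upload), and the function name is ours, not an AWS API:

```python
import math
from pathlib import Path

def split_for_copy(path, num_slices, parts_per_slice=1):
    """Split a load file into equal-sized parts whose count is a
    multiple of the cluster's slice count, per the Redshift COPY
    best practice. Byte-level split for illustration only."""
    data = Path(path).read_bytes()
    n_parts = num_slices * parts_per_slice
    chunk = math.ceil(len(data) / n_parts)
    parts = []
    for i in range(n_parts):
        part = Path(f"{path}.part{i:03d}")
        # Each part gets an equal-sized slice of the input bytes.
        part.write_bytes(data[i * chunk:(i + 1) * chunk])
        parts.append(part)
    return parts
```

On a 4-slice cluster, `split_for_copy("load.csv", 4)` yields 4 equal parts, so a single COPY against the common prefix loads one part per slice in parallel.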
3 / 6
Which of the following are true regarding loading of data into Redshift from DynamoDB?
A) Incorrect – https://docs.aws.amazon.com/redshift/latest/dg/t_Loading-data-from-dynamodb.html
B) Correct – see the above link
C) Incorrect – only STRING and NUMBER data types are supported
D) Correct – see the above link
4 / 6
Which of the following are true regarding data merge (‘upsert’) operations and Amazon Redshift?
“While Amazon Redshift does not support a single merge, or upsert, command to update a table from a single data source, you can perform a merge operation by creating a staging table and then using one of the methods described…”
A) Correct
B) Incorrect
C) Correct
D) Incorrect
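The staging-table merge pattern quoted above can be sketched end to end. SQLite stands in for Redshift here purely so the example runs locally; on Redshift the staging table would be populated with COPY and the delete/insert steps wrapped in a single transaction:

```python
import sqlite3

# Target table with existing rows.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount INTEGER)")
cur.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 100), (2, 200)])

# 1. Load the incoming rows into a staging table.
cur.execute("CREATE TEMP TABLE sales_staging (id INTEGER, amount INTEGER)")
cur.executemany("INSERT INTO sales_staging VALUES (?, ?)", [(2, 250), (3, 300)])

# 2. Delete target rows that have a match in staging (the "update" half).
cur.execute("DELETE FROM sales WHERE id IN (SELECT id FROM sales_staging)")

# 3. Insert everything from staging: replaced rows plus brand-new rows.
cur.execute("INSERT INTO sales SELECT id, amount FROM sales_staging")
conn.commit()

print(sorted(cur.execute("SELECT id, amount FROM sales").fetchall()))
# [(1, 100), (2, 250), (3, 300)]
```

Row 2 is updated and row 3 is inserted in one pass, which is the “upsert” behaviour the question describes.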
5 / 6
You have created two tables on Redshift that will be used to support data analysis and will be frequently accessed, holding a 1–2 GB data set. Often, analytics queries will require joins on these tables. Which distribution style would ensure the appropriate data distribution for these tables?
A) Incorrect – AUTO may select the correct distribution style, but it’s not guaranteed to select KEY
B) Incorrect – EVEN “distribution is appropriate when a table does not participate in joins or when there is not a clear choice between KEY distribution and ALL distribution.”
C) Correct – KEY
D) Incorrect – ALL: “ALL distribution multiplies the storage required by the number of nodes in the cluster, and so it takes much longer to load, update, or insert data into multiple tables. ALL distribution is appropriate only for relatively slow moving tables; that is, tables that are not updated frequently or extensively.”
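As an illustration of option C, declaring the join column as the DISTKEY on both tables co-locates matching rows on the same slice, avoiding data redistribution at query time. A sketch in Redshift DDL (table and column names here are hypothetical):

```sql
-- Both tables distributed on the shared join key.
CREATE TABLE orders (
  order_id    BIGINT,
  customer_id BIGINT DISTKEY,
  total       DECIMAL(10,2)
);

CREATE TABLE customers (
  customer_id BIGINT DISTKEY,
  name        VARCHAR(100)
);
```

A join on `customer_id` can then be resolved locally on each slice.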
6 / 6
The data analysis team wishes to create new insights on data currently located in S3, Amazon Aurora and Redshift, to be published in a new Business Intelligence application. Which of the following is the most straightforward solution?
B) Incorrect – Glue can make data from S3, RDS (JDBC) and DynamoDB accessible to Redshift Spectrum (via external tables), https://docs.aws.amazon.com/redshift/latest/dg/c-spectrum-external-tables.ht, but there are numerous additional configuration steps (and costs) compared to option A)
C) Incorrect – dblink allows query interworking between Redshift and Amazon RDS (Aurora, PostgreSQL), not S3
D) Incorrect – There is no need to load the data into Redshift to query across all three data sources, and DMS is not applicable to S3.
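For context on option B, making Glue Data Catalog tables visible to Redshift Spectrum involves creating an external schema in the cluster. A sketch, where the schema name, catalog database and IAM role ARN are all placeholders:

```sql
-- Map a Glue Data Catalog database into Redshift as an external schema.
CREATE EXTERNAL SCHEMA spectrum_schema
FROM DATA CATALOG
DATABASE 'analytics_db'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftSpectrumRole'
CREATE EXTERNAL DATABASE IF NOT EXISTS;
```

Tables in `analytics_db` then become queryable as `spectrum_schema.table_name`, which is the extra setup (plus per-query Spectrum costs) the explanation refers to.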