Redshift

Overview

Amazon Redshift is an SQL-based data warehouse service that can run complex queries over petabytes of data. For large volumes of historical data, it makes analysis and processing much more efficient. It achieves this by storing data in columns, distributing the work of a query across several nodes so that it is processed in parallel, and optimizing for analytical reads rather than transactional workloads.
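
The two ideas above, columnar storage and splitting a query across nodes, can be sketched in plain Python. This is only a toy model and not how Redshift is implemented: the rows, the node count, and every function name are assumptions made up for illustration. Each "node" keeps its share of the rows as one list per column, scans only the column the query needs, and returns a partial result that a coordinator combines.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy rows: (order_id, region, amount). All values are made up for illustration.
ROWS = [
    (1, "eu", 120.0),
    (2, "us", 80.0),
    (3, "eu", 45.5),
    (4, "ap", 200.0),
    (5, "us", 10.0),
    (6, "ap", 64.5),
]

NUM_NODES = 3  # hypothetical node count


def distribute(rows, num_nodes):
    """Assign each row to a node based on its key (order_id)."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[row[0] % num_nodes].append(row)
    return nodes


def to_columns(rows):
    """Store a node's rows column by column: one list per column."""
    return {
        "order_id": [r[0] for r in rows],
        "region": [r[1] for r in rows],
        "amount": [r[2] for r in rows],
    }


def partial_sum(columns):
    """A node reads only the 'amount' column and returns a partial sum."""
    return sum(columns["amount"])


def total_amount(rows, num_nodes=NUM_NODES):
    """Run the aggregation on every node in parallel, then combine the results."""
    node_columns = [to_columns(part) for part in distribute(rows, num_nodes)]
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(partial_sum, node_columns))
    # The coordinator merges the per-node partial results.
    return sum(partials)


if __name__ == "__main__":
    print(total_amount(ROWS))
```

Because the aggregate touches only the amount column, the order_id and region lists are never read. That is the main reason a columnar layout suits analytical queries that scan a few columns over many rows.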

Data loaded into Redshift must be structured to fit a table schema. Despite this requirement, Redshift's analytical capabilities can also be applied to data that stays outside the warehouse, such as files in a data lake, via Redshift Spectrum. For instance, if an S3 bucket is used as a data lake, Redshift Spectrum allows this data to be queried in place without loading it into Redshift. Alternatively, AWS Glue can be used to build data pipelines that extract, transform, and load (ETL) the data from the S3 bucket into Redshift.
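
As a rough sketch of the Redshift Spectrum path described above, the following Python helpers assemble the kind of SQL statements used to expose files in S3 as an external table and then query them in place. The schema, database, table, and column names, the bucket path, the IAM role ARN, and the choice of Parquet files are all hypothetical placeholders. The helpers only build strings; running the statements against a cluster is left out.

```python
def external_schema_sql(schema, catalog_db, iam_role_arn):
    """Build a CREATE EXTERNAL SCHEMA statement backed by the data catalog."""
    return (
        f"CREATE EXTERNAL SCHEMA {schema}\n"
        f"FROM DATA CATALOG\n"
        f"DATABASE '{catalog_db}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )


def external_table_sql(schema, table, columns, s3_location):
    """Build a CREATE EXTERNAL TABLE statement over Parquet files in S3."""
    cols = ",\n".join(f"  {name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n{cols}\n)\n"
        f"STORED AS PARQUET\n"
        f"LOCATION '{s3_location}';"
    )


def query_sql(schema, table):
    """Build an ordinary SELECT; the external table is queried like a local one."""
    return (
        f"SELECT region, SUM(amount) AS total\n"
        f"FROM {schema}.{table}\n"
        f"GROUP BY region;"
    )


# Hypothetical placeholder values. Identifiers are interpolated without any
# escaping, so this is for illustration only and not safe for untrusted input.
SCHEMA = "spectrum_demo"
TABLE = "sales"
STATEMENTS = [
    external_schema_sql(
        SCHEMA,
        "demo_catalog_db",
        "arn:aws:iam::123456789012:role/ExampleSpectrumRole",
    ),
    external_table_sql(
        SCHEMA,
        TABLE,
        [("order_id", "INTEGER"), ("region", "VARCHAR(16)"), ("amount", "DOUBLE PRECISION")],
        "s3://example-data-lake/sales/",
    ),
    query_sql(SCHEMA, TABLE),
]

if __name__ == "__main__":
    print("\n\n".join(STATEMENTS))
```

The external table records only where the files are and how to read them, so the data itself stays in the S3 bucket. Loading the same files into Redshift through a Glue pipeline is a separate setup that this sketch does not cover.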