PMG Digital Made for Humans

Cross-Account Redshift UNLOAD/COPY

4 MINUTE READ | October 26, 2017

At PMG, the dev team maintains several large applications that provide our account teams with the tools they need to be awesome. We host those tools on individual AWS accounts, but sometimes they need to talk with each other across accounts.

Most recently, we had to implement a Redshift UNLOAD from one AWS account to an S3 bucket in another. This is a guide on how to do that.

We’ll use two terms here:

  1. The S3 Account is the AWS account that owns the target S3 bucket to which Redshift will unload.

  2. The Redshift Account is the AWS account that contains the Redshift cluster performing the UNLOAD or COPY operation.

The S3 Account has a bucket and bucket policy that allows the Redshift Account to access the bucket.

The Redshift Account has an IAM role that Redshift assumes; the role grants the permissions needed to access the bucket in the S3 Account.

A bucket policy allows an AWS account to grant third-party accounts or other principals access to the bucket.

Think of a principal as a form of a user. In a bucket policy, a principal can say things like, “grant access to any IAM entity on Account X.” In our case, the S3 Account will allow the Redshift Account to access the target bucket.

all_account_policy.json

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowS3",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::{REDSHIFT_ACCOUNT_ID}:root"
            },
            "Action": [
                "s3:PutObject",
                "s3:ListBucket",
                "s3:GetObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::CHANGEME/*",
                "arn:aws:s3:::CHANGEME"
            ]
        }
    ]
}

The above allows any IAM entity on the Redshift Account to access the target bucket (named CHANGEME in this example). That’s a bit too permissive for most cases, so let’s lock it down to a single IAM role.

bucket_policy.json

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowS3",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::{REDSHIFT_ACCOUNT_ID}:role/{ROLE_NAME}"
            },
            "Action": [
                "s3:PutObject",
                "s3:ListBucket",
                "s3:GetObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::CHANGEME/*",
                "arn:aws:s3:::CHANGEME"
            ]
        }
    ]
}

Now the only IAM entity on the Redshift Account that can access the bucket is the specific role defined here.

Bucket policies can be set with the AWS CLI or through the S3 console.
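As a sketch, applying the policy with the CLI might look like this (the bucket name and the local policy filename are assumptions; run it with credentials for the S3 Account):

```shell
# Attach the cross-account policy from a local JSON file to the target bucket.
aws s3api put-bucket-policy \
    --bucket CHANGEME \
    --policy file://bucket_policy.json
```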

The above bucket policy only allows the cross-account access to take place; the Redshift Account still needs permissions from its own IAM system to do so. To do that we’ll create an IAM role in the Redshift Account that Redshift can assume and that grants appropriate permissions to act on the S3 bucket.

AWS has plenty of documentation on how to create a role for Redshift, so we’ll skip that here. One interesting thing I’d like to point out is that roles can be restricted to certain database users. That’s super powerful if an automated system is talking to Redshift and no other users should be allowed to assume a given role.
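As a sketch of that restriction, the role’s trust policy can match Redshift’s database user via the external ID; the region, cluster name, and database user below are placeholders:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "redshift.amazonaws.com"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "sts:ExternalId": "arn:aws:redshift:{REGION}:{REDSHIFT_ACCOUNT_ID}:dbuser:{CLUSTER_NAME}/{DB_USER_NAME}"
                }
            }
        }
    ]
}
```

With this trust policy in place, only the named database user on the named cluster can run statements that assume the role.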

The role needs permissions to act on the target S3 bucket.

permissions.json

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowS3",
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::CHANGEME/*",
                "arn:aws:s3:::CHANGEME"
            ]
        }
    ]
}

The above could be in a managed or inline policy.

Once the roles and permissions are set up, you can associate the role with the cluster so it can be used.
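The association can be done in the console or, as a sketch, via the CLI (the cluster identifier here is an assumption):

```shell
# Attach the IAM role to the cluster so UNLOAD/COPY statements can use it.
aws redshift modify-cluster-iam-roles \
    --cluster-identifier my-cluster \
    --add-iam-roles arn:aws:iam::{REDSHIFT_ACCOUNT_ID}:role/{ROLE_NAME}
```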

With all that done, we’re finally ready to actually unload some data!

unload.sql

UNLOAD ('SELECT 1')
TO 's3://CHANGEME/prefix-'
CREDENTIALS 'aws_iam_role=arn:aws:iam::{REDSHIFT_ACCOUNT_ID}:role/{ROLE_NAME}';

The above unloads a SELECT 1 statement to the target bucket. The CREDENTIALS clause says to use the IAM role created above.
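The COPY direction works the same way with the same role. A sketch of loading those files back into a table (the table name is an assumption):

```sql
COPY my_table
FROM 's3://CHANGEME/prefix-'
CREDENTIALS 'aws_iam_role=arn:aws:iam::{REDSHIFT_ACCOUNT_ID}:role/{ROLE_NAME}';
```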

To recap, the whole setup is:

  1. Create a bucket in the S3 Account

  2. Add a bucket policy to that bucket that allows the Redshift Account access

  3. Create an IAM role in the Redshift Account that redshift can assume

  4. Grant permissions to access the S3 Bucket to the newly created role

  5. Associate the role with the Redshift cluster

  6. Run some UNLOAD (or COPY) statements

Since the author is in love with Terraform, here is some example Terraform configuration for all of the above.

Posted by Christopher Davis
