Demo Environment Setup

The tutorial requires an Amazon Redshift database as the destination and a local PostgreSQL database as the source. These instructions walk through how to configure those resources in a test environment, including permissions and the demo table structure.

Configuring a Source Database

For the tutorial, we’ll need a PostgreSQL instance. Below, we’ll walk through setting up a local PostgreSQL instance in a Docker container; if you already have a database running, feel free to skip the Docker instructions and adjust the connection strings accordingly.

Install Docker if you do not already have it on your system.
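
You can quickly confirm the Docker CLI is available before continuing:

# Confirm the Docker CLI is installed
docker --version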

# Run a local Postgres instance in Docker on port 54320
docker run -d --name pglocal -e POSTGRES_PASSWORD=docker -v my_dbdata:/var/lib/postgresql/data -p 54320:5432 postgres:11

# Run psql in the container
docker exec -it pglocal psql -U postgres

This will launch a psql process within the container.

We’ll now create a table for use in our pipeline and insert some test data.

-- Inside PSQL:
CREATE TABLE starter_table (
    id SERIAL PRIMARY KEY,
    data VARCHAR(255),
    created_at TIMESTAMP WITHOUT TIME ZONE NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMP WITHOUT TIME ZONE NOT NULL DEFAULT NOW()
);

INSERT INTO starter_table (data)
VALUES ('my first record'), ('my second record');

SELECT * FROM starter_table;
 id |       data       |        created_at         |        updated_at
----+------------------+---------------------------+---------------------------
  1 | my first record  | 2020-05-26 11:29:52.25809 | 2020-05-26 11:29:52.25809
  2 | my second record | 2020-05-26 11:29:52.25809 | 2020-05-26 11:29:52.25809

Configuring a Destination Database

Next, we need a destination database to copy data into. Either create a new Redshift database or use an existing one; if you’re using an existing instance, you’ll need superuser permissions.
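
If you’d rather create a fresh cluster from the command line, an AWS CLI invocation along these lines should work (a minimal sketch: the cluster identifier, node type, username, password, and database name are placeholder values to replace with your own):

# Create a small single-node Redshift cluster for testing
aws redshift create-cluster \
    --cluster-identifier druzhba-demo \
    --cluster-type single-node \
    --node-type dc2.large \
    --master-username awsuser \
    --master-user-password 'ChangeMe123' \
    --db-name dev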

Once your destination database is running, connect to it with your favorite SQL client and run the following (you may want to choose your own password):

CREATE USER druzhba_test PASSWORD 'Druzhba123';
CREATE SCHEMA druzhba_raw;
GRANT ALL ON SCHEMA druzhba_raw TO druzhba_test;

This creates a dedicated schema to receive the tables Druzhba will create, along with a system user and password for the Druzhba process to use.
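
To verify the grants, you can connect as the new user and run a quick query, for example with psql (the cluster endpoint and dev database name here are placeholders; substitute your own):

# Connect as druzhba_test and confirm the session user
psql -h my-cluster.abc123.us-east-1.redshift.amazonaws.com -p 5439 \
    -U druzhba_test -d dev -c 'SELECT current_user;'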

To load data into Redshift efficiently, Druzhba stages temporary files in an S3 bucket. If you do not already have one, create a bucket and choose a prefix; the Druzhba process will need read/write access to it. You must also create an IAM copy role with access to that bucket/prefix and attach it to your Redshift instance.
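
If you are starting from scratch, the AWS CLI steps look roughly like the following (a sketch only: the bucket name, role name, cluster identifier, and account ID are placeholders, and your organization’s IAM conventions may differ):

# Create a staging bucket for Druzhba's temporary files
aws s3 mb s3://my-druzhba-staging

# Create an IAM role that Redshift is allowed to assume
aws iam create-role \
    --role-name druzhba-copy-role \
    --assume-role-policy-document '{"Version": "2012-10-17", "Statement": [{"Effect": "Allow", "Principal": {"Service": "redshift.amazonaws.com"}, "Action": "sts:AssumeRole"}]}'

# Give the role read access to the staging bucket so Redshift COPY can load from it
aws iam put-role-policy \
    --role-name druzhba-copy-role \
    --policy-name druzhba-s3-read \
    --policy-document '{"Version": "2012-10-17", "Statement": [{"Effect": "Allow", "Action": ["s3:GetObject", "s3:ListBucket"], "Resource": ["arn:aws:s3:::my-druzhba-staging", "arn:aws:s3:::my-druzhba-staging/*"]}]}'

# Attach the role to your cluster so COPY statements can reference it
aws redshift modify-cluster-iam-roles \
    --cluster-identifier druzhba-demo \
    --add-iam-roles arn:aws:iam::123456789012:role/druzhba-copy-role

Note that the Druzhba process itself will also need read/write credentials for the bucket, configured separately from this copy role.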

With a complete testing environment in place, we are ready to begin the comparatively simple task of actually defining our pipeline.