:orphan:

.. _demosetup:

Demo Environment Setup
======================

The :ref:`tutorial` requires an `Amazon Redshift`_ database as the destination
and a local PostgreSQL_ database as the source. These instructions walk through
how to configure those resources in a test environment, including permissions
and the demo table structure.

.. _PostgreSQL: https://www.postgresql.org/
.. _`Amazon Redshift`: https://aws.amazon.com/redshift/

Configuring a Source Database
-----------------------------

For the tutorial, we'll need a PostgreSQL instance. We'll walk through the
steps to set up a local PostgreSQL instance running in a Docker container
below, but if you already have a database running, feel free to skip the
Docker instructions and adjust the connection strings below accordingly.

`Install Docker <https://docs.docker.com/get-docker/>`_ if you do not already
have it on your system.

.. code-block:: bash

    # Run a local postgres instance in Docker, exposed on host port 54320
    docker run -d --name pglocal \
        -e POSTGRES_PASSWORD=docker \
        -v my_dbdata:/var/lib/postgresql/data \
        -p 54320:5432 postgres:11

    # Run psql in the container
    docker exec -it pglocal psql -Upostgres

This will launch a psql process within the container. We'll now create a table
for use in our pipeline and insert some test data. (A quick way to verify that
the database is also reachable from your host machine is sketched at the end
of this page.)

.. code-block:: postgresql

    -- Inside psql:
    CREATE TABLE starter_table (
        id SERIAL PRIMARY KEY,
        data VARCHAR(255),
        created_at TIMESTAMP WITHOUT TIME ZONE NOT NULL DEFAULT NOW(),
        updated_at TIMESTAMP WITHOUT TIME ZONE NOT NULL DEFAULT NOW()
    );

    INSERT INTO starter_table (data)
    VALUES ('my first record'), ('my second record');

    SELECT * FROM starter_table;

     id |       data       |        created_at         |        updated_at
    ----+------------------+---------------------------+---------------------------
      1 | my first record  | 2020-05-26 11:29:52.25809 | 2020-05-26 11:29:52.25809
      2 | my second record | 2020-05-26 11:29:52.25809 | 2020-05-26 11:29:52.25809

Configuring a Destination Database
----------------------------------

Next we need a destination database to copy data into. Either `create a new
Redshift database <https://docs.aws.amazon.com/redshift/latest/gsg/getting-started.html>`_
or use an existing one. If using an existing instance, you'll need superuser
permissions.

Once your destination database is running, connect to it with your favorite
SQL client and run the following (you may want to choose your own password):

.. code-block:: postgresql

    CREATE USER druzhba_test PASSWORD 'Druzhba123';
    CREATE SCHEMA druzhba_raw;
    GRANT ALL ON SCHEMA druzhba_raw TO druzhba_test;

This creates a dedicated schema to receive the tables that Druzhba will
manage, along with a system user and password for the Druzhba process to use.

To efficiently load data into Redshift, Druzhba writes temporary files to an
S3_ bucket. If you do not already have one, create a bucket and choose a
prefix; the Druzhba process will need read/write access to it. You must also
create an `IAM copy role`_ with access to that bucket/prefix and attach it to
your Redshift cluster. (A rough AWS CLI sketch of these steps appears at the
end of this page.)

.. _S3: https://aws.amazon.com/s3/
.. _`IAM copy role`: https://docs.aws.amazon.com/redshift/latest/mgmt/copy-unload-iam-role.html

With a complete testing environment in place, we are ready to begin the
comparatively simple task of actually :ref:`defining your pipeline <tutorial>`.
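
Two optional reference sketches follow. First, as mentioned above, you can
check that the source database is reachable from your host machine by pointing
any Postgres client at ``localhost:54320``. A minimal sketch, assuming the
Docker setup above (user ``postgres``, password ``docker``) and a ``psql``
client installed on the host:

.. code-block:: bash

    # Connect from the host to the Dockerized instance and count the demo
    # rows; this should print 2 if the inserts above succeeded
    psql "postgresql://postgres:docker@localhost:54320/postgres" \
        -c "SELECT COUNT(*) FROM starter_table;"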
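
Second, the S3 bucket and IAM copy role can be created in the AWS console, but
if you prefer the command line, the steps might look roughly like the
following. This is a sketch, not a definitive procedure: the bucket name, role
name, policy name, cluster identifier, and account ID are all placeholders,
and the `IAM copy role`_ documentation linked above remains the authoritative
reference. Note that the credentials running the Druzhba process itself will
additionally need read/write access to the bucket.

.. code-block:: bash

    # All names below are placeholders -- bucket names are global, so pick
    # your own. Create a scratch bucket for Druzhba's temporary COPY files:
    aws s3 mb s3://my-druzhba-scratch

    # Create a role that Redshift is allowed to assume...
    cat > /tmp/redshift-trust.json <<'EOF'
    {
      "Version": "2012-10-17",
      "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "redshift.amazonaws.com"},
        "Action": "sts:AssumeRole"
      }]
    }
    EOF
    aws iam create-role --role-name druzhba-copy-role \
        --assume-role-policy-document file:///tmp/redshift-trust.json

    # ...and grant it read access to the bucket so COPY can fetch files
    cat > /tmp/redshift-s3.json <<'EOF'
    {
      "Version": "2012-10-17",
      "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
          "arn:aws:s3:::my-druzhba-scratch",
          "arn:aws:s3:::my-druzhba-scratch/*"
        ]
      }]
    }
    EOF
    aws iam put-role-policy --role-name druzhba-copy-role \
        --policy-name druzhba-s3-read \
        --policy-document file:///tmp/redshift-s3.json

    # Finally, associate the role with your Redshift cluster
    aws redshift modify-cluster-iam-roles --cluster-identifier my-cluster \
        --add-iam-roles arn:aws:iam::123456789012:role/druzhba-copy-role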