Command Line Help

usage: druzhba [-h] [-ll LOG_LEVEL] [-d DATABASE]
                    [-t [TABLES [TABLES ...]]] [-np NUM_PROCESSES] [-co] [-ps]
                    [-vo] [-f] [-r]

Friendly data pipeline framework

optional arguments:
  -h, --help            show this help message and exit
  -ll LOG_LEVEL, --log-level LOG_LEVEL
                        Name of a python log level eg DEBUG
  -d DATABASE, -db DATABASE, --database DATABASE
                        A single database to run. Will override a database
                        marked disabled in the db config file
  -t [TABLES [TABLES ...]], --table [TABLES [TABLES ...]], --tables [TABLES [TABLES ...]]
                        List of tables to run separated by spaces. Must be run
                        with --database
  -np NUM_PROCESSES, --num-processes NUM_PROCESSES
                        Number of parallel processes to spawn. Defaults to
                        number of CPUs (cores) available.
  -co, --compile-only   Will print generated queries to STDOUT but not execute
  -ps, --print-sql-only
                        Will print generated CREATE and SELECT statements to
                        STDOUT only.
  -vo, --validate-only  Will execute configuration checks only.
  -f, --full-refresh    Force a full refresh of the table. Must be run with
                        --database and --table(s).
  -r, --rebuild         Automatically recreate and full-refresh the table.
                        Must be run with --database and --table(s). Only
                        supported for tables Druzhba can build.

Specific options follow.

Database Options

By default, Druzhba runs all databases defined in _pipeline.yaml (see Configuration) unless they are explicitly disabled with enabled: false. These disabled source databases may still be run by calling Druzhba with the --database argument.

Databases are referred to by their “alias”, which does not necessarily need to match the “database name” actually provided in the source connection string.

Additionally, Druzhba supports a --table option (alternately -t or --tables). Using this option will cause druzhba to update only a single table or whitespace delimited list of tables. The --table option will override enabled: false from the relevant configuration file.

Testing Options

Three testing / development options are available.

The --validate-only option is useful to run in CI to ensure that configuration files can be parsed correctly. The Druzhba process will show an error message return a non-zero exit status if Druzhba detects any configuration errors. Note that this option does not connect to any databases and therefore can only catch errors in the Druzhba configuration itself and not issues like a misspelled table name.

The --compile-only option will print SELECT statements that would be run against the source database. This option can be useful for developing and debugging custom SQL logic.

The --print-sql-only option will display all SQL statements that would be run in would be run against the destination and source databases respectively, and can be useful for troubleshooting issues in a pipeline.

Table Maintenance and Manual Overrides

Using --full-refresh as a CLI parameter overrides (defaulted to false if omitted) full-refresh table configuration field as described in If given, all tables in the Druzhba invocation will have all existing rows deleted, and the source side query will ignore the existing incremental index. This option is typically used with -t / --table and is useful if data in the source table has been deleted or updated in a way that will not be picked up by an index_column.

--rebuild behaves as full-refresh, but additionally will transactionally delete and recreate the target table. Druzhba also attempts to copy the permissions on the old table to the new one, but will ignore some kinds of permissions like with grant option. This option is useful after a migration to the source table. --rebuild is not supported for manual tables or those with a truncate_file.

Production Options

Default logging level can be overridden with the --log-level option. Its use supersedes the LOG_LEVEL environment variable. If neither is set then Druzhba defaults to INFO. Available levels include DEBUG, INFO, WARN, and ERROR.

While druzhba processes tables from a given source database sequentially, it will attempt to process multiple source databases concurrently, one table from each at a time. By default, Druzhba’s parallelism limit will be equal to the number of CPU cores detected. Use -np or --num-processes to override that default. Note that the number of concurrent processes will never exceed the number of source databases.