Hummingbird Tools

Database backup and restore tools for the Hummingbird status database. The tools create compressed PostgreSQL dumps, upload them to S3 with rotation, and restore a database from the latest backup.

Features

  • Streaming Backup: Streams pg_dump output through gzip to a temp file, then uploads it to S3 with a multipart upload. Memory use stays bounded regardless of database size; the temp file needs enough free disk space for the compressed dump
  • Streaming Restore: Downloads the backup to a temp file, then streams it through psql, so the dump is never held in memory in full
  • Automatic Rotation: Keeps a configurable number of backups per group (daily, weekly, monthly)
  • Separate IAM Users: Write user for production backups, read-only user for staging restore
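
The streaming idea behind the backup can be sketched as follows. This is a minimal illustration, not the tool's actual code: the helper name stream_command_to_gzip and the chunk size are hypothetical, and the S3 upload step is left out.

```python
import gzip
import shutil
import subprocess
import tempfile

CHUNK_SIZE = 1024 * 1024  # hypothetical: copy 1 MiB at a time


def stream_command_to_gzip(command: list[str]) -> str:
    """Run `command` and gzip its stdout into a temp file; return the path.

    Only one chunk is held in memory at a time, so memory use does not
    depend on how much output the command produces.
    """
    # Reserve a temp file name; the caller is responsible for deleting it.
    with tempfile.NamedTemporaryFile(suffix=".sql.gz", delete=False) as tmp:
        path = tmp.name
    with subprocess.Popen(command, stdout=subprocess.PIPE) as proc:
        with gzip.open(path, "wb") as compressed:
            shutil.copyfileobj(proc.stdout, compressed, CHUNK_SIZE)
    # Leaving the Popen context waits for the child, so returncode is set.
    if proc.returncode != 0:
        raise RuntimeError(f"{command[0]} exited with status {proc.returncode}")
    return path
```

For a real backup the command would be a pg_dump invocation, for example ["pg_dump", "--host", host, "--username", user, database] with the password supplied through PGPASSWORD. The resulting file could then be passed to boto3's upload_file, which switches to multipart upload for large files. Whether the tool is structured this way is an assumption.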

Prerequisites

  • AWS CLI configured with appropriate credentials
  • PostgreSQL client tools (pg_dump, psql)
  • Python 3.11 or later
  • Access to the hummingbird-status PostgreSQL database

Deployment

The SAM template creates:

  • S3 bucket for database backups
  • IAM user with write permissions (for production backup CronJobs)
  • IAM user with read-only permissions (for staging restore CronJob)

Build and deploy using containerized AWS SAM CLI:

cd hummingbird-tools
make build     # Build SAM application
make deploy    # First deployment (interactive/guided)
make redeploy  # Subsequent deployments (non-interactive)

Note: The stack is deployed only in production. The staging restore reads from the production backup bucket.

Parameters

Parameter       Description                    Default
ResourcePrefix  Prefix for all resource names  myapp-prod

Resource naming:

  • S3 bucket: {ResourcePrefix}
  • Write IAM user: {ResourcePrefix}-write-user
  • Read IAM user: {ResourcePrefix}-read-user

Usage

Backup

Run a backup with rotation:

python3 -m hummingbird_tools.backup <group> <rotation>

Arguments:

  • group: Backup group name (e.g., daily, weekly, monthly)
  • rotation: Number of backups to keep for this group

Environment variables:

Variable             Description
S3_BUCKET            S3 bucket name
S3_PREFIX            Optional prefix for S3 keys
POSTGRESQL_HOST      Database hostname
POSTGRESQL_USER      Database username
POSTGRESQL_PASSWORD  Database password
POSTGRESQL_DATABASE  Database name
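
As an illustration, the variables above could be collected and validated like this. The Settings class and load_settings function are hypothetical names, and the sketch assumes every variable except S3_PREFIX is required, which the table implies but does not state.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Assumption: everything except S3_PREFIX must be set.
REQUIRED = (
    "S3_BUCKET",
    "POSTGRESQL_HOST",
    "POSTGRESQL_USER",
    "POSTGRESQL_PASSWORD",
    "POSTGRESQL_DATABASE",
)


@dataclass(frozen=True)
class Settings:
    s3_bucket: str
    s3_prefix: str
    host: str
    user: str
    password: str
    database: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from `env`, failing early if a required one is missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))
    return Settings(
        s3_bucket=env["S3_BUCKET"],
        s3_prefix=env.get("S3_PREFIX", ""),  # optional, defaults to no prefix
        host=env["POSTGRESQL_HOST"],
        user=env["POSTGRESQL_USER"],
        password=env["POSTGRESQL_PASSWORD"],
        database=env["POSTGRESQL_DATABASE"],
    )
```

Failing before pg_dump starts keeps a misconfigured CronJob from producing a partial or empty backup.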

Example:

export S3_BUCKET=myapp-prod-db-backups
export S3_PREFIX=mydb/
export POSTGRESQL_HOST=myapp-postgres
export POSTGRESQL_USER=postgres
export POSTGRESQL_PASSWORD=secret
export POSTGRESQL_DATABASE=mydb

python3 -m hummingbird_tools.backup daily 7

This creates a backup with a key like mydb/2026-01-16-02-30.daily.sql.gz, then deletes the oldest daily backups so that at most 7 remain.
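
The rotation rule can be sketched as a pure function over the listed keys. The name keys_to_delete is hypothetical, and the sketch assumes all keys share the same prefix, so that sorting whole keys is the same as sorting by timestamp.

```python
def keys_to_delete(keys: list[str], group: str, rotation: int) -> list[str]:
    """Return the keys of `group` that fall outside the newest `rotation`."""
    if rotation < 1:
        raise ValueError("rotation must be at least 1")
    suffix = f".{group}.sql.gz"
    # Zero-padded timestamps sort chronologically, so the oldest come first.
    in_group = sorted(key for key in keys if key.endswith(suffix))
    # Everything except the last `rotation` entries is surplus.
    return in_group[:-rotation]
```

Backups from other groups are never touched: a daily rotation of 7 leaves the weekly and monthly backups alone.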

Restore

Restore from the latest backup:

python3 -m hummingbird_tools.restore

Environment variables are the same as for backup.

The restore process:

  1. Lists all backups matching *.{daily,weekly,monthly}.sql.gz
  2. Selects the latest by filename sort (the zero-padded timestamp makes lexicographic order match chronological order)
  3. Downloads it to a temp file
  4. Drops all existing tables in the public schema
  5. Streams the backup through psql
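
Steps 1 and 2 above can be sketched as follows. The function name select_latest and the regular expression are hypothetical; the sketch takes an already listed set of S3 keys and leaves the S3 listing itself out.

```python
import re

# Matches the *.{daily,weekly,monthly}.sql.gz pattern from step 1.
BACKUP_NAME = re.compile(r"\.(daily|weekly|monthly)\.sql\.gz$")


def select_latest(keys: list[str]) -> str:
    """Return the backup key whose filename sorts last."""
    candidates = [key for key in keys if BACKUP_NAME.search(key)]
    if not candidates:
        raise LookupError("no backups found")
    # Compare by filename only, so the group suffix does not outrank the timestamp.
    return max(candidates, key=lambda key: key.rsplit("/", 1)[-1])
```

Because the comparison covers all three groups, a restore picks whichever backup ran most recently, regardless of its group.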

CronJob Schedule

The tools run as Kubernetes CronJobs defined in kubernetes/hummingbird-status/:

CronJob  Schedule     Environment  Command
daily    30 2 * * *   production   python3 -m hummingbird_tools.backup daily 7
weekly   30 6 * * 0   production   python3 -m hummingbird_tools.backup weekly 4
monthly  30 10 1 * *  production   python3 -m hummingbird_tools.backup monthly 12
restore  (suspended)  staging      python3 -m hummingbird_tools.restore

To manually trigger a staging restore:

kubectl create job --from=cronjob/hummingbird-status-restore manual-restore-$(date +%s)

S3 Key Structure

Backups are stored with a flat key structure per database:

{database}/{timestamp}.{group}.sql.gz

Examples:

mydb/2026-01-16-02-30.daily.sql.gz
mydb/2026-01-11-06-30.weekly.sql.gz
mydb/2026-01-01-10-30.monthly.sql.gz

Multiple databases can share the same bucket by using a different S3_PREFIX for each.
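
A small sketch of building and parsing keys in this layout. The timestamp format is inferred from the examples above rather than taken from the source code, and build_key and parse_key are hypothetical names.

```python
import re
from datetime import datetime

# Inferred from the example keys: YYYY-MM-DD-HH-MM.
TIMESTAMP_FORMAT = "%Y-%m-%d-%H-%M"

KEY_PATTERN = re.compile(
    r"^(?P<prefix>.*?)"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}-\d{2}-\d{2})"
    r"\.(?P<group>[^./]+)\.sql\.gz$"
)


def build_key(prefix: str, when: datetime, group: str) -> str:
    """Build a key such as mydb/2026-01-16-02-30.daily.sql.gz."""
    return f"{prefix}{when.strftime(TIMESTAMP_FORMAT)}.{group}.sql.gz"


def parse_key(key: str) -> tuple[str, datetime, str]:
    """Split a key into (prefix, timestamp, group)."""
    match = KEY_PATTERN.match(key)
    if match is None:
        raise ValueError(f"not a backup key: {key}")
    when = datetime.strptime(match["timestamp"], TIMESTAMP_FORMAT)
    return match["prefix"], when, match["group"]
```

The prefix is used verbatim, so it should end with a slash (as in S3_PREFIX=mydb/) to get the layout shown in the examples.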

Development

See the main README for development workflows.

make hummingbird-tools/setup  # Install dependencies
make check                     # Lint code (ruff)
make fmt                       # Format code
make test                      # Run unit tests
make coverage                  # Run tests with coverage

Security

  • S3 bucket blocks all public access
  • S3 objects encrypted at rest (AES256)
  • Separate IAM users with least privilege:
    • Write user: s3:PutObject, s3:DeleteObject, s3:ListBucket
    • Read user: s3:GetObject, s3:ListBucket
  • Database passwords passed via environment variables (Kubernetes secrets)
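
For illustration, the two permission sets map onto IAM policy documents roughly as sketched below. The actual policies live in the SAM template and may differ; bucket_policy is a hypothetical helper. The sketch relies on the fact that s3:ListBucket applies to the bucket ARN, while object actions apply to the objects under it.

```python
import json

WRITE_ACTIONS = ["s3:PutObject", "s3:DeleteObject"]
READ_ACTIONS = ["s3:GetObject"]


def bucket_policy(bucket: str, object_actions: list[str]) -> dict:
    """Build a policy allowing listing plus the given object-level actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is a bucket-level action.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Get/Put/Delete are object-level actions.
                "Effect": "Allow",
                "Action": list(object_actions),
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(bucket_policy("myapp-prod", READ_ACTIONS), indent=2))
```

The write user has no s3:GetObject and the read user has no write actions, so a compromised staging credential cannot alter or delete production backups.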

License

This project is licensed under the GNU General Public License v3.0 or later. See the LICENSE file for details.