Container Catalog

Per-distro serverless catalog API for Hummingbird container images.

Features

  • Image Directory - Browse all container images with metadata
  • Tag Browser - View tags, digests, architectures per image
  • Specifications - Per-architecture OCI config details (env, cmd, user, labels)
  • SBOM - Per-architecture package lists from SPDX attestations
  • Vulnerabilities - CVE scanning results via Grype
  • CVE Metrics - CloudWatch metrics for CVE exposure duration (see Error Budgets)
  • Provenance - Source traceability from SLSA attestations
  • Release History - Timeline of past builds with drill-down
  • OpenAPI Spec - Machine-readable API documentation at /v1/openapi.json
  • Swagger UI - Interactive API explorer at /v1/docs/

Architecture

Rust serverless stack deployed as two isolated per-distro stacks:

  • API Lambda - DynamoDB pass-through (~10ms response)
  • Sync Lambda - Incremental DynamoDB sync from SNS Release events (~22 registry calls per release)
  • Index Lambda (index-lambda) - SNS-triggered Syft JSON generation, uploads native Syft JSON to S3 for scanner consumption
  • Scan Lambda (scan-lambda) - SQS-triggered CVE scanning via Grype from native Syft JSON in S3, per-digest processing with first_seen tracking and partial batch failure reporting
  • Enqueue Lambda (enqueue-lambda) - Hourly EventBridge-triggered fan-out that checks Grype DB for updates and enqueues non-superseded digests to SQS
  • Metrics Lambda (metrics-lambda) - DynamoDB Stream-triggered CVE exposure metrics and structured logs to CloudWatch
  • DynamoDB - Pre-computed JSON items (single-table PK/SK design, Streams: KEYS_ONLY)
  • S3 ScanDataBucket - Native Syft JSON storage for scanner data (gzipped, keyed by grype/{image}/{digest_hex}.json.gz)
  • SQS ScanQueue - Central queue for scan tasks (fed by S3 events and Enqueue Lambda)
  • CloudFront - CDN with per-endpoint cache TTLs
  • CloudWatch - CVE exposure metrics and structured logs
  • catalog sync - Full DynamoDB population from GitLab + Quay.io OCI v2 registry
  • catalog scan - CLI CVE scanning via Grype; reads native Syft JSON from S3 (with --bucket) or builds synthetic JSON from DynamoDB SBOMs (fallback)
  • catalog index - CLI Syft JSON generation for backfilling S3
  • Catalog SPA - Lit 3 web app served from S3 via CloudFront

Scanner Architecture

CVE vulnerability data must match direct grype <image> scans exactly for all images. Synthetic Syft JSON (built from stored SBOM packages) cannot reliably reproduce the native output because Grype relies on metadata fields, artifact relationships, and deduplication logic that are lost in the SPDX-to-API roundtrip. Additionally, raw SPDX from the build system contains package bloat (e.g. Go sub-modules, empty-version entries) that native Syft filters out when scanning a compiled binary directly.

To guarantee exact-match results, the scanner chain stores native Syft JSON in S3 rather than DynamoDB. A single image’s Syft JSON is typically 1-10 MB (100 KB - 1 MB gzipped), which can exceed DynamoDB’s 400 KB item limit even when compressed. S3’s per-object size limit is orders of magnitude above these sizes, it avoids provisioned throughput costs for large blobs, and it supports event notifications for triggering downstream scanners. An independent Index Lambda runs syft <image> on each new release and uploads the full output to s3://{bucket}/grype/{image}/{hex}.json.gz. The S3 upload fires an event notification to SQS, which triggers the Scan Lambda to read the native JSON, run Grype, and write per-canonical vulnerability data to DynamoDB.
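
As an illustration of the key layout above, here is a minimal sketch of deriving the object key from an image name and digest. The function name is hypothetical, and it assumes that {hex} is the hexadecimal part of a sha256:<hex> digest; the actual code may derive the key differently.

```rust
/// Build the S3 object key for a digest's native Syft JSON.
/// Assumption: `digest` has the form "sha256:<hex>"; anything else yields None.
fn scan_data_key(image: &str, digest: &str) -> Option<String> {
    let hex = digest.strip_prefix("sha256:")?;
    // Reject empty or non-hex suffixes instead of writing an odd-looking key.
    if hex.is_empty() || !hex.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    Some(format!("grype/{image}/{hex}.json.gz"))
}
```

Because the key depends only on the image name and digest, re-indexing the same release always targets the same object.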

The scanner chain is intentionally independent of the catalog data chain: the same SNS Release event triggers both the Sync Lambda (catalog data to DynamoDB) and the Index Lambda (Syft JSON to S3), with no ordering dependency between them.

CVE Data Flow

graph TD
    Konflux([Konflux Release]) -->|SNS| Sync[SyncFunction]
    Konflux -->|SNS| Index[IndexFunction]
    Sync -->|update| Tags[(Tags)]
    Tags -->|"stream via ESM"| Metrics
    Tags -->|read| Scan[ScanFunction]
    Index -->|upload| S3Syft[(S3 Syft JSON)]
    S3Syft -->|"S3 event via SQS"| Scan
    Hourly([Schedule]) -->|hourly| Enqueue[EnqueueFunction]
    Enqueue --> Gate[check Grype DB]
    Gate -->|"fan out via SQS"| Scan
    Scan -->|update| Vulns[(Image Vulnerabilities)]
    Vulns -->|"stream via ESM"| Metrics[MetricsFunction]
    Metrics -->|"metrics + logs"| CloudWatch[CloudWatch]
    Metrics -->|"write aggregate"| CatalogVulns[(Catalog Vulnerabilities)]

    classDef lambda fill:#d4e6f1,stroke:#2980b9
    classDef dynamo fill:#fdebd0,stroke:#e67e22
    classDef trigger fill:#d5f5e3,stroke:#27ae60
    classDef aws fill:#e8daf1,stroke:#8e44ad
    classDef gate fill:#e5e7e9,stroke:#7f8c8d
    class Sync,Scan,Enqueue,Metrics,Index lambda
    class Vulns,Tags,CatalogVulns,S3Syft dynamo
    class Konflux,Hourly trigger
    class CloudWatch aws
    class Gate gate

Prerequisites

  • Rust 1.75+ (for building backend)
  • Node.js 22+ (for building frontend)
  • AWS credentials (for DynamoDB access and deployment)
  • SAM CLI (for deployment)

API Endpoints

Endpoint                                                 Description
GET /v1/images                                           Image directory
GET /v1/images/{name}                                    Image overview (README)
GET /v1/images/{name}/tags                               Tags for an image
GET /v1/images/{name}/details/{canonical}                Per-canonical details
GET /v1/images/{name}/sbom/{canonical}                   Package list
GET /v1/images/{name}/vulnerabilities/{canonical}        Vulnerability scan
GET /v1/images/{name}/history/{stream}/{variant}         Release timeline
GET /v1/images/{name}/releases/details/{digest}          Release details (immutable)
GET /v1/images/{name}/releases/sbom/{digest}             Release SBOM (immutable)
GET /v1/images/{name}/releases/vulnerabilities/{digest}  Release vulnerabilities
GET /v1/vulnerabilities                                  Catalog-wide CVE aggregate
GET /v1/openapi.json                                     OpenAPI 3.1 specification
GET /v1/docs/                                            Interactive Swagger UI

Timestamp Fields

The oldest_created field on ImageSummary, Tag, and HistorySummary is the earliest OCI created timestamp across all architectures in the release. All architectures were built at or after this date, making it useful for conservative staleness detection.
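
A minimal sketch of how such a value could be derived and used, assuming the per-architecture created timestamps have already been parsed to epoch seconds. The function names and the staleness threshold parameter are hypothetical.

```rust
use std::collections::BTreeMap;

/// Earliest `created` time (epoch seconds) across all architectures of a release.
fn oldest_created(created_by_arch: &BTreeMap<&str, u64>) -> Option<u64> {
    created_by_arch.values().copied().min()
}

/// Conservative staleness check: every architecture was built at or after
/// `oldest`, so no architecture can be older than this age.
fn is_stale(oldest: u64, now: u64, max_age_secs: u64) -> bool {
    now.saturating_sub(oldest) > max_age_secs
}
```

Using the minimum rather than the maximum means a release is flagged as soon as any one architecture could be out of date.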

The specifications endpoint returns per-architecture data keyed by architecture name. Each architecture’s created field is the direct OCI config root timestamp for that specific architecture.

Usage

All tools are built as a single catalog binary with subcommands (api, sync, sync-lambda, scan, scan-lambda, index, index-lambda, enqueue-lambda, metrics, metrics-lambda). The binary is built in the Rust container and CLI subcommands are run in the gitlab-ci container (which provides grype, syft, and other tools). Only make and podman are required.

Sync

# Dry run (print items to stdout)
make container-catalog/sync ARGS="--distro rawhide --dry-run"

# Populate DynamoDB
make container-catalog/sync ARGS="--distro rawhide --table-name <table>"

Sync Lambda

The sync-lambda subcommand runs as an AWS Lambda function triggered by SNS Release events from kubernetes-event-forwarder. It incrementally syncs a single image release to DynamoDB (~22 registry API calls per release vs ~104 for a full sync).

The Lambda:

  1. Decodes gzip+base64 SNS messages
  2. Filters for Succeeded releases targeting the configured Quay.io namespace
  3. Fetches OCI manifest data for the new digest
  4. Writes per-digest items (DETAILS, SBOM, RELEASE_DETAILS, RELEASE_SBOM)
  5. Merges into aggregate items (TAGS, HISTORY, OVERVIEW, DIRECTORY)
  6. Fetches README from GitLab for OVERVIEW content (uses README.redhat.md for hummingbird, README.md for rawhide)

Registry fetch errors (manifest, SBOM, attestation) propagate as hard failures so the Lambda retries automatically (up to 2 retries with backoff) before sending to the DLQ. GitLab README failures are non-fatal – the existing README is preserved if the fetch fails.
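
The difference between the two failure modes can be sketched as follows. This is an illustration only: SyncError and sync_overview are hypothetical names, and the real Lambda handles more data than a manifest and a README.

```rust
/// Hypothetical error type for one sync attempt.
#[derive(Debug, PartialEq)]
struct SyncError(String);

/// Combine a registry fetch (hard failure) with a README fetch (soft failure).
fn sync_overview(
    manifest: Result<String, SyncError>,
    readme: Result<String, SyncError>,
    existing_readme: Option<String>,
) -> Result<(String, Option<String>), SyncError> {
    // Hard failure: `?` returns the error, the invocation fails, Lambda retries it.
    let manifest = manifest?;
    // Soft failure: fall back to whatever README is already stored.
    let readme = readme.ok().or(existing_readme);
    Ok((manifest, readme))
}
```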

Environment Variable  Description
TABLE_NAME            DynamoDB table name
DISTRO                rawhide or hummingbird
SENTRY_DSN            Optional Sentry DSN for error tracking

Index

The index subcommand generates native Syft JSON for all non-superseded image digests and uploads them to S3 for scanner consumption. It reads the image directory and TAGS from DynamoDB, checks S3 for existing objects (dedup via HeadObject), runs syft <image> --platform linux/amd64, gzips the output, and uploads to s3://{bucket}/grype/{image}/{hex}.json.gz.

# Dry run (show what would be uploaded)
make container-catalog/index ARGS="--distro hummingbird --table-name <table> --bucket <bucket> --dry-run"

# Backfill S3 for all images
make container-catalog/index ARGS="--distro hummingbird --table-name <table> --bucket <bucket>"

# Backfill a single image
make container-catalog/index ARGS="--distro hummingbird --table-name <table> --bucket <bucket> --image caddy"
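
The dedup step of the index subcommand can be sketched like this. The exists closure stands in for the S3 HeadObject check and plan_uploads is a hypothetical name; this is a sketch, not the actual implementation.

```rust
use std::collections::BTreeSet;

/// Return the S3 keys that still need an upload for one image.
/// `exists` stands in for an S3 HeadObject lookup (hypothetical closure).
fn plan_uploads<F>(image: &str, digests_hex: &[&str], exists: F) -> Vec<String>
where
    F: Fn(&str) -> bool,
{
    let mut seen = BTreeSet::new();
    digests_hex
        .iter()
        // Several tags can share one digest; index each digest only once.
        .filter(|hex| seen.insert(**hex))
        .map(|hex| format!("grype/{image}/{hex}.json.gz"))
        // Skip objects that are already in the bucket.
        .filter(|key| !exists(key.as_str()))
        .collect()
}
```

A --dry-run mode would print the returned keys instead of running syft and uploading.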

Index Lambda

The index-lambda subcommand runs as an AWS Lambda function triggered by the same SNS Release events as the Sync Lambda. It operates independently of the catalog chain (no DynamoDB access) and generates native Syft JSON for scanner consumption.

The Lambda:

  1. Decodes gzip+base64 SNS messages (shared with Sync Lambda via release_event module)
  2. Filters for Succeeded releases targeting the configured Quay.io namespace
  3. Checks S3 for existing objects (HeadObject dedup by digest)
  4. Runs syft <image>@<digest> --platform linux/amd64 --output syft-json
  5. Gzips and uploads to s3://{bucket}/grype/{image}/{hex}.json.gz

Failures propagate for Lambda retry (up to 2 retries) before sending to the IndexDLQ. The catalog index CLI backfills any gaps.

Environment Variable  Description
SCAN_DATA_BUCKET      S3 bucket for scanner data
DISTRO                rawhide or hummingbird
SENTRY_DSN            Optional Sentry DSN for error tracking

Scan

The scan subcommand reads image listings and tags from DynamoDB (no registry access needed) and runs Grype for each image. With --bucket it scans the native Syft JSON stored in S3; without it, it falls back to synthetic Syft JSON built from the SBOM packages stored in DynamoDB. Results include a first_seen timestamp per CVE, tracked at the group+variant+stream level and carried across releases for SLI computation.

# Dry run (print items to stdout)
make container-catalog/scan ARGS="--distro hummingbird --table-name <table> --dry-run"

# Scan and write to DynamoDB (purge stale vuln data first, implies --scope=all)
make container-catalog/scan ARGS="--distro hummingbird --table-name <table> --purge"

# Scan only non-superseded (current) tags
make container-catalog/scan ARGS="--distro hummingbird --table-name <table> --scope non-superseded"

# Scan all releases including historic (tagless) releases
make container-catalog/scan ARGS="--distro hummingbird --table-name <table> --scope all"

# Scan a single image
make container-catalog/scan ARGS="--distro hummingbird --table-name <table> --image caddy --dry-run"
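
The first_seen carry-over can be sketched as a merge between the previous scan's timestamps and the CVE IDs found by the new scan. The function name, the map shape, and the CVE IDs in any usage are hypothetical; in this sketch one map corresponds to one group+variant+stream combination.

```rust
use std::collections::BTreeMap;

/// Merge the previous `first_seen` map with the CVE IDs from a new scan.
/// Timestamps are epoch seconds.
fn carry_first_seen(
    previous: &BTreeMap<String, u64>,
    current_cves: &[&str],
    now: u64,
) -> BTreeMap<String, u64> {
    current_cves
        .iter()
        .map(|cve| {
            // Known CVE: keep its original timestamp. New CVE: start the clock now.
            let first_seen = previous.get(*cve).copied().unwrap_or(now);
            (cve.to_string(), first_seen)
        })
        .collect()
}
```

CVEs that no longer appear in the new scan drop out of the result, so a fixed CVE stops accruing exposure time.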

Scan Lambda

The scan-lambda subcommand runs as an AWS Lambda function triggered by SQS messages. Two paths feed the ScanQueue:

  1. Real-time (S3 event notification): When the Index Lambda uploads a new Syft JSON to S3, an s3:ObjectCreated event (filtered on grype/ prefix) is sent directly to SQS
  2. Hourly (Enqueue Lambda): Fans out all non-superseded digests when the Grype vulnerability database has been updated, sending {"bucket": "...", "key": "grype/{image}/{hex}.json.gz"} messages

The Lambda accepts both S3 event notification JSON and direct {"bucket", "key"} messages. For each message it:

  1. Downloads and decompresses the native Syft JSON from S3
  2. Runs Grype on the raw bytes
  3. Looks up all canonical tags matching the digest from the TAGS item
  4. Writes VULNERABILITIES#{canonical} for each matching tag (with first_seen tracking) and RELEASE_VULNERABILITIES#{hex} once per digest

Processing is per-digest: a single Syft JSON serves all canonicals sharing that digest, avoiding redundant Grype invocations.
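
A minimal sketch of that fan-out, assuming the TAGS item has been reduced to (canonical tag, digest hex) pairs. The function name and the pair representation are hypothetical; only the sort-key prefixes come from the description above.

```rust
/// Sort keys written after scanning one digest.
/// `tags` holds (canonical tag, digest hex) pairs taken from the TAGS item.
fn vulnerability_sort_keys(tags: &[(&str, &str)], digest_hex: &str) -> Vec<String> {
    let mut keys: Vec<String> = tags
        .iter()
        // Keep only the canonical tags that point at the scanned digest.
        .filter(|(_, hex)| *hex == digest_hex)
        .map(|(canonical, _)| format!("VULNERABILITIES#{canonical}"))
        .collect();
    // One per-digest record, regardless of how many tags matched.
    keys.push(format!("RELEASE_VULNERABILITIES#{digest_hex}"));
    keys
}
```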

The Lambda loads the Grype DB on cold start (cached in /tmp for warm invocations), processes up to 10 messages per batch, and reports partial batch failures so only failed records return to the queue.
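
For reference, Lambda's partial batch response for SQS is a JSON object with a batchItemFailures list of itemIdentifier entries. The sketch below builds that body by hand to stay dependency-free; the function name is hypothetical, and real code would more likely use a JSON library or the typed response from a Lambda events crate.

```rust
/// Build the SQS partial-batch response body from per-message results.
/// `results` pairs each SQS message ID with the outcome of scanning it.
fn batch_failure_response(results: &[(&str, Result<(), String>)]) -> String {
    let failures: Vec<String> = results
        .iter()
        .filter(|(_, outcome)| outcome.is_err())
        // Assumption: message IDs are UUID-style and need no JSON escaping.
        .map(|(id, _)| format!("{{\"itemIdentifier\":\"{id}\"}}"))
        .collect();
    format!("{{\"batchItemFailures\":[{}]}}", failures.join(","))
}
```

An empty batchItemFailures list tells Lambda the whole batch succeeded, so only the listed message IDs become visible on the queue again.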

Environment Variable  Description
TABLE_NAME            DynamoDB table name
DISTRO                rawhide or hummingbird
SENTRY_DSN            Optional Sentry DSN for error tracking

Enqueue Lambda

The enqueue-lambda subcommand runs hourly via EventBridge Schedule. On each invocation it performs a lightweight HTTP GET of the public Grype DB listing (latest.json, ~200 bytes) and compares the built timestamp against the stored CATALOG/LAST_FULL_SCAN_DB item in DynamoDB. If the DB hasn’t changed, it returns early (~23 of 24 hourly invocations short-circuit). When an update is detected, it reads the image directory and all TAGS items, deduplicates by digest, and sends one SQS message per unique digest using SendMessageBatch. Messages use the format {"bucket": "...", "key": "grype/{image}/{hex}.json.gz"}.
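
The gate and the fan-out can be sketched as two small functions. Both names are hypothetical, and the gate here treats any difference in the built timestamp as an update. The batch size of 10 is the per-call entry limit of SQS SendMessageBatch.

```rust
use std::collections::BTreeSet;

/// True when the published Grype DB differs from the one recorded
/// for the last full scan (or when no scan has been recorded yet).
fn db_changed(latest_built: &str, last_scanned_built: Option<&str>) -> bool {
    last_scanned_built != Some(latest_built)
}

/// Deduplicate digests and group them into batches of at most 10 entries,
/// one batch per SendMessageBatch call.
fn enqueue_batches(digests_hex: &[&str]) -> Vec<Vec<String>> {
    let unique: Vec<String> = digests_hex
        .iter()
        .copied()
        .collect::<BTreeSet<&str>>()
        .into_iter()
        .map(String::from)
        .collect();
    unique.chunks(10).map(|chunk| chunk.to_vec()).collect()
}
```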

Environment Variable  Description
TABLE_NAME            DynamoDB table name
DISTRO                rawhide or hummingbird
SCAN_QUEUE_URL        SQS queue URL for scan messages
SCAN_DATA_BUCKET      S3 bucket for scanner data
GRYPE_DB_LATEST_URL   Grype DB listing URL (has sensible default)
SENTRY_DSN            Optional Sentry DSN for error tracking

Metrics

The metrics subcommand performs a one-shot read of all non-superseded vulnerability data from DynamoDB, outputs structured CVE exposure logs, pushes CVE exposure metrics to CloudWatch, and writes the catalog-wide vulnerability aggregate to DynamoDB (PK=CATALOG, SK=VULNERABILITIES). With --dry-run, it skips the CloudWatch push and the DynamoDB aggregate write and only prints the structured logs (useful for local inspection).

# Dry run (print structured logs to stdout, no CloudWatch push)
make container-catalog/metrics ARGS="--distro hummingbird --table-name <table> --dry-run"

# Push metrics to CloudWatch and print structured logs
make container-catalog/metrics ARGS="--distro hummingbird --table-name <table>"

Metrics Lambda

The metrics-lambda subcommand runs as a DynamoDB Stream-triggered Lambda that emits CVE exposure duration metrics to CloudWatch and writes a catalog-wide vulnerability aggregate to DynamoDB (served by GET /v1/vulnerabilities). The CloudWatch metrics feed the SLO dashboard and alarm defined in the error-budgets stack.

How It Works

The Lambda is triggered by DynamoDB Stream events filtered on VULNERABILITIES# and TAGS changes, with a 60-second batching window and reserved concurrency of 1 (single instance). It maintains an in-memory active CVE table across warm invocations:

  • Cold start: Reads CATALOG/DIRECTORY, all TAGS, and all non-superseded VULNERABILITIES# items from DynamoDB to build the full table (~2-3s at 10000 tags)
  • Warm invocations: Incrementally updates the table from stream event keys (~50-100 GetItem calls per batch)
  • After each invocation: Recomputes and emits all metrics and structured logs

CloudWatch Metrics

Metric               Type          Dimensions
CveExposureDuration  Distribution  [Distro], [Distro, Severity]
ActiveCveCount       Count         [Distro, Severity]

CveExposureDuration values are in hours, computed as now - first_seen for each active CVE on each non-superseded canonical tag. [Distro] is used by the SLO alarm; [Distro, Severity] is used by the dashboard.
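
The computation is simple enough to sketch directly, assuming both timestamps are available as epoch seconds. The function names are hypothetical.

```rust
/// Exposure duration in hours for one active CVE: `now - first_seen`.
fn exposure_hours(now: u64, first_seen: u64) -> f64 {
    // saturating_sub keeps clock skew from producing a negative duration.
    now.saturating_sub(first_seen) as f64 / 3600.0
}

/// One sample per active CVE on a non-superseded canonical tag.
fn exposure_samples(now: u64, first_seen: &[u64]) -> Vec<f64> {
    first_seen.iter().map(|&t| exposure_hours(now, t)).collect()
}
```

Each sample would be emitted as one value of the CveExposureDuration distribution.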

Structured Logs

Each invocation emits one JSON log line per active CVE to stdout (captured by CloudWatch Logs). Example query for all active CVEs:

filter message = "active_cve"
| fields cve, severity, exposure_hours, repository, stream, variant, component
| sort exposure_hours desc

Environment Variable  Description
TABLE_NAME            DynamoDB table name
DISTRO                rawhide or hummingbird
CLOUDWATCH_NAMESPACE  CloudWatch namespace for metrics
SENTRY_DSN            Optional Sentry DSN

Deployment

make container-catalog/build
make container-catalog/deploy

Configuration

catalog sync

Argument            Description
--distro            rawhide or hummingbird
--table-name        DynamoDB table name
--dry-run           Print items to stdout without writing
--purge             Delete all items before writing
--cache-dir         Cache directory (auto-detected)
--image             Sync only a specific repo
--legacy-discovery  Use GitLab-based repo discovery

catalog index

Argument      Description
--distro      rawhide or hummingbird
--table-name  DynamoDB table name (for image/tag discovery)
--bucket      S3 bucket for scanner data
--dry-run     Print what would be uploaded without uploading
--image       Index only a specific image
--parallel    Number of concurrent operations (default: 2)

catalog scan

Argument      Description
--distro      rawhide or hummingbird
--table-name  DynamoDB table name (required)
--bucket      S3 bucket with native Syft JSON (uses S3 scan path)
--scope       non-superseded, tags (default), or all
--dry-run     Print items without writing
--purge       Purge vuln data before writing (implies --scope all)
--cache-dir   Cache directory (auto-detected)
--parallel    Number of concurrent scans (default: 4)
--image       Scan only a specific image
--tag         Scan only a specific tag (requires --image)

SAM Parameters

Parameter          Description
Distro             rawhide or hummingbird
CacheEnabled       Enable CloudFront caching
CatalogDomainName  Catalog web UI domain
ApiDomainName      API domain
HostedZoneId       Route53 hosted zone
CorsOrigins        Comma-separated CORS origins (default *)
SnsTopicArn        SNS topic ARN for Release events (enables sync Lambda)

Frontend

The catalog web UI is a Lit 3 SPA (Web Components) with Tailwind CSS, built per-distro with Vite. Source is in container-catalog/frontend/.

Only make and podman are required (no local Node.js needed). Defaults from .envrc.defaults are applied automatically.

# Install dependencies
make container-catalog/frontend/setup

# Development server at http://localhost:5173
make container-catalog/frontend/dev

# Production build
make container-catalog/frontend/build

Host variants (*-host) run without podman (for CI or local Node.js).

Frontend Build Variables

Variable                      Description
VITE_API_URL                  API base URL for the distro
VITE_DISTRO                   rawhide or hummingbird
VITE_DISTRO_LABEL             Display label for current distro
VITE_OTHER_CATALOG_URL        URL of the other distro’s catalog (optional, hides link if unset)
VITE_OTHER_DISTRO_LABEL       Display label for other distro (optional)
VITE_VULNERABILITIES_ENABLED  Show vulnerabilities tab

Development

# Backend
cargo test                    # Run tests
cargo clippy --all-targets   # Lint
cargo fmt                     # Format

# Frontend (host variants, requires local Node.js)
cd container-catalog/frontend
npm run typecheck             # Type check
npm run build                 # Production build

License

This project is licensed under the GNU General Public License v3.0 or later - see the LICENSE file for details.