Container Catalog

Per-distro serverless catalog API for Hummingbird container images.

Features

Image Directory - Browse all container images with metadata
Tag Browser - View tags, digests, architectures per image
Specifications - Per-architecture OCI config details (env, cmd, user, labels)
SBOM - Per-architecture package lists from SPDX attestations
Vulnerabilities - CVE scanning results via Grype
CVE Metrics - CloudWatch metrics for CVE exposure duration (see Error Budgets)
Provenance - Source traceability from SLSA attestations
Release History - Timeline of past builds with drill-down
OpenAPI Spec - Machine-readable API documentation at /v1/openapi.json
Swagger UI - Interactive API explorer at /v1/docs/

Architecture

Rust serverless stack deployed as two isolated per-distro stacks:

API Lambda - DynamoDB pass-through (~10ms response)
Sync Lambda - Incremental DynamoDB sync from SNS Release events (~22 registry calls per release)
Index Lambda (index-lambda) - SNS-triggered Syft JSON generation, uploads native Syft JSON to S3 for scanner consumption
Scan Lambda (scan-lambda) - SQS-triggered CVE scanning via Grype from native Syft JSON in S3, per-digest processing with first_seen tracking and partial batch failure reporting
Enqueue Lambda (enqueue-lambda) - Hourly EventBridge-triggered fan-out that checks Grype DB for updates and enqueues non-superseded digests to SQS
Metrics Lambda (metrics-lambda) - DynamoDB Stream-triggered CVE exposure metrics and structured logs to CloudWatch
DynamoDB - Pre-computed JSON items (single-table PK/SK design, Streams: KEYS_ONLY)
S3 ScanDataBucket - Native Syft JSON storage for scanner data (gzipped, keyed by grype/{image}/{digest_hex}.json.gz)
SQS ScanQueue - Central queue for scan tasks (fed by S3 events and Enqueue Lambda)
CloudFront - CDN with per-endpoint cache TTLs
CloudWatch - CVE exposure metrics and structured logs
catalog sync - Full DynamoDB population from GitLab + Quay.io OCI v2 registry
catalog scan - CLI CVE scanning via Grype; reads native Syft JSON from S3 (with --bucket) or builds synthetic JSON from DynamoDB SBOMs (fallback)
catalog index - CLI Syft JSON generation for backfilling S3
Catalog SPA - Lit 3 web app served from S3 via CloudFront

Scanner Architecture

CVE vulnerability data must match direct grype <image> scans exactly for all images. Synthetic Syft JSON (built from stored SBOM packages) cannot reliably reproduce the native output because Grype relies on metadata fields, artifact relationships, and deduplication logic that are lost in the SPDX-to-API roundtrip. Additionally, raw SPDX from the build system contains package bloat (e.g. Go sub-modules, empty-version entries) that native Syft filters out when scanning a compiled binary directly.

To guarantee exact-match results, the scanner chain stores native Syft JSON in S3 rather than DynamoDB. A single image’s Syft JSON is typically 1-10 MB (100 KB - 1 MB gzipped), which can exceed DynamoDB’s 400 KB item limit even when compressed. S3 accepts objects far larger than this (up to 5 TB per object), avoids provisioned throughput costs for large blobs, and supports event notifications for triggering downstream scanners. An independent Index Lambda runs syft <image> on each new release and uploads the full output to s3://{bucket}/grype/{image}/{hex}.json.gz. The S3 upload sends an event notification to SQS, which triggers the Scan Lambda to read the native JSON, run Grype, and write per-canonical vulnerability data to DynamoDB.
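As an illustration of the storage layout described above, here is a minimal Python sketch (not the project's Rust code). The helper names scan_data_key and gzip_syft_json are hypothetical, and the sketch assumes {hex} is the hex portion of a sha256:<hex> digest.

```python
import gzip
import json


def scan_data_key(image: str, digest: str) -> str:
    """Build the S3 object key for one image digest.

    Assumption: {hex} in the key is the part after the "sha256:" prefix.
    """
    algo, sep, hex_part = digest.partition(":")
    if sep != ":" or algo != "sha256" or not hex_part:
        raise ValueError(f"expected sha256:<hex> digest, got {digest!r}")
    return f"grype/{image}/{hex_part}.json.gz"


def gzip_syft_json(document: dict) -> bytes:
    """Serialize a Syft JSON document compactly and gzip it for upload."""
    raw = json.dumps(document, separators=(",", ":")).encode("utf-8")
    return gzip.compress(raw)
```

Deriving the key from the image name and digest alone means any component that knows the digest can locate the object without a lookup table.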

The scanner chain is intentionally independent of the catalog data chain: the same SNS Release event triggers both the Sync Lambda (catalog data to DynamoDB) and the Index Lambda (Syft JSON to S3), with no ordering dependency between them.

CVE Data Flow

graph TD
    Konflux([Konflux Release]) -->|SNS| Sync[SyncFunction]
    Konflux -->|SNS| Index[IndexFunction]
    Sync -->|update| Tags[(Tags)]
    Tags -->|"stream via ESM"| Metrics
    Tags -->|read| Scan[ScanFunction]
    Index -->|upload| S3Syft[(S3 Syft JSON)]
    S3Syft -->|"S3 event via SQS"| Scan
    Hourly([Schedule]) -->|hourly| Enqueue[EnqueueFunction]
    Enqueue --> Gate[check Grype DB]
    Gate -->|"fan out via SQS"| Scan
    Scan -->|update| Vulns[(Image Vulnerabilities)]
    Vulns -->|"stream via ESM"| Metrics[MetricsFunction]
    Metrics -->|"metrics + logs"| CloudWatch[CloudWatch]
    Metrics -->|"write aggregate"| CatalogVulns[(Catalog Vulnerabilities)]

    classDef lambda fill:#d4e6f1,stroke:#2980b9
    classDef dynamo fill:#fdebd0,stroke:#e67e22
    classDef trigger fill:#d5f5e3,stroke:#27ae60
    classDef aws fill:#e8daf1,stroke:#8e44ad
    classDef gate fill:#e5e7e9,stroke:#7f8c8d
    class Sync,Scan,Enqueue,Metrics,Index lambda
    class Vulns,Tags,CatalogVulns,S3Syft dynamo
    class Konflux,Hourly trigger
    class CloudWatch aws
    class Gate gate

Prerequisites

Rust 1.75+ (for building backend)
Node.js 22+ (for building frontend)
AWS credentials (for DynamoDB access and deployment)
SAM CLI (for deployment)

API Endpoints

Endpoint	Description
`GET /v1/images`	Image directory
`GET /v1/images/{name}`	Image overview (README)
`GET /v1/images/{name}/tags`	Tags for an image
`GET /v1/images/{name}/details/{canonical}`	Per-canonical details
`GET /v1/images/{name}/sbom/{canonical}`	Package list
`GET /v1/images/{name}/vulnerabilities/{canonical}`	Vulnerability scan
`GET /v1/images/{name}/history/{stream}/{variant}`	Release timeline
`GET /v1/images/{name}/releases/details/{digest}`	Release details (immutable)
`GET /v1/images/{name}/releases/sbom/{digest}`	Release SBOM (immutable)
`GET /v1/images/{name}/releases/vulnerabilities/{digest}`	Release vulnerabilities
`GET /v1/vulnerabilities`	Catalog-wide CVE aggregate
`GET /v1/openapi.json`	OpenAPI 3.1 specification
`GET /v1/docs/`	Interactive Swagger UI

Timestamp Fields

The oldest_created field on ImageSummary, Tag, and HistorySummary is the earliest OCI created timestamp across all architectures in the release. All architectures were built at or after this date, making it useful for conservative staleness detection.
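A minimal Python sketch of how such a value could be derived, assuming each architecture's created value is an RFC 3339 string without sub-microsecond fractions. The function names are hypothetical and not taken from the project's code.

```python
from datetime import datetime


def parse_created(timestamp: str) -> datetime:
    """Parse an RFC 3339 timestamp; a trailing "Z" is mapped to +00:00."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def oldest_created(created_by_arch: dict[str, str]) -> str:
    """Return the earliest created timestamp across all architectures."""
    if not created_by_arch:
        raise ValueError("at least one architecture is required")
    return min(created_by_arch.values(), key=parse_created)
```

For example, with amd64 built at 10:00 and arm64 built at 09:30 on the same day, the function returns the arm64 timestamp.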

The specifications endpoint returns per-architecture data keyed by architecture name. Each architecture’s created field is the direct OCI config root timestamp for that specific architecture.

Usage

All tools are built as a single catalog binary with subcommands (api, sync, sync-lambda, scan, scan-lambda, index, index-lambda, enqueue-lambda, metrics, metrics-lambda). The binary is built in the Rust container and CLI subcommands are run in the gitlab-ci container (which provides grype, syft, and other tools). Only make and podman are required.

Sync

# Dry run (print items to stdout)
make container-catalog/sync ARGS="--distro rawhide --dry-run"

# Populate DynamoDB
make container-catalog/sync ARGS="--distro rawhide --table-name <table>"

Sync Lambda

The sync-lambda subcommand runs as an AWS Lambda function triggered by SNS Release events from kubernetes-event-forwarder. It incrementally syncs a single image release to DynamoDB (~22 registry API calls per release vs ~104 for a full sync).

The Lambda:

Decodes gzip+base64 SNS messages
Filters for Succeeded releases targeting the configured Quay.io namespace
Fetches OCI manifest data for the new digest
Writes per-digest items (DETAILS, SBOM, RELEASE_DETAILS, RELEASE_SBOM)
Merges into aggregate items (TAGS, HISTORY, OVERVIEW, DIRECTORY)
Fetches README from GitLab for OVERVIEW content (uses README.redhat.md for hummingbird, README.md for rawhide)

Registry fetch errors (manifest, SBOM, attestation) propagate as hard failures so the Lambda retries automatically (up to 2 retries with backoff) before sending to the DLQ. GitLab README failures are non-fatal: the existing README is preserved if the fetch fails.
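The first two steps of the Lambda (decode, then filter) might look like the following Python sketch. The Records[].Sns.Message path is the standard SNS-to-Lambda event shape, but the fields inside the decoded payload ("status", "repository") are hypothetical placeholders, since the real Release event schema is not described here.

```python
import base64
import gzip
import json


def decode_release_message(message: str) -> dict:
    """Reverse the gzip+base64 encoding: base64-decode, gunzip, parse JSON."""
    compressed = base64.b64decode(message)
    return json.loads(gzip.decompress(compressed))


def is_relevant(release: dict, namespace: str) -> bool:
    """Keep only Succeeded releases for the configured Quay.io namespace.

    HYPOTHETICAL field names: "status" and "repository" are assumptions.
    """
    repository = release.get("repository", "")
    return (
        release.get("status") == "Succeeded"
        and repository.startswith(f"quay.io/{namespace}/")
    )


def relevant_releases(event: dict, namespace: str) -> list[dict]:
    """Decode every SNS record in a Lambda event and filter it."""
    releases = []
    for record in event.get("Records", []):
        release = decode_release_message(record["Sns"]["Message"])
        if is_relevant(release, namespace):
            releases.append(release)
    return releases
```

Decoding errors are deliberately not caught in this sketch, so a malformed message would surface as a failure rather than being silently dropped.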

Environment Variable	Description
`TABLE_NAME`	DynamoDB table name
`DISTRO`	`rawhide` or `hummingbird`
`SENTRY_DSN`	Optional Sentry DSN for error tracking

Index

The index subcommand generates native Syft JSON for all non-superseded image digests and uploads them to S3 for scanner consumption. It reads the image directory and TAGS from DynamoDB, checks S3 for existing objects (dedup via HeadObject), runs syft <image> --platform linux/amd64, gzips the output, and uploads to s3://{bucket}/grype/{image}/{hex}.json.gz.

# Dry run (show what would be uploaded)
make container-catalog/index ARGS="--distro hummingbird --table-name <table> --bucket <bucket> --dry-run"

# Backfill S3 for all images
make container-catalog/index ARGS="--distro hummingbird --table-name <table> --bucket <bucket>"

# Backfill a single image
make container-catalog/index ARGS="--distro hummingbird --table-name <table> --bucket <bucket> --image caddy"

Index Lambda

The index-lambda subcommand runs as an AWS Lambda function triggered by the same SNS Release events as the Sync Lambda. It operates independently of the catalog chain (no DynamoDB access) and generates native Syft JSON for scanner consumption.

The Lambda:

Decodes gzip+base64 SNS messages (shared with Sync Lambda via release_event module)
Filters for Succeeded releases targeting the configured Quay.io namespace
Checks S3 for existing objects (HeadObject dedup by digest)
Runs syft <image>@<digest> --platform linux/amd64 --output syft-json
Gzips and uploads to s3://{bucket}/grype/{image}/{hex}.json.gz

Failures propagate for Lambda retry (up to 2 retries) before sending to the IndexDLQ. The catalog index CLI backfills any gaps.
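The check-then-upload flow can be sketched in Python as below. This is an illustration under stated assumptions, not the project's implementation: the exists, run_syft, and upload callables are hypothetical stand-ins for the S3 HeadObject check, the syft invocation, and the S3 upload, and the image reference is simplified to <image>@<digest>.

```python
import gzip
from typing import Callable


def index_digest(
    image: str,
    digest: str,
    exists: Callable[[str], bool],
    run_syft: Callable[[str], bytes],
    upload: Callable[[str, bytes], None],
) -> str:
    """Generate and upload Syft JSON for one digest unless already present."""
    # Assumption: {hex} is the digest without its "sha256:" prefix.
    hex_part = digest.split(":", 1)[-1]
    key = f"grype/{image}/{hex_part}.json.gz"
    if exists(key):
        # Dedup: an object for this digest was uploaded earlier.
        return "skipped"
    syft_json = run_syft(f"{image}@{digest}")
    upload(key, gzip.compress(syft_json))
    return "uploaded"
```

Because the key is derived from the digest, repeating the operation after a retry or a backfill run is harmless: the second attempt finds the object and skips the expensive syft run.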

Environment Variable	Description
`SCAN_DATA_BUCKET`	S3 bucket for scanner data
`DISTRO`	`rawhide` or `hummingbird`
`SENTRY_DSN`	Optional Sentry DSN for error tracking

Scan

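The scan subcommand reads image listings and tags from DynamoDB (no registry access needed) and runs Grype for each image. With --bucket it scans the native Syft JSON stored in S3; without it, it falls back to synthetic Syft JSON built from the SBOM packages stored in DynamoDB, which may not match direct grype <image> results exactly (see Scanner Architecture). Results include a first_seen timestamp per CVE, tracked at the group+variant+stream level and carried across releases for SLI computation.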
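One way to carry first_seen across releases is sketched below in Python. The function name and the data shapes (a mapping from CVE ID to timestamp) are assumptions for illustration; the sketch also assumes a CVE that disappears from the current scan is dropped from the result.

```python
def carry_first_seen(
    previous: dict[str, str],
    current_cves: list[str],
    now: str,
) -> dict[str, str]:
    """Assign first_seen per CVE for a new scan of the same group+variant+stream.

    CVEs already known keep their original timestamp; new CVEs get `now`.
    """
    return {cve: previous.get(cve, now) for cve in current_cves}
```

Keeping the original timestamp is what makes exposure duration meaningful across rebuilds: a new release that still contains an old CVE does not reset its clock.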

# Dry run (print items to stdout)
make container-catalog/scan ARGS="--distro hummingbird --table-name <table> --dry-run"

# Scan and write to DynamoDB (purge stale vuln data first; implies --scope all)
make container-catalog/scan ARGS="--distro hummingbird --table-name <table> --purge"

# Scan only non-superseded (current) tags
make container-catalog/scan ARGS="--distro hummingbird --table-name <table> --scope non-superseded"

# Scan all releases including historic (tagless) releases
make container-catalog/scan ARGS="--distro hummingbird --table-name <table> --scope all"

# Scan a single image
make container-catalog/scan ARGS="--distro hummingbird --table-name <table> --image caddy --dry-run"

Scan Lambda

The scan-lambda subcommand runs as an AWS Lambda function triggered by SQS messages. Two paths feed the ScanQueue:

Real-time (S3 event notification): When the Index Lambda uploads a new Syft JSON to S3, an s3:ObjectCreated event (filtered on grype/ prefix) is sent directly to SQS
Hourly (Enqueue Lambda): Fans out all non-superseded digests when the Grype vulnerability database has been updated, sending {"bucket": "...", "key": "grype/{image}/{hex}.json.gz"} messages

The Lambda accepts both S3 event notification JSON and direct {"bucket", "key"} messages. For each message it:

Downloads and decompresses the native Syft JSON from S3
Runs Grype on the raw bytes
Looks up all canonical tags matching the digest from the TAGS item
Writes VULNERABILITIES#{canonical} for each matching tag (with first_seen tracking) and RELEASE_VULNERABILITIES#{hex} once per digest

Processing is per-digest: a single Syft JSON serves all canonicals sharing that digest, avoiding redundant Grype invocations.
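Accepting both message shapes could be handled as in this Python sketch. The S3 event structure (Records[].s3.bucket.name and Records[].s3.object.key, with URL-encoded keys) follows the documented S3 notification format; the function name scan_targets is hypothetical.

```python
import json
from urllib.parse import unquote_plus


def scan_targets(body: str) -> list[tuple[str, str]]:
    """Extract (bucket, key) pairs from an SQS message body.

    Supports S3 event notifications and direct {"bucket", "key"} messages.
    Anything else (for example an S3 test event) yields an empty list.
    """
    message = json.loads(body)
    if "Records" in message:
        return [
            # Object keys in S3 notifications are URL-encoded.
            (record["s3"]["bucket"]["name"], unquote_plus(record["s3"]["object"]["key"]))
            for record in message["Records"]
            if "s3" in record
        ]
    if "bucket" in message and "key" in message:
        return [(message["bucket"], message["key"])]
    return []
```

Normalizing both inputs to the same (bucket, key) pair early keeps the rest of the scan path identical for the real-time and hourly triggers.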

The Lambda loads the Grype DB on cold start (cached in /tmp for warm invocations), processes up to 10 messages per batch, and reports partial batch failures so only failed records return to the queue.
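Partial batch failure reporting follows the standard Lambda SQS response shape, sketched here in Python. It assumes the event source mapping has ReportBatchItemFailures enabled; the process callable is a hypothetical stand-in for the per-message scan.

```python
from typing import Callable


def handle_batch(event: dict, process: Callable[[str], None]) -> dict:
    """Process each SQS record and report only the failed ones.

    Records listed in batchItemFailures return to the queue; the rest
    are deleted by Lambda once the handler returns.
    """
    failures = []
    for record in event.get("Records", []):
        try:
            process(record["body"])
        except Exception:
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Without this response shape, one failing message would cause the whole batch of up to 10 to be retried, repeating Grype runs that had already succeeded.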

Environment Variable	Description
`TABLE_NAME`	DynamoDB table name
`DISTRO`	`rawhide` or `hummingbird`
`SENTRY_DSN`	Optional Sentry DSN for error tracking

Enqueue Lambda

The enqueue-lambda subcommand runs hourly via EventBridge Schedule. On each invocation it performs a lightweight HTTP GET of the public Grype DB listing (latest.json, ~200 bytes) and compares the built timestamp against the stored CATALOG/LAST_FULL_SCAN_DB item in DynamoDB. If the DB hasn’t changed, it returns early (~23 of 24 hourly invocations short-circuit). When an update is detected, it reads the image directory and all TAGS items, deduplicates by digest, and sends one SQS message per unique digest using SendMessageBatch. Messages use the format {"bucket": "...", "key": "grype/{image}/{hex}.json.gz"}.
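The gate and fan-out logic can be sketched in Python as follows. The function names are hypothetical; the sketch assumes the built timestamps are compared for plain inequality, that deduplication is per image and digest, and it relies on the SQS limit of 10 entries per SendMessageBatch call.

```python
from typing import Iterator, Optional


def db_changed(latest_built: str, last_scanned_built: Optional[str]) -> bool:
    """True when no full scan is recorded yet or the Grype DB build differs."""
    return last_scanned_built is None or latest_built != last_scanned_built


def scan_messages(digests_by_image: dict[str, list[str]], bucket: str) -> list[dict]:
    """Build one message per unique (image, digest) pair."""
    seen = set()
    messages = []
    for image, digests in digests_by_image.items():
        for digest in digests:
            hex_part = digest.split(":", 1)[-1]
            if (image, hex_part) in seen:
                continue  # several tags can point at the same digest
            seen.add((image, hex_part))
            messages.append({"bucket": bucket, "key": f"grype/{image}/{hex_part}.json.gz"})
    return messages


def batches(items: list, size: int = 10) -> Iterator[list]:
    """Split messages into chunks that fit one SendMessageBatch call."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Checking the small latest.json listing first keeps the common case cheap: the DynamoDB reads and SQS writes only happen when a rescan can actually produce different results.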

Environment Variable	Description
`TABLE_NAME`	DynamoDB table name
`DISTRO`	`rawhide` or `hummingbird`
`SCAN_QUEUE_URL`	SQS queue URL for scan messages
`SCAN_DATA_BUCKET`	S3 bucket for scanner data
`GRYPE_DB_LATEST_URL`	Grype DB listing URL (has sensible default)
`SENTRY_DSN`	Optional Sentry DSN for error tracking

Metrics

The metrics subcommand performs a one-shot read of all non-superseded vulnerability data from DynamoDB, pushes CVE exposure metrics to CloudWatch, outputs structured CVE exposure logs, and writes the catalog-wide vulnerability aggregate to DynamoDB (PK=CATALOG, SK=VULNERABILITIES). With --dry-run, it only prints the structured logs, skipping the CloudWatch metrics and the DynamoDB aggregate write (useful for local inspection).

# Dry run (print structured logs to stdout, no CloudWatch push)
make container-catalog/metrics ARGS="--distro hummingbird --table-name <table> --dry-run"

# Push metrics to CloudWatch and print structured logs
make container-catalog/metrics ARGS="--distro hummingbird --table-name <table>"

Metrics Lambda

The metrics-lambda subcommand runs as a DynamoDB Stream-triggered Lambda that emits CVE exposure duration metrics to CloudWatch and writes a catalog-wide vulnerability aggregate to DynamoDB (served by GET /v1/vulnerabilities). The CloudWatch metrics feed the SLO dashboard and alarm defined in the error-budgets stack.

How It Works

The Lambda is triggered by DynamoDB Stream events filtered on VULNERABILITIES# and TAGS changes, with a 60-second batching window and reserved concurrency of 1 (single instance). It maintains an in-memory active CVE table across warm invocations:

Cold start: Reads CATALOG/DIRECTORY, all TAGS, and all non-superseded VULNERABILITIES# items from DynamoDB to build the full table (~2-3 s at 10,000 tags)
Warm invocations: Incrementally updates the table from stream event keys (~50-100 GetItem calls per batch)
After each invocation: Recomputes and emits all metrics and structured logs

CloudWatch Metrics

Metric	Type	Dimensions
`CveExposureDuration`	Distribution	`[Distro]`, `[Distro, Severity]`
`ActiveCveCount`	Count	`[Distro, Severity]`

CveExposureDuration values are in hours, computed as now - first_seen for each active CVE on each non-superseded canonical tag. [Distro] is used by the SLO alarm; [Distro, Severity] is used by the dashboard.
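The duration calculation itself is simple arithmetic, shown here as a Python sketch. It assumes first_seen is stored as an RFC 3339 UTC timestamp; the function name is hypothetical.

```python
from datetime import datetime, timezone


def exposure_hours(first_seen: str, now: datetime) -> float:
    """Return now - first_seen in hours for one active CVE."""
    seen = datetime.fromisoformat(first_seen.replace("Z", "+00:00"))
    return (now - seen).total_seconds() / 3600.0


# A CVE first seen at midnight on 1 January, evaluated at noon on 2 January,
# has been exposed for 36 hours.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
print(exposure_hours("2024-01-01T00:00:00Z", now))  # 36.0
```

Because the value is recomputed from first_seen on every invocation, the metric keeps growing for as long as the CVE stays active on a non-superseded tag.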

Structured Logs

Each invocation emits one JSON log line per active CVE to stdout (captured by CloudWatch Logs). Example query for all active CVEs:

filter message = "active_cve"
| fields cve, severity, exposure_hours, repository, stream, variant, component
| sort exposure_hours desc

Environment Variable	Description
`TABLE_NAME`	DynamoDB table name
`DISTRO`	`rawhide` or `hummingbird`
`CLOUDWATCH_NAMESPACE`	CloudWatch namespace for metrics
`SENTRY_DSN`	Optional Sentry DSN

Deployment

make container-catalog/build
make container-catalog/deploy

Configuration

catalog sync

Argument	Description
`--distro`	`rawhide` or `hummingbird`
`--table-name`	DynamoDB table name
`--dry-run`	Print items to stdout without writing
`--purge`	Delete all items before writing
`--cache-dir`	Cache directory (auto-detected)
`--image`	Sync only a specific repo
`--legacy-discovery`	Use GitLab-based repo discovery

catalog index

Argument	Description
`--distro`	`rawhide` or `hummingbird`
`--table-name`	DynamoDB table name (for image/tag discovery)
`--bucket`	S3 bucket for scanner data
`--dry-run`	Print what would be uploaded without uploading
`--image`	Index only a specific image
`--parallel`	Number of concurrent operations (default: 2)

catalog scan

Argument	Description
`--distro`	`rawhide` or `hummingbird`
`--table-name`	DynamoDB table name (required)
`--bucket`	S3 bucket with native Syft JSON (uses S3 scan path)
`--scope`	`non-superseded`, `tags` (default), or `all`
`--dry-run`	Print items without writing
`--purge`	Purge vuln data before writing (implies `--scope all`)
`--cache-dir`	Cache directory (auto-detected)
`--parallel`	Number of concurrent scans (default: 4)
`--image`	Scan only a specific image
`--tag`	Scan only a specific tag (requires `--image`)

SAM Parameters

Parameter	Description
`Distro`	`rawhide` or `hummingbird`
`CacheEnabled`	Enable CloudFront caching
`CatalogDomainName`	Catalog web UI domain
`ApiDomainName`	API domain
`HostedZoneId`	Route53 hosted zone
`CorsOrigins`	Comma-separated CORS origins (default `*`)
`SnsTopicArn`	SNS topic ARN for Release events (enables the Sync Lambda)

Frontend

The catalog web UI is a Lit 3 SPA (Web Components) with Tailwind CSS, built per-distro with Vite. Source is in container-catalog/frontend/.

Only make and podman are required (no local Node.js needed). Defaults from .envrc.defaults are applied automatically.

# Install dependencies
make container-catalog/frontend/setup

# Development server at http://localhost:5173
make container-catalog/frontend/dev

# Production build
make container-catalog/frontend/build

Host variants (*-host) run without podman (for CI or local Node.js).

Frontend Build Variables

Variable	Description
`VITE_API_URL`	API base URL for the distro
`VITE_DISTRO`	`rawhide` or `hummingbird`
`VITE_DISTRO_LABEL`	Display label for current distro
`VITE_OTHER_CATALOG_URL`	URL of the other distro’s catalog (optional, hides link if unset)
`VITE_OTHER_DISTRO_LABEL`	Display label for other distro (optional)
`VITE_VULNERABILITIES_ENABLED`	Show vulnerabilities tab

Development

# Backend
cargo test                    # Run tests
cargo clippy --all-targets   # Lint
cargo fmt                     # Format

# Frontend (host variants, requires local Node.js)
cd container-catalog/frontend
npm run typecheck             # Type check
npm run build                 # Production build

License

This project is licensed under the GNU General Public License v3.0 or later - see the LICENSE file for details.