Container Catalog
Per-distro serverless catalog API for Hummingbird container images.
Features
- Image Directory - Browse all container images with metadata
- Tag Browser - View tags, digests, architectures per image
- Specifications - Per-architecture OCI config details (env, cmd, user, labels)
- SBOM - Per-architecture package lists from SPDX attestations
- Vulnerabilities - CVE scanning results via Grype
- CVE Metrics - CloudWatch metrics for CVE exposure duration (see Error Budgets)
- Provenance - Source traceability from SLSA attestations
- Release History - Timeline of past builds with drill-down
- OpenAPI Spec - Machine-readable API documentation at /v1/openapi.json
- Swagger UI - Interactive API explorer at /v1/docs/
Architecture
Rust serverless stack deployed as two isolated per-distro stacks:
- API Lambda - DynamoDB pass-through (~10ms response)
- Sync Lambda - Incremental DynamoDB sync from SNS Release events (~22 registry calls per release)
- Index Lambda (index-lambda) - SNS-triggered Syft JSON generation, uploads native Syft JSON to S3 for scanner consumption
- Scan Lambda (scan-lambda) - SQS-triggered CVE scanning via Grype from native Syft JSON in S3, per-digest processing with first_seen tracking and partial batch failure reporting
- Enqueue Lambda (enqueue-lambda) - Hourly EventBridge-triggered fan-out that checks Grype DB for updates and enqueues non-superseded digests to SQS
- Metrics Lambda (metrics-lambda) - DynamoDB Stream-triggered CVE exposure metrics and structured logs to CloudWatch
- DynamoDB - Pre-computed JSON items (single-table PK/SK design, Streams: KEYS_ONLY)
- S3 ScanDataBucket - Native Syft JSON storage for scanner data (gzipped, keyed by grype/{image}/{digest_hex}.json.gz)
- SQS ScanQueue - Central queue for scan tasks (fed by S3 events and Enqueue Lambda)
- CloudFront - CDN with per-endpoint cache TTLs
- CloudWatch - CVE exposure metrics and structured logs
- catalog sync - Full DynamoDB population from GitLab + Quay.io OCI v2 registry
- catalog scan - CLI CVE scanning via Grype; reads native Syft JSON from S3 (with --bucket) or builds synthetic JSON from DynamoDB SBOMs (fallback)
- catalog index - CLI Syft JSON generation for backfilling S3
- Catalog SPA - Lit 3 web app served from S3 via CloudFront
Scanner Architecture
CVE vulnerability data must match direct grype <image> scans exactly for all
images. Synthetic Syft JSON (built from stored SBOM packages) cannot reliably
reproduce the native output because Grype relies on metadata fields, artifact
relationships, and deduplication logic that are lost in the SPDX-to-API
roundtrip. Additionally, raw SPDX from the build system contains package bloat
(e.g. Go sub-modules, empty-version entries) that native Syft filters out
when scanning a compiled binary directly.
To guarantee exact-match results, the scanner chain stores native Syft JSON
in S3 rather than DynamoDB: a single image’s Syft JSON is typically 1-10 MB
(gzipped to 100 KB - 1 MB), which can exceed DynamoDB’s 400 KB item limit. S3
has no practical per-object size constraint at this scale, avoids provisioned throughput costs for
large blobs, and supports event notifications for triggering downstream
scanners. An independent Index Lambda runs syft <image> on each new
release and uploads the full output to
s3://{bucket}/grype/{image}/{hex}.json.gz. The S3 upload fires an event
notification to SQS, which triggers the Scan Lambda to read the native JSON,
run Grype, and write per-canonical vulnerability data to DynamoDB.
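The object key layout described above can be sketched as a small helper. This is a minimal illustration, not the project's actual code: the function name syft_object_key is hypothetical, and it assumes the digest arrives as "sha256:<hex>" with {hex} being the part after the colon.

```rust
/// Builds the S3 object key for one digest's gzipped Syft JSON.
/// Assumption: `digest` has the form "sha256:<hex>".
fn syft_object_key(image: &str, digest: &str) -> Option<String> {
    let hex = digest.strip_prefix("sha256:")?;
    // Reject anything that is not a plain, non-empty hex string.
    if hex.is_empty() || !hex.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    Some(format!("grype/{image}/{hex}.json.gz"))
}
```

For example, syft_object_key("caddy", "sha256:ab12") yields the key grype/caddy/ab12.json.gz, while a digest without the sha256: prefix yields None.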
The scanner chain is intentionally independent of the catalog data chain: the same SNS Release event triggers both the Sync Lambda (catalog data to DynamoDB) and the Index Lambda (Syft JSON to S3), with no ordering dependency between them.
CVE Data Flow
graph TD
Konflux([Konflux Release]) -->|SNS| Sync[SyncFunction]
Konflux -->|SNS| Index[IndexFunction]
Sync -->|update| Tags[(Tags)]
Tags -->|"stream via ESM"| Metrics
Tags -->|read| Scan[ScanFunction]
Index -->|upload| S3Syft[(S3 Syft JSON)]
S3Syft -->|"S3 event via SQS"| Scan
Hourly([Schedule]) -->|hourly| Enqueue[EnqueueFunction]
Enqueue --> Gate[check Grype DB]
Gate -->|"fan out via SQS"| Scan
Scan -->|update| Vulns[(Image Vulnerabilities)]
Vulns -->|"stream via ESM"| Metrics[MetricsFunction]
Metrics -->|"metrics + logs"| CloudWatch[CloudWatch]
Metrics -->|"write aggregate"| CatalogVulns[(Catalog Vulnerabilities)]
classDef lambda fill:#d4e6f1,stroke:#2980b9
classDef dynamo fill:#fdebd0,stroke:#e67e22
classDef trigger fill:#d5f5e3,stroke:#27ae60
classDef aws fill:#e8daf1,stroke:#8e44ad
classDef gate fill:#e5e7e9,stroke:#7f8c8d
class Sync,Scan,Enqueue,Metrics,Index lambda
class Vulns,Tags,CatalogVulns,S3Syft dynamo
class Konflux,Hourly trigger
class CloudWatch aws
class Gate gate
Prerequisites
- Rust 1.75+ (for building backend)
- Node.js 22+ (for building frontend)
- AWS credentials (for DynamoDB access and deployment)
- SAM CLI (for deployment)
API Endpoints
| Endpoint | Description |
|---|---|
| GET /v1/images | Image directory |
| GET /v1/images/{name} | Image overview (README) |
| GET /v1/images/{name}/tags | Tags for an image |
| GET /v1/images/{name}/details/{canonical} | Per-canonical details |
| GET /v1/images/{name}/sbom/{canonical} | Package list |
| GET /v1/images/{name}/vulnerabilities/{canonical} | Vulnerability scan |
| GET /v1/images/{name}/history/{stream}/{variant} | Release timeline |
| GET /v1/images/{name}/releases/details/{digest} | Release details (immutable) |
| GET /v1/images/{name}/releases/sbom/{digest} | Release SBOM (immutable) |
| GET /v1/images/{name}/releases/vulnerabilities/{digest} | Release vulnerabilities |
| GET /v1/vulnerabilities | Catalog-wide CVE aggregate |
| GET /v1/openapi.json | OpenAPI 3.1 specification |
| GET /v1/docs/ | Interactive Swagger UI |
Timestamp Fields
The oldest_created field on ImageSummary, Tag, and HistorySummary is
the earliest OCI created timestamp across all architectures in the release.
All architectures were built at or after this date, making it useful for
conservative staleness detection.
The specifications endpoint returns per-architecture data keyed by architecture
name. Each architecture’s created field is the direct OCI config root
timestamp for that specific architecture.
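As a rough sketch of how oldest_created relates to the per-architecture created values, the helper below takes the minimum across architectures. The function name and the map type are hypothetical, and timestamps are assumed to be already parsed into Unix epoch seconds.

```rust
use std::collections::BTreeMap;

/// Returns the earliest `created` value across all architectures.
/// Assumption: keys are architecture names, values are Unix epoch seconds.
fn oldest_created(created_by_arch: &BTreeMap<String, i64>) -> Option<i64> {
    created_by_arch.values().copied().min()
}
```

With amd64 created at 1700000500 and arm64 at 1700000100, the result is 1700000100; an empty map yields None.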
Usage
All tools are built as a single catalog binary with subcommands (api,
sync, sync-lambda, scan, scan-lambda, index, index-lambda,
enqueue-lambda, metrics, metrics-lambda). The binary is built in the
Rust container and CLI subcommands are run in the gitlab-ci container (which
provides grype, syft, and other tools). Only make and podman are required.
Sync
# Dry run (print items to stdout)
make container-catalog/sync ARGS="--distro rawhide --dry-run"
# Populate DynamoDB
make container-catalog/sync ARGS="--distro rawhide --table-name <table>"
Sync Lambda
The sync-lambda subcommand runs as an AWS Lambda function triggered by SNS
Release events from kubernetes-event-forwarder.
It incrementally syncs a single image release to DynamoDB (~22 registry API
calls per release vs ~104 for a full sync).
The Lambda:
- Decodes gzip+base64 SNS messages
- Filters for Succeeded releases targeting the configured Quay.io namespace
- Fetches OCI manifest data for the new digest
- Writes per-digest items (DETAILS, SBOM, RELEASE_DETAILS, RELEASE_SBOM)
- Merges into aggregate items (TAGS, HISTORY, OVERVIEW, DIRECTORY)
- Fetches README from GitLab for OVERVIEW content (uses README.redhat.md for hummingbird, README.md for rawhide)
Registry fetch errors (manifest, SBOM, attestation) propagate as hard failures so the Lambda retries automatically (up to 2 retries with backoff) before sending to the DLQ. GitLab README failures are non-fatal – the existing README is preserved if the fetch fails.
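The filtering step in the list above might look roughly like the following. This is only a sketch: the ReleaseEvent struct, its field names, and the "quay.io/<namespace>/<image>" repository format are assumptions made for illustration, not the actual event schema.

```rust
/// Hypothetical, already-decoded view of a Release event.
struct ReleaseEvent {
    status: String,     // e.g. "Succeeded" or "Failed"
    repository: String, // assumed form: "quay.io/<namespace>/<image>"
}

/// Returns the image name if the release should be synced, otherwise None.
fn image_to_sync(event: &ReleaseEvent, namespace: &str) -> Option<String> {
    // Only successful releases are of interest.
    if event.status != "Succeeded" {
        return None;
    }
    // The release must target the configured Quay.io namespace.
    let prefix = format!("quay.io/{namespace}/");
    let image = event.repository.strip_prefix(prefix.as_str())?;
    if image.is_empty() {
        None
    } else {
        Some(image.to_string())
    }
}
```

Events that fail either check are skipped without error, so only relevant releases reach the registry fetch and DynamoDB write steps.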
| Environment Variable | Description |
|---|---|
| TABLE_NAME | DynamoDB table name |
| DISTRO | rawhide or hummingbird |
| SENTRY_DSN | Optional Sentry DSN for error tracking |
Index
The index subcommand generates native Syft JSON for all non-superseded image
digests and uploads them to S3 for scanner consumption. It reads the image
directory and TAGS from DynamoDB, checks S3 for existing objects (dedup via
HeadObject), runs syft <image> --platform linux/amd64, gzips the output,
and uploads to s3://{bucket}/grype/{image}/{hex}.json.gz.
# Dry run (show what would be uploaded)
make container-catalog/index ARGS="--distro hummingbird --table-name <table> --bucket <bucket> --dry-run"
# Backfill S3 for all images
make container-catalog/index ARGS="--distro hummingbird --table-name <table> --bucket <bucket>"
# Backfill a single image
make container-catalog/index ARGS="--distro hummingbird --table-name <table> --bucket <bucket> --image caddy"
Index Lambda
The index-lambda subcommand runs as an AWS Lambda function triggered by the
same SNS Release events as the Sync Lambda. It operates independently of the
catalog chain (no DynamoDB access) and generates native Syft JSON for scanner
consumption.
The Lambda:
- Decodes gzip+base64 SNS messages (shared with Sync Lambda via release_event module)
- Filters for Succeeded releases targeting the configured Quay.io namespace
- Checks S3 for existing objects (HeadObject dedup by digest)
- Runs syft <image>@<digest> --platform linux/amd64 --output syft-json
- Gzips and uploads to s3://{bucket}/grype/{image}/{hex}.json.gz
Failures propagate for Lambda retry (up to 2 retries) before sending to the
IndexDLQ. The catalog index CLI backfills any gaps.
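The check-then-upload flow above can be sketched with an in-memory stand-in for S3. Everything here is illustrative: the ObjectStore trait, MemoryStore, and index_digest are hypothetical names, and the generate closure stands in for running syft and gzipping its output.

```rust
use std::collections::HashMap;

/// Minimal stand-in for the two S3 operations the flow needs (hypothetical trait).
trait ObjectStore {
    fn exists(&self, key: &str) -> bool; // plays the role of HeadObject
    fn put(&mut self, key: &str, body: Vec<u8>); // plays the role of PutObject
}

/// In-memory store, for illustration and tests only.
#[derive(Default)]
struct MemoryStore {
    objects: HashMap<String, Vec<u8>>,
}

impl ObjectStore for MemoryStore {
    fn exists(&self, key: &str) -> bool {
        self.objects.contains_key(key)
    }
    fn put(&mut self, key: &str, body: Vec<u8>) {
        self.objects.insert(key.to_string(), body);
    }
}

#[derive(Debug, PartialEq)]
enum IndexOutcome {
    Skipped,
    Uploaded,
}

/// Uploads the Syft output for one digest unless the object already exists.
fn index_digest<S: ObjectStore>(
    store: &mut S,
    image: &str,
    hex: &str,
    generate: impl FnOnce() -> Vec<u8>,
) -> IndexOutcome {
    let key = format!("grype/{image}/{hex}.json.gz");
    if store.exists(&key) {
        // Dedup: the expensive generation step is never invoked.
        return IndexOutcome::Skipped;
    }
    store.put(&key, generate());
    IndexOutcome::Uploaded
}
```

Checking for the object first makes retries and CLI backfills idempotent: a digest that is already indexed costs one existence check instead of another syft run.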
| Environment Variable | Description |
|---|---|
| SCAN_DATA_BUCKET | S3 bucket for scanner data |
| DISTRO | rawhide or hummingbird |
| SENTRY_DSN | Optional Sentry DSN for error tracking |
Scan
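The scan subcommand reads image listings and tags from DynamoDB
(no registry access needed) and runs Grype for each image. With --bucket it
scans the native Syft JSON stored in S3; without it, it falls back to synthetic
Syft JSON built from the SBOM packages stored in DynamoDB. Results include a
first_seen timestamp per CVE, tracked at the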
group+variant+stream level and carried across releases for SLI computation.
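One way to picture the first_seen carry-over is as a merge between the previous release's map and the CVEs found by a new scan. The sketch below is an assumption-laden illustration: carry_first_seen is a hypothetical name, timestamps are taken to be Unix epoch seconds, and CVEs absent from the new scan are simply dropped, which may not match the real behavior.

```rust
use std::collections::BTreeMap;

/// Merges the previous first_seen map with the CVEs reported by a new scan.
/// Assumption: one map exists per group+variant+stream, values are epoch seconds.
fn carry_first_seen(
    previous: &BTreeMap<String, i64>,
    current_cves: &[&str],
    scan_time: i64,
) -> BTreeMap<String, i64> {
    current_cves
        .iter()
        .map(|cve| {
            // Keep the original timestamp if the CVE was already known;
            // otherwise this scan counts as the first sighting.
            let first_seen = previous.get(*cve).copied().unwrap_or(scan_time);
            ((*cve).to_string(), first_seen)
        })
        .collect()
}
```

A CVE that persists across releases keeps its original timestamp, so the exposure clock is not reset by a rebuild that fails to fix it.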
# Dry run (print items to stdout)
make container-catalog/scan ARGS="--distro hummingbird --table-name <table> --dry-run"
# Scan and write to DynamoDB (purge stale vuln data first, implies --scope=all)
make container-catalog/scan ARGS="--distro hummingbird --table-name <table> --purge"
# Scan only non-superseded (current) tags
make container-catalog/scan ARGS="--distro hummingbird --table-name <table> --scope non-superseded"
# Scan all releases including historic (tagless) releases
make container-catalog/scan ARGS="--distro hummingbird --table-name <table> --scope all"
# Scan a single image
make container-catalog/scan ARGS="--distro hummingbird --table-name <table> --image caddy --dry-run"
Scan Lambda
The scan-lambda subcommand runs as an AWS Lambda function triggered by SQS
messages. Two paths feed the ScanQueue:
- Real-time (S3 event notification): When the Index Lambda uploads a new Syft JSON to S3, an s3:ObjectCreated event (filtered on grype/ prefix) is sent directly to SQS
- Hourly (Enqueue Lambda): Fans out all non-superseded digests when the Grype vulnerability database has been updated, sending {"bucket": "...", "key": "grype/{image}/{hex}.json.gz"} messages
The Lambda accepts both S3 event notification JSON and direct
{"bucket", "key"} messages. For each message it:
- Downloads and decompresses the native Syft JSON from S3
- Runs Grype on the raw bytes
- Looks up all canonical tags matching the digest from the TAGS item
- Writes VULNERABILITIES#{canonical} for each matching tag (with first_seen tracking) and RELEASE_VULNERABILITIES#{hex} once per digest
Processing is per-digest: a single Syft JSON serves all canonicals sharing that digest, avoiding redundant Grype invocations.
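To illustrate the per-digest fan-out, the sketch below lists the item keys produced by a single scan. The function name is hypothetical, and whether these strings are full sort keys or only part of them is not specified here, so treat the exact key shape as an assumption.

```rust
/// Item keys written for one scanned digest: one per canonical tag,
/// plus a single release-level key for the digest itself.
fn vulnerability_item_keys(canonicals: &[&str], hex: &str) -> Vec<String> {
    let mut keys: Vec<String> = canonicals
        .iter()
        .map(|c| format!("VULNERABILITIES#{c}"))
        .collect();
    keys.push(format!("RELEASE_VULNERABILITIES#{hex}"));
    keys
}
```

A digest shared by two canonical tags therefore produces three writes from one Grype run.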
The Lambda loads the Grype DB on cold start (cached in /tmp for warm
invocations), processes up to 10 messages per batch, and reports partial
batch failures so only failed records return to the queue.
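Partial batch failure reporting boils down to collecting the IDs of the records that failed and returning only those. The following is a minimal sketch under assumptions: the Record struct and failed_message_ids are hypothetical, and the scan closure stands in for the download, Grype run, and DynamoDB write.

```rust
/// One SQS record as seen by the handler (hypothetical shape).
struct Record {
    message_id: String,
    body: String,
}

/// Processes every record and returns the IDs of the ones that failed.
fn failed_message_ids<F>(records: &[Record], mut scan: F) -> Vec<String>
where
    F: FnMut(&str) -> Result<(), String>,
{
    records
        .iter()
        // A failure does not abort the batch; remaining records still run.
        .filter(|r| scan(r.body.as_str()).is_err())
        .map(|r| r.message_id.clone())
        .collect()
}
```

In an SQS-triggered Lambda these IDs would be returned as itemIdentifier entries under batchItemFailures, which requires ReportBatchItemFailures to be enabled on the event source mapping. An empty list means the whole batch succeeded.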
| Environment Variable | Description |
|---|---|
| TABLE_NAME | DynamoDB table name |
| DISTRO | rawhide or hummingbird |
| SENTRY_DSN | Optional Sentry DSN for error tracking |
Enqueue Lambda
The enqueue-lambda subcommand runs hourly via EventBridge Schedule. On each
invocation it performs a lightweight HTTP GET of the public Grype DB listing
(latest.json, ~200 bytes) and compares the built timestamp against the
stored CATALOG/LAST_FULL_SCAN_DB item in DynamoDB. If the DB hasn’t changed,
it returns early (~23 of 24 hourly invocations short-circuit). When an update
is detected, it reads the image directory and all TAGS items, deduplicates by
digest, and sends one SQS message per unique digest using SendMessageBatch.
Messages use the format {"bucket": "...", "key": "grype/{image}/{hex}.json.gz"}.
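The short-circuit, dedup, and batching logic described above can be sketched as a pure planning function. Names and types are hypothetical, and deduplication is done on (image, digest hex) pairs here because the object key includes the image name. The batch size of 10 reflects the SendMessageBatch per-call limit.

```rust
use std::collections::BTreeSet;

/// SendMessageBatch accepts at most 10 entries per call.
const MAX_BATCH: usize = 10;

/// Returns batches of message bodies to send, or an empty list when the
/// Grype DB `built` timestamp matches the stored one.
fn plan_scan_batches(
    stored_built: Option<&str>,
    latest_built: &str,
    bucket: &str,
    tag_digests: &[(&str, &str)], // (image, digest hex), one entry per tag
) -> Vec<Vec<String>> {
    // Short-circuit: nothing to do if the DB has not changed.
    if stored_built == Some(latest_built) {
        return Vec::new();
    }
    // Several tags may point at the same digest; scan each one once.
    let unique: BTreeSet<(&str, &str)> = tag_digests.iter().copied().collect();
    let bodies: Vec<String> = unique
        .iter()
        .map(|(image, hex)| {
            format!(r#"{{"bucket": "{bucket}", "key": "grype/{image}/{hex}.json.gz"}}"#)
        })
        .collect();
    bodies.chunks(MAX_BATCH).map(|c| c.to_vec()).collect()
}
```

Keeping the planning step free of AWS calls makes it easy to unit test; the caller would then issue one SendMessageBatch request per returned batch and update the stored timestamp afterwards.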
| Environment Variable | Description |
|---|---|
| TABLE_NAME | DynamoDB table name |
| DISTRO | rawhide or hummingbird |
| SCAN_QUEUE_URL | SQS queue URL for scan messages |
| SCAN_DATA_BUCKET | S3 bucket for scanner data |
| GRYPE_DB_LATEST_URL | Grype DB listing URL (has sensible default) |
| SENTRY_DSN | Optional Sentry DSN for error tracking |
Metrics
The metrics subcommand performs a one-shot read of all non-superseded
vulnerability data from DynamoDB, pushes CVE exposure metrics to CloudWatch,
prints structured CVE exposure logs, and writes the catalog-wide vulnerability
aggregate to DynamoDB (PK=CATALOG, SK=VULNERABILITIES). With --dry-run, it
skips the CloudWatch push and the DynamoDB aggregate write (useful for local
inspection).
# Dry run (print structured logs to stdout, no CloudWatch push)
make container-catalog/metrics ARGS="--distro hummingbird --table-name <table> --dry-run"
# Push metrics to CloudWatch and print structured logs
make container-catalog/metrics ARGS="--distro hummingbird --table-name <table>"
Metrics Lambda
The metrics-lambda subcommand runs as a DynamoDB Stream-triggered Lambda that
emits CVE exposure duration metrics to CloudWatch and writes a catalog-wide
vulnerability aggregate to DynamoDB (served by GET /v1/vulnerabilities).
The CloudWatch metrics feed the SLO dashboard and alarm defined in the
error-budgets stack.
How It Works
The Lambda is triggered by DynamoDB Stream events filtered on VULNERABILITIES#
and TAGS changes, with a 60-second batching window and reserved concurrency
of 1 (single instance). It maintains an in-memory active CVE table across warm
invocations:
- Cold start: Reads CATALOG/DIRECTORY, all TAGS, and all non-superseded VULNERABILITIES# items from DynamoDB to build the full table (~2-3s at 10000 tags)
- Warm invocations: Incrementally updates the table from stream event keys (~50-100 GetItem calls per batch)
- After each invocation: Recomputes and emits all metrics and structured logs
CloudWatch Metrics
| Metric | Type | Dimensions |
|---|---|---|
| CveExposureDuration | Distribution | [Distro], [Distro, Severity] |
| ActiveCveCount | Count | [Distro, Severity] |
CveExposureDuration values are in hours, computed as now - first_seen for
each active CVE on each non-superseded canonical tag. [Distro] is used by
the SLO alarm; [Distro, Severity] is used by the dashboard.
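The now - first_seen computation can be written out as a one-line helper. This sketch assumes both timestamps are Unix epoch seconds. The function name and the clamp at zero (to guard against clock skew) are illustrative choices, not documented behavior.

```rust
/// Exposure in hours for one active CVE; both timestamps are epoch seconds.
fn exposure_hours(now: i64, first_seen: i64) -> f64 {
    // Clamp at zero so a first_seen slightly in the future never goes negative.
    (now - first_seen).max(0) as f64 / 3600.0
}
```

For instance, a CVE first seen 5400 seconds ago has an exposure of 1.5 hours.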
Structured Logs
Each invocation emits one JSON log line per active CVE to stdout (captured by CloudWatch Logs). Example query for all active CVEs:
filter message = "active_cve"
| fields cve, severity, exposure_hours, repository, stream, variant, component
| sort exposure_hours desc
| Environment Variable | Description |
|---|---|
| TABLE_NAME | DynamoDB table name |
| DISTRO | rawhide or hummingbird |
| CLOUDWATCH_NAMESPACE | CloudWatch namespace for metrics |
| SENTRY_DSN | Optional Sentry DSN |
Deployment
make container-catalog/build
make container-catalog/deploy
Configuration
catalog sync
| Argument | Description |
|---|---|
| --distro | rawhide or hummingbird |
| --table-name | DynamoDB table name |
| --purge | Delete all items before writing |
| --cache-dir | Cache directory (auto-detected) |
| --image | Sync only a specific repo |
| --legacy-discovery | Use GitLab-based repo discovery |
catalog index
| Argument | Description |
|---|---|
| --distro | rawhide or hummingbird |
| --table-name | DynamoDB table name (for image/tag discovery) |
| --bucket | S3 bucket for scanner data |
| --dry-run | Print what would be uploaded without uploading |
| --image | Index only a specific image |
| --parallel | Number of concurrent operations (default: 2) |
catalog scan
| Argument | Description |
|---|---|
| --distro | rawhide or hummingbird |
| --table-name | DynamoDB table name (required) |
| --bucket | S3 bucket with native Syft JSON (uses S3 scan path) |
| --scope | non-superseded, tags (default), or all |
| --dry-run | Print items without writing |
| --purge | Purge vuln data before writing (implies --scope all) |
| --cache-dir | Cache directory (auto-detected) |
| --parallel | Number of concurrent scans (default: 4) |
| --image | Scan only a specific image |
| --tag | Scan only a specific tag (requires --image) |
SAM Parameters
| Parameter | Description |
|---|---|
| Distro | rawhide or hummingbird |
| CacheEnabled | Enable CloudFront caching |
| CatalogDomainName | Catalog web UI domain |
| ApiDomainName | API domain |
| HostedZoneId | Route53 hosted zone |
| CorsOrigins | Comma-separated CORS origins (default *) |
| SnsTopicArn | SNS topic ARN for Release events (enables sync Lambda) |
Frontend
The catalog web UI is a Lit 3 SPA (Web Components) with Tailwind CSS, built
per-distro with Vite. Source is in container-catalog/frontend/.
Only make and podman are required (no local Node.js needed).
Defaults from .envrc.defaults are applied automatically.
# Install dependencies
make container-catalog/frontend/setup
# Development server at http://localhost:5173
make container-catalog/frontend/dev
# Production build
make container-catalog/frontend/build
Host variants (*-host) run without podman (for CI or local Node.js).
Frontend Build Variables
| Variable | Description |
|---|---|
| VITE_API_URL | API base URL for the distro |
| VITE_DISTRO | rawhide or hummingbird |
| VITE_DISTRO_LABEL | Display label for current distro |
| VITE_OTHER_CATALOG_URL | URL of the other distro’s catalog (optional, hides link if unset) |
| VITE_OTHER_DISTRO_LABEL | Display label for other distro (optional) |
| VITE_VULNERABILITIES_ENABLED | Show vulnerabilities tab |
Development
# Backend
cargo test # Run tests
cargo clippy --all-targets # Lint
cargo fmt # Format
# Frontend (host variants, requires local Node.js)
cd container-catalog/frontend
npm run typecheck # Type check
npm run build # Production build
License
This project is licensed under the GNU General Public License v3.0 or later - see the LICENSE file for details.