Lambda S3 Cache

An AWS Lambda setup that caches files from public URLs in S3. When a URL is requested, the service returns a 302 redirect to either the cached S3 copy (via presigned URL) or the original source. Cache misses trigger asynchronous downloads to S3, ensuring future requests are served from the cache.

Caching uses the original URL’s host and path as the S3 key, so each unique URL maps to a single cache entry that persists until it expires. Frequently accessed content stays cached: each cache hit extends the entry's lifetime, subject to the touch cooldown described under Architecture.

Note: Cached content is immutable; changes at the origin won’t be reflected until the cache expires.
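The key mapping described above could be sketched as follows. This is an illustration only: the function name cache_key is hypothetical, and dropping the query string and fragment is an assumption, since the text only says that host and path form the key.

```python
from urllib.parse import urlsplit


def cache_key(url: str) -> str:
    """Build an S3 key from a URL's host and path (assumed key layout)."""
    # Accept both full URLs and the scheme-less form used in request paths.
    if "://" not in url:
        url = "https://" + url
    parts = urlsplit(url)
    # urlsplit lowercases the hostname; port and credentials are dropped.
    host = parts.hostname or ""
    # Assumption: query string and fragment do not contribute to the key.
    return f"{host}{parts.path}"
```

Under these assumptions, two URLs that differ only in their query string would share one cache entry.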

Features

  • Streaming Upload: Handles files efficiently by streaming directly from source to S3 without loading into memory
  • Presigned S3 URLs: Returns short-lived signed URLs for cached content (PresignedUrlExpiration, default 3600 seconds)
  • Automatic Expiration: Cached content expires after CacheExpirationDays (default 14 days), with the lifetime extended on access
  • URL Prefix Allowlist: Only caches content matching explicitly allowed host+path prefixes
  • Extension Filtering: Limit caching to specific file extensions
  • Custom Domain: Optional custom domain with automatic TLS certificate management via ACM and Route53
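As a rough illustration of the Streaming Upload feature, the sketch below hands an open HTTP response straight to boto3's upload_fileobj, which reads a file-like object in chunks. The function name stream_to_s3 and the injectable opener parameter are assumptions made for this example; the project's actual uploader may be structured differently.

```python
import urllib.request
from typing import Any, Callable


def stream_to_s3(
    s3_client: Any,
    url: str,
    bucket: str,
    key: str,
    opener: Callable[..., Any] = urllib.request.urlopen,
) -> None:
    """Copy the body at `url` to s3://bucket/key without reading it fully into memory."""
    # The HTTP response is a file-like object, so the S3 client can
    # consume it chunk by chunk instead of receiving one large bytes value.
    with opener(url) as response:
        s3_client.upload_fileobj(response, bucket, key)
```

A call might look like stream_to_s3(boto3.client("s3"), "https://example.com/path/to/file.rpm", "myapp-prod-bucket", "example.com/path/to/file.rpm"). Passing the client and opener as arguments keeps the function easy to exercise with fakes.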

Architecture

Three Lambda functions handle the caching workflow:

  1. Handler - API Gateway endpoint that checks the cache, returns 302 redirects, and triggers async operations. Implements a touch cooldown to prevent S3 throttling: objects are touched only after a configurable period has elapsed since their last modification.
  2. Uploader - Downloads from origin and streams to S3 on cache misses (invoked asynchronously)
  3. Touch - Updates S3 object timestamps to extend cache lifetime on cache hits (invoked asynchronously only when cooldown period has elapsed)
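The touch cooldown check could look roughly like the sketch below. The function name should_touch is hypothetical, and it is an assumption that the handler compares against the object's LastModified timestamp (as returned by an S3 HEAD request); the default of 60 minutes mirrors TouchCooldownMinutes.

```python
from datetime import datetime, timedelta, timezone


def should_touch(
    last_modified: datetime,
    now: datetime,
    cooldown_minutes: int = 60,
) -> bool:
    """Return True once the cooldown has elapsed since the last modification."""
    # Both datetimes should be timezone-aware (S3 reports UTC timestamps).
    return now - last_modified >= timedelta(minutes=cooldown_minutes)


# Example: an object last modified 59 minutes ago is left alone.
_t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
assert not should_touch(_t0, _t0 + timedelta(minutes=59))
```

With a check like this, a frequently requested object triggers at most one Touch invocation per cooldown period, however many cache hits it receives.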

Prerequisites

  • AWS CLI configured with appropriate credentials (IAM permissions for Lambda, API Gateway, S3, CloudFormation, CloudWatch Logs, and optionally Route53/ACM for custom domain)
  • Podman or Docker (for containerized SAM build/deploy)
  • Python 3.11 or later (for development)

Deployment

Build and deploy using containerized AWS SAM CLI:

make lambda-s3-cache/build     # Build Lambda package
make lambda-s3-cache/deploy    # First deployment (interactive/guided)
make lambda-s3-cache/redeploy  # Subsequent deployments (non-interactive)

Deployment output: ApiEndpoint - the API Gateway URL to use for requests

Custom Domain

Optional custom domain with automatic TLS certificate management (ACM + Route53). Requires a Route53 hosted zone. Deploy with the CustomDomainName and HostedZoneId parameters; CloudFormation handles certificate creation, DNS validation, and configuration. Certificate validation takes 5-30 minutes; allow up to 1 hour for DNS propagation.

Parameters

Parameter               Description                                         Default
ResourcePrefix          Prefix for all resource names                       myapp-prod
PresignedUrlExpiration  Presigned URL expiration (seconds)                  3600
AllowedPrefixes         Whitespace-separated list of allowed URL prefixes   example.com/path/
AllowedExtensions       Whitespace-separated file extensions                .rpm
CacheExpirationDays     Days to keep cached content (minimum: 1)            14
TouchCooldownMinutes    Minimum minutes between touch operations            60
CustomDomainName        Optional custom domain name                         (empty)
HostedZoneId            Route53 hosted zone ID (required if custom domain)  (empty)
SentryDsn               Optional Sentry DSN for error tracking              (empty)

Resource naming: All AWS resources follow the {ResourcePrefix}-{type}-{name} pattern (e.g., myapp-prod-bucket, myapp-prod-lambda-handler).

Usage

Make GET requests to the API endpoint (or custom domain if configured). The URL to cache is encoded in the path (without the https:// prefix):

curl -L "https://<api-endpoint>/example.com/path/to/file.rpm"
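For scripted clients, the request URL can be derived from the original URL as in this minimal sketch. The helper name to_cache_url and the endpoint https://cache.example.com are placeholders for illustration, not part of the project.

```python
def to_cache_url(endpoint: str, original_url: str) -> str:
    """Encode an https:// URL into the cache endpoint's path."""
    # The service expects the target without its https:// prefix.
    target = original_url.removeprefix("https://")
    return f"{endpoint.rstrip('/')}/{target}"


# Hypothetical endpoint; substitute the ApiEndpoint output or custom domain.
print(to_cache_url("https://cache.example.com", "https://example.com/path/to/file.rpm"))
```

The resulting URL can be fetched with any HTTP client that follows redirects, as curl -L does above.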

Behavior:

  • First request (cache miss): Redirects to the original URL while triggering an async S3 upload
  • Subsequent requests (cache hit): Redirects to a presigned S3 URL and, once the touch cooldown has elapsed, resets the cache expiration
  • Disallowed prefixes/extensions: URLs not matching allowed prefixes or extensions are transparently redirected to the original URL (302 pass-through)
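The three behaviors above amount to a small decision function, sketched here. The names Action, is_cacheable, and decide are hypothetical, and treating an empty extension list as "nothing is cacheable" is an assumption; the real handler may interpret empty settings differently.

```python
from enum import Enum


class Action(Enum):
    PASS_THROUGH = "redirect to origin, no caching"
    MISS = "redirect to origin, start async upload"
    HIT = "redirect to presigned S3 URL"


def is_cacheable(target: str, allowed_prefixes: str, allowed_extensions: str) -> bool:
    """Check a scheme-less URL against whitespace-separated allowlists."""
    prefixes = allowed_prefixes.split()
    extensions = tuple(allowed_extensions.split())
    # str.endswith accepts a tuple; an empty tuple never matches.
    return any(target.startswith(p) for p in prefixes) and target.endswith(extensions)


def decide(target: str, in_cache: bool, allowed_prefixes: str, allowed_extensions: str) -> Action:
    """Pick the redirect behavior for one request."""
    if not is_cacheable(target, allowed_prefixes, allowed_extensions):
        return Action.PASS_THROUGH
    return Action.HIT if in_cache else Action.MISS
```

Checking the allowlists first means disallowed URLs never require an S3 lookup in this sketch.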

Usage as Repository Proxy

The cache can be used as a DNF/yum baseurl for RPM repositories. Non-RPM files (repository metadata like repomd.xml, primary.xml.gz) pass through transparently via 302 redirects, while .rpm files get cached:

[myrepo]
name=My Repository
baseurl=https://koji-s3-cache.example.com/download.example.org/pub/repo/$basearch/
enabled=1

This provides caching benefits for RPM downloads while maintaining full repository functionality.
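When many repositories need such a stanza, it can be generated rather than written by hand. The sketch below uses Python's configparser; the function name render_repo and its parameters are hypothetical, and the host names are the placeholders from the example above.

```python
import configparser
import io


def render_repo(repo_id: str, name: str, cache_host: str, upstream_baseurl: str) -> str:
    """Render a .repo stanza whose baseurl goes through the cache host."""
    # Interpolation is disabled so values are written exactly as given.
    parser = configparser.ConfigParser(interpolation=None)
    upstream = upstream_baseurl.removeprefix("https://")
    parser[repo_id] = {
        "name": name,
        "baseurl": f"https://{cache_host}/{upstream}",
        "enabled": "1",
    }
    buffer = io.StringIO()
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


print(render_repo(
    "myrepo",
    "My Repository",
    "koji-s3-cache.example.com",
    "https://download.example.org/pub/repo/$basearch/",
))
```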

Development

See the main README for development workflows.

make lambda-s3-cache/setup  # Install dependencies
make check                  # Lint code
make fmt                    # Format code
make test                   # Run unit tests
make coverage               # Run tests with coverage

Configuration

Lambda functions receive configuration via environment variables (automatically set by CloudFormation):

Variable                  Description
S3_BUCKET_NAME            S3 bucket name
PRESIGNED_URL_EXPIRATION  Presigned URL expiration (seconds)
UPLOADER_LAMBDA_ARN       Uploader Lambda ARN
TOUCH_LAMBDA_ARN          Touch Lambda ARN
TOUCH_COOLDOWN_MINUTES    Minimum minutes between touch operations
ALLOWED_PREFIXES          Allowed URL prefixes
ALLOWED_EXTENSIONS        Allowed file extensions
SENTRY_DSN                Optional Sentry DSN
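A function might read these variables along the lines of the sketch below. The Settings dataclass and load_settings are hypothetical names; falling back to the documented parameter defaults, and the lists being whitespace-separated in the environment as they are in the parameters, are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    bucket: str
    presigned_url_expiration: int
    touch_cooldown_minutes: int
    allowed_prefixes: tuple[str, ...]
    allowed_extensions: tuple[str, ...]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Parse configuration from environment variables."""
    return Settings(
        # The bucket name has no sensible default, so a missing value raises KeyError.
        bucket=env["S3_BUCKET_NAME"],
        # Assumed fallbacks, taken from the documented parameter defaults.
        presigned_url_expiration=int(env.get("PRESIGNED_URL_EXPIRATION", "3600")),
        touch_cooldown_minutes=int(env.get("TOUCH_COOLDOWN_MINUTES", "60")),
        # str.split() with no argument handles any run of whitespace.
        allowed_prefixes=tuple(env.get("ALLOWED_PREFIXES", "").split()),
        allowed_extensions=tuple(env.get("ALLOWED_EXTENSIONS", "").split()),
    )
```

Accepting any mapping instead of reading os.environ directly makes the parsing easy to unit test.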

Security & Limitations

Security:

  • S3 bucket has public access blocked; all objects encrypted at rest (AES256)
  • Presigned URLs expire after PresignedUrlExpiration seconds (default 3600)
  • IAM policies follow least privilege principle
  • URL prefix allowlist prevents caching arbitrary URLs
  • Extension allowlist restricts which file types can be cached

Limitations:

  • Lambda timeout: 15 min (uploader), 30 sec (handler, touch)
  • Lambda memory: 1024 MB (uploader), 256 MB (handler, touch)
  • S3 object size: Up to 5 TB (AWS limit)

License

This project is licensed under the GNU General Public License v3.0 or later - see the LICENSE file for details.