Lambda S3 Cache

An AWS Lambda setup that caches files from public URLs in S3. When a URL is requested, the service returns a 302 redirect to either the cached S3 copy (via presigned URL) or the original source. Cache misses trigger asynchronous downloads to S3, ensuring future requests are served from the cache.

Caching uses the original URL’s host and path as the S3 key, so each unique URL maps to a single cache entry that persists until it expires. Every access to a cached object extends its lifetime, so frequently requested content stays cached.
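As a minimal sketch of the key mapping described above, the snippet below derives an S3 key from a URL's host and path. The helper name cache_key is hypothetical, and dropping the port, query string, and fragment is an assumption; the README only states that host and path are used.

```python
from urllib.parse import urlsplit


def cache_key(url: str) -> str:
    """Derive an S3 object key from a URL's host and path (hypothetical helper)."""
    # Accept both full URLs and the scheme-less form used in request paths.
    parts = urlsplit(url if "://" in url else f"https://{url}")
    if not parts.hostname:
        raise ValueError(f"URL has no host: {url!r}")
    # Assumption: port, query string, and fragment are not part of the key.
    return f"{parts.hostname}{parts.path}"


print(cache_key("https://example.com/path/to/file.rpm"))
```

Under these assumptions, two URLs that differ only in their query string would share one cache entry.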

Note: Cached content behavior depends on file type:

  • Immutable content (matching immutable extension filter, e.g., .rpm): Changes at origin won’t be reflected until cache expires
  • Mutable content (non-matching extension, e.g., repomd.xml): Cache is revalidated on each request via HEAD check; stale behavior is configurable

Cache Revalidation: Mutable content uses HTTP ETags for efficient validation. The uploader stores the origin’s ETag in S3 metadata when caching content. On subsequent requests, the handler sends a HEAD request to the origin with an If-None-Match header containing the stored ETag. If the origin responds with 304 Not Modified, or returns the same ETag, the cache is fresh and the handler redirects to the cached copy. If the ETag differs, the cache is stale and an async refresh is triggered; the response then depends on StaleCacheBehavior. If the HEAD request fails or times out, the handler falls back to the cached copy.

Note: If the origin doesn’t provide ETags, the cache cannot validate freshness. In this case, mutable content always redirects to origin without triggering cache updates (caching would be ineffective since every request would redirect anyway).
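The revalidation rules above could be sketched roughly as follows. The names head_origin, classify_head_response, and Freshness are hypothetical and not taken from the project; the sketch uses only the standard library and compares ETags as plain strings, which is an assumption.

```python
import urllib.error
import urllib.request
from enum import Enum
from typing import Optional, Tuple


class Freshness(Enum):
    FRESH = "fresh"            # redirect to the cached copy
    STALE = "stale"            # trigger a refresh; response depends on StaleCacheBehavior
    UNVERIFIABLE = "no-etag"   # origin sent no ETag; redirect to origin


def head_origin(url: str, stored_etag: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Send a conditional HEAD request and return (status, ETag header)."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"If-None-Match": stored_etag}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.headers.get("ETag")
    except urllib.error.HTTPError as err:
        # urllib surfaces 304 Not Modified as an HTTPError; other errors propagate.
        if err.code != 304:
            raise
        return 304, err.headers.get("ETag")


def classify_head_response(status: int, origin_etag: Optional[str], stored_etag: str) -> Freshness:
    """Map a HEAD result onto the three outcomes described in the text."""
    if status == 304:
        return Freshness.FRESH
    if not origin_etag:
        return Freshness.UNVERIFIABLE
    return Freshness.FRESH if origin_etag == stored_etag else Freshness.STALE
```

A caller would treat any exception from head_origin (timeout, connection error) as the error case and serve the cached copy.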

Features

  • Streaming Upload: Handles files efficiently by streaming directly from source to S3 without loading into memory
  • Presigned S3 URLs: Returns short-lived signed URLs for cached content
  • Automatic Expiration: Cached content expires after CacheExpirationDays (default: 14 days), with lifetime extended on each access
  • URL Prefix Allowlist: Only caches content matching explicitly allowed host+path prefixes
  • Immutable Extension Filter: Identifies immutable content that doesn’t require revalidation
  • Cache Revalidation: Mutable content (non-matching extensions) is validated on each request; stale cache behavior is configurable (origin or cache)
  • Custom Domain: Optional custom domain with automatic TLS certificate management via ACM and Route53
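To illustrate the streaming upload feature, here is a rough sketch that pipes an HTTP response body into S3 without buffering the whole file. It assumes a boto3 S3 client is passed in (its upload_fileobj method reads from any binary file-like object); the function name stream_to_s3 and the metadata key origin-etag are hypothetical, not the project's actual names.

```python
import urllib.request
from typing import Any, Callable, Optional


def stream_to_s3(
    s3_client: Any,
    url: str,
    bucket: str,
    key: str,
    open_url: Callable[..., Any] = urllib.request.urlopen,
) -> Optional[str]:
    """Stream the body of `url` into s3://bucket/key and return the origin ETag."""
    with open_url(url) as response:
        etag = response.headers.get("ETag")
        extra_args = {"ServerSideEncryption": "AES256"}
        if etag:
            # Hypothetical metadata key; kept so later requests can revalidate.
            extra_args["Metadata"] = {"origin-etag": etag}
        # upload_fileobj reads the response in chunks (multipart for large bodies),
        # so the file is never held in memory as a whole.
        s3_client.upload_fileobj(response, bucket, key, ExtraArgs=extra_args)
    return etag
```

Passing the client and the opener as parameters keeps the sketch testable without AWS credentials or network access.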

Architecture

Three Lambda functions handle the caching workflow:

  1. Handler - API Gateway endpoint that checks the cache, returns 302 redirects, and triggers async operations. Implements a touch cooldown to prevent S3 throttling: an object is touched only after a configurable period (TouchCooldownMinutes) has elapsed since its last modification.
  2. Uploader - Downloads from origin and streams to S3 on cache misses (invoked asynchronously)
  3. Touch - Updates S3 object timestamps to extend cache lifetime on cache hits (invoked asynchronously only when cooldown period has elapsed)
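The touch cooldown mentioned above amounts to a timestamp comparison. The sketch below assumes the handler has the object's last-modified time as a timezone-aware datetime (which is what boto3's head_object returns in LastModified); the function name touch_due is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def touch_due(
    last_modified: datetime,
    cooldown_minutes: int,
    now: Optional[datetime] = None,
) -> bool:
    """Return True when the cooldown has elapsed since the last modification."""
    now = now or datetime.now(timezone.utc)
    return now - last_modified >= timedelta(minutes=cooldown_minutes)


modified = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(touch_due(modified, 60, now=datetime(2024, 1, 1, 12, 30, tzinfo=timezone.utc)))
print(touch_due(modified, 60, now=datetime(2024, 1, 1, 13, 0, tzinfo=timezone.utc)))
```

With a 60-minute cooldown, a hit 30 minutes after the last modification skips the Touch invocation, while a hit a full hour later triggers it.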

Request Flow

flowchart TD
    Start([Request]) --> CheckURL{URL matches<br/>AllowedPrefixes?}

    CheckURL -->|No| RedirectOrigin
    CheckURL -->|Yes| HeadS3[HEAD S3]:::network
    HeadS3 --> CheckCache{Cache exists?}

    CheckCache -->|No: Cache Miss| InvokeUploaderMiss[Invoke Uploader async]:::async
    CheckCache -->|Yes: Cache Hit| CheckImmutable{Immutable extension?}

    CheckImmutable -->|No: Mutable| HeadOrigin[HEAD Origin<br/>If-None-Match: stored ETag]:::network
    HeadOrigin --> CheckChanged{Origin changed?}

    CheckChanged -->|No ETag from origin| RedirectOrigin
    CheckChanged -->|ETag differs| InvokeUploaderStale[Invoke Uploader async]:::async
    CheckChanged -->|ETag matches| CheckCooldown
    CheckChanged -->|304 Not Modified| CheckCooldown
    CheckChanged -->|"Error/timeout"| CheckCooldown

    CheckImmutable -->|Yes| CheckCooldown

    CheckCooldown{Touch cooldown<br/>elapsed?} -->|Yes| InvokeTouch[Invoke Touch async]:::async

    InvokeUploaderMiss --> RedirectOrigin
    InvokeUploaderStale -->|"STALE_CACHE_BEHAVIOR=origin"| RedirectOrigin

    InvokeUploaderStale -->|"STALE_CACHE_BEHAVIOR=cache"| RedirectCache
    CheckCooldown -->|No| RedirectCache
    InvokeTouch --> RedirectCache

    subgraph redirectGroup [ ]
        RedirectOrigin[302 to Origin URL]:::redirect
        RedirectCache[302 to S3 Presigned URL]:::redirect
    end
    style redirectGroup fill:none,stroke:none

    classDef network fill:#10b981,color:#000
    classDef async fill:#ff9900,color:#000
    classDef redirect fill:#3b82f6,color:#fff

Legend: Green = network call, Orange = async Lambda invocation, Blue = 302 redirect response
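For readers who prefer code to diagrams, the flowchart can be restated as a pure decision function. This is an illustrative sketch only: the Decision type, the decide function, and the origin_state labels are hypothetical and do not come from the project's source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    redirect: str          # "origin" or "cache" (presigned S3 URL)
    invoke: Optional[str]  # "uploader", "touch", or None


def decide(
    allowed: bool,
    cached: bool,
    immutable: bool,
    cooldown_elapsed: bool,
    origin_state: str = "unchanged",  # "unchanged", "changed", "no_etag", or "error"
    stale_behavior: str = "origin",   # "origin" or "cache"
) -> Decision:
    """Follow the flowchart branches top to bottom."""
    if stale_behavior not in ("origin", "cache"):
        raise ValueError(f"unknown stale behavior: {stale_behavior!r}")
    if not allowed:
        return Decision("origin", None)        # pass-through, nothing cached
    if not cached:
        return Decision("origin", "uploader")  # cache miss
    if not immutable:
        if origin_state == "no_etag":
            return Decision("origin", None)    # cannot validate freshness
        if origin_state == "changed":
            return Decision(stale_behavior, "uploader")
    # Immutable content, fresh mutable content, or a failed HEAD check.
    return Decision("cache", "touch" if cooldown_elapsed else None)
```

Keeping the decision separate from the network calls makes every branch of the diagram checkable with plain unit tests.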

Prerequisites

  • AWS CLI configured with appropriate credentials (IAM permissions for Lambda, API Gateway, S3, CloudFormation, CloudWatch Logs, and optionally Route53/ACM for custom domain)
  • Podman or Docker (for containerized SAM build/deploy)
  • Python 3.11 or later (for development)

Deployment

Build and deploy using containerized AWS SAM CLI:

make lambda-s3-cache/build     # Build Lambda package
make lambda-s3-cache/deploy    # First deployment (interactive/guided)
make lambda-s3-cache/redeploy  # Subsequent deployments (non-interactive)

Deployment output: ApiEndpoint - the API Gateway URL to use for requests

Custom Domain

Optional custom domain with automatic TLS certificate management (ACM + Route53). Requires a Route53 hosted zone. Deploy with the CustomDomainName and HostedZoneId parameters set; CloudFormation then handles certificate creation, DNS validation, and configuration. Certificate validation takes 5-30 minutes; allow up to 1 hour for DNS propagation.

Parameters

| Parameter | Description | Default |
| --- | --- | --- |
| ResourcePrefix | Prefix for all resource names | myapp-prod |
| PresignedUrlExpiration | Presigned URL expiration (seconds) | 3600 |
| AllowedPrefixes | Whitespace-separated list of allowed URL prefixes | example.com/path/ |
| ImmutableExtensions | File extensions for immutable content (no revalidation) | .rpm |
| CacheExpirationDays | Days to keep cached content (minimum: 1) | 14 |
| TouchCooldownMinutes | Minimum minutes between touch operations | 60 |
| StaleCacheBehavior | When cache is stale: origin or cache | origin |
| CustomDomainName | Optional custom domain name | (empty) |
| HostedZoneId | Route53 hosted zone ID (required if custom domain) | (empty) |
| SentryDsn | Optional Sentry DSN for error tracking | (empty) |

Resource naming: All AWS resources follow {ResourcePrefix}-{type}-{name} pattern (e.g., myapp-prod-bucket, myapp-prod-lambda-handler).

Usage

Make GET requests to the API endpoint (or custom domain if configured). The URL to cache is encoded in the path (without the https:// prefix):

curl -L "https://<api-endpoint>/example.com/path/to/file.rpm"

Behavior:

  • First request (cache miss): Redirects to original URL while triggering async S3 upload
  • Subsequent requests (cache hit):
    • Immutable content (matching immutable extension filter): Redirects to presigned S3 URL and resets cache expiration
    • Mutable content (non-matching extension): A HEAD request to the origin validates cache freshness (10s timeout). If stale, behavior depends on StaleCacheBehavior; if fresh, or if the HEAD request fails or times out, redirects to the presigned S3 URL
  • Disallowed prefixes: URLs not matching allowed prefixes are transparently redirected to the original URL (302 pass-through)
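A client can build the request URL by stripping the https:// prefix from the source URL and appending the rest to the endpoint, as in the curl example. The helper below is a small sketch of that rule; the name proxied_url and the endpoint cache.example.com are placeholders.

```python
def proxied_url(endpoint: str, source_url: str) -> str:
    """Build the cache request URL for an https:// source URL."""
    prefix = "https://"
    if not source_url.startswith(prefix):
        raise ValueError("expected a source URL starting with https://")
    # The source URL, minus its scheme, becomes the request path.
    return f"{endpoint.rstrip('/')}/{source_url[len(prefix):]}"


print(proxied_url("https://cache.example.com", "https://example.com/path/to/file.rpm"))
```

Any HTTP client that follows redirects (for example curl -L, or urllib.request.urlopen, which follows 302 responses by default) can then fetch the resulting URL.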

Immutable Extension Filter Behavior

The ImmutableExtensions parameter determines caching behavior:

  • Matching extension (e.g., .rpm): Treated as immutable content. Cached without revalidation - changes at origin won’t be reflected until cache expires. Served directly from cache on every cache hit.

  • Non-matching extension (e.g., .xml, .gz): Treated as mutable content. Cached with revalidation - HEAD request on each cache hit validates freshness using ETags. If stale, behavior depends on StaleCacheBehavior:

    • origin (default): Redirects to origin for fresh content, async refresh
    • cache: Serves stale cache immediately (faster), async refresh in background

Important: All files matching AllowedPrefixes are cached, regardless of extension. The extension filter only determines whether revalidation is performed.
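The two filters can be thought of as independent string checks: the allowlist decides whether a URL is cached at all, and the extension filter decides only whether a cached object is revalidated. A hedged sketch, with hypothetical function names and assuming simple case-sensitive prefix and suffix matching on the host+path string:

```python
from typing import Sequence


def is_allowed(host_and_path: str, allowed_prefixes: Sequence[str]) -> bool:
    """True when the host+path starts with any allowed prefix."""
    return any(host_and_path.startswith(prefix) for prefix in allowed_prefixes)


def is_immutable(host_and_path: str, immutable_extensions: Sequence[str]) -> bool:
    """True when the path ends with one of the immutable extensions."""
    return host_and_path.endswith(tuple(immutable_extensions))


prefixes = ["example.com/path/"]
extensions = [".rpm"]
for target in ("example.com/path/to/file.rpm", "example.com/path/repodata/repomd.xml"):
    print(target, is_allowed(target, prefixes), is_immutable(target, extensions))
```

Both sample paths pass the allowlist and are therefore cached; only the .rpm path is treated as immutable and skips revalidation.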

Usage as Repository Proxy

The cache can be used as a DNF/yum baseurl for RPM repositories. Configure ImmutableExtensions to include .rpm:

  • .rpm files (matching extension): Cached as immutable - fast, no revalidation
  • Metadata files (non-matching extension, e.g., repomd.xml, primary.xml.gz): Cached with revalidation - ensures fresh metadata while benefiting from cache

[myrepo]
name=My Repository
baseurl=https://koji-s3-cache.example.com/download.example.org/pub/repo/$basearch/
enabled=1

This provides caching benefits for RPM downloads while ensuring repository metadata stays fresh.

Development

See the main README for development workflows.

make lambda-s3-cache/setup  # Install dependencies
make check                  # Lint code
make fmt                    # Format code
make test                   # Run unit tests
make coverage               # Run tests with coverage

Configuration

Lambda functions receive configuration via environment variables (automatically set by CloudFormation):

| Variable | Handler | Uploader | Touch | Description |
| --- | --- | --- | --- | --- |
| S3_BUCKET_NAME | | | | S3 bucket name |
| PRESIGNED_URL_EXPIRATION | | | | Presigned URL expiration (sec) |
| UPLOADER_LAMBDA_ARN | | | | Uploader Lambda ARN |
| TOUCH_LAMBDA_ARN | | | | Touch Lambda ARN |
| TOUCH_COOLDOWN_MINUTES | | | | Min minutes between touch ops |
| ALLOWED_PREFIXES | | | | Allowed URL prefixes |
| IMMUTABLE_EXTENSIONS | | | | Extensions for immutable content |
| STALE_CACHE_BEHAVIOR | | | | Stale behavior: origin or cache |
| SENTRY_DSN | | | | Optional Sentry DSN |
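As an illustration of how a function might read these variables, the sketch below parses a subset of them into a typed object. The CacheConfig class and load_config function are hypothetical; the fallback defaults mirror the Parameters table, and splitting IMMUTABLE_EXTENSIONS on whitespace is an assumption (only AllowedPrefixes is documented as whitespace-separated).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Tuple


@dataclass(frozen=True)
class CacheConfig:
    bucket: str
    presigned_url_expiration: int
    touch_cooldown_minutes: int
    allowed_prefixes: Tuple[str, ...]
    immutable_extensions: Tuple[str, ...]
    stale_cache_behavior: str


def load_config(env: Mapping[str, str] = os.environ) -> CacheConfig:
    """Read configuration from environment variables, validating as it goes."""
    behavior = env.get("STALE_CACHE_BEHAVIOR", "origin")
    if behavior not in ("origin", "cache"):
        raise ValueError(f"STALE_CACHE_BEHAVIOR must be origin or cache, got {behavior!r}")
    return CacheConfig(
        bucket=env["S3_BUCKET_NAME"],  # required; raises KeyError when missing
        presigned_url_expiration=int(env.get("PRESIGNED_URL_EXPIRATION", "3600")),
        touch_cooldown_minutes=int(env.get("TOUCH_COOLDOWN_MINUTES", "60")),
        allowed_prefixes=tuple(env.get("ALLOWED_PREFIXES", "").split()),
        # Assumption: multiple extensions are whitespace-separated.
        immutable_extensions=tuple(env.get("IMMUTABLE_EXTENSIONS", ".rpm").split()),
        stale_cache_behavior=behavior,
    )
```

Parsing once at cold start and failing fast on invalid values keeps misconfiguration out of the request path.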

Security & Limitations

Security:

  • S3 bucket has public access blocked; all objects encrypted at rest (AES256)
  • Presigned URLs expire after PresignedUrlExpiration seconds (default: 3600)
  • IAM policies follow least privilege principle
  • URL prefix allowlist prevents caching arbitrary URLs
  • Immutable extension filter distinguishes immutable vs mutable content for revalidation
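Because the bucket blocks public access, cached objects are reachable only through presigned URLs. The following sketch shows how a 302 response with such a URL might be built, assuming a boto3 S3 client (generate_presigned_url is a boto3 client method) and the API Gateway Lambda proxy response shape; the function name presigned_redirect is hypothetical.

```python
from typing import Any, Dict


def presigned_redirect(
    s3_client: Any, bucket: str, key: str, expires_in: int = 3600
) -> Dict[str, Any]:
    """Return a Lambda proxy response redirecting to a time-limited S3 URL."""
    url = s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,  # seconds; corresponds to PRESIGNED_URL_EXPIRATION
    )
    return {"statusCode": 302, "headers": {"Location": url}, "body": ""}
```

The URL stops working once expires_in seconds have passed, so clients should follow the redirect promptly rather than store it.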

Limitations:

  • Lambda timeout: 15 min (uploader), 30 sec (handler, touch)
  • Lambda memory: 1024 MB (uploader), 256 MB (handler, touch)
  • S3 object size: Up to 5 TB (AWS limit)

License

This project is licensed under the GNU General Public License v3.0 or later - see the LICENSE file for details.