Lambda S3 Cache
An AWS Lambda setup that caches files from public URLs in S3. When a URL is requested, the service returns a 302 redirect to either the cached S3 copy (via presigned URL) or the original source. Cache misses trigger asynchronous downloads to S3, ensuring future requests are served from the cache.
Caching uses the original URL’s host and path as the S3 key, so each unique URL maps to a single cache entry that persists until it expires. Accessing cached content extends its cache lifetime (at most once per touch cooldown period).
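A minimal sketch of this key mapping follows. It assumes the key is simply the host followed by the path, with the scheme, query string, and fragment discarded. The handler's exact normalization is not documented here, and `cache_key` is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def cache_key(url: str) -> str:
    """Derive an S3 key from a source URL (hypothetical helper).

    Assumption: only host and path form the key; scheme, query string,
    and fragment are dropped.
    """
    parts = urlsplit(url)
    if not parts.netloc:
        raise ValueError(f"URL has no host: {url!r}")
    # Drop the path's leading "/" so keys read like "host/dir/file".
    return f"{parts.netloc}/{parts.path.lstrip('/')}"


print(cache_key("https://example.com/path/to/file.rpm"))  # example.com/path/to/file.rpm
```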
Note: Cached content behavior depends on file type:
- Immutable content (matching the immutable extension filter, e.g., `.rpm`): Changes at origin won’t be reflected until the cache expires
- Mutable content (non-matching extension, e.g., `repomd.xml`): The cache is revalidated on each cache hit via a HEAD request to the origin; stale behavior is configurable
Cache Revalidation: Mutable content uses HTTP ETags for efficient validation.
The uploader stores the origin’s ETag in S3 metadata when caching content. On
subsequent requests, the handler sends a HEAD request with If-None-Match header
containing the stored ETag. If the origin responds with 304 Not Modified, the
cache is fresh and served directly. If the ETag differs, the cache is stale and
an async refresh is triggered; response behavior depends on StaleCacheBehavior.
Note: If the origin doesn’t provide ETags, the cache cannot validate freshness. In this case, mutable content always redirects to origin without triggering cache updates (caching would be ineffective since every request would redirect anyway).
Features
- Streaming Upload: Handles files efficiently by streaming directly from source to S3 without loading the entire file into memory
- Presigned S3 URLs: Returns short-lived signed URLs for cached content
- Automatic Expiration: Cached content expires after CacheExpirationDays (default 14), with the lifetime extended on access (subject to the touch cooldown)
- URL Prefix Allowlist: Only caches content matching explicitly allowed host+path prefixes
- Immutable Extension Filter: Identifies immutable content that doesn’t require revalidation
- Cache Revalidation: Mutable content (non-matching extensions) is validated on each request; stale cache behavior is configurable (origin or cache)
- Custom Domain: Optional custom domain with automatic TLS certificate management via ACM and Route53
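The streaming upload and ETag recording might look roughly like the sketch below. It assumes a boto3 S3 client, whose `upload_fileobj` reads a file-like object in chunks (so the whole file never has to fit in memory) and accepts `Metadata` via `ExtraArgs`. The metadata key `origin-etag` and the helpers `upload_stream` / `cache_from_origin` are hypothetical; the project's actual key name and HTTP client may differ.

```python
import urllib.request
from typing import Any, BinaryIO, Optional


def upload_stream(s3: Any, bucket: str, key: str, body: BinaryIO,
                  etag: Optional[str]) -> None:
    """Stream `body` to S3, recording the origin ETag as object metadata."""
    # "origin-etag" is a hypothetical metadata key name.
    extra_args = {"Metadata": {"origin-etag": etag}} if etag else {}
    # upload_fileobj reads the stream part by part (multipart for large
    # files) instead of buffering the complete download.
    s3.upload_fileobj(body, bucket, key, ExtraArgs=extra_args)


def cache_from_origin(s3: Any, bucket: str, key: str, url: str) -> None:
    """Download `url` and pipe the response body straight into S3."""
    with urllib.request.urlopen(url, timeout=60) as response:
        upload_stream(s3, bucket, key, response, response.headers.get("ETag"))
```

In a Lambda, `s3` would be `boto3.client("s3")`; taking it as a parameter also lets tests substitute a fake client.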
Architecture
Three Lambda functions handle the caching workflow:
- Handler - API Gateway endpoint that checks the cache, returns 302 redirects, and triggers async operations. To avoid S3 throttling, it applies a touch cooldown: an object is touched only if at least TouchCooldownMinutes have elapsed since its last modification.
- Uploader - Downloads from origin and streams to S3 on cache misses and when mutable content is found to be stale (invoked asynchronously)
- Touch - Updates S3 object timestamps to extend cache lifetime on cache hits (invoked asynchronously only when cooldown period has elapsed)
Request Flow
```mermaid
flowchart TD
Start([Request]) --> CheckURL{URL matches<br/>AllowedPrefixes?}
CheckURL -->|No| RedirectOrigin
CheckURL -->|Yes| HeadS3[HEAD S3]:::network
HeadS3 --> CheckCache{Cache exists?}
CheckCache -->|No: Cache Miss| InvokeUploaderMiss[Invoke Uploader async]:::async
CheckCache -->|Yes: Cache Hit| CheckImmutable{Immutable extension?}
CheckImmutable -->|No: Mutable| HeadOrigin[HEAD Origin<br/>If-None-Match: stored ETag]:::network
HeadOrigin --> CheckChanged{Origin changed?}
CheckChanged -->|No ETag from origin| RedirectOrigin
CheckChanged -->|ETag differs| InvokeUploaderStale[Invoke Uploader async]:::async
CheckChanged -->|ETag matches| CheckCooldown
CheckChanged -->|304 Not Modified| CheckCooldown
CheckChanged -->|"Error/timeout"| CheckCooldown
CheckImmutable -->|Yes| CheckCooldown
CheckCooldown{Touch cooldown<br/>elapsed?} -->|Yes| InvokeTouch[Invoke Touch async]:::async
InvokeUploaderMiss --> RedirectOrigin
InvokeUploaderStale -->|"STALE_CACHE_BEHAVIOR=origin"| RedirectOrigin
InvokeUploaderStale -->|"STALE_CACHE_BEHAVIOR=cache"| RedirectCache
CheckCooldown -->|No| RedirectCache
InvokeTouch --> RedirectCache
subgraph redirectGroup [ ]
RedirectOrigin[302 to Origin URL]:::redirect
RedirectCache[302 to S3 Presigned URL]:::redirect
end
style redirectGroup fill:none,stroke:none
classDef network fill:#10b981,color:#000
classDef async fill:#ff9900,color:#000
classDef redirect fill:#3b82f6,color:#fff
```
Legend: Green = network call, Orange = async Lambda invocation, Blue = 302 redirect response
Prerequisites
- AWS CLI configured with appropriate credentials (IAM permissions for Lambda, API Gateway, S3, CloudFormation, CloudWatch Logs, and optionally Route53/ACM for custom domain)
- Podman or Docker (for containerized SAM build/deploy)
- Python 3.11 or later (for development)
Deployment
Build and deploy using containerized AWS SAM CLI:
```shell
make lambda-s3-cache/build     # Build Lambda package
make lambda-s3-cache/deploy    # First deployment (interactive/guided)
make lambda-s3-cache/redeploy  # Subsequent deployments (non-interactive)
```
Deployment output: ApiEndpoint - the API Gateway URL to use for requests
Custom Domain
Optional custom domain with automatic TLS certificate management (ACM +
Route53). Requires a Route53 hosted zone. Deploy with CustomDomainName and
HostedZoneId parameters - CloudFormation handles certificate creation, DNS
validation, and configuration. Certificate validation takes 5-30 minutes; allow
up to 1 hour for DNS propagation.
Parameters
| Parameter | Description | Default |
|---|---|---|
| `ResourcePrefix` | Prefix for all resource names | `myapp-prod` |
| `PresignedUrlExpiration` | Presigned URL expiration (seconds) | `3600` |
| `AllowedPrefixes` | Whitespace-separated list of allowed URL prefixes | `example.com/path/` |
| `ImmutableExtensions` | File extensions for immutable content (no revalidation) | `.rpm` |
| `CacheExpirationDays` | Days to keep cached content (minimum: 1) | `14` |
| `TouchCooldownMinutes` | Minimum minutes between touch operations | `60` |
| `StaleCacheBehavior` | Behavior when the cache is stale: `origin` or `cache` | `origin` |
| `CustomDomainName` | Optional custom domain name | (empty) |
| `HostedZoneId` | Route53 hosted zone ID (required if a custom domain is set) | (empty) |
| `SentryDsn` | Optional Sentry DSN for error tracking | (empty) |
Resource naming: All AWS resources follow the `{ResourcePrefix}-{type}-{name}`
pattern (e.g., `myapp-prod-bucket`, `myapp-prod-lambda-handler`).
Usage
Make GET requests to the API endpoint (or custom domain if configured). The URL
to cache is encoded in the path (without the https:// prefix):
```shell
curl -L "https://<api-endpoint>/example.com/path/to/file.rpm"
```
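Clients that build request URLs programmatically only need to drop the `https://` scheme and append the rest to the endpoint. In the sketch below, `cache_url` is a hypothetical helper and `https://cache.example.net` stands in for a real endpoint. It assumes only `https://` sources are supported, since the documented path format omits that prefix.

```python
def cache_url(endpoint: str, source_url: str) -> str:
    """Build the cache request URL: endpoint + source URL minus its scheme."""
    prefix = "https://"
    if not source_url.startswith(prefix):
        raise ValueError(f"expected an https:// URL, got {source_url!r}")
    return f"{endpoint.rstrip('/')}/{source_url.removeprefix(prefix)}"


print(cache_url("https://cache.example.net", "https://example.com/path/to/file.rpm"))
# https://cache.example.net/example.com/path/to/file.rpm
```

Python's `urllib.request.urlopen` follows the resulting 302 automatically, much like `curl -L`.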
Behavior:
- First request (cache miss): Redirects to the original URL while triggering an async S3 upload
- Subsequent requests (cache hit):
  - Immutable content (matching the immutable extension filter): Redirects to a presigned S3 URL and extends cache expiration (once the touch cooldown has elapsed)
  - Mutable content (non-matching extension): A HEAD request to the origin (10 s timeout) validates cache freshness. If stale, behavior depends on `StaleCacheBehavior`; if fresh, serves from cache
- Disallowed prefixes: URLs not matching allowed prefixes are transparently redirected to the original URL (302 pass-through)
Immutable Extension Filter Behavior
The ImmutableExtensions parameter determines caching behavior:
- Matching extension (e.g., `.rpm`): Treated as immutable content. Cached without revalidation; changes at origin won’t be reflected until the cache expires. Served directly from cache on all subsequent requests.
- Non-matching extension (e.g., `.xml`, `.gz`): Treated as mutable content. Cached with revalidation; a HEAD request on each cache hit validates freshness using ETags. If stale, behavior depends on `StaleCacheBehavior`:
  - `origin` (default): Redirects to origin for fresh content; the cache is refreshed asynchronously
  - `cache`: Serves the stale cache immediately (faster); the cache is refreshed asynchronously in the background
Important: All files matching AllowedPrefixes are cached, regardless of
extension. The extension filter only determines whether revalidation is performed.
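The two filters can be sketched as plain string checks on the host+path target. This assumes simple prefix matching for `AllowedPrefixes` and suffix matching for `ImmutableExtensions`. The handler's actual rules (case sensitivity, query-string handling) may differ, and `is_allowed` / `is_immutable` are hypothetical names.

```python
def is_allowed(target: str, prefixes: list[str]) -> bool:
    """True if host+path `target` starts with any allowed prefix."""
    # A trailing "/" in a prefix (as in the default example.com/path/) keeps it
    # from also matching siblings such as example.com/pathology/.
    return any(target.startswith(prefix) for prefix in prefixes)


def is_immutable(target: str, extensions: list[str]) -> bool:
    """True if the path ends with one of the immutable extensions."""
    return target.endswith(tuple(extensions))


prefixes = "example.com/path/ mirror.example.org/pub/".split()  # whitespace-separated
print(is_allowed("example.com/path/repo/pkg-1.0.rpm", prefixes))  # True
print(is_immutable("example.com/path/repodata/primary.xml.gz", [".rpm"]))  # False
```

As the note above says, both targets in this example would be cached; only the second would be revalidated on each hit.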
Usage as Repository Proxy
The cache can be used as a DNF/yum baseurl for RPM repositories. Configure
ImmutableExtensions to include .rpm:
- `.rpm` files (matching extension): Cached as immutable; fast, no revalidation
- Metadata files (non-matching extension, e.g., `repomd.xml`, `primary.xml.gz`): Cached with revalidation; ensures fresh metadata while still benefiting from the cache
```ini
[myrepo]
name=My Repository
baseurl=https://koji-s3-cache.example.com/download.example.org/pub/repo/$basearch/
enabled=1
```
This provides caching benefits for RPM downloads while ensuring repository metadata stays fresh.
Development
See the main README for development workflows.
```shell
make lambda-s3-cache/setup  # Install dependencies
make check                  # Lint code
make fmt                    # Format code
make test                   # Run unit tests
make coverage               # Run tests with coverage
```
Configuration
Lambda functions receive configuration via environment variables (automatically set by CloudFormation):
| Variable | Handler | Uploader | Touch | Description |
|---|---|---|---|---|
| `S3_BUCKET_NAME` | ✅ | ✅ | ✅ | S3 bucket name |
| `PRESIGNED_URL_EXPIRATION` | ✅ | | | Presigned URL expiration (seconds) |
| `UPLOADER_LAMBDA_ARN` | ✅ | | | Uploader Lambda ARN |
| `TOUCH_LAMBDA_ARN` | ✅ | | | Touch Lambda ARN |
| `TOUCH_COOLDOWN_MINUTES` | ✅ | | | Minimum minutes between touch operations |
| `ALLOWED_PREFIXES` | ✅ | | | Allowed URL prefixes |
| `IMMUTABLE_EXTENSIONS` | ✅ | | | Extensions for immutable content |
| `STALE_CACHE_BEHAVIOR` | ✅ | | | Stale behavior: `origin` or `cache` |
| `SENTRY_DSN` | ✅ | ✅ | ✅ | Optional Sentry DSN |
Security & Limitations
Security:
- S3 bucket has public access blocked; all objects encrypted at rest (AES256)
- Presigned URLs expire after PresignedUrlExpiration seconds (default 3600)
- IAM policies follow least privilege principle
- URL prefix allowlist prevents caching arbitrary URLs
- Immutable extension filter distinguishes immutable vs mutable content for revalidation
Limitations:
- Lambda timeout: 15 min (uploader), 30 sec (handler, touch)
- Lambda memory: 1024 MB (uploader), 256 MB (handler, touch)
- S3 object size: Up to 5 TB (AWS limit)
License
This project is licensed under the GNU General Public License v3.0 or later - see the LICENSE file for details.