S3 Lookaside Cache
Infrastructure for caching dist-git lookaside content, providing access to source artifacts for the Hummingbird build pipeline.
Overview
This stack provides S3-based storage for dist-git artifacts, fronted by a CloudFront CDN. It caches content from upstream sources, such as Git tarballs.
A companion stack (s3-lookaside-cache-upload-role) creates a GitLab OIDC
identity provider and an IAM role that GitLab CI jobs can assume using
short-lived JWT tokens, eliminating the need for long-lived access keys.
Architecture
Cache Infrastructure
flowchart LR
clients["Build Clients"]
cf["CloudFront\nDistribution"]
s3["S3 Bucket\n(dist-git-cache)"]
logs["S3 Logs\nBucket"]
backup["AWS Backup\nVault"]
headers["Security\nHeaders"]
clients --> cf --> s3
cf --> headers
s3 --> logs --> backup
GitLab CI Upload Role
flowchart LR
gitlab["GitLab CI\n(.gitlab-ci.yml)"]
sts["AWS STS\n(validates JWT\nvia OIDC/JWKS)"]
role["IAM Role\n(scoped to\nS3 PutObject)"]
s3["S3 Bucket\n(dist-git cache)"]
gitlab -- "id_token" --> sts
sts -- "AssumeRoleWithWebIdentity" --> role
role -- "temporary credentials" --> sts
sts -- "credentials" --> gitlab
gitlab -- "s3:PutObject" --> s3
Components
Cache Stack (s3-lookaside-cache)
| Resource | Type | Description |
|---|---|---|
| DistGitCacheBucket | S3 Bucket | Main cache storage with versioning and object lock |
| DistGitLogBucket | S3 Bucket | Access logs with tiered storage lifecycle |
| DistGitCacheDistribution | CloudFront | CDN with HTTP/2+3, IPv6, TLS 1.2+ |
| DistGitCacheOAC | Origin Access Control | Secure S3 access from CloudFront |
| DistGitCacheCachePolicy | Cache Policy | 1-day default TTL, 1-year max, Gzip/Brotli |
| DistGitCacheResponseHeadersPolicy | Response Headers | HSTS, X-Frame-Options, XSS protection |
| DistGitBackupVault | Backup Vault | AWS Backup vault for data protection |
| DistGitBackupPlan | Backup Plan | Daily (35-day) and weekly (365-day) backups |
| DistGitUploadPolicy | IAM Managed Policy | Grants s3:PutObject to the cache bucket |
Upload Role Stack (s3-lookaside-cache-upload-role)
| Resource | Type | Description |
|---|---|---|
| GitLabOIDCProvider | IAM OIDC Provider | Registers gitlab.com as a trusted identity provider |
| DistGitUploadRole | IAM Role | Web identity role assumable by the configured GitLab project |
Parameters
Cache Stack Parameters
| Parameter | Description |
|---|---|
| ResourcePrefix | Prefix for resource names (e.g., arr-hummingbird-prod-dist-git-cache) |
| BucketName | Globally unique name for the cache bucket (logs bucket appends -logs) |
Upload Role Parameters
| Parameter | Description |
|---|---|
| ResourcePrefix | Prefix for resource names (e.g., arr-hummingbird-prod-dist-git-upload) |
| CacheStackName | Name of the deployed s3-lookaside-cache CloudFormation stack |
| GitLabProjectPath | GitLab project path allowed to assume the role (e.g., redhat/hummingbird/rpms) |
S3 Key Structure
Files are stored using the dist-git lookaside path convention:
{namespace}/{package}/{filename}/{hashType}/{hash}/{filename}
Example:
rpms/tar/tar-1.35.tar.xz/sha512/abc123.../tar-1.35.tar.xz
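
As an illustration of the path convention above, the following minimal Python sketch derives a key from an artifact's content. The function name `lookaside_key` and the sample bytes are hypothetical and not part of this stack. The sketch assumes the hash segment is the hex digest of the file contents.

```python
import hashlib


def lookaside_key(namespace: str, package: str, filename: str,
                  data: bytes, hash_type: str = "sha512") -> str:
    """Build a lookaside-style S3 key (hypothetical helper, not part of the stack)."""
    # Assumption: the hash path segment is the hex digest of the file contents.
    digest = hashlib.new(hash_type, data).hexdigest()
    return f"{namespace}/{package}/{filename}/{hash_type}/{digest}/{filename}"


# Placeholder bytes stand in for the real tarball contents.
key = lookaside_key("rpms", "tar", "tar-1.35.tar.xz", b"example bytes")
print(key)
```

The filename appears twice in the key: once as a directory-like segment and once as the final object name.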
Trust Policy
The upload role’s trust policy restricts access using three conditions (all
StringEquals):
- Audience (gitlab.com:aud): Must be https://gitlab.com
- Subject (gitlab.com:sub): Must match project_path:<GitLabProjectPath>:ref_type:branch:ref:main
- Protected ref (gitlab.com:ref_protected): Must be "true"
Only the main branch of the configured project can assume the role.
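
To make the effect of the three conditions concrete, here is a simplified local model in Python. It is only a sketch: the helper names `build_conditions` and `claims_satisfy` are hypothetical, and it does not reproduce how AWS STS evaluates a trust policy (STS also validates the token signature and expiry, which this sketch ignores).

```python
def build_conditions(project_path: str) -> dict:
    """Return the three StringEquals conditions described above (illustrative)."""
    return {
        "StringEquals": {
            "gitlab.com:aud": "https://gitlab.com",
            "gitlab.com:sub": f"project_path:{project_path}:ref_type:branch:ref:main",
            "gitlab.com:ref_protected": "true",
        }
    }


def claims_satisfy(conditions: dict, claims: dict) -> bool:
    """Check token claims against the conditions; every condition must match."""
    expected = conditions["StringEquals"]
    # Strip the "gitlab.com:" prefix to get the claim name inside the token.
    return all(claims.get(key.split(":", 1)[1]) == value
               for key, value in expected.items())


conditions = build_conditions("redhat/hummingbird/rpms")

main_claims = {
    "aud": "https://gitlab.com",
    "sub": "project_path:redhat/hummingbird/rpms:ref_type:branch:ref:main",
    "ref_protected": "true",
}
# Same project, but a hypothetical feature branch instead of main.
feature_claims = dict(
    main_claims,
    sub="project_path:redhat/hummingbird/rpms:ref_type:branch:ref:feature",
)

print(claims_satisfy(conditions, main_claims))
print(claims_satisfy(conditions, feature_claims))
```

Because all conditions use StringEquals, there is no wildcard matching: a token from any other branch or project fails the subject check outright.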
GitLab CI Usage
Configure your .gitlab-ci.yml to assume the role using
id_tokens. The AWS CLI
automatically calls AssumeRoleWithWebIdentity when
AWS_WEB_IDENTITY_TOKEN_FILE and AWS_ROLE_ARN are set:
upload to cache:
  image:
    name: amazon/aws-cli:latest
    entrypoint: [""]
  id_tokens:
    GITLAB_OIDC_TOKEN:
      aud: https://gitlab.com
  script:
    - set +x
    - printenv GITLAB_OIDC_TOKEN > /tmp/oidc-token
    - export AWS_WEB_IDENTITY_TOKEN_FILE=/tmp/oidc-token
    - export AWS_ROLE_ARN="arn:aws:iam::${AWS_ACCOUNT_ID}:role/${ROLE_NAME}"
    - export AWS_ROLE_SESSION_NAME="gitlab-ci-${CI_JOB_ID}"
    - aws s3 cp "$FILE" "s3://${BUCKET}/${S3_KEY}"
    - rm -f /tmp/oidc-token
The aud value in id_tokens must match the audience configured in the OIDC
identity provider (https://gitlab.com). The token is written to a file
because the AWS SDK reads it from AWS_WEB_IDENTITY_TOKEN_FILE rather than
accepting it inline.
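
For jobs that drive the upload from Python rather than shell, the same preparation steps might look like the sketch below. The helper `web_identity_env`, the account ID, the role name, and the token value are all hypothetical placeholders. The sketch only writes the token file and builds the environment variables named above; it does not call AWS.

```python
import os
import tempfile


def web_identity_env(token: str, account_id: str, role_name: str,
                     job_id: str, token_path: str) -> dict:
    """Write the OIDC token to a file and return the matching AWS variables."""
    # Create the file with owner-only permissions, since it holds a credential.
    fd = os.open(token_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(token)
    return {
        "AWS_WEB_IDENTITY_TOKEN_FILE": token_path,
        "AWS_ROLE_ARN": f"arn:aws:iam::{account_id}:role/{role_name}",
        "AWS_ROLE_SESSION_NAME": f"gitlab-ci-{job_id}",
    }


# Placeholder values; in CI these would come from GITLAB_OIDC_TOKEN,
# AWS_ACCOUNT_ID, ROLE_NAME and CI_JOB_ID.
with tempfile.TemporaryDirectory() as workdir:
    env = web_identity_env("example.jwt.value", "123456789012",
                           "example-upload-role", "42",
                           os.path.join(workdir, "oidc-token"))
    with open(env["AWS_WEB_IDENTITY_TOKEN_FILE"]) as handle:
        written = handle.read()

print(env["AWS_ROLE_ARN"])
```

The returned dictionary can be merged into `os.environ` before invoking the AWS CLI in a subprocess. Using a temporary directory means the token file is removed automatically, mirroring the `rm -f` step in the job script.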
Security Features
- Encryption: AES256 server-side encryption with S3 bucket keys
- Public Access: All public access blocked on both buckets
- Transport Security: HTTPS enforced via bucket policy
- Object Lock: GOVERNANCE mode with 1-day default retention (will be increased in the future)
- CloudFront OAC: Modern Origin Access Control (not legacy OAI)
- Security Headers: HSTS, X-Frame-Options (DENY), X-Content-Type-Options, X-XSS-Protection, strict referrer policy
- TLS 1.2 Minimum: For all CloudFront connections
- OIDC Federation: Short-lived tokens instead of long-lived access keys
Backup Strategy
| Schedule | Retention | Cold Storage |
|---|---|---|
| Daily (5:00 AM UTC) | 35 days | - |
| Weekly (Sunday 5:00 AM UTC) | 365 days | After 30 days |
Continuous backup is enabled for point-in-time recovery.
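
The retention arithmetic in the table can be sketched as follows. This is an illustrative calculation only: the `RULES` mapping and `recovery_point_timeline` are hypothetical names, and the sketch assumes retention and the cold storage transition are both counted from the recovery point's creation time.

```python
from datetime import datetime, timedelta, timezone

# Values taken from the table above; the structure itself is illustrative.
RULES = {
    "daily": {"retention_days": 35, "cold_after_days": None},
    "weekly": {"retention_days": 365, "cold_after_days": 30},
}


def recovery_point_timeline(created: datetime, rule: str) -> dict:
    """Return when a recovery point moves to cold storage and when it expires."""
    spec = RULES[rule]
    cold = spec["cold_after_days"]
    return {
        # Daily backups never move to cold storage, so this stays None.
        "moves_to_cold": created + timedelta(days=cold) if cold is not None else None,
        "expires": created + timedelta(days=spec["retention_days"]),
    }


# A Sunday 5:00 AM UTC run, when both schedules fire.
created = datetime(2025, 1, 5, 5, 0, tzinfo=timezone.utc)
daily = recovery_point_timeline(created, "daily")
weekly = recovery_point_timeline(created, "weekly")
print(daily["expires"].isoformat())
print(weekly["moves_to_cold"].isoformat(), weekly["expires"].isoformat())
```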
Log Lifecycle
Access logs transition through storage tiers:
| Age | Storage Class |
|---|---|
| 0-30 days | Standard |
| 30-60 days | Standard-IA |
| 60-90 days | Glacier IR |
| 90+ days | Expired |
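
The tiering above can be expressed as a small lookup. This is a sketch with hypothetical names. Because the table's ranges share their endpoints, it assumes each transition applies once an object reaches the listed age (so day 30 is already Standard-IA). The strings are the S3 storage class identifiers for the tiers in the table.

```python
from typing import Optional

# (minimum age in days, S3 storage class), in ascending order.
TRANSITIONS = [(0, "STANDARD"), (30, "STANDARD_IA"), (60, "GLACIER_IR")]
EXPIRE_AFTER_DAYS = 90


def storage_class_for_age(age_days: int) -> Optional[str]:
    """Return the storage class for a log object, or None once it has expired."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days >= EXPIRE_AFTER_DAYS:
        return None
    current = None
    for threshold, storage_class in TRANSITIONS:
        # Keep the last tier whose threshold the object has reached.
        if age_days >= threshold:
            current = storage_class
    return current


for age in (0, 29, 30, 60, 90):
    print(age, storage_class_for_age(age))
```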
Outputs
Cache Stack Outputs
| Output | Description |
|---|---|
| BucketName | Name of the cache S3 bucket |
| BucketArn | ARN of the cache S3 bucket |
| LogBucketName | Name of the logs S3 bucket |
| LogBucketArn | ARN of the logs S3 bucket |
| DistributionDomainName | CloudFront domain name |
| DistributionId | CloudFront distribution ID |
| DistributionArn | CloudFront distribution ARN |
| BackupVaultName | AWS Backup vault name |
| BackupVaultArn | AWS Backup vault ARN |
| BackupPlanId | AWS Backup plan ID |
| UploadPolicyArn | ARN of the managed policy granting upload access |
Upload Role Outputs
| Output | Description |
|---|---|
| RoleArn | ARN of the upload role (pass to GitLab CI as ROLE_ARN) |
| RoleName | Name of the upload role |
| OIDCProviderArn | ARN of the GitLab OIDC identity provider |