S3 Lookaside Cache

Infrastructure for caching dist-git lookaside content, providing access to source artifacts for the Hummingbird build pipeline.

Overview

This stack provides S3-based storage for dist-git lookaside artifacts, served through a CloudFront CDN. It caches content obtained from upstream sources, such as Git tarballs.

A companion stack (s3-lookaside-cache-upload-role) creates a GitLab OIDC identity provider and an IAM role that GitLab CI jobs can assume using short-lived OIDC tokens (JWTs), eliminating the need for long-lived access keys.

Architecture

Cache Infrastructure

flowchart LR
    clients["Build Clients"]
    cf["CloudFront\nDistribution"]
    s3["S3 Bucket\n(dist-git-cache)"]
    logs["S3 Logs\nBucket"]
    backup["AWS Backup\nVault"]
    headers["Security\nHeaders"]

    clients --> cf --> s3
    cf --> headers
    s3 --> logs --> backup

GitLab CI Upload Role

flowchart LR
    gitlab["GitLab CI\n(.gitlab-ci.yml)"]
    sts["AWS STS\n(validates JWT\nvia OIDC/JWKS)"]
    role["IAM Role\n(scoped to\nS3 PutObject)"]
    s3["S3 Bucket\n(dist-git cache)"]

    gitlab -- "id_token" --> sts
    sts -- "AssumeRoleWithWebIdentity" --> role
    role -- "temporary credentials" --> sts
    sts -- "credentials" --> gitlab
    gitlab -- "s3:PutObject" --> s3

Components

Cache Stack (s3-lookaside-cache)

| Resource | Type | Description |
|---|---|---|
| DistGitCacheBucket | S3 Bucket | Main cache storage with versioning and object lock |
| DistGitLogBucket | S3 Bucket | Access logs with tiered storage lifecycle |
| DistGitCacheDistribution | CloudFront | CDN with HTTP/2+3, IPv6, TLS 1.2+ |
| DistGitCacheOAC | Origin Access Control | Secure S3 access from CloudFront |
| DistGitCacheCachePolicy | Cache Policy | 1-day default TTL, 1-year max, Gzip/Brotli |
| DistGitCacheResponseHeadersPolicy | Response Headers | HSTS, X-Frame-Options, XSS protection |
| DistGitBackupVault | Backup Vault | AWS Backup vault for data protection |
| DistGitBackupPlan | Backup Plan | Daily (35-day) and weekly (365-day) backups |
| DistGitUploadPolicy | IAM Managed Policy | Grants s3:PutObject to the cache bucket |

Upload Role Stack (s3-lookaside-cache-upload-role)

| Resource | Type | Description |
|---|---|---|
| GitLabOIDCProvider | IAM OIDC Provider | Registers gitlab.com as a trusted identity provider |
| DistGitUploadRole | IAM Role | Web identity role assumable by the configured GitLab project |

Parameters

Cache Stack Parameters

| Parameter | Description |
|---|---|
| ResourcePrefix | Prefix for resource names (e.g., arr-hummingbird-prod-dist-git-cache) |
| BucketName | Globally unique name for the cache bucket (logs bucket appends -logs) |

Upload Role Parameters

| Parameter | Description |
|---|---|
| ResourcePrefix | Prefix for resource names (e.g., arr-hummingbird-prod-dist-git-upload) |
| CacheStackName | Name of the deployed s3-lookaside-cache CloudFormation stack |
| GitLabProjectPath | GitLab project path allowed to assume the role (e.g., redhat/hummingbird/rpms) |

S3 Key Structure

Files are stored using the dist-git lookaside path convention:

{namespace}/{package}/{filename}/{hashType}/{hash}/{filename}

Example:

rpms/tar/tar-1.35.tar.xz/sha512/abc123.../tar-1.35.tar.xz
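The path convention above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `lookaside_key` and its parameters are hypothetical (they are not part of the stacks), and the digest is assumed to be the lowercase hex digest of the file contents.

```python
import hashlib


def lookaside_key(namespace: str, package: str, filename: str,
                  data: bytes, hash_type: str = "sha512") -> str:
    """Build an S3 key following the dist-git lookaside path convention.

    Hypothetical helper for illustration; assumes the hash segment is the
    lowercase hex digest of the file contents.
    """
    digest = hashlib.new(hash_type, data).hexdigest()
    # {namespace}/{package}/{filename}/{hashType}/{hash}/{filename}
    return f"{namespace}/{package}/{filename}/{hash_type}/{digest}/{filename}"


if __name__ == "__main__":
    # Example with placeholder content instead of a real tarball.
    print(lookaside_key("rpms", "tar", "tar-1.35.tar.xz", b"example contents"))
```

Note that the filename appears twice in the key: once as a directory segment and once as the final object name.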

Trust Policy

The upload role’s trust policy restricts access using three conditions (all StringEquals):

  • Audience (gitlab.com:aud): Must be https://gitlab.com
  • Subject (gitlab.com:sub): Must match project_path:<GitLabProjectPath>:ref_type:branch:ref:main
  • Protected ref (gitlab.com:ref_protected): Must be "true"

Taken together, these conditions mean that only pipelines running on the protected main branch of the configured project can assume the role.
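For orientation, the three conditions could translate into a trust policy document along the following lines. This is a sketch, not the template's actual contents: the helper name `build_trust_policy`, the account ID, and the provider ARN format (`oidc-provider/gitlab.com`) are assumptions made for illustration.

```python
import json


def build_trust_policy(account_id: str, project_path: str) -> dict:
    """Return a trust policy dict mirroring the three conditions described above.

    Hypothetical helper; the real policy is defined in the CloudFormation template.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Assumed ARN shape for an IAM OIDC provider registered for gitlab.com.
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/gitlab.com"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        "gitlab.com:aud": "https://gitlab.com",
                        "gitlab.com:sub": (
                            f"project_path:{project_path}:ref_type:branch:ref:main"
                        ),
                        "gitlab.com:ref_protected": "true",
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # 123456789012 is a placeholder account ID.
    print(json.dumps(build_trust_policy("123456789012", "redhat/hummingbird/rpms"), indent=2))
```

Because every condition uses StringEquals, a token from a different branch, a different project, or an unprotected ref fails the match and the role cannot be assumed.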

GitLab CI Usage

Configure your .gitlab-ci.yml to assume the role using id_tokens. The AWS CLI automatically calls AssumeRoleWithWebIdentity when AWS_WEB_IDENTITY_TOKEN_FILE and AWS_ROLE_ARN are set:

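upload to cache:
  image:
    name: amazon/aws-cli:latest
    entrypoint: [""]  # clear the image's default entrypoint so script lines run in a shell
  id_tokens:
    GITLAB_OIDC_TOKEN:
      aud: https://gitlab.com  # must match the audience condition in the trust policy
  script:
    # FILE, BUCKET, S3_KEY, AWS_ACCOUNT_ID and ROLE_NAME must be defined elsewhere,
    # for example as CI/CD variables.
    - set +x  # keep shell tracing off so the token is not echoed to the job log
    - printenv GITLAB_OIDC_TOKEN > /tmp/oidc-token
    - export AWS_WEB_IDENTITY_TOKEN_FILE=/tmp/oidc-token
    - export AWS_ROLE_ARN="arn:aws:iam::${AWS_ACCOUNT_ID}:role/${ROLE_NAME}"
    - export AWS_ROLE_SESSION_NAME="gitlab-ci-${CI_JOB_ID}"
    - aws s3 cp "$FILE" "s3://${BUCKET}/${S3_KEY}"
    - rm -f /tmp/oidc-token  # remove the token file once the upload has finished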

The aud value in id_tokens must match the audience configured in the OIDC identity provider (https://gitlab.com). The token is written to a file because the AWS SDK reads it from AWS_WEB_IDENTITY_TOKEN_FILE rather than accepting it inline.
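If the role assumption is rejected, one way to investigate is to look at the claims the job's token actually carries and compare them with the trust policy conditions. The sketch below decodes the JWT payload without verifying the signature, so it is suitable for debugging only; the function names are hypothetical, and the claim names (`aud`, `sub`, `ref_protected`) are taken from the conditions listed in the Trust Policy section.

```python
import base64
import json


def decode_claims(token: str) -> dict:
    """Decode the payload of a JWT WITHOUT verifying its signature (debugging only)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated segments")
    payload = parts[1]
    # base64url segments in JWTs omit padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def trust_relevant_claims(token: str) -> dict:
    """Return only the claims that the upload role's trust policy checks."""
    claims = decode_claims(token)
    return {name: claims.get(name) for name in ("aud", "sub", "ref_protected")}
```

Avoid printing the raw token itself in job logs; printing only the selected claims is usually enough to spot a mismatched branch or audience.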

Security Features

  • Encryption: AES256 server-side encryption with S3 bucket keys
  • Public Access: All public access blocked on both buckets
  • Transport Security: HTTPS enforced via bucket policy
  • Object Lock: GOVERNANCE mode with 1-day default retention (will be increased in the future)
  • CloudFront OAC: Modern Origin Access Control (not legacy OAI)
  • Security Headers: HSTS, X-Frame-Options (DENY), X-Content-Type-Options, X-XSS-Protection, strict referrer policy
  • TLS 1.2 Minimum: For all CloudFront connections
  • OIDC Federation: Short-lived tokens instead of long-lived access keys

Backup Strategy

| Schedule | Retention | Cold Storage |
|---|---|---|
| Daily (5:00 AM UTC) | 35 days | - |
| Weekly (Sunday 5:00 AM UTC) | 365 days | After 30 days |

Continuous backup is enabled for point-in-time recovery.

Log Lifecycle

Access logs transition through storage tiers:

| Age | Storage Class |
|---|---|
| 0-30 days | Standard |
| 30-60 days | Standard-IA |
| 60-90 days | Glacier IR |
| 90+ days | Expired |
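The tiering above can be read as a simple mapping from object age to storage class. The sketch below encodes that mapping. The function name is hypothetical, and because the table's ranges share their boundary days, the code assumes that each transition takes effect on the boundary day itself (day 30, 60, and 90).

```python
def log_storage_class(age_days: int) -> str:
    """Map an access log's age in days to its expected storage class.

    Assumes transitions happen on day 30, 60, and 90; objects at or past
    day 90 are treated as expired.
    """
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days < 30:
        return "Standard"
    if age_days < 60:
        return "Standard-IA"
    if age_days < 90:
        return "Glacier IR"
    return "Expired"
```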

Outputs

Cache Stack Outputs

| Output | Description |
|---|---|
| BucketName | Name of the cache S3 bucket |
| BucketArn | ARN of the cache S3 bucket |
| LogBucketName | Name of the logs S3 bucket |
| LogBucketArn | ARN of the logs S3 bucket |
| DistributionDomainName | CloudFront domain name |
| DistributionId | CloudFront distribution ID |
| DistributionArn | CloudFront distribution ARN |
| BackupVaultName | AWS Backup vault name |
| BackupVaultArn | AWS Backup vault ARN |
| BackupPlanId | AWS Backup plan ID |
| UploadPolicyArn | ARN of the managed policy granting upload access |

Upload Role Outputs

| Output | Description |
|---|---|
| RoleArn | ARN of the upload role (pass to GitLab CI as ROLE_ARN) |
| RoleName | Name of the upload role |
| OIDCProviderArn | ARN of the GitLab OIDC identity provider |

References