Hummingbird Agent

An event-driven LLM agent that investigates CI/CD failures and posts findings as GitLab merge request notes. The agent executes markdown-defined workflows using tool calling, with all data processing running inside an isolated sandbox container.

For the architectural design rationale, module boundaries, security invariants, and design decision registry, see Agent Design. For the model loop wire format, see Agent Model Loop.

flowchart TD
    Pipeline["Pipeline Fails"]
    MREvent["MR Created / Updated"]
    Slash["/hummingbird command"]

    Pipeline -->|"event"| Agent
    MREvent -->|"event"| Agent
    Slash -->|"command"| Agent

    subgraph Agent ["Hummingbird Agent"]
        APIs["GitLab + Konflux +<br/>Testing Farm"]
        Model["LLM<br/>(Gemini / Claude)"]
        subgraph Sandbox ["Isolated Sandbox (no network)"]
            Tools["jq / python3 / yq<br/>data processing"]
        end
        APIs <-->|"data"| Model
        Model <-->|"commands"| Tools
    end

    Note["MR Note<br/>(analysis or review)"]

    Agent -->|"posts"| Note
    Note -->|"reply to continue"| Agent

Features

  • Workflow-driven analysis - Investigation logic lives in .md files, not in code; easy to iterate without redeployment
  • Centralized YAML config - A single config file defines operational settings, workflows, enabled data sources with token env var names, project allowlists, and per-project limits
  • Sandboxed execution - All untrusted commands (jq, python3, shell) run in an isolated container, never on the host
  • Three sandbox backends - Podman for local development (network-isolated), direct K8s pod creation, or Deployment-backed pool for low-latency production use (restricted-v2 SCC compliant)
  • Data source abstraction - GitLab, Konflux, and Testing Farm are registered as tool-calling functions the model invokes directly
  • Auto-spill for large outputs - Stdout/stderr and data source responses exceeding 4 KB are automatically saved to sandbox files with a compact preview returned to the model, keeping context usage bounded
  • Prompt caching - Gemini uses implicit server-side caching automatically; Claude uses explicit sliding-window cache breakpoints that reduce input token costs by ~80% on multi-turn agent runs
  • Token budget management - Per-call context ceiling and iteration-based soft/hard limits prevent runaway sessions
  • Session persistence - Conversation history, transcript, and sandbox files saved to S3 (production) or local directory (development) for debugging and future session resumption. Sessions are stored in a provider-neutral format, enabling model switching between conversations.

Architecture

flowchart LR
    subgraph input [Input]
        SQS["SQS Queue"]
        CLI["CLI --event"]
    end

    subgraph agentLoop [Agent Loop]
        WF["Workflow .md<br/>(system prompt)"]
    LLM["LLM<br/>(Gemini / Claude)"]
        TR["Tool Registry"]
    end

    subgraph tools [Tools]
        SE["sandbox_exec"]
        FTS["fetch_to_sandbox"]
        DS["Data Sources"]
    end

    subgraph sandbox [Sandbox Container]
        JQ["jq / python3 / yq"]
        Files["Spilled files"]
    end

    subgraph external [External APIs]
        GL["GitLab API"]
        KX["Konflux K8s"]
        TF["Testing Farm"]
    end

    subgraph output [Output]
        Note["GitLab MR Note"]
        Session["Session State"]
    end

    SQS --> WF
    CLI --> WF
    WF --> LLM
    LLM -->|"tool calls"| TR
    TR --> SE --> sandbox
    TR --> FTS --> sandbox
    TR --> DS
    DS --> GL
    DS --> KX
    DS --> TF
    DS -->|"auto-spill"| sandbox
    LLM -->|"final text"| Note
    LLM --> Session

An event (CLI --event JSON or SQS message) identifies a GitLab project and MR IID. The config file maps project paths to workflows and provides action, max_iterations, enabled data sources, and token env var names. Both run and serve use this config. The markdown body of the prompt file becomes the LLM system prompt. The agent loop iterates until the model produces a final text response or hits the iteration/context limit. Tool calls are dispatched through the ToolRegistry: sandbox_exec runs shell commands, fetch_to_sandbox pipes data source output into the sandbox, and direct data source calls return results inline (or auto-spill large responses to files).
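The dispatch loop described above can be sketched in a few lines of Python. The `ToolRegistry` shape and the message/reply dictionaries here are simplified assumptions for illustration, not the actual implementation:

```python
# Minimal sketch of the agent loop: iterate until the model returns final
# text or the iteration limit is hit, dispatching tool calls in between.

class ToolRegistry:
    """Maps tool names to handler functions (simplified assumption)."""
    def __init__(self):
        self._tools = {}

    def register(self, name, fn):
        self._tools[name] = fn

    def dispatch(self, name, **kwargs):
        return self._tools[name](**kwargs)

def run_agent(model, registry, system_prompt, max_iterations=30):
    """model(messages) returns {"tool_call": ... or None, "text": ...}."""
    messages = [{"role": "system", "content": system_prompt}]
    for _ in range(max_iterations):
        reply = model(messages)
        if reply.get("tool_call") is None:
            return reply["text"]           # final text response ends the loop
        call = reply["tool_call"]
        result = registry.dispatch(call["name"], **call["args"])
        messages.append({"role": "tool", "content": result})
    return None                            # hit the iteration limit
```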

For a detailed walkthrough of the model loop – what gets sent to the model each iteration, how tool calls flow, the exact wire format, and how user replies integrate for session resumption – see Agent Model Loop.

Event-Driven Triggers

In production, the agent consumes events from an SQS queue subscribed to the central SNS topic. The SNS filter policy delivers three event types:

  • gitlab::pipeline – Fires when a GitLab CI pipeline completes. The agent triggers on status=failed pipelines from merge_request_event sources where the triggering user has at least Developer access on the project. Only workflows with trigger: pipeline are executed. Since the pipeline stays open until all Konflux external stages resolve, this naturally waits for all builds and tests to finish before triggering.

  • gitlab::merge_request – Fires on MR open, reopen, and update events. Only workflows with trigger: merge_request are executed. Draft MRs are skipped; marking a draft as ready triggers a review (the event has draft: false). SHA-based deduplication prevents reviewing the same code revision twice – metadata-only updates (title, label changes) on an already-reviewed MR are silently skipped.

  • gitlab::note – MR comment events. Two sub-flows:

    • Slash command (/hummingbird <workflow-name>): triggers a specific workflow. Prefix matching is supported (e.g. /hummingbird analyze matches analyze-failures). /hummingbird or /hummingbird help lists available workflows. The note author must have Developer+ access on the project. Optional runtime overrides can be appended: /hummingbird code-review model=claude-sonnet-4-6 max_iterations=25.
    • Reply to agent note: when a user replies to an existing agent note (which contains a session marker), the agent loads the previous session from S3 (conversation history + sandbox files) and continues the conversation with the user’s reply as input. The system prompt includes a CONTINUATION_PROMPT that prevents the model from re-running the full workflow. If the session is not found (expired/deleted), the agent falls back to a cold start. Replies may include overrides on a separate line (e.g. /hummingbird model=claude-opus-4-6); override lines are stripped from the user message. Overrides persist in the session until explicitly changed.

    Notes generated by the agent itself (containing session markers) are skipped to prevent infinite loops. Reply threading respects the internal_notes config: if the project requires internal notes but the original thread was public, the reply is posted as a new top-level internal note instead.
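The slash-command handling above (prefix matching plus `key=value` overrides) might be parsed along these lines; the exact tokenization rules are assumptions:

```python
# Hedged sketch of /hummingbird command parsing: prefix-match the workflow
# name, collect key=value runtime overrides, treat bare/help as a listing.

def parse_command(text, workflows):
    """Return (workflow_name_or_None, overrides) for a slash command."""
    parts = text.strip().split()
    if not parts or parts[0] != "/hummingbird":
        return None, {}
    overrides = {k: v for k, v in
                 (p.split("=", 1) for p in parts[1:] if "=" in p)}
    words = [p for p in parts[1:] if "=" not in p]
    if not words or words[0] == "help":
        return None, overrides             # list available workflows
    # prefix match: "analyze" matches "analyze-failures"
    matches = [w for w in workflows if w.startswith(words[0])]
    return (matches[0] if len(matches) == 1 else None), overrides
```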

Both pipeline and merge_request triggers apply per-workflow ignore_users and ignore_branches filtering: usernames and branch names matching any regex pattern in a workflow’s ignore_users or ignore_branches list are skipped. This allows bot-authored MRs or maintenance branches (e.g. chore/*) to be excluded from code review while still analyzing their pipeline failures.

Rate limiting is per-workflow: each workflow’s thread count is tracked independently via JSON session markers that embed the workflow name and commit SHA. The max_runs_per_mr limit applies separately to each workflow on a given MR.

Events flow through a two-stage SQS pipeline. A slim ingress router forwards webhooks from the standard queue to an SQS FIFO queue, grouped by discussion_id for note events (ensuring same-discussion ordering) and by SQS MessageId for pipeline/MR events (no serialization needed). Phase 1 handlers validate the event and resolve the session, then post a continuation message back to the FIFO grouped by session_id. Phase 2 picks up the continuation and runs the workflow. This ensures all work for a given session is serialized even across multiple discussion threads.
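The router's grouping rule can be sketched as follows; the webhook field names are assumptions based on standard GitLab note payloads:

```python
# Sketch of FIFO MessageGroupId selection: note events serialize per
# discussion, pipeline/MR events get a unique group (no serialization).

def message_group_id(event_type, body, sqs_message_id):
    if event_type == "gitlab::note":
        # same-discussion ordering for note events
        return body["object_attributes"]["discussion_id"]
    # pipeline / merge_request events: group by the unique SQS MessageId
    return sqs_message_id
```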

The SQS infrastructure is defined in template.yaml (SAM/CloudFormation): a standard ingress queue (60s visibility timeout for the router hop) and a FIFO work queue (30-minute visibility timeout for workflow execution).

Model Configuration

The agent supports multiple LLM providers via a unified adapter interface.

Gemini

  • API key (local dev) – Set GOOGLE_API_KEY. Calls generativelanguage.googleapis.com directly. No region configuration needed.
  • Vertex AI (production) – Set GOOGLE_CLOUD_PROJECT and configure model_regions in the YAML settings. Uses Application Default Credentials (ADC) via google-auth. For OpenShift, mount a service account key JSON file and set GOOGLE_APPLICATION_CREDENTIALS, or use workload identity.

Gemini uses implicit server-side caching – repeated prefixes are automatically cached by the Vertex AI backend with no opt-in required. Cached input tokens are billed at 25% of the base input rate. The agent tracks cached token counts from API responses for cost estimation.

Anthropic Claude

Claude models are accessed via Vertex AI using the same GCP project (GOOGLE_CLOUD_PROJECT) with the region resolved from model_regions. Enable the desired model in the GCP Model Garden. Use model: claude-sonnet-4-20250514 in workflow config.

Prompt caching is enabled automatically for Claude models. The agent places explicit cache breakpoints on messages so that the full conversation prefix is served from cache on every turn after the first. Cache writes cost 1.25x the base input rate; cache reads cost 0.10x (90% discount). For a typical 20-iteration agent run, this reduces input token costs by approximately 80%. Ephemeral messages (iteration warnings, nudges) are excluded from cache writes to avoid polluting the cache with transient content.
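A back-of-envelope model makes the ~80% figure plausible. Assume each turn replays the full conversation prefix, new tokens are written to cache once at 1.25x, and the replayed prefix is read at 0.10x; the per-turn token count below is an illustrative assumption, not a measured value:

```python
# Rough cost model: compare full-replay input cost with and without caching.

def relative_input_cost(turns, new_tokens_per_turn,
                        write_mult=1.25, read_mult=0.10):
    uncached = cached = 0.0
    prefix = 0
    for _ in range(turns):
        uncached += prefix + new_tokens_per_turn             # full replay at 1x
        cached += prefix * read_mult + new_tokens_per_turn * write_mult
        prefix += new_tokens_per_turn
    return cached / uncached

# A 20-iteration run lands near the ~80% reduction quoted above.
ratio = relative_input_cost(20, 2000)
```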

Region configuration

GCP regions are configured via settings.model_regions in the config file – a map of model name prefixes to GCP regions. The agent resolves the region by longest-prefix match on the effective model name (same algorithm as cost estimation). Example:

settings:
  model_regions:
    gemini-2.5-pro: us-east5
    gemini-3.1-pro: global
    claude: us-east5

At least one of GOOGLE_API_KEY or GOOGLE_CLOUD_PROJECT must be set. If both are set, the API key takes precedence for Gemini models. Claude models always require Vertex AI mode (GOOGLE_CLOUD_PROJECT + model_regions); GOOGLE_API_KEY direct mode is for Gemini only.
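Longest-prefix resolution over `model_regions` can be sketched as:

```python
# Pick the region whose key is the longest prefix of the effective model
# name; returns None when nothing matches (simplified sketch).

def resolve_region(model_name, model_regions):
    best = None
    for prefix, region in model_regions.items():
        if model_name.startswith(prefix):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, region)
    return best[1] if best else None
```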

Prerequisites

  • Python 3.11+
  • Podman (for local sandbox) or kubectl (for K8s sandbox)
  • Authentication: GOOGLE_API_KEY or GOOGLE_CLOUD_PROJECT (see above)
  • For Claude models: enable the desired model in GCP Model Garden
  • GitLab tokens referenced in the config file (model tool tokens, orchestrator tokens)
  • Kubeconfig with access to the Konflux cluster (if using Konflux data sources)

Installation

cd hummingbird-agent
pip install -e .

Usage

Local development (run)

The run command uses the config file for workflow lookup, data source registration, and token resolution – the same code path as serve. By default, results are printed to stdout (dry-run). Use --execute to post the result as a GitLab MR note.

There are two ways to select what to run:

Direct workflow selection (--workflow + --project + --event):

# Run a specific workflow on an MR, Podman sandbox, print to stdout
hummingbird-agent run \
    --workflow analyze-failures \
    --project org/group/project \
    --event '{"iid": 123, "sha": "abc123"}'

# Same but with K8s sandbox
hummingbird-agent run \
    --workflow analyze-failures \
    --project org/group/project \
    --event '{"iid": 123, "sha": "abc123"}' \
    --context my-cluster/my-namespace

# Post the result as a GitLab MR note (also saves session to S3 if configured)
hummingbird-agent run \
    --workflow analyze-failures \
    --project org/group/project \
    --event '{"iid": 123, "sha": "abc123"}' \
    --execute

# Save session locally for debugging (context.json, transcript.md, sandbox.tar.gz)
hummingbird-agent run \
    --workflow analyze-failures \
    --project org/group/project \
    --event '{"iid": 123, "sha": "abc123"}' \
    --save-session /tmp/my-session

# Resume from a saved session with a follow-up question
hummingbird-agent run \
    --workflow analyze-failures \
    --project org/group/project \
    --event '{"iid": 123, "sha": "abc123"}' \
    --resume-session /tmp/my-session \
    --message "Can you look at the clair-scan timeout more closely?"

# Chain: resume and save the new session for another round
hummingbird-agent run \
    --event-file event.json \
    --resume-session /tmp/my-session \
    --message "What layer hash failed?" \
    --save-session /tmp/my-session-2

Event-file replay (--event-file): routes the event through the same project-index lookup as serve, but skips rate limiting and status filtering:

# Replay a real webhook event, dry-run with Podman
hummingbird-agent run \
    --event-file event.json

# Replay with K8s sandbox and post notes
hummingbird-agent run \
    --event-file event.json \
    --context my-cluster/my-namespace \
    --execute

Production (serve)

# Poll SQS queue for events (pool sandbox by default, posts results as MR notes)
CONFIG_PATH=config.yml hummingbird-agent serve

Requires CONFIG_PATH pointing to a config file with settings.sqs_queue_url and settings.sqs_fifo_queue_url set. The config file defines which workflows run on which projects, with per-project limits and data source token mappings. Handles SIGTERM/SIGINT for graceful shutdown. A background router thread forwards events from the standard queue to the FIFO; the main thread consumes the FIFO with a semaphore-gated thread pool (controlled by settings.max_concurrent_agents) so excess messages stay in SQS for other instances.

Config hot-reload: In serve mode a background thread polls the config file for changes (every 5 seconds by default). When the file changes, the new config is validated and atomically swapped in – subsequent event dispatches use the updated config. If the new config is invalid, the previous config is kept and a warning is logged. No restart required for config changes.
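The validate-then-swap step can be sketched like this; the class and method names are illustrative, not the actual implementation:

```python
# Sketch of hot-reload: poll the file's mtime, validate the new config,
# and swap it in with a single attribute rebind (atomic in CPython).
import os

class ConfigHolder:
    def __init__(self, path, load, validate):
        self.path, self.load, self.validate = path, load, validate
        self.config = load(path)
        self._mtime = os.path.getmtime(path)

    def poll_once(self):
        mtime = os.path.getmtime(self.path)
        if mtime == self._mtime:
            return False                  # unchanged
        self._mtime = mtime
        try:
            new = self.load(self.path)
            self.validate(new)
        except Exception:
            return False                  # keep previous config, log a warning
        self.config = new                 # atomic swap
        return True
```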

The serve command also accepts --sandbox, --context, and --namespace for local development with a different sandbox backend (e.g. Podman).

CLI Options

run subcommand:

Option Description Default
--event Inline event JSON string (mutually exclusive with --event-file) -
--event-file Read event from a JSON file (mutually exclusive with --event) -
--workflow Workflow name from config file (requires --project) -
--project GitLab project path (requires --workflow) -
--execute Post result as GitLab MR note and save session to S3 -
--save-session Save session artifacts to this directory -
--resume-session Resume from a saved session directory (requires --message) -
--message Follow-up message for session resumption (requires --resume-session) -
--sandbox Sandbox backend (podman, k8s, or k8spool) podman
--context K8s context (implies K8s backend) -
--namespace K8s namespace -
-v, --verbose Enable debug logging -

serve subcommand:

Option Description Default
--sandbox Sandbox backend (podman, k8s, or k8spool) k8spool
--context K8s context (implies K8s backend) -
--namespace K8s namespace -
-v, --verbose Enable debug logging -

Configuration

Config file

Both run and serve use a single YAML config file. Set the path via CONFIG_PATH (default: config.example.yaml). The config has two sections: settings for operational parameters and workflows for workflow definitions. Most settings keys have sensible defaults and can be omitted for local development.

settings:
  gitlab_url: https://gitlab.com                # GitLab instance URL
  sandbox:                                      # sandbox pod configuration
    image: quay.io/.../gitlab-ci:latest         #   container image (k8s mode only)
    namespace: default                          #   K8s namespace (required for K8s/pool)
    active_deadline_seconds: 1800               #   pod hard timeout / reap interval
    linger_seconds: 300                         #   keep pod alive after success (pool mode, 0=disable)
    max_lingering_pods: 2                       #   max idle lingering pods before eviction (pool mode)
    metadata:                                   #   pod metadata (k8s mode only)
      labels:
        app.kubernetes.io/name: hummingbird-agent-sandbox
    resources:                                  #   K8s resource requests/limits (k8s mode)
      requests:
        cpu: "100m"
        memory: "256Mi"
        ephemeral-storage: "256Mi"
      limits:
        cpu: "1"
        memory: "1Gi"
        ephemeral-storage: "2Gi"
  max_concurrent_agents: 4                      # max concurrent workflows (serve)
  sqs_queue_url: ""                             # standard SQS ingress queue URL (serve)
  sqs_fifo_queue_url: ""                        # FIFO work queue URL (serve)
  s3_session_bucket: ""                         # S3 bucket for session persistence
  model: gemini-3.1-pro-preview  # or claude-sonnet-4-20250514
  model_regions:                                 # model prefix -> GCP region
    gemini-2.5-pro: us-east5
    gemini-3.1-pro: global
    claude: us-east5
  max_iterations: 30                             # default iteration limit
  max_runs_per_mr: 5                             # default per-MR rate limit
  internal_notes: true                           # default note visibility
  docs_url: https://gitlab.com/org/group/project/-/blob/main/docs/agent.md
  source_url: https://gitlab.com/org/group/project
  slack_url: https://slack.example.com/archives/C0123456789
  slack_label: "#my-channel"

workflows:
  code-review:
    trigger: merge_request                     # auto-trigger on MR events
    description: Performs AI-powered code review
    workflow_url: https://gitlab.com/org/group/project/-/blob/main/workflows/code-review.md
    ignore_users:                              # skip bot MRs (regex fullmatch)
      - "renovate\\[bot\\]"
      - ".*-bot"
    ignore_branches:                           # skip maintenance branches (regex fullmatch)
      - "chore/.*"
    prompt: workflows/code-review.md
    action: post_gitlab_note
    model: gemini-3.1-pro-preview  # or claude-sonnet-4-20250514
    max_iterations: 15
    max_inline_size: 200000                    # keep full diffs in context
    context_limit: 500000                      # Gemini 3.1 Pro / Claude Sonnet 4 have large context windows
    data_sources:
      gitlab:
        token_env: GITLAB_TOKEN_RO
    projects:
      org/group/project: {}

  analyze-failures:
    trigger: pipeline                          # auto-trigger on failed pipelines
    description: Investigates CI/CD pipeline failures
    auto_resolve_on_push: true                 # resolve threads when a new SHA is pushed
    auto_resolve_on_success: true              # resolve threads when pipeline succeeds
    workflow_url: https://gitlab.com/org/group/project/-/blob/main/workflows/analyze-failures.md
    prompt: workflows/analyze-failures.md       # relative to config file dir
    action: post_gitlab_note
    model: gemini-3.1-pro-preview  # or claude-sonnet-4-20250514
    max_iterations: 50                          # per-workflow iteration override

    data_sources:                               # model tool tokens (read-only)
      gitlab:
        token_env: GITLAB_TOKEN_RO              # env var name, not the token
      konflux:
        cluster_url: https://example.com:6443/ns/my-tenant
        kubeconfig_env: KUBECONFIG
        kubearchive_url: https://kubearchive-api-server-product-kubearchive.apps.example.com
      testing_farm: {}

    projects:
      redhat/hummingbird/containers:
        tokens:                                 # per-project model token overrides
          gitlab: GITLAB_TOKEN_CONTAINERS_RO

With --workflow/--project, the workflow and project are looked up directly in the config. With --event-file, the project is extracted from the event body and matched against the project index to find applicable workflows.

Discussion threads

All workflow results are posted as discussion threads: a placeholder note starts the discussion and the full result is posted as a reply. The placeholder is never edited, so email notifications include the actual result text. For slash commands, the result is posted as a reply in the triggering discussion.

Auto-resolve

Workflows can opt in to automatic resolution of their discussion threads:

  • auto_resolve_on_push (default false): When a new commit is pushed to the MR (i.e. a merge_request event with action: update), all agent discussion threads for the workflow whose SHA differs from the new HEAD are resolved. This clears stale failure analyses when the developer pushes a fix.
  • auto_resolve_on_success (default false): When the head pipeline succeeds, all agent discussion threads for the workflow on that MR are resolved (regardless of SHA). This handles pipeline reruns on the same SHA where a transient failure is now green.

Both flags are independent and can be combined. Resolution runs before rate limit checks, so threads are resolved even if the workflow’s per-MR run limit has been reached.
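The selection rule for which threads to resolve can be sketched as follows; thread markers carry the workflow name and SHA (per the rate-limiting section), but the field names here are assumptions:

```python
# Sketch of auto-resolve: on success, resolve all of the workflow's threads;
# on push, resolve only threads whose SHA differs from the new HEAD.

def threads_to_resolve(threads, workflow, head_sha,
                       on_push=False, on_success=False):
    out = []
    for t in threads:
        if t["workflow"] != workflow or t["resolved"]:
            continue
        if on_success:                         # pipeline green: resolve all
            out.append(t["id"])
        elif on_push and t["sha"] != head_sha:  # stale SHA after a push
            out.append(t["id"])
    return out
```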

The first agent-authored note (placeholder) includes a footer with links to documentation, source code, the Slack channel, the workflow prompt, and a continuation prompt. These links are configured via global settings:

Setting Description
settings.docs_url Link to agent documentation
settings.source_url Link to agent source repository
settings.slack_url Link to support Slack channel
settings.slack_label Display text for Slack link (default: “Slack”)

Per-workflow, set workflow_url to link to the workflow’s prompt file. The footer stays on the placeholder and the result is a separate reply.

Token separation

Tokens are split into two categories that never mix:

  • Model tool tokens (in YAML data_sources / tokens): read-only tokens passed to the LLM’s tool calls. Declared in the config file as env var names. These are user-defined and resolved at runtime from the referenced env vars. Create as project access tokens with Reporter role and read_api scope. Reporter is the minimum role required to see internal (confidential) notes in the discussions tool.
  • Orchestrator tokens (ORCHESTRATOR_* env vars, NOT in YAML): write-capable tokens used by the runner for posting/editing notes and rate-limit counting. Create as project access tokens with Developer role and api scope. Developer role is required because the discussions tool trust-filters notes by author access level (>= Developer); if the orchestrator bot has only Reporter access, its own notes are redacted. Resolved by convention: ORCHESTRATOR_GITLAB_TOKEN_<MANGLED_PROJECT> (per-project) or ORCHESTRATOR_GITLAB_TOKEN (fallback). The ORCHESTRATOR_ prefix makes these impossible to confuse with model tokens.
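Orchestrator token resolution by convention might look like this; the exact mangling rule (uppercase, non-alphanumerics to underscores) is an assumption:

```python
# Sketch: try the per-project env var first, then fall back to the global
# ORCHESTRATOR_GITLAB_TOKEN.
import os

def resolve_orchestrator_token(project_path, env=os.environ):
    mangled = "".join(c if c.isalnum() else "_" for c in project_path).upper()
    return (env.get(f"ORCHESTRATOR_GITLAB_TOKEN_{mangled}")
            or env.get("ORCHESTRATOR_GITLAB_TOKEN"))
```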

Environment Variables

The agent reads only secrets and authentication from environment variables. All operational settings come from the config file’s settings section.

Variable Required Default Description
CONFIG_PATH no config.example.yaml Path to config YAML
GOOGLE_API_KEY yes* - Gemini API key; Gemini direct mode only (not for Claude)
GOOGLE_CLOUD_PROJECT yes* - GCP project ID (Vertex AI mode); required for Claude models
ORCHESTRATOR_GITLAB_TOKEN serve - Orchestrator GitLab token (global fallback)
ORCHESTRATOR_GITLAB_TOKEN_<PROJECT> no - Per-project orchestrator token
SENTRY_DSN no - Sentry DSN for error tracking

*One of GOOGLE_API_KEY or GOOGLE_CLOUD_PROJECT is required. Claude models require Vertex AI (GOOGLE_CLOUD_PROJECT); GOOGLE_API_KEY is for Gemini direct mode only.

Model tool tokens (e.g. GITLAB_TOKEN_RO, GITLAB_TOKEN_CONTAINERS_RO) and data source credentials (e.g. KONFLUX_CLUSTER_URL, KUBECONFIG) are referenced by name in the config file’s data_sources and tokens sections. They are not listed in the table above because their names are user-defined.

Security and Design Constraints

The agent is designed to run in a shared OpenShift cluster without cluster-admin access, processing potentially untrusted merge requests. These constraints shaped the architecture:

Sandbox isolation. All arbitrary commands executed by the LLM run inside an ephemeral container, never on the host:

  • Podman (local): --network=none, --user 65532, no host mounts. Complete network isolation.
  • Kubernetes (production): Pods comply with OpenShift’s restricted-v2 Security Context Constraint: runAsNonRoot, seccompProfile: RuntimeDefault, allowPrivilegeEscalation: false, capabilities.drop: ["ALL"], automountServiceAccountToken: false (no K8s API access from sandbox), activeDeadlineSeconds (configurable, default 1800). Security context fields are hardcoded in the pod manifest for portability to vanilla Kubernetes with Pod Security Admission (restricted level). Resource requests/limits, metadata, and activeDeadlineSeconds are configurable via settings.sandbox in the config file. Network access is denied by a NetworkPolicy on the sandbox namespace that blocks all egress from all pods (podSelector: {}).

No cluster-admin required. The agent operates with namespace-scoped permissions only. The orchestrator’s ServiceAccount needs only:

  • pods: create, get, list, delete, patch – sandbox pod lifecycle and pool claims
  • pods/exec: create – command execution via kubectl exec

These permissions are granted via a Role in the sandbox namespace, not the orchestrator’s own namespace. No CRDs, no custom runtimes, no cluster-scoped resources. Konflux data is fetched via bearer token from kubeconfig, not from inside the cluster.

Namespace separation. Sandbox pods are created in a dedicated namespace, separate from the orchestrator. This limits blast radius: even if a sandbox pod is compromised, it has no visibility into the orchestrator’s Secrets, Pods, or ServiceAccount tokens. The sandbox namespace is locked down with standard K8s resources:

  • RBAC: Role + RoleBinding scoped to the namespace, granting only the permissions above to the orchestrator’s ServiceAccount
  • NetworkPolicy: uses podSelector: {} to select all pods in the dedicated sandbox namespace, denying all egress (egress: []). The sandbox cannot reach the internet, the K8s API, or other pods.
  • activeDeadlineSeconds: sandbox pods self-terminate after the configured timeout (default 1800s / 30 minutes) even if the orchestrator crashes or is killed, preventing orphaned pods

Credential separation. Data source credentials (GitLab tokens, kubeconfig) live in the orchestrator process only, injected via K8s Secrets. The sandbox container has no credentials, no SA token (automountServiceAccountToken: false), and no network access. Data flows into the sandbox via stdin piping through write_file.

Command execution via kubectl exec. The K8s sandbox uses a hybrid approach: the Kubernetes Python client manages pod lifecycle (create, wait, delete), while kubectl exec handles command execution. This avoids the complexity and reliability issues of the websocket-based exec API.

Sandbox Backends

  • Podman (local) – Start: podman run -d --network=none; Exec: podman exec; Auth: local Podman socket; Network: none (--network=none); User: 65532 (fixed); Cleanup: podman rm -f
  • K8s (direct) – Start: create_namespaced_pod; Exec: kubectl exec; Auth: in-cluster SA or kubeconfig; Network: none (deny-all NetworkPolicy); User: namespace UID range (SCC); Cleanup: delete_namespaced_pod
  • K8sPool (production default) – Start: claim standby pod from Deployment; Exec: kubectl exec; Auth: in-cluster SA or kubeconfig; Network: none (deny-all NetworkPolicy); User: namespace UID range (SCC); Cleanup: delete_namespaced_pod

All three implement the Sandbox protocol: start(), exec(), write_file(), read_file(), cleanup(), linger().

The pool backend (k8spool) eliminates pod startup latency by claiming pre-warmed pods from a Kubernetes Deployment. Claimed pods are detached from the ReplicaSet and the Deployment automatically creates replacements. After a successful workflow, pool pods linger for linger_seconds (default 300, configurable, 0 to disable) so that user replies can reuse the same pod without re-creating it or restoring from S3. The reaper runs once per workflow execution and deletes expired lingering pods. It also evicts excess lingering pods beyond max_lingering_pods (default 2), starting with those closest to their deadline. See the design doc (section 8.8) for details.
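The reaper's selection order can be sketched as follows (the data shape is a simplification; the actual implementation tracks pods via the K8s API):

```python
# Sketch: delete expired lingering pods, then evict excess pods beyond
# max_lingering, starting with those closest to their deadline.

def pods_to_delete(lingering, now, max_lingering=2):
    """lingering: list of (pod_name, deadline) pairs; deadlines are epoch times."""
    expired = [name for name, deadline in lingering if deadline <= now]
    alive = sorted((d, n) for n, d in lingering if d > now)
    excess = [n for _, n in alive[:max(0, len(alive) - max_lingering)]]
    return expired + excess
```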

Data Sources

Data sources are registered as tool-calling functions. The model invokes them by name; the orchestrator executes them and returns results (or auto-spills large responses to the sandbox).

GitLab

Tool Description
gitlab_get_mr_details MR metadata (title, author, state, SHA, labels)
gitlab_get_mr_unified_diff Complete unified diff in patch format
gitlab_get_mr_diff Per-file structured change data
gitlab_get_mr_commits List of commits in a merge request
gitlab_get_mr_discussions Discussion threads with redacted agent transcripts and trust-filtered comments
gitlab_get_commit_statuses CI/CD pipeline statuses for a commit
gitlab_get_file_at_ref Raw file content at a git ref
gitlab_get_repo_archive Repository tar.gz (binary, auto-spilled)
gitlab_get_job_log CI job trace output (ANSI codes stripped)

Konflux

Fetches Tekton PipelineRuns and TaskRuns from both the live K8s API and Kubearchive (for completed resources), with deduplication by UID.

Tool Description
konflux_list_pipelineruns All PipelineRuns for a commit SHA
konflux_list_taskruns All TaskRuns for a commit SHA
konflux_list_pods All pods for a commit SHA
konflux_get_pod Full pod resource (spec, status, conditions, container statuses)
konflux_get_pod_log Pod logs; optional container param, fetches all containers when omitted

Response metadata includes konflux_ui base URL for building reviewer-facing links.

Testing Farm

Tool Description
tf_get_results JUnit XML results for a request ID
tf_get_test_log Individual test log by URL (restricted to Testing Farm artifact URLs)
tf_get_request_status Request state, queue/run times

Response metadata includes artifacts_base URL for building artifact links.

Workflow System

Workflows are .md files whose content becomes the LLM system prompt verbatim. Workflow metadata (action, model, max_iterations, enabled data sources, project allowlists) is defined in the config file. The .md file is pure system prompt text.

Available workflows:

  • analyze-failures.md - Investigates CI/CD pipeline failures by fetching MR details, identifying failed pipelines via commit statuses, retrieving PipelineRuns/TaskRuns from Konflux, analyzing test results from Testing Farm, and producing a grouped root-cause report with reviewer-facing URLs.
  • code-review.md - Performs AI-powered code review by fetching MR details, the unified diff, and prior discussion threads in parallel, then producing structured feedback with severity ratings, code examples, and actionable suggestions. On follow-up reviews (after SHA updates), the agent sees its own previous findings, developer responses, and resolved threads – avoiding duplicate findings and respecting developer explanations. Uses elevated max_inline_size (200 KB) and context_limit (500K tokens) to keep the full diff in context.

Token Budget Management

Both Gemini and Claude benefit from prompt caching that reduces the effective cost of full history replay. Cached token counts from both providers feed into the estimate_cost() calculation. See Agent Model Loop – Prompt caching for details on how caching works per provider.

The agent uses a dual-limit approach instead of a cumulative token budget:

  • Iteration limit (settings.max_iterations, default 30, overridable per-workflow) - Hard cap on tool-calling rounds. A wrap-up prompt is injected at 80% (SOFT_ITERATION_RATIO).
  • Context limit (CONTEXT_LIMIT, default 60,000 tokens, overridable per-workflow via context_limit) - Per-call input token ceiling. When exceeded, a wrap-up prompt forces the model to finalize.

Large outputs are automatically redirected to sandbox files to keep the LLM context small. The spill threshold defaults to 4 KB (MAX_INLINE_SIZE) but can be overridden per-workflow via max_inline_size in the config:

  • sandbox_exec - stdout/stderr exceeding the threshold saved to /tmp/_out/{N}.txt; model receives a preview (head + tail) with file path
  • Data sources - text exceeding the threshold saved to /tmp/_out/{name}_{N}.txt with preview; binary data saved to .bin
  • fetch_to_sandbox - always writes to the caller-specified path; returns metadata only
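The spill decision can be sketched with the constants from the table below; the preview format and file-writing callback are simplified assumptions:

```python
# Sketch of auto-spill: outputs above MAX_INLINE_SIZE go to a sandbox file
# and the model receives a head+tail preview containing the file path.

MAX_INLINE_SIZE = 4096
OUTPUT_PREVIEW_BYTES = 4096
OUTPUT_TAIL_BYTES = 512

def spill_if_large(text, path, write_file):
    if len(text) <= MAX_INLINE_SIZE:
        return text                       # small enough: return inline
    write_file(path, text)                # spill full output to the sandbox
    head = text[:OUTPUT_PREVIEW_BYTES - OUTPUT_TAIL_BYTES]
    tail = text[-OUTPUT_TAIL_BYTES:]
    return f"{head}\n...[truncated, full output in {path}]...\n{tail}"
```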

All tuning constants are centralized in config.py:

Constant Default Purpose
DEFAULT_MAX_ITERATIONS 30 Hard iteration cap
SOFT_ITERATION_RATIO 0.8 Inject wrap-up at this fraction
CONTEXT_LIMIT 60,000 Per-call input token ceiling
OUTPUT_PREVIEW_BYTES 4,096 Preview size for spilled outputs
OUTPUT_TAIL_BYTES 512 Extra tail appended to previews
MAX_INLINE_SIZE 4,096 Max inline size for data source responses

Development

See the main README for development workflows.

make hummingbird-agent/setup  # Install dependencies
make check                    # Lint code (ruff)
make fmt                      # Format code
make test                     # Run unit tests
make coverage                 # Run tests with coverage

License

This project is licensed under the GNU General Public License v3.0 or later - see the LICENSE file for details.