# Hummingbird Agent
An event-driven LLM agent that investigates CI/CD failures and posts findings as GitLab merge request notes. The agent executes markdown-defined workflows using tool calling, with all data processing running inside an isolated sandbox container.
For the architectural design rationale, module boundaries, security invariants, and design decision registry, see Agent Design. For the model loop wire format, see Agent Model Loop.
```mermaid
flowchart TD
    Pipeline["Pipeline Fails"]
    MREvent["MR Created / Updated"]
    Slash["/hummingbird command"]
    Pipeline -->|"event"| Agent
    MREvent -->|"event"| Agent
    Slash -->|"command"| Agent
    subgraph Agent ["Hummingbird Agent"]
        APIs["GitLab + Konflux +<br/>Testing Farm"]
        Model["LLM<br/>(Gemini / Claude)"]
        subgraph Sandbox ["Isolated Sandbox (no network)"]
            Tools["jq / python3 / yq<br/>data processing"]
        end
        APIs <-->|"data"| Model
        Model <-->|"commands"| Tools
    end
    Note["MR Note<br/>(analysis or review)"]
    Agent -->|"posts"| Note
    Note -->|"reply to continue"| Agent
```
## Features
- **Workflow-driven analysis** – Investigation logic lives in `.md` files, not in code; easy to iterate without redeployment
- **Centralized YAML config** – A single config file defines operational settings, workflows, enabled data sources with token env var names, project allowlists, and per-project limits
- **Sandboxed execution** – All untrusted commands (`jq`, `python3`, shell) run in an isolated container, never on the host
- **Three sandbox backends** – Podman for local development (network-isolated), direct K8s pod creation, or a Deployment-backed pool for low-latency production use (restricted-v2 SCC compliant)
- **Data source abstraction** – GitLab, Konflux, and Testing Farm are registered as tool-calling functions the model invokes directly
- **Auto-spill for large outputs** – Stdout/stderr and data source responses exceeding 4 KB are automatically saved to sandbox files with a compact preview returned to the model, keeping context usage bounded
- **Prompt caching** – Gemini uses implicit server-side caching automatically; Claude uses explicit sliding-window cache breakpoints that reduce input token costs by ~80% on multi-turn agent runs
- **Token budget management** – Per-call context ceiling and iteration-based soft/hard limits prevent runaway sessions
- **Session persistence** – Conversation history, transcript, and sandbox files saved to S3 (production) or a local directory (development) for debugging and future session resumption. Sessions are stored in a provider-neutral format, enabling model switching between conversations.
## Architecture
```mermaid
flowchart LR
    subgraph input [Input]
        SQS["SQS Queue"]
        CLI["CLI --event"]
    end
    subgraph agentLoop [Agent Loop]
        WF["Workflow .md<br/>(system prompt)"]
        LLM["LLM<br/>(Gemini / Claude)"]
        TR["Tool Registry"]
    end
    subgraph tools [Tools]
        SE["sandbox_exec"]
        FTS["fetch_to_sandbox"]
        DS["Data Sources"]
    end
    subgraph sandbox [Sandbox Container]
        JQ["jq / python3 / yq"]
        Files["Spilled files"]
    end
    subgraph external [External APIs]
        GL["GitLab API"]
        KX["Konflux K8s"]
        TF["Testing Farm"]
    end
    subgraph output [Output]
        Note["GitLab MR Note"]
        Session["Session State"]
    end
    SQS --> WF
    CLI --> WF
    WF --> LLM
    LLM -->|"tool calls"| TR
    TR --> SE --> sandbox
    TR --> FTS --> sandbox
    TR --> DS
    DS --> GL
    DS --> KX
    DS --> TF
    DS -->|"auto-spill"| sandbox
    LLM -->|"final text"| Note
    LLM --> Session
```
An event (CLI `--event` JSON or SQS message) identifies a GitLab project and
MR IID. The config file maps project paths to workflows and provides the
`action`, `max_iterations`, enabled data sources, and token env var names.
Both `run` and `serve` use this config. The markdown body of the prompt file
becomes the LLM system prompt. The agent
loop iterates until the model produces a final text response or hits the
iteration/context limit. Tool calls are dispatched through the `ToolRegistry`:
`sandbox_exec` runs shell commands, `fetch_to_sandbox` pipes data source
output into the sandbox, and direct data source calls return results inline
(or auto-spill large responses to files).
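As a rough sketch of the loop described above (not the actual implementation – `model.generate` and `registry.dispatch` are illustrative names, not the real API):

```python
def run_agent_loop(model, registry, system_prompt, max_iterations=30):
    """Iterate until the model returns a final text response or the
    iteration limit is hit. Hypothetical interfaces for illustration."""
    messages = [{"role": "system", "content": system_prompt}]
    for _ in range(max_iterations):
        reply = model.generate(messages)
        if not reply.tool_calls:          # final text response ends the loop
            return reply.text
        for call in reply.tool_calls:     # dispatch through the tool registry
            result = registry.dispatch(call.name, call.args)
            messages.append({"role": "tool", "name": call.name, "content": result})
    return None  # iteration limit reached without a final answer
```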
For a detailed walkthrough of the model loop – what gets sent to the model each iteration, how tool calls flow, the exact wire format, and how user replies integrate for session resumption – see Agent Model Loop.
## Event-Driven Triggers
In production, the agent consumes events from an SQS queue subscribed to the central SNS topic. The SNS filter policy delivers three event types:
- `gitlab::pipeline` – Fires when a GitLab CI pipeline completes. The agent triggers on `status=failed` pipelines from `merge_request_event` sources where the triggering user has at least Developer access on the project. Only workflows with `trigger: pipeline` are executed. Since the pipeline stays open until all Konflux external stages resolve, this naturally waits for all builds and tests to finish before triggering.
- `gitlab::merge_request` – Fires on MR open, reopen, and update events. Only workflows with `trigger: merge_request` are executed. Draft MRs are skipped; marking a draft as ready triggers a review (the event has `draft: false`). SHA-based deduplication prevents reviewing the same code revision twice – metadata-only updates (title, label changes) on an already-reviewed MR are silently skipped.
- `gitlab::note` – MR comment events. Two sub-flows:
  - Slash command (`/hummingbird <workflow-name>`): triggers a specific workflow. Prefix matching is supported (e.g. `/hummingbird analyze` matches `analyze-failures`). `/hummingbird` or `/hummingbird help` lists available workflows. The note author must have Developer+ access on the project. Optional runtime overrides can be appended: `/hummingbird code-review model=claude-sonnet-4-6 max_iterations=25`.
  - Reply to agent note: when a user replies to an existing agent note (which contains a session marker), the agent loads the previous session from S3 (conversation history + sandbox files) and continues the conversation with the user's reply as input. The system prompt includes a `CONTINUATION_PROMPT` that prevents the model from re-running the full workflow. If the session is not found (expired/deleted), the agent falls back to a cold start. Replies may include overrides on a separate line (e.g. `/hummingbird model=claude-opus-4-6`); override lines are stripped from the user message. Overrides persist in the session until explicitly changed.

  Notes generated by the agent itself (containing session markers) are skipped to prevent infinite loops. Reply threading respects the `internal_notes` config: if the project requires internal notes but the original thread was public, the reply is posted as a new top-level internal note instead.
Both pipeline and merge_request triggers apply per-workflow `ignore_users`
and `ignore_branches` filtering: usernames and branch names matching any
regex pattern in a workflow's `ignore_users` or `ignore_branches` list are
skipped. This allows bot-authored MRs or maintenance branches (e.g.
`chore/*`) to be excluded from code review while still analyzing their
pipeline failures.
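A minimal sketch of this filter, assuming patterns are applied with `re.fullmatch` as the config comments indicate (`is_ignored` is a hypothetical helper):

```python
import re

def is_ignored(workflow_cfg: dict, username: str, branch: str) -> bool:
    """Apply per-workflow ignore_users / ignore_branches filtering.
    A full regex match on either field skips the event."""
    for pattern in workflow_cfg.get("ignore_users", []):
        if re.fullmatch(pattern, username):
            return True
    for pattern in workflow_cfg.get("ignore_branches", []):
        if re.fullmatch(pattern, branch):
            return True
    return False

# Example patterns, matching the config sample in this README
cfg = {"ignore_users": [r"renovate\[bot\]", r".*-bot"],
       "ignore_branches": [r"chore/.*"]}
```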
Rate limiting is per-workflow: each workflow's thread count is tracked
independently via JSON session markers that embed the workflow name and
commit SHA. The `max_runs_per_mr` limit applies separately to each
workflow on a given MR.
Events flow through a two-stage SQS pipeline. A slim ingress router
forwards webhooks from the standard queue to an SQS FIFO queue, grouped
by `discussion_id` for note events (ensuring same-discussion ordering)
and by SQS `MessageId` for pipeline/MR events (no serialization needed).
Phase 1 handlers validate the event and resolve the session, then post a
continuation message back to the FIFO grouped by `session_id`. Phase 2
picks up the continuation and runs the workflow. This ensures all work
for a given session is serialized even across multiple discussion threads.
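The ingress grouping rule can be sketched as follows (the dict field names are illustrative, not the real event schema):

```python
def fifo_group_id(event: dict) -> str:
    """Choose the SQS FIFO MessageGroupId for a routed webhook.
    Note events group by discussion_id so same-discussion messages stay
    ordered; pipeline/MR events use the unique SQS MessageId, so no two
    share a group and nothing is serialized."""
    if event["type"] == "gitlab::note":
        return event["discussion_id"]
    return event["message_id"]
```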
The SQS infrastructure is defined in `template.yaml` (SAM/CloudFormation):
a standard ingress queue (60 s visibility timeout for the router hop) and
a FIFO work queue (30-minute visibility timeout for workflow execution).
## Model Configuration
The agent supports multiple LLM providers via a unified adapter interface.
### Gemini
- **API key (local dev)** – Set `GOOGLE_API_KEY`. Calls `generativelanguage.googleapis.com` directly. No region configuration needed.
- **Vertex AI (production)** – Set `GOOGLE_CLOUD_PROJECT` and configure `model_regions` in the YAML settings. Uses Application Default Credentials (ADC) via `google-auth`. For OpenShift, mount a service account key JSON file and set `GOOGLE_APPLICATION_CREDENTIALS`, or use workload identity.
Gemini uses implicit server-side caching – repeated prefixes are automatically cached by the Vertex AI backend with no opt-in required. Cached input tokens are billed at 25% of the base input rate. The agent tracks cached token counts from API responses for cost estimation.
### Anthropic Claude
Claude models are accessed via Vertex AI using the same GCP project
(`GOOGLE_CLOUD_PROJECT`) with the region resolved from `model_regions`.
Enable the desired model in the GCP Model Garden. Use
`model: claude-sonnet-4-20250514` in the workflow config.
Prompt caching is enabled automatically for Claude models. The agent places explicit cache breakpoints on messages so that the full conversation prefix is served from cache on every turn after the first. Cache writes cost 1.25x the base input rate; cache reads cost 0.10x (90% discount). For a typical 20-iteration agent run, this reduces input token costs by approximately 80%. Ephemeral messages (iteration warnings, nudges) are excluded from cache writes to avoid polluting the cache with transient content.
### Region configuration
GCP regions are configured via `settings.model_regions` in the config file –
a map of model name prefixes to GCP regions. The agent resolves the region by
longest-prefix match on the effective model name (same algorithm as cost
estimation). Example:
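The longest-prefix resolution can be sketched like this, assuming a plain `startswith` comparison (`resolve_region` is a hypothetical helper, not the agent's real function):

```python
def resolve_region(model_regions: dict, model_name: str):
    """Return the region for the longest configured prefix that matches
    the effective model name, or None if no prefix matches."""
    best = None
    for prefix, region in model_regions.items():
        if model_name.startswith(prefix) and (best is None or len(prefix) > len(best[0])):
            best = (prefix, region)
    return best[1] if best else None

# The example map from the README snippet above
regions = {"gemini-2.5-pro": "us-east5", "gemini-3.1-pro": "global", "claude": "us-east5"}
```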
```yaml
settings:
  model_regions:
    gemini-2.5-pro: us-east5
    gemini-3.1-pro: global
    claude: us-east5
```
At least one of `GOOGLE_API_KEY` or `GOOGLE_CLOUD_PROJECT` must be set. If both
are set, the API key takes precedence for Gemini models. Claude models always
require Vertex AI mode (`GOOGLE_CLOUD_PROJECT` + `model_regions`);
`GOOGLE_API_KEY` direct mode is for Gemini only.
## Prerequisites
- Python 3.11+
- Podman (for local sandbox) or `kubectl` (for K8s sandbox)
- Authentication: `GOOGLE_API_KEY` or `GOOGLE_CLOUD_PROJECT` (see above)
- For Claude models: enable the desired model in GCP Model Garden
- GitLab tokens referenced in the config file (model tool tokens, orchestrator tokens)
- Kubeconfig with access to the Konflux cluster (if using Konflux data sources)
## Installation
```shell
cd hummingbird-agent
pip install -e .
```
## Usage
### Local development (`run`)
The `run` command uses the config file for workflow lookup, data source
registration, and token resolution – the same code path as `serve`. By
default, results are printed to stdout (dry-run). Use `--execute` to post
the result as a GitLab MR note.
There are two ways to select what to run:

**Direct workflow selection** (`--workflow` + `--project` + `--event`):
```shell
# Run a specific workflow on an MR, Podman sandbox, print to stdout
hummingbird-agent run \
  --workflow analyze-failures \
  --project org/group/project \
  --event '{"iid": 123, "sha": "abc123"}'

# Same but with K8s sandbox
hummingbird-agent run \
  --workflow analyze-failures \
  --project org/group/project \
  --event '{"iid": 123, "sha": "abc123"}' \
  --context my-cluster/my-namespace

# Post the result as a GitLab MR note (also saves session to S3 if configured)
hummingbird-agent run \
  --workflow analyze-failures \
  --project org/group/project \
  --event '{"iid": 123, "sha": "abc123"}' \
  --execute

# Save session locally for debugging (context.json, transcript.md, sandbox.tar.gz)
hummingbird-agent run \
  --workflow analyze-failures \
  --project org/group/project \
  --event '{"iid": 123, "sha": "abc123"}' \
  --save-session /tmp/my-session

# Resume from a saved session with a follow-up question
hummingbird-agent run \
  --workflow analyze-failures \
  --project org/group/project \
  --event '{"iid": 123, "sha": "abc123"}' \
  --resume-session /tmp/my-session \
  --message "Can you look at the clair-scan timeout more closely?"

# Chain: resume and save the new session for another round
hummingbird-agent run \
  --event-file event.json \
  --resume-session /tmp/my-session \
  --message "What layer hash failed?" \
  --save-session /tmp/my-session-2
```
**Event-file replay** (`--event-file`): routes the event through the same
project-index lookup as `serve`, but skips rate limiting and status filtering:
```shell
# Replay a real webhook event, dry-run with Podman
hummingbird-agent run \
  --event-file event.json

# Replay with K8s sandbox and post notes
hummingbird-agent run \
  --event-file event.json \
  --context my-cluster/my-namespace \
  --execute
```
### Production (`serve`)
```shell
# Poll SQS queue for events (pool sandbox by default, posts results as MR notes)
CONFIG_PATH=config.yml hummingbird-agent serve
```
Requires `CONFIG_PATH` pointing to a config file with `settings.sqs_queue_url`
and `settings.sqs_fifo_queue_url` set. The config file defines which workflows
run on which projects, with per-project limits and data source token mappings.
Handles SIGTERM/SIGINT for graceful shutdown. A background router thread
forwards events from the standard queue to the FIFO; the main thread consumes
the FIFO with a semaphore-gated thread pool (controlled by
`settings.max_concurrent_agents`) so excess messages stay in SQS for other
instances.
**Config hot-reload:** In `serve` mode a background thread polls the config file for changes (every 5 seconds by default). When the file changes, the new config is validated and atomically swapped in – subsequent event dispatches use the updated config. If the new config is invalid, the previous config is kept and a warning is logged. No restart required for config changes.
The `serve` command also accepts `--sandbox`, `--context`, and `--namespace`
for local development with a different sandbox backend (e.g. Podman).
## CLI Options
`run` subcommand:

| Option | Description | Default |
|---|---|---|
| `--event` | Inline event JSON string (mutually exclusive with `--event-file`) | - |
| `--event-file` | Read event from a JSON file (mutually exclusive with `--event`) | - |
| `--workflow` | Workflow name from config file (requires `--project`) | - |
| `--project` | GitLab project path (requires `--workflow`) | - |
| `--execute` | Post result as GitLab MR note and save session to S3 | - |
| `--save-session` | Save session artifacts to this directory | - |
| `--resume-session` | Resume from a saved session directory (requires `--message`) | - |
| `--message` | Follow-up message for session resumption (requires `--resume-session`) | - |
| `--sandbox` | Sandbox backend (`podman`, `k8s`, or `k8spool`) | `podman` |
| `--context` | K8s context (implies K8s backend) | - |
| `--namespace` | K8s namespace | - |
| `-v, --verbose` | Enable debug logging | - |
`serve` subcommand:

| Option | Description | Default |
|---|---|---|
| `--sandbox` | Sandbox backend (`podman`, `k8s`, or `k8spool`) | `k8spool` |
| `--context` | K8s context (implies K8s backend) | - |
| `--namespace` | K8s namespace | - |
| `-v, --verbose` | Enable debug logging | - |
## Configuration
### Config file
Both `run` and `serve` use a single YAML config file. Set the path via
`CONFIG_PATH` (default: `config.example.yaml`). The config has two sections:
`settings` for operational parameters, and `workflows` for workflow
definitions. The `settings` section provides defaults that can be omitted
for local development (sensible defaults are used).
```yaml
settings:
  gitlab_url: https://gitlab.com          # GitLab instance URL
  sandbox:                                # sandbox pod configuration
    image: quay.io/.../gitlab-ci:latest   # container image (k8s mode only)
    namespace: default                    # K8s namespace (required for K8s/pool)
    active_deadline_seconds: 1800         # pod hard timeout / reap interval
    linger_seconds: 300                   # keep pod alive after success (pool mode, 0=disable)
    max_lingering_pods: 2                 # max idle lingering pods before eviction (pool mode)
    metadata:                             # pod metadata (k8s mode only)
      labels:
        app.kubernetes.io/name: hummingbird-agent-sandbox
    resources:                            # K8s resource requests/limits (k8s mode)
      requests:
        cpu: "100m"
        memory: "256Mi"
        ephemeral-storage: "256Mi"
      limits:
        cpu: "1"
        memory: "1Gi"
        ephemeral-storage: "2Gi"
  max_concurrent_agents: 4                # max concurrent workflows (serve)
  sqs_queue_url: ""                       # standard SQS ingress queue URL (serve)
  sqs_fifo_queue_url: ""                  # FIFO work queue URL (serve)
  s3_session_bucket: ""                   # S3 bucket for session persistence
  model: gemini-3.1-pro-preview           # or claude-sonnet-4-20250514
  model_regions:                          # model prefix -> GCP region
    gemini-2.5-pro: us-east5
    gemini-3.1-pro: global
    claude: us-east5
  max_iterations: 30                      # default iteration limit
  max_runs_per_mr: 5                      # default per-MR rate limit
  internal_notes: true                    # default note visibility
  docs_url: https://gitlab.com/org/group/project/-/blob/main/docs/agent.md
  source_url: https://gitlab.com/org/group/project
  slack_url: https://slack.example.com/archives/C0123456789
  slack_label: "#my-channel"

workflows:
  code-review:
    trigger: merge_request                # auto-trigger on MR events
    description: Performs AI-powered code review
    workflow_url: https://gitlab.com/org/group/project/-/blob/main/workflows/code-review.md
    ignore_users:                         # skip bot MRs (regex fullmatch)
      - "renovate\\[bot\\]"
      - ".*-bot"
    ignore_branches:                      # skip maintenance branches (regex fullmatch)
      - "chore/.*"
    prompt: workflows/code-review.md
    action: post_gitlab_note
    model: gemini-3.1-pro-preview         # or claude-sonnet-4-20250514
    max_iterations: 15
    max_inline_size: 200000               # keep full diffs in context
    context_limit: 500000                 # Gemini 3.1 Pro / Claude Sonnet 4 have large context windows
    data_sources:
      gitlab:
        token_env: GITLAB_TOKEN_RO
    projects:
      org/group/project: {}
  analyze-failures:
    trigger: pipeline                     # auto-trigger on failed pipelines
    description: Investigates CI/CD pipeline failures
    auto_resolve_on_push: true            # resolve threads when a new SHA is pushed
    auto_resolve_on_success: true         # resolve threads when pipeline succeeds
    workflow_url: https://gitlab.com/org/group/project/-/blob/main/workflows/analyze-failures.md
    prompt: workflows/analyze-failures.md # relative to config file dir
    action: post_gitlab_note
    model: gemini-3.1-pro-preview         # or claude-sonnet-4-20250514
    max_iterations: 50                    # per-workflow iteration override
    data_sources:                         # model tool tokens (read-only)
      gitlab:
        token_env: GITLAB_TOKEN_RO        # env var name, not the token
      konflux:
        cluster_url: https://example.com:6443/ns/my-tenant
        kubeconfig_env: KUBECONFIG
        kubearchive_url: https://kubearchive-api-server-product-kubearchive.apps.example.com
      testing_farm: {}
    projects:
      redhat/hummingbird/containers:
        tokens:                           # per-project model token overrides
          gitlab: GITLAB_TOKEN_CONTAINERS_RO
```
With `--workflow`/`--project`, the workflow and project are looked up directly
in the config. With `--event-file`, the project is extracted from the event
body and matched against the project index to find applicable workflows.
### Discussion threads
All workflow results are posted as discussion threads: a placeholder note starts the discussion and the full result is posted as a reply. The placeholder is never edited, so email notifications include the actual result text. For slash commands, the result is posted as a reply in the triggering discussion.
### Auto-resolve
Workflows can opt in to automatic resolution of their discussion threads:
- `auto_resolve_on_push` (default `false`): When a new commit is pushed to the MR (i.e. a `merge_request` event with `action: update`), all agent discussion threads for the workflow whose SHA differs from the new HEAD are resolved. This clears stale failure analyses when the developer pushes a fix.
- `auto_resolve_on_success` (default `false`): When the head pipeline succeeds, all agent discussion threads for the workflow on that MR are resolved (regardless of SHA). This handles pipeline reruns on the same SHA where a transient failure is now green.
Both flags are independent and can be combined. Resolution runs before rate limit checks, so threads are resolved even if the workflow’s per-MR run limit has been reached.
### Placeholder footer
The first agent-authored note (placeholder) includes a footer with links to documentation, source code, the Slack channel, the workflow prompt, and a continuation prompt. These links are configured via global settings:
| Setting | Description |
|---|---|
| `settings.docs_url` | Link to agent documentation |
| `settings.source_url` | Link to agent source repository |
| `settings.slack_url` | Link to support Slack channel |
| `settings.slack_label` | Display text for Slack link (default: "Slack") |
Per-workflow, set `workflow_url` to link to the workflow's prompt file.
The footer stays on the placeholder and the result is a separate reply.
### Token separation
Tokens are split into two categories that never mix:
- **Model tool tokens** (in YAML `data_sources`/`tokens`): read-only tokens passed to the LLM's tool calls. Declared in the config file as env var names. These are user-defined and resolved at runtime from the referenced env vars. Create as project access tokens with Reporter role and `read_api` scope. Reporter is the minimum role required to see internal (confidential) notes in the discussions tool.
- **Orchestrator tokens** (`ORCHESTRATOR_*` env vars, NOT in YAML): write-capable tokens used by the runner for posting/editing notes and rate-limit counting. Create as project access tokens with Developer role and `api` scope. Developer role is required because the discussions tool trust-filters notes by author access level (>= Developer); if the orchestrator bot has only Reporter access, its own notes are redacted. Resolved by convention: `ORCHESTRATOR_GITLAB_TOKEN_<MANGLED_PROJECT>` (per-project) or `ORCHESTRATOR_GITLAB_TOKEN` (fallback). The `ORCHESTRATOR_` prefix makes these impossible to confuse with model tokens.
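The per-project lookup convention can be sketched as follows. The exact mangling scheme is an assumption here (uppercase, non-alphanumerics replaced by underscores), and `orchestrator_token` is a hypothetical helper:

```python
import os
import re

def orchestrator_token(project_path: str):
    """Resolve the write-capable orchestrator token by convention:
    try ORCHESTRATOR_GITLAB_TOKEN_<MANGLED_PROJECT> first, then fall
    back to the global ORCHESTRATOR_GITLAB_TOKEN. Mangling is assumed."""
    mangled = re.sub(r"[^A-Za-z0-9]", "_", project_path).upper()
    return os.environ.get(
        f"ORCHESTRATOR_GITLAB_TOKEN_{mangled}",
        os.environ.get("ORCHESTRATOR_GITLAB_TOKEN"),
    )
```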
## Environment Variables
The agent reads only secrets and authentication from environment variables.
All operational settings come from the config file's `settings` section.
| Variable | Required | Default | Description |
|---|---|---|---|
| `CONFIG_PATH` | no | `config.example.yaml` | Path to config YAML |
| `GOOGLE_API_KEY` | yes* | - | Gemini API key; Gemini direct mode only (not for Claude) |
| `GOOGLE_CLOUD_PROJECT` | yes* | - | GCP project ID (Vertex AI mode); required for Claude models |
| `ORCHESTRATOR_GITLAB_TOKEN` | serve | - | Orchestrator GitLab token (global fallback) |
| `ORCHESTRATOR_GITLAB_TOKEN_<PROJECT>` | no | - | Per-project orchestrator token |
| `SENTRY_DSN` | no | - | Sentry DSN for error tracking |
*One of `GOOGLE_API_KEY` or `GOOGLE_CLOUD_PROJECT` is required. Claude models
require Vertex AI (`GOOGLE_CLOUD_PROJECT`); `GOOGLE_API_KEY` is for Gemini
direct mode only.
Model tool tokens (e.g. `GITLAB_TOKEN_RO`, `GITLAB_TOKEN_CONTAINERS_RO`) and
data source credentials (e.g. `KONFLUX_CLUSTER_URL`, `KUBECONFIG`) are
referenced by name in the config file's `data_sources` and `tokens` sections.
They are not listed in the table above because their names are user-defined.
## Security and Design Constraints
The agent is designed to run in a shared OpenShift cluster without cluster-admin access, processing potentially untrusted merge requests. These constraints shaped the architecture:
**Sandbox isolation.** All arbitrary commands executed by the LLM run inside an ephemeral container, never on the host:
- **Podman (local)**: `--network=none`, `--user 65532`, no host mounts. Complete network isolation.
- **Kubernetes (production)**: Pods comply with OpenShift's `restricted-v2` Security Context Constraint: `runAsNonRoot`, `seccompProfile: RuntimeDefault`, `allowPrivilegeEscalation: false`, `capabilities.drop: ["ALL"]`, `automountServiceAccountToken: false` (no K8s API access from sandbox), `activeDeadlineSeconds` (configurable, default 1800). Security context fields are hardcoded in the pod manifest for portability to vanilla Kubernetes with Pod Security Admission (restricted level). Resource requests/limits, metadata, and `activeDeadlineSeconds` are configurable via `settings.sandbox` in the config file. Network access is denied by a `NetworkPolicy` on the sandbox namespace that blocks all egress from all pods (`podSelector: {}`).
**No cluster-admin required.** The agent operates with namespace-scoped permissions only. The orchestrator's ServiceAccount needs only:

- `pods: create, get, list, delete, patch` – sandbox pod lifecycle and pool claims
- `pods/exec: create` – command execution via `kubectl exec`
These permissions are granted via a Role in the sandbox namespace, not the orchestrator’s own namespace. No CRDs, no custom runtimes, no cluster-scoped resources. Konflux data is fetched via bearer token from kubeconfig, not from inside the cluster.
**Namespace separation.** Sandbox pods are created in a dedicated namespace, separate from the orchestrator. This limits blast radius: even if a sandbox pod is compromised, it has no visibility into the orchestrator's Secrets, Pods, or ServiceAccount tokens. The sandbox namespace is locked down with standard K8s resources:
- **RBAC**: Role + RoleBinding scoped to the namespace, granting only the permissions above to the orchestrator's ServiceAccount
- **NetworkPolicy**: uses `podSelector: {}` to select all pods in the dedicated sandbox namespace, denying all egress (`egress: []`). The sandbox cannot reach the internet, the K8s API, or other pods.
- **activeDeadlineSeconds**: sandbox pods self-terminate after the configured timeout (default 1800 s / 30 minutes) even if the orchestrator crashes or is killed, preventing orphaned pods
**Credential separation.** Data source credentials (GitLab tokens, kubeconfig)
live in the orchestrator process only, injected via K8s Secrets. The sandbox
container has no credentials, no SA token
(`automountServiceAccountToken: false`), and no network access. Data flows
into the sandbox via stdin piping through `write_file`.
**Command execution via `kubectl exec`.** The K8s sandbox uses a hybrid approach:
the Kubernetes Python client manages pod lifecycle (create, wait, delete),
while `kubectl exec` handles command execution. This avoids the complexity and
reliability issues of the websocket-based exec API.
## Sandbox Backends
| | Podman (local) | K8s (direct) | K8sPool (production default) |
|---|---|---|---|
| Start | `podman run -d --network=none` | `create_namespaced_pod` | Claim standby pod from Deployment |
| Exec | `podman exec` | `kubectl exec` | `kubectl exec` |
| Auth | Local Podman socket | In-cluster SA or kubeconfig | In-cluster SA or kubeconfig |
| Network | None (`--network=none`) | None (deny-all `NetworkPolicy`) | None (deny-all `NetworkPolicy`) |
| User | `65532` (fixed) | Namespace UID range (SCC) | Namespace UID range (SCC) |
| Cleanup | `podman rm -f` | `delete_namespaced_pod` | `delete_namespaced_pod` |
All three implement the `Sandbox` protocol: `start()`, `exec()`,
`write_file()`, `read_file()`, `cleanup()`, `linger()`.
The pool backend (`k8spool`) eliminates pod startup latency by claiming
pre-warmed pods from a Kubernetes Deployment. Claimed pods are detached
from the ReplicaSet, and the Deployment automatically creates replacements.
After a successful workflow, pool pods linger for `linger_seconds` (default
300, configurable, 0 to disable) so that user replies can reuse the same
pod without re-creating it or restoring from S3. The reaper runs once per
workflow execution and deletes expired lingering pods. It also evicts excess
lingering pods beyond `max_lingering_pods` (default 2), starting with those
closest to their deadline. See the design doc (section 8.8) for details.
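The reaping policy can be sketched as follows, with pods modeled as plain dicts carrying a linger deadline (not the real pod objects or reaper API):

```python
import time

def reap(pods, now=None, max_lingering=2):
    """Return the lingering pods to delete: first those past their
    deadline, then excess idle pods beyond max_lingering, evicting the
    ones closest to their deadline first."""
    now = time.time() if now is None else now
    doomed = [p for p in pods if p["deadline"] <= now]
    alive = sorted((p for p in pods if p["deadline"] > now),
                   key=lambda p: p["deadline"])
    doomed += alive[:max(0, len(alive) - max_lingering)]
    return doomed
```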
## Data Sources
Data sources are registered as tool-calling functions. The model invokes them by name; the orchestrator executes them and returns results (or auto-spills large responses to the sandbox).
### GitLab
| Tool | Description |
|---|---|
| `gitlab_get_mr_details` | MR metadata (title, author, state, SHA, labels) |
| `gitlab_get_mr_unified_diff` | Complete unified diff in patch format |
| `gitlab_get_mr_diff` | Per-file structured change data |
| `gitlab_get_mr_commits` | List of commits in a merge request |
| `gitlab_get_mr_discussions` | Discussion threads with redacted agent transcripts and trust-filtered comments |
| `gitlab_get_commit_statuses` | CI/CD pipeline statuses for a commit |
| `gitlab_get_file_at_ref` | Raw file content at a git ref |
| `gitlab_get_repo_archive` | Repository tar.gz (binary, auto-spilled) |
| `gitlab_get_job_log` | CI job trace output (ANSI codes stripped) |
### Konflux
Fetches Tekton PipelineRuns and TaskRuns from both the live K8s API and Kubearchive (for completed resources), with deduplication by UID.
| Tool | Description |
|---|---|
| `konflux_list_pipelineruns` | All PipelineRuns for a commit SHA |
| `konflux_list_taskruns` | All TaskRuns for a commit SHA |
| `konflux_list_pods` | All pods for a commit SHA |
| `konflux_get_pod` | Full pod resource (spec, status, conditions, container statuses) |
| `konflux_get_pod_log` | Pod logs; optional container param, fetches all containers when omitted |
Response metadata includes a `konflux_ui` base URL for building reviewer-facing
links.
### Testing Farm
| Tool | Description |
|---|---|
| `tf_get_results` | JUnit XML results for a request ID |
| `tf_get_test_log` | Individual test log by URL (restricted to Testing Farm artifact URLs) |
| `tf_get_request_status` | Request state, queue/run times |
Response metadata includes an `artifacts_base` URL for building artifact links.
## Workflow System
Workflows are `.md` files whose content becomes the LLM system prompt
verbatim. Workflow metadata (`action`, `model`, `max_iterations`, enabled
data sources, project allowlists) is defined in the config file. The `.md`
file is pure system prompt text.
Available workflows:
- `analyze-failures.md` – Investigates CI/CD pipeline failures by fetching MR details, identifying failed pipelines via commit statuses, retrieving PipelineRuns/TaskRuns from Konflux, analyzing test results from Testing Farm, and producing a grouped root-cause report with reviewer-facing URLs.
- `code-review.md` – Performs AI-powered code review by fetching MR details, the unified diff, and prior discussion threads in parallel, then producing structured feedback with severity ratings, code examples, and actionable suggestions. On follow-up reviews (after SHA updates), the agent sees its own previous findings, developer responses, and resolved threads – avoiding duplicate findings and respecting developer explanations. Uses elevated `max_inline_size` (200 KB) and `context_limit` (500K tokens) to keep the full diff in context.
## Token Budget Management
Both Gemini and Claude benefit from prompt caching that reduces the effective
cost of full history replay. Cached token counts from both providers feed
into the `estimate_cost()` calculation. See
Agent Model Loop – Prompt caching
for details on how caching works per provider.
The agent uses a dual-limit approach instead of a cumulative token budget:
- **Iteration limit** (`settings.max_iterations`, default 30, overridable per-workflow) – Hard cap on tool-calling rounds. A wrap-up prompt is injected at 80% (`SOFT_ITERATION_RATIO`).
- **Context limit** (`CONTEXT_LIMIT`, default 60,000 tokens, overridable per-workflow via `context_limit`) – Per-call input token ceiling. When exceeded, a wrap-up prompt forces the model to finalize.
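A sketch of how the two limits might interact on each iteration (`budget_state` is a hypothetical helper, not the agent's real function):

```python
def budget_state(iteration, max_iterations, input_tokens, context_limit,
                 soft_ratio=0.8):
    """Dual-limit check: 'stop' at the hard iteration cap, 'wrap_up' at
    the 80% soft threshold or when the per-call input token count
    exceeds the context ceiling, otherwise 'ok'."""
    if iteration >= max_iterations:
        return "stop"                      # hard iteration cap reached
    if iteration >= int(max_iterations * soft_ratio):
        return "wrap_up"                   # inject wrap-up prompt at 80%
    if input_tokens > context_limit:
        return "wrap_up"                   # context ceiling exceeded
    return "ok"
```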
Large outputs are automatically redirected to sandbox files to keep the LLM
context small. The spill threshold defaults to 4 KB (`MAX_INLINE_SIZE`) but
can be overridden per-workflow via `max_inline_size` in the config:
- `sandbox_exec` – stdout/stderr exceeding the threshold is saved to `/tmp/_out/{N}.txt`; the model receives a preview (head + tail) with the file path
- **Data sources** – text exceeding the threshold is saved to `/tmp/_out/{name}_{N}.txt` with a preview; binary data is saved to `.bin`
- `fetch_to_sandbox` – always writes to the caller-specified path; returns metadata only
All tuning constants are centralized in `config.py`:
| Constant | Default | Purpose |
|---|---|---|
| `DEFAULT_MAX_ITERATIONS` | 30 | Hard iteration cap |
| `SOFT_ITERATION_RATIO` | 0.8 | Inject wrap-up at this fraction |
| `CONTEXT_LIMIT` | 60,000 | Per-call input token ceiling |
| `OUTPUT_PREVIEW_BYTES` | 4,096 | Preview size for spilled outputs |
| `OUTPUT_TAIL_BYTES` | 512 | Extra tail appended to previews |
| `MAX_INLINE_SIZE` | 4,096 | Max inline size for data source responses |
## Development
See the main README for development workflows.
```shell
make hummingbird-agent/setup  # Install dependencies
make check                    # Lint code (ruff)
make fmt                      # Format code
make test                     # Run unit tests
make coverage                 # Run tests with coverage
```
## License
This project is licensed under the GNU General Public License v3.0 or later - see the LICENSE file for details.