Pipeline Design

Design guidelines and architecture of the K8s test pipeline.

Task/Step Overview

| Task | Type | Steps | Results | Retries | Timeout |
|---|---|---|---|---|---|
| check-for-tests | Inline taskSpec | write-snapshot, resolve-component, fetch-source, check-for-tests, write-results | HAS_TESTS, TEST_OUTPUT | 2 | 10m |
| provision-namespace | External bundle (task-eaas-provision-space) | (managed by task) | secretRef | 2 | 10m |
| run-tests | Inline taskSpec | write-snapshot, resolve-component, fetch-source, run-tests, write-results | TEST_OUTPUT | 2 | 24h |
| fail-pipeline-on-test-failure | Inline taskSpec (finally) | check-results | | 0 | 10m |

Both check-for-tests and run-tests fetch source independently: each task gets its own emptyDir volume, so no state carries across tasks.

Design Guidelines

1. Filesystem over results

Pass data between steps via files on a shared emptyDir, not Tekton results. Tekton results travel through the container termination message, which is limited to 4 KB (tektoncd/pipeline#4060), and large $(params.*) substitutions inlined into a script can exceed ARG_MAX.

Each task mounts an emptyDir at /workdir. Steps communicate via files: snapshot.json, component, image_names, src/, kubeconfig, result, error. No $(params.*) references remain inside script: blocks; all param values are passed via env vars or written to /workdir.
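The file-based handoff can be sketched as follows. This is a minimal illustration, not the pipeline's actual script: the SNAPSHOT env var name and the function names are assumptions, and WORKDIR falls back to a temp directory so the sketch runs outside a pod (in the pipeline it would be /workdir).

```shell
#!/usr/bin/env bash
set -euo pipefail

# WORKDIR stands in for the emptyDir mounted at /workdir.
# It defaults to a temp dir here so the sketch runs locally.
WORKDIR="${WORKDIR:-$(mktemp -d)}"

# First step: the param value arrives as an env var (SNAPSHOT is an assumed
# name) and is written to a file, never substituted into the script text.
write_snapshot() {
  printf '%s' "${SNAPSHOT:?SNAPSHOT env var is required}" > "${WORKDIR}/snapshot.json"
}

# Later step: read the file instead of consuming a Tekton result.
read_snapshot() {
  cat "${WORKDIR}/snapshot.json"
}

# Demo with an illustrative payload.
SNAPSHOT='{"components":[{"name":"demo"}]}' write_snapshot
read_snapshot
```

Because the value only ever lives in the environment or on disk, its size is bounded by neither the termination message limit nor the argument length limit.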

2. Distinguish retryable from non-retryable errors

Use exit 1 for transient failures (network, registry) that benefit from retry. Use exit 0 + a sentinel file for permanent failures (bad data) to prevent wasted retries.

Non-retryable errors write Konflux-format JSON to /workdir/error and exit 0. Subsequent steps check for this sentinel and skip their work. The write-results step copies it to TEST_OUTPUT. Tasks default to retries: 2, but fail-pipeline-on-test-failure has retries: 0 because test failures should not be retried.
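A minimal sketch of the sentinel pattern, under stated assumptions: the helper names (fail_permanent, should_skip, resolve_component as a function) and the JSON fields are illustrative, not copied from the pipeline. Transient failures need no helper, since a failing command under set -e already exits non-zero and triggers a retry.

```shell
#!/usr/bin/env bash
set -euo pipefail

WORKDIR="${WORKDIR:-$(mktemp -d)}"   # /workdir in the pipeline

# Permanent failure: record JSON in the sentinel file and return 0 so the
# task is not retried. The message must not contain double quotes; building
# the JSON with jq would be more robust.
fail_permanent() {
  printf '{"result":"ERROR","note":"%s"}\n' "$1" > "${WORKDIR}/error"
  return 0
}

# Steps call this first and skip their work if an earlier step failed.
should_skip() {
  [ -f "${WORKDIR}/error" ]
}

# Hypothetical step body: a missing component is bad data, so retrying
# would not help.
resolve_component() {
  if should_skip; then
    echo "skipping: earlier step failed"
    return 0
  fi
  if [ -z "${1:-}" ]; then
    fail_permanent "snapshot has no component"
    return 0
  fi
  printf '%s' "$1" > "${WORKDIR}/component"
}

resolve_component ""       # permanent failure: writes the sentinel
resolve_component "demo"   # skipped because the sentinel exists
cat "${WORKDIR}/error"
```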

3. Fail fast

Every shell script enables set -euo pipefail so unexpected command failures surface immediately rather than cascading silently.
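The effect is easiest to see by running the same failing pipeline with and without strict mode. The sketch below uses child shells so that the failure does not terminate the demo itself.

```shell
#!/usr/bin/env bash

# Without strict mode: the pipeline's status is that of its last command
# (cat), so the failure of `false` is lost and the script carries on.
lenient_status=0
bash -c 'false | cat; echo "lenient: still running"' || lenient_status=$?

# With strict mode: pipefail surfaces the failure and -e stops the script.
strict_status=0
bash -c 'set -euo pipefail; false | cat; echo "strict: not reached"' || strict_status=$?

echo "lenient exit: ${lenient_status}, strict exit: ${strict_status}"
```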

4. Separate data from status

Report test results as structured data for the CI system to consume, and separately translate results into pipeline pass/fail for source control integration.

Konflux reads TEST_OUTPUT results as data but does not fail the pipeline based on them. The fail-pipeline-on-test-failure finally-task reads both TEST_OUTPUT values and exits 1 if either is not SUCCESS. This is needed because GitLab PAC integration requires the pipeline itself to fail for MR status reporting.
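The check-results logic might look roughly like the sketch below. The function names are hypothetical, and the grep/sed extraction is a dependency-free stand-in that assumes a flat JSON object with a single "result" key; a step with jq available would more likely use that.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Extract the "result" field from a TEST_OUTPUT JSON string.
# Prints nothing if the field is missing.
result_of() {
  printf '%s' "$1" \
    | grep -Eo '"result"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | sed -E 's/.*"([^"]*)"$/\1/' || true
}

# Return non-zero if any of the given TEST_OUTPUT values is not SUCCESS.
check_results() {
  local status=0 output result
  for output in "$@"; do
    result="$(result_of "$output")"
    echo "result: ${result:-<missing>}"
    [ "$result" = "SUCCESS" ] || status=1
  done
  return "$status"
}

# Demo with illustrative values; the real step would exit 1 here.
check_results '{"result":"SUCCESS","failures":0}' '{"result":"FAILURE","failures":2}' \
  || echo "pipeline would fail"
```

Keeping this translation in a finally-task leaves the data path (TEST_OUTPUT) untouched while still giving source control a failed pipeline to report.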

5. Gate expensive work behind cheap checks

Run a fast, cheap check before provisioning resources. Skip the expensive path when there is nothing to do.

provision-namespace and run-tests are gated on check-for-tests.results.HAS_TESTS == "true" via Tekton when expressions. If no tests-k8s.yml exists, the pipeline completes with just check-for-tests, skipping EaaS provisioning. This is the common case for components without K8s tests.
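The cheap check itself can be as small as a file test. The sketch below assumes tests-k8s.yml sits at the root of the fetched source tree, which the text above does not state; has_tests is a hypothetical helper whose output would be written to the HAS_TESTS result.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print "true" if the source tree contains tests-k8s.yml, else "false".
# Assumes the file lives at the root of the tree passed as $1.
has_tests() {
  if [ -f "$1/tests-k8s.yml" ]; then
    echo true
  else
    echo false
  fi
}

# Demo against a throwaway directory standing in for /workdir/src.
src="$(mktemp -d)"
has_tests "$src"               # no tests-k8s.yml yet
touch "${src}/tests-k8s.yml"
has_tests "$src"               # file now present
```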

6. Make failures reproducible

Preserve all diagnostic output (do not swallow stderr). Log exact commands so developers can copy-paste to reproduce locally.

Stderr from cosign/jq/oras is not swallowed. The full run_tests_k8s.sh command line is logged.
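One way to log a command in copy-pasteable form is to shell-quote each argument before running it. This is a sketch with a hypothetical log_and_run helper, using echo as a stand-in for the real run_tests_k8s.sh invocation, whose arguments are not shown here.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the command shell-quoted (printf %q) so it can be pasted into a
# terminal as-is, then run it. The log line goes to stderr so it does not
# mix with the command's stdout; the command's own stderr is left alone.
log_and_run() {
  printf '+' >&2
  printf ' %q' "$@" >&2
  printf '\n' >&2
  "$@"
}

log_and_run echo "hello world"
```

Quoting matters here: an argument containing spaces is logged as one escaped word, so the pasted command splits into the same arguments as the original.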

7. Set task timeouts deliberately

Tekton applies a global default TaskRun timeout (typically 1h from default-timeout-minutes in config-defaults) to every task that has no explicit timeout, regardless of pipeline-level timeouts. This means omitting a task timeout does NOT let the pipeline timeout control the bound — the task gets killed at 1h.

Every task in this pipeline has an explicit timeout. Bounded operations (check-for-tests, provision-namespace, fail-pipeline-on-test-failure) use 10m. The variable-duration run-tests task uses 24h, following the Testing Farm pipeline pattern, so that the pipeline-level timeout is the effective bound. That timeout is controlled per application via IntegrationTestScenario (ITS) annotations such as test.appstudio.openshift.io/pipeline_timeout.

8. Define shared logic once

When multiple tasks need identical steps, use YAML anchors to define them once and reuse via aliases. Tekton pipelines have no function/import mechanism, so anchors are the only DRY tool available within a single pipeline file. Without this, step logic drifts between tasks when one copy is updated but not the other.

write-snapshot, resolve-component, and fetch-source are defined as anchors (&write-snapshot-step, etc.) in check-for-tests and reused via aliases (*write-snapshot-step, etc.) in run-tests. Each task still executes these steps independently, but the step definitions cannot drift because only one copy exists.

Trusted Artifacts Source Fetch

Source code is fetched via the Konflux Trusted Artifacts chain:

  1. cosign download attestation — retrieves build attestations (handles multiple JSONL attestations)
  2. Extract SOURCE_ARTIFACT from .predicate.buildConfig.tasks[].results[]
  3. oras blob fetch — downloads the source tarball
  4. tar -xzf — extracts to /workdir/src

The oci: prefix is stripped from the artifact URI. --no-same-owner avoids permission issues during extraction.
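The chain above can be sketched as follows. This is an illustration under assumptions, not the pipeline's fetch-source script: the function names and file names are hypothetical, each attestation line is assumed to be a DSSE envelope with a base64-encoded .payload, and fetch_source needs cosign, oras, and jq 1.6 or later on PATH. Only strip_oci_prefix is exercised in the demo.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Strip the leading "oci:" scheme so oras receives a plain registry reference.
strip_oci_prefix() {
  printf '%s\n' "${1#oci:}"
}

# Fetch and extract the source for an image. Not called in this sketch.
#   $1 = image reference, $2 = work directory (e.g. /workdir)
fetch_source() {
  local image="$1" workdir="$2" artifact
  # 1. Download all attestations as JSONL (one envelope per line).
  cosign download attestation "$image" > "${workdir}/attestations.jsonl"
  # 2. Decode each payload and take the first SOURCE_ARTIFACT value.
  #    The trailing ? tolerates tasks that have no results array.
  artifact="$(jq -rn 'first(inputs | .payload | @base64d | fromjson
      | .predicate.buildConfig.tasks[].results[]?
      | select(.name == "SOURCE_ARTIFACT") | .value)' \
    "${workdir}/attestations.jsonl")"
  if [ -z "$artifact" ]; then
    echo "no SOURCE_ARTIFACT found in attestations for ${image}" >&2
    return 1
  fi
  # 3. Download the source tarball.
  oras blob fetch --output "${workdir}/source.tar.gz" "$(strip_oci_prefix "$artifact")"
  # 4. Extract without restoring the archive's file ownership.
  mkdir -p "${workdir}/src"
  tar -xzf "${workdir}/source.tar.gz" -C "${workdir}/src" --no-same-owner
}

# Demo with an illustrative reference.
strip_oci_prefix 'oci:quay.io/org/repo@sha256:abc'
```

Writing the attestations and tarball to files rather than piping between tools keeps each tool's stderr and exit status visible, in line with guidelines 1 and 6.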