The Quiet Modernization: How We Revamped the Kubernetes Image Promoter

Last updated: 2026-05-01

In early 2026, the Kubernetes SIG Release team quietly performed major surgery on the engine that powers all container image distribution for the project. The kpromo tool, responsible for copying images from staging to production registries, signing them, and replicating them across more than 20 regional mirrors, had accumulated nearly a decade of growth. The result? A faster, leaner, and more reliable pipeline that no one noticed, exactly as intended. Below, we answer key questions about this invisible rewrite.

What is kpromo and why is it essential for Kubernetes?

Kpromo, short for Kubernetes image promoter, is the automated system that moves every container image from the staging registries to the production registry, registry.k8s.io. It doesn't just copy files: it also signs each image with cosign, replicates signatures across more than 20 regional mirrors worldwide, and generates SLSA provenance attestations. If kpromo fails, no Kubernetes release can ship. This makes it one of the most critical yet invisible tools in the release infrastructure, handling every container image that users pull when deploying Kubernetes.
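
To make those responsibilities concrete, here is a minimal Go sketch of a single promotion step: copy by digest, sign, attest, and replicate. The interface and type names are illustrative assumptions, not kpromo's actual API.

```go
// Package promoter is an illustrative sketch, not kpromo's real code.
package promoter

import (
	"context"
	"fmt"
)

// ImageRef identifies an image by digest, the unit the promoter works on.
type ImageRef struct {
	Repository string
	Digest     string
	Tags       []string
}

// Promoter captures the responsibilities described above: copy, sign,
// attest, and replicate. The method set is hypothetical.
type Promoter interface {
	Copy(ctx context.Context, src, dst ImageRef) error
	Sign(ctx context.Context, img ImageRef) error                         // cosign signature
	Attest(ctx context.Context, img ImageRef) error                       // SLSA provenance
	Replicate(ctx context.Context, img ImageRef, mirrors []string) error  // regional mirrors
}

// Promote runs one image through the full pipeline, stopping at the first error.
func Promote(ctx context.Context, p Promoter, src, dst ImageRef, mirrors []string) error {
	if err := p.Copy(ctx, src, dst); err != nil {
		return fmt.Errorf("copy %s: %w", src.Digest, err)
	}
	if err := p.Sign(ctx, dst); err != nil {
		return fmt.Errorf("sign %s: %w", dst.Digest, err)
	}
	if err := p.Attest(ctx, dst); err != nil {
		return fmt.Errorf("attest %s: %w", dst.Digest, err)
	}
	return p.Replicate(ctx, dst, mirrors)
}
```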

How did the image promoter originate and evolve over time?

The project began at Google in late 2018, when Linus Arver created an internal tool to replace a manual, Googler-gated image copy process. The goal was a community-owned, GitOps-driven workflow: push to a staging registry, open a PR with a YAML manifest, get it reviewed and merged, and let automation handle the rest. This became KEP-1734. By early 2019, the code had moved to the kubernetes-sigs/k8s-container-image-promoter repo and quickly grew. Over the years, tools like cip, gh2gcs, and promobot-files were consolidated into a single CLI called kpromo by Stephen Augustus. Later, Adolfo Garcia Veytia added cosign signing and SBOM support, Tyler Ferrara built vulnerability scanning, and Carlos Panato kept the project healthy. In total, 42 contributors made ~3,500 commits across 60+ releases.
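
For readers who have not seen that workflow, here is a simplified sketch of what a promotion manifest looks like. The registry names, service account, digest, and tags are placeholder values; real manifests live in the Kubernetes project repositories and carry a few more fields.

```yaml
# Simplified promoter manifest (illustrative values only)
registries:
  - name: gcr.io/k8s-staging-example        # staging source registry
    src: true
  - name: example.registry.k8s.io/example   # production destination
    service-account: promoter@example.iam.gserviceaccount.com
images:
  - name: example-controller
    dmap:
      # digest in staging -> tags to apply in production
      "sha256:0000000000000000000000000000000000000000000000000000000000000000":
        - "v1.0.0"
```

Merging a reviewed change to a manifest like this is what triggers the automation; no one pushes to the production registry by hand.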

What key problems had accumulated by 2025?

By 2025, the codebase showed its age. Production promotion jobs for core Kubernetes images regularly took over 30 minutes and frequently failed with rate limit errors. The core logic had turned into a monolith that was hard to extend and difficult to test, making the addition of features like vulnerability scanning a painful exercise. The README itself warned of duplicated code, multiple techniques for accomplishing the same thing, and several TODOs. Two items on the SIG Release roadmap, "Rewrite artifact promoter" and "Make artifact validation more robust," had lingered for a long time, discussed in meetings and at KubeCons but never resolved.

How did the team organize the rewrite effort?

In February 2026, the team opened issue #1701, titled "Rewrite artifact promoter pipeline." This single tracking issue addressed eight open research spikes from project board #171. The rewrite was deliberately phased so each step could be reviewed, merged, and validated independently, reducing risk. The approach ensured that no single massive change could break the entire pipeline; instead, small, targeted PRs gradually replaced old logic while keeping the system operational at all times.

What were the three main phases of the rewrite?

The rewrite unfolded in three distinct phases. Phase 1 (issue #1702) focused on rate limiting: rewriting the throttling mechanism so that every registry operation gets adaptive backoff, eliminating the rate limit failures that had plagued promotion jobs. Phase 2 (issue #1704) introduced clean interfaces for registry and authentication operations, making them swappable and independently testable; this architectural decoupling was crucial for future extensibility. Phase 3 rebuilt the core promotion logic as a modular, testable pipeline rather than a monolithic function. Each phase was reviewed and merged on its own, allowing the team to validate each piece before moving to the next.
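
As a rough illustration of the Phase 1 idea, here is a minimal Go sketch of wrapping a registry call in exponential backoff with jitter. The error sentinel, retry count, and delays are assumptions for illustration, not kpromo's actual implementation; the real throttling is more involved.

```go
// Package registry sketches adaptive backoff around a throttled registry call.
package registry

import (
	"context"
	"errors"
	"math/rand"
	"time"
)

// ErrRateLimited stands in for whatever error a registry client surfaces on
// HTTP 429 responses; the real error type depends on the client library.
var ErrRateLimited = errors.New("registry rate limited")

// withBackoff retries op with exponential backoff plus jitter, waiting longer
// after each throttled attempt and giving up after the given number of tries.
func withBackoff(ctx context.Context, attempts int, op func() error) error {
	delay := 500 * time.Millisecond
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil || !errors.Is(err, ErrRateLimited) {
			return err // success, or an error that retrying will not fix
		}
		jitter := time.Duration(rand.Int63n(int64(delay) / 2))
		select {
		case <-time.After(delay + jitter):
			delay *= 2 // adapt: double the wait after every throttled attempt
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return err
}
```

Phase 2's interfaces work in the same spirit as the Promoter sketch earlier in this article: once registry and auth operations sit behind small interfaces, wrappers like this can be added, tested with fakes, and swapped without touching the promotion logic itself.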

What concrete results did the rewrite achieve?

The rewrite delivered striking improvements: 20% of the codebase was deleted, eliminating redundant and dead code. The new pipeline runs dramatically faster; promotion jobs that once took over 30 minutes now complete in a fraction of that time, and rate limit failures are virtually gone. The clean interfaces allow new features (like enhanced provenance or scanning) to be added without touching the core logic. Most importantly, the change was invisible to users: every image continued to be promoted, signed, and mirrored without interruption. The team considers the fact that nobody noticed a sign of success.

How did the team ensure zero disruption during the rewrite?

Continuous integration and a phased approach were key. Because the rewrite was broken into three separate issues (#1702, #1704, then the pipeline rebuild), each change could be tested in isolation against staging environments before merging. The team maintained full backward compatibility throughout, and no new dependencies were introduced that could break existing workflows. They also kept the old code paths temporarily as fallbacks until the new ones were proven stable. Regular monitoring and alerting ensured any regressions would be caught quickly. The result: the production pipeline never skipped a beat.
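
One way to picture that fallback arrangement is a simple switch between the old and new code paths. The environment variable and type names below are hypothetical, invented for illustration; they are not real kpromo flags.

```go
// Illustrative fallback switch between the legacy and rewritten pipelines.
package main

import (
	"fmt"
	"os"
)

// pipeline is the seam shared by both implementations.
type pipeline interface {
	Run() error
}

type legacyPipeline struct{}

func (legacyPipeline) Run() error { fmt.Println("promoting via legacy path"); return nil }

type modularPipeline struct{}

func (modularPipeline) Run() error { fmt.Println("promoting via modular path"); return nil }

// selectPipeline keeps the old path reachable behind a switch until the new
// one has proven itself in production. PROMOTER_USE_LEGACY_PIPELINE is a
// made-up variable name, not a real kpromo setting.
func selectPipeline() pipeline {
	if os.Getenv("PROMOTER_USE_LEGACY_PIPELINE") == "true" {
		return legacyPipeline{}
	}
	return modularPipeline{}
}

func main() {
	if err := selectPipeline().Run(); err != nil {
		os.Exit(1)
	}
}
```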

What lessons did the team learn for future maintenance?

The rewrite reinforced the value of incremental modernization. Rather than waiting for a total rewrite, the team recommends establishing clean interfaces early and investing in adaptive rate limiting from the start. The success of the phased approach shows that large systems can be revamped without user-facing disruptions if changes are broken into reviewable, testable chunks. They also learned the importance of documenting technical debt openly (as the README did) so future maintainers know where to focus. Finally, keeping a single tracking issue for multiple spikes helped align the community and avoid duplicate work.