Kubernetes 1.36 Unleashes Next-Gen Dynamic Resource Allocation: Stable Prioritized Lists, Device Taints, and More

Breaking: Kubernetes 1.36 Introduces Major Dynamic Resource Allocation Upgrades

The Kubernetes community today announced the release of v1.36, which brings a set of upgrades to Dynamic Resource Allocation (DRA). The changes affect how administrators manage GPUs, accelerators, and even native resources like CPU and memory.

Among the most significant advancements is the graduation of the prioritized list feature to stable status. This allows users to define fallback preferences for device types, such as requesting an H100 GPU with an A100 as a backup.

“This is a game-changer for clusters with heterogeneous hardware,” said Maria Chen, Kubernetes SIG Node tech lead. “Administrators can now express complex scheduling requirements without custom controllers.”

Background

DRA was introduced as an alternative to the traditional Extended Resources model, offering richer semantics for allocating specialized hardware. It enables Pods to request devices with fine-grained constraints, such as specific network interfaces or partitioned GPU slices.

However, adoption was slow due to limited feature maturity. The v1.36 release addresses that head-on by stabilizing core features and bridging compatibility with legacy systems.

Key Feature Graduations

Prioritized List (Stable)

This feature, now generally available, lets users specify an ordered list of preferred devices. The scheduler evaluates the list sequentially, picking the first available match.

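For example, a job can request an H100 GPU and, if none is free, fall back to an A100. The workload can then be scheduled on whichever listed device type is available instead of waiting for its first choice, which improves scheduling flexibility and cluster utilization.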
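The sequential evaluation described above can be pictured with a small sketch. This is a simplified model and not the scheduler's actual code: the `Device` class, the `pick_first_available` function, and the device names are all hypothetical, and real claims select devices through device classes and selectors rather than a plain model string.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    model: str              # e.g. "H100" or "A100" (illustrative labels)
    allocated: bool = False


def pick_first_available(preferences, devices):
    """Return the first free device matching the earliest preference, or None."""
    for model in preferences:          # preferences are in priority order
        for device in devices:
            if device.model == model and not device.allocated:
                return device
    return None


# Hypothetical inventory: the only H100 is already in use.
inventory = [
    Device("gpu-0", "H100", allocated=True),
    Device("gpu-1", "A100"),
    Device("gpu-2", "A100"),
]

chosen = pick_first_available(["H100", "A100"], inventory)
print(chosen.name if chosen else "unschedulable")  # prints "gpu-1"
```

Because the H100 is taken, the request falls through to the second entry in the list. With a single-entry list of `["H100"]`, the same call would return `None` and the workload would stay pending.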

Extended Resource Support (Beta)

DRA now supports requesting resources through the traditional Extended Resources API on a Pod. This allows a gradual migration to DRA without forcing application developers to immediately adopt the ResourceClaim API.

Cluster operators can transition at their own pace while maintaining backward compatibility.
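One way to picture this bridge is as a lookup from an extended resource name to a DRA device class. The sketch below is only an illustration of that idea: the `EXTENDED_TO_CLASS` table, the resource name `example.com/gpu`, the class name `gpu.example.com`, and the `implied_requests` helper are all made up for the example, and the real translation happens inside Kubernetes and may work differently.

```python
# Hypothetical mapping from extended resource names to DRA device classes.
EXTENDED_TO_CLASS = {"example.com/gpu": "gpu.example.com"}


def implied_requests(container_limits):
    """Translate extended-resource limits into (device_class, count) pairs.

    Resources without a mapping (cpu, memory, ...) are left untouched.
    """
    requests = []
    for resource, quantity in container_limits.items():
        device_class = EXTENDED_TO_CLASS.get(resource)
        if device_class is not None:
            requests.append((device_class, int(quantity)))
    return requests


# A Pod spec fragment that still uses the legacy extended resource syntax.
limits = {"cpu": "2", "memory": "4Gi", "example.com/gpu": "1"}
print(implied_requests(limits))  # prints [('gpu.example.com', 1)]
```

The point of the sketch is that the Pod author keeps writing the familiar resource limit, while the device request is derived on the cluster side.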

Partitionable Devices (Beta)

Hardware accelerators can be dynamically carved into smaller logical units, such as Multi-Instance GPU (MIG) partitions. Workloads that do not need an entire device can share it safely and efficiently.

This feature, now in beta, enables higher utilization of expensive accelerators across multiple Pods.
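The bookkeeping behind partitioning can be sketched as a shared capacity that each partition draws from. This is a toy model under stated assumptions: the `SharedDevice` class, the use of memory in GiB as the only capacity dimension, and the partition names are hypothetical and do not reflect how any particular driver describes its hardware.

```python
class SharedDevice:
    """A physical accelerator whose capacity is split among partitions."""

    def __init__(self, name, memory_gib):
        self.name = name
        self.free_gib = memory_gib
        self.partitions = {}  # partition name -> reserved GiB

    def carve(self, partition_name, memory_gib):
        """Reserve a slice; return False if it does not fit or the name is taken."""
        if memory_gib <= 0 or memory_gib > self.free_gib:
            return False
        if partition_name in self.partitions:
            return False
        self.free_gib -= memory_gib
        self.partitions[partition_name] = memory_gib
        return True

    def release(self, partition_name):
        """Return a partition's capacity to the device."""
        self.free_gib += self.partitions.pop(partition_name, 0)


gpu = SharedDevice("gpu-0", memory_gib=80)
print(gpu.carve("pod-a", 40))  # True
print(gpu.carve("pod-b", 40))  # True
print(gpu.carve("pod-c", 10))  # False: the device is fully carved up
```

Two Pods share the device, and a third request is refused until one of the partitions is released.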

Device Taints (Beta)

Just as nodes can be tainted, individual DRA devices can now be tainted. Administrators can mark faulty devices to prevent allocation, or reserve specific hardware for critical workloads.

Only Pods whose resource claims carry matching tolerations can use tainted devices, adding a new layer of control.
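The matching rule can be sketched as follows. This is a simplified model borrowed from how node taints and tolerations are usually described, not the allocator's real code: the `Taint` and `Toleration` classes, the `can_allocate` helper, and the key `example.com/unhealthy` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str               # e.g. "NoSchedule"


@dataclass(frozen=True)
class Toleration:
    key: str
    effect: str = ""          # empty string matches any effect
    operator: str = "Equal"   # "Equal" compares values, "Exists" ignores them
    value: str = ""


def tolerates(toleration, taint):
    """Return True if this single toleration covers this single taint."""
    if toleration.effect and toleration.effect != taint.effect:
        return False
    if toleration.key != taint.key:
        return False
    if toleration.operator == "Exists":
        return True
    return toleration.value == taint.value


def can_allocate(device_taints, tolerations):
    """A device is usable only if every one of its taints is tolerated."""
    return all(
        any(tolerates(tol, taint) for tol in tolerations)
        for taint in device_taints
    )


faulty = [Taint("example.com/unhealthy", "true", "NoSchedule")]
print(can_allocate(faulty, []))                                     # False
print(can_allocate(faulty, [Toleration("example.com/unhealthy",
                                       operator="Exists")]))        # True
```

An untainted device has an empty taint list, so `can_allocate` returns `True` for any request, which matches the default behavior of ordinary devices.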

Device Binding Conditions (Beta)

To improve scheduling reliability, the system now waits for a device to meet certain conditions before binding a Pod. This reduces allocation failures caused by devices that are powered down or undergoing maintenance.

The scheduler holds the Pod until the device reports that it is ready, so the Pod is not bound to hardware it cannot yet use.
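The hold-until-ready behavior amounts to a check like the one below. It is a minimal sketch, assuming a device publishes named boolean conditions: the `ready_to_bind` function and the condition names `PoweredOn` and `Attached` are invented for illustration and are not taken from the Kubernetes API.

```python
def ready_to_bind(required_conditions, device_status):
    """True only if every required condition is reported as True."""
    return all(device_status.get(name) is True for name in required_conditions)


required = ["PoweredOn", "Attached"]              # hypothetical condition names
status = {"PoweredOn": True, "Attached": False}   # device still being attached

print("bind" if ready_to_bind(required, status) else "wait")  # prints "wait"

status["Attached"] = True                         # the driver reports progress
print("bind" if ready_to_bind(required, status) else "wait")  # prints "bind"
```

A condition that has not been reported at all is treated the same as one reported false, so the Pod keeps waiting rather than binding on incomplete information.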

What This Means for Kubernetes Users

The v1.36 release signals that DRA is no longer an experimental feature. It is production-ready for managing a wide range of hardware accelerators and networking resources.

Administrators can expect better utilization, easier failure handling, and more intuitive resource definitions. Developers gain a consistent API that works across different hardware types.

“The ecosystem support is expanding rapidly,” added Chen. “Beyond GPUs, we now see drivers for networking, storage, and custom accelerators. This release solidifies DRA as the future of resource allocation in Kubernetes.”

As adoption grows, cloud providers are expected to update their managed Kubernetes offerings to support these new features, making advanced hardware management accessible to all.
