This anonymized case covers a rendering farm built over time from a mix of GPU generations. The goal was to make the farm behave more like a single, well-balanced pool of capacity.
The studio operated a small farm of around twenty GPUs: a mix of datacenter cards and high-end consumer cards with different performance profiles. Over the years, new machines had been added as needed, and the software stack had grown organically.
Jobs were submitted through a queueing system running on top of Kubernetes. On paper, the total GPU capacity was sufficient for the studio’s workload. In practice, artists often experienced long queues and unpredictable turnaround times.
We began by collecting GPU utilization, job metadata, and queue statistics across the cluster. This quickly confirmed what the artists already felt: some nodes did far more work than others.
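One way to put a number on "some nodes did far more work than others" is the coefficient of variation of per-node utilization. The sketch below is illustrative only: the node names, the sample values, and the `(node, utilization)` record shape are assumptions, not the studio's actual telemetry format.

```python
from statistics import mean, pstdev


def utilization_by_node(samples):
    """Average GPU utilization (0-100) per node from (node, utilization) samples."""
    per_node = {}
    for node, util in samples:
        per_node.setdefault(node, []).append(util)
    return {node: mean(values) for node, values in per_node.items()}


def imbalance(per_node_util):
    """Coefficient of variation across nodes; 0.0 means a perfectly even load."""
    values = list(per_node_util.values())
    avg = mean(values)
    return pstdev(values) / avg if avg else 0.0


# Made-up samples, purely for illustration.
samples = [
    ("node-a", 90), ("node-a", 80),
    ("node-b", 20), ("node-b", 30),
    ("node-c", 55),
]

per_node = utilization_by_node(samples)
print(per_node)
print(round(imbalance(per_node), 3))
```

Tracking a single figure like this over time makes it easier to tell whether a scheduling change actually evened out the load.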
The existing scheduler treated all GPUs as if they were roughly equivalent. In reality, there were substantial differences in performance between cards. Heavy jobs could easily land on slower GPUs, while fast GPUs idled or handled lighter tasks.
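To illustrate what speed-aware placement can look like, here is a minimal sketch of a greedy heuristic: take the heaviest jobs first and give each one to the GPU that would finish it earliest. This is not the studio's scheduler. The relative speed factors, job cost estimates, and all names are hypothetical.

```python
def assign_jobs(jobs, gpu_speeds):
    """Greedy speed-aware placement.

    jobs:       {job_name: estimated cost in seconds on a reference GPU}
    gpu_speeds: {gpu_name: relative speed, where 1.0 is the reference GPU}

    Returns (placement, finish_at): which GPU each job went to, and the
    projected time at which each GPU finishes its assigned work.
    """
    finish_at = {gpu: 0.0 for gpu in gpu_speeds}
    placement = {}

    # Heaviest jobs first, so they get first pick of the fast GPUs.
    for job, cost in sorted(jobs.items(), key=lambda kv: kv[1], reverse=True):
        # Pick the GPU on which this job would complete earliest.
        best = min(gpu_speeds, key=lambda g: finish_at[g] + cost / gpu_speeds[g])
        finish_at[best] += cost / gpu_speeds[best]
        placement[job] = best

    return placement, finish_at


# Hypothetical speeds and job costs, purely for illustration.
gpu_speeds = {"fast": 2.0, "slow": 1.0}
jobs = {"hero_shot": 600, "preview": 60, "lookdev": 120}

placement, finish_at = assign_jobs(jobs, gpu_speeds)
print(placement)
print(finish_at)
```

A scheduler that treats both cards as equal could just as easily send `hero_shot` to the slow GPU, where it alone would take 600 seconds. With the speed factor taken into account, the heavy job goes to the fast card and the lighter jobs fill the slow one.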
We then looked at which jobs caused the longest queues and the heaviest load. These often involved large textures and complex shading setups, and they were noticeably more sensitive to disk and network performance than lighter jobs.
Tracing the path of assets from storage to GPU showed that certain nodes had slower effective access to shared storage, which further amplified differences between machines.
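A rough back-of-the-envelope model shows why slow storage access compounds a slower GPU. The sketch assumes asset loading and rendering happen one after the other with no overlap. Every figure in it is invented for illustration.

```python
def estimated_turnaround(asset_gb, storage_gb_per_s, render_cost_s, gpu_speed):
    """Estimated seconds for one job on one node.

    asset_gb:         size of the assets the job must load
    storage_gb_per_s: effective read throughput from shared storage on this node
    render_cost_s:    render time on a reference GPU
    gpu_speed:        relative GPU speed, where 1.0 is the reference GPU
    """
    load_s = asset_gb / storage_gb_per_s
    render_s = render_cost_s / gpu_speed
    return load_s + render_s


# Same hypothetical job on two hypothetical nodes.
well_connected_fast = estimated_turnaround(40, 1.0, 600, 2.0)
poorly_connected_slow = estimated_turnaround(40, 0.25, 600, 1.0)

print(well_connected_fast, poorly_connected_slow)
```

In this made-up example the GPUs differ by a factor of two, but the estimated turnaround differs by more than that once the slower storage path is included.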
| Metric | Before | After |
|---|---|---|
| GPU load distribution | Highly uneven | Much more balanced across the farm |
| Average render time | Varied significantly per job | Reduced by roughly 20–25% |
| Queue length during busy periods | Frequently long and unpredictable | Noticeably shorter and more stable |
| Artist perception | Hard to know when work would finish | Turnaround felt more consistent and manageable |

Heterogeneous GPU fleets are a fact of life for many teams. Treating all cards as equivalent can hide a lot of capacity and create unnecessary frustration. A small amount of awareness in the scheduler goes a long way toward making a mixed farm feel like a unified resource.