
Consolidate your GitLab stack with Gitaly on Kubernetes

"Kubernetes gets a long-awaited GitLab boost: Gitaly's shift to the platform eliminates hybrid setup headaches, but requires careful tuning to mitigate memory-intensive Git operations and prevent out-of-memory crashes, now mitigated by running each process in a dedicated cgroup."

With GitLab 18.11, Gitaly on Kubernetes is now generally available. Teams that previously had to run Gitaly on separate virtual machines while keeping other GitLab components in Kubernetes can now consolidate everything into a single Kubernetes-managed environment.

Overview

Gitaly is the Git repository storage backend for GitLab. Until now, running GitLab on Kubernetes meant maintaining a hybrid architecture: most components lived in the cluster, but Gitaly required its own VM fleet. This added operational complexity — separate monitoring, separate upgrades, separate scaling.

The GA release eliminates that split. Gitaly can now be deployed as a Kubernetes-native component via the GitLab Helm chart, either as part of a full GitLab installation or as an external service.

Key technical challenges and solutions

Git operations are memory-intensive and their usage patterns are hard to predict. To prevent a single runaway Git process from crashing the entire Gitaly service, Gitaly can be configured to run each Git process inside a dedicated cgroup. If a Git process exceeds its cgroup memory limit and gets terminated, the main Gitaly process remains unaffected.
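
The GitLab Helm chart exposes these cgroup controls as Gitaly values. A rough sketch of what per-repository limits can look like follows; the key names mirror Gitaly's cgroups settings, but treat them as an assumption and verify them against the current chart reference before use:

    # Illustrative Helm values; sizes are placeholders, and the key names
    # should be checked against the GitLab chart's Gitaly documentation.
    gitaly:
      cgroups:
        enabled: true
        memoryBytes: 15032385536      # ~14 GiB ceiling for all Git processes in the pod
        cpuShares: 1024
        repositories:
          count: 1000                 # pool of per-repository cgroups
          memoryBytes: 12884901888    # ~12 GiB limit per repository cgroup
          cpuShares: 512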

Making this work in Kubernetes required additional steps. Most Kubernetes clusters use containerd as their container runtime, which until recently only allowed containers to write to cgroupfs in privileged mode. The solution is to mount /sys/fs/cgroup via an init container and make the path writable.
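
In practice the pattern looks like the sketch below: a short-lived init container running as root takes ownership of a Gitaly subtree under /sys/fs/cgroup so the unprivileged Gitaly container can later create child cgroups there. This is a hand-written illustration of the approach, not the manifest the chart actually generates, and the user ID is an assumption:

    # Illustrative pod spec fragment, not the chart's generated manifest.
    initContainers:
      - name: init-cgroups
        image: busybox:1.36
        securityContext:
          runAsUser: 0                # root, to change cgroupfs ownership
        command:
          - sh
          - -c
          - mkdir -p /sys/fs/cgroup/gitaly && chown -R 1000:1000 /sys/fs/cgroup/gitaly
        volumeMounts:
          - name: cgroup
            mountPath: /sys/fs/cgroup
    containers:
      - name: gitaly
        volumeMounts:
          - name: cgroup
            mountPath: /sys/fs/cgroup   # now writable for the gitaly user (uid 1000 assumed)
    volumes:
      - name: cgroup
        hostPath:
          path: /sys/fs/cgroup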

Pod restarts also needed attention. On a VM, Omnibus can upgrade the Gitaly binary in place and reload gracefully by keeping the socket open while swapping out the process. On Kubernetes, when a StatefulSet pod is replaced, whether due to a Helm upgrade, a node drain, or a configuration change, Gitaly gets a hard stop and restart rather than a graceful reload. For sharded Gitaly deployments without high availability, that means downtime.

The fix is configurable client retries. Gitaly clients such as Rails can be configured to retry requests for long enough that Gitaly restarts and becomes reachable again; users may see slightly higher latency during that window, but requests ultimately succeed.
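
The exact client settings aren't spelled out here, but the underlying mechanism is ordinary gRPC retries against a temporarily unavailable backend. As a purely illustrative sketch (not GitLab's actual configuration), a gRPC retry policy sized to ride out a pod restart looks roughly like this:

    # Illustrative gRPC retry policy (service config written as YAML);
    # not an actual GitLab or Gitaly client setting.
    methodConfig:
      - name:
          - {}                        # empty name entry matches all services and methods
        retryPolicy:
          maxAttempts: 5
          initialBackoff: 1s
          backoffMultiplier: 2
          maxBackoff: 30s             # long enough to span a Gitaly pod restart
          retryableStatusCodes:
            - UNAVAILABLE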

Performance benchmarks

GitLab ran benchmarks comparing Gitaly on VMs versus Gitaly on Kubernetes during upgrades. The results:

  • git clone: 100% success rate on both VM and Kubernetes
  • git pull: 100% on VM, 99.16% on Kubernetes
  • git push: 99.66% on VM, 100% on Kubernetes

These numbers are nearly identical. Achieving 100% success across all operations would require Gitaly Cluster (Praefect), which provides high availability but does not yet support Kubernetes; that support is actively being worked on.

What this means for you

If you're running GitLab in hybrid mode, you can now consolidate your infrastructure by moving Gitaly into the cluster, removing the need to maintain and monitor a separate VM fleet alongside your Kubernetes nodes.

If you're adopting GitLab for the first time and already operate software on Kubernetes, you now get a fully Kubernetes-native GitLab deployment via the Helm chart.

Installation

The recommended way to deploy Gitaly on Kubernetes is through the GitLab Helm chart. Before starting, read the Gitaly on Kubernetes documentation, which covers key configuration guidance and helps avoid common pitfalls. Gitaly can be deployed either as part of a full GitLab installation or as an external component.
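
As a rough sketch, a minimal values file for a full installation with in-cluster Gitaly might look like the following; the domain and storage size are placeholders, and the gitaly.persistence.size key should be checked against the chart reference for your version:

    # Illustrative values.yaml; all values are placeholders.
    global:
      hosts:
        domain: example.com            # your GitLab domain
    gitaly:
      persistence:
        size: 50Gi                     # repository storage per Gitaly replica

Add the chart repository with "helm repo add gitlab https://charts.gitlab.io", then deploy with "helm upgrade --install gitlab gitlab/gitlab -f values.yaml".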

Bottom line

Gitaly on Kubernetes GA removes a major operational headache for GitLab-on-Kubernetes users. The hybrid VM-plus-cluster era is over, though teams that need high availability should wait for Gitaly Cluster support, which is still in development.
