Posted at: 3 March

Senior Backend Engineer (Ruby and/or Go), Tenant Scale: Cells Infrastructure

Company

GitLab

GitLab is a San Francisco-based DevOps platform offering B2B and B2C solutions for software development, security, and collaboration, with a global presence.

Remote Hiring Policy:

GitLab is a fully remote company that hires globally, with team members located in over 65 countries. We embrace flexibility in scheduling to accommodate various time zones.

Job Type

Full-time

Allowed Applicant Locations

North America, South America, Europe, Asia

Job Description

An overview of this role

As a Senior Backend Engineer, Cells Infrastructure, you'll help us build the foundation that lets GitLab.com scale horizontally through our Cells architecture. You'll work on two core parts of that system: edge routing services that direct traffic across a fleet of independent Cell clusters, and the Topology Service that manages and serves cluster topology information as the source of truth for the rest of the platform. Your work will make routing reliable and low-latency across protocols, and ensure GitLab teams can build Cell-aware features with confidence as we grow.

Some examples of our projects:

  • Building and operating routing services in TypeScript that direct requests to the correct Cell using cluster topology data

  • Developing and maintaining the systems that store, update, and serve cluster topology information that routing, resource assignment, and Cell lifecycle decisions depend on

What you'll do

  • Design and implement edge traffic routing that directs requests to the correct Cell in a way that's transparent to users.

  • Build and evolve the Topology Service that serves as the authoritative source of cluster state for routing, resource assignment, and Cell lifecycle decisions.

  • Collaborate with feature teams across the product to make features and data models Cell-aware, both in the GitLab Rails monolith and in supporting services.

  • Operate and improve the routing and topology systems you build by participating in tier-2 on-call, responding to escalated incidents, and strengthening observability and operational tooling.

  • Author Architecture Decision Records (ADRs), operational runbooks, and documentation so other teams can understand, adopt, and extend the Cells platform.

  • Review merge requests from GitLab team members and community contributors, maintaining high standards for correctness, performance, and security across the stack.

What you'll bring

  • Experience building observable, resilient production services using Go or Ruby on Rails (TypeScript experience is a plus).

  • Background delivering and operating production systems in high-scale environments, including incident response and operational ownership.

  • Ability to reason about distributed systems, including consistency models, partitioning strategies, failure modes, and operational tradeoffs.

  • Experience building high-throughput networking services (gRPC and protocol buffers knowledge is a plus).

  • Familiarity with working in large, multi-team codebases and coordinating changes across teams and services, including making features and data models Cell-aware.

  • Knowledge of observability practices such as metrics, tracing, and alerting, with an approach focused on building systems you'd be confident operating on-call.

  • Strong written communication skills for an async-first, globally distributed team, including documenting decisions (for example, architecture decision records) and runbooks.

  • Experience working with relational databases in production, including schema design, migrations, and query performance tuning (PostgreSQL experience is a plus).

About the team

We're the Cells Infrastructure team within the Tenant Scale group in Infrastructure Platforms. We're a globally distributed, all-remote group of Backend Engineers and Site Reliability Engineers working asynchronously across multiple time zones. We own foundational services for GitLab's Cells architecture, including edge routing that directs requests to the right Cell and our topology systems that act as the source of truth for cluster state. Our challenge is making GitLab.com scale horizontally in a way that's reliable, low-latency, and operable, so every request reaches the right cluster and we can keep growing safely. For more on how we work, see the Tenant Scale Group handbook page.