Posted on: 10 February

Senior Database Reliability Engineer (DBRE) & Architect

Company

Alex Staff Agency

Alex Staff Agency is an international B2B IT recruitment agency that connects top tech talent with companies in the IT and creative sectors. It operates remotely, without a fixed headquarters.

Remote Hiring Policy:

Alex Staff Agency embraces remote work and offers flexible collaboration options, including fully remote roles and hybrid models in locations such as London. Team members are supported across various regions.

Job Type

Full-time

Allowed Applicant Locations

Worldwide

Salary

$120,000 to $160,000 per year

Job Description

This position is open at a global product-led IT company specializing in infrastructure stability and security solutions. Their products are recognized as the industry standard in the Hosting and Enterprise segments, powering over 500,000 servers worldwide.

In 2025, the company is evolving its data management strategy, shifting from traditional database administration to an Internal Database-as-a-Service (DBaaS) model. This role requires a visionary engineer to design resilient distributed systems, automate infrastructure through code, and transform databases into a reliable service for product teams. This is an ideal opportunity for those ready to handle petabytes of data and build high-scale platform solutions.

Key Challenges & Responsibilities:

  • Designing and implementing a self-service platform (Terraform + Ansible) for deploying HA clusters (PostgreSQL, ClickHouse, MongoDB, Redis) in a heterogeneous environment (Bare Metal, OpenNebula, K8s, Public Clouds).
  • Managing rapidly growing ClickHouse analytics clusters (12+ clusters, tens of terabytes), with a focus on sharding, ReplicatedMergeTree replication, and reliable S3 backup pipelines under high load.
  • Maintaining and scaling infrastructure for Apache Airflow and Redash, ensuring the reliability of ETL pipelines and visualization tools.
  • Implementing SRE practices in data management: replacing manual incident response with automated self-healing mechanisms and defining SLOs/SLIs.
  • Migrating legacy solutions to modern cloud patterns and implementing Kubernetes operators for stateful workloads.
  • Serving as a technical authority for product teams to optimize data schemas and SQL queries for high-load systems.

Tech Stack:

  • DB: PostgreSQL 15+ (Patroni, PgBouncer), ClickHouse (Sharded/Replicated), MongoDB, Redis, Kafka.
  • Data & Analytics: Apache Airflow, Redash.
  • Infrastructure: Hybrid Cloud (3+ private DCs, OpenNebula, K8s, Bare Metal, AWS, GCP, Azure, DO).
  • IaC & CI/CD: Terraform, Ansible, Python/Go, GitLab, Jenkins, Gerrit.
  • Observability: VictoriaMetrics, Grafana, Loki.

Must have:

  • 5+ years of PostgreSQL expertise: deep knowledge of MVCC, locking mechanics, expert-level Patroni/PgBouncer configuration, and experience with seamless major version upgrades under load.
  • ClickHouse mastery: experience operating large clusters, understanding ZooKeeper/ClickHouse Keeper, sharding, replication internals, and performance diagnostics at the data-part level.
  • Engineering mindset (SRE/DevOps): experience writing complex Terraform modules and Ansible roles; proficiency in Python or Go for automation is a major asset.
  • Hybrid environment experience: understanding the nuances of running DBs on Bare Metal vs. Kubernetes vs. Public Cloud, with the ability to optimize TCO and disk subsystem performance (NVMe, Network Storage).
  • Systems approach: understanding the full stack from network packets to business logic, including security standards (FIPS, Audit logs) and Disaster Recovery.

Nice to Have:

  • Experience building an Internal Developer Platform (IDP).
  • Experience operating databases in Kubernetes via operators (CloudNativePG, Altinity Operator).
  • Background working with Cloud or Hosting providers on similar services.

Benefits:

  • Fully remote work from any location worldwide and flexible working hours.
  • Opportunity to impact architectural decisions for services used by thousands of companies globally.
  • 24 days of vacation, 10 national holidays, and unlimited paid sick leave.
  • Compensation for private medical insurance.
  • Reimbursement for co-working spaces and gym/sports activities.
  • Dedicated budget for education, training, and conferences.
  • Reward program for innovative ideas that lead to company patents.