Posted on: 18 February

Senior DevOps Engineer - Highload, Cloud & Data-Intensive Systems (EU / Remote)

Company

Alex Staff Agency

Alex Staff Agency is an international B2B IT recruitment agency that connects top tech talent with companies in the IT and creative sectors. It operates remotely, without a fixed headquarters.

Remote Hiring Policy:

Alex Staff Agency embraces remote work and offers flexible collaboration options, including fully remote roles and hybrid models in locations such as London. It supports team members across multiple regions.

Job Type

Full-time

Allowed Applicant Locations

Europe

Salary

€5,000 to €8,000 per month

Job Description

About the project
The team develops and maintains distributed services for analytics, APIs, and transaction monitoring. The systems process very large volumes of data: terabytes of storage, trillions of records, and a continuously growing load.

Infrastructure:

~100 servers (bare metal + VPS)
active use of IaC
Kubernetes clusters in production
focus on stability, observability, and automation

The project is long-term: not a hype-driven startup, but a mature product with real users.

What the work looks like
This is a hands-on role with a clear time allocation:

60%: operations and incidents (including helping other teams)
20%: infrastructure automation
20%: prototyping, improvements, and technical initiatives

The role includes on-call duty, but after-hours incidents typically happen only 2–3 times a year, not every week.

Responsibilities
Operation of production services and infrastructure (server provisioning/decommissioning, updates, replacements, performance troubleshooting)
Support and development of Infrastructure as Code (Terraform / Ansible: modules, roles, standards, reviews)
Monitoring, alerting, backups, and regular recovery checks
Development of service and infrastructure automation
Development of CI/CD and release procedures
Incident diagnosis and resolution, support for product teams
Traffic analytics and tooling for bot and attack protection
Responsibility for 24/7 platform stability

What’s important
4+ years of experience operating Linux/Ubuntu infrastructure and production services
Strong understanding of networking and troubleshooting
Kubernetes (cluster operations), Rancher, Docker / containerd
Hands-on experience with Ansible and Terraform
Monitoring: Prometheus / Thanos / Telegraf / Grafana / Sentry
CI/CD: Jenkins
Automation: Bash, Python
Experience working with LVM

Nice to have
Experience working with blockchain nodes
Diagnosis and tuning of ClickHouse and MongoDB in high-load clusters
Providers: Hetzner / OVHcloud
Cloudflare (edge, DDoS), experience with AWS
Handling abuse tickets with hosting providers

Technology stack
VPN: WireGuard, OpenVPN
Databases: ClickHouse, MongoDB, Redis, PostgreSQL
Applications: Node.js (pm2), php-fpm, Lua, Tarantool
Supporting services: Go (Operator SDK), Ruby, Node.js, PHP

What we offer
Salary: €5,000–8,000 per month (net)
Format: office / hybrid / remote
Location: Barcelona, Spain (city or suburbs) or remote (CET ±2 hours)
Employment: full-time
A genuine opportunity to influence architecture and processes
A mature engineering team with reasonable expectations