Posted at: 15 January

Site Reliability Engineer

Company

TensorWave

TensorWave is a Las Vegas-based B2B cloud computing provider specializing in AI and high-performance computing infrastructure, utilizing AMD Instinct GPUs to deliver scalable solutions for enterprises and AI researchers.

Job Type

Full-time

Allowed Applicant Locations

United States

Job Description

Our mission at TensorWave Cloud is to build seamless, secure, reliable, and resilient AI infrastructure at scale, eliminating barriers and challenging the status quo to empower builders and support AI innovation.

About the role

We are seeking a Site Reliability Engineer with a strong background in software engineering to build and maintain highly scalable, secure, and resilient infrastructure.

You’ll play a critical role in designing low-level systems, automating infrastructure with modern tooling, and ensuring platform reliability.

This role is ideal for someone who’s comfortable working at the intersection of systems programming and DevOps: writing code in Go, JavaScript, Rust, C, or Zig while also managing infrastructure with NixOS, Kubernetes, and Terraform.

Responsibilities

  • Design, build, and maintain infrastructure systems using Linux and NixOS

  • Manage infrastructure-as-code with Terraform to provision and scale resources

  • Architect and operate Kubernetes clusters with a focus on performance, security, and automation

  • Write high-performance tooling and internal utilities in Go, JavaScript, and Rust

  • Develop and maintain CI/CD pipelines for infrastructure and code deployments

  • Monitor system performance, resolve issues, and improve reliability through observability tooling

  • Collaborate closely with engineering teams to support deployment strategies and development workflows

Required Experience

  • Bachelor of Science in Computer Science, Computer Engineering, or a related technical field, or equivalent practical experience

  • 5+ years in DevOps, Site Reliability, or Infrastructure Engineering roles

  • Proficiency in one or more of the following languages: Rust, C, Zig, JavaScript, or Go

  • Deep experience with Linux systems and configuration management

  • Hands-on experience with Terraform, Kubernetes, and containerized environments

  • Strong understanding of systems programming, performance tuning, and operating system internals

  • Familiarity with CI/CD practices and infrastructure monitoring/alerting tools

What We Bring

  • Mission-driven company

  • Competitive Salary

  • Stock Options

  • 100% paid Medical, Dental, and Vision insurance

  • Flexible PTO

  • Paid Holidays

  • 401(k)

  • Parental Leave

  • Flexible Spending Account

  • Short Term Disability Insurance

  • Life and Voluntary Supplemental Insurance

  • Mental Health Benefits through Spring Health

We’re looking for resilient, adaptable people to join our team: people who believe in the mission and think at massive scale. Solutions that worked on a handful of devices will not work at exascale. Be prepared to be pushed daily, to learn a lot, and to build the future.

TensorWave is an equal opportunity employer, committed to fostering an inclusive and supportive workplace. All qualified applicants and candidates will receive consideration for employment without regard to race, color, religion, sex, disability, age, national origin, or veteran status.