Posted at: 10 April
Platform Engineer (Hybrid Infrastructure) - Level 3
Company
Smarsh Inc. is a Portland-based B2B SaaS provider specializing in digital communications governance and compliance solutions for regulated industries.
Remote Hiring Policy:
Smarsh supports remote work for various roles, including positions available for candidates in the United States. The company values a diverse workforce and encourages applications from individuals across different regions.
Job Type
Full-time
Allowed Applicant Locations
Worldwide
Salary
$120,000 to $160,000 per year
Job Description
The SMB Platform Engineering team at Smarsh is responsible for the reliability, automation, and evolution of the infrastructure powering Smarsh's SMB Product, a mission-critical archiving and communication surveillance platform currently running on-premises. Later this year, the team will be taking on cloud infrastructure responsibilities as part of Smarsh's broader platform modernization effort.
As a Level 3 Platform Engineer on the SMB Platform Engineering team, you will own the design, delivery, and operation of infrastructure automation and platform tooling across Smarsh's on-premises environment, with cloud responsibilities coming into scope later this year. You will work independently on complex projects and serve as a technical liaison to adjacent Smarsh platform teams. In year one, success looks like: measurable reduction in operational toil, at least one significant automation or migration initiative delivered, and runbooks for the systems you own.
How will you contribute?
- Own Kubernetes platform operations, including cluster health, workload deployments, scaling, and incident response.
- Design, implement, and operate infrastructure automation using Ansible, Terraform, and GitOps workflows (ArgoCD / Flux).
- Lead migration projects moving on-premises workloads toward One Smarsh (cloud-native) platform services.
- Build and maintain CI/CD pipelines (CircleCI, GitHub Actions) for infrastructure and application delivery.
- Drive observability improvements across Datadog, Splunk, and ELK, including dashboards, alert tuning, and SLO/SLA definition.
- Participate in the on-call rotation, responding to P1/P2 incidents; the team rotates on-call roughly every 4-6 weeks and performs scheduled overnight maintenance windows a few times per year.
- Support security and compliance requirements, including patch management, access controls, and audit readiness for regulated workloads.
- Contribute to runbooks and operational documentation as systems are built and changed.
- Collaborate with other Smarsh platform teams on the build and adoption of a One Smarsh platform.
What will you bring?
- 4–7 years of experience in platform engineering, SRE, or infrastructure engineering roles.
- Strong hands-on experience with Kubernetes (cluster operations, Helm, workload troubleshooting).
- Proficiency with infrastructure-as-code tooling, specifically Ansible and/or Terraform in production environments.
- Strong Linux systems administration skills (Ubuntu).
- Experience with GitOps workflows and CI/CD pipelines at scale.
- Experience with VMware vSphere in a production environment.
- Demonstrated ability to self-direct and drive projects to completion with minimal oversight.
- Comfortable operating in evolving environments where processes and tooling are actively maturing.
- Strong communication skills with cross-functional stakeholders.
- Experience with one or more of Datadog, Splunk, or ELK for dashboards, monitors, and log management is preferred.
- Familiarity with compliance-sensitive or regulated industry infrastructure (financial services, healthcare, or similar) is preferred.
- Experience with ArgoCD, Flux, or similar GitOps continuous delivery tooling is preferred.
- Familiarity with Jenkins or Concourse for CI/CD pipeline management is preferred.
- Familiarity with VMware Kubernetes Service (VKS) or other VMware-native Kubernetes platforms is preferred.
- Python scripting experience for automation and tooling is preferred.
- Prior experience in an on-call rotation with a defined SLA structure is preferred.
- Experience with cloud infrastructure (AWS) is preferred, as the team takes on cloud responsibilities later this year.