Posted at: 16 January

Lead DevOps Engineer (Azure, Terraform)

Company

NorthBay Solutions

NorthBay Solutions is a global B2B technology consulting firm and AWS Premier Partner specializing in AWS-oriented professional services, including AI/ML and cloud migrations, serving diverse industries such as Automotive and Healthcare.

Job Type

Full-time

Allowed Applicant Locations

India

Job Description

Job Title: Lead DevOps Engineer (Azure, Terraform)
Employment Type: Full-time, Remote (India)

About the Role: 

NorthBay, a leading AWS Premier Partner, is seeking a highly skilled Lead DevOps Engineer (Azure, Terraform) to join its growing cloud and AI engineering team. This role is ideal for candidates with a strong foundation in cloud DevOps practices and a passion for implementing scalable MLOps solutions.

Key Responsibilities: 

● Design, implement, and manage CI/CD pipelines using tools such as Jenkins, GitHub Actions, or Azure DevOps 

● Develop and maintain Infrastructure-as-Code using Terraform 

● Manage and scale container orchestration environments using Kubernetes, including large production-grade clusters

● Ensure cloud infrastructure is optimized, secure, and monitored effectively 

● Collaborate with data science teams to support ML model deployment and operationalization 

● Implement MLOps best practices, including model versioning, deployment strategies (e.g., blue-green), monitoring (data drift, concept drift), and experiment tracking (e.g., MLflow) 

● Build and maintain automated ML pipelines to streamline model lifecycle management 

Required Skills: 

● 8 to 12 years of experience in DevOps and/or MLOps roles 

● Proficient in CI/CD tools: Jenkins, GitHub Actions, Azure DevOps 

● Strong expertise in Terraform, including managing and scaling infrastructure across large environments 

● Hands-on experience with Kubernetes in larger clusters, including workload distribution, autoscaling, and cluster monitoring

● Strong understanding of containerization technologies (Docker) and microservices architecture 

● Solid grasp of cloud networking, security best practices, and observability 

● Scripting proficiency in Bash and Python 

Preferred Skills: 

● Experience with MLflow, TFX, Kubeflow, or SageMaker Pipelines 

● Knowledge of model performance monitoring and ML system reliability 

● Familiarity with the AWS MLOps stack or equivalent tools on Azure/GCP