Your mission
We are looking for a Go Platform Engineer who thrives at the intersection of infrastructure, AI systems, and DevOps. In this role, you will architect and scale the backbone of our AI Platform: ensuring high availability, low latency, and seamless integration of machine learning capabilities into production. You will own the microservices that power AI inference, build robust multi-tenant infrastructure, and support our Data & AI team with production-grade DevOps practices.
Your responsibilities:
- Design, build, and maintain Go microservices that handle AI model inference, data processing pipelines, and real-time streaming workflows.
- Architect scalable APIs (gRPC/REST) that serve as the bridge between AI models and production applications.
- Own the Kubernetes infrastructure (EKS), including deployments, autoscaling policies, service mesh, and cluster health monitoring.
- Implement service-to-service communication using gRPC and message queues (RabbitMQ/SQS) for asynchronous processing.
- Integrate with cloud AI services (AWS Bedrock, OpenAI, Anthropic) and manage model serving infrastructure.
- Build multi-tenant capabilities including authentication (JWT/JWKS), rate limiting, usage tracking, and tenant isolation.
- Partner with the Data & AI team to productionize machine learning models, wrapping them in production-ready services with proper health checks, circuit breakers, and graceful degradation.
- Build comprehensive observability: structured logging, metrics (Prometheus), distributed tracing (Jaeger/Tempo), and alerting.
- Implement CI/CD pipelines and infrastructure-as-code (Terraform) for automated deployments and disaster recovery.
- Ensure high availability through proper monitoring, incident response, and post-mortem analysis.
- Optimize resource utilization for GPU workloads and cost-efficient scaling strategies.