Posted at: 30 April
Distillation Lead
Company
Waabi is a Toronto-based B2B technology company specializing in autonomous driving systems for long-haul trucking, leveraging AI and simulation to enhance logistics operations.
Remote Hiring Policy:
Waabi supports remote work primarily within the United States and Canada, offering a mix of remote, hybrid, and on-site roles across various locations including Toronto, San Francisco, Dallas, and Pittsburgh.
Job Type
Full-time
Allowed Applicant Locations
United States, Canada
Salary
$195,000 to $286,000 per year
Job Description
You will…
- Define and drive the technical strategy for model distillation and compression across Waabi's AI stack — spanning perception, world models, and planning — with an eye toward both onboard deployment and simulation use-cases.
- Design, implement, and scale state-of-the-art distillation and efficiency pipelines, which may include:
  - Distillation for generative models (diffusion, autoregressive, flow-matching, video models)
  - Quantization-aware training (QAT) and post-training quantization (PTQ)
  - Knowledge distillation (feature-level, response-based, and relation-based)
  - Structured and unstructured pruning and sparsification
  - Low-rank factorization and efficient architecture design
  - Speculative decoding and other inference-time efficiency techniques
- Collaborate closely with ML Platform, Infrastructure, Onboard, Autonomy, and Simulation teams to integrate compressed models into production pipelines and meet latency, memory, and throughput targets across deployment contexts.
- Define rigorous benchmarks and evaluation frameworks to characterize efficiency vs. quality trade-offs across models and hardware targets.
- Mentor and guide researchers and engineers working in the distillation and model efficiency space, setting a high technical bar and fostering a culture of rigorous experimentation.
- Champion best practices for model compression across the organization; disseminate knowledge through internal design reviews, documentation, and technical talks.
- Stay at the cutting edge of model efficiency research; contribute to the broader scientific community through publications and open-source contributions.
Qualifications:
- Deep distillation expertise: You have extensive hands-on experience designing and implementing distillation, quantization, pruning, and model compression techniques for large-scale neural networks, with demonstrated impact in production settings.
- Strong research and engineering foundation: You hold a Bachelor's or Master's degree in Machine Learning, Computer Vision, Robotics, or a related field, or have equivalent industry experience; relevant hands-on experience in model distillation and efficiency is what matters most. You have expert Python and PyTorch (or JAX) skills and experience with large-scale distributed training.
- Technical leadership: You have a proven track record of setting technical direction and driving projects from conception to production. You inspire and elevate those around you through deep technical expertise and mentorship.
- Cross-functional collaboration: You have experience working closely with infrastructure, platform, and autonomy teams to deploy compressed models under real engineering constraints.
- Clear communicator: You can communicate complex technical trade-offs clearly to diverse audiences and drive alignment across research and engineering teams.
Bonus:
- Experience with hardware-aware optimization (TensorRT, ONNX, custom CUDA kernels, hardware-specific quantization).
- Publications at top-tier ML/CV venues (NeurIPS, ICML, CVPR, ICLR, ECCV) in model compression, efficient deep learning, or related areas.
- Experience distilling large generative models (diffusion models, LLMs, VLMs, or video models).
- Background in autonomous vehicles or robotics.