Posted at: 26 April
Senior Tools Development Engineer
Company
NVIDIA Corporation is a Santa Clara-based technology company specializing in designing GPUs and AI solutions for gaming, professional visualization, and cloud services, operating in both B2B and B2C markets globally.
Remote Hiring Policy:
NVIDIA supports flexible remote work arrangements and hires from various regions globally, including the Americas, Europe, Asia, and the Middle East, with roles that may require collaboration across time zones.
Job Type
Full-time
Allowed Applicant Locations
India
Job Description
What does it look like to build infrastructure that thinks: that triages failures, files bugs, and surfaces root causes without waiting for a human to ask? What if the tools we build today become the foundation for how the whole industry does software quality tomorrow? We are building an engineering team where a small group of high-agency engineers, equipped with well-designed autonomous agents, can accomplish what previously required a much larger organisation. This role sits at the centre of that transformation. You will design and build the agentic infrastructure that powers our test automation and quality engineering workflows for the NVIDIA Omniverse platform. This isn’t about using AI tools to work faster; it’s about building the infrastructure that other engineers depend on to ship high-quality software with greater speed and confidence!
What you’ll be doing:
As a Senior Tools Development Engineer on our team, you will own end-to-end outcomes, work with significant autonomy, and have a direct impact on the reliability of one of NVIDIA's most strategic developer platforms.
In this role you can expect to:
Build Agentic Test Pipelines
- Develop and deploy multi-agent systems for automated test generation, log analysis, failure triage, and bug-filing workflows
- Build and maintain agent orchestration frameworks using tools such as Claude Code, MCP servers, and agent SDK patterns
- Create autonomous pipelines that reduce cognitive load on engineers by routing failures, surfacing root causes, and generating actionable bug reports
Own Infrastructure Quality
- Build evaluation systems to measure agent output quality, ensuring autonomous pipelines are reliable, not just fast
- Establish observability and monitoring for agentic workflows so failures are transparent, debuggable, and recoverable
Drive Team Adoption
- Build internal tooling that is adoptable, not just technically impressive, with clear documentation and low onboarding friction
- Collaborate with the broader QA team to identify automation opportunities and build the tools that accelerate them
What we need to see:
Core Technical Skills
- Strong Python engineering: clean, testable, maintainable code with a systems-level perspective
- Deep familiarity with AI-native development workflows: Claude Code, Cursor, LLM APIs, prompt engineering in production
- Hands-on experience building multi-agent or autonomous systems that have shipped and run without continuous supervision
- Clear understanding of where LLMs fail (hallucination, context degradation, tool misuse) and experience building mitigations into system design, including evaluation frameworks for AI-generated outputs
Test & Quality Engineering Foundation
- A graduate degree in Computer Science Engineering or equivalent
- 5+ years in test automation, CI/CD pipeline design, or software quality engineering, including failure analysis and test triage at scale
- Ability to reason about test coverage strategically across a complex, frequently releasing platform SDK
Mindset & Working Style
- High agency: owns outcomes end-to-end, defines their own path in ambiguous problem spaces
- The patience and communication skill to build systems that colleagues can trust and adopt
- Intellectual honesty about where systems break, with a habit of building in recovery paths rather than hiding failures
Ways to stand out from the crowd:
- Built and shipped MCP servers, custom tool integrations, or multi-agent orchestrations that extend LLM capabilities in production, with working examples to show
- Designed evaluation harnesses or scoring systems that measure and enforce LLM output quality at scale, not just in prototypes
- You've built agentic systems with graceful failure recovery (retry logic, fallback chains, human-in-the-loop escalation) rather than silent breakages
- You have experience with NVIDIA Omniverse, OpenUSD, or similarly complex platform SDKs, and can reason about test strategy across them
- You can point to infrastructure you've shipped that measurably reduced a team's manual triage or debugging burden, with clear documentation that let others extend your work without you in the room
With competitive salaries and a generous benefits package, we are widely considered to be one of the technology world’s most desirable employers. Due to outstanding growth, our elite engineering teams are rapidly growing. If you're creative with a real passion for technology, we want to hear from you. We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, colour, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform crucial job functions, and to receive other benefits and privileges of employment. Please contact us to request accommodation.