Posted: 31 March

Senior Dataflow Development Engineer - LPU

Company

NVIDIA

NVIDIA Corporation is a Santa Clara-based technology company specializing in designing GPUs and AI solutions for gaming, professional visualization, and cloud services, operating in both B2B and B2C markets globally.

Remote Hiring Policy:

NVIDIA supports flexible remote work arrangements and hires from various regions globally, including the Americas, Europe, Asia, and the Middle East, with roles that may require collaboration across time zones.

Job Type

Full-time

Allowed Applicant Locations

North America

Salary

$196,000 to $368,000 per year

Job Description

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It's a unique legacy of innovation fueled by great technology and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing: an era in which our GPUs act as the brains of computers, robots, and self-driving cars that can understand the world. Doing what's never been done before takes vision, innovation, and the world's best talent. As an NVIDIAN, you'll be immersed in a diverse, supportive environment where everyone is inspired to do their best work. Come join the team and see how you can make a lasting impact on the world.

We are looking for a Dataflow Development Engineer to join our team and develop, build, and improve dataflow systems at the hardware–software boundary. You will define the interactions between the runtime and the accelerator, implement and tune dataflow pipelines, create host-side drivers and runtimes that cooperate with programmable logic, and co-design hardware and software for deterministic, low-latency execution. You will implement dataflow graphs and streaming pipelines in hardware, build efficient host–device interfaces (PCIe, DMA, VFIO), and collaborate with compiler and architecture teams to map high-level dataflow onto FPGA and accelerator fabrics. Your work directly affects latency, efficiency, and resource usage for inference at scale.

The ideal candidate has a strong hardware background, including experience with FPGA development, HDL, or hardware/software co-design; can analyze timing, resource usage, and data movement; and is comfortable working at every level from RTL to runtime. They think in terms of pipelines and hardware performance and enjoy implementing dataflow architectures in silicon and programmable logic.

What you'll be doing:

- Design and implement dataflow pipelines and streaming architectures.
- Develop host-side software, drivers, and runtimes that cooperate with our accelerator hardware (e.g. PCIe, DMA, VFIO) across FPGA, LPU, and GPU targets.
- Partner with compiler and hardware teams to map dataflow graphs onto hardware resources; improve latency, processing efficiency, and area/utilization.
- Build and maintain hardware–software co-design flows, from high-level dataflow specs through synthesis, place-and-route, and validation.
- Build tooling and methodologies for debugging, profiling, and validating dataflow behavior in hardware; participate in design reviews and cross-team alignment across EMEA and globally.

What we need to see:

- BS or higher degree in CS/EE/CE (or equivalent experience) with 12+ years in FPGA development, hardware dataflow, or hardware/software co-design.
- Hands-on experience with RTL/HDL (Verilog, VHDL) or high-level synthesis (HLS); ability to build and debug dataflow-style pipelines in hardware.
- Strong programming skills in C/C++ for host drivers, runtimes, or tooling; familiarity with hardware interfaces (e.g. PCIe, DMA, memory-mapped I/O).
- Proven understanding of dataflow and streaming concepts: pipelining, backpressure, buffering, and resource/area trade-offs.
- Excellent communication in English; ability to work with distributed teams.

Ways to stand out from the crowd:

- Experience with FPGA dataflow for machine learning inference, networking, or high-throughput streaming (e.g. Xilinx/AMD, Intel FPGA).
- Familiarity with FPGA toolchains (synthesis, P&R, timing closure) and with Linux, scripting, and version control.
- VFIO, SR-IOV, or other passthrough/virtualization for accelerators; low-level driver or BSP development.
- ASIC or custom-silicon dataflow design; RTL developed for dataflow or network-on-chip (NoC) applications.
- Background in compiler backends, MLIR, or IR-level optimization for hardware mapping; experience with multi-FPGA or FPGA–GPU systems and distributed dataflow across programmable logic and accelerators.

Widely considered to be one of the technology world's most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package. As you plan your future, see what we can offer to you and your family: www.nvidiabenefits.com/ #LI-Hybrid

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 196,000 USD - 310,500 USD for Level 5, and 232,000 USD - 368,000 USD for Level 6. You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until April 4, 2026. This posting is for an existing vacancy. NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering a diverse work environment and is proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.