Company: AMD

Location: Bengaluru, KA, India

Career Level: Mid-Senior Level

Industries: Technology, Software, IT, Electronics

Description

WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences—from AI and data centers, to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges—striving for execution excellence, while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

About the Role

The HIP Runtime team builds the foundational GPU runtime layer that powers ROCm-enabled applications, providing portability, performance, and reliability across heterogeneous compute platforms.
We are looking for an experienced SDET Manager to lead a technically strong team in designing and executing testing strategies for the HIP runtime, including key areas such as device libraries, GPU driver interfaces, memory management paths, and performance-critical GPU workflows.

You will own quality for a core component of the ROCm software ecosystem and drive end‑to‑end validation—from API behaviors and compatibility, to concurrency, performance, and multi‑GPU correctness.

Key Responsibilities

Leadership & Team Development

Lead, mentor, and grow a team of SDETs focused on GPU runtime testing.
Develop an automation-first culture that prioritizes reliability, observability, and maintainability.
Establish engineering processes, KPIs, and technical growth paths for team members.

Quality Strategy for HIP Runtime

Define the long-term quality roadmap for HIP runtime APIs, GPU memory management, synchronization primitives, multi-GPU flows, and runtime-driver interactions.
Ensure deep validation in areas such as:
- Kernel launch correctness and performance
- Stream/graph execution
- Asynchronous dispatch semantics
- Peer-to-peer transfers and unified memory paths
- Runtime error-handling, stability, and stress conditions
Build end-to-end test strategies that cover regression, performance, compatibility, and stress testing.

Technical Execution

Oversee development of system-level automation frameworks for HIP runtime APIs.
Drive creation of GPU-aware test harnesses, device simulators/mocks, and diagnostic tooling.
Ensure robust CI pipelines with GPU hardware integration, multi-GPU orchestration, and reproducible workflows.
Collaborate with GPU driver, compiler, and kernel teams to debug issues spanning multiple layers.

Cross-Functional Collaboration

Work closely with HIP Runtime developers, Compiler (LLVM/HIP-Clang), Driver/ROCT, and ROCm Systems teams.
Partner with architecture teams to validate new GPU ISA capabilities and platform bring-up.
Collaborate with DevOps/SRE teams to manage GPU farm test infrastructure and reliability.

Process & Governance

Build dashboards and metrics for:
- API coverage
- Runtime stability trends
- Multi-GPU stress reliability
- Performance regressions
Define quality criteria for feature readiness, release gating, and long-term branch maintenance.
Drive continuous improvement in release quality, defect prevention, and test infrastructure efficiency.

Required Qualifications

Bachelor's or Master's degree in Computer Science, Electrical Engineering, or related field.
8+ years of experience in software testing, systems programming, or GPU-related development.
2–4+ years leading engineering or SDET teams.
Strong hands-on experience in C++ and Python-based automation.
Deep understanding of:
- GPU programming (HIP, CUDA, or OpenCL)
- Runtime systems or driver-level testing
- Memory models, concurrency, multithreading, and synchronization
Experience validating distributed or heterogeneous compute systems.
Strong background in designing automated test frameworks for low-level or performance-critical software.

Preferred Qualifications

Experience with ROCm stack, HIP runtime, CUDA runtime APIs, or GPU compute frameworks.
Knowledge of GPU drivers, device compiler toolchains, or kernel dispatch mechanisms.
Experience with GPU profiling tools and performance analysis workflows.
Prior work on large-scale GPU automation farms or device-lab management.
Familiarity with CMake, Clang/LLVM, and systems-level debugging (gdb, perf, sanitizer tools).

Soft Skills

Strong problem-solving mindset with ability to reason about complex systems.
Effective communication across engineering, program management, and architecture groups.
Ability to lead through influence in a multi-team, multi-layer runtime environment.
Passion for quality, performance, and low-level systems engineering.

Why Join the HIP Runtime Team?

Direct impact on the reliability and performance of next-generation GPU computing.
Work with world-class engineers building foundational technologies for AI, HPC, ML, graphics, and scientific computing.
Opportunity to shape the quality strategy for a key component of the ROCm platform.
Solve technically challenging problems that span multiple layers of the GPU software stack.

Benefits offered are described in "AMD benefits at a glance".

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.

AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD's “Responsible AI Policy” is available here.

This posting is for an existing vacancy.


Manager Software Quality Engineering - SDET Job Listing at AMD in Bengaluru, KA (Job ID 79806-en-us)
