Company: AMD

Location: Hyderabad, Telangana, India

Career Level: Mid-Senior Level

Industries: Technology, Software, IT, Electronics

Description

WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences—from AI and data centers, to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges—striving for execution excellence, while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

MTS SOFTWARE SYSTEM DESIGN ENGINEER

THE ROLE:

We are looking for a Staff-level GPU Compute / AI Validation, Debug & Performance Engineer to lead validation, deep-debug, and performance optimization for next-generation GPU compute and AI platforms. This role requires strong expertise in GPU architecture, parallel computing, and AI workloads, along with the ability to drive cross-functional technical initiatives in a global MNC environment.

The ideal candidate will own complex validation areas, act as a technical authority for GPU compute/AI debug and performance, and influence architecture and design decisions through data-driven insights.

KEY RESPONSIBILITIES:

GPU Compute / AI Validation Leadership

Own end-to-end validation strategy for GPU compute and AI workloads (HPC, ML, DL).
Define validation scope, coverage, and success metrics for compute pipelines.
Lead post-silicon validation, silicon bring-up, and feature readiness for GPU compute.
Ensure functional correctness across drivers, firmware, runtime, and frameworks.

Advanced Debug & Root Cause Analysis

Act as debug lead for complex GPU compute/AI issues spanning HW, FW, drivers, runtimes, and OS.
Debug GPU hangs, page faults, ECC errors, memory corruption, and scheduler failures.
Analyze failures using GPU traces, register dumps, crash dumps, JTAG, logs, WinDbg, counters, and AMD's profiler and debugger tools.
Work directly with architecture, RTL, and design teams to influence fixes and mitigations.

Performance Analysis & Optimization

Lead performance characterization and optimization for AI and compute workloads.
Identify bottlenecks across compute units, memory bandwidth, cache, interconnect, and power.
Drive workload-aware optimizations for training and inference use cases.
Validate performance-per-watt and scalability against product and architectural goals.

Automation, Tools & Infrastructure

Architect and drive automation frameworks for compute/AI validation and performance.
Develop tooling using Python to improve efficiency and coverage.
Integrate tests into CI/CD pipelines and regression systems.
Enable data-driven decision making through dashboards and performance tracking.

Technical Leadership & Cross-Functional Influence

Drive cross-team alignment with architecture, RTL, firmware, driver, compiler, and AI software teams.
Influence architectural decisions through early validation and performance feedback.
Represent the team in global technical forums and design reviews.

REQUIRED QUALIFICATIONS:

Technical Expertise

8+ years of experience in GPU compute / AI validation, debug, or performance
Deep understanding of GPU architecture and parallel compute models
Strong experience with AI/ML and HPC workloads
Expertise in GPU drivers, runtimes, and system software (Linux and Windows)
Hands-on experience with GPU profiling and debug tools
Proficiency in Python, Groovy, GitHub, Linux, Windows, CI/CD, test development, and performance analysis

Leadership & Soft Skills

Proven technical leadership at Senior/Staff level
Ability to lead ambiguous, high-impact problem areas
Strong communication skills
Mentoring and design-review experience

PREFERRED EXPERIENCE:

Product development or systems engineering background with hardware platforms and their software and firmware ecosystems
Excellent verbal and written communication and presentation skills
Excellent interpersonal, organizational, analytical, planning, and technical leadership skills
Proven record of accomplishment in delivering large multi-functional product solutions
Experience working in a fast-paced matrixed technical organization and multi-site environment
Experience with ROCm, or similar compute stacks
Experience with compiler or runtime optimizations for AI workloads
Knowledge of power, thermal, and reliability (RAS) aspects of GPUs
Prior experience in leading GPU or AI accelerator products

ACADEMIC CREDENTIALS:

Bachelor's or Master's degree in Computer or Electrical Engineering or equivalent

#LI-NR1

Benefits offered are described in AMD benefits at a glance.

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.

AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD's “Responsible AI Policy” is available here.

This posting is for an existing vacancy.

Staff AI/ML Validation Engineer Job Listing at AMD in Hyderabad, Telangana (Job ID 77871-en-us)