Company: AMD
Location: Beijing, China
Career Level: Mid-Senior Level
Industries: Technology, Software, IT, Electronics

Description



WHAT YOU DO AT AMD CHANGES EVERYTHING 

At AMD, our mission is to build great products that accelerate next-generation computing experiences—from AI and data centers, to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges—striving for execution excellence, while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond.  Together, we advance your career.  



THE ROLE: 

AMD is seeking a Staff Software Development Engineer to serve as a compiler-flow architect and technical leader for a GPU-focused AI compiler stack on AMD GPUs and related heterogeneous accelerators. You will own major components spanning front-end IR construction, mid-end optimizations, backend lowering, and code generation. Working with partner teams, you will deliver end-to-end solutions, from model import and graph lowering to device-level execution, while influencing long-term compiler and GPU software strategy for next-generation AI hardware ecosystems.

 

THE PERSON: 

You combine deep hands-on compiler engineering (MLIR / LLVM / Clang, GPU execution models, AI workloads) with sound architectural judgment and technical leadership. You communicate with exceptional clarity across hardware architecture, runtime, quantization, algorithm, framework, and systems organizations; you mentor engineers, drive design and code quality, and ground decisions in benchmarks, profilers, and correctness analysis.

 

KEY RESPONSIBILITIES: 

  • Own the architecture and design of major components of a GPU-focused AI compiler stack, including front-end IR construction, mid-end optimizations, backend lowering, and code generation for AMD GPUs and heterogeneous accelerators—aligned with AMD's ROCm™ and broader GPU software directions.

  • Design and implement MLIR- and LLVM-based compiler passes: IR transformations, dialect design where applicable, optimizations, scheduling and tiling strategies, and end-to-end lowering pipelines targeting GPU hardware.

  • Deliver high-performance compilation flows for AI models, kernels, and operators, optimizing execution on modern GPUs; apply a data-minded approach—benchmark, profile, and tune critical workloads and investigate correctness and performance regressions across graph, IR, and kernel levels.

  • Build and optimize backend CodeGen using LLVM, Clang, and modern C++ toolchains (MSVC / GCC / Clang), targeting GPU runtimes and device execution environments relevant to AMD's stack.

  • Collaborate closely with AI framework, runtime, and systems teams on end-to-end GPU compiler solutions—from model import and graph lowering to device execution; partner with GPU / hardware architecture on feature enablement, performance headroom, and hardware–software co-planning.

  • Partner with quantization teams on numerics, precision modes (e.g., PTQ / QAT and related flows), and representation of quantized operations through the compiler stack where applicable.

  • Partner with algorithm and framework stakeholders on operator coverage, fusion opportunities, autotuning / scheduling trade-offs, and model-driven performance goals.

  • Analyze GPU performance bottlenecks and implement advanced optimizations across graph-level, IR-level, and kernel-level transformations (including vectorization and memory-hierarchy aware strategies where appropriate).

  • Provide technical leadership: mentor junior and senior engineers, lead design and code reviews, and help establish best practices for compiler and GPU performance engineering; lead multi-engineer or cross-team initiatives as needed.

  • Influence long-term compiler architecture and GPU software stack strategy to support next-generation AI hardware ecosystems; participate in bring-up and production issue resolution spanning compiler, runtime, and driver boundaries; improve tools, CI, tests, and workflows for scalable development.

 

PREFERRED EXPERIENCE: 

  • 5+ years of professional experience in compiler development, compiler infrastructure, or AI systems software at a depth appropriate for SMTS level scope and ownership.
  • Strong expertise in LLVM, MLIR, Clang, or comparable compiler frameworks used in production or research-at-scale settings.
  • Deep understanding of IR design, IR analysis and optimization, IR transformations, lowering, and GPU-oriented CodeGen.
  • Strong programming skills in C++ (e.g., C++17/20 style and practices) and experience building large-scale, high-performance systems.
  • Familiarity with compiler toolchains and build systems such as MSVC, GCC, Clang, and CMake.
  • Solid foundations in computer architecture, including instruction sets, register allocation, control- and data-flow analysis, and SSA-style representations and transformations.
  • Proactive collaborator with strong communication and technical leadership—including influence across teams without relying solely on authority.

  • Ability to work cross-team in a fast-moving, deeply technical environment (hardware, runtime, quantization, algorithms, frameworks, product).

  • Strong problem-solving, ownership, and accountability; comfort with ambiguity and data-driven prioritization.

  • Hands-on experience with GPU programming models and toolchains—strongly preferred: AMD ROCm™ with compiler-oriented work (HIP, hipcc, Clang GPU offload, LLVM AMDGPU or closely related codegen); preferred: NVIDIA CUDA compiler workflows (NVCC or Clang CUDA, PTX, NVVM / libNVVM, LLVM NVPTX). Also valued: Vulkan, OpenCL, SYCL, or custom accelerator / NPU stacks where relevant to compiler integration.

  • Experience integrating with AI frameworks and ecosystems such as PyTorch, TensorFlow, JAX, ONNX, TVM, XLA / OpenXLA / StableHLO, or Triton.

  • Windows development or shipping experience is strongly preferred for roles touching cross-platform toolchains and customer environments.

  • Practical experience building AI compiler solutions, including: MLIR dialect design; kernel fusion and quantization pipelines; kernel autotuning and scheduling; tiling, vectorization, and GPU memory-hierarchy optimization.

  • Demonstrated ability to lead technical architecture development and influence long-term compiler and runtime strategy; strong track record of production impact in compiler projects or large-scale engineering systems.

  • Familiarity with git, CI, debuggers, and profilers; conda or Docker is a plus.

 

ACADEMIC CREDENTIALS: 

  • Bachelor's or Master's degree (or PhD) in Computer Science, Software Engineering, Electrical Engineering, or a related field—or equivalent depth of education and experience.




Benefits offered are described in AMD benefits at a glance.

 

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law.   We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.

 

AMD may use Artificial Intelligence to help screen, assess or select applicants for this position.  AMD's “Responsible AI Policy” is available here.

 

This posting is for an existing vacancy.

