Description
WHAT YOU DO AT AMD CHANGES EVERYTHING
At AMD, our mission is to build great products that accelerate next-generation computing experiences—from AI and data centers, to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges—striving for execution excellence, while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.
THE ROLE:
The successful candidate will assume responsibility for silicon and system-level electrical validation, maximizing performance/watt through power management feature tuning and optimization, power model correlation, and prototyping activities for AMD Datacenter GPU products. You will interact with key System and Power Management Architects, Firmware teams, Power Modelling Leads, Board Design & Validation Engineers, Performance Engineers, Customer Engineering teams, and the Product Definition team to achieve the desired product goals.
THE PERSON:
The ideal candidate is passionate about technology and building great products. You have strong power fundamentals, an understanding of power management features and hands-on experience optimizing them, silicon validation experience, and a good understanding of PID controllers, PDN, physical design, semiconductor process, thermal/power interactions, power/performance optimization, and power models. You have excellent communication, critical problem-solving, data analysis and visualization skills, and you can multitask and work with cross-domain, cross-functional teams to build state-of-the-art HPC & AI products. You are a self-starter and a strong team collaborator who can independently drive tasks to completion. You are hands-on and willing to interact directly with the hardware, using scopes and probes to gather detailed electrical information. You have automation, scripting, data processing and analysis skills. You have the leadership and mentorship skills to lead, train and develop new college graduates and junior engineers on the team.
KEY RESPONSIBILITIES:
- Execute Power Attainment test plans in post-silicon phases in support of the Data Center GPU product roadmap, optimizing for power, perf/watt and performance.
- Configure and set up ML/AI Datacenter GPU systems for data collection, experiments and test plan execution.
- Utilize lab equipment such as oscilloscope, high speed probes, function generator and data acquisition equipment to gather required electrical characterization data for power and performance optimization.
- Actively participate in analysis of collected post-silicon performance and power data to ensure integrity of results, provide summaries and conclusions, and drive productization of features.
- Analyze and debug interactions between various power management features.
- Analyze data from workload or execution output datalogs using Excel or JMP, either manually or through automation you develop.
- Execute ROI analysis of power management features and provide feedback to power management architecture team.
- Support prototyping experiments and development of new GPU features that impact performance and power.
- Electrically stress the system, validate the limits of ASIC and system/board components and optimize settings for stability and performance.
- Troubleshoot system-level issues that may occur in test environments and platforms.
- Proactively drive continuous improvement for post-silicon power attainment activities.
- Participate in development of the automation environment: write scripts that automate workloads and enhance execution capabilities using Linux, Python and other supporting software tools.
- Work in a fast-paced, resource-constrained environment to build top-of-the-line HPC & AI GPU products.
- Provide technical leadership for electrical validation and power optimization in datacenter platforms.
- Contribute to team building, and develop and mentor junior engineers into future technical leads.
- Drive process efficiencies, automation and AI for debug and analysis.
- Provide weekly readouts to executives on progress, blockers and next steps.
- Work with Rack & Cluster teams to develop and execute E2E electrical validation test plans and build electrically robust, reliable, stable and performant systems.
- Debug customer issues, collaborating with L1/L2 support and customers to design DOEs that isolate the problem and provide a fix.
PREFERRED EXPERIENCE:
- 7 years of hands-on experience as an engineer in the semiconductor industry.
- Demonstrated ability to execute and deliver multiple projects in a timely fashion.
- Ability to prioritize work items in a fast-paced environment and escalate as necessary.
- Excellent grasp of computer organization/architecture, GPU architecture and power management.
- Knowledge of power-limited performance methodologies and control theory.
- Extensive experience in platform optimization. Solid knowledge of Computer I/O.
- Experience with tools for power and performance analysis.
- Strong programming skills; scripting experience in Python preferred.
- Familiarity with HPC/AI applications and benchmarks is a big plus.
- Proficiency in the Linux command-line environment and shell scripting is desirable.
- Deep knowledge of power management techniques such as deep sleep, clock gating, P-states, etc.
- Experience with container technologies (e.g., Docker).
- Strong analytical and problem-solving skills with keen attention to detail.
- Experience in data analysis, summarization, and presentation.
- Excellent presentation and communication skills.
- Experience using and debugging lab tools such as oscilloscopes, DAQs and power measurement equipment.
- Experience working in Windows and Linux environments.
- Experience working in data center environments, knowledge of boards, systems, racks, clusters and building large electrically stable systems.
ACADEMIC CREDENTIALS:
Bachelor's degree in Computer Engineering, Electrical Engineering, or Computer Science. MS preferred.
LOCATION:
Markham, ON
#LI-SL2
#LI-HYBRID
Benefits offered are described in AMD benefits at a glance.
AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.
AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD's “Responsible AI Policy” is available here.
This posting is for an existing vacancy.