Company: AMD
Location: Belgrade (BG), Serbia
Career Level: Mid-Senior Level
Industries: Technology, Software, IT, Electronics

Description



WHAT YOU DO AT AMD CHANGES EVERYTHING 

At AMD, our mission is to build great products that accelerate next-generation computing experiences—from AI and data centers, to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges—striving for execution excellence, while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond.  Together, we advance your career.  



Role Summary

Develop, maintain, and evolve test content and validation frameworks for AMD's large-scale GPU platforms. Evangelize best practices and guidance for a broad contributor community spanning software, firmware, and hardware teams. Enable scalable automation across on‑prem and cloud/Kubernetes environments, and apply AI/ML to improve coverage, reduce flakiness, accelerate triage, and increase validation efficiency.

What you'll do

  • Design, implement, and maintain reusable test frameworks, libraries, and templates used across multiple GPU product lines.
  • Establish and maintain governance, standards, and guidance for test content structure, metadata/traceability, coding style, and reviews.
  • Author and maintain high‑quality test content; perform code reviews and mentoring to raise the bar across the contributor community.
  • Build and operate CI/CD workflows, containerized runners, and artifact/result management; scale automation across on‑prem test farms and cloud/Kubernetes.
  • Evangelize best practices via documentation, playbooks, examples, training sessions, and office hours to drive adoption and consistency.
  • Apply AI/ML to validation: log classification, failure clustering, intelligent test selection/prioritization, anomaly detection, and safe test‑case generation; evaluate models/datasets and integrate responsibly.
  • Improve observability (logging, metrics, tracing) and define SLIs/SLOs for validation platforms; drive down flake rate and runtime.
  • Partner with platform, firmware/BIOS, compiler, and driver teams to align validation content with roadmaps and release criteria.
  • Maintain clear documentation and change management for frameworks and standards.

Minimum Qualifications (Required)

  • BS in Computer Engineering or Computer Science (or related field); MS a plus.
  • Strong background in software/hardware testing and automation with broad knowledge of validation methodologies.
  • Demonstrated experience delivering and maintaining validation frameworks/solutions adopted by multiple teams.
  • Proficiency in Python and Bash; ability to read and review C/C++ test/driver code.
  • Comfortable using Linux and Windows via command line/shell for development and automation.
  • Experience deploying scalable automation in hybrid environments (on‑prem and cloud/Kubernetes); CI/CD (Jenkins, GitLab CI, or GitHub Actions).
  • Hands‑on with Git and Docker; strong debugging and scripting skills.
  • Solid understanding of computer architecture (CPU, GPU, memory, I/O, power).
  • Knowledge of firmware/BIOS and low‑level software interactions relevant to GPU/system validation.
  • Applied AI/ML experience in testing/validation (e.g., failure clustering, log analysis, test selection, anomaly detection).
  • Excellent communication, collaboration, and influence skills.

Preferred Qualifications

  • 3+ years in test automation, validation engineering, or systems/solutions engineering for complex HW/SW products.
  • Experience operating datacenter‑scale validation or cloud‑based test platforms (elastic/queued runners, multi‑tenant).
  • Kubernetes expertise (Helm, autoscaling, operators) and container registries.
  • Infrastructure as Code (any of Terraform, Ansible, Packer, or MAAS); virtualization (KVM, LXD).
  • Observability stacks (Prometheus/Grafana, ELK/OpenSearch, OpenTelemetry); results warehousing (SQL/NoSQL).
  • Familiarity with ML tooling (TensorFlow, PyTorch, scikit‑learn) and MLOps practices (data pipelines, model evaluation, experiment tracking such as MLflow).
  • Knowledge of cloud platforms (AWS/Azure/GCP) and networking concepts applicable to test farms.

Tools & Tech (not exhaustive)

Python • Bash • C/C++ • Git • Jenkins/GitLab CI/GitHub Actions • Docker • Kubernetes • Linux/Windows • TensorFlow/PyTorch • scikit‑learn • MLflow • Prometheus/Grafana • ELK/OpenSearch • SQL/NoSQL • OpenTelemetry • Terraform/Ansible/Packer/MAAS • KVM/LXD • AWS/Azure/GCP

 

 

#LI-DB1

#LI-HYBRID



Benefits offered are described in AMD benefits at a glance.

 

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law.   We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.

