Company:
QT Technologies Ireland Limited
Job Area:
Information Technology Group, Information Technology Group > IT Engineering
General Summary:
About The Role
Qualcomm offers flexible work options tailored to our employees’ needs. These include a combination of working from home and working in our brand-new, state-of-the-art office in Penrose Dock, Cork.
Well-being and life balance are fundamental to Qualcomm as an employer. We recognise and understand that employees have missed spending quality time with loved ones and extended family.
As such, Cork Qualcomm policy allows our employees to blend short-term remote working with annual leave.
We are seeking a highly skilled Technical Support Engineer specializing in Machine Learning (ML) operations, Kubernetes, container technologies, and Run:AI. In this role, you will be responsible for providing technical and operational support for customers leveraging GPU computing platforms to optimize and manage AI/ML workloads, particularly in Kubernetes-based environments. The ideal candidate will have deep expertise in Kubernetes orchestration and GPU management, as well as a solid understanding of how these address AI/ML operations at scale.
Key Responsibilities
1. Kubernetes Orchestration & Resource Management: Serve as the subject matter expert for Kubernetes and container orchestration. Guide customers through the design and deployment of Kubernetes clusters tailored for AI/ML use cases, helping them effectively manage workloads through Run:AI. Ensure optimal resource allocation, including GPU sharing, node management, and job scheduling across clusters.
2. Cluster Monitoring & Optimization: Monitor and tune Kubernetes clusters to ensure they are optimized for AI/ML workloads. Provide support on managing Kubernetes autoscaling, resource quotas, and performance monitoring of distributed ML models running on Kubernetes clusters via the Run:AI platform.
3. GPU Troubleshooting and Incident Response: Diagnose and resolve complex issues involving dependencies between GPU drivers and software, NVIDIA toolkit errors, or GPU component failures.
4. Run:AI Platform Support: Provide expert support for the Run:AI platform, assisting customers with the deployment, configuration, and management of Kubernetes clusters that handle AI/ML workloads.
5. Workload Optimization on Kubernetes: Assist customers in optimizing dynamic resource allocation for their AI/ML workloads by utilizing the Run:AI scheduler in conjunction with Kubernetes's native tools.
6. Kubernetes Troubleshooting & Incident Response: Diagnose and resolve complex issues related to Kubernetes cluster management, ensuring smooth operation across the entire Kubernetes environment.
7. Integration Support: Help customers integrate Run:AI into their existing Kubernetes-based ML infrastructure.
8. Security and Best Practices in Kubernetes: Advise customers on security best practices for Kubernetes clusters handling sensitive ML workloads.
9. Collaboration with HQ: Work closely with the engineering and product teams in HQ, providing feedback on Kubernetes-related issues.
10. Training & Documentation: Develop training materials and deliver technical workshops on using Run:AI in Kubernetes environments.
Minimum Qualifications:
• 4+ years of IT-related work experience with a Bachelor's degree.
OR
• 7+ years of IT-related work experience without a Bachelor’s degree.
Physical Requirements:
• Frequently transports and installs equipment up to 20 lbs.
Requirements
* 3+ years of experience in technical support roles with strong expertise in Kubernetes administration, container orchestration, and AI/ML workload management.
* 1+ year of general GPU administration.
* In-depth knowledge of Kubernetes (CKA or CKAD certification highly preferred).
* Proficiency in Kubernetes resource management.
* Experience with configuration management tools (Puppet, Chef, Ansible) and Kubernetes management platforms such as Rancher is a plus.
* Experience with Run:AI platform or similar tools for ML workload optimization.
* Hands-on experience with Docker and containerized environments for AI/ML operations.
* Strong understanding of ML frameworks (e.g., TensorFlow, PyTorch).
* Excellent analytical, communication, and problem-solving skills.
* Ability to manage priorities in a fast-paced environment and collaborate within a matrix organization.
Where you will be working
Cork has a proud reputation as Ireland's second largest economic engine and is now one of the Top 20 location choices in Europe.
Equal Opportunities
We are an Equal Opportunity employer; all qualified applicants will receive consideration for employment without regard to race, colour, religion, sexual orientation, gender identity, national origin, disability, veteran status, or any protected classification.
What's on Offer
* Salary, stock and performance-related bonus
* Maternity/Paternity Leave
* Employee stock purchase scheme
* Matching pension scheme
* Education Assistance
* Relocation and immigration support (if needed)
* Life, Medical, Income and Travel Insurance
* Subsidised memberships for physical and mental well-being
* Bicycle purchase scheme
* Employee-run clubs, including running, football, chess, badminton and many more
*References to a particular number of years' experience are for indicative purposes only.
If you would like more information about this role, please contact Qualcomm Careers.