About The Role
We are seeking a highly skilled Technical Support Engineer specializing in Machine Learning (ML) operations, Kubernetes, container technologies, and Run:AI. In this role, you will be responsible for providing technical and operational support for customers leveraging GPU computing platforms to optimize and manage AI/ML workloads, particularly in Kubernetes-based environments.
Key Responsibilities
* Kubernetes Orchestration & Resource Management:
o Serve as the subject matter expert for Kubernetes and container orchestration.
o Guide customers through the design and deployment of Kubernetes clusters tailored for AI/ML use cases.
* Cluster Monitoring & Optimization:
o Monitor and tune Kubernetes clusters to ensure they are optimized for AI/ML workloads.
* GPU Troubleshooting and Incident Response:
o Diagnose and resolve complex issues arising from dependencies between GPU drivers and software.
* Run:AI Platform Support:
o Provide expert support for the Run:AI platform, assisting customers with the deployment, configuration, and management of Kubernetes clusters.
* Workload Optimization on Kubernetes:
o Assist customers in optimizing dynamic resource allocation for their AI/ML workloads.
* Kubernetes Troubleshooting & Incident Response:
o Diagnose and resolve complex issues related to Kubernetes cluster management.
* Integration Support:
o Help customers integrate Run:AI into their existing Kubernetes-based ML infrastructure.
* Security and Best Practices in Kubernetes:
o Advise customers on security best practices for Kubernetes clusters.
* Collaboration with HQ:
o Work closely with the engineering and product teams in HQ.
* Training & Documentation:
o Develop training materials and deliver technical workshops.
Minimum Qualifications
* 4+ years of IT-related work experience with a Bachelor's degree, OR
* 7+ years of IT-related work experience without a Bachelor's degree.
Requirements
* 3+ years of experience in technical support roles with strong expertise in Kubernetes administration.
* 1+ year of general GPU administration experience.
* In-depth knowledge of Kubernetes (CKA or CKAD certification highly preferred).
* Proficiency in Kubernetes resource management.
* Experience with configuration management tools.
* Experience with Run:AI platform or similar tools.
* Hands-on experience with Docker and containerized environments.
* Strong understanding of ML frameworks.
* Excellent analytical, communication, and problem-solving skills.
* Ability to manage priorities in a fast-paced environment.
What We Offer
* Salary, stock, and performance-related bonus
* Maternity/Paternity Leave
* Employee stock purchase scheme
* Matching pension scheme
* Education Assistance
* Relocation and immigration support (if needed)
* Life, Medical, Income and Travel Insurance
* Subsidised memberships for physical and mental well-being
* Bicycle purchase scheme
* Employee-run clubs