Technical Support Engineer - Machine Learning Operations
We are seeking a highly skilled Technical Support Engineer specializing in Machine Learning (ML) operations, Kubernetes, container technologies, and Run:AI.
About the Role
This role provides technical and operational support for customers leveraging GPU computing platforms to optimize and manage AI/ML workloads, particularly in Kubernetes-based environments.
Key Responsibilities:
* Kubernetes Orchestration & Resource Management: Serve as the subject matter expert for Kubernetes and container orchestration. Guide customers through the design and deployment of Kubernetes clusters tailored for AI/ML use cases.
* Cluster Monitoring & Optimization: Monitor and tune Kubernetes clusters to ensure they are optimized for AI/ML workloads.
* GPU Troubleshooting and Incident Response: Diagnose and resolve complex issues regarding dependencies between GPU drivers and software.
* Run:AI Platform Support: Provide expert support for the Run:AI platform, assisting customers with the deployment, configuration, and management of Kubernetes clusters.
* Workload Optimization on Kubernetes: Assist customers in optimizing dynamic resource allocation for their AI/ML workloads.
* Kubernetes Troubleshooting & Incident Response: Diagnose and resolve complex issues related to Kubernetes cluster management.
* Integration Support: Help customers integrate Run:AI into their existing Kubernetes-based ML infrastructure.
* Security and Best Practices in Kubernetes: Advise customers on security best practices for Kubernetes clusters.
* Collaboration with HQ: Work closely with the engineering and product teams at HQ.
* Training & Documentation: Develop training materials and deliver technical workshops.
Requirements:
* 4+ years of IT-related work experience with a Bachelor's degree, or 7+ years of IT-related work experience without a Bachelor's degree.
* 3+ years of experience in technical support roles with strong expertise in Kubernetes administration.
* 1+ year of experience in general GPU administration.
* In-depth knowledge of Kubernetes (CKA or CKAD certification highly preferred).
* Proficiency in Kubernetes resource management.
* Experience with configuration management tools.
* Experience with Run:AI platform or similar tools.
* Hands-on experience with Docker and containerized environments.
* Strong understanding of ML frameworks.
* Excellent analytical, communication, and problem-solving skills.
* Ability to manage priorities in a fast-paced environment.
What We Offer:
* Salary: €100,000 - €120,000 per annum.
* Stock and Performance-Related Bonus: Eligible employees receive stock and performance-related bonuses based on individual and company performance.
* Maternity/Paternity Leave: Competitive maternity/paternity leave policy.
* Employee Stock Purchase Scheme: Opportunity to purchase company shares at a discounted rate.
* Matching Pension Scheme: Company matches employee pension contributions up to a certain percentage.
* Education Assistance: Financial support for education-related expenses.
* Relocation and Immigration Support: Assistance with relocation and immigration processes.
* Life, Medical, Income, and Travel Insurance: Comprehensive insurance package.
* Subsidised Memberships for Physical and Mental Well-being: Subsidised memberships for physical and mental well-being activities.
* Bicycle Purchase Scheme: Opportunity to purchase bicycles at a discounted rate.
* Employee-Run Clubs: Opportunities to participate in employee-run clubs and activities.