Site Reliability Engineering Manager
We build and support critical infrastructure systems and frameworks, both new and existing, that provide services such as structured and unstructured storage, caching, queueing, search, and much more at hyperscale.
About the Role
This is a hands-on role: you will establish SRE practices for a private cloud service and accelerate our ability to deliver thousands of applications reliably and consistently.
About the Team
The Apple Services Engineering Cloud Services SRE organization is looking for a strong, hands-on leader. This person will lead a platform-focused SRE team and be responsible for the reliability of the platform.
Responsibilities
* Act as the Service Owner, designing and mapping key performance indicators to achieve the organization's mission.
* Lead the definition of requirements, priorities, and planning of engineering deliverables.
* Implement structured engineering and operations processes.
* Lead the team in daily agile SRE practices, ensuring proper team focus on priorities, achievements, and deliverables.
* Optimise velocity and efficiency of delivery, and drive continuous improvement.
Requirements
* Experience with critical, large-scale distributed systems, combining hardware, operating systems, and software.
* Experience building and leading engineering teams, ideally in SRE or Production Engineering.
* Strong emphasis on SRE as an engineering discipline, with proficiency in at least one of the following languages: Golang, Rust, Python, or Swift.
* Understanding of SRE principles, including monitoring, alerting, error budgets, fault analysis, and other common reliability engineering concepts, with a keen eye for opportunities to eliminate toil through code and process improvements.
* Bachelor's or Master's degree in Computer Science, Computer Engineering, or equivalent experience.
Estimated salary: $250,000 - $350,000 per year.