Our client is seeking a passionate and experienced Lead Site Reliability Engineer to join their dynamic Site Reliability Engineering group within Enterprise Infrastructure.
This role combines operational excellence with development experience, making it an ideal opportunity for engineers who enjoy working across both disciplines.
The position is based in Galway and offers amazing benefits and career progression opportunities.
As a Lead Site Reliability Engineer, you will lead efforts to define and execute a comprehensive reliability and observability strategy, keeping the client's systems highly available for their customers.
You will also troubleshoot stack-wide engineering issues across hardware, software, network, applications, and cloud service providers.
Additionally, you will coach and mentor peer SREs and development teams on building highly available systems.
During major incidents, you will be an escalation point, taking hands-on responsibility to lead production bridges across teams.
After incidents, you will conduct thorough post-mortem reviews, focusing on deep technical root cause analysis, observability, and automation enhancements.
Requirements
* Bachelor's degree (or higher) in a technology-related field (e.g., Engineering, Computer Science)
* Master's degree a plus
* Extensive hands-on experience deploying and supporting highly distributed multi-tiered systems at scale
* Practical experience with Public Cloud platforms, preferably AWS or Azure
* Proficiency with EKS, AKS, or Rancher Kubernetes Service for container orchestration
* Experience with distributed architectures, including microservices, containerized services, and serverless architectures
* Strong hands-on Kubernetes skills
* Programming experience in compiled/OOP languages (e.g., C#, Java) and scripting languages (e.g., JavaScript/TypeScript, Python)
* Proven ability to maintain scalability and resiliency in complex environments
* Familiarity with modern monitoring tools (e.g., Datadog, Prometheus, Splunk)
This is a great opportunity to be part of a vibrant team that values collaboration and continuous improvement.
As a Lead Site Reliability Engineer, you will work in an environment where your contributions directly impact the reliability of critical systems.
Enjoy opportunities for professional growth and development in a supportive atmosphere.
If you're excited about driving reliability and resilience in high-scale environments while working alongside talented professionals, we want to hear from you.
What We Offer
* Amazing benefits
* Career progression opportunities
* A supportive environment for professional growth and development
* Collaborative team that values continuous improvement