Senior IT Recruitment Consultant @ Solas IT Recruitment | ERF CertRP
My client is an innovative, rapidly growing SaaS company that delivers cutting-edge solutions and is seeking a talented and driven Site Reliability Engineer (SRE) to join its growing engineering team. This is an exciting opportunity to be part of a high-impact team focused on ensuring the availability, scalability, and performance of the platform in a fast-paced and dynamic environment.
Key Responsibilities:
1. System Reliability: Ensure the reliability, availability, and performance of our SaaS platform by developing and maintaining automated monitoring, alerting, and incident response systems.
2. Automation & Tooling: Automate manual processes and optimize operational workflows to reduce overhead and improve efficiency. Build tools to manage infrastructure at scale.
3. Capacity Planning & Scaling: Plan and execute scaling strategies, ensuring that infrastructure can handle growth and demand spikes without impacting user experience.
4. Incident Management: Lead the response to incidents, perform root cause analysis (RCA), and put in place preventive measures to reduce recurring issues.
5. Collaboration: Work closely with Development, QA, and Operations teams to build processes and solutions that optimize the balance between development velocity and system reliability.
6. Continuous Improvement: Help drive the adoption of best practices across engineering teams, improve our deployment pipelines, and ensure systems are secure, highly available, and well-documented.
7. Performance Optimization: Monitor and optimize system performance, identify bottlenecks, and implement effective solutions.
Requirements:
1. 3+ years of experience in a Site Reliability Engineering, DevOps, or similar role, in a SaaS environment or with large-scale distributed systems.
2. Proficiency in cloud platforms such as AWS, Azure, or GCP.
3. Strong experience with containerization technologies (Docker, Kubernetes).
4. Proficient in scripting and automation (e.g., Python, Bash, Go).
5. Familiarity with CI/CD pipelines and related tools (e.g., Jenkins, GitLab, CircleCI).
6. Experience with monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, ELK stack).
7. Knowledge of infrastructure-as-code tools (e.g., Terraform, CloudFormation).
8. Familiarity with incident response, postmortems, and continuous improvement processes.
9. Strong troubleshooting and problem-solving skills in distributed systems.
10. Excellent communication skills with the ability to work cross-functionally with product, engineering, and operations teams.
11. A degree in Computer Science, Engineering, or a related field is preferred, though relevant experience is valued.
Nice to Have:
1. Experience with service mesh technologies (e.g., Istio).
2. Background in microservices architecture and its challenges.
3. Familiarity with security best practices in cloud-based systems.
Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Information Technology
Industries: Staffing and Recruiting