Site Reliability Engineer
We are seeking experienced Site Reliability Engineers who excel at keeping production systems reliable and scalable and who have extensive experience with monitoring and automation tools.
Responsibilities:
* Ensure the reliability, availability, and performance of production systems
* Design, implement, and maintain monitoring and alerting systems
* Automate repetitive tasks and processes
* Collaborate with development teams to improve system architecture and deployment processes
* Conduct root cause analysis of incidents and implement corrective measures
* Evaluate and integrate new technologies to enhance system reliability
Requirements:
* 5+ years of experience as a Site Reliability Engineer or in a similar role
* Expertise in monitoring tools (Prometheus, Grafana)
* Proficiency with cloud services (AWS, GCP, or Azure)
* Skill in automation and configuration management (Ansible, Terraform)
* Experience with both Windows and Linux operating systems
* Strong understanding of networking and security best practices
About Us:
Reperio Human Capital acts as an Employment Agency and an Employment Business.