Site Reliability Engineer
Solas IT Recruitment, Dublin, County Dublin, Ireland (Remote)
We're seeking an experienced Site Reliability Engineer (SRE) with a DevSecOps background to join our growing team. You’ll ensure high availability and performance of internal and external services, while working with real-time data from large-scale distributed systems. You’ll tackle challenges in building fault-tolerant, secure, microservice-based systems.
Key Responsibilities:
* Analyze system metrics to improve performance and fault detection.
* Collaborate with engineering teams to improve service quality through testing and automation.
* Ensure reliability and minimal downtime by balancing feature velocity against system stability.
* Implement best practices for security, compliance, and availability.
* Plan and execute system upgrades.
* Mentor fellow engineers and participate in the on-call rotation.
Qualifications:
* Kubernetes: Expertise in managing and troubleshooting production clusters. Experience with Amazon EKS is a plus.
* Configuration Management: Skilled with tools like Ansible, Helm, and Kustomize.
* Monitoring: Familiar with Prometheus, Grafana, and similar tools.
* AWS: Strong knowledge of AWS services (EC2, S3, VPC, etc.).
* Infrastructure as Code (IaC): Experience with Terraform for cloud resource management.
* Queuing Systems: Experience with RabbitMQ, Kafka, or Amazon MQ.
* Database Management: Experience with MySQL and Amazon RDS.
* Networking & Security: Knowledge of network design and security protocols.
* High-Uptime Systems: Expertise in maintaining high-availability environments.
* Collaboration: Ability to work across departments to meet project goals.
* Programming: Proficient in Python, Go, or JavaScript. Familiar with CI/CD pipelines.
* Problem-Solving: Skilled in identifying and fixing performance issues.