Job Description
An excellent opportunity exists for a Senior Site Reliability Engineer to work on-site in Cork with an innovative automation technology company.
Key Responsibilities:
* Conduct incident management and post-incident reviews,
* Design and implement software to assist in operations and support,
* Collaborate with developers to ensure designed solutions meet non-functional requirements such as availability, performance, security, and maintainability,
* Improve internal and external processes and systems by moving from ad hoc practices to infrastructure and configuration as code throughout the organization,
* Implement monitoring and alerting throughout in-house and cloud-based systems,
* Enhance system reliability throughout the organization and work with product teams to improve their products,
* Provide support for production and in-house systems,
* Define SLOs (Service Level Objectives) and SLAs (Service Level Agreements) for products,
* Manage release budgets,
* Oversee in-house development and CI/CD (Continuous Integration/Continuous Deployment) systems,
* Ensure the design, implementation, and maintainability of robust, scalable, high-quality software and systems within the SRE domain,
* Take ownership of complex project tasks and contribute to technical decisions to ensure successful delivery,
* Contribute to site architecture for all products,
* Participate actively in a best-practice SRE function, striving for and achieving higher standards of individual and team performance,
* Build relationships with external teams,
* Drive and achieve knowledge sharing across all products,
* Pursue continuous education and development of technical skills, and apply them to the domain,
* Identify personal development opportunities, set goals, and demonstrate the ability to deliver on them,
* Mentor and train other engineers throughout the company, and drive company-wide improvement.
Essential Requirements:
* Bachelor's degree in a related field such as Computer Science, Computer Engineering, Electrical Engineering, or equivalent, and 5-7 years of professional development or operations experience,
* 2-3 years of experience in DevOps engineering or SRE roles,
* 2-3 years of proven experience in cloud computing,
* Experience with VMware ESXi,
* Experience with AWS,
* Knowledge of configuration management technologies, such as Ansible, Puppet, Chef, or Salt,
* Experience with infrastructure-as-code technologies, such as Terraform and CloudFormation,
* Experience with multiple operating systems, including Ubuntu, macOS, and Windows,
* Scripting skills in languages like Bash or Python,
* CI/CD and build pipeline expertise with Jenkins and GitLab,
* Experience working with multiple teams to facilitate orderly project and release plans,
* Experience in issue analysis in a cloud environment.