About the Role
We are seeking a Senior SRE with experience of working with scaled SaaS production infrastructure.
The successful candidate will work as part of a team focused on site reliability, security, and scalability, as we manage our rapid growth.
The ideal candidate will be a proactive and driven individual, who excels at understanding and working on complex technical solutions requiring performance and optimisation at scale.
Our core technologies include PHP, MySQL, and AWS.
Participating in an on-call roster is required as part of this role.
This is a hybrid role (2 days in the office).

Key Responsibilities
Act as a senior member of the SRE team, supporting activities including the backlog and workload of the team, scoping requirements, peer review of code, and providing feedback to the rest of the team.
Represent the team in management and stakeholder meetings.
Ensure best practices are followed, and suggest improvements to our development processes where you see gaps.
Investigate, test, and resolve technical problems, working closely with other engineers to deliver core product functionality.
Define SLOs, SLIs, and SLAs for key metrics that indicate the health, security, stability and uptime of production, staging and development environments.
Monitor these environments and react to alerts and issues that arise in the day-to-day operation of the product line.
Participate in an on-call rota for priority-1 level alarms with the rest of the Platform teams.
Deliver ongoing upgrades and improvements to operational processes to optimise performance, stability and cost.
Work with the platform engineering team to contribute to the planning of how we carry out application/infrastructure releases and configuration changes.
Interact with internal teams and external third-party vendors to troubleshoot and resolve complex problems.

Your Experience and Qualifications
5+ years' experience in an engineering role responsible for supporting a scaled SaaS platform running on Linux in a cloud environment
Experience working with high-performance systems, and solving complex engineering problems at scale (our platform processes ~100 billion messages per year)
Understanding of distributed systems design, including asynchronous tasks, event-driven architecture, scheduling, caching and queue processing
Ability to apply distributed systems design knowledge to resolve scaling constraints
Capability to carry out performance tuning from the API to the application to the database layer of the platform
Strong communication skills and the ability to explain complex technical solutions simply to others
Strong understanding of PHP, Go, MySQL, OpenTelemetry, Prometheus
Experience with cloud and DevOps technologies (AWS, Terraform, CI/CD, etc.)
Experience with specific technologies in our stack: ClickHouse, Kafka, Pulsar, Python
Experience with networking and security concepts
Interest in or experience with marketing technologies
Interest in or experience with big data, data analytics, AI and machine learning

Location
Ireland (Dublin) or UK (London or Milton Keynes)

About us
Headquartered in Ireland with offices in the UK and US, Xtremepush is an Omnichannel Customer Engagement Platform powered by a built-in CDP.
It enables high-velocity companies to build, grow, and retain strong customer relationships through personalised, relevant, and timely communication.
With a true single customer view at its core, Xtremepush provides actionable customer intelligence that drives engagement, conversion, and revenue across all channels, while putting customer retention first.
At Xtremepush, we believe that diversity adds incredible value to our teams, our products, and our culture.
We don't just accept difference: we celebrate it, we support it, and we thrive on it for the benefit of our employees, our products and our community.
As an equal opportunity employer, we stay true to our mission by ensuring that our place can be anyone's place regardless of race, religion, gender, sexual orientation, national origin, disability or age.