At eBay, we're more than a global ecommerce leader — we’re changing the way the world shops and sells. Our platform empowers millions of buyers and sellers in more than 190 markets around the world. We’re committed to pushing boundaries and leaving our mark as we reinvent the future of ecommerce for enthusiasts.
Our customers are our compass, authenticity thrives, bold ideas are welcome, and everyone can bring their unique selves to work — every day. We're in this together, sustaining the future of our customers, our company, and our planet.
Join a team of passionate thinkers, innovators, and dreamers — and help us connect people and build communities to create economic opportunity for all.
About the team and the role:
As a Site Reliability Engineer at eBay, you'll play a key role in managing major incidents and the overall health of our services, making sure they are both resilient and high-performing. You’ll create strategies for availability and reliability, improve observability across the domain's service ecosystem, and support a shift toward a more engineering-focused culture. Your contributions will help keep eBay’s technology cutting-edge and reliable for our global community.
What you will accomplish:
* Lead Incident Management: Act as the Incident Commander to drive resolution of major incidents, manage alarms, and ensure effective communication with leadership and partner teams.
* Proactive Monitoring: Continuously monitor the health of eBay's critical services to identify and address potential issues before they escalate.
* Collaborative Problem Solving: Work closely with partner teams to resolve recurring technical issues, onboard new alerts, and develop high-quality Standard Operating Procedures (SOPs).
* Automation and Process Enhancement: Identify and implement opportunities to enhance automation and reduce manual workload, improving overall efficiency.
* Solution Development: Collaborate with Architecture, Engineering, and Operations teams to develop solutions that ensure high site availability, reliability and performance.
* Enhance Monitoring Tools: Improve tools for monitoring and mitigating site incidents, and conduct reliability audits and tests to strengthen eBay’s reliability and incident management capabilities.
What you will bring:
* 3 years of experience in large-scale internet/server environments, including cloud computing and multi-tier architectures.
* Strong incident management and leadership skills, with excellent technical triage and troubleshooting abilities, especially during crises.
* Hands-on software engineering skills in languages such as Java, Python, and Go.
* Expert knowledge of large-scale web operations, including web-based Java/J2EE architectures and JVM configuration, and a deep understanding of UNIX, Linux, networking (TCP/IP), and databases (both relational and NoSQL).
* Experience debugging Android and iOS applications.
* Experience with observability tools such as Grafana and Prometheus, and skills in documenting procedures for knowledge management.
NOTE: As part of the operations staff, members of the SEC work a fixed shift. This position is for a day shift (7:00 AM to 5:00 PM) at our Dublin, Ireland location. Team members work four consecutive 10-hour shifts, with no on-call responsibilities.