Incident Manager/Tech Ops Engineer, Central Technical Operations Services (C-TOS)
Central Technical Operations Services (C-TOS) is the first line of defense for maintaining high availability of the Amazon Retail Website. We make customer-impacting events shorter, less frequent, and less severe by providing large-scale event and incident management. The Amazon Retail Website has hundreds of millions of customers globally who can be impacted by these types of incidents; the work we do to mitigate them helps real people at a tremendous scale. Our support engineers are front and center in driving down event duration by using their operational experience, knowledge of best practices, and effective command of incident management tools.
We’re looking for Incident Managers/Tech Ops Engineers who have owned or participated in operational and/or incident management for at least one large-scale enterprise. The Amazon Retail Website is complex and constantly changing: it operates across dozens of countries, consists of thousands of cloud-based services, is built and maintained by tens of thousands of engineers, and serves hundreds of millions of customers. When it experiences major issues, part of your job will be to respond to them within minutes and ensure the best course of action is taken. This experience will expose you to all things Amazon.
Our engineers are encouraged to build solutions to problems and to share the benefit of those solutions with other service teams. This is an excellent opportunity to join one of Amazon’s world-class teams of engineers, work with some of the best and brightest, and develop your skills and career within one of the most dynamic, innovative, and progressive technology companies anywhere. In addition to a stimulating and fun working environment, Amazon offers mentoring programs with experienced engineers, regular tech talks with technology Principals, and well-defined career paths for motivated engineers who want to contribute to our culture of operational excellence and customer-focused technical innovation. This position is part of a globally distributed team of 20+ engineers across Austin, Dublin, and Sydney, which allows for 24x7 coverage. Each group works 10-hour shifts, four days a week. If you're looking for a team with great growth potential and an opportunity to make a huge impact, this is the team to join.
Key job responsibilities
1. Be a technology evangelist and use your deep knowledge to solve business problems
2. Reduce mean time to resolution for all incident types
3. Design and/or build world-class listening systems
4. Adapt and improve operations management systems and processes to accommodate rapid and increasing growth
5. Participate in Agile sprints to evolve business processes and technologies
6. Create and review documentation, and design new standard operating procedures
7. Identify and troubleshoot recurring platform issues and engage service owners to drive resolution
8. Automate tasks through creation and maintenance of scripts and tools
9. Respond to and complete customer requests within SLA via a trouble ticketing system
10. Take part in a “follow the sun” rotation split among the Seattle, Dublin, and Sydney sites, including weekends and holidays
11. Mentor peers in your areas of technical and operational strength
12. Participate in the interviewing process
About the team
Mentorship & Career Growth: We care about your career aspirations and tailor your development to your unique abilities. We want you to grow and progress at Amazon. We will facilitate your growth by increasing the scope of the projects you work on over time and by including you in projects for partner teams, so you can take on new and interesting challenges. In this role, you’ll have the opportunity to work on operational readiness, software development and testing, operational excellence, and cross-cutting initiatives, among others.
Minimum Requirements
1. Bachelor’s degree in Computer Science or a related field
2. Relevant experience in a large-scale online technical operations environment
3. Strong Incident Management skills
4. Experience programming in at least one language (e.g., Java, Python) as well as scripting in shell
5. Experience using Linux and an understanding of networking fundamentals
6. Experience driving collaborative projects from conception to delivery
7. Experience with Agile/Scrum or a related collaborative workflow
8. Confidence to drive and manage large conference calls
9. Understanding of routing protocols to help facilitate troubleshooting and remediation of networking issues
10. Experience dealing effectively with customers during problem resolution and operating efficiently under pressure
11. Effective prioritization and time management
12. Effective organizational skills to maintain a consistently high standard of operations in a busy environment
13. Excellent troubleshooting skills and a commitment to documenting findings
Amazon is an equal opportunity employer. We believe passionately that employing a diverse workforce is central to our success. We make recruiting decisions based on your experience and skills. We value your passion to discover, invent, simplify, and build. Protecting your privacy and the security of your data is a longstanding top priority for Amazon. Please consult our Privacy Notice (https://www.amazon.jobs/en/privacy_page) to learn more about how we collect, use, and transfer the personal data of our candidates.
Amazon is committed to a diverse and inclusive workplace. Amazon is an equal opportunity employer and does not discriminate on the basis of race, national origin, gender, gender identity, sexual orientation, protected veteran status, disability, age, or other legally protected status. For individuals with disabilities who would like to request an accommodation, please visit https://www.amazon.jobs/content/en/how-we-hire/accommodations.