Do you enjoy balancing hands-on work and leading by example with helping to shape strategic direction? Are you excited by the challenges that come with driving technical, business, and cultural change to improve the reliability, performance, and efficiency of one of the largest cloud providers?

The Amazon Managed Operations (MO) organization was founded in April 2023 with the objective of reducing operational load and toil through long-term engineering projects. MO is building a best-in-class engineering and operations team that will own the day-to-day operations of Amazon Regions, improving their availability, reliability, latency, performance, and efficiency.

Amazon is looking for a highly motivated Principal Systems Development Engineer to drive technical operational efficiency across Amazon. This role will tackle intrinsically hard problems, venturing beyond comfortable approaches when necessary. You will learn, educate, and advocate, acquiring expertise as needed; pioneer new spaces; and inspire others with what’s possible. This role is internally focused and highly visible, and it demands continuous learning and collaboration across departments within Amazon. It will significantly impact the quality of life of current and future customers and builders who directly or indirectly depend on Amazon’s European Sovereign Cloud.
A day in the life
You’ll balance your time between operating production systems and making long-term improvements to the reliability, availability, and performance of those software systems. An example week could look like this: on Monday, you provide meaningful feedback on the most critical upcoming change whilst guiding the most senior technical talent in your organization to make more decisions without you. On Tuesday, you identify a major reliability risk in the interplay between systems in your care and design a cohesive solution. On Wednesday, you lead the design review with the relevant technical leaders and reach consensus on a path forward. On Thursday, you influence your senior management to adopt goals and make investments to achieve that outcome. On Friday, you begin developing the part of that system that will have the most impact on the reliability of the overall system.
* Participation in an on-call rotation is required.
* Fluency in written and spoken English is required.
* Successful applicants must have the legal right to work in Ireland.
* Amazon will provide relocation support for successful applicants relocating within the European Union.
BASIC QUALIFICATIONS
•10+ years of experience in software development or a related field
•Experience operating and troubleshooting reliable, scalable software systems
•Proficient in at least one modern programming language, such as Java, TypeScript, Python, or Ruby
•Able to troubleshoot at all levels, from network to operating systems to software applications
•Proficient communicator across languages, cultures, and time zones
•Able to periodically travel to meet with internal engineering teams, leaders, and customers
PREFERRED QUALIFICATIONS
•Highly proficient in operating 24x7, high-availability, distributed software applications
•Desire to dive deep into, and find opportunities to improve, the reliability, availability, and performance of distributed software systems
•Experience influencing and leading strategic efforts requiring work from multiple teams
•Experience actively mentoring individual engineers and managers
•Experience performance tuning software applications and optimizing fleet utilization
•Strong understanding of network fundamentals (DNS, DHCP, TCP/IP, routing, load balancing, load shedding)
•Proficient with Infrastructure as Code (such as CDK, CloudFormation, Puppet, Chef, Ansible, or similar)
•Proficient with operating services in AWS
•Experience with monitoring frameworks (such as CloudWatch, Datadog, Grafana, Elastic or similar)
•Experience scripting operating system tasks in Bash, Python, or similar