Site Reliability Engineer - Coding, Managed Operations
DESCRIPTION
AWS is set to introduce the inaugural European Sovereign Cloud (ESC), marking a significant development in utility computing (UC). To spearhead this initiative, we are actively seeking experienced systems development engineers with a strong background in automation and operations. As part of the AWS Managed Operations team, you will play a pivotal role in building and leading operations and development teams dedicated to delivering high-availability AWS services, including EC2, S3, Dynamo, Lambda, and Bedrock, exclusively for EU customers.
Your responsibilities will encompass overseeing the launch of the ESC in 2025, working closely with global AWS teams, and influencing the evolution of AWS services and technology. A typical day in this role involves collaborating with technology leaders, contributing to the enhancement of day-to-day operations, and ensuring improvements in availability, reliability, latency, performance, and efficiency of the ESC. You will be required to occasionally participate in 'on-call' rotations to resolve incidents occurring out-of-hours.
The overarching goal is to deliver scalable services and ensure a high-availability experience for EU customers. If you are an experienced professional ready for a challenging and impactful opportunity, we invite you to join our efforts in building a best-in-class development engineering and operations team that aligns with AWS' commitment to customer satisfaction and continual innovation.
BASIC QUALIFICATIONS
- 3+ years of experience in software development or related field with proficiency in at least one modern programming language such as Java, Typescript, Python, or Ruby
- 3+ years of experience with Linux, using the command line and basic administration, and computer networking fundamentals
- Able to troubleshoot at all levels, from network to operating systems to software applications
- Experience supporting cloud systems or other services. Proficient troubleshooting and anticipating problems that affect the performance, reliability, or availability of software systems.
PREFERRED QUALIFICATIONS
- Excellent communication and problem-solving skills across languages
- Experience operating 24x7 high-availability, distributed software applications and performance tuning software applications and optimizing fleet utilization
- Strong understanding of network fundamentals (DNS, DHCP, TCP/IP, routing, load balancing, load shedding) and experience with monitoring frameworks (such as CloudWatch, Datadog, Grafana, Elastic or similar)
- Experience scripting operating system tasks in Bash, Python, etc. and with Infrastructure as Code, (such as CDK, CloudFormation, Puppet, Chef, Ansible, or similar)
- Experience operating services in AWS.
Amazon is an equal opportunities employer. We believe passionately that employing a diverse workforce is central to our success. We make recruiting decisions based on your experience and skills. We value your passion to discover, invent, simplify and build. Protecting your privacy and the security of your data is a longstanding top priority for Amazon. Please consult our Privacy Notice to know more about how we collect, use and transfer the personal data of our candidates.
Amazon is committed to a diverse and inclusive workplace. Amazon is an equal opportunity employer and does not discriminate on the basis of race, national origin, gender, gender identity, sexual orientation, protected veteran status, disability, age, or other legally protected status.
#J-18808-Ljbffr