* Take a purist SRE approach to shared multi-tenant infrastructure for resilient, microservice-based, containerized SaaS systems, in addition to customer-centric application environments
* Oversee and automate the team’s growing presence in AWS
* Contribute to core infrastructure systems development with features, bug fixes, reliability improvements, etc.
* Perform platform reliability engineering for a complex single sign-on, SAML/OAuth-based central authentication platform
* Creatively build and develop tooling to aid in driving 24x7x365 follow-the-sun operations of critical production systems
* Automate deployment tasks for core product and infrastructure tools and maintain automation infrastructure
* Create system documentation and training materials to empower and educate our fellow team members
* Build and maintain observability tooling, metrics, and dashboarding for a global platform product infrastructure
* Improve our incident management lifecycle to identify, mitigate, and learn from reliability risks and issues
* Enhance platform observability by helping create a self-healing approach to platform reliability
* Collaborate with engineering teams, providing product feedback and, where necessary, contributing code to the product
REQUIRED SKILLS AND EXPERIENCE
* Education and Work Experience:
* Bachelor’s Degree in Computer Science or related field.
* Software engineering and task automation skills with Bash, Python, and/or Go are a must.
* Familiarity with the Agile software development lifecycle.
* Deep background with Linux systems and engineering.
* Highly experienced with engineering and automating on Amazon Web Services (AWS).
* Experience supporting web applications running on Java / Apache / Tomcat in a live production environment.
* Prior experience with IaC tools like Terraform/Terragrunt/Terraspace.
* Prior experience with DevOps/GitOps tools (Git, Bitbucket, Flux CD, TeamCity) for gate promotions.
* Production-at-scale support background in a heavily microservice-based world.
* Hands-on engineering and ops expertise in containerization (Docker, Helm, Kubernetes/EKS, CNI and Ingress networking).
* Strong understanding of single sign-on, SAML, and OAuth (bonus for hands-on experience with Okta).
* Seasoned expertise around certificate technology and basic concepts of encryption.
* Experience working with Relational Databases such as Aurora Postgres and/or Oracle RDS.
* Advanced exposure to application development, web UI (design and development), JSON, application architecture.
* Strong experience using observability tools (logging/APM) such as Datadog, CloudWatch, and PagerDuty.
* Familiarity with event store/stream-processing technologies like Kafka or AWS SQS.
* Understanding of Open Application Model systems such as KubeVela or Crossplane.
* Personal Qualities and Soft Skills:
* You greatly prefer writing code to clicking through a GUI.
* You enjoy teaching, being a mentor to others, and working across boundaries.
* Outstanding troubleshooting skills; ability to think critically and display an aptitude for problem solving.
* Strong analytical mind with a penchant for process development and enhancement.
* A highly positive, can-do attitude and a desire to be a team player.
* Great communication skills and ability to explain complex technical concepts to a varied audience.
* Strong follow-through and a strong work ethic; you consistently keep and meet commitments.
* Other Requirements:
* Ability to read, write, and speak English.
* We provide 24x7 support to our customers, so we expect you to take turns with your teammates being on-call for weekend production emergencies or to provide rotating weekend operational support.
* Travel – Expect occasional travel (less than 5%) to other Guidewire offices for training and team meetings.