Globalization Partners’ automated, AI-enabled global employment platform, designed by our technical teams and powered by our worldwide HR experts, enables our customers to hire, onboard, and manage the best talent they can find, anywhere in the world.
With diverse teams all around the world, our people are the heartbeat of the company and the reason why Globalization Partners is a fun and inclusive place to work. We encourage and support personal growth and career development, trust our team members with the autonomy to do their best work, and believe in recognition for a job well done.
Our ideal candidate has a passion for automation, is a deep innovator, and wants to solve complex problems. Your knowledge and experience will be crucial to designing and developing high-performing cloud-based software products using traditional Agile methodologies and modern frameworks.
Did we mention you can experience all of this while working remotely? As a remote-first employer, we value your experience and skills more than where you are located. Join our collaborative work environment where you can make a real impact and love the work you’re doing!
About the position:
We are looking for an experienced Manager with an operational or site reliability engineering (SRE) background with a passion for providing superior system availability and experience with observability strategies. We are looking for candidates who can drive reliability and performance across Globalization Partners' cloud-native platform. As a Manager, DevOps, you will have the opportunity to tackle the complex problems of a rapidly scaling global organization while using your expertise in delivering and supporting critical services.
What you will do:
* Collaborate with other tech leads and support teams to ensure integrated end-to-end availability, reliability, security, and performance
* Define support strategies for systems in the Cloud
* Influence resiliency and scalability in production environments running in Amazon Web Services (AWS)
* Identify and drive resolution on monitoring and alerting gaps.
* Lead a team to design, write, and deliver technical and process automation to improve the availability, scalability, latency, security, and efficiency of Globalization Partners' services
* Solve problems relating to mission-critical services and build automation to prevent problem recurrence, with the goal of automated response to all non-exceptional service conditions.
* Engage in service capacity planning and demand forecasting, software performance analysis, security analysis, and system tuning
* Identify and remediate risk to critical and non-critical system KPIs.
* Understand the full technology stack of systems in the assigned domain.
What we are looking for:
* Experience using observability tooling such as New Relic or Datadog to reduce outages and detection time.
* Familiarity with application architectures and networking.
* Strong experience with DevOps and Site Reliability Engineering concepts and principles.
* Understanding of Unix/Linux systems: system libraries, file systems, and client-server protocols.
* Networking: knowledge and understanding of network theory, including protocols (TCP/IP, UDP, ICMP, DNS), MAC addresses, IP packets, OSI layers, and load balancing.
* Experience working with containers and container orchestration platforms.
* Familiarity with serverless architecture and platforms.
* Experience with Infrastructure as Code (IaC) frameworks: Terraform, CloudFormation, CDK.
* Experience delivering with CI/CD pipelines (AWS CodeDeploy, GitHub Actions, Jenkins).
* Experience with full-stack engineering, from Java and/or JavaScript front-end services to backend storage systems in both SQL and NoSQL contexts.
* Strong experience with data streaming (Kinesis, SQS, Kafka).
* Experience managing a team or teams of fully remote engineers across different time zones.