The Role
At First Digital Finance Corp, we’re passionate about building software that solves problems. We count on our site reliability engineer (SRE) to give users the rich feature set, high availability, and strong performance they need to pursue their missions. As we expand customer deployments, we’re seeking an experienced SRE to help deliver insights from massive-scale data in real time. Specifically, we’re searching for someone who has fresh ideas and a unique viewpoint, and who enjoys collaborating with a cross-functional team to develop real-world solutions and a positive user experience in every interaction.
Your responsibilities will include:
* Run the production environment by monitoring availability and taking a holistic view of system health
* Build software and systems to manage platform infrastructure and applications
* Improve reliability, quality, and time-to-market of our suite of software solutions
* Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement
* Provide primary operational support and engineering for multiple large-scale distributed software applications
* Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding
* Partner with development teams to improve services through rigorous testing and release procedures
* Participate in system design consulting, platform management, and capacity planning
* Create sustainable systems and services through automation and uplift
* Balance feature development speed and reliability with well-defined service-level objectives
* You must have experience designing and building in AWS; this is a deal-breaker if you don't have it
Ideal Profile
* Bachelor’s degree (or equivalent) in computer science or related discipline
* 5+ years of experience in a DevOps role
* Experience with distributed storage technologies such as NFS, HDFS, Ceph, and Amazon S3, as well as dynamic resource management frameworks (Apache Mesos, Kubernetes, YARN). Experience with Argo CD would be a plus.
* A proactive approach to identifying problems, performance bottlenecks, and areas for improvement
What's on Offer?