Description
This role requires working in the Dublin office at least three days per week.
Kick-start your career with some real responsibility and an incredible learning experience working on cloud services in OCI. You will play an instrumental role in delivering the cloud experience that is changing lives across the globe. Your versatility will be your greatest asset as you turn your hand to deployment, operations and execution. You’ll have the opportunity to collaborate with the brightest minds in the industry and bring fresh insight to everything you do. Deliver fascinating, high-scale services and solutions and enjoy extraordinary career growth at a company that wants to see you thrive.
We are building a Global Operations team that offers you the opportunity to build and operate a suite of massive-scale, integrated cloud services in a broadly distributed, multi-tenant cloud environment. OCI is committed to providing the best in cloud services to meet the needs of customers who are tackling some of the world's biggest challenges. We offer unique opportunities for smart, hands-on engineers with the expertise and passion to solve difficult problems in distributed, highly available services and virtualized infrastructure. At every level, our engineers have a significant technical and business impact operating and building innovative new systems to power our customers' business-critical applications.
Engineers will:
Improve monitoring, notifications, configuration and deployment of our services.
Perform proactive service checks, and monitor, triage and address incoming system and application alerts to ensure appropriate priority and response.
Triage and troubleshoot service-impacting events from multiple signals, including phone, email, service telemetry and alerting.
Perform change management activities for services such as upgrades and patching.
Identify opportunities for automation, signal noise reduction and fixes for recurring issues, and work with engineering to implement them, along with other actions that reduce time to mitigate service-impacting events and increase the productivity of cloud operations and development resources.
Coordinate, document and track critical incidents ensuring rapid and complete issue resolution and an appropriate closed loop to customers and other key stakeholders.
Improve the availability, scalability, latency, ease of use and efficiency of service control planes and operational tooling.
Modify and enhance monitoring infrastructure for the services.
Participate in service capacity planning and demand forecasting, software performance analysis and system tuning.
Potentially participate in regular rotations as a central part of the 24x7 operations team. We are hiring in multiple time zones to ensure global 24x7 coverage from India to Ireland to the USA.
Be reliable in working scheduled hours.
Be motivated, quick learners.
Desired skills include:
BE/BTech or ME/MTech in Computer Science, or equivalent.
1+ years of work experience as a software, site reliability or customer support engineer.
Ability to work independently and across teams to guide other engineers through technical operations.
Good technical writing and communication skills. Engineers will need to be able to clearly write descriptions of operational issues and corrective actions for incidents.
Proficiency with Slack and comfort coordinating with others online.
Basic Linux system administration knowledge and experience
Shell scripting, at least the basics (recursive search, output redirection, etc.).
Very strong analytical skills to identify problem root causes.
Systematic problem-solving approach, combined with a strong sense of ownership and drive in resolving operations issues.
Candidates will have the opportunity to develop many of the following skills.
Current possession of some of these skills is a bonus.
Knowledge of Linux OS internals and administration, including network services, TCP/IP, NFS, SSH, NTP, bonding, VLANs, tuning, system diagnosis skills, systemd, kernel modules, user management and storage components.
Experience working under pressure to mitigate customer issues affecting service reliability, data integrity and overall customer experience.
Monitoring, management, analysis and troubleshooting of large-scale, distributed systems.
Experience with IaaS, PaaS and SaaS architectures
Experience in building and managing virtualized and containerized systems (KVM, Containers/Docker/Kubernetes, Helm, Puppet, Chef).
Understanding of and experience with microservices architecture, Oracle Database, MySQL and Oracle WebLogic Server.
Experience with cloud, development and build technologies: Python, Bash, Ansible, Terraform, Hadoop, Kafka, Solr, Redis, Git, IntelliJ, Jenkins and Maven.
Familiarity with identity, security and encryption technologies and following security best practices.
Career Level - IC2
#LI-Hybrid #LI-NS1
Responsibilities
What You’ll Do:
Join a fun and flexible workplace where you’ll enhance your skills and build a solid professional foundation. As a Cloud Operations Engineer in our Global Production Services, you will contribute to an exciting team working on some of the hottest cloud services, such as Ksplice, Oracle Linux YUM Service, OS Management Hub and more. You will use your skills to learn how to continually deliver and improve these cloud services. Operations work will include troubleshooting production issues and handling change management requests for upgrades, patches or modifications. When not working on operations, you will work on software engineering tasks such as reviewing incidents to drive improvements to services, tools or runbooks. These improvements increase reliability and scalability and reduce operational overhead through automation, training, documentation, service enhancement or process improvement. This position offers the opportunity to learn and apply the ins and outs of current cloud service architecture, deployment, monitoring and operational technologies. You will acquire many useful and desirable skills if you do not already have them; see the list of technologies in play above. The ideal candidate has some of these skills, but the key qualities are the motivation and ability to learn quickly and a passion for an excellent customer experience.