This role requires working in the Dublin office at least three days per week.
Kick-start your career with some real responsibility and an incredible learning experience working on cloud services in OCI. You will play an instrumental role in delivering the cloud experience that is changing lives across the globe. Your versatility will be your greatest asset as you turn your hand to deployment, operations and execution. You’ll have the opportunity to collaborate with the brightest minds in the industry and bring fresh insight to everything you do. Deliver fascinating, high-scale services and solutions and enjoy extraordinary career growth at a company that wants to see you thrive.
We are building a Global Operations team that offers you the opportunity to build and operate a suite of massive-scale, integrated cloud services in a broadly distributed, multi-tenant cloud environment. OCI is committed to providing the best in cloud services that meet the needs of our customers, who are tackling some of the world's biggest challenges. We offer unique opportunities for smart, hands-on engineers with the expertise and passion to solve difficult problems in distributed, highly available services and virtualized infrastructure. At every level, our engineers have a significant technical and business impact operating and building innovative new systems to power our customers' business-critical applications.
Engineers will:
* Improve monitoring, notifications, configuration and deployment of our services.
* Perform proactive service checks, and monitor, triage and address incoming system and application alerts with appropriate priority and response.
* Triage and troubleshoot service-impacting events from multiple signals including phone, email, service telemetry and alerting.
* Perform change management activities for services such as upgrades and patching.
* Identify opportunities for automation, signal noise reduction and elimination of recurring issues, and work with engineering to implement them, reducing time to mitigate service-impacting events and increasing the productivity of cloud operations and development resources.
* Coordinate, document and track critical incidents ensuring rapid and complete issue resolution and an appropriate closed loop to customers and other key stakeholders.
* Improve the availability, scalability, latency, ease of use, and efficiency of service control planes and operational tooling.
* Modify/enhance monitoring infrastructure for the services.
* Participate in service capacity planning and demand forecasting, software performance analysis and system tuning.
* Potentially participate in regular rotations as a central part of the 24x7 operations team. We are hiring in multiple time zones, from India to Ireland to the USA, to ensure global 24x7 coverage.
* Work scheduled hours reliably.
* Be motivated, quick learners.
Desired skills include:
* BE/BTech or ME/MTech in Computer Science, or equivalent.
* 1+ years of work experience as a software, site reliability or customer support engineer.
* Ability to work independently and across teams to guide other engineers through technical operations.
* Good technical writing and communication skills. Engineers will need to be able to clearly write descriptions of operational issues and corrective actions for incidents.
* Proficiency with Slack and comfort coordinating with others online.
* Basic Linux system administration knowledge and experience.
* Basic shell scripting, such as recursive search and output redirection.
* Very strong analytical skills to identify problem root causes.
* Systematic problem-solving approach, combined with a strong sense of ownership and drive in resolving operations issues.
Candidates will have the opportunity to develop many of the following skills:
1. Knowledge of Linux OS internals and administration including network services, TCP/IP, NFS, SSH, NTP, bonding, VLANs, tuning, system diagnosis skills, systemd, kernel modules, user management, storage components.
2. Experience working under pressure to mitigate customer issues affecting service reliability, data integrity, and overall customer experience.
3. Monitoring, management, analysis and troubleshooting of large-scale, distributed systems.
4. Experience with IaaS, PaaS and SaaS architectures.
5. Experience in building and managing virtualized and containerized systems (KVM, Containers/Docker/Kubernetes, Helm, Puppet, Chef).
6. Understanding of and experience with microservices architecture, Oracle Database, MySQL and Oracle WebLogic Server.
7. Experience with cloud, development and build technologies: Python, Bash, Ansible, Terraform, Hadoop, Kafka, Solr, Redis, Git, IntelliJ, Jenkins and Maven.
8. Familiarity with identity, security and encryption technologies and following security best practices.
Career Level - IC2