Job Description
You will be part of the team responsible for the end-to-end health (performance and reliability) of Meta's backbone networks.
Key Responsibilities:
* Design, develop, and implement tools and automation to efficiently scale network mitigation strategies, identify long-term trends in performance and risks, and drive innovative solutions to monitor and improve network products.
* Support both the Classic Backbone, which transports traffic destined for users, and the Express Backbone, which handles machine-to-machine traffic between data centers.
* Collaborate with a talented team of engineers to solve complex networking and software challenges on a massive scale.
Responsibilities
* Write and review code, develop documentation and capacity plans, and debug critical issues on some of the world's largest and most complex networks.
* Participate in a weekly on-call rotation and serve as an escalation contact for service incidents.
* Conduct deep dives into complex technical issues across networks, including problems with automated tooling, hardware failures, and network issues.
* Manage and maintain multi-vendor, multi-protocol backbone and edge networks.
* Analyze data to diagnose network issues and identify their root causes.
* Define, develop, and optimize automated network monitoring systems to mitigate and remediate network events.
* Proactively identify gaps impacting multiple teams, create execution plans, and drive projects directly or through influencing other teams.
* Contribute to team growth and development through peer mentorship.
Requirements
* Bachelor's degree in Computer Science, Computer Engineering, or equivalent practical experience.
* 4+ years of coding experience in higher-level languages (e.g., Python, C++, Go).
* 5+ years of experience with BGP, MPLS, IS-IS, or similar routing protocols, with knowledge of typical configurations and performance tuning.
Preferred Qualifications
* 5+ years of experience mitigating network hardware and topology failures.
* Expert knowledge of TCP/IP and IPv6.
* Experience operating and designing SDN-based backbone networks.
* Experience working in multi-vendor network environments.
* Experience developing distributed systems and operating them at scale.
* Experience with automation frameworks and tools like Ansible, Puppet, or Chef.
* Experience configuring and maintaining network devices and network management systems (NMS).
* Experience learning new software, frameworks, and APIs.
* Experience developing and understanding network device configuration for at least one vendor.
* Knowledge of routing and switching, including hardware design and forwarding planes.