About Huawei
Huawei is a leading global provider of information and communications technology (ICT) infrastructure and smart devices. With integrated solutions across four key domains – telecom networks, IT, smart devices, and cloud services – we are committed to bringing digital to every person, home and organization for a fully connected, intelligent world.
At Huawei, innovation focuses on customer needs. We invest heavily in basic research, concentrating on technological breakthroughs that drive the world forward. We have more than 180,000 employees, and we operate in more than 170 countries and regions.
About the IRC
The mission of the Huawei Ireland Research Centre (IRC) is to position Huawei as a recognized technology
leader and a global provider of information and communications technology (ICT)
solutions. To achieve this, we are building an industry-recognized
multi-discipline Research Centre of experts with a focus on medium-term to
long-term issues. The IRC will work closely with an open, innovative ecosystem
and with Huawei customers to address real-world issues. The IRC will also engage
with key European universities to build a basic research capability to support
Huawei technical projects.
About the job
Are you a researcher or engineer interested in the challenges of planet-scale cloud infrastructure? We are looking for people who are passionate about working on problems that lie at the intersection of academic research and practical industry implementations. This is a chance to be part of the strategic team that will tackle the Cloud Hardware Reliability challenges. This team works on improving the reliability of cloud infrastructure by architecting and designing new features for future servers.
The Cloud Reliability Lab at the Huawei Ireland Research Centre has a mission to bring world-class reliability to Huawei Cloud by solving cross-functional problems that span Hardware, Software, Networking and Operations. We have teams working in all these areas with a diverse mix of talented people, including industry experts, academic researchers, and Ph.D. interns. In your role, you will collaborate with the local research teams, other European research centers, and other engineering teams spread across the globe.
Responsibilities
Understand and investigate planet-scale technical problems, for example by defining new hardware reliability functionalities for globally distributed data centers.
Architect and help design RAS telemetry for datacenter platforms and develop manageability solutions to monitor and maintain system health.
Present findings and solutions tailored to the needs of key stakeholders, including engineering teams, senior management, customers, and external partners.
Gather insights from the cutting edge of industry and academia regarding GPU hardware development. Help translate customer requirements, feedback, and market dynamics into potential feature requests to ensure a high-reliability fleet.
Requirements
Ph.D. or Master’s degree in Electrical Engineering or Computer Engineering or a related field
Expertise in datacenter telemetry, including scalable solutions for cloud-scale telemetry
Expertise in server management at scale, including BMC management
Proven knowledge of server telemetry/manageability protocols such as SMBus, I2C, I3C, Redfish, and SPI
Privacy Statement
Please read and understand our West European Recruitment Privacy Notice before submitting your personal data to Huawei so that you fully understand how we process and manage your personal data received.
http://career.huawei.com/reccampportal/portal/hrd/weu_rec_all.html