Job Description
We are seeking a highly skilled GPU Compute Architect with a strong background in microarchitecture (uArch) and Register-Transfer Level (RTL) design to join our team. This individual will play a critical role in prototyping and designing advanced compute arithmetic components such as MAC (Multiply-Accumulate) arrays and ALUs (Arithmetic Logic Units) for GPUs tailored to AI applications, with a focus on delivering comprehensive power and area estimates for each design option.
Key Responsibilities:
* Design and prototype advanced compute arithmetic units (e.g., MAC arrays, ALUs) for GPUs targeting AI and deep learning workloads.
* Develop and optimize GPU microarchitectures to enhance performance, energy efficiency, and scalability for AI-specific applications.
* Create and refine RTL implementations to validate and benchmark new design concepts.
* Conduct detailed performance modelling and analysis to identify bottlenecks and propose innovative solutions for next-generation GPU designs.
* Produce comprehensive power and area estimates for proposed designs, enabling informed trade-off analysis and decision-making.
* Collaborate with cross-functional teams, including software, hardware, and machine learning experts, to align architecture design with application requirements.
* Research and integrate emerging technologies and methodologies in GPU compute design for AI workloads.
* Lead the evaluation of design trade-offs in terms of performance, area, and power metrics.
* Drive innovation in custom compute unit design, ensuring compatibility with broader GPU pipeline architecture.
Qualifications
Required:
* Master's or Ph.D. in Electrical Engineering, Computer Engineering, Computer Science, or a related field.
* 5+ years of relevant experience.
* Proven experience in GPU/ASIC architecture design with a focus on compute arithmetic, gained through coursework or relevant projects.
* Expertise in microarchitecture design and RTL coding (e.g., SystemVerilog).
* Strong understanding of GPU pipelines, parallel computing concepts, and AI/ML workloads.
* Proven experience in designing and optimizing MAC arrays, ALUs, or similar compute units.
* Solid knowledge of hardware modelling and simulation tools (e.g., Synopsys VCS, ModelSim).
* Experience in producing and interpreting power and area estimates for complex hardware designs.
* Proficiency in performance analysis tools and techniques.
* Strong problem-solving skills with the ability to innovate and think outside the box.
Preferred:
* Familiarity with high-level synthesis (HLS) tools and methodologies.
* Background in machine learning algorithms and their hardware acceleration.
* Understanding of power optimization techniques and methodologies for compute-intensive hardware.
The requirements listed may be met through a combination of industry-relevant job experience, internships, and/or schoolwork, classes, or research.