Job Description
Company Overview
Citco is a global leader in financial services, delivering innovative solutions to some of the world's largest institutional clients. We harness the power of data to drive operational efficiency and informed decision-making. We are looking for a Tech Lead - Data Engineering with extensive Databricks expertise and AWS experience to lead mission-critical data initiatives.
Role Summary
As the Tech Lead - Data Engineering, you will architect, implement, and optimize end-to-end data solutions on Databricks (Spark, Delta Lake, MLflow, etc.) while integrating with core AWS services (S3, Glue, Lambda, etc.). You will lead a team of data engineers and ensure best practices in performance, security, and scalability. The role requires a deep, hands-on understanding of Databricks internals and a track record of delivering large-scale data platforms in the cloud.
Responsibilities
1. Databricks Platform & Architecture
o Architect and maintain Databricks Lakehouse solutions using Delta Lake for ACID transactions and efficient data versioning.
o Leverage Databricks SQL Analytics for interactive querying and report generation.
o Manage cluster lifecycle (provisioning, sizing, scaling) and optimize Spark jobs for cost and performance.
o Implement structured streaming pipelines for near real-time data ingestion and processing.
o Configure and administer Databricks Repos, notebooks, and job scheduling/orchestration to streamline development workflows.
2. AWS Cloud Integration
o Integrate Databricks with AWS S3 as the primary data lake storage layer.
o Design and implement ETL/ELT pipelines using the AWS Glue Data Catalog, AWS Lambda, and AWS Step Functions where needed.
o Ensure proper networking configuration (VPC, security groups, AWS PrivateLink) for secure and compliant data access.
o Automate infrastructure deployment and scaling using AWS CloudFormation or Terraform.
3. Data Pipeline & Workflow Management
o Develop and maintain scalable, reusable ETL frameworks using Spark (Python/Scala).
o Orchestrate complex workflows, applying CI/CD principles (Git-based version control, automated testing).
o Implement Delta Live Tables or similar frameworks to handle real-time data ingestion and transformations.
o Integrate with MLflow (if applicable) for experiment tracking and model versioning, ensuring data lineage and reproducibility.
4. Performance Tuning & Optimization
o Conduct advanced Spark job tuning (caching strategies, shuffle partitions, broadcast joins, memory optimization).
o Fine-tune Databricks clusters (autoscaling policies, instance types) to manage cost without compromising performance.
o Optimize I/O performance and concurrency for large-scale data sets.
5. Security & Governance
o Implement Unity Catalog or equivalent Databricks features for centralized governance, access control, and data lineage.
o Ensure compliance with industry standards (e.g., GDPR, SOC, ISO) and internal security policies.
o Apply IAM best practices across Databricks and AWS to enforce least-privilege access.
6. Technical Leadership & Mentorship
o Lead and mentor a team of data engineers, conducting code reviews, design reviews, and knowledge-sharing sessions.
o Champion Agile or Scrum development practices, coordinating sprints and deliverables.
o Serve as a primary technical liaison, working closely with product managers, data scientists, DevOps, and external stakeholders.
7. Monitoring & Reliability
o Configure observability solutions (e.g., Datadog, CloudWatch, Prometheus) to proactively identify performance bottlenecks.
o Set up alerting mechanisms for latency, cost overruns, and cluster health.
o Maintain SLAs and KPIs for data pipelines, ensuring robust data quality and reliability.
8. Innovation & Continuous Improvement
o Stay current with the Databricks roadmap and emerging data engineering trends (e.g., Photon, Lakehouse features).
o Evaluate new tools and technologies, driving POCs to improve data platform capabilities.
o Collaborate with business units to identify data-driven opportunities and craft solutions that align with strategic goals.
Qualifications
1. Educational Background
o Bachelor's or Master's degree in Computer Science, Data Science, Engineering, or a related field, or equivalent practical experience.
2. Technical Experience
o Databricks Expertise: 5+ years of hands-on Databricks (Spark) experience, with a focus on building and maintaining production-grade pipelines.
o AWS Services: Proven track record with AWS S3, EC2, Glue, EMR, Lambda, Step Functions, and security best practices (IAM, VPC).
o Programming Languages: Strong proficiency in Python (PySpark) or Scala; SQL for analytics and data modeling.
o Data Warehousing & Modeling: Familiarity with RDBMS (e.g., Postgres, Redshift) and dimensional modeling techniques.
o Infrastructure as Code: Hands-on experience using Terraform or AWS CloudFormation to manage cloud infrastructure.
o Version Control & CI/CD: Git-based workflows (GitHub/GitLab), Jenkins or similar CI/CD tools for automated builds and deployments.
3. Leadership & Soft Skills
o Demonstrated experience leading a team of data engineers in a complex, high-traffic data environment.
o Outstanding communication and stakeholder management skills, with the ability to translate technical detail into clear business insights.
o Adept at problem-solving, with a track record of quickly diagnosing and resolving data performance issues.
4. Certifications (Preferred)
o Databricks certification (e.g., Databricks Certified Data Engineer Associate or Professional).
o AWS Certified Solutions Architect (Associate or Professional).