As a Principal Data Engineer, your responsibilities will include:
* Design and build data pipelines to process terabytes of data.
* Orchestrate data tasks in Airflow to run on Kubernetes/Hadoop for data ingestion, processing, and cleaning (see the sketch following this list).
* Create Docker images for various applications and deploy them on Kubernetes.
* Design and build best-in-class processes to clean and standardize data.
* Troubleshoot production issues in our Elastic environment.
* Tune and optimize data processes.
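For illustration, a minimal sketch of the kind of orchestration described above, assuming Airflow 2.x with the CNCF Kubernetes provider installed (the import path and schedule argument vary by provider/Airflow version); the DAG id, image names, and commands are hypothetical placeholders:

```python
# A minimal sketch, assuming apache-airflow-providers-cncf-kubernetes >= 7.0;
# all names, images, and commands below are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

with DAG(
    dag_id="daily_ingest_process_clean",  # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # "schedule_interval" on Airflow < 2.4
    catchup=False,
) as dag:
    ingest = KubernetesPodOperator(
        task_id="ingest",
        name="ingest",
        image="registry.example.com/ingest:latest",  # hypothetical image
        cmds=["python", "ingest.py"],
    )
    process = KubernetesPodOperator(
        task_id="process",
        name="process",
        image="registry.example.com/spark-job:latest",  # hypothetical image
        cmds=["spark-submit", "process.py"],
    )
    clean = KubernetesPodOperator(
        task_id="clean",
        name="clean",
        image="registry.example.com/clean:latest",  # hypothetical image
        cmds=["python", "clean.py"],
    )
    # Each task runs as its own pod on the cluster; Airflow only orchestrates ordering.
    ingest >> process >> clean
```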
Advance the team’s DataOps culture (CI/CD, orchestration, testing, monitoring) and build out standard development patterns:
* Drive innovation by testing new technology and approaches to continually advance the capability of the data engineering function.
* Drive efficiencies in current engineering processes via standardization and migration of existing on-premises processes to the cloud.
* Ensure data quality by building best-in-class data quality monitoring so that all data products exceed customer expectations, as illustrated in the sketch below.
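A minimal sketch of the kind of data quality gate described above, assuming PySpark; the table name, key column, and threshold are hypothetical placeholders:

```python
# A minimal data quality check sketch, assuming PySpark; the table, columns,
# and thresholds below are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq_checks").getOrCreate()

df = spark.read.table("analytics.members")  # hypothetical catalog table

total = df.count()
null_keys = df.filter(F.col("member_id").isNull()).count()
duplicates = total - df.dropDuplicates(["member_id"]).count()

# Fail the pipeline run (surfacing an alert in monitoring) if thresholds are breached.
if null_keys > 0 or duplicates / max(total, 1) > 0.01:
    raise ValueError(
        f"Data quality check failed: {null_keys} null keys, {duplicates} duplicates"
    )
```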
Required Qualifications:
* Bachelor’s degree in Computer Science or a similar field.
* Good understanding of data modeling techniques, e.g., Data Vault and Kimball star schemas.
* Excellent understanding of column-store RDBMSs (Databricks, Snowflake, Redshift, Vertica, ClickHouse).
* Good experience handling real-time, near-real-time, and batch data ingestion.
* Hands-on experience with the following technologies:
  * Developing processes in Spark (a brief PySpark sketch follows this list).
  * Exposure to Kubernetes and Linux containers (e.g., Docker).
  * Related/complementary open-source software platforms and languages (e.g., Scala, Python, Java, Linux).
* Proven track record of designing effective data strategies and leveraging modern data architectures that have delivered business value.
* Experience building cloud-native data pipelines on either AWS, Azure, or GCP, following best practices in cloud deployments.
* Strong DataOps experience (CI/CD, orchestration, testing, monitoring).
* Strong experience leading and developing data engineering teams.
* Demonstrated interpersonal, influencing, collaboration, and listening skills.
* Excellent time management, organizational, and prioritization skills, with the ability to balance multiple priorities.
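As referenced in the Spark item above, a minimal sketch of a cleaning and standardization process, assuming PySpark; the input/output paths and column names are hypothetical placeholders:

```python
# A minimal cleaning/standardization sketch, assuming PySpark; the paths and
# column names below are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("standardize_customers").getOrCreate()

raw = spark.read.parquet("s3://example-bucket/raw/customers/")  # hypothetical path

cleaned = (
    raw.dropDuplicates(["customer_id"])                       # remove duplicate records
       .withColumn("email", F.lower(F.trim(F.col("email"))))  # standardize casing/whitespace
       .filter(F.col("customer_id").isNotNull())              # drop rows missing the key
)

cleaned.write.mode("overwrite").parquet("s3://example-bucket/clean/customers/")
```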
Preferred Qualifications:
* Experience with data tokenization techniques and tools, e.g., Datavant and Protegrity.
* Experience with Azure Data Factory, Databricks, and Snowflake.
* Experience with Apache Spark and the related big data stack and technologies, e.g., PySpark and Scala.
* Experience working with Apache Kafka, building appropriate producer/consumer apps (see the sketch after this list).
* Experience working with Kubernetes and Docker, and knowledgeable about cloud infrastructure automation and management (e.g., Terraform).
* Experience working on projects using agile/Scrum methodologies.
* Familiarity with production-quality ML and/or AI model development and deployment.
* Healthcare industry knowledge and experience with exposure to EDI, HIPAA, HL7, and FHIR integration standards.
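As referenced in the Kafka item above, a minimal producer/consumer sketch, assuming the kafka-python client; the broker address, topic, and consumer group are hypothetical placeholders:

```python
# A minimal producer/consumer sketch, assuming the kafka-python client;
# the broker, topic, and group names below are hypothetical placeholders.
import json
from kafka import KafkaProducer, KafkaConsumer

BROKER = "localhost:9092"  # hypothetical broker
TOPIC = "claims-events"    # hypothetical topic

# Producer: serialize dicts to JSON and publish them to the topic.
producer = KafkaProducer(
    bootstrap_servers=BROKER,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(TOPIC, {"claim_id": 123, "status": "received"})
producer.flush()

# Consumer: read from the beginning of the topic as part of a named group.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKER,
    group_id="claims-processor",  # hypothetical consumer group
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for message in consumer:
    print(message.value)  # process each event here
```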