Title: Lead Data Engineer
IN
Job Description
Must Have
Role: Lead Data Engineer - Databricks
Description:
• Lead functional teams or projects, managing multiple data engineering initiatives simultaneously to ensure timely delivery.
• Design, implement, and maintain scalable ETL pipelines using Databricks, PySpark, AWS Glue, and S3 to process and transform large datasets.
• Query and analyze complex datasets using SQL and Python to support business intelligence and analytics initiatives.
• Develop and document data models, schemas, and standards to ensure data consistency, usability, and governance.
• Monitor, troubleshoot, and optimize data pipeline performance and reliability, minimizing downtime and maximizing efficiency.
• Implement robust data validation and testing techniques to ensure high data quality and integrity.
• Communicate potential risks, mitigations, and business impacts clearly and promptly to stakeholders.
• Define technical requirements and design solutions leveraging existing technologies and industry best practices.
• Solve complex data-related problems by analyzing multiple sources of information and applying innovative approaches.
• Develop and maintain CI/CD pipelines and automate the deployment of data pipelines, ETL processes, and infrastructure updates.
• Implement data quality checks and monitoring processes across data pipelines.
• Ensure all data, processes, and pipelines adhere to security protocols and compliance standards.
Desired Profile:
• 7+ years of experience in data engineering, with a focus on Databricks and cloud ETL.
• Proficiency in Databricks, PySpark, SQL, and Python is required.
• Experience with commercial datasets (IQVIA sales, Veeva CRM, HCP profiles, marketing, etc.).
• Demonstrated experience leading data engineering teams or projects.
• Strong analytical and problem-solving skills, with the ability to handle complex data challenges.
• Excellent communication and interpersonal skills, enabling collaboration with technical and non-technical stakeholders.
• Deep understanding of data modeling, schema design, and database management.
• Knowledge of data quality assurance techniques and compliance requirements.
• Bachelor’s degree in Computer Science, Information Systems, or a related field; advanced degree preferred.
• Pharmaceutical data knowledge is a plus.
• Certifications in relevant technologies (e.g., Databricks Certified Data Engineer) are nice to have.
Skills:
Primary Skills: Databricks, Python, Data Modelling, SQL
Secondary Skills: Data Modelling, PySpark, ETL, SQL
Good to have
EQUAL OPPORTUNITY