Job Description
Must Have
Role: Lead Data Engineer
We are looking for Senior Data Engineer profiles with strong Databricks experience (minimum 3 years of experience with Databricks and data engineering). Please find the job description below for reference and share relevant profiles accordingly.
Key Responsibilities
• Design, develop, and maintain scalable, high-performance data pipelines using Databricks (PySpark/SQL/Delta Lake)
• Implement and manage Medallion Architecture (Bronze, Silver, Gold layers) for reliable and governed data processing
• Optimize batch and streaming workloads for performance, reliability, and cost
• Build and orchestrate pipelines using tools such as Airflow, Databricks Workflows, or equivalent schedulers
• Ensure data quality, lineage, and observability through validations, logging, and monitoring frameworks
• Collaborate closely with architects, analytics teams, and business stakeholders to translate requirements into robust data solutions
• Support data consumption for analytics, BI, AI/ML, and downstream applications (e.g., Power BI, Tableau, ML pipelines)
• Apply best practices for data security, governance, and compliance, especially for regulated data
• Lead code reviews, enforce engineering standards, and mentor junior engineers
• Participate in platform modernization, migration, and optimization initiatives
Required Skills & Qualifications
Technical Skills
• Strong hands-on experience with Databricks (PySpark, Delta Lake, SQL)
• Expertise in Spark-based data processing and distributed data systems
• Experience with cloud platforms (AWS/Azure) and cloud-native data services
• Proficiency in building ETL/ELT pipelines using structured and semi-structured data
• Experience with data orchestration tools (Airflow, Databricks Jobs, etc.)
• Solid understanding of data modeling, schema evolution, and performance tuning
• Knowledge of CI/CD practices for data pipelines and infrastructure
• Familiarity with version control (Git) and agile delivery models
Domain & Business Skills (Preferred)
• Experience in Life Sciences/Healthcare data domains
• Understanding of healthcare or pharma data types (e.g., clinical, commercial, real-world data, reporting datasets)