Samyak Jain
@SamyakJain
ETL Developer at ZS Associates
Pune, Maharashtra, India
Data Engineer with 3+ years of experience building scalable ETL/ELT pipelines using PySpark, Spark SQL, and SQL on AWS and Azure. Experienced in developing cloud-based data warehouses, processing structured and semi-structured data (Parquet, XML, JSON, CSV), and optimizing distributed data workflows. Hands-on with AWS services such as EMR and S3, and with Delta Lake architecture. Strong collaborator focused on delivering reliable data solutions.
Experience
ETL Developer
ZS Associates
Architected, modeled, and owned large-scale ETL/ELT pipelines using SQL and Spark (PySpark) applications to process large volumes of healthcare and pharma data. Built and supported executive and market intelligence dashboards focused on pharmacy marketing, brand performance, and regional sales trends. Orchestrated and monitored Spark workflows on AWS EMR and Hive-compatible environments using Azkaban. Optimized and automated ETL and Spark workloads by tuning shuffle partitions and executor configurations. Implemented Master Data Management (MDM) and data governance frameworks for product and customer hierarchies.
Data Engineer
Celebal Technologies
Engineered scalable data and ETL pipelines using SQL, Python, and Spark (PySpark/Spark SQL) on Azure Databricks. Designed and implemented data enrichments and transformations using Medallion Architecture (Bronze/Silver/Gold) with Delta Lake. Developed a data quality audit engine and integrated datasets with Power BI. Created Delta Live Tables pipelines for incremental and real-time data processing. Developed advanced SQL procedures and reusable transformation scripts.
Education
DIT University
Bachelor's degree
Computer Science & Engineering with specialization in Artificial Intelligence and Data Science
Licenses & Certifications
AZ-900 Microsoft Certified: Azure Fundamentals
Microsoft
Academy Accreditation - Databricks Lakehouse Fundamentals
Databricks
DP-900 Microsoft Certified: Azure Data Fundamentals
Microsoft
Databricks Certified Data Engineer Professional
Databricks