Vicky Pandhare
@Vicky
Data Engineer at Starlite Infotech Limited
Aurangabad, Maharashtra, India
Data Engineer with 2+ years of experience building and operating scalable batch data pipelines using Python, PySpark, Apache Spark, and Apache Airflow. Experienced in AWS data lake architectures on Amazon S3, EMR, and Redshift, organized into Bronze, Silver, and Gold layers, with incremental ingestion from relational data sources. Focused on reliable ETL pipeline development, data quality, schema evolution, and production pipeline monitoring, including backfills and SLA management. Familiar with Azure data platform tools, including Azure Databricks and ADLS Gen2, for Medallion-based data lake architectures.
Experience
Data Engineer
Starlite Infotech Limited
Designed end-to-end batch data pipeline architecture integrating MySQL source systems, AWS S3 data lake layers (Bronze, Silver, Gold), PySpark transformations on EMR, and curated datasets in Amazon Redshift for analytics consumption.
Implemented timestamp-based incremental ingestion from MySQL sources, enabling safe backfills and eliminating duplicate loads.
Built PySpark transformation pipelines on EMR for cleansing, deduplication, enrichment, and SCD Type 2 historical tracking.
Owned and operated 20+ production Apache Airflow DAGs with retries, alerts, rerun handling, and SLA monitoring.
Designed rerun-safe batch workflows to support historical backfills and partial reprocessing.
Monitored daily batch executions, investigated data mismatches and job failures, and performed root cause analysis.
Explored Azure data platform capabilities using Azure Databricks and ADLS Gen2 to prototype Spark-based data processing pipelines.
Education
CSMSS College of Engineering
Bachelor of Technology
Electronics and Telecommunication