Vicky Pandhare

@Vicky

Data Engineer at Starlite Infotech Limited

Aurangabad, Maharashtra, India

Starlite Infotech Limited | CSMSS College of Engineering

Data Engineer with 2+ years of experience building and operating scalable batch data pipelines using Python, PySpark, Apache Spark, and Apache Airflow. Experienced in AWS data lake architectures using Amazon S3, EMR, and Redshift, with Bronze, Silver, and Gold data layers and incremental ingestion from relational data sources. Focused on reliable ETL pipeline development, data quality, schema evolution, and production pipeline monitoring, including backfills and SLA management. Familiar with Azure data platform tools, including Azure Databricks and ADLS Gen2, for medallion-based data lake architectures.

Experience

Data Engineer

Starlite Infotech Limited

Nov 2023 - Present

Designed end-to-end batch data pipeline architecture integrating MySQL source systems, AWS S3 data lake layers (Bronze, Silver, Gold), PySpark transformations on EMR, and curated datasets in Amazon Redshift for analytics consumption.
Implemented timestamp-based incremental ingestion from MySQL sources, enabling safe backfills and eliminating duplicate loads.
Built PySpark transformation pipelines on EMR for cleansing, deduplication, enrichment, and SCD Type 2 historical tracking.
Owned and operated 20+ production Apache Airflow DAGs with retries, alerts, rerun handling, and SLA monitoring.
Designed rerun-safe batch workflows to support historical backfills and partial reprocessing.
Monitored daily batch executions, investigated data mismatches and job failures, and performed root cause analysis.
Explored Azure data platform capabilities using Azure Databricks and ADLS Gen2 to prototype Spark-based data processing pipelines.

Education

CSMSS College of Engineering

Bachelor of Technology

Electronics and Telecommunication

Skills

AWS (S3, EMR, Redshift, Glue, Athena)
Azure Databricks
ADLS Gen2
Python
SQL
PySpark
Apache Spark
Apache Airflow
ETL/ELT Pipeline Development
Data Lake Architecture
Data Warehousing
MySQL
Amazon Redshift
Data Quality Assurance
Schema Evolution
SCD Type 2
DAG Development
Pipeline Orchestration
SLA Monitoring