Vicky Pandhare

@Vicky

Data Engineer at Starlite Infotech Limited

Aurangabad, Maharashtra, India

Starlite Infotech Limited

CSMSS College of Engineering

Data Engineer with 2+ years of experience building and operating scalable batch data pipelines using Python, PySpark, Apache Spark, and Apache Airflow. Experienced in AWS data lake architectures using Amazon S3, EMR, and Redshift with Bronze, Silver, and Gold data layers and incremental ingestion from relational data sources. Focused on reliable ETL pipeline development, data quality, schema evolution, and production pipeline monitoring, including backfills and SLA management. Familiar with Azure data platform tools including Azure Databricks and ADLS Gen2 for Medallion-based data lake architectures.

Experience

Data Engineer

Starlite Infotech Limited

Nov 2023 - Present

Designed end-to-end batch data pipeline architecture integrating MySQL source systems, AWS S3 data lake layers (Bronze, Silver, Gold), PySpark transformations on EMR, and curated datasets in Amazon Redshift for analytics consumption.

Implemented timestamp-based incremental ingestion from MySQL sources, enabling safe backfills and eliminating duplicate loads.

Built PySpark transformation pipelines on EMR for cleansing, deduplication, enrichment, and SCD Type 2 historical tracking.

Owned and operated 20+ production Apache Airflow DAGs with retries, alerts, rerun handling, and SLA monitoring.

Designed rerun-safe batch workflows to support historical backfills and partial reprocessing.

Monitored daily batch executions, investigated data mismatches and job failures, and performed root cause analysis.

Explored Azure data platform capabilities using Azure Databricks and ADLS Gen2 to prototype Spark-based data processing pipelines.
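The timestamp-based incremental ingestion and rerun-safe loading described above can be illustrated with a minimal, library-free sketch. It is not the production implementation: the function names (`incremental_extract`, `merge_by_key`) and the field names (`id`, `updated_at`) are hypothetical, and plain Python lists stand in for the MySQL source and the data lake target.

```python
from datetime import datetime


def incremental_extract(rows, last_watermark, ts_field="updated_at"):
    """Return rows changed after last_watermark, plus the new watermark.

    `rows` is a stand-in for a source table (hypothetical schema).
    """
    new_rows = [r for r in rows if r[ts_field] > last_watermark]
    # If nothing changed, the watermark stays where it was.
    new_watermark = max((r[ts_field] for r in new_rows), default=last_watermark)
    return new_rows, new_watermark


def merge_by_key(target, batch, key="id", ts_field="updated_at"):
    """Upsert `batch` into `target` by key, keeping the latest version.

    Loading the same batch twice yields the same result, which is what
    makes reruns and backfills safe from duplicate loads.
    """
    merged = {r[key]: r for r in target}
    for r in batch:
        current = merged.get(r[key])
        if current is None or r[ts_field] >= current[ts_field]:
            merged[r[key]] = r
    return sorted(merged.values(), key=lambda r: r[key])


# Example data (invented purely for illustration).
source = [
    {"id": 1, "updated_at": datetime(2024, 1, 1), "value": "a"},
    {"id": 2, "updated_at": datetime(2024, 1, 2), "value": "b"},
    {"id": 1, "updated_at": datetime(2024, 1, 3), "value": "c"},
]

batch, watermark = incremental_extract(source, datetime(2024, 1, 1))
target = merge_by_key([], batch)
rerun_target = merge_by_key(target, batch)  # simulate a rerun of the same batch
```

In this sketch a rerun of the same batch leaves `target` unchanged, and a second extract using the returned `watermark` picks up no rows until the source changes again.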

Education

CSMSS College of Engineering

Bachelor of Technology

Electronics and Telecommunication

Skills

AWS (S3, EMR, Redshift, Glue, Athena)

Azure Databricks

ADLS Gen2

Python

SQL

PySpark

Apache Spark

Apache Airflow

ETL/ELT Pipeline Development

Data Lake Architecture

Data Warehousing

MySQL

Amazon Redshift

Data Quality Assurance

Schema Evolution

SCD Type 2

DAG Development

Pipeline Orchestration

SLA Monitoring