Shubham Mehar

@mehar_shubham

Data Engineer at Infosys

Indore, Madhya Pradesh, India

Infosys•Institute of Engineering and Technology, DAVV

Results-driven Data Engineer with 2+ years of experience building scalable ETL pipelines, cloud data platforms, and real-time data workflows. Delivered measurable impact at Infosys and Persistent Systems: 35% faster query execution, 20% lower compute costs, and 15% higher data accuracy. Deep expertise in PySpark, Azure Databricks, Delta Lake, Apache Airflow, and Unity Catalog, with a strong focus on data governance and pipeline reliability.

Experience

Data Engineer

Infosys

Feb 2025 - Present•Pune, India

Architected a scalable ETL pipeline that processes millions of Amazon Paid Search records daily, reducing data latency by 40% and enabling real-time business analytics. Developed PySpark and Python transformation workflows to clean, enrich, and standardize large datasets, accelerating downstream consumption by 30%. Automated Silver-to-Gold layer schema conversion in Unity Catalog with Python scripts that handle table/view recreation and permission preservation, cutting manual governance effort by 50%. Refactored legacy SQL workflows into PySpark DataFrames in Databricks, achieving 35% faster query times and 20% lower cloud compute costs. Orchestrated pipeline automation with Apache Airflow, eliminating manual interventions and improving SLA adherence by 25%. Enforced data governance and compliance standards using Azure Databricks, Delta Lake, and Unity Catalog.

Data Engineer

Persistent Systems

Jul 2023 - Dec 2024•Nagpur, India

Designed a scalable Azure Data Lake architecture consolidating 1 TB+ of daily data from customer usage, recharge, and network logs across multiple on-premises sources. Built PySpark ETL pipelines that ingest and partition 1 TB+ of call detail record (CDR) data daily, enabling near-real-time analytics for 5M+ telecom subscribers. Engineered an automated data profiling and quality framework in Python/PySpark, improving data accuracy by 15% and reducing processing time by 25%. Boosted pipeline efficiency by 20% through Spark performance tuning: bucketing, salting, partitioning, and caching strategies.

Education

Institute of Engineering and Technology, DAVV

B.E.

Information Technology

Jan 2019 - Jan 2023•Grade: 7.9 / 10

Skills

PySpark

Apache Spark

Pandas

SQL

Python

Azure Databricks

Azure Data Lake

Azure Data Factory

Azure SQL DW

AWS

Delta Lake

Unity Catalog

Data Lakehouse

ETL/ELT Design

Data Warehouse

Apache Airflow

Workflow Automation

SQL Server

MongoDB

PostgreSQL

Git

GitHub

Jira

Unix/Linux

Agile/Scrum

C++