Shubham Mehar

@mehar_shubham

Data Engineer at Infosys

Indore, Madhya Pradesh, India

Infosys
Institute of Engineering and Technology, DAVV

Results-driven Data Engineer with 2+ years of experience building scalable ETL pipelines, cloud data platforms, and real-time data workflows. Delivered measurable impact at Infosys and Persistent Systems: 35% faster query execution, 20% lower compute costs, and 15% better data accuracy. Deep expertise in PySpark, Azure Databricks, Delta Lake, Apache Airflow, and Unity Catalog, with a strong focus on data governance and pipeline reliability.

Experience

Data Engineer

Infosys

Feb 2025 - Present

Pune, India

Architected a scalable ETL pipeline for Amazon Paid Search data that processes millions of records daily, reducing data latency by 40% and enabling real-time business analytics.
Developed PySpark and Python transformation workflows to clean, enrich, and standardize large datasets, accelerating downstream consumption by 30%.
Automated Silver-to-Gold layer schema conversion via Python scripts in Unity Catalog, including table/view recreation and permission preservation, cutting manual governance effort by 50%.
Refactored legacy SQL workflows to PySpark DataFrames in Databricks, achieving 35% faster query times and 20% lower cloud compute costs.
Orchestrated pipeline automation using Apache Airflow, eliminating manual interventions and improving SLA adherence by 25%.
Enforced data governance and compliance standards using Azure Databricks, Delta Lake, and Unity Catalog.

Data Engineer

Persistent Systems

Jul 2023 - Dec 2024

Nagpur, India

Designed a scalable Azure Data Lake architecture consolidating 1TB+ of daily data from customer usage, recharge, and network logs across multiple on-premise sources.
Built PySpark ETL pipelines that ingest and partition 1TB+ of daily CDR data, enabling near-real-time analytics for 5M+ telecom subscribers.
Engineered an automated data profiling and quality framework in Python/PySpark, improving data accuracy by 15% and reducing processing time by 25%.
Boosted pipeline efficiency by 20% through Spark performance tuning: bucketing, salting, partitioning, and caching.

Education

Institute of Engineering and Technology, DAVV

B.E.

Information Technology

Jan 2019 - Jan 2023

Grade: 7.9 / 10

Skills

PySpark
Apache Spark
Pandas
SQL
Python
Azure Databricks
Azure Data Lake
Azure Data Factory
Azure SQL DW
AWS
Delta Lake
Unity Catalog
Data Lakehouse
ETL/ELT Design
Data Warehouse
Apache Airflow
Workflow Automation
SQL Server
MongoDB
PostgreSQL
Git
GitHub
Jira
Unix/Linux
Agile/Scrum
C++