Utkarsh Singh

@utkarsh_singh

Data Engineer at IMocha

Pune

IMocha • United College of Engineering and Research

Utkarsh is an experienced Data Engineer with expertise in developing robust and reusable ETL pipelines. He has a proven track record of optimizing data processes, including reducing runtime by 80% using Spark and automating workflows with Airflow. His technical skills encompass Big Data frameworks like Apache Spark and Hadoop, alongside proficiency in Python and SQL.

Experience

Data Engineer

IMocha

•Feb 2021 - Present•Pune

Reduced manual workload by 33% by generalizing the ETL pipelines for reusability. Optimized the Spark job for daily loads, cutting runtime by 80% (from 150 minutes to 30 minutes). Designed and implemented data validation checks using Spark SQL in the ETL pipeline, reducing total development time by 2 days, and designed a metric table in Hive to store historical validation results. Optimized the data pipeline to avoid failures such as out-of-memory errors. Designed and developed client-specific data validation checks for multiple clients and integrated them with Teams and Outlook to notify downstream teams about data quality and to raise incidents when data quality crosses the minimum threshold. Mentored new team members and organized several knowledge transfer (KT) sessions to get them started on project modules. Collaborated with multiple cross-functional teams and moved solutions from schedule-based to trigger-based automation with Airflow, reducing manual effort in reprocessing and history loads. Developed robust, reusable end-to-end ETL pipelines. Worked with file formats such as Parquet and CSV.

Business Analyst Intern

Zilingo

•Nov 2019 - May 2020

Optimized SQL queries to reduce their overall runtime. Conducted business process analysis and identified critical issues and gaps.

Education

United College of Engineering and Research

B.Tech

Computer Science & Engineering

Aug 2015 - May 2019•Grade: 7 CGPA

Skills

Apache Spark

Hadoop

Hive

MySQL

HDFS

AWS S3

Apache Airflow

Git

Python

SQL

Data Structures & Algorithms

Version Control

Fast Learner

Collaborative

Active Listener & Communicator

Team Management