
Utkarsh Singh

@utkarsh_singh

Data Engineer at IMocha

Pune

UtkarshSingh

IMocha

United College of Engineering and Research

Utkarsh is an experienced Data Engineer with expertise in developing robust and reusable ETL pipelines. He has a proven track record of optimizing data processes, including reducing runtime by 80% using Spark and automating workflows with Airflow. His technical skills encompass Big Data frameworks like Apache Spark and Hadoop, alongside proficiency in Python and SQL.

Experience

Data Engineer

IMocha

Feb 2021 - Present

Pune

Reduced manual workload by 33% by generalizing the ETL pipelines for reusability.
Optimized the Spark job for daily loads, cutting runtime by 80% (from 150 minutes to 30 minutes).
Designed and implemented data validation checks using Spark SQL in the ETL pipeline, reducing total development time by 2 days, and designed a metric table in Hive to store historic validation results.
Optimized the data pipeline to avoid failures such as out-of-memory errors.
Designed and developed client-specific data validation checks for multiple clients, integrated with Teams and Outlook to notify downstream teams about data quality and to raise incidents when it crosses the minimum threshold.
Mentored new team members and organized several knowledge-transfer (KT) sessions to get them started on project modules.
Collaborated with multiple cross-functional teams and moved workflows from schedule-based to trigger-based execution with Airflow, reducing manual effort in reprocessing and history loads.
Developed robust, reusable end-to-end ETL pipelines.
Worked with file formats such as Parquet and CSV.

Business Analyst Intern

Zilingo

Nov 2019 - May 2020

Optimized SQL queries to reduce their overall runtime. Conducted business process analysis and identified critical issues and gaps.

Education

United College of Engineering and Research

B.Tech

Computer Science & Engineering

Aug 2015 - May 2019

Grade: 7 CGPA

Skills

Apache Spark
Hadoop
Hive
MySQL
HDFS
AWS S3
Apache Airflow
Git
Python
SQL
Data Structures & Algorithms
Version Control
Fast Learner
Collaborative
Active Listener & Communicator
Team Management