Utkarsh is an experienced Data Engineer with expertise in developing robust and reusable ETL pipelines. He has a proven track record of optimizing data processes, including reducing runtime by 80% using Spark and automating workflows with Airflow. His technical skills encompass Big Data frameworks like Apache Spark and Hadoop, alongside proficiency in Python and SQL.
Experience
Data Engineer
IMocha
Reduced manual workload by 33% by generalizing the ETL pipelines for reusability. Optimized a Spark job for daily loads, reducing runtime by 80%, from 150 minutes to 30 minutes. Designed and implemented data validation checks using Spark SQL in the ETL pipeline, reducing total development time by 2 days, and designed a metric table in Hive to store historical validation results. Optimized the data pipeline to avoid failures such as out-of-memory errors. Designed and developed client-specific data validation checks for multiple clients and integrated them with Teams and Outlook to notify downstream teams about data quality and to raise incidents when it crosses the minimum threshold. Mentored new team members and organized several knowledge-transfer (KT) sessions to get them started on project modules. Collaborated with multiple cross-functional teams and moved workflows from schedule-based to trigger-based execution with Airflow, reducing manual effort in reprocessing and history loads. Developed robust, reusable end-to-end ETL pipelines. Worked with file formats such as Parquet and CSV.
Business Analyst Intern
Zilingo
Optimized SQL queries to reduce their overall runtime. Conducted business process analysis and identified critical issues and gaps.
Education
United College of Engineering and Research
B.Tech
Computer Science & Engineering