DEEPAK KUMAR

@deepakkumar5704

Senior Analyst at EY

Kolkata, India

EY•Institute of Engineering & Management

Deepak Kumar is an experienced Data Engineer and Senior Analyst with expertise in building robust data platforms. He has significant experience optimizing ETL frameworks using tools like Snowflake, Azure Databricks, and PySpark. His background includes developing cloud-agnostic ETL solutions, migrating on-premises systems to AWS, and implementing CI/CD pipelines using Azure DevOps.

Experience

Senior Analyst

EY

Full-time•Jul 2022 - Present•Kolkata, India

Developed an easily configurable, generic XML parser notebook for an upstream service in Azure Databricks (PySpark), reducing the manual effort needed to infer schemas and flatten StructType data. Optimized the existing ETL framework for parallel, deadlock-free data loads into Snowflake, reducing load time by 70%. Developed a CI pipeline for Snowflake (DWH) in Azure DevOps, covering SQL syntax validation, metadata validation, and deployment to a remote Control-M server. Developed and automated actuarial report generation by analyzing business requirements and translating them into SQL for 8 different data sources. Developed a Python library that generates custom reconciliation control values by parsing 2,300 CSV and Excel files with pandas, cutting overall parsing time to around 7 minutes using Python's multiprocessing module.

Data Engineer

Tata Consultancy Services

Full-time•Aug 2020 - Jul 2022•Kolkata, India

Contributed to a Python-based, cloud-agnostic batch ETL framework targeting Snowflake, which loaded data from 12 source systems (S3, SQL Server, Profisee) into modeled Data Vault tables. Optimized the framework by limiting parallel connections, reducing overall batch load time by 50%, and developed features such as checkpoints/restartability and an automated notification process for data anomalies. Independently analyzed business requirements and modeled them into a data warehouse using Data Vault 2.0, creating data marts for reporting. Took part in migrating the on-prem ETL framework to AWS (serverless stack: S3, Batch, CloudWatch, Fargate), resulting in cost savings for the customer.

Education

Institute of Engineering & Management

B.Tech.

Computer Science

Jul 2016 - Jul 2020•Grade: 7.97/10

Skills

Python
SQL
Bash Scripting
Apache Spark
Pandas
Boto3
Ninja Build
Data Vault 2.0
Data Lake
Batch ETL Pipelines
Data Cleansing
Data Extraction
Metadata Layer
Data Quality Rules
Data Mart Design & Development
Snowflake
Snowpipe
SnowSQL
AWS Lambda
AWS API Gateway
Git
Azure DevOps
Amazon S3
AWS Batch
CloudWatch
Control-M
Azure Databricks
schemachange