Shoubhit Kumar

@shoubhit

Data Engineer at IBM

Kolkata, West Bengal, India

IBM

Heritage Institute of Technology, Kolkata | MAKAUT

Data Engineer with 2.5+ years of experience delivering production-grade data pipelines at IBM using PySpark, Databricks, Microsoft Fabric, and SQL. Strong in incremental ETL, Delta Lake optimization, data quality enforcement, and SLA-driven analytics. Experienced in supporting enterprise BI, operational analytics, and AI / GenAI-enabled use cases through reliable, scalable data foundations.

Experience

Data Engineer

IBM

Sep 2023 - Present • Kolkata

• Built and operated PySpark-based ETL pipelines on Databricks handling incremental ingestion, late-arriving data, and historical backfills for enterprise reporting workloads.

• Implemented Delta Lake MERGE patterns and watermarking logic, reducing dependency on full reloads and improving pipeline efficiency by 40%.

• Designed Bronze, Silver, and Gold datasets with enforced schemas and data contracts, enabling consistent consumption across BI and analytics teams.

• Enforced data validation standards across pipelines covering schema conformity, null handling, and record uniqueness, improving downstream data reliability by up to 98%.

• Optimized large Delta tables using partitioning and Z-ordering, significantly improving SQL query performance and Power BI refresh latency.

• Automated ServiceNow-to-analytics ingestion workflows using Python and SQL, reducing manual SLA validation effort by 80%.

• Contributed to AI and GenAI enablement by preparing clean, point-in-time datasets for incident clustering, forecasting, and conversational assistants, supporting feature readiness and validation for ML workflows.

Education

Heritage Institute of Technology, Kolkata | MAKAUT

Master of Computer Application (MCA)

Computer Application

Jan 2021 - Jan 2023 • Grade: 9.5

Licenses & Certifications

Databricks Certified Associate Developer for Apache Spark 3.0

Databricks

• No expiration

Microsoft Certified: Fabric Data Engineer Associate

Microsoft

• No expiration

Microsoft Certified: Fabric Analytics Engineer Associate

Microsoft

• No expiration

Google Cloud Digital Leader

Google

• No expiration

Skills

Python

SQL

Java

Scala

PySpark

Spark

Databricks

Microsoft Fabric

Azure Data Factory

Airflow

Azure

AWS

Google Cloud Platform

IBM Cloud

GitHub (Actions)

Power BI

Looker Studio

IBM Planning Analytics

Dimensional modeling

Feature engineering

Time-series preparation

Clustering

Regression

Classification

ML data validation

LangChain

Copilot Studio

Azure AI Studio