Shoubhit Kumar

@shoubhit

Data Engineer at IBM

Kolkata, West Bengal, India

IBM • Heritage Institute of Technology, Kolkata | MAKAUT

Data Engineer with 2.5+ years of experience delivering production-grade data pipelines at IBM using PySpark, Databricks, Microsoft Fabric, and SQL. Strong in incremental ETL, Delta Lake optimization, data quality enforcement, and SLA-driven analytics. Experienced in supporting enterprise BI, operational analytics, and AI / GenAI-enabled use cases through reliable, scalable data foundations.

Experience

Data Engineer

IBM

Sep 2023 - Present • Kolkata

• Built and operated PySpark-based ETL pipelines on Databricks handling incremental ingestion, late-arriving data, and historical backfills for enterprise reporting workloads.
• Implemented Delta Lake MERGE patterns and watermarking logic, reducing full reload dependency and improving pipeline efficiency by 40%.
• Designed Bronze, Silver, and Gold datasets with enforced schemas and data contracts, enabling consistent consumption across BI and analytics teams.
• Enforced data validation standards across pipelines covering schema conformity, null handling, and record uniqueness, improving downstream data reliability by up to 98%.
• Optimized large Delta tables using partitioning and Z-ordering, significantly improving SQL query performance and Power BI refresh latency.
• Automated ServiceNow to analytics ingestion workflows using Python and SQL, reducing manual SLA validation effort by 80%.
• Contributed to AI and GenAI enablement by preparing clean, point-in-time datasets for incident clustering, forecasting, and conversational assistants, supporting feature readiness and validation for ML workflows.

Education

Heritage Institute of Technology, Kolkata | MAKAUT

Master of Computer Application (MCA)

Computer Application

Jan 2021 - Jan 2023 • Grade: 9.5

Licenses & Certifications

Databricks Certified Associate Developer for Apache Spark 3.0

Databricks

• No expiration

Microsoft Certified Fabric Data Engineer Associate

Microsoft

• No expiration

Microsoft Certified Fabric Analytics Engineer Associate

Microsoft

• No expiration

Google Cloud Digital Leader

Google

• No expiration

Skills

Python
SQL
Java
Scala
PySpark
Spark
Databricks
Microsoft Fabric
Azure Data Factory
Airflow
Azure
AWS
Google Cloud Platform
IBM Cloud
GitHub (Actions)
Power BI
Looker Studio
IBM Planning Analytics
Dimensional modeling
Feature engineering
Time-series preparation
Clustering
Regression
Classification
ML data validation
LangChain
Copilot Studio
Azure AI Studio