Shoubhit Kumar
@shoubhit
Data Engineer at IBM
Kolkata, West Bengal, India
Data Engineer with 2.5+ years of experience at IBM delivering production-grade data pipelines with PySpark, Databricks, Microsoft Fabric, and SQL. Strong in incremental ETL, Delta Lake optimization, data quality enforcement, and SLA-driven analytics. Experienced in supporting enterprise BI, operational analytics, and AI/GenAI use cases through reliable, scalable data foundations.
Experience
Data Engineer
IBM
Built and operated PySpark-based ETL pipelines on Databricks for enterprise reporting workloads, handling incremental ingestion, late-arriving data, and historical backfills.
Implemented Delta Lake MERGE (upsert) patterns and watermarking logic, reducing dependence on full reloads and improving pipeline efficiency by 40%.
Designed Bronze, Silver, and Gold (medallion) datasets with enforced schemas and data contracts, enabling consistent consumption across BI and analytics teams.
Enforced data validation standards across pipelines (schema conformity, null handling, and record uniqueness), improving downstream data reliability by up to 98%.
Optimized large Delta tables with partitioning and Z-ordering, significantly improving SQL query performance and reducing Power BI refresh latency.
Automated ServiceNow-to-analytics ingestion workflows using Python and SQL, reducing manual SLA validation effort by 80%.
Contributed to AI and GenAI enablement by preparing clean, point-in-time datasets for incident clustering, forecasting, and conversational assistants, supporting feature readiness and validation for ML workflows.
Education
Heritage Institute of Technology, Kolkata | MAKAUT
Master of Computer Applications (MCA)
Computer Applications
Licenses & Certifications
Databricks Certified Associate Developer for Apache Spark 3.0
Databricks
Microsoft Certified: Fabric Data Engineer Associate
Microsoft
Microsoft Certified: Fabric Analytics Engineer Associate
Microsoft
Google Cloud Digital Leader