Sukanya Banerjee
@SukanyaBanerjee
Azure Data Engineer
Kolkata, West Bengal, India
Data Engineer with 3.5 years of experience in designing and building scalable data pipelines on the Microsoft Azure platform. Experienced in Azure Data Factory, Azure Databricks, ADLS Gen2, and Delta Lake for building reliable ETL/ELT workflows. Skilled in Python, SQL, and PySpark for large-scale data transformation and processing. Proven ability to implement Medallion Architecture, incremental data pipelines, and Spark optimizations to support analytics and reporting workloads.
Experience
Senior Software Engineer
Capgemini
Designed and developed ADF pipelines to ingest data from multiple sources (CSV, JSON, SQL DB, REST APIs) into Azure Data Lake Storage Gen2, enabling scalable data ingestion.
Implemented Medallion Architecture (Bronze, Silver, Gold) using Azure Databricks and ADLS Gen2, improving data pipeline efficiency and enabling 50% faster data access for downstream analytics.
Built PySpark-based transformation pipelines in Azure Databricks to cleanse, transform, and load large datasets into Delta tables.
Developed incremental data ingestion pipelines using Databricks Auto Loader and Delta Lake MERGE operations, ensuring efficient processing of new and updated records.
Optimized slow-running Databricks jobs using repartitioning, caching, and efficient Spark transformations, reducing pipeline runtime from 1 hour to 30 minutes.
Improved storage and query performance by implementing partitioning and compression strategies in ADLS Gen2 and Delta tables, reducing storage usage by 30%.
Monitored and resolved pipeline issues, achieving 99.9% uptime and improving the team's SLA compliance by 15%.
Collaborated with cross-functional teams, aligning data solutions with business objectives and enhancing
Education
Maulana Abul Kalam Azad University of Technology
MSc
Computer Science
Licenses & Certifications
Microsoft Certified: Azure Fundamentals (AZ-900)
Microsoft