Anukool Tiwari
@Anukool
Data Engineer at Modak Analytics
India
Data Engineer with nearly two years of experience specializing in the Azure data stack. Expert in designing and optimizing scalable data pipelines using PySpark, SQL, Azure Databricks, and Azure Data Factory. Strong experience building end-to-end ETL/ELT pipelines, implementing Delta Lake architectures, and applying Change Data Capture (CDC) for incremental data processing.
Experience
Data Engineer
Modak Analytics
Modernized Architecture (ADF + Databricks) | Client: Humana. Developed and orchestrated Azure Data Factory (ADF) pipelines to ingest data into Azure Data Lake Storage (ADLS Gen2) following bronze-silver-gold layering. Built end-to-end curation pipelines using PySpark, Delta Lake, and Azure Databricks, improving data accuracy and consistency by 40%. Used Azure Databricks notebooks to perform complex data transformations, validations, and aggregations. Processed 10M+ records daily using incremental Change Data Capture (CDC) techniques. Led performance optimization in Databricks by tuning Spark configurations, joins, and transformations and by applying partitioning and caching, achieving a 50% improvement in data processing speed and a 30% reduction in overall pipeline execution time. Used ADLS Gen2 for staging and intermediate storage, resulting in a 25% reduction in storage costs.
Data Engineer
Modak Analytics
StreamSets-based Ingestion | Client: Humana. Designed and implemented ETL pipelines using StreamSets to ingest data from Genesys REST APIs, ensuring reliable batch ingestion. Stored raw ingested data in Google Cloud Storage (GCS) and maintained MongoDB audit collections for ingestion tracking and reconciliation.
Education
GL Bajaj Institute of Technology and Management
Master of Computer Applications (MCA)
Computer Applications
Licenses & Certifications
Microsoft Azure Fundamentals (AZ-900)
Microsoft