Akash Ghadage

@akashghadage

Azure Data Engineer at Tata Consultancy Services

Pune, India

Tata Consultancy Services • Sanjeevan Engineering and Technology Institute

Akash is a Databricks Certified Azure Data Engineer with over 2.5 years of experience in designing and optimizing high-performance data pipelines. He is proficient in Databricks, Apache Spark, PySpark, Python, and various Azure cloud services, including Azure Data Factory and ADLS. His expertise lies in supporting advanced analytics and AI/ML initiatives for international clients.

Experience

Azure Data Engineer

Tata Consultancy Services

Full-time • Aug 2022 - Present • Pune, India

Played a key role in developing and operating a data lakehouse on Azure Databricks that processes ERP and non-ERP data for AI/ML and BI use cases.

Developed a metadata-driven ETL framework based on the medallion architecture to process 14K+ tables and terabytes of data using Databricks Auto Loader, Delta Lake, Python, PySpark, and Apache Spark.

Discovered and implemented 2 key optimization strategies and fine-tuned Spark jobs, resulting in a 30-40% reduction in operational costs and enhanced performance.

Transitioned 200+ Databricks workloads from DBX runtime 13.3 LTS to 15.5 LTS, improving performance and stability (Awarded by TCS).

Implemented Delta Lake optimization techniques, including partitioning, data skipping, OPTIMIZE, Z-Ordering, and liquid clustering, reducing query time by 40%.

Created Python automation scripts that generate alerts for expiring Personal Access Tokens (PATs), eliminating authentication and integration failures between Databricks and other systems/services.

Used PySpark to build a recovery procedure that addressed data loss issues, ensuring data integrity and on-time availability.

Migrated data and applications from dedicated Azure IaaS/PaaS (Azure Databricks, Azure Data Factory) to a shared Azure environment, boosting processing efficiency by 30% and reducing operational costs by 20%.

Engineered scalable data ingestion pipelines in Azure Data Factory to ingest 5 TB of data daily from diverse sources (Azure Blob Storage, Azure Data Lake Storage, Azure SQL, and SharePoint) into ADLS Gen2, ensuring high data availability with 99.9% uptime.

Created and configured datasets and linked services to integrate data across systems and platforms.

Restructured and optimized Databricks notebooks using Python, PySpark, and Spark SQL, transforming data across multiple layers, including L0 (raw), L1 (harmonized), and L1+ (semantic), while ensuring data consistency, accuracy, and eff

Education

Sanjeevan Engineering and Technology Institute

B.Tech

Computer Science and Engineering

Jul 2018 - Jul 2022 • CGPA: 9.14/10

Licenses & Certifications

Data Engineer Professional

Databricks

• No expiration

Azure Data Engineer (DP-203)

Microsoft

• No expiration

AZ-900

Microsoft

• No expiration

AI-900

Microsoft

• No expiration

DP-900

Microsoft

• No expiration

Skills

Python

PySpark

SQL

MySQL

Spark SQL

Apache Spark

Databricks

Delta Lake

Performance Tuning & Optimization

Azure Data Factory

Azure Synapse Analytics

Azure Key Vault

Azure SQL

Cosmos DB

ADLS Gen2

Azure DevOps

Git

GitHub

Jira