Ajay Gandhi
@ajaygandhi131
Data Engineer at Deloitte
Noida, Uttar Pradesh, India
Data Engineer at Deloitte USI with close to 3 years of experience in data engineering, analytics, strategy consulting, and data governance. Proficient in extracting business insights from data, building scalable data pipelines, and supporting strategic decision-making. Strong foundation in cross-functional stakeholder engagement. Passionate about leveraging analytics and AI to solve real-world problems and drive business impact.
Experience
Data Engineer
Deloitte
Data Engineering (Databricks + PySpark + SQL) – areas of responsibility:

1) Moody's: ID Management Platform on Databricks, integrated with AWS
• Developed an ID management platform that consolidates data from multiple domains, sourced from several legacy and acquisition data assets, under a single umbrella, with unique IDs generated via the Luhn algorithm to ensure data security and integrity (a minimal Luhn sketch follows this section). The solution handles concurrent user requests and large-volume batch requests with higher uptime and lower latency than the previous state.
• Created a data model (snowflake schema with SCD Type 2) comprising fact, dimension, and history tables aligned with the business's MDM needs, capable of handling full loads (Day 0) and incremental/delta loads (insert, update, delete); enforced constraints and relationships to ensure a single version of truth (see the SCD2 merge sketch below).
• Enforced data governance, auditing capabilities, and data archival/versioning per business requirements.
• Created Source-to-Target Mapping (STTM) documents aligned with the data model, enabling developers and business stakeholders to understand data lineage and transformations.
• Implemented the STTM and ETL business logic in PySpark on Databricks, using the Medallion architecture as a best practice.
• Triggered and scheduled Databricks jobs from S3 events via AWS Lambda functions (a sketch of such a handler follows this section).
• Verified production readiness through multiple rounds of testing (unit testing, user acceptance testing) in QA environments.

2) J&J: Supply Chain Data Orchestration / Cloud Migration to Databricks on Azure
• Migrated supply chain data from Pivotal GemFire to Databricks to achieve lower latency, better SQL support, and SLA compliance, and orchestrated it using Databricks Workflows and Unity Catalog, resulting in faster and more accurate data loads than the legacy architecture.
• Developed PySpark notebook equivalents of the existing Java ETL code, following the Medallion architecture and the lakehouse pattern on Azure.
• Leveraged AI tools such as GitHub Copilot, along with automation scripts, to generate and automate the Java-to-PySpark code conversion.
• Developed a dynamic file-parsing script that extracts meaningful insights from text files using regular expressions, automating insight gathering and analysis for a team of developers and business users (an illustrative parser follows this section).
• Automated the creation and loading of Unity Catalog tables by reading a static set of files in ADLS, across Development, QA, and Production environments.
• Assisted with and participated in CI/CD and Git integration with Databricks across environments (Dev, QA, and Prod) to maintain consistency.
• Implemented SCD Type 2 logic for J&J, enabling near-real-time data freshness and change data capture.
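The Luhn-based ID generation mentioned above can be illustrated with a minimal sketch. This is not the platform's actual code; the sequence width and helper names below are hypothetical, but the check-digit arithmetic is the standard Luhn algorithm:

def luhn_check_digit(payload: str) -> int:
    """Compute the Luhn check digit for a numeric string."""
    total = 0
    # Double every second digit starting from the rightmost payload digit
    # (the check digit will occupy the final position).
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10

def generate_id(sequence: int, width: int = 9) -> str:
    # Hypothetical scheme: zero-padded sequence number plus check digit.
    payload = str(sequence).zfill(width)
    return payload + str(luhn_check_digit(payload))

def is_valid(candidate: str) -> bool:
    # Validate an ID offline, without any lookup.
    return luhn_check_digit(candidate[:-1]) == int(candidate[-1])

The offline validation is the property that makes Luhn check digits useful here: a consumer can reject mistyped or corrupted IDs before ever touching the platform.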
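A minimal sketch of the SCD Type 2 merge pattern behind the incremental loads above (used on both engagements), written against Delta Lake on Databricks. Table and column names (silver.dim_customer, customer_id, row_hash, load_ts, is_current, valid_from, valid_to) are illustrative assumptions, not the client schema; `updates` is assumed to be a deduplicated DataFrame of incoming changes, and `spark` the notebook's session:

from delta.tables import DeltaTable
from pyspark.sql import functions as F

dim = DeltaTable.forName(spark, "silver.dim_customer")

# Step 1: close out current rows whose tracked attributes changed.
(dim.alias("t")
    .merge(updates.alias("s"),
           "t.customer_id = s.customer_id AND t.is_current = true")
    .whenMatchedUpdate(
        condition="t.row_hash <> s.row_hash",   # attribute change detected
        set={"is_current": "false", "valid_to": "s.load_ts"})
    .execute())

# Step 2: insert the new version of changed/new keys as current rows.
# Anti-join keeps keys that have no open row after step 1.
new_rows = (updates.alias("s")
    .join(dim.toDF().filter("is_current = true").alias("t"),
          "customer_id", "left_anti"))
(new_rows
    .withColumn("valid_from", F.col("load_ts"))
    .withColumn("valid_to", F.lit(None).cast("timestamp"))
    .withColumn("is_current", F.lit(True))
    .drop("load_ts")
    .write.format("delta").mode("append").saveAsTable("silver.dim_customer"))

The two-step shape (expire, then append) keeps full history queryable via valid_from/valid_to while "is_current = true" always returns the single current version of each key.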
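The S3-event-driven job triggering can be sketched as a small AWS Lambda handler calling the Databricks Jobs 2.1 run-now endpoint. The job ID, environment variable names, and notebook parameter names are placeholder assumptions:

import json
import os
import urllib.request

DATABRICKS_HOST = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace>.cloud.databricks.com
DATABRICKS_TOKEN = os.environ["DATABRICKS_TOKEN"]  # supplied via Lambda config/Secrets Manager
JOB_ID = os.environ["JOB_ID"]

def lambda_handler(event, context):
    # Pull the bucket/key of the object that fired the S3 notification
    # (keys with special characters would need URL-unquoting).
    record = event["Records"][0]["s3"]
    payload = {
        "job_id": int(JOB_ID),
        "notebook_params": {   # forwarded to the job's notebook as widgets
            "source_path": f's3://{record["bucket"]["name"]}/{record["object"]["key"]}'
        },
    }
    req = urllib.request.Request(
        f"{DATABRICKS_HOST}/api/2.1/jobs/run-now",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}",
                 "Content-Type": "application/json"},
        method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())  # contains run_id on success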
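A hypothetical example of the regex-driven file parser: the field names and patterns below are invented for illustration, but the shape (a pattern table, per-file extraction, a consolidated output) matches the approach described above:

import csv
import re
from pathlib import Path

# Map each logical field to the pattern that extracts it (one capture group each).
PATTERNS = {
    "batch_id":  re.compile(r"Batch ID:\s*(\S+)"),
    "status":    re.compile(r"Status:\s*(SUCCESS|FAILED|PARTIAL)"),
    "row_count": re.compile(r"Rows processed:\s*(\d+)"),
}

def parse_file(path: Path) -> dict:
    text = path.read_text(errors="ignore")
    row = {"file": path.name}
    for field, pattern in PATTERNS.items():
        m = pattern.search(text)
        row[field] = m.group(1) if m else None  # tolerate missing fields
    return row

def parse_folder(folder: str, out_csv: str) -> None:
    rows = [parse_file(p) for p in Path(folder).glob("*.txt")]
    with open(out_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["file", *PATTERNS])
        writer.writeheader()
        writer.writerows(rows)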
Data Stewardship, Data Governance, MDM & Strategy Consulting – areas of responsibility:

1) Zayo Group: Data Governance & MDM Enablement
• Implemented a single-version-of-truth data framework and a future roadmap for DG & MDM capabilities, based on industry standards and best practices.
• Built a robust DG & MDM strategy that improved management, quality, consistency, and governance across the client's data assets.
• Conducted deep-dive stakeholder interviews across functions (Orders, Operations, Finance, IT Shared Services, and corporate support teams) to understand the current state and maturity of DG & MDM, assess gaps, and propose a future roadmap.
• Identified the Critical Data Elements (CDEs) and Business-Ready Datasets (BRDs) for the data assets.
• Proposed and documented DG roles and responsibilities through a DG operating model and RACI matrix.
• Proposed and documented DG policies and processes, RBAC mechanisms, and industry best practices for regulatory compliance, privacy, and security of data assets.

2) Kroger Inc.: Data Governance & Data Cataloging using Alation
• Mapped L2 data flows, user activities, and business workflows associated with promotional data.
• Recorded risks and pain points in the current state, from contract creation through forecast planning.
• Built lineage and a data dictionary to document dependencies among data assets.

Skills:
• Strategy consulting for data governance enablement
• Microsoft PowerPoint, Microsoft Word, Microsoft Excel
• Python, Spark, Databricks, SQL, Azure Data Factory (ADF), CI/CD
• Documentation for client deliverables

Key Certifications:
• Databricks Fundamentals
• AZ-900: Azure Fundamentals
• DP-100: Azure Data Scientist Associate
• AI-900: Azure AI Fundamentals
Intern
KPMG
KPMG – Internship Trainee (Jan 2023 – June 2023)
• Assisted with internal solution development within the organization.
• Developed a web-scraping tool that fetches updates from a pool of 200 websites and stores them in an Excel database, automating the retrieval of the latest information and linking each update/notification to its direct URL (an illustrative sketch follows this section).
• Developed a data-sampling tool with three features (sampling categorical data, numerical data, and selection-based data) that enabled cross-functional teams to collaborate on random data quality/sanity checks (also sketched below).
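An illustrative sketch of such a scraper, assuming requests, BeautifulSoup, and pandas (with openpyxl for the Excel output); the site list, link selector, and column names are placeholders, and per-site selectors would need tuning in practice:

import requests
from bs4 import BeautifulSoup
import pandas as pd

SITES = ["https://example.org/press-releases"]  # in practice, ~200 URLs

def fetch_updates(url: str) -> list[dict]:
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    updates = []
    for link in soup.select("a"):               # refine the selector per site
        href = link.get("href", "")
        if href.startswith("http"):
            updates.append({"source": url,
                            "title": link.get_text(strip=True),
                            "url": href})       # direct URL kept with each update
    return updates

rows = [u for site in SITES for u in fetch_updates(site)]
pd.DataFrame(rows).to_excel("updates.xlsx", index=False)  # the "Excel database"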
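A minimal pandas sketch of the three sampling modes; function names, column names, and sample sizes are placeholders:

import pandas as pd

def sample_categorical(df: pd.DataFrame, col: str, n_per_group: int) -> pd.DataFrame:
    # Stratified: up to n rows from every category value.
    return df.groupby(col, group_keys=False).apply(
        lambda g: g.sample(min(len(g), n_per_group)))

def sample_numerical(df: pd.DataFrame, col: str, lo: float, hi: float,
                     n: int) -> pd.DataFrame:
    # Random rows whose numeric column falls inside a user-given range.
    in_range = df[df[col].between(lo, hi)]
    return in_range.sample(min(len(in_range), n))

def sample_selection(df: pd.DataFrame, col: str, values: list, n: int) -> pd.DataFrame:
    # Random rows restricted to an explicit selection of values.
    subset = df[df[col].isin(values)]
    return subset.sample(min(len(subset), n))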
Education
VIT Vellore
B.Tech
Electronics and Communication Engineering (ECE)
Licenses & Certifications
AI-900: Microsoft Azure AI Fundamentals
Microsoft Azure
DP-100: Microsoft Azure Data Scientist Associate
Microsoft Azure
Databricks Data Engineering Fundamentals
Databricks
AZ-900: Microsoft Azure Fundamentals
Microsoft Azure