Yatharth Malhotra
@yatharthm22
Data Engineer 1 at HP Inc.
Bangalore, Karnataka, India
Data Engineer with 3+ years of experience building large-scale distributed data pipelines and analytics platforms using Spark, Databricks, and AWS. Skilled in designing scalable ETL architectures, optimizing distributed workloads, and enabling reliable data platforms for analytics and machine learning. Proven track record of improving system performance, ensuring data governance, and delivering cost-efficient infrastructure in petabyte-scale data environments.
Experience
Data Engineer 1
HP Inc.
Designed and optimized distributed ETL pipelines using PySpark and Databricks to process billions of records weekly across enterprise data platforms.
Led lifecycle optimization for 900TB of legal hold datasets, migrating storage from S3 Glacier to S3 Glacier Deep Archive after analyzing retention policies and access patterns, reducing long-term infrastructure costs.
Identified inefficient ingestion patterns by analyzing downstream access behavior across event streams; coordinated with stakeholders to remove non-critical datasets while maintaining compliance constraints, saving $6.8K/month in storage costs.
Optimized high-volume GDPR Data Subject Rights (DSR) processing pipelines by enabling Databricks Photon and improving distributed execution strategies, significantly reducing runtime and compute utilization.
Contributed to the development of enterprise-scale PII deletion pipelines, ensuring GDPR compliance across millions of customer records through automated distributed processing workflows.
Built a data observability and monitoring platform with automated error-metric pipelines and Looker dashboards, improving ingestion reliability and accelerating root-cause analysis for data failures.
Archit
Education
KIIT University, Bhubaneswar
Bachelor of Technology (B.Tech)
Computer Science and Engineering