Yatharth Malhotra

@yatharthm22

Data Engineer 1 at HP Inc.

Bangalore, Karnataka, India

HP Inc. • KIIT University, Bhubaneswar

Data Engineer with 3+ years of experience building large-scale distributed data pipelines and analytics platforms using Spark, Databricks, and AWS. Skilled in designing scalable ETL architectures, optimizing distributed workloads, and enabling reliable data platforms for analytics and machine learning. Proven track record of improving system performance, ensuring data governance, and delivering cost-efficient infrastructure in petabyte-scale data environments.

Experience

Data Engineer 1

HP Inc.

Jul 2023 - Present • Bengaluru, India

Designed and optimized distributed ETL pipelines using PySpark and Databricks to process billions of records weekly across enterprise data platforms.

Led lifecycle optimization for 900TB of legal hold datasets, migrating storage from Glacier to S3 Deep Archive after analyzing retention policies and access patterns, reducing long-term infrastructure costs.

Identified inefficient ingestion patterns by analyzing downstream access behavior across event streams; coordinated with stakeholders to remove non-critical datasets while maintaining compliance constraints, saving $6.8K/month in storage costs.

Optimized high-volume GDPR Data Subject Rights (DSR) processing pipelines by enabling Databricks Photon and improving distributed execution strategies, significantly reducing runtime and compute utilization.

Contributed to development of enterprise-scale PII deletion pipelines ensuring GDPR compliance across millions of customer records through automated distributed processing workflows.

Built a data observability and monitoring platform with automated error-metric pipelines and Looker dashboards, improving ingestion reliability and accelerating root-cause analysis for data failures.

Archit

Education

KIIT University, Bhubaneswar

Bachelor of Technology (B.Tech)

Computer Science and Engineering

Skills

Python

SQL

Spark

Apache Spark

Databricks

Delta Lake

ETL/ELT Pipelines

Data Modeling

Distributed Processing

AWS

Secrets Manager

IAM

Lambda

Glacier

Deep Archive

ElastiCache

OpenSearch

Data Lakes

Data Warehouses

Looker

Power BI

Datadog

Data Governance

GDPR Compliance

Data Reliability

CI/CD

Agile Development