Vishal Grover
@VG0311
Data Engineer at Tata Technologies
India
Data Engineer with 3+ years of experience architecting and operating large-scale batch and streaming data platforms that process 500GB–1TB+ per day in production. Strong expertise in Spark (PySpark/SQL), Airflow orchestration, cloud-native data platforms (AWS EMR, S3, Redshift, Glue), and data modeling for analytics and operational workloads. Proven record of optimizing distributed pipelines, improving data reliability, implementing observability, and delivering scalable lakehouse-style architectures for business-critical analytics.
Experience
Data Engineer
Tata Technologies
Worked on a cloud-based connected vehicle data platform ingesting high-volume telemetry, engine signals, and operational events supporting analytics, BI, and product systems.
Key Impact & Contributions:
Architected and operated PySpark + Airflow data pipelines processing 500GB–1TB daily across telemetry, engine health, and event streams from 600,000+ connected vehicles.
Built scalable ELT workflows on AWS EMR & S3 lake architecture, delivering analytics-ready datasets for BI, product dashboards, and operational reporting.
Implemented batch and streaming-style ingestion patterns integrating Kafka feeds, MongoDB operational data, and cloud object storage into unified data models.
Optimized Spark performance using partition tuning, broadcast joins, caching, and execution-plan analysis, reducing pipeline runtimes by ~30% and improving cluster cost efficiency.
Designed fact/dimension data models across vehicle performance metrics, event analytics, and subscription lifecycle data, standardizing KPIs and improving cross-team consistency.
Refactored complex SQL and Hive transformations to improve query latency and dashboard responsiveness for business stakeholders.
Built end-to-end observ
Education
Chitkara University, Punjab
B.Tech
Computer Engineering
Licenses & Certifications
Data Science
Chitkara University
Machine Learning
Simplilearn
NLP
Simplilearn