Vishal Grover

@VG0311

Data Engineer at Tata Technologies

India

Tata Technologies

Chitkara University, Punjab

Data Engineer with 3+ years of experience architecting and operating large-scale batch and streaming data platforms that process 500GB–1TB+ per day in production. Strong expertise in Spark (PySpark/SQL), Airflow orchestration, cloud-native data platforms (AWS EMR, S3, Redshift, Glue), and data modeling for analytics and operational workloads. Proven record of optimizing distributed pipelines, improving data reliability, implementing observability, and delivering scalable lakehouse-style architectures for business-critical analytics.

Experience

Data Engineer

Tata Technologies

• Jan 2023 - Present • Thane, India

Worked on a cloud-based connected vehicle data platform ingesting high-volume telemetry, engine signals, and operational events supporting analytics, BI, and product systems.

Key Impact & Contributions:

• Architected and operated PySpark + Airflow data pipelines processing 500GB–1TB daily across telemetry, engine health, and event streams from 600,000+ connected vehicles.

• Built scalable ELT workflows on AWS EMR & S3 lake architecture, delivering analytics-ready datasets for BI, product dashboards, and operational reporting.

• Implemented batch and streaming-style ingestion patterns integrating Kafka feeds, MongoDB operational data, and cloud object storage into unified data models.

• Optimized Spark performance using partition tuning, broadcast joins, caching, and execution-plan analysis, reducing pipeline runtimes by ~30% and improving cluster cost efficiency.

• Designed fact/dimension data models across vehicle performance metrics, event analytics, and subscription lifecycle data, standardizing KPIs and improving cross-team consistency.

• Refactored complex SQL and Hive transformations to improve query latency and dashboard responsiveness for business stakeholders.

• Built end-to-end observ

Education

Chitkara University, Punjab

B.Tech

Computer Engineering

• Grade: 9.83/10

Licenses & Certifications

Data Science

Chitkara University

• No expiration

Machine Learning

Simplilearn

• No expiration

NLP

Simplilearn

• No expiration

Skills

Apache Spark

PySpark

Spark SQL

Kafka

Apache Airflow

CI/CD

AWS EMR

EC2

Glue

Lambda

Redshift

Parquet

SQL

Hive

MongoDB

Elasticsearch

Python

Shell Scripting

Data Modelling

ELK Stack

Git

Jenkins

VS Code

Jupyter Notebook