Vishal Dixit

@vishal_dxt11

Data Engineer at DeHaat

Gurugram, Haryana, India

DeHaat · Vellore Institute of Technology

Data Engineer experienced in designing and operating batch and near-real-time pipelines on AWS using Spark, SQL, and CDC ingestion. Delivered measurable impact, cutting data movement costs by 30% and reducing manual workflow effort by 60–70%. Skilled in ETL/ELT development, lakehouse modeling, and event-driven architectures, with a focus on reliability, data quality, and cost efficiency.

Experience

Data Engineer

DeHaat

Aug 2025 - Present · Gurugram, India

• Managed and optimized AWS DMS replication tasks and migrated selected workloads to Zero-ETL pipelines, reducing data movement costs by 30% while maintaining near-real-time availability in Redshift.
• Built and maintained scalable ETL pipelines in AWS Glue (Spark) and Python that load 5–20 GB/day of batch and incremental data into S3 and Redshift, with monitoring, retries, and freshness validation.
• Designed data models and ingestion workflows that integrate CRM, transactional, and agronomy API data into a unified operational dataset consumed by internal dashboards and business workflows.
• Built RabbitMQ-based event ingestion services that capture operational transactions using idempotent consumers and retry handling, achieving reliable, effectively-once processing on top of at-least-once message delivery.

Data Analytics Intern

STMicroelectronics

Jul 2024 - Jun 2025 · Greater Noida, India

• Built Python-based data processing pipelines to analyze large-scale SoC design metrics and logs across multiple engineering teams.
• Automated SoC design workflows (lint, simulation, synthesis) with Python scripting, reducing manual engineering effort by 60–70% and accelerating design iteration cycles.
• Developed internal analytics dashboards and reporting workflows in Power BI to support engineering decision-making across design and verification teams.

Education

Vellore Institute of Technology

M.Tech

Computer Science (Big Data Analytics)

Aug 2023 - Jun 2025 · Grade: 8.9/10

Dr. APJ Abdul Kalam Technical University

B.Tech

Aug 2018 - Jun 2022 · Grade: 7.4/10

Licenses & Certifications

IBM Data Engineering Essentials

IBM

No expiration

Data Pipelines with Airflow & Kafka

Coursera

No expiration

Big Data with Spark and Hadoop

No expiration

Skills

Python

SQL

PySpark

Bash

Apache Spark

Databricks

Hadoop

Batch & Incremental Processing

ETL/ELT Pipelines

Kafka

RabbitMQ

CDC Pipelines

Event-Driven Architecture

AWS

EMR

Redshift

RDS

Glue

DMS

AWS Lambda

Athena

IAM

Warehouse Modeling

Lakehouse Architecture

Kappa Architecture

Lambda Architecture

Data Validation

Data Quality

Parquet

Delta Lake

Airflow

Docker

Git

CI/CD

CloudWatch

Power BI