
Deep Jethwa

@DeepJethwa

Data Engineer at V4C.ai

Mumbai

https://linkedin.com/in/deep-jethwa-2a00a7240


Deep Jethwa is a Data Engineer with hands-on experience building scalable data pipelines on Databricks using PySpark and Delta Lake in Azure environments. He is skilled in developing automated ETL/ELT workflows and implementing Medallion Architecture to create reliable, analytics-ready datasets. He has experience optimizing Spark workloads and transforming complex raw data into high-quality datasets using Python and SQL for analytics and decision-making.

Experience

Data Engineer

V4C.ai

Remote

• Built scalable ETL/ELT pipelines using Databricks and PySpark, enabling efficient ingestion and processing of large-scale datasets.
• Implemented a Lakehouse with Medallion Architecture (Bronze–Silver–Gold) using Delta Lake to deliver reliable, analytics-ready data layers.
• Developed batch and incremental ingestion pipelines on Azure Data Lake Storage (ADLS Gen2) using Auto Loader to process structured and semi-structured data (CSV, JSON, XML).
• Implemented data quality checks, validation rules, and data cleansing processes to ensure reliable, consistent datasets.
• Applied Delta Lake features such as MERGE, schema evolution, and time travel to support data versioning and governance.
• Optimized Apache Spark workloads using partitioning, caching, and query tuning to improve pipeline performance and execution efficiency.
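The upsert behaviour that Delta Lake's MERGE provides (update matched keys, insert unmatched ones) can be sketched in pure Python. This is a minimal illustration of the semantics only; the key column and record shape are illustrative assumptions, not the production schema.

```python
def merge_upsert(target: dict, updates: list, key: str = "id") -> dict:
    """MERGE-style upsert: matched keys are updated, unmatched rows inserted.

    `target` maps key -> row; `updates` is a list of incoming rows.
    Mirrors WHEN MATCHED THEN UPDATE / WHEN NOT MATCHED THEN INSERT.
    """
    merged = dict(target)  # leave the original "table" untouched
    for row in updates:
        merged[row[key]] = row  # update if key exists, insert otherwise
    return merged

# Existing Silver-layer rows (keyed by id) and an incoming batch:
silver = {1: {"id": 1, "qty": 5}, 2: {"id": 2, "qty": 3}}
incoming = [{"id": 2, "qty": 7}, {"id": 3, "qty": 1}]

result = merge_upsert(silver, incoming)
# row 2 is updated, row 3 is inserted, row 1 is untouched
```

In Delta Lake itself the same outcome is expressed declaratively (`MERGE INTO target USING updates ON target.id = updates.id ...`), with the engine handling file rewrites and versioning.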

Data Engineer Intern

eMeasurematics

Mumbai

• Built a production-grade ETL pipeline in Linux-based environments using Python, MongoDB, Airflow, and Docker to ingest and process large yard-management datasets.
• Performed data validation, cleansing, and normalization to ensure high-quality, consistent datasets for downstream analytics.
• Automated multi-interval data processing (hourly to yearly aggregations), improving workflow scalability and operational visibility.
• Developed automated data workflows that reduced manual reporting effort by ~80% and improved data delivery efficiency.
• Designed analytics-ready data models supporting business reporting and forecasting.
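The multi-interval aggregation idea above (rolling the same timestamped records up at hourly, daily, monthly, or yearly granularity) can be sketched as a single bucketing function. The field names and in-memory input are illustrative assumptions; the original pipeline ran against MongoDB under Airflow.

```python
from collections import defaultdict
from datetime import datetime

# strftime format per granularity: records sharing a formatted
# timestamp fall into the same aggregation bucket.
BUCKET_FORMATS = {
    "hourly": "%Y-%m-%d %H:00",
    "daily": "%Y-%m-%d",
    "monthly": "%Y-%m",
    "yearly": "%Y",
}

def aggregate(records: list, interval: str) -> dict:
    """Sum the 'value' field per time bucket at the requested granularity."""
    fmt = BUCKET_FORMATS[interval]
    totals = defaultdict(float)
    for rec in records:
        totals[rec["ts"].strftime(fmt)] += rec["value"]
    return dict(totals)

rows = [
    {"ts": datetime(2024, 3, 1, 9, 15), "value": 2.0},
    {"ts": datetime(2024, 3, 1, 9, 45), "value": 3.0},
    {"ts": datetime(2024, 3, 2, 10, 0), "value": 4.0},
]
daily = aggregate(rows, "daily")
# -> {'2024-03-01': 5.0, '2024-03-02': 4.0}
```

Scheduling one such run per interval (e.g. an Airflow task per granularity) gives the hourly-to-yearly rollups without duplicating the transformation logic.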

Education

Universal College Of Engineering

B.E. in Computer Science with Honours in Data Science (Minor Specialization)


Jan 2021 – Jan 2025
Grade: CGPI 8.7 (GPA 3.5)

Licenses & Certifications

Certified in English Proficiency (C1 Level)

TOEFL by ETS

• No expiration

Build Data Pipelines with Lakeflow Spark Declarative Pipelines

Databricks Academy

• No expiration

Deploy Workloads with Lakeflow Jobs

Databricks Academy

• No expiration

DevOps Essentials for Data Engineering

Databricks Academy

• No expiration

SQL Analytics on Databricks

Databricks Academy

• No expiration

Data Ingestion with Lakeflow Connect

Databricks Academy

• No expiration

The Complete Python Bootcamp: From Zero to Hero in Python

Udemy

• No expiration

Skills

Databricks
PySpark
Apache Spark
Delta Lake
ETL/ELT Pipelines
Medallion Architecture
Python
Pandas
NumPy
SQL
Microsoft Azure
Azure Databricks
Azure Data Factory
ADLS Gen2
Data Modeling
Dimensional Modeling
Data Warehousing Concepts
Apache Airflow
Kafka
Fivetran
MySQL
MongoDB
Git
Docker
Power BI
Excel
API Integration
REST APIs
Postman
SDLC
Hadoop
Linux Shell Scripting