Mehul Patil

@mehulpatil

Data Engineer

Pune, Maharashtra

TrendyTech Insights•MIT Academy of Engineering

Data engineer with 3+ years of experience optimizing data pipelines and delivering large-scale data solutions. Leveraged expertise in ETL processes and databases to reduce data processing time by nearly 40% using Apache Spark. Collaborated with cross-functional teams to increase data availability and accuracy, driving measurable business value in the vaccine manufacturing industry.

Experience

Internship - Big Data Engineering

TrendyTech Insights

Internship•Mar 2024 - Nov 2024•Bangalore, Karnataka

Mastered core big data technologies, including MapReduce, HDFS, Hive, and Apache Spark 3.0, working with both the low-level resilient distributed dataset (RDD) API and the high-level APIs. Designed and optimized data pipelines on fully distributed computing clusters by managing data storage solutions and file compression.

Associate Engineer - Big Data

TIBCO Software Inc.

Oct 2020 - Nov 2022•Pune, Maharashtra

Implemented data cleaning techniques and data validation tests, such as schema validation on input data, improving data accuracy by ~35%.

Redesigned the data pipeline architecture, resulting in a 40% decrease in processing time and manual data handling tasks.

Saved ~22% of storage space and cut data retrieval time by 3 to 4 minutes using Parquet files.

Raised cluster utilization to 95% by strategically reallocating tasks and tuning partition sizes, core allocation, and memory distribution.

Developed and executed Hive queries to construct Hive tables, extracting valuable analytical insights with 100% accuracy and reliability.

Used Git for version control, Jenkins for automated deployment, Pytest for unit testing, and JIRA to improve productivity and shorten release cycles.

Partnered with 3 cross-functional teams, including IT support, data science, and business clients, using an agile model to address technical challenges and define project scope.

Integrated Spark and Hive with TIBCO Statistica 13.0 for analysis and visualization using Livy and an ODBC driver.

Communicated complex data processes and results to management through detailed reports and presentations.

Authored 10+ online articles on managing data nodes in TIBCO Statistica.

Education

MIT Academy of Engineering

B.Tech in Computer Engineering

Computer Engineering (Minor - Data Science and Analytics)

Aug 2016 - Oct 2020•CGPA: 8.75

Skills

Python

SQL

HDFS

PySpark

MapReduce

Hive

Microsoft SQL Server

PostgreSQL

Azure Data Factory

Azure Databricks

Azure Synapse Analytics

Pytest

Medallion Architecture

Unity Catalog

Git

Jenkins

Salesforce

Confluence

JIRA

Windows

Linux

G-Suite

Agile methodology