Mehul Patil
@mehulpatil
Data Engineer
Pune, Maharashtra
Data engineer with 3+ years of experience optimizing data pipelines and delivering large-scale data solutions. Leveraged expertise in ETL processes and databases to reduce data processing time by nearly 40% using Apache Spark. Collaborated with cross-functional teams to increase data availability and accuracy, driving measurable business value in the vaccine manufacturing industry.
Experience
Internship - Big Data Engineering
TrendyTech Insights
Mastered core big data technologies, including MapReduce, HDFS, Hive, and Apache Spark 3.0, with expertise in both the low-level resilient distributed dataset (RDD) API and the high-level APIs. Designed and optimized data pipelines on fully distributed computing clusters by managing data storage solutions and file compression.
Associate Engineer - Big Data
TIBCO Software Inc.
Implemented data cleaning techniques and data validation tests, such as schema validation on input data, improving data accuracy by ~35%.
Redesigned the data pipeline architecture, reducing processing time and manual data handling tasks by 40%.
Reduced storage usage by ~22% and cut data retrieval time by 3 to 4 minutes using Parquet files.
Raised cluster utilization to 95% by strategically reallocating tasks and tuning partition sizes, core allocation, and memory distribution.
Developed and executed Hive queries to construct Hive tables, extracting valuable analytical insights with 100% accuracy and reliability.
Used Git for version control, Jenkins for automated deployment, pytest for unit testing, and JIRA for issue tracking to support productivity and short release cycles.
Partnered with 3 cross-functional teams, including IT support, data science, and business clients, using an agile model to address technical challenges and define project scope.
Integrated Spark and Hive with TIBCO Statistica 13.0 for analysis and visualization using Livy and an ODBC driver.
Communicated complex data processes and results to management through detailed reports and presentations.
Authored 10+ online articles on managing data nodes in TIBCO Statistica.
Education
MIT Academy of Engineering
B.Tech in Computer Engineering
Computer Engineering (Minor - Data Science And Analytics)