Prasad Pawar

@prasadpawar

Lead Data Scientist • Data Science Consultant

Pune, India

Tata Consultancy Services • Walchand College of Engineering

Prasad Pawar is a data scientist with over 13 years of research and development experience across Data Science and High-Performance Computing. He has deep knowledge of the entire data lifecycle, including data preparation, model building, and production deployment. His expertise spans complex problem-solving with advanced techniques in Machine Learning, Deep Learning, and NLP, and presenting actionable insights to stakeholders.

Experience

Lead Data Scientist / Data Science Consultant

Tata Consultancy Services

Consultancy/Project•May 2014 - Present

Developed a Surveillance for Capital Market and Anomaly Detection solution to identify abuse scenarios in the capital market. Identified and predicted suspicious entities using classification algorithms, covering model training, testing, evaluation, and optimization. Segmented the dataset according to business logic using Louvain community partition and K-means clustering, and represented the segments as network graphs for easier analysis.

Built a big data application from scratch, implementing the business logic in Python, PySpark, and Pandas. Deployed it on Kubernetes clusters and fine-tuned Spark configurations to reduce execution time.

Applied machine learning techniques such as Topic Modeling, Sentiment & Summarization using BERT, and Named Entity Recognition to communication data to better understand the fraudulent activities of capital market participants and their relevance.

Delivered a life sciences solution, "Early detection of cross binders in the drug discovery process", applying Machine Learning and Deep Learning approaches (neural networks, Random Forest, XGBoost) for training and prediction.

Developed a memory recommender system for High-Performance Computing applications that estimates memory requirements before an application runs on HPC clusters, increasing resource availability and improving utilization of HPC resources.

Optimized OpenFOAM on Intel Xeon Phi (KNL) by identifying hotspots with VTune and applying AVX-512 intrinsics, and applied vectorization using SIMD pragmas to improve performance. Parallelized and optimized the IRS computation using CUDA-C on NVIDIA K20, and using OpenMP and ICC on Intel Xeon Haswell-EP.

Developer/Engineer

KPIT Technologies

Project•Jan 2011 - May 2014

Improved the performance of various image processing (ADAS) applications by optimizing and parallelizing the source code using C, OpenMP, OpenCL, and GPGPU on Linux. Designed and implemented auto-parallelization of loops using YUCCA and, as Tech Lead, redesigned the automatic parallelization module. Worked on a performance-enhancement project and ran experiments with OpenMP constructs.

Developer/Engineer

CDAC

Project•Sep 2008 - Jan 2011

Designed and implemented a disaster recovery module to achieve zero Recovery Point Objective (RPO) and negligible Recovery Time Objective (RTO), including automatic block-level replication of data from the DC site to the DR site using the iSCSI protocol with PostgreSQL PITR techniques. Patent granted for "Method and System for Business Continuity and Disaster Recovery". Published research on Automatic Sequential to Parallel code conversion (S2P tool) and Enterprise Storage Architecture for Optimal Business Continuity.

Education

Walchand College of Engineering

Master's in Computer Engineering

Computer Engineering

COE Osmanabad

Bachelor's in Computer Engineering

Computer Engineering

Skills

Python

PySpark

Scala

Machine Learning

Deep Learning

TensorFlow

Keras

Sklearn

Azure Essentials

Databricks

Google Cloud Essentials

NLP

Tableau

HPC

Parallelization and Optimization

OpenMP

MPI

GPGPU

Shell Scripts

Linear Regression

Logistic Regression

Decision Tree

Random Forest

XGBoost

SVM

KNN

K-means

Louvain Community Partition

Time Series Forecasting

Text Classification

Topic Modeling using LDA

Sentiment & Summarization using BERT

Named Entity Recognition

CNN

RNN

LSTM

Apache Spark

C++

CUDA

OpenCL

POSIX Threads

Socket Programming