Abhishek Patil

@apatil

Full Stack Data Scientist

Bangalore

https://www.linkedin.com/in/abhishek-patil-88736a168/

Sapper | Imarticus Learning Pvt Ltd.

A Data Scientist with more than four years of hands-on experience designing, developing, and deploying machine learning solutions. Proficient in data preprocessing, Natural Language Processing (NLP), and image processing for tasks such as data extraction, classification, and association using deep learning and machine learning algorithms.

Experience

Data Scientist

Sapper

Jun 2022 - Present • Bangalore

Key Information Extraction: Extracted key fields (e.g., Account No., Date, and Customer Name) from diverse financial documents for insurance clients. Achieved a 0.91 F1 score by fine-tuning the LayoutLM model, with the BERT tokenizer, on client data. Also used the fine-tuned model for document classification, distinguishing invoices from claims. Deployed one document classification model and three extraction models for different entities via TorchServe. Adapted and redeployed LayoutLM-based solutions for multiple clients with custom requirements.

Question Answering Bot: Used the pre-trained LLaMA large language model to generate answers and insights from custom financial data. Built the question answering pipeline with the LangChain framework.

Consultant for Machine Learning

Genpact

Aug 2019 - Jun 2022 • Pune

Very Large Document Processing: Developed a tool that extracts textual content such as body text, bullet points, and headers from large, unstructured financial PDFs. Removed noise elements such as diagrams and tables to prepare the data for downstream NER analysis. Used a Mask R-CNN model pretrained on PubLayNet and fine-tuned on client data, achieving a 0.87 F1 score. Improved model efficiency by applying model pruning and converting the original H5 model to TFLite format. This reduced the model size to 1/3, 1/5, or 1/10 of the original, balancing accuracy against processing speed. Deployed and served the model using TensorFlow Model Server.

Email Segmentation: Designed, developed, and deployed an email segmenter that classifies parts of unstructured email text as Greeting, Body, Signature, or Disclaimer. Classified the text using Universal Sentence Encoder embeddings followed by an LSTM. Hosted the model using TensorFlow Model Server. Packaged the module as a RESTful API with the FastAPI framework and deployed it on AWS EC2 in a Docker container.

Financial Statement Classification: Built a machine learning model to classify financial statements using a supervised algorithm, Random Forest. Evaluated multiple word embedding and tokenization methods before selecting Word2Vec as the most effective approach, alongside data cleaning, feature engineering, and feature selection.

Education

Imarticus Learning Pvt Ltd.

Post Graduate Program in Data Science

Jan 2018 - Jan 2019

Mumbai University

Bachelor of Engineering

Jan 2014 - Jan 2018

Licenses & Certifications

Post Graduate Program in Data Science

Imarticus Learning Pvt Ltd.

Issued: Jan 2018

Skills

Python
FastAPI
Flask
SQL
MongoDB
Pandas
Scikit-learn
TensorFlow
PyTorch
Keras
NumPy
NLTK
spaCy
Hugging Face
PDFMiner
ANN
CNN
RNN
LSTM
Encoder-Decoder
Image Classification
BERT
Mask R-CNN
LayoutLM
Universal Sentence Encoder
TableNet
NER
Feature Engineering
Exploratory Data Analysis
Regression
Classification
Principal Component Analysis
Optimization Techniques
Linear Regression
Naive Bayes
SVM
Logistic Regression
Decision Tree
Random Forest
Clustering
K-Nearest Neighbours
Tokenization
Embedding
Stemming & Lemmatization
N-grams
Part-of-Speech Tagging
Language Modeling
Text Classification
Word2Vec
fastText
LLaMA
Lamini
LangChain
Docker
Kubernetes
TorchServe
TensorFlow Model Server
Matplotlib
AWS
Git