Abhishek Patil

@apatil

Full Stack Data Scientist

Bangalore

https://www.linkedin.com/in/abhishek-patil-88736a168/

Sapper • Imarticus Learning Pvt Ltd.

An experienced Data Scientist with more than four years of hands-on experience designing, developing, and deploying machine learning solutions. Proficient in data preprocessing, skilled in Natural Language Processing (NLP), and experienced in image processing, applying deep learning and machine learning algorithms to tasks such as data extraction, classification, and association.

Experience

Data Scientist

Sapper

Jun 2022 - Present • Bangalore

Key Information Extraction: Extracted key fields (e.g., Account No., Date, and Customer Name) from diverse financial documents for several insurance clients. Achieved a 0.91 F1 score by fine-tuning the LayoutLM model, together with a BERT tokenizer, on client data. Used the fine-tuned model for document classification, distinguishing invoices from claims. Deployed one document classification model and three extraction models for different entities with TorchServe. Adapted and redeployed LayoutLM-based solutions for multiple clients with custom requirements.

Question Answering Bot: Used the pre-trained LLaMA large language model to generate answers and insights from custom financial data, and built the question-answering pipeline with the LangChain framework.

Consultant for Machine Learning

Genpact

Aug 2019 - Jun 2022 • Pune

Very Large Document Processing: Developed a tool that extracts textual content such as body text, bullet points, and headers from large, unstructured financial PDFs, removing noise elements such as diagrams and tables to prepare the data for downstream NER analysis. Fine-tuned a Mask R-CNN model pretrained on PubLayNet on client data, achieving a 0.87 F1 score. Improved efficiency by applying model pruning and converting the original H5 model to TFLite format, producing models at 1/3, 1/5, or 1/10 of the original size to balance accuracy against processing speed. Deployed and served the model with TensorFlow Serving.

Email Segmentation: Designed, developed, and deployed an Email Segmenter that classifies the parts of unstructured email text as Greeting, Body, Signature, or Disclaimer. Used the Universal Sentence Encoder to embed the email text, followed by an LSTM for prediction. Hosted the model with TensorFlow Serving, packaged the module as a RESTful API using the FastAPI framework, and deployed it on AWS EC2 in a Docker container.

Financial Statement Classification: Built a supervised Random Forest model to classify financial statements. Performed data cleaning, feature engineering, and feature selection, and compared multiple tokenization and word-embedding methods before selecting Word2Vec as the most effective approach.

Education

Imarticus Learning Pvt Ltd.

Post Graduate Program in Data Science

Jan 2018 - Jan 2019

Mumbai University

Bachelor of Engineering

Jan 2014 - Jan 2018

Licenses & Certifications

Post Graduate Program in Data Science

Imarticus Learning Pvt Ltd.

Issued: Jan 2018

Skills

Python

FastAPI

Flask

SQL

MongoDB

Pandas

Scikit-learn

TensorFlow

PyTorch

Keras

NumPy

NLTK

spaCy

Hugging Face

PDFMiner

ANN

CNN

RNN

LSTM

Encoder-Decoder

Image Classification

BERT

Mask R-CNN

LayoutLM

Universal Sentence Encoder

TableNet

NER

Feature Engineering

Exploratory Data Analysis

Regression

Classification

Principal Component Analysis

Optimization Techniques

Linear Regression

Naive Bayes

SVM

Logistic Regression

Decision Tree

Random Forest

Clustering

K-Nearest Neighbours

Tokenization

Embedding

Stemming & Lemmatization

N-grams

Part-of-Speech Tagging

Language Modeling

Text Classification

Word2Vec

fastText

LLaMA

Lamini

LangChain

Docker

Kubernetes

TorchServe

TensorFlow Serving

Matplotlib

AWS

Git