Abhishek Patil
@apatil
Full Stack Data Scientist
Bangalore
A Data Scientist with more than four years of hands-on experience designing, developing, and deploying machine learning solutions. Proficient in data preprocessing, Natural Language Processing (NLP), and image processing for tasks such as data extraction, classification, and association using deep learning and machine learning algorithms.
Experience
Data Scientist
Sapper
Key Information Extraction: Extracted key data (e.g., Account No., Date, and Customer Name) from diverse financial documents for various insurance clients. Fine-tuned the LayoutLM model on client data, using the BERT tokenizer, and achieved an F1 score of 0.91. Employed the fine-tuned model for document classification, distinguishing between invoices and claims. Deployed one document classification model and three extraction models, for different entities, via TorchServe. Adapted and redeployed the LayoutLM-based solutions for multiple clients with custom needs.
Question Answering Bot: Used a pre-trained LLaMA large language model to generate answers and insights from custom financial data. Built the question answering pipeline with the LangChain framework.
Consultant for Machine Learning
Genpact
Very Large Document Processing: Developed a tool to extract textual content such as body text, bullet points, and headers from large, unstructured financial PDFs. Removed noise elements such as diagrams and tables to prepare the data for downstream NER analysis. Used a Mask R-CNN model pretrained on PubLayNet and fine-tuned on client data, achieving an F1 score of 0.87. Reduced model size by applying pruning and converting the original H5 model to TFLite format, producing variants at 1/3, 1/5, or 1/10 of the original size to balance accuracy against processing speed. Deployed and served the model using TensorFlow Model Server.
Email Segmentation: Designed, developed, and deployed an email segmenter that classifies parts of unstructured email text as Greeting, Body, Signature, or Disclaimer. Encoded email text with the Universal Sentence Encoder and classified the resulting embeddings with an LSTM. Hosted the model using TensorFlow Model Server. Packaged the module as a RESTful API using the FastAPI framework and deployed it on AWS EC2 in a Docker container.
Financial Statement Classification: Built a machine learning model to classify financial statements using a supervised ML algorithm, Random Forest. Evaluated multiple word embedding and tokenization methods before selecting Word2Vec as the most effective approach, alongside data cleaning, feature engineering, and feature selection.
Education
Imarticus Learning Pvt Ltd.
Post Graduate Program in Data Science
Mumbai University
Bachelor of Engineering
Licenses & Certifications
Post Graduate Program in Data Science
Imarticus Learning Pvt Ltd.