Salman Baqri
@salmanbaqri
Data Scientist at UnitedHealth Group
Noida
Salman Baqri is an experienced Data Scientist specializing in advanced NLP and Machine Learning solutions. He has significant experience building and deploying complex models, including BERT and T5, for tasks such as multi-class classification and intent recognition, particularly in the healthcare domain. His background also includes optimization models for oil pipeline flow and refinery planning, including Mixed Integer Programming, as well as topic modelling with clustering methods such as k-means. He is proficient in Python, SQL, and various deep learning frameworks.
Experience
Data Scientist
UnitedHealth Group
Part of the Advance Research and Analytics (ARA) team, working on various NLP R&D projects. Built 2 multi-class classification models using BERT with 50+ class labels, which went into production. Explored state-of-the-art (SOTA) techniques for class imbalance such as Dice Loss, Focal Loss, and data augmentation using Large Language Models and back-translation. Built an intent classification model using T5 for multi-turn dialogue chat conversations, achieving an F1-score of 0.7 across 50+ labels. Trained BERT from scratch with masked language modeling (MLM) on 640k chat conversations to incorporate healthcare domain knowledge into the model. Experienced in training Transformer models such as BERT, RoBERTa, and DistilBERT on multiple GPUs. Some exposure to Flask and Docker for deploying deep learning models. Working on various POCs to predict sentiment from voice data and convert voice calls to text.
Intern (NLP)
IBM Research
Implemented topic modelling techniques such as LSI and LDA to automatically label Java clusters. Improved model performance by using an inter-class usage file along with POS tagging and LDA Mallet. Implemented topic coherence scoring to evaluate the models and select the optimal number of topics. Built an ensemble topic model using LDA and BERT to generate topics from source code, with k-means for clustering.
Associate Technology L2
Publicis Sapient
Implemented an optimization model for pipeline flow for one of the largest energy infrastructure companies in North America, and deployed it using Docker and Microsoft Azure. Built a classification model using logistic regression for one of the largest energy infrastructure companies in North America. Built a machine learning model using SVM, XGBoost, and logistic regression to classify helpdesk tickets into categories such as IT, Finance, and Payroll. Provided monthly recommendations, such as vessel plans, trade plans, and pipeline plans for its oil refinery, to a Fortune 500 integrated oil company using a Mixed Integer Programming model. Built visualization dashboards in TIBCO Spotfire to present recommendations to the client. Built a Python application to validate data against business rules defined through SQL queries and custom Python code; the application made it easy to add new rules. Added 110+ business rules and reduced manual effort from 48 hours to 2 hours. Built a Python utility to monitor optimization model runs on different servers at regular intervals and send a consolidated status report by e-mail. Built a scheduling application in VBA to track vessels on their ongoing voyages.
Test Engineer
Broadcom
Managed monthly test release cycles that involved executing test suites. Wrote Python test scripts for new test cases.
Education
Chennai Mathematical Institute
Master of Science in Data Science
JSS Noida
B.Tech in Computer Science
Bishop Conrad School
Higher Secondary
Bishop Conrad School
High School