Salman Baqri

@salmanbaqri

Data Scientist at United Health Group

Noida

linkedin.com/in/salman-baqri-b8b74b59/

United Health Group | Chennai Mathematical Institute

Salman Baqri is an experienced Data Scientist specializing in advanced NLP and Machine Learning solutions. He has significant experience building and deploying complex models, including BERT and T5, for tasks like multi-class classification and intent recognition, particularly in the healthcare domain. His background includes optimizing pipelines and developing ML models using techniques such as Mixed Integer Programming and various clustering methods. He is proficient in Python, SQL, and various deep learning frameworks.

Experience

Data Scientist

United Health Group

Full-time | Jun 2021 - Present | Noida

Part of the Advance Research and Analytics (ARA) team, working on various NLP R&D projects. Built two multi-class classification models using BERT with 50+ class labels that went into production. Explored state-of-the-art techniques for class imbalance, such as Dice loss, focal loss, data augmentation using large language models, and back translation. Built an intent classification model using T5 for multi-turn dialogue chat conversations, achieving an F1-score of 0.7 across 50+ labels. Trained BERT from scratch using masked language modelling (MLM) on 640k chat conversations to incorporate healthcare domain knowledge into the model. Experienced in training Transformer models such as BERT, RoBERTa, and DistilBERT on multiple GPUs. Exposure to Flask and Docker for deploying deep learning models. Working on various POCs to predict sentiment from voice data and to convert voice calls to text.

Intern (NLP)

IBM Research

Internship | May 2020 - Jan 2021 | Remote

Implemented topic modelling techniques such as LSI and LDA for automatic labelling of Java clusters. Improved model performance by using an Inter-Class Usage file along with POS tagging and LDA Mallet. Implemented topic coherence for evaluation and for choosing the optimal number of topics. Built an ensemble topic model using LDA and BERT to generate topics from source code, with k-means for clustering.

Associate Technology L2

Publicis Sapient

Full-time | Apr 2016 - Jul 2019 | Gurgaon, Haryana

Implemented an optimization model to optimize pipeline flow for one of the largest energy infrastructure companies in North America; the model was deployed using Docker and Microsoft Azure. Built a classification model using logistic regression for the same type of client, one of the largest energy infrastructure companies in North America. Built a machine learning model using SVM, XGBoost, and logistic regression to classify helpdesk tickets into categories such as IT, Finance, and Payroll. Provided monthly recommendations (Vessel Plan, Trade Plan, Pipeline Plan) to a Fortune 500 integrated oil company for its oil refinery using a Mixed Integer Programming model. Built various visualization dashboards in TIBCO Spotfire to present recommendations to the client. Built a Python application to validate business rules: it validated data using SQL queries and custom Python code, and allowed new business rules to be added. Added 110+ business rules and reduced manual effort from 48 hours to 2 hours. Built a Python utility to monitor optimization model runs on different servers at regular intervals and send a consolidated status e-mail. Built a scheduling application in VBA to track vessels on their ongoing voyages.

Test Engineer

Broadcom

Full-time | Jul 2015 - Mar 2016 | Bangalore, Karnataka

Managed monthly test release cycles, including execution of test suites. Wrote test scripts for new test cases in Python.

Education

Chennai Mathematical Institute

Master of Science in Data Science

Data Science

Aug 2019 - May 2021 | Grade: CGPA 8.67

JSS Noida

B.Tech in Computer Science

Computer Science

Aug 2011 - May 2015 | Grade: 71.2%

Bishop Conrad School

Higher Secondary

Jan 2010 - Jan 2010 | Grade: 84.4%

Bishop Conrad School

High School

Jan 2008 - Jan 2008 | Grade: 86.8%

Skills

Python
SQL
R
Java
VBA
NLP
Deep Learning
BERT
T5
Topic Modeling
Docker
Microsoft Azure
PyTorch
scikit-learn
XGBoost
Optimization Modeling
Flask
Spotfire