Saurabh Singh
@saurabhsingh4414
Senior Associate Consultant - Data Scientist at Infosys
Hyderabad, India
Saurabh is an experienced professional skilled in designing and implementing solutions using ML algorithms, NLP, deep learning, and cloud technologies. He has worked in the BFSI and hospitality domains and on various web and mobile products. He is adept at leading data science teams to build efficient, end-to-end solutions.
Experience
Senior Associate Consultant - Data Scientist
Infosys
Led a team of four to develop a computer vision model for an in-house web-based application that extracts data from legal documents. Developed an OCR engine using OpenCV (cv2) and pytesseract to process scanned PDFs of legal documents related to eviction, tuition, and medical loans. Deployed the OCR engine as a web application on an EC2 instance, secured with Apigee and PingFederate so that the application is accessible only to authorized users and the data extracted from documents stays secure. The engine processes over 300 documents daily with 97% accuracy.
Data Scientist
Incedo
Trained a classification model to predict mortgage propensity for a fintech firm. Gathered a dataset from US and UK government websites and from the data engineering team, and preprocessed it using scikit-learn. Compared logistic regression, XGBoost, and random forest models and, after hyperparameter tuning, chose random forest as the final model because of its higher precision and recall. The final model achieved a recall of 99%.
Designed Tableau dashboards from web-scraped healthcare data. Scraped state-level oncology, neurology, and ophthalmology data from client-provided websites using the Python libraries pandas, requests, Scrapy, and Beautiful Soup. Stored the data in CSV files and preprocessed it by removing special characters and imputing missing values. Developed Tableau dashboards covering metrics such as population, number of patients, CAGR, and medicines.
Machine Learning Engineer
Peoplestrong
Productionised an NLP model that measures the similarity between a resume and a job description. The model was built using PyPDF2, spaCy, NLTK, and Gensim (Doc2Vec). It extracts skills from both documents, tokenizes them, and converts the tokens into vectors. It then calculates the cosine similarity between the two document vectors and screens out candidates whose cosine similarity is below 60%. Deployed the model on Peoplestrong's ALT Recruit website by building a Flask API.
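A minimal sketch of the screening step described above, assuming the resume and job description have already been converted into numeric vectors (for example by a Doc2Vec model). The function names, the threshold constant, and the sample vectors are illustrative assumptions, not the production code.

```python
import math
from typing import Sequence

# Hypothetical constant: candidates below 60% similarity are screened out.
SIMILARITY_THRESHOLD = 0.60


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between vectors a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Treat an all-zero vector as having no similarity to anything.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def passes_screen(resume_vec: Sequence[float],
                  jd_vec: Sequence[float],
                  threshold: float = SIMILARITY_THRESHOLD) -> bool:
    """Keep the candidate only if similarity meets the threshold."""
    return cosine_similarity(resume_vec, jd_vec) >= threshold


if __name__ == "__main__":
    # Illustrative vectors only; real Doc2Vec vectors are much longer.
    print(passes_screen([1, 0, 1], [1, 1, 1]))  # similarity ~0.82
    print(passes_screen([1, 0, 0], [0, 1, 1]))  # similarity 0.0
```

Cosine similarity compares the direction of the two vectors rather than their length, so a short resume and a long job description can still score highly if they emphasise the same skills.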
Education
Galgotias University
B.Tech
Computer Science and Engineering