Rehan Ahmad
@rehanahmad
Software Engineer 2 (Data Science) at HighRadius Technologies
Hyderabad, India
Results-driven data scientist with 6 years of experience in the fintech domain. Proven track record of leading and completing complex machine learning projects, demonstrating strong analytical and problem-solving skills. Adept at transforming complex data into actionable insights to build solutions that add business value and enhance overall performance.
Experience
Software Engineer 2 (Data Science)
HighRadius Technologies
Leading a team of 6 data scientists to deliver impactful projects. Developed a data parsing solution for financial documents using generative AI (GPT-4), applying prompt engineering techniques such as zero-shot, one-shot, and few-shot prompting along with a good understanding of generative configuration settings. Researched open-source LLMs such as LLaMA 2, FLAN-T5, BLOOM, and GPT-J. Developed a good understanding of the Transformer architecture, its constituent components, and its implementations as auto-encoding, auto-regressive, and Seq2Seq models. Tuned hyperparameters of the FLAN-T5 model for NLP tasks. Explored techniques such as instruction fine-tuning and Parameter-Efficient Fine-Tuning (PEFT) using LoRA, QLoRA, and soft prompts. Fine-tuned LLMs using RLHF to produce human-friendly outputs, using techniques such as KL divergence to avoid reward hacking. Have a basic understanding of LLM evaluation metrics such as ROUGE and BLEU and benchmarks such as GLUE, SuperGLUE, HELM, and MMLU. Have a good understanding of the RAG architecture and how it works. Created a classification model with an accuracy of more than 90% to identify the presence of salt-and-pepper noise in scanned images of financial documents. Building on this, developed a solution using OpenCV that removes noise from such documents while retaining important text. Added around 8% to the automation of the cash-application product by integrating the noise detection and removal method into the Python web-service framework. Developed and productionized a deep learning classification solution using LayoutLM that classifies the different types of pages in financial documents with above 90% recall for each class along with high precision, thus optimizing the downstream processes.
Software Engineer 1 (Data Science)
HighRadius Technologies
Developed and productionized entity extraction solutions to extract important business fields from structured and unstructured financial documents using LayoutLM and BERT for NER.
Associate Software Engineer 2 (Data Science)
HighRadius Technologies
Built and productionized a machine learning model that predicts whether the data captured by OCR from financial documents needs manual correction, halving the time clients spend on manual exception handling. Developed and productionized a machine learning solution that identifies the lines containing the relevant business fields in a financial document, which helps optimize downstream processing tasks. Created monthly monitoring reports and visualizations to show clients the business value added by the solution.
Associate Software Engineer 1 (Data Science)
HighRadius Technologies
Conducted exploratory data analysis to identify patterns and trends in large datasets. Cleaned and manipulated raw data. Performed feature engineering by extracting new features from data to improve model performance. Assisted in developing machine learning algorithms for predictive modeling tasks.
Education
Kalinga Institute of Industrial Technology
Bachelor of Technology
Computer Science And Engineering
Created an AI-based approach to identify and correct OCR capture errors by comparing the similarity of two character images using a unique scoring function. Integrated this solution into the Python web-service framework, contributing around 12% to the automation of the cash-application product. Developed a novel algorithm to identify the prevailing pattern(s) in a data set (e.g., a set of invoice numbers, document numbers, etc.), assisting in the extraction of pertinent entities from financial documents and leading to a notable decrease in false positives and a major increase in data capture accuracy. PATENT - US11758071B1: Identification and removal of noise from documents (Granted: Sept 2023). In this innovation, a machine learning model was used to detect the presence of salt-and-pepper noise in scanned images of financial documents, and a solution was implemented to remove the noise while retaining essential information, including characters like periods and commas, which could