Nupur Kapur
@nupurkapur
Associate Data Scientist at Tatras Data
New Delhi, India
Nupur Kapur is an experienced Data Scientist with expertise in developing advanced AI solutions. Proficient in LLMs and prompt engineering, they have developed complex systems including chatbots, recommender engines, and document classification tools. Their background includes building pipelines for audio analysis, sentiment prediction, and knowledge graph enhancement using technologies like TensorFlow, Keras, and MongoDB.
Experience
Associate Data Scientist
Tatras Data
Developed a chatbot system enabling users to inquire about organizational information. Implemented prompt generation and GPT-3.5 model querying for document retrieval based on user queries. Utilized predicted questions, titles, and generative summaries for organizational documents. Transformed responses into embeddings using MiniLM and stored them in Elasticsearch.
Engineered a recommender system matching user queries with document attributes for personalized document recommendations. Currently designing and implementing a recommender system that suggests videos similar to the one the user is watching.
Successfully completed extensive training in LLMs and prompt engineering.
Successfully conducted a POC aimed at enhancing the topic labelling algorithm for document classification. Devised a novel approach by creating a graph based on key-phrases, leveraging cosine similarity for linking them, and applying a community detection algorithm. Identified key communities with the most relevant key-phrases and integrated them into the topic labelling process to yield more accurate and meaningful topic assignments.
Designed an intent classifier to discern the purpose of various documents within the corpus, such as case studies, product specifications, and support materials. Leveraged LDA embeddings and intent labels during the classifier's training, enabling it to identify the intents of all documents in the dataset.
Enhanced the creation of knowledge graphs from organisational documents, streamlining the process for large-scale document sets. Implemented batch processing to reduce complexity and efficiently stored knowledge graph nodes in MongoDB.
Junior Data Scientist
Tatras Data
Developed a pipeline to detect similar documents in the MongoDB database, reducing the number of duplicates recommended to users of the client's website by 95%, as verified through comprehensive testing.
Built and trained convolutional networks using TensorFlow and Keras to detect bird calls within audio and associate them with bird species. Graphed and analysed bird call data distributed over time and space to understand the seasonal variation in bird calls. Developed a pipeline to remove unwanted background noise from the bird call recordings, improving the signal-to-noise ratio by 80%.
Designed a use case for predicting the cryptocurrency market status based on sentiment analysis of tweets, developed using Dataiku's text cleaning and AI tools. Presented to the Dataiku team a use case on pneumonia detection developed with Dataiku's data cleaning, augmentation, and AI tools.
Junior Data Scientist
Sabudh Foundation
Developed RESTful APIs for a cloud-based digital document management system for educational institutions and for an e-learning platform where students learn data science through courses and projects, working in an Agile environment with a test-driven development approach. Created asynchronous jobs and scheduled cron jobs for sending email reminders to interns before assignment submission deadlines and calendar notifications before webinars.
Built from the ground up a chat application used in production on the e-learning platform (app and website) by 200+ interns to engage in peer-to-peer learning and connect with mentors. Guided the backend and frontend teams of the Farmers app project in the design, implementation, and deployment of a chat application on their server.
Led 10+ interns in the fellowship program focusing on Machine Learning and Deep Learning, reviewing their assignments and conducting tests to track their progress. Supervised a team of interns in the fellowship program to build a multi-species classifier in the Bird Call project. Delivered lectures on Machine Learning, EDA, feature extraction, audio processing, and feature analysis to interns. Assessed interns' proficiency in Machine Learning and Deep Learning topics by creating coursework, assignments, and tests.
Data Science Fellowship
Sabudh Foundation
Completed professional development with a deep dive into ML. Worked on a bird call classification project, classifying bird species from collected audio recordings using features such as mel spectrograms and MFCCs as inputs to CNNs.
Education
Dr. Akhilesh Das Gupta Institute of Technology & Management, GGSIP University
BTech
Information Technology