Arkajyoti Chakraborty
@arkajyoti.chakraborty
Machine Learning Intern at John Snow Labs
New Delhi, India
Arkajyoti Chakraborty is a skilled Machine Learning Engineer with experience in Natural Language Understanding (NLU), Computer Vision, and Deep Learning. Expertise includes implementing zero-shot classifiers, developing retrieval-augmented prompting frameworks for LLMs, and building OCR pipelines. Proven ability to apply domain adaptation techniques and contribute to research in areas like fake news detection and activity recognition.
Experience
Computer Vision Intern
Vigilance AI
Worked on activity recognition for video data, building a classifier that combines a CNN feature extractor with a fine-tuned transformer encoder (84% accuracy). Developed an abnormal breathing detection pipeline that uses YOLO for masking together with the activity recognition pipeline (79% accuracy).
Deep Learning Research Intern
Bio-metric Research Lab (DTU)
Worked on domain adaptation techniques for fake news data and hypothesized a relationship between fake news and emotion features. Studied the gradient reversal method for applying domain adaptation across datasets to improve accuracy. Contributed to two short papers accepted at the AAAI’23 student abstract track and the ICON’22 short paper track.
Machine Learning Intern
John Snow Labs
Implemented three zero-shot Spark NLP annotators (BERT, DistilBERT, RoBERTa) in the NLU pipeline. Designed test scripts for sequence classifier models (Longformer, XLNet, ALBERT, and DeBERTa) and implemented demo notebooks for the zero-shot classifiers. Merged PRs covering the BERT, DistilBERT, and RoBERTa annotators and the sequence classifiers for the upcoming release of the NLU library.
Applied Research Intern
Tata Consultancy Services (TCS) Research
Researched clarification question generation via retrieval-based prompting of large language models (LLMs) on legal clauses and contracts. Proposed a retrieval-augmented prompting framework designed specifically for generating clarification questions about contracts. Designed and ran experiments on open-source large language models (Vicuna, Alpaca-LoRA, and Dolly-V2) in zero-shot and few-shot setups. Paper titled “Generating Clarification Questions for Disambiguating Contracts” under review at EMNLP’23.
Data Science Intern
Eka.Care
Built a pipeline of custom-trained OCR models from scratch, focusing on lab reports and on accuracy for units, ranges, and numeric values. Tested different models for inference on edge cases and trained a TrOCR model on 50k images, achieving an average character error rate (CER) of 0.04, outperforming AWS Tesseract.
Education
Delhi Technological University
Bachelor of Technology
Engineering Physics