Antaripa Saha

@antaripasaha

Applied Scientist at DataFacade

Agartala, India

linkedin.com/in/antaripa-saha

DataFacade•National Institute of Technology Agartala

Antaripa Saha is an Applied Scientist with experience in developing Generative AI and ML pipelines. His expertise includes building RAG systems, optimizing LLMs, and developing recommendation engines for news platforms. He has a strong background in NLP, including creating NER models for Indic languages, and is proficient in Python, SQL, and several deep learning frameworks.

Experience

Applied Scientist

DataFacade

Full-time•Jan 2023 - Present•Remote

Worked on the platform's core ML pipeline, covering data collection, preprocessing, and LLM training pipelines. Engineered and optimized prompts, building prompt-flow and A/B testing pipelines. Built a hybrid search model (keyword and semantic) and a production-ready RAG pipeline. Designed end-to-end pipelines for multiple features, including unit testing and API delivery.

Associate Machine Learning Engineer

VerSe Innovation - DailyHunt

Full-time•Sep 2021 - Dec 2022•Bengaluru

Built a candidate-generation model for recommendations using ALS. Developed automated pipelines for news article enrichment, creating workflows and DAGs. Worked on page-diversity recommendation logic. Created an in-house named entity recognizer using a CRF model for 11 Indic languages. Leveraged knowledge graphs from DBpedia and Wikidata for entity linking and named entity disambiguation.

ML Intern

Feynman

Internship•Apr 2021 - Jul 2021•Remote

Developed a job recommendation system that matches job descriptions (JDs) with candidate resumes and vice versa. Created a custom NER model using spaCy to extract key candidate information from resumes.

Education

National Institute of Technology Agartala

Bachelor of Technology

Electrical Engineering

Jul 2018 - Jun 2022•GPA: 8.75

Skills

Python
SQL
scikit-learn
spaCy
Pandas
NumPy
PyTorch
Flask
LangChain
LlamaIndex
PySpark
Airflow
GCP
Hugging Face
Snorkel
Postman
Streamlit
Pinecone
ChromaDB
Tesseract OCR
NLTK
BeautifulSoup
Elasticsearch
Node.js
React
Puppeteer