
Arjun Gupta

@arjun_gupta

MLE, Applied Science Intern at Stimuler

New Delhi, Delhi, India

Stimuler

Cluster Innovation Centre, University of Delhi

Experience

MLE, Applied Science Intern

Stimuler

Feb 2026 - Present

Building and optimizing the core LLM response generation pipeline for a production voice-first AI tutoring platform serving millions of users. Contributed to the migration from commercial LLM APIs to self-hosted fine-tuned models using Ray Serve, improving system control and reducing dependency on external inference services. Improved real-time LLM inference latency and throughput through batching and distributed serving optimizations for large-scale conversational workloads.
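The batching optimization mentioned above can be illustrated with a minimal sketch using only Python's stdlib `asyncio`: concurrent requests are collected into a micro-batch before a single batched model call. `MicroBatcher` and the stand-in `echo_model` are hypothetical names for illustration, not Stimuler's production pipeline.

```python
import asyncio

class MicroBatcher:
    """Collects concurrent requests into one batch before a single model call.

    Illustrative sketch of request-level dynamic batching; real serving stacks
    (e.g. Ray Serve, vLLM) implement far more sophisticated variants.
    """
    def __init__(self, model_fn, max_batch=8, max_wait=0.01):
        self.model_fn = model_fn      # batched forward pass: list[str] -> list[str]
        self.max_batch = max_batch    # flush when this many requests are queued
        self.max_wait = max_wait      # or after this many seconds, whichever first
        self.queue = asyncio.Queue()
        self._worker = None

    async def submit(self, prompt):
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((prompt, fut))
        if self._worker is None:  # lazily start the background batching loop
            self._worker = asyncio.create_task(self._run())
        return await fut

    async def _run(self):
        while True:
            # Block for the first request, then greedily gather more
            # until the batch is full or the wait deadline passes.
            prompt, fut = await self.queue.get()
            batch = [(prompt, fut)]
            deadline = asyncio.get_running_loop().time() + self.max_wait
            while len(batch) < self.max_batch:
                timeout = deadline - asyncio.get_running_loop().time()
                if timeout <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            outputs = self.model_fn([p for p, _ in batch])  # one batched call
            for (_, f), out in zip(batch, outputs):
                f.set_result(out)

def echo_model(prompts):
    # Stand-in for a batched LLM forward pass.
    return [p.upper() for p in prompts]

async def main():
    batcher = MicroBatcher(echo_model)
    return await asyncio.gather(*(batcher.submit(f"req{i}") for i in range(4)))
```

Batching amortizes per-call overhead across requests, which is the core lever behind the throughput gains described above; distributed serving layers add routing and replication on top of the same idea.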

NLP Intern

Neural Nurture

Jun 2025 - Jan 2026

Built high-performance auditing frameworks and distributed evaluation pipelines in PyTorch and Hugging Face for analyzing LLM safety, privacy, and memorization, improving detection reliability by 22% over baseline and cutting runtime by 30% across 7B–70B models. Deployed production-grade inference infrastructure using FastAPI, vLLM, and TensorRT-LLM with asynchronous batching and optimized GPU scheduling, supporting large-scale safety and privacy audits at 200+ queries/sec with 40% lower GPU utilization.

Research Intern

CyPSi Lab, IIC - University of Delhi

Mar 2025 - Jun 2025

Researched active learning for low-resource Indian language datasets; built end-to-end PyTorch pipelines achieving 25% higher label efficiency and 12–18% F1 improvements over random sampling. Extended and benchmarked SOTA active-learning methods (AnchorAL, BADGE) across vision and Indian NLP datasets, optimizing uncertainty-based selection for faster model improvement.
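The uncertainty-based selection at the heart of this work can be sketched in a few lines: score each unlabeled example by the entropy of the model's predicted class distribution and spend the labeling budget on the most uncertain ones. `predictive_entropy`, `select_for_labeling`, and `predict_fn` are hypothetical names for illustration, not the lab's actual pipeline (which benchmarked richer criteria like AnchorAL and BADGE).

```python
import math

def predictive_entropy(probs):
    """Entropy of one example's predicted class distribution; higher = more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_labeling(unlabeled, predict_fn, budget):
    """Rank unlabeled examples by predictive entropy and pick the top `budget`.

    `predict_fn` maps an example to its probability vector; this is the
    simplest uncertainty-sampling baseline in an active-learning loop.
    """
    scored = [(predictive_entropy(predict_fn(x)), i)
              for i, x in enumerate(unlabeled)]
    scored.sort(reverse=True)  # most uncertain first
    return [unlabeled[i] for _, i in scored[:budget]]
```

In a full loop, the selected examples are sent for annotation, the model is retrained, and the pool is re-scored, which is where the label-efficiency gains over random sampling come from.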

AI Intern

Indian Institute of Technology Jammu

Jul 2024 - Sep 2024

Built a high-accuracy pipeline to detect PII leaks in network traffic using transformer embeddings refined with triplet loss, followed by a neural classifier; achieved 96.6% test accuracy. Applied quantization, knowledge distillation, and pruning to large-scale NLP models, reducing inference time by 38% and model size by 75%. Deployed compressed deep learning models to Android using TensorFlow Lite and ONNX, achieving <100ms inference latency with real-time on-device predictions.
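The triplet objective used to refine the embeddings can be stated compactly: for an anchor, a same-class positive, and a different-class negative, penalize the model unless the negative is at least a margin farther from the anchor than the positive. This is a plain-Python sketch of the standard formula, not the actual transformer pipeline.

```python
import math

def l2(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(0, d(a, p) - d(a, n) + margin).

    Zero when the negative is already `margin` farther from the anchor than
    the positive; otherwise the gradient pulls same-class embeddings together
    and pushes different-class embeddings apart.
    """
    return max(0.0, l2(anchor, positive) - l2(anchor, negative) + margin)
```

Training on such triplets clusters embeddings of PII-bearing traffic away from benign traffic, which makes the downstream neural classifier's job easier.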

Education

Cluster Innovation Centre, University of Delhi

Bachelor of Technology

Information Technology and Mathematics

Jan 2022 - Present • Grade: 8.91/10.0

Skills

C
C++
Java
Python
MATLAB
R
Bash
SQL
PyTorch
TensorFlow
Keras
Scikit-learn
Transformers
spaCy
NLTK
ONNX
vLLM
TensorRT
TFLite
Docker
Git
Linux
AWS
Locust
OpenTelemetry
Jaeger