Arjun Gupta
@arjun_gupta
MLE, Applied Science Intern at Stimuler
New Delhi, Delhi, India
Experience
MLE, Applied Science Intern
Stimuler
Building and optimizing the core LLM response generation pipeline for a production voice-first AI tutoring platform serving millions of users. Contributed to the migration from commercial LLM APIs to self-hosted fine-tuned models using Ray Serve, improving system control and reducing dependency on external inference services. Improved real-time LLM inference latency and throughput through batching and distributed serving optimizations for large-scale conversational workloads.
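The batching optimization mentioned above can be illustrated with a minimal, framework-free sketch. This is not Stimuler's actual pipeline (which uses Ray Serve); it only shows the core dynamic-batching idea: flush a batch when it reaches a size cap or when the oldest queued request has waited too long. All names and thresholds are illustrative.

```python
def form_batches(arrivals, max_batch_size, max_wait):
    """Group (timestamp, payload) arrivals into batches.

    A batch is flushed when it reaches max_batch_size, or when the next
    arrival would make the oldest queued request wait longer than max_wait.
    """
    batches, current = [], []
    for ts, payload in arrivals:
        # Flush if admitting this request would over-delay the oldest one.
        if current and ts - current[0][0] > max_wait:
            batches.append([p for _, p in current])
            current = []
        current.append((ts, payload))
        # Flush when the size cap is reached.
        if len(current) == max_batch_size:
            batches.append([p for _, p in current])
            current = []
    if current:
        batches.append([p for _, p in current])
    return batches

# Three requests arrive close together, a fourth much later:
arrivals = [(0.00, "a"), (0.01, "b"), (0.02, "c"), (0.20, "d")]
print(form_batches(arrivals, max_batch_size=3, max_wait=0.05))
# The first three are served in one forward pass; "d" runs alone.
```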
NLP Intern
Neural Nurture
Built high-performance auditing frameworks and distributed evaluation pipelines in PyTorch and Hugging Face for analyzing LLM safety, privacy, and memorization, improving detection reliability by 22% over baseline and reducing runtime by 30% across 7B–70B models. Deployed production-grade inference infrastructure using FastAPI, vLLM, and TensorRT-LLM with asynchronous batching and optimized GPU scheduling, supporting large-scale safety and privacy audits at 200+ queries/sec with 40% lower GPU utilization.
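The asynchronous-batching pattern behind a serving stack like the one above can be sketched in plain asyncio: concurrent requests park on futures while a micro-batcher collects them, then one batched model call resolves them all. This is a hypothetical simplification (the `MicroBatcher` name and a synchronous `model_fn` are assumptions), not the vLLM/TensorRT-LLM internals.

```python
import asyncio

class MicroBatcher:
    """Collect concurrent requests and run the model once per batch."""

    def __init__(self, model_fn, max_batch_size=4, max_wait_s=0.01):
        self.model_fn = model_fn          # takes a list of inputs, returns a list of outputs
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self._queue = []                  # pending (payload, future) pairs
        self._timer = None

    async def submit(self, payload):
        fut = asyncio.get_running_loop().create_future()
        self._queue.append((payload, fut))
        if len(self._queue) >= self.max_batch_size:
            self._flush()                 # size cap reached: run the batch now
        elif self._timer is None:
            # Otherwise run whatever has accumulated after max_wait_s.
            self._timer = asyncio.get_running_loop().call_later(
                self.max_wait_s, self._flush)
        return await fut

    def _flush(self):
        if self._timer is not None:
            self._timer.cancel()
            self._timer = None
        batch, self._queue = self._queue, []
        if not batch:
            return
        results = self.model_fn([p for p, _ in batch])
        for (_, fut), res in zip(batch, results):
            fut.set_result(res)

async def demo():
    calls = []
    def model(batch):
        calls.append(len(batch))          # record batch sizes actually seen
        return [p.upper() for p in batch]
    mb = MicroBatcher(model, max_batch_size=4, max_wait_s=0.01)
    results = await asyncio.gather(*(mb.submit(s) for s in ["a", "b", "c", "d"]))
    return calls, results

calls, results = asyncio.run(demo())
# Four concurrent requests → one model call on a batch of 4.
```

A production scheduler adds continuous batching and GPU-aware admission, but the future-per-request shape is the same.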
Research Intern
CyPSi Lab, IIC - University of Delhi
Researched active learning for low-resource Indian language datasets; built end-to-end PyTorch pipelines achieving 25% higher label efficiency and 12–18% F1 improvements over random sampling. Extended and benchmarked SOTA active-learning methods (AnchorAL, BADGE) across vision and Indian NLP datasets, optimizing uncertainty-based selection for faster model improvement.
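Uncertainty-based selection, the baseline the entry above builds on, can be shown in a few lines: score each unlabeled example by predictive entropy and pick the top-k for annotation. This is a generic sketch (function names are illustrative), not the AnchorAL or BADGE algorithms, which add anchoring and gradient-embedding diversity on top of signals like this.

```python
import math

def entropy(probs):
    """Shannon entropy of one predictive distribution (higher = less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_most_uncertain(prob_rows, k):
    """Indices of the k unlabeled examples the model is least sure about."""
    ranked = sorted(range(len(prob_rows)),
                    key=lambda i: entropy(prob_rows[i]),
                    reverse=True)
    return sorted(ranked[:k])

# Softmax outputs for four unlabeled examples over three classes:
probs = [
    [0.98, 0.01, 0.01],   # confident
    [0.34, 0.33, 0.33],   # near-uniform: most uncertain
    [0.60, 0.30, 0.10],   # moderately uncertain
    [0.50, 0.50, 0.00],   # two-way tie
]
print(select_most_uncertain(probs, k=2))  # → [1, 2]
```

Examples 1 and 2 carry the highest entropy, so they are labeled first; over rounds this concentrates annotation budget where the model learns most.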
AI Intern
Indian Institute of Technology Jammu
Built a high-accuracy pipeline to detect PII leaks in network traffic using transformer embeddings refined with triplet loss, followed by a neural classifier; achieved 96.6% test accuracy. Applied quantization, knowledge distillation, and pruning to large-scale NLP models, reducing inference time by 38% and model size by 75%. Deployed compressed deep learning models to Android using TensorFlow Lite and ONNX, achieving <100ms inference latency with real-time on-device predictions.
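The triplet loss used to refine the embeddings above has a compact definition: pull an anchor toward a positive example of the same class and push it away from a negative by at least a margin. A dependency-free sketch (illustrative names, Euclidean distance, margin 1.0 assumed; the actual pipeline would compute this batched in PyTorch):

```python
def l2(a, b):
    """Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss: zero once the negative is at least `margin`
    farther from the anchor than the positive is."""
    return max(0.0, l2(anchor, positive) - l2(anchor, negative) + margin)

# Negative already far away: the triplet is satisfied, loss is 0.
print(triplet_loss([0, 0], [0, 1], [3, 0]))      # → 0.0
# Negative too close: positive loss drives the embeddings apart.
print(triplet_loss([0, 0], [0, 1], [0, 1.5]))    # → 0.5
```

Minimizing this over mined triplets clusters same-PII-type traffic together in embedding space, which is what lets the downstream classifier reach high accuracy.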
Education
Cluster Innovation Centre, University of Delhi
Bachelor of Technology
Information Technology and Mathematics