Aryan Pandey

@7000839272

GSoC 2024 at OpenVINO

Delhi

OpenVINO

Dr. A.P.J. Abdul Kalam Technical University

Aryan Pandey is a Machine Learning Engineer with over 1.5 years of experience in reinforcement learning, post-training alignment, and applied agents. His background includes KL-regularized policy optimization and reward modeling. He has contributed to OpenVINO's inference infrastructure and to multi-GPU LLM fine-tuning pipelines. Aryan is also active in the research community and has analyzed over 300 papers on RLHF, world models, and distributed training architectures.

Experience

GSoC 2024

OpenVINO

• Jan 2024 - Jan 2024

Developed and optimized inference extensions for Stable Diffusion pipelines using OpenVINO. Implemented PyTorch operator support (e.g., aten::mv) and aligned ONNX ReLU6 behavior with upstream frameworks. Added TensorFlow MatrixSetDiagV3 frontend support, increasing operator coverage. Debugged numerical inconsistencies between training-time PyTorch graphs and production inference graphs. Conducted performance benchmarking and contributed through design discussions and PR reviews.

Front End Developer Intern

Embifi

• Nov 2022 - Feb 2023 • Delhi

Developed the front end of a web application using React and JavaScript. Integrated Google Firebase for data management, optimizing the user experience across platforms. Collaborated with the team through Git, improving code management and version-control practices.

Open Source Contributor

Anarchy-ai

Contributed to multi-GPU LLM fine-tuning pipelines built on Hugging Face. Assisted with distributed training workflows and memory optimization for large models. Worked on load-balancing and auto-scaling setups for LLM serving on GCP. Improved inference throughput and training stability for downstream applications.

Education

Dr. A.P.J. Abdul Kalam Technical University

B.Tech

Computer Science

Jan 2021 - Present

Noida, Delhi NCR

Licenses & Certifications

Deep Learning Specialization

DeepLearning.ai

• No expiration

Stanford Meta Reinforcement Learning

Stanford

• No expiration

LLM Agents

UC Berkeley

• No expiration

Skills

Reinforcement Learning

MDPs

PPO

SAC

DQN

TRPO

DPO

KL-regularized optimization

Reward modeling

Agent & Alignment Systems

Post-training methods

RLHF

Reasoning evaluation

Model behavior

Distributed Training & Systems

Multi-GPU training

Hugging Face Accelerate

OpenVINO

ONNX

Inference benchmarking

Docker

Kubernetes

Linux

GCP

PyTorch

JAX

Hugging Face Transformers