Ritwik Raj
@ritwik_raj
Senior Data Engineer - II at MakeMyTrip
Bangalore, Karnataka
Ritwik is a Data Engineer and LLM Specialist with over 5 years of experience in designing, optimizing, and deploying large-scale data systems and AI-driven solutions. He specializes in building scalable pipelines using Spark, Kafka, and cloud technologies (AWS). His expertise includes Generative AI, RAG, and optimizing petabyte-scale data environments to drive business impact.
Experience
Senior Data Engineer - II
MakeMyTrip
Built a real-time stream processing system using Kafka and Spark Streaming to optimize payment gateways, reducing latency and failures and raising the payment success rate from 80% to 85%.
Developed an end-to-end synthetic data generation pipeline in which an LLM dynamically generates Python code, reducing manual dataset creation time and improving data quality.
Built a scalable data pipeline from scratch using PySpark, SQL, Python, and FastAPI to power MakeMyTrip’s dynamic flight discount system, leading to a 5% increase in click share through real-time pricing optimization.
Created a high-performance Python library for interacting with Aerospike, reducing query latency, improving system throughput, and cutting developer effort by 50%.
Used AWS services (DMS, S3, EMR, Glue, Athena, Redshift) to streamline big data processing, improving the cost efficiency and performance of large-scale data environments.
Software Engineer - Data Platform
Ola Cabs
Built microservices, deployed on Kubernetes, to transfer real-time data from Kafka and MySQL to a data lake (S3) for scalable and efficient data storage.
Led a proof of concept (POC) to deploy Apache Pinot and Trino on a Kubernetes cluster, enabling sub-second query performance on high-throughput Kafka topic data and reducing analytics latency by 50%.
Developed an abstraction library/API for data ingestion into central Kafka topics, standardizing data flows, reducing developer integration effort by 50%, and accelerating deployment timelines.
Cloud Data Engineer
Amazon Web Services
Developed an EMR debugging tool to analyze and troubleshoot jobs running on EMR clusters, making debugging about 3x faster and improving system reliability.
Designed and implemented an end-to-end data pipeline to transfer data from relational databases to a data lake with upsert support using Apache Iceberg, improving data ingestion efficiency and scalability.
Worked with AWS big data and analytics services, including S3, Glue, Redshift, AWS DMS, EMR, Athena, and QuickSight, for scalable data processing and visualization.
Education
B.M.S. Institute of Technology
B.E. (CSE)
Computer Science
Completed coursework in computer science with a specialization in Big Data and Engineering.
Licenses & Certifications
AWS Certified Solutions Architect - Associate
AWS