Ghanshyam Prajapati
@Ghanshyam
Data Engineer at Nagarro
Bhopal
Data Engineer with ~3 years of experience designing and optimizing large-scale ETL pipelines using Python, PySpark, and SQL. Skilled in AWS (S3, Redshift, Lambda), Kedro, and workflow orchestration frameworks to deliver scalable and reliable data solutions. Experienced in implementing data quality checks and automated alerts to ensure high accuracy, and deploying pipelines using Docker and Kubernetes. Proven track record of pipeline migration, workflow automation, and enabling analytics that drive measurable business impact.
Experience
Data Engineer
Nagarro
Deployed onsite at ZS Associates for a leading US-based pharmaceutical client. Built and optimized large-scale ETL pipelines using PySpark, SQL, and AWS cloud services (S3, Redshift, RDS), processing 500GB+ of data daily. Automated 70%+ of data refresh and ingestion workflows, reducing manual effort by 40%. Implemented multi-layer data validation checks to ensure 99%+ data accuracy across pipelines. Improved pipeline reliability by 30% through debugging, workflow optimization, and monitoring. Enhanced workflow efficiency by 35% through optimized PySpark transformations and storage formats.
Projects included:
1. Doctor Engagement Optimization System: Developed and scaled production-grade ETL pipelines processing 500GB+ of data per day with 99% accuracy.
2. Dataiku-to-Kedro/Argo Pipeline Migration: Led the migration of legacy SQL-based pipelines from Dataiku to Kedro + PySpark, improving scalability and maintainability.
Education
Rajiv Gandhi Proudyogiki Vishwavidyalaya
B.Tech
Computer Science and Engineering
Served as Deputy President of the Student Activity Council for the 2021-2022 academic year.
Licenses & Certifications
Microsoft Certified: Azure Fundamentals (AZ-900)
Microsoft