
Ghanshyam Prajapati

@Ghanshyam

Data Engineer at Nagarro

Bhopal

Nagarro · Rajiv Gandhi Proudyogiki Vishwavidyalaya

Data Engineer with ~3 years of experience designing and optimizing large-scale ETL pipelines using Python, PySpark, and SQL. Skilled in AWS (S3, Redshift, Lambda), Kedro, and workflow orchestration frameworks for delivering scalable, reliable data solutions. Experienced in implementing data quality checks and automated alerts to ensure high accuracy, and in deploying pipelines with Docker and Kubernetes. Proven track record in pipeline migration, workflow automation, and enabling analytics that drive measurable business impact.

Experience

Data Engineer

Nagarro

Present · Pune, Maharashtra

Deployed onsite at ZS Associates for a leading US-based pharmaceutical client. Built and optimized large-scale ETL pipelines using PySpark, SQL, and AWS cloud services (S3, Redshift, RDS), processing 500GB+ of data daily. Automated 70%+ of data refresh and ingestion workflows, reducing manual effort by 40%. Implemented multi-layer data validation checks to maintain 99%+ data accuracy across pipelines. Improved pipeline reliability by 30% through debugging, workflow optimization, and monitoring. Increased workflow efficiency by 35% by optimizing PySpark transformations and storage formats.

Projects:
1. Doctor Engagement Optimization System: Developed and scaled production-grade ETL pipelines processing 500GB+ of data per day with 99% accuracy.
2. Dataiku-to-Kedro/Argo Pipeline Migration: Led the migration of legacy SQL-based pipelines from Dataiku to Kedro + PySpark, improving scalability and maintainability.

Education

Rajiv Gandhi Proudyogiki Vishwavidyalaya

B.Tech

Computer Science and Engineering

Served as Deputy President of the Student Activity Council for the 2021-2022 academic year.

Licenses & Certifications

Microsoft Certified: Azure Fundamentals (AZ-900)

Microsoft

No expiration

Skills

PySpark
Python
SQL
AWS Cloud (S3, Redshift, RDS, Lambda)
Dataiku
Kedro Framework
Argo Workflows
Git
Bitbucket
Grafana
Linux
Java
Jira
Data Processing
Data Modelling
Data Warehouse
Data Integration
Tableau
Microsoft Excel