Prajjual Shrivastava

@prajjualshrivastava

DATA ENGINEER

Pune, India

https://www.linkedin.com/in/prajjuals/

John Deere

Madhav Institute of Technology and Science

Passionate data engineer with over 5 years of experience designing and implementing scalable data solutions that improve data accessibility and reliability. Adept at collaborating with cross-functional teams to drive data-driven decision-making. Proficient in building solutions on Databricks with Spark, working in multi-cloud environments, and applying advanced data engineering practices. Skilled in conceptualizing and implementing data pipelines that transform data into actionable insights.

Experience

Data Engineer

John Deere

Jun 2022 - Present

Pune, MH

Developed data pipelines and collaborated with stakeholders to design scalable data architecture, ensuring data quality and integrity across multiple projects. Built solutions with Apache Spark (PySpark and Spark SQL), Databricks Workflows, and Databricks clusters. Optimized a Databricks Workflow to cut its run duration by 50% by using checkpointing and parallelizing the processes. Developed a framework on Azure Synapse that implements SCD Type 2 for all tables to maintain historical data in the Azure Lakehouse. Used the Delta Lake Change Data Feed feature to load incremental data. Configured and tuned Databricks Workflows, choosing the right cluster to process shape/geometry data efficiently. Developed APIs with Python’s FastAPI package for downloading data from AWS S3 on the client’s side. Developed GitHub Actions that trigger Databricks Workflows through the Databricks run job APIs. Worked with the multi-hop (Medallion) architecture: processing raw data into the Bronze layer, cleaning and preparing it in the Silver layer, and aggregating it according to business logic in the Gold layer. Developed reports with Python’s Matplotlib and Plotly packages that measure the usage of a dealer-facing application, by enabling an Adobe clickstream data feed to AWS S3 and preparing the S3 data according to the business reporting logic. Designed and orchestrated data pipelines using Apache Airflow. Remediated data quality issues by enabling logging and auditing and fixing the root causes. Used GitHub for source and version control. Worked in an Agile environment to respond to change and deliver value. Used Jira for issue tracking and monitoring.

Databricks Developer

Accenture Pvt Ltd.

Feb 2021 - May 2022

Mumbai, MH

Extracted, transformed, and loaded data from multiple file formats into Azure data storage services using Azure Data Factory, PySpark, and Spark SQL. Ingested, transformed, and aggregated data in DataFrames using PySpark and Spark SQL. Automated ETL processes using Databricks Workflows. Provisioned appropriate clusters by estimating the cluster size from the workload. Mounted Azure storage to Databricks and created mount points that Databricks notebooks can access using file semantics. Stored secrets in Azure Key Vault for use in Databricks notebooks through a Databricks secret scope. Created ADF pipelines using linked services and datasets to extract, transform, and load data from different sources. Wrote DDL for tables and executed it to create the tables using PySpark and Spark SQL. Applied the Spark DataFrame Reader/Writer API.

Application Developer Analyst

Accenture Pvt Ltd.

Jun 2019 - Jan 2021

Mumbai, MH

Handled the end-to-end migration of various banking and financial applications from IBM DataStage v8.7 to v11.7. Improved data efficiency by 30% by automating processing with Python programs. Automated the database refresh activity in Oracle SQL Developer that syncs the databases of the prod and pre-prod environments. Managed and monitored applications, resolving job and batch failures. Ensured user incidents were resolved within SLA.

Education

Madhav Institute of Technology and Science

Bachelor of Engineering

Aug 2015 - May 2019

Licenses & Certifications

Databricks Certified Data Engineer Professional

Databricks

Issued: Sep 2024

Databricks Certified Data Engineer Associate

Databricks

Issued: Apr 2024

AWS Certified Solutions Architect Associate

AWS

Issued: May 2021

Skills

Apache Spark
PySpark
SQL
Python
AWS S3
Azure Data Factory
Databricks
Delta Lake
Apache Airflow
Data Lake
PostgreSQL
CloudFormation
Lambda
ADLS
Synapse
FastAPI
Matplotlib
Plotly
pandas
Seaborn
ETL
Data Warehousing