Prajakta Palve

@prajaktapalve

Data Engineer at Capgemini

Pune, Maharashtra, India

https://www.linkedin.com/in/prajakta-palve-a198a1157

Capgemini

C-DAC’s Advanced Computing Training School

Prajakta Palve is a Data Engineer with over 5 years of experience specializing in big data technologies and cloud platforms. She has a proven track record in developing data pipelines using Azure Synapse, PySpark, and Airflow. Her expertise includes Python scripting, Kubernetes management, and data visualization across various industries.

Experience

Data Engineer

Capgemini

May 2022 - Present, Pune

Azure Synapse - Developed PySpark and SQL scripts to process data stored in Parquet and Delta files and write the results to Hive tables or Delta files. Built data pipelines for these jobs and monitored their execution daily.

Kubernetes - Created Kubernetes Secrets to store sensitive parameters and configured them for access across all pods, so that sensitive data is managed securely and consistently.

Python scripts - Wrote a script to identify Spark jobs with extended execution times and, on failure, automatically log the error message to a text file. Wrote a second script to retrieve CSV files from an Azure Storage container, check for specific parameters across three columns, and generate results based on the defined criteria. Wrote a third script to delete files stored in Azure S3 that are marked as archived, keeping storage use efficient.

Jr. Data Engineer

Jio Platforms Limited

Jun 2019 - Apr 2022, Navi Mumbai

Airflow - Installed, configured, and monitored Airflow in both bare-metal and Kubernetes environments. Developed jobs to execute tasks, monitor their progress, take appropriate actions, and send alert notifications via email and messaging platforms. Implemented CI/CD pipelines for DAG deployment.

Spark - Developed Spark jobs in Scala to segregate, calculate, and aggregate various types of data. Visualized the processed data in Tableau for analysis and reporting.

Elasticsearch - Queried and retrieved data with Elasticsearch and visualized the results in Kibana. Designed and developed dashboards to support data analysis.

Airbyte - Developed custom Airbyte connectors to move data between various systems and applications.

Django - Developed a Django-based web interface that displays all alerts triggered by Airflow and lets users mark alerts as resolved.

Data Science - Developed Python scripts to analyze and manipulate data using Pandas, NumPy, and Scikit-learn. Created data visualizations using Matplotlib, ggplot, and Plotly.

Education

C-DAC’s Advanced Computing Training School

PG-Diploma

Big Data Analytics

Aug 2018 - Feb 2019, Grade: 68.88%

Shri Chhatrapati Shivaji Maharaj College of Engineering, Ahmednagar

B.E.

Computer Engineering

Jun 2015 - May 2018, Grade: 65.66%

Sahyadri Polytechnic Sawarde

Diploma

Computer Engineering

Aug 2012 - May 2015, Grade: 73.58%

G.N.S.&H.S. School Sawarde, Chiplun, Ratnagiri

SSC

Jun 2011 - Jun 2012, Grade: 86.55%

Licenses & Certifications

PG-Diploma in Big Data Analytics

C-DAC

PCAP – Python Certified Associate Programmer

Skills

Python
Scala
SQL
PostgreSQL
NoSQL databases
MongoDB
Cassandra
Spark
PySpark
Spark (Scala)
Hadoop
Hive
Airflow
Airbyte
MS Azure Synapse
AWS
Tableau
Power BI
CI/CD
Git
Jira
Attention to detail
Problem-Solving
Collaboration & communication