Prajakta Palve
@prajaktapalve
Data Engineer at Capgemini
Pune, Maharashtra, India
Prajakta Palve is a Data Engineer with over 5 years of experience specializing in big data technologies and cloud platforms. She has a proven track record in developing data pipelines using Azure Synapse, PySpark, and Airflow. Her expertise includes Python scripting, Kubernetes management, and data visualization across various industries.
Experience
Data Engineer
Capgemini
Azure Synapse - Developed PySpark and SQL scripts to process data stored in Parquet and Delta files and write the results to Hive tables or Delta files. Built data pipelines for these jobs and monitored their execution daily.
Kubernetes - Created Kubernetes secrets to store sensitive parameters and configured them to be accessible across all pods, so that sensitive data is managed securely and consistently.
Python scripts - Developed a script to identify Spark jobs with extended execution times and, when a job fails, automatically write the error message to a text file. Created a script to retrieve CSV files from an Azure Storage container, check for specific parameters across three columns, and generate results based on the defined criteria. Implemented another script to delete files in Azure Storage that are marked as archived, keeping storage usage under control.
Jr. Data Engineer
Jio Platforms Limited
Airflow - Installed, configured, and monitored Airflow on both bare-metal and Kubernetes environments. Developed jobs to execute tasks, monitor their progress, take appropriate actions, and send alert notifications via email and messaging platforms. Implemented CI/CD pipelines for DAG deployment.
Spark - Developed Spark jobs in Scala to segregate, calculate, and aggregate various types of data. Created visualizations of the processed data in Tableau for analysis and reporting.
Elasticsearch - Used Elasticsearch to query and retrieve the required data and visualized the results in Kibana. Designed and developed dashboards to support data analysis.
Airbyte - Developed custom connectors in Airbyte to integrate data flows between various systems and applications.
Django - Developed a Django-based web interface to display all alerts triggered by Airflow, with functionality to mark alerts as resolved.
Data Science - Developed Python scripts to analyze and manipulate data using libraries such as Pandas, NumPy, and Scikit-learn. Created data visualizations using Matplotlib, ggplot, and Plotly.
Education
C-DAC’s Advanced Computing Training School
PG-Diploma
Big Data Analytics
Shri Chhatrapati Shivaji Maharaj College of Engineering, Ahmednagar
B.E.
Computer Engineering
Sahyadri Polytechnic Sawarde
Diploma
Computer Engineering
G.N.S.&H.S. School Sawarde, Chiplun, Ratnagiri
SSC
Licenses & Certifications
PG-Diploma in Big Data Analytics
CDAC
PCAP – Python Certified Associate Programmer