Mounisha Aripaka

@Mounisha_A

Data Engineer at CATALYST

Hyderabad, Telangana, India

CATALYST•Gayatri Vidya Parishad College of Engineering (Autonomous)

Results-oriented and highly skilled Data Engineer with 4 years of experience in designing, implementing, and optimizing data pipelines and workflows. Proficient in working with large datasets, ETL processes, data transformation, and data ingestion from a variety of sources. Expertise in Scala-Spark, PySpark, SQL, and the Cloudera platform. Adept at collaborating with cross-functional teams to deliver data solutions that improve data accessibility and support model training for data scientists. Strong understanding of platform management and service upgrades, with hands-on experience in Linux environments. Good knowledge of data warehousing solutions.

Experience

Data Engineer

CATALYST

•Mar 2023 - Present

Built data engineering pipelines on top of Cloudera Machine Learning (CML) and Cloudera Data Engineering (CDE). Created ETL pipelines using PySpark to automate data movement from multiple sources, such as Oracle and APIs, to AWS S3 storage. Optimized usage of Cloudera Machine Learning. Optimized and migrated existing ETL pipelines from Cloudera CML to Cloudera CDE, reducing costs by 70%. Developed key features for the Catalyst tool. Provided engineered data to the Data Science team for model training.

Data Engineer

Modak Analytics LLP

•Feb 2022 - Present•Hyderabad

Designed and implemented ETL pipelines using Scala-Spark to extract data from various sources (Oracle, SFTP, APIs), clean it, and load it into AWS S3 and Hive. Collaborated with Data Scientists to process and prepare datasets for model training. Used PySpark to fetch data from AWS S3, develop required features, and load cleaned and transformed data into Oracle. Actively contributed to platform management tasks in Cloudera CML. Automated timely data quality reports sent by email over SMTP. Reduced billing costs by 70% by optimizing the Cloudera CML pipelines and migrating them to CDE.

Data Engineer

DATALABS

•Feb 2022 - Feb 2023

Worked on ETL processes, handling data ingestion from Oracle, SFTP, and APIs into Hive and AWS S3 using Scala-Spark. Ensured smooth and efficient data transfer for better business insights. Documented the ETL processes to provide a handbook for clients.

Education

Gayatri Vidya Parishad College of Engineering (Autonomous)

B.Tech

Jan 2018 - Jan 2022•Grade: 9.26 CGPA

Narayana Junior College

Intermediate

Jan 2016 - Jan 2018•Grade: 98.2%

Little Angels High School

SSC

Jan 2015 - Jan 2016•Grade: 10 CGPA

Skills

Scala

Python

PySpark

Spark

ETL

Cloudera AI

AWS Glue

AWS S3

Oracle SQL

Hive

GitHub

GitHub Actions

Azure DevOps

Cloudera

AWS

Platform Management

Service Upgrades

ETL tool deployments