Prarthana Shah

@prarthanashah

Data Scientist at Fractal

Pune, India

www.linkedin.com/in/prar

Fractal

JSPM College of Engineering, Pune University

Prarthana Shah is an experienced Data Scientist and Big Data Engineer with expertise in designing and implementing advanced machine learning models. She is proficient in technologies such as Spark, TensorFlow, and AWS, and in data engineering tools including Airflow and Docker. She has worked on complex projects involving model drift detection, feature selection, and building low-code deep learning platforms.

Experience

Data Scientist

Fractal

• Dec 2020 - Present

Designed and implemented model drift detection for Streamflux, covering both batch and scheduled (Airflow-based) drift calculation on streaming data, with visualizations generated for both data drift and target drift. Wrote APIs and created tables for submitting drift jobs and for generating and fetching visualization data. Added support for custom Python code transformers and coordinated the addition of support for PySpark and Python models in the platform. Added a feature that shows the state of every stage in a pipeline along with its performance logs. Created pipelines on the platform for use cases such as Churn Prediction and Energy Prediction. Worked on adding model visualizations such as sub-population analysis, SHAP, partial dependence charts, and heatmaps. Responsible for maintenance and refactoring across the backend codebase. Built a data balancing feature in Scala using SMOTE with LSH.

Big Data Engineer

Zerogons

• Jun 2019 - Dec 2020

Worked on adding AutoML with hyperparameter tuning to the platform. Built feature selection that filters out less relevant features using different filters with user-provided configurations. Integrated non-Spark models such as LightGBM, CatBoost, Stacking Classifier, and H2O, and wrote custom MLeap transformers to support deployment of these models. Contributed code to a low-code/no-code deep learning platform, adding support for feature extraction and transfer learning and generating visualizations for each CNN, such as interlayer visualization and SHAP; the stack included TensorFlow, WebSockets, Flask, Redis, Java, Python, MLflow, and TensorBoard. Worked on a project to run parameterized jobs on the cluster from Jupyter notebooks using Papermill, Livy, and sparkmagic. Added support for RocksDB as a backend state store for aggregations on streaming data in Spark. Carried out POCs to check the feasibility of features such as a vector disassembler and isolation forest for outlier detection, then added them to the platform.

Education

JSPM College of Engineering, Pune University

Information Technology

• Grade: CGPA 8.4

Project: Factory surveillance, disaster prediction, and management system using IoT and machine learning, which combined various types of sensors with a Q-learning reinforcement learning algorithm and false-positive elimination.

Licenses & Certifications

Introduction to Machine Learning in Production

Coursera

• No expiration

Deep Learning & Machine Learning A-Z: Hands-on Artificial Neural Network

Udemy

• No expiration

Oracle Database 11g PL/SQL Fundamentals I & II

Udemy

• No expiration

Real-time hands-on course on Scala and Spark

Udemy

• No expiration

Docker & Kubernetes: The Complete Guide

Udemy

• No expiration

Google Cloud Platform Fundamentals

Udemy

• No expiration

AWS Concepts

Udemy

• No expiration

Skills

Python
Scala
C++
Java
Bash
SQL
PL/SQL
Spark
Machine Learning
AutoML
Deep Learning
TensorFlow
Transfer Learning
Image Recognition
Feature Extraction
Data Engineering
Airflow
Jupyter
AWS
Git
Docker
Testing
Websockets
Livy
Hadoop
REST APIs
Database Management
MySQL
Redis
RocksDB
MongoDB
Oracle
SQLite