Prarthana Shah

@prarthanashah

Data Scientist at Fractal

Pune, India

www.linkedin.com/in/prar

Fractal

JSPM College of Engineering, Pune University

Prarthana Shah is an experienced Data Scientist and Big Data Engineer with expertise in designing and implementing advanced machine learning models. Proficient in technologies such as Spark, TensorFlow, and AWS, and in data engineering tools including Airflow and Docker. Has worked on complex projects involving model drift detection, feature selection, and building low-code deep learning platforms.

Experience

Data Scientist

Fractal

• Dec 2020 - Present

Designed and implemented model drift detection for Streamflux, covering batch drift calculation as well as scheduled drift calculation (using Airflow) for streaming data, with visualizations generated for both data drift and target drift. Wrote APIs and created tables for submitting drift jobs and for generating and fetching visualization data. Added support for custom Python code transformers and orchestrated the addition of support for PySpark and Python models in the platform. Added a feature that shows the state of every stage in a pipeline along with its performance logs. Built pipelines on the platform for use cases such as churn prediction and energy prediction. Worked on adding support for model visualizations such as sub-population analysis, SHAP, partial dependence charts, and heatmaps. Responsible for maintenance and refactoring of the backend codebase. Implemented a data balancing feature using SMOTE with LSH in Scala.

Big Data Engineer

Zerogons

• Jun 2019 - Dec 2020

Worked on adding AutoML with hyperparameter tuning to the platform. Built feature selection that filters out less relevant features using different filters configured by users. Integrated non-Spark models such as LightGBM, CatBoost, Stacking Classifier, and H2O, and wrote custom MLeap transformers to support deployment of these models. Wrote code for a low-code/no-code deep learning platform, adding support for feature extraction and transfer learning using TensorFlow, WebSockets, Flask, Redis, Java, Python, MLflow, and TensorBoard, and generating visualizations for each CNN such as interlayer visualization and SHAP. Worked on a project to run parameterized jobs on the cluster from Jupyter notebooks using papermill, Livy, and sparkmagic. Added support for RocksDB as a backend state store for aggregations on streaming data in Spark. Carried out POCs to check the feasibility of features such as a vector disassembler and isolation forest for outlier detection, then added them to the platform.

Education

JSPM College of Engineering, Pune University

Information Technology

• Grade: CGPA 8.4

Project: Factory surveillance, disaster prediction, and management system using IoT and machine learning, combining various types of sensors with a Q-learning reinforcement learning algorithm and false-positive elimination.

Licenses & Certifications

Introduction to Machine Learning in Production

Coursera

• No expiration

Deep Learning & Machine Learning A-Z: Hands-on Artificial Neural Network

Udemy

• No expiration

Oracle Database 11g PL/SQL Fundamentals I & II

Udemy

• No expiration

Real-time hands-on course on Scala and Spark

Udemy

• No expiration

Docker & Kubernetes: The Complete Guide

Udemy

• No expiration

Google Cloud Platform Fundamentals

Udemy

• No expiration

AWS Concepts

Udemy

• No expiration

Skills

Python

Scala

C++

Java

Bash

SQL

PL/SQL

Spark

Machine Learning

AutoML

Deep Learning

TensorFlow

Transfer Learning

Image Recognition

Feature Extraction

Data Engineering

Airflow

Jupyter

AWS

Git

Docker

Testing

Websockets

Livy

Hadoop

REST APIs

Database Management

MySQL

Redis

RocksDB

MongoDB

Oracle

SQLite