Prarthana Shah is an experienced Data Scientist and Big Data Engineer with expertise in designing and implementing advanced machine learning models. Proficient in technologies like Spark, TensorFlow, AWS, and various data engineering tools including Airflow and Docker. Proven ability to work on complex projects involving model drift detection, feature selection, and building low-code deep learning platforms.
Experience
Data Scientist
Fractal
Designed and implemented model drift detection for Streamflux. It supports both batch drift calculation and scheduled drift calculation (using Airflow) for streaming data, and generates visualizations for both data drift and target drift. Wrote APIs and created tables for submitting drift jobs and for generating and fetching visualization data. Added support for custom Python code transformers, and led the effort to add support for PySpark and Python models to the platform. Added a feature that shows the state of every stage in a pipeline along with its performance logs. Used the platform to create pipelines for use cases such as churn prediction and energy prediction. Added support for model visualizations such as sub-population analysis, SHAP, partial dependence charts, and heatmaps. Responsible for maintaining and refactoring the backend codebase. Built a data-balancing feature in Scala using SMOTE with locality-sensitive hashing (LSH).
Big Data Engineer
Zerogons
Added AutoML, including hyperparameter tuning, to the platform. Built a feature selection module that filters out less relevant features using a range of filters configured by the user. Integrated non-Spark models such as LightGBM, CatBoost, stacking classifiers, and H2O, and wrote custom MLeap transformers to support their deployment. Developed a low-code/no-code deep learning platform, adding support for feature extraction and transfer learning using TensorFlow, WebSockets, Flask, Redis, Java, Python, MLflow, and TensorBoard. Also generated visualizations for each CNN, such as inter-layer visualization and SHAP. Worked on a project to run parameterized jobs on the cluster from Jupyter notebooks using Papermill, Livy, and sparkmagic. Added support for RocksDB as the backend state store for streaming aggregations in Spark. Carried out proofs of concept to assess the feasibility of new features, then added them to the platform, including a vector disassembler and isolation forest for outlier detection.
Education
JSPM College of Engineering, Pune University
Information Technology
Project: Factory surveillance, disaster prediction, and management system using IoT and machine learning. It combined data from various types of sensors with a Q-learning reinforcement learning algorithm and false-positive elimination.
Licenses & Certifications
Introduction to Machine Learning in Production
Coursera
Deep Learning & Machine Learning A-Z: Hands-on Artificial Neural Network
Udemy
Oracle Database 11g PL/SQL Fundamentals I & II
Udemy
Real-time hands-on course on Scala and Spark
Udemy
Docker & Kubernetes: The Complete Guide
Udemy
Google Cloud Platform Fundamentals
Udemy
AWS Concepts
Udemy