Sumit Jha

@sumitjha

Data Scientist at Digital India Corporation

Greater Noida

Digital India Corporation, IIIT Bangalore

Sumit Jha is a Data Scientist holding a PG Diploma in Data Science from IIIT-B. He possesses hands-on experience developing predictive models using supervised and unsupervised learning algorithms. Proficient in Python, SQL, Tableau, and AWS, he specializes in end-to-end MLOps pipelines, data visualization, and advanced analytics to drive data-informed decision-making.

Experience

Data Scientist

Digital India Corporation

Full-time, Feb 2001 - Present, New Delhi

Developing predictive models on a Ministry dataset using supervised learning algorithms such as Linear Regression, Decision Trees, Random Forests, and Support Vector Machines. Trained models on labeled datasets to make accurate predictions on new data. Collaborated with cross-functional teams to gather requirements and transformed raw data into visualizations that conveyed key insights. Managed and maintained large databases, ensuring data accuracy, integrity, and security through regular backups and data validation checks. Validated and evaluated models using cross-validation and performance metrics such as accuracy, precision, recall, and F1-score. Gathering data from all ministries and standardizing the data and metadata for interoperability, making the data AI-ready. Publishing all data on a common portal so that it is accessible to all ministries for further investigation and analysis.

Senior Executive

Invalid Date - Jun 2001

EDA and predictive modelling (Linear Regression, Logistic Regression, Decision Tree, and Random Forest) on a media dataset to provide insights, using Python and Tableau. Worked on a structured media dataset and built models such as Linear and Logistic Regression. Created dashboards using the visualization tool Tableau. Also used SQL, AWS, and MongoDB for data analytics. Conducted installation and licensing of almost 400 NUC meters to manage a large volume of channel data. Planned and executed 10+ end-to-end maintenance drives of antennas within the stipulated time frame. Basic operations of TELNET, PING, and TRACERT. Panel training for BAR-O-Meter. Worked on watermark monitoring and playout infrastructure.

Associate Manager

Broadcast Audience Research Council India

Full-time, Invalid Date - Jan 2001, Gurugram

Developed predictive models using supervised learning algorithms such as Linear Regression, Decision Trees, Random Forests, and Support Vector Machines. Trained models on labeled datasets to make accurate predictions on new data. Applied unsupervised learning methods, including clustering (K-Means) and dimensionality reduction (PCA), to uncover patterns, trends, and insights within large datasets lacking labeled outcomes. Manipulated structured data from various sources, including relational databases and CSV files. Performed data preprocessing, cleaning, and transformation to ensure data quality and consistency for downstream analysis. Engineered relevant features from raw data to improve model performance. Leveraged domain knowledge to create informative and discriminative features for both supervised and unsupervised tasks. Collaborated with a team of data scientists and software engineers on end-to-end MLOps pipelines for deploying machine learning models, ensuring a smooth transition from development to production. Created interactive Tableau dashboards to visualize complex datasets, enabling data-driven decision-making for stakeholders.

Associate

Prime Focus Technologies

Full-time, Invalid Date - Invalid Date, Mumbai

DATA SCIENCE PROJECTS:
Credit EDA Case Study (2 Members): Applying EDA in a real business scenario to understand risk analytics in banking and financial services.
Bike Sharing (2 Members): Creating a model of shared-bike demand based on independent variables to help shape business strategy.
NGO Clustering (2 Members): Categorizing countries using socio-economic and health factors to suggest NGO focus areas.
X Education Lead Scoring Case Study (2 Members): Building a model that assigns a lead score to select the most promising leads for conversion.
Capstone Project – Credit Card Fraud Detection (2 Members): Developing a machine learning model to detect fraudulent transactions and analyzing the business impact to recommend mitigation strategies.

Education

IIIT Bangalore

PG Diploma in Data Science

Data Science

Jun 2001 - Feb 2001, Grade: CGPA 3.48/4

UPTU

B.Tech

Electronics & Communications

Invalid Date - Invalid Date, Grade: 71%

CBSE

Class 12th

Grade: 69.6%

CBSE

Class 10th

Grade: 70%

Skills

Python
Pandas
NumPy
Scikit-learn
Seaborn
Matplotlib
Machine Learning
Linear Regression
Logistic Regression
K-means
Hierarchical Clustering
KNN
SVM
Decision Tree
Random Forest
PCA
XGBoost
Hive
SQL
MySQL
NoSQL
Tableau
MS Excel
AWS
Power BI
Supervised Learning
Unsupervised Learning
Exploratory Data Analysis
Inferential Statistics
Hypothesis Testing
A/B Testing
Predictive Analytics
Data Visualization
AWS SageMaker
Deep Learning
PyTorch
TensorFlow
NLP
MLOps
Gen AI