MANTESH CHOUGULE
@manteshchougule
Big Data Engineer and Data Scientist
Pune, Maharashtra, India
Mantesh Chougule is a skilled Big Data Engineer and Data Scientist with experience in monitoring and maintaining Hadoop and Kafka environments. He possesses in-depth knowledge of the Hadoop ecosystem, including Spark, Hive, and MapReduce. He has also applied data science techniques, building predictive models with machine learning algorithms and developing dashboards in Tableau.
Experience
HADOOP AND KAFKA ADMIN (Big Data Engineer)
Net Connect Global Pvt Ltd (Client IBM)
- Monitoring and maintaining jobs on IBM Workload Scheduler (Tivoli); rerunning, cancelling, and holding jobs as per requirements.
- Monitoring jobs through logs in the Hadoop environment.
- In-depth knowledge of Hadoop and all components of the Hadoop ecosystem, e.g. Kafka, ZooKeeper, HBase, YARN, MapReduce, Sqoop, Flume, Spark, Hive, Impala, Cloudera, Cognos, and Oracle Database.
- Monitoring and maintaining Kafka stream jobs and connectors; killing and rerunning them on failure and checking file movements in the landing, archiving, and source directories.
- Monitoring Kafka batches for staging and fact table conversion; checking logs for a resolution if a batch disappears.
- Informing developers about reject counts and reloading the data into staging.
- Finding Hive and Spark errors in the shell when jobs show errors in Tivoli, and taking the appropriate action for each error.
- Killing jobs manually in the shell if required and rescheduling jobs in crontab when needed.
- Complete ELT: moving files from source to target directories.
- SFTP of files to the client location after ELT using the scp command.
- Removing and creating directories in HDFS using Hadoop and Linux commands.
- Fact count and cross-validation in Excel of total files received against total files loaded on target.
- Filling the failure tracker report and recording actions taken.
- Monitoring the reconciliation report and reporting any file count differences to managers.
- Preparing hourly status for Tivoli jobs as well as Kafka streams and batches.
- Running Hive queries in the Hive shell to check fact and staging tables and to track running and completed stream jobs.
- In-depth knowledge of Spark and its ecosystem and components, e.g. Kafka/Spark/Hadoop integration, ETL vs. Kafka/Spark streaming, ingestion, Spark SQL, vCore and executor requirements, and complete YARN operations.
- Running jobs in the background through the PuTTY client shell and reading their updated logs.
- Killing particular batches and jobs and rerunning them if they fail or run long.
Education
IEI India
B.Tech
Mechanical Engineering
Simplilearn
Master's Program
Data Science and Business Analytics
Licenses & Certifications
Master's Program - Data Science & Business Analytics
Simplilearn, certified in collaboration with IBM