MAHESH PADEKAR

@maheshpadekar

Big Data Engineer

Thane, India

https://www.linkedin.com/in/mahesh-padekar-2890b5127/

SpiderWeb InfoTech PVT. LTD

Mumbai University

Mahesh Padekar is a Big Data Engineer with over 2 years of experience in the Hadoop ecosystem, including HDFS, Spark, Hive, and MapReduce. He specializes in loading large datasets from RDBMS sources into HDFS with Sqoop and in transforming and processing that data with Spark Core and Spark SQL. He also works with AWS services such as EMR, Glue, and S3, and with performance tuning of data warehousing solutions.

Experience

Big Data Developer

SpiderWeb InfoTech PVT. LTD

Mar 2020 - Present

Maintained day-to-day data in a Hive data warehouse fed from multiple source systems. Wrote Sqoop jobs to transfer data from SQL Server and Oracle databases to HDFS. Loaded processed data into Hive external tables, applying partitioning and bucketing with ORC and Parquet formats to improve performance. Scheduled Sqoop jobs using Airflow. Monitored the SQL database to confirm that daily ingestion jobs succeeded, and debugged issues. Ran an analytics pipeline that aggregates and joins multiple daily ingestion tables. Scheduled jobs and transferred final output data into Redshift for reporting. Created cluster groups in EMR for running the daily ingestion and analytics pipeline jobs.

Education

Mumbai University

B.E.

Mechanical

Jan 2015

Licenses & Certifications

Big Data

No expiration

Skills

Hadoop

MapReduce

Linux

Hive

Sqoop

HBase

Spark

Scala

HDFS

SparkSQL

YARN

Oozie

AWS Glue

AWS Athena

AWS S3

Python

Boto3

Airflow

Redshift

ORC

Parquet