
Suhas Suresh

@suhassuresh

Lead Data Engineer

Bengaluru, IN

LinkedIn/sureshsuhas

Ernst & Young • University of Alabama in Huntsville

Suhas Suresh has a demonstrated history of evaluating systems and developing innovative processes to optimize information flow and storage. He has a track record of effective collaboration with cross-functional teams to align data strategies with business requirements. He is experienced in designing, deploying, and maintaining Big Data, MSSQL, Oracle, and NoSQL infrastructures that support large-volume, complex data transactions.

Experience

Lead Data Engineer

Ernst & Young

Sep 2020 - Present • Bengaluru, IN

- Spearheading a team of energetic junior data engineers handling all data ingestion requirements: effort estimation, framework design, and mentorship, serving as the Data Ingestion SME for a pharmaceutical client to enable data-driven decision making on their clinical trials.
- Developed a generic ETL framework capable of ingesting data from disparate sources (JDBC, REST API, SharePoint) into an S3 data lake, driven by configurations in a metastore (MySQL RDS) and Airflow variables.
- Enabled onboarding of new sources purely through configuration changes in RDS and Airflow with zero code change, leveraging AWS cloud services and Spark for the implementation and greatly increasing team bandwidth.
- Authored a dynamic Airflow DAG creation framework driven entirely by configurations captured in Airflow variables and the MySQL RDS metastore, with no coding required.
- Designed a generic framework for transient EMR clusters to balance data loads, enabling on-demand cluster creation by simply updating cluster configuration variables.
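The configuration-driven ingestion pattern described above can be sketched as follows. This is a minimal illustration, not the actual framework: the source names, step names, and target paths are all hypothetical, and in Airflow each generated spec would become an operator inside a dynamically built DAG.

```python
# Hypothetical metastore rows: each dict describes one ingestion source.
# Onboarding a new source is a row insert, not a code change.
SOURCE_CONFIGS = [
    {"source": "clinical_jdbc", "type": "jdbc",     "target": "s3://datalake/raw/clinical/"},
    {"source": "trials_api",    "type": "rest_api", "target": "s3://datalake/raw/trials/"},
]

def build_task_specs(configs):
    """Expand metastore rows into ingestion task specs (extract -> land -> validate)."""
    specs = []
    for cfg in configs:
        for step in ("extract", "land", "validate"):
            specs.append({
                "task_id": f"{cfg['source']}_{step}",
                "step": step,
                "target": cfg["target"],
            })
    return specs

# Tasks follow purely from configuration: two sources yield six task specs.
for spec in build_task_specs(SOURCE_CONFIGS):
    print(spec["task_id"])
```

The design choice worth noting is that the code never enumerates sources; it only interprets configuration, which is what makes zero-code onboarding possible.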

Senior Data Engineer

Quaero

May 2019 - Sep 2020 • Bengaluru, IN

- Developed a robust data lake on Hive with HDFS mounted on AWS S3 for a pioneering OTT platform client, provisioning seamless data flow from disparate sources into a unified 360° customer view that enabled the Data Science, Analytics, and Marketing teams to make sound business decisions.
- Designed a generic class of workflow packages in Python that can extract data from any external database (MySQL, SQL Server, MongoDB) to any file system (S3, UNC, SFTP).
- Migrated the on-premise Hive database with HDFS to Snowflake on AWS (mounted on S3) by designing PySpark scripts on Zeppelin notebooks.
- Designed a flattening mechanism in PySpark to convert JSON files into a tabular structure before staging them to Hive tables.
- Designed data pipelines using PySpark on EMR clusters to automate loading of various file formats into Hive, stored as Parquet files on S3 and consumed by Sisense (BI tool) for visualization.
- Designed and automated EMR cluster creation and termination scripts in Python, with bootstrap actions to install modules during cluster creation, reducing AWS billing cost by 20%.
- Implemented an autoscaling policy on the EMR cluster in Python to scale up and down using Spot Instances instead of On-Demand based on data-load volume, improving workflow efficiency by 35%.
- Set up CloudWatch alarms and SNS alerts for EMR cluster and S3 storage monitoring, significantly improving cluster efficiency.
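The JSON-flattening step above can be illustrated with a small stand-alone function. This is a plain-Python sketch of the logic only; the production mechanism described in the role used PySpark (e.g. nested-column selects and explodes) rather than this recursive helper, and the sample record is invented.

```python
def flatten(record, prefix=""):
    """Recursively flatten a nested JSON record into dotted column names,
    producing the tabular shape expected by a Hive staging table."""
    flat = {}
    for key, value in record.items():
        col = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=col + "."))
        else:
            flat[col] = value
    return flat

# A nested event record becomes one flat row keyed by dotted column names.
row = flatten({"user": {"id": 7, "geo": {"city": "Bengaluru"}}, "event": "play"})
print(row)  # -> {'user.id': 7, 'user.geo.city': 'Bengaluru', 'event': 'play'}
```

Dotted column names keep the provenance of each value visible, which makes the mapping back to the original nested schema unambiguous.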

Database Engineer

Moxie Marketing Services LLC

Nov 2017 - May 2019 • Bengaluru, IN

Data-driven, digital-first advertising and CRM agency.

- Transitioned over one hundred custom legacy surround-code routines to SSIS packages and SSRS reports.
- Automated deployment of millions of email programs for Verizon CRM by developing SSIS packages and stored procedures for standard ETL processing, minimizing manual intervention while ensuring high productivity and framework quality.
- Authored, redesigned, and automated the production of Parquet files from log files in an S3 bucket using Crawlers and AWS Glue, a stepping-stone project that won additional business.
- Automated the ETL process by writing Airflow DAGs (Directed Acyclic Graphs) in Python to ingest data into, and export data out of, Google BigQuery datasets into Salesforce Marketing Cloud for email deployment.
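The email ETL above is organized as a DAG of dependent tasks. As a hedged sketch with invented task names, the pipeline's shape and the scheduler's dependency ordering can be modeled in plain Python with a topological sort (Kahn's algorithm), which is the ordering guarantee Airflow provides:

```python
from collections import deque

# Hypothetical task graph: each task maps to the tasks it depends on.
TASKS = {
    "extract_bq":   [],             # pull datasets from Google BigQuery
    "transform":    ["extract_bq"], # standard ETL transformations
    "load_sfmc":    ["transform"],  # push to Salesforce Marketing Cloud
    "deploy_email": ["load_sfmc"],  # trigger the email program
}

def topo_order(tasks):
    """Return tasks in dependency order using Kahn's algorithm."""
    indegree = {t: len(deps) for t, deps in tasks.items()}
    dependents = {t: [] for t in tasks}
    for task, deps in tasks.items():
        for dep in deps:
            dependents[dep].append(task)
    queue = deque(t for t, n in indegree.items() if n == 0)
    order = []
    while queue:
        task = queue.popleft()
        order.append(task)
        for nxt in dependents[task]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    return order

print(topo_order(TASKS))  # -> ['extract_bq', 'transform', 'load_sfmc', 'deploy_email']
```

In Airflow itself, the same graph would be declared with operators and `>>` dependencies, and the scheduler would enforce this ordering.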

Application Developer

Oracle Financial Services Software Ltd

Oct 2012 - May 2015 • Bengaluru, IN

- Oracle Automated Testing Suite: built Oracle's first endeavor in automation testing, a milestone project that reduced incident count by 75% and increased the efficiency of fixes by 35%.
- Oracle FLEXCUBE for Microfinance: increased the customer base by 15% by creating a Microfinance module in which loans are sanctioned to a group of people.
- Supported end-of-day operations for branches during integration testing and resolved real-time issues throughout implementation.
- Developed grace-period functionality for Islamic banks, a precursor to Oracle's venture into Islamic banking.

Education

University of Alabama in Huntsville

Master's Degree In Computer Science

Jul 2015 - Aug 2017 • Grade: 4.0 CGPA

Visveswaraya Technological University

Bachelor of Computer Science and Engineering

Sep 2008 - Apr 2012 • Grade: 79.62%

Licenses & Certifications

Hands On Essentials - Data Warehouse

Snowflake

Issued Mar 2022 • No expiration

Skills

AWS
Python
PySpark
Airflow
SQL
ETL
Software development lifecycle