Kishan Singh

@kishansingh

Senior Data Engineer at Gspann Technology

Gurgaon, Haryana

Gspann Technology · TCS

Kishan Singh is an experienced Senior Data Engineer with expertise in building and maintaining robust data pipelines. He has extensive experience utilizing technologies like Apache Spark, Hive, AWS (S3, EMR, Redshift), and Kafka to process large datasets and create data lakes. His background includes developing data ingestion frameworks and leading technical efforts across various projects, ensuring high data quality for analytics and ML teams.

Experience

Senior Data Engineer

Gspann Technology

Mar 2020 - Present · Gurgaon, Haryana

Maintain and enhance existing email campaigns, modifying them as requirements change, using Scala and Spark; used Spark to process large datasets. Loaded final data into S3 and Redshift to build a data lake for business users and as a source for the ML team. Created a processed layer using Spark SQL to maintain high-quality data usable by analytics teams, data researchers, scientists, and the ML team. Used Kafka and Spark to publish messages to a Kafka topic for push notifications (PN). Created and modified pipelines to schedule batch jobs using AWS Data Pipeline and crontab. Moved all jobs from Arch (unsecured domain) to Decaf (secure zone). Mentored juniors and interns in their day-to-day work to bring them up to speed to work independently on any technology. Developed unit test cases for Spark applications. Performed log analysis and monitoring using Splunk dashboards. Integrated Sonar to check code quality. Prepared test plans and testing strategies for big data applications. Collaborated with clients and developers to prepare test plans and scripts for producing high-quality software applications.
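The "processed layer" idea described above can be sketched in plain Scala. This is a hypothetical illustration, not the actual pipeline: plain collections stand in for Spark DataFrames, and the record fields and cleaning rules are invented for the example.

```scala
// Illustrative sketch of a processed layer: raw campaign records are cleansed,
// normalized, and deduplicated before analytics/ML teams consume them.
// Field names and rules are assumptions, not the real schema.
case class RawEvent(userId: String, email: String, openedAt: String)
case class CleanEvent(userId: String, email: String, openedAt: String)

def toProcessedLayer(raw: Seq[RawEvent]): Seq[CleanEvent] =
  raw
    .filter(e => e.userId.nonEmpty && e.email.contains("@")) // drop unusable rows
    .map(e => CleanEvent(e.userId.trim, e.email.trim.toLowerCase, e.openedAt))
    .distinct                                                // dedupe repeats

val raw = Seq(
  RawEvent("u1", "A@Example.com ", "2020-03-01"),
  RawEvent("",   "bad",            "2020-03-01"), // dropped: empty user id
  RawEvent("u1", "a@example.com",  "2020-03-01")  // duplicate after cleaning
)
val processed = toProcessedLayer(raw)
```

In Spark the same filter/map/distinct shape applies to a DataFrame or Dataset; the point of a processed layer is that consumers only ever see the cleaned, deduplicated output.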

Senior Data Engineer

IRIS software

Mar 2019 - Mar 2020 · Haryana

Developed a POC to compare processing times across Oracle, Hive, and Spark. Created a pipeline to load data from Oracle into the Hadoop environment using Sqoop or Spark, analyzed and tested the results, and produced daily audit reports.
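A timing comparison like this POC can be sketched with a small harness that runs the same workload per engine and records elapsed time. The engines here are stub functions purely for illustration; in the real POC they would be Oracle, Hive, and Spark queries.

```scala
// Minimal timing harness: run a workload, return its label and elapsed millis.
def timeIt[T](label: String)(work: => T): (String, Long) = {
  val start = System.nanoTime()
  work // result discarded; we only measure elapsed time
  (label, (System.nanoTime() - start) / 1000000)
}

// Stub workloads standing in for the three engines under comparison.
val report = Seq(
  timeIt("oracle") { (1 to 1000).sum },
  timeIt("hive")   { (1 to 1000).map(_ * 2).sum },
  timeIt("spark")  { (1 to 1000).filter(_ % 2 == 0).sum }
)
```

Collecting such tuples daily is one simple way to feed the audit report mentioned above.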

Data Engineer

Intellicus Technologies

Jun 2018 - Mar 2019 · UP

Created a data ingestion framework for different sources, such as files and relational databases, using Spark and Scala. Created a data lake for the Kyvos feed using Hive and Spark. Designed solutions for building robust, flexible data pipelines so that data can be fed to Kyvos. Built the data flow process and data lake management platform. Analyzed business requirements and the underlying data to deliver a Kyvos-based solution at minimum cost and in minimum time. Supported different workflows (Oozie and Control-M), including bug fixing, analysis, and resolution of job failures. Led the offshore team technically and functionally, working with clients to understand business requirements and using Kyvos to solve them.
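A source-agnostic ingestion framework of the kind described above typically dispatches to a reader per source type. The following is a hypothetical sketch: the trait, reader names, and in-memory "sources" are invented, and a real implementation would wrap Spark readers or JDBC connections.

```scala
// Each source type implements one trait; the framework picks a reader
// from configuration. All names here are illustrative assumptions.
trait SourceReader { def read(): Seq[Map[String, String]] }

class FileReader(lines: Seq[String]) extends SourceReader {
  def read(): Seq[Map[String, String]] = lines.map(l => Map("line" -> l))
}

class JdbcReader(rows: Seq[Map[String, String]]) extends SourceReader {
  def read(): Seq[Map[String, String]] = rows
}

// Dispatch on a config value; fail loudly on unknown source types.
def readerFor(sourceType: String): SourceReader = sourceType match {
  case "file" => new FileReader(Seq("a", "b"))
  case "jdbc" => new JdbcReader(Seq(Map("id" -> "1")))
  case other  => sys.error(s"unknown source type: $other")
}

val records = readerFor("file").read()
```

The benefit of this shape is that adding a new source (e.g. an API feed) means adding one reader class, not touching the pipeline code.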

Data Engineer

Phenom People

Aug 2017 - Jun 2018 · Hyderabad, Telangana

Built and maintained a data acquisition system covering sources such as file systems, APIs, and databases. Built the data flow process and data lake management platform on HDFS using Hadoop tools such as Spark and Hive. Created a processed layer using Spark SQL to maintain high-quality data usable by analytics teams, data researchers, and scientists. Developed Spark applications based on requirements and ran them on EMR clusters. Analyzed source data and converted it into a standard format. Owned the release plan and production deployment for every release. Automated the DNA standardisation process to ensure 80-90% of data is standardised. Developed, implemented, supported, and maintained data analytics protocols, standards, and documentation. Communicated new or updated data requirements to the global team.
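Converting source data into a standard format, as mentioned above, usually means mapping heterogeneous field names from different sources onto one canonical schema. The mapping below is entirely invented for illustration.

```scala
// Hypothetical field mapping: different sources name the same attribute
// differently; standardize() projects records onto one canonical schema,
// silently dropping fields with no mapping.
val fieldMap = Map(
  "emp_name"   -> "name", "fullName"   -> "name",
  "emp_id"     -> "id",   "employeeId" -> "id"
)

def standardize(record: Map[String, String]): Map[String, String] =
  record.flatMap { case (k, v) => fieldMap.get(k).map(std => std -> v) }

// Two sources, two naming conventions, one standard output.
val fromSourceA = standardize(Map("emp_name" -> "Ada", "emp_id" -> "7"))
val fromSourceB = standardize(Map("fullName" -> "Ada", "employeeId" -> "7"))
```

Downstream consumers then only need to know the canonical names, regardless of which source a record came from.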

Senior Data Engineer

Datametica Solution Pvt Ltd

Jan 2015 - May 2017 · Pune, Maharashtra

Gathered and analyzed requirements. Migrated data from Oracle, SQL Server, Netezza, etc. using Sqoop to create a data lake on the Hadoop infrastructure. Analyzed source data, performed data cleansing (including removing null values), and designed target table structures using Hive and Pig. Translated complex functional and business requirements into technical requirements. Wrote Hive scripts and automated them in Oozie workflows for business reports. Prepared functional documents and documents of understanding for engineering and support services staff. Developed, implemented, supported, and maintained data analytics protocols, standards, and documentation.

Project 2: Office Depot - Clickstream and Coverage Analytics (data lake creation). Gathered and analyzed requirements. Analyzed source data, performed data cleansing (removing null values), and designed target table structures using Hive. Created frameworks to unload DB2 data to HDFS as fixed-width or delimited files. Created a generic framework for data ingestion from different relational databases (Oracle, SQL Server, Netezza). After ingestion into the Hadoop data lake, used Hive and Impala to analyze the data.
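The null-removal cleansing step described above can be sketched as follows. This is a minimal plain-Scala illustration, assuming rows are maps with optional values; in practice the same filter would run in Hive or Spark over the migrated tables.

```scala
// Drop rows that are missing any required column; keep only present values.
// Column names are illustrative, not the actual schema.
def cleanse(rows: Seq[Map[String, Option[String]]],
            required: Seq[String]): Seq[Map[String, String]] =
  rows.flatMap { row =>
    val kept = row.collect { case (k, Some(v)) => k -> v } // strip nulls
    if (required.forall(kept.contains)) Some(kept) else None
  }

val rows = Seq(
  Map("id" -> Option("1"), "name" -> Option("pen")),
  Map("id" -> Option("2"), "name" -> Option.empty[String]) // dropped: null name
)
val clean = cleanse(rows, Seq("id", "name"))
```

The equivalent Hive predicate would simply be `WHERE id IS NOT NULL AND name IS NOT NULL` on the staging table.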

Systems Engineer

TCS

Mar 2012 - Jan 2015 · Pune, Maharashtra

Education

SIRT - Bhopal, MP

Bachelor of Engineering

Electrical, Electronics and Communications Engineering

Jul 2007 - Jun 2011

Skills

Hadoop
SQL
Apache Spark
Hive
Sqoop
Impala
NoSQL (HBase, MongoDB)
Kafka
AWS (S3, EC2, EMR, Redshift, Data Pipeline, SNS)
Azure
Scala
Python
Shell Script
GCP
CI/CD (Jenkins, GitHub)
Data analysis
Leadership