Kishan Singh
@kishansingh
Senior Data Engineer at Gspann Technology
Gurgaon, Haryana
Kishan Singh is an experienced Senior Data Engineer with expertise in building and maintaining robust data pipelines. He has extensive experience utilizing technologies like Apache Spark, Hive, AWS (S3, EMR, Redshift), and Kafka to process large datasets and create data lakes. His background includes developing data ingestion frameworks and leading technical efforts across various projects, ensuring high data quality for analytics and ML teams.
Experience
Senior Data Engineer
Gspann Technology
Maintained and enhanced existing email campaign pipelines, delivering optimal solutions in Scala and Spark. Used Spark to process large datasets, loading the final data to S3 and Redshift to build a data lake for the business and as a source for the ML team. Created a processed layer using Spark SQL to maintain high-quality data usable by analytics, data researchers, scientists, and the ML team. Used Kafka and Spark to publish messages to a Kafka topic for PN. Created and modified pipelines to schedule batch jobs using AWS Data Pipeline and crontab. Moved all jobs from Arch (unsecured domain) to Decaf (secure zone). Mentored juniors and interns in their day-to-day work, bringing them up to speed to work independently on any technology. Developed unit test cases for Spark applications. Performed log analysis and monitoring using Splunk dashboards. Integrated Sonar to check code quality. Prepared test plans and testing strategies for big data applications. Collaborated with clients and developers to prepare test plans and scripts, producing high-quality software applications.
Senior Data Engineer
IRIS software
Developed a POC to compare processing times across Oracle, Hive, and Spark. Created pipelines to load data from Oracle into the Hadoop environment using Sqoop or Spark, analyzed and tested the results, and produced audit reports on a daily basis.
Data Engineer
Intellicus Technologies
Created a data ingestion framework for different sources, such as files and relational databases, using Spark (Scala). Created a data lake for the Kyvos feed using Hive/Spark. Designed solutions for building robust, flexible data pipelines so that data can be fed to Kyvos. Built the data flow process and a data lake management platform. Understood business requirements and the underlying data, and delivered solutions using Kyvos at minimum cost and in minimum time. Supported different workflows (Oozie and Control-M), including bug fixing, analysis, and resolution of job failures. Led the offshore team technically and functionally to understand business requirements from clients and solve those problems using Kyvos.
Data Engineer
Phenom People
Built and maintained a data acquisition system for different sources such as file systems, APIs, and databases. Built the data flow process and a data lake management platform on HDFS using Hadoop tools such as Spark and Hive. Created a processed layer using Spark SQL to maintain high-quality data usable by analytics, data researchers, and scientists. Developed Spark applications per requirements and ran them on EMR clusters. Understood source data and converted it into a standard format. Owned the release plan and production deployment for every release. Automated the DNA standardisation process for 80–90% of the data. Developed, implemented, supported, and maintained data analytics protocols, standards, and documentation. Communicated new or updated data requirements to the global team.
Senior Data Engineer
Datametica Solution Pvt Ltd
Requirements gathering and analysis. Migrated data from Oracle, SQL Server, Netezza, etc. using Sqoop to create a data lake on Hadoop infrastructure. Analyzed source data and worked on data cleansing (removing null values) and target table structures using Hive and Pig. Translated complex functional and business requirements into technical requirements. Wrote Hive scripts and automated them in Oozie workflows for business reporting. Prepared functional documents and documents of understanding for engineering and support services teams. Developed, implemented, supported, and maintained data analytics protocols, standards, and documentation. Project 2: Office Depot, clickstream and coverage analytics (data lake creation). Requirements gathering and analysis. Analyzed source data and worked on data cleansing (removing null values) and target table structures using Hive. Created frameworks to unload DB2 data to HDFS files (fixed-width, delimited, etc.). Created a generic framework for data ingestion from different relational databases (Oracle, SQL Server, Netezza). After data ingestion into the Hadoop data lake, used Hive and Impala to analyze the data.
Systems Engineer
TCS
Education
SIRT - Bhopal, MP
Bachelor of Engineering
Electrical, Electronics and Communications Engineering