ANKIT BARSAINYA
@ankitbarsainya
Data Engineer
Pune
Ankit Barsainya is a Data Engineer with over 12 years of experience building and enhancing data products. He specializes in solving complex data challenges around scalability, real-time, and point-in-time analytics. His expertise includes migrating existing data processing frameworks and pipelines to ensure future readiness, and building complete data platforms from the ground up.
Experience
Data Engineer
Kantar
The primary objective was to migrate an existing container-based Spark application to Databricks Delta Lake on Azure, and to migrate the existing Airflow pipeline of interconnected jobs to take advantage of the scalability, version control, and related facilities provided by the cloud with minimal changes. Major accomplishments: Migrated the Viewership Generation & Attribution apps to use Delta Lake instead of dealing with schema-less, fixed-width flat files in each job. Led the migration of the legacy Spark-based application to Azure Databricks Delta Lake. Led the Airflow pipeline migration to use the updated application. Tech Stack: Delta Lake, Azure, Databricks, Java.
Product Engineer
DataStax
DataStax Enterprise (DSE) provides an integrated common platform for Cassandra, Spark, and Solr out of the box. OpsCenter is a DSE management and monitoring tool written in Python & Clojure. The primary objective was enhancement and maintenance of the codebase. Fixed multiple bugs of varying complexity and criticality, resulting in better customer engagement for DataStax. Picked up many internal tools on the fly and started contributing on short notice. Led the team from Persistent's end in a technical capacity. Streamlined developer onboarding by creating reference documents for the various repositories and collating everything in one place, resulting in a ~40% reduction in onboarding time. Tech Stack: Cassandra, OpsCenter, Java, Python.
Data Engineer
TaskRabbit
TaskRabbit (TR) is a handyman aggregator platform based in the US. TR had a home-grown NodeJS-based ETL tool which it needed to migrate away from in order to scale on demand and make use of Spark's task parallelization. The primary objective was to migrate the existing codebase to PySpark on AWS. Migrated NodeJS-based transformations to PySpark. Achieved Snowflake storage savings of ~70% on AWS by removing unnecessary UPDATE statements that were inflating Time Travel storage. Integrated DataDog with existing systems and established monitoring and alerts wherever necessary. Integrated structlog into the codebase for better APM via DataDog. Tech Stack: Spark, DataDog, Snowflake, Python.
Data Engineer
UBS
The objective was to migrate the Mainframe-based data processing framework for banking transactions & analysis into a Hadoop/Spark-based system without interrupting the existing pipeline. Primary accomplishments include: Created a standardised framework to handle schema creation, management, & purging. Created an automated framework for SCD Type II implementation. Converted business logic into complex SQL for Impala-based processing. Created a Spark/Kafka-based pipeline to process data & write to HBase. Tech Stack: Spark, Hive, Impala, Java, Shell Scripting.
Education
University Institute of Technology, RGPV
Bachelor of Engineering
Information Technology