Vani Singh

@Vanisingh

Data Engineer | Databricks | SQL | Python | PySpark | Snowflake | AWS | ETL

Noida, Uttar Pradesh, India

Accenture | JSS Academy Of Technical Education Noida

Data Engineer with 2.8 years of experience designing, building, and optimizing data pipelines. Experienced in cloud migration, transforming legacy data systems into scalable cloud architectures. Proficient in SQL, Python, PySpark, Databricks, Snowflake, AWS, data warehousing, and ETL workflows. Skilled in handling relational, geospatial, semi-structured, and streaming data.

Experience

Data Engineer

Accenture

Full-time | Aug 2023 - Present | Navi Mumbai, Maharashtra, India

• Migrated 4 legacy SAS-based data applications to a cloud-native Databricks–Snowflake–AWS architecture, enhancing scalability and reducing processing time by 60%.
• Designed and implemented end-to-end ETL pipelines for the Advertising Analytics team, integrating cross-source datasets to enable targeted ad campaigns, contributing to a 15% reduction in claim count.
• Developed a PySpark-based lightweight application to process coverage-level and state-level data, applying Spark performance tuning that improved runtime by 25%.
• Orchestrated multiple Databricks jobs using Databricks API, AWS Lambda, Step Functions, EventBridge, and SNS, automating workflow execution and achieving a 70% gain in operational efficiency.
• Initiated compute optimization efforts for Databricks and Snowflake, analyzing Spark UI and logs to identify performance bottlenecks, resulting in 11,500 DBUs saved annually.
• Delivered scalable, automated, and fault-tolerant solutions with scheduled data refreshes and monitoring.
• Developed Lambda-based automation to trigger and validate Snowflake SQL queries, streamlining data refresh cycles.
• Built a data validation and quality check framework using Python and PySpark, automating data profiling, anomaly detection, and visualization, reducing manual effort by 80% and increasing data reliability.
• Processed and transformed Geospatial datasets for property underwriting, managing 65 million records to support risk scoring and underwriting decision-making.
• Performed parallel runs between on-premises SAS systems and Databricks pipelines to validate output accuracy during migration, ensuring 99% consistency across platforms.
• Collaborated with cross-functional teams to document data workflows, define data lineage, and improve governance practices in the migration process.
• Supported production operations by troubleshooting ETL job failures, optimizing cluster configurations, and ensuring stable daily and monthly refreshes.
• Contributed to data pipeline enhancements, including partition pruning, caching strategies, and incremental load optimization to improve overall performance and cost efficiency.

Education

JSS Academy Of Technical Education Noida

Bachelor of Technology

Computer Science

Aug 2019 - Jun 2023 | Grade: 8.78

- Part of Google Developers Student Club (Developer Student Clubs is a Google Developers program for university students to help them learn and build together. DSC JSS Noida is a community of programmers, developers, and designers who grow their knowledge in a peer-to-peer learning environment and build solutions for local businesses and their community.)
- Part of Impetus Student Society (Impetus Student Society is a student body at the JSS Academy of Technical Education, Noida, that aims to turn good students into better professionals, providing the corporate world with quality professionals and making a student's induction into the industry much easier. Impetus takes the initiative to introduce today's academician to tomorrow's executive.)

Licenses & Certifications

Databricks Certified Data Engineer Associate

Databricks

Issued: May 2024 | Expires: May 2026

SnowPro Core Certification

Snowflake

Issued: Feb 2026 | Expires: Feb 2028

GitHub Copilot (GH-300)

Microsoft

Issued: Jan 2026 | Expires: Jan 2028

Skills

Databricks
PySpark
Snowflake
AWS
Python
Git
GitHub
SAS Enterprise Guide
Autosys
Pandas
NumPy
ETL
Data Warehouse