Smriti Jain
@sjain6700
Data Engineer at Tata Consultancy Services (TCS)
Noida, Uttar Pradesh, India
Data Engineer experienced in designing and orchestrating scalable ETL pipelines using PySpark, SQL, Airflow, Databricks, and AWS. Skilled in developing end-to-end data workflows with a focus on performance, reliability, and data integrity across systems.
Experience
Data Engineer
Tata Consultancy Services (TCS)
Modernized enterprise ETL and Data Quality (DQ) workflows after a Databricks cluster migration, enhancing scalability, governance, and reliability by over 40%.
Migrated 15+ Informatica workflows to Databricks SaaS, redesigning pipelines into modular Source → Canonical → Cleanse → DQ layers aligned with the Medallion Architecture.
Implemented Delta Lake with schema enforcement, ACID transactions, table versioning, and audit logging to ensure compliance, traceability, and simplified rollback management.
Built a scalable PySpark-based JSON ingestion framework that dynamically flattens nested data into parent–child tables, supporting flexible relational modeling and downstream analytics.
Optimized ingestion performance by 35% through partition pruning, salting, and balanced cluster resource utilization, minimizing shuffle overhead in shared-mode environments.
Summer Analyst Intern
Goldman Sachs
Automated 15+ manual processes, reducing operational effort by 40%.
Developed AutoSys jobs to streamline Java application scheduling, improving efficiency and timeliness.
Expanded dashboard functionality by adding 4 new monitoring flags, increasing usability for business teams.
Education
Pranveer Singh Institute of Technology
B.Tech.
CSE
Licenses & Certifications
AWS Certified Data Engineer – Associate
AWS
AWS Certified Cloud Practitioner
AWS
Web Development
Internshala