Syed Atif Ali
@syed.atif-ali
Data Engineer at Capgemini
Navi Mumbai, India
Results-driven Data Engineer with about three years of experience, specializing in Spark, PySpark, Python, SQL, and Databricks. Proven expertise in developing and managing ETL pipelines, implementing business logic, and performing rigorous data quality checks. Skilled in automating deployments with Jenkins and managing code with Git/GitHub. Adept at understanding complex data models, creating comprehensive mapping documents, and keeping data processing workflows running smoothly.
Experience
Data Engineer (GAP)
Capgemini
Designed and developed end-to-end PySpark data pipelines that transform and integrate data into Delta tables. Designed and implemented a real-time pipeline that processes semi-structured data, ingesting raw records from source systems using Kafka and PySpark. Deployed data pipelines using Jenkins. Optimized Spark jobs for performance, reducing runtime by up to 40% and lowering resource consumption by 30%.
Data Engineer (Yahoo)
Capgemini
Led the conversion of legacy Hive ETL pipelines to PySpark. Optimized the performance of PySpark pipelines, reducing data processing times by up to 35%. Managed data extraction and cleaning processes. Deployed and managed ETL pipelines on AWS EMR.
Education
Sant Gadge Baba Amravati University
B.E.
Licenses & Certifications
AWS Certified Developer – Associate
Amazon Web Services
Microsoft Certified: Azure Data Fundamentals
Microsoft
Academy Accreditation - Databricks Lakehouse Fundamentals
Databricks