Data Engineer with 4 years of experience building and optimizing cloud-based data pipelines with Azure Data Factory (ADF), Azure Databricks, PySpark, and SQL. Skilled in ETL development, data modeling, and performance tuning for efficient data processing.
Experience
Data Engineer
IQVIA
Designed and developed scalable ETL pipelines using Azure Data Factory (ADF) and Azure Databricks for data ingestion and transformation. Built ADF pipelines with scheduled triggers and Mapping Data Flows, using Azure Key Vault for secure credential management. Implemented PySpark transformations to process large datasets efficiently in Delta Lake. Reduced ETL execution time by 80% through Spark optimizations and SQL tuning. Managed SQL Database and Azure Data Lake Storage (ADLS) resources for efficient data storage and retrieval. Developed data models and schema designs to support business reporting and analytics. Used Databricks Auto Loader for incremental, near-real-time ingestion from ADLS, improving pipeline efficiency. Integrated Unity Catalog for centralized data governance, access control, and lineage tracking across Azure Databricks environments. Automated data validation and monitoring with Python and Jira, reducing manual debugging effort by 50%. Troubleshot pipeline failures and performance bottlenecks, maintaining 99.9% uptime for critical workflows. Contributed to documentation and technical standards for data pipeline development.
Data Engineer
ATTRA InfoTech
Built ETL pipelines for structured and semi-structured data using Spark and SQL. Developed and optimized batch processing jobs in Azure Databricks for data transformation. Worked with SQL databases for querying, analysis, and performance tuning. Automated data pipelines using Apache Airflow. Migrated on-premises SQL Server data to Azure SQL Database and Azure Synapse Analytics, ensuring data consistency and optimized performance. Used Spark SQL to process data on the Spark engine, and explored Spark-based optimization of existing Hadoop algorithms using SparkContext, Spark SQL, and DataFrames. Used SonarQube to maintain code quality across data engineering projects. Collaborated with stakeholders to resolve data quality issues and improve reporting accuracy.
Education
GITAM University
Bachelor of Engineering
Computer Science
Licenses & Certifications
EMC Academic Associate, Data Science and Big Data Analytics
EMC
Masters, 6-Month Intensive Industry Big Data Program
TrendyTech