Pavithra Sridhar
@pavithrasridhar
Data Engineer at Altimetrik India Private Limited
Chennai, TN
Pavithra is a Data Engineer with 3 years of experience designing and developing robust data pipelines. She has strong expertise in Big Data technologies, including Hadoop, Spark, Hive, and Sqoop, and uses Python and SQL for data processing. Her skills include translating complex business requirements into efficient data models and optimizing data workflows.
Experience
Data Engineer
Altimetrik India Private Limited
Worked closely with Operations to identify customer needs and demands. Designed and developed a data ingestion pipeline to extract data for insights. Translated business questions into quantitative queries and collected the necessary data. Extracted and analyzed data to identify key metrics and transform it into meaningful information. Collected, cleansed, and provided structured and unstructured data for business initiatives. Developed tables and views in Snowflake for use in visualization. Ingested data from multiple sources using a combination of SQL and Python to create data views for BI tools.
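A minimal sketch of the view-building pattern described above. Since the warehouse itself isn't shown here, SQLite stands in for Snowflake, and the table, view, and column names are illustrative assumptions, not the actual schema:

```python
import sqlite3

# In-memory SQLite stands in for the Snowflake warehouse (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        region   TEXT,
        amount   REAL
    )
""")
conn.executemany(
    "INSERT INTO orders (order_id, region, amount) VALUES (?, ?, ?)",
    [(1, "South", 120.0), (2, "South", 80.0), (3, "North", 50.0)],
)

# A view that aggregates raw rows into a BI-ready shape, analogous to the
# Snowflake views consumed by visualization tools.
conn.execute("""
    CREATE VIEW revenue_by_region AS
    SELECT region, SUM(amount) AS total_revenue, COUNT(*) AS order_count
    FROM orders
    GROUP BY region
""")

# Downstream BI tools would simply SELECT from the view.
revenue = {
    region: total
    for region, total, _count in conn.execute(
        "SELECT region, total_revenue, order_count FROM revenue_by_region"
    )
}
print(revenue)
```

The point of the pattern is that the aggregation logic lives in the view, so every BI dashboard queries one consistent definition instead of re-deriving it.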
Programmer Trainee
Cognizant Technology Solutions
Monitored critical and daily production jobs and handled abends as they occurred. Managed scheduling, holding, and re-running jobs per developer requests. Ensured higher-priority issues were resolved by the respective Core Teams/Development within the agreed timeframe. Provided daily status reports for critical, abended, and daily production jobs. Performed and managed the weekly, monthly, and bi-monthly production releases.
Product Engineer
Fintuple Technologies Private Limited
Designed and developed a web crawling pipeline using Scrapy to extract data for each index. Designed an ingestion layer to load crawled raw data into HDFS. Designed and implemented various preprocessing modules using PySpark. Designed a data warehouse using Hive, and created and managed Hive tables in Hadoop. Implemented a data export module to move processed data from Hadoop to a relational database (MySQL) using Sqoop. Worked on optimization tasks to improve Hive query and Spark performance. Actively involved in data validation testing to verify the correctness of processed and crawled data. Managed a standalone Spark cluster and an on-premises Big Data ecosystem. Designed and built web crawling modules with Scrapy to extract stock price data and download sector-based PDFs. Developed a module to load crawled data and data extracted from PDFs into HDFS. Designed and implemented a PDF content extraction module using Camelot and loaded the extracted data into the appropriate Hive tables. Worked on a data processing module to process extracted data using PySpark. Implemented an export module that pushes processed data from Hive to a relational database using Sqoop for Web UI access. Implemented a data validation module that verifies processed data by extracting it from MySQL using SQLAlchemy and validating it automatically (data quality checks).
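The automated data-quality check at the end of the pipeline above can be sketched roughly like this. sqlite3 stands in for the MySQL-plus-SQLAlchemy setup, and the table name, columns, and rules are illustrative assumptions rather than the actual schema:

```python
import sqlite3

# SQLite stands in for the MySQL target reached via SQLAlchemy (illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE stock_prices (
        symbol      TEXT,
        trade_date  TEXT,
        close_price REAL
    )
""")
conn.executemany(
    "INSERT INTO stock_prices VALUES (?, ?, ?)",
    [
        ("INFY",  "2024-01-02", 1520.5),
        ("TCS",   "2024-01-02", 3701.0),
        ("TCS",   "2024-01-03", None),   # missing price -> should be flagged
        ("WIPRO", "2024-01-03", -4.0),   # negative price -> should be flagged
    ],
)

def run_quality_checks(conn):
    """Run each rule against the exported table; return (rule, bad_row_count)
    pairs for rules that any row fails."""
    rules = {
        "close_price is NULL": "close_price IS NULL",
        "close_price <= 0":    "close_price <= 0",
    }
    failures = []
    for name, predicate in rules.items():
        (count,) = conn.execute(
            f"SELECT COUNT(*) FROM stock_prices WHERE {predicate}"
        ).fetchone()
        if count:
            failures.append((name, count))
    return failures

failures = run_quality_checks(conn)
print(failures)
```

Expressing each rule as a SQL predicate keeps the checks declarative: new rules are one line each, and the same loop reports every violation count after each pipeline run.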
Education
Ethiraj College For Women
Bachelor of Computer Applications
Computer Applications
Licenses & Certifications
Azure DevOps Boards for Project Managers/Analyst/Developers course
Udemy
BigData Analysis: Hive, Spark SQL, DataFrames course
Coursera
Learning PySpark course
Udemy
Data Analysis using Pyspark
Coursera
BigData Essentials: HDFS, MapReduce and Spark RDD course
Coursera
Data Structures and Algorithms
Coursera
Object Oriented Programming in Python
Coursera
Design Patterns with Python
PluralSight