Sourabh Khadiwala
@sourabhkhadiwala
Architect at Cognizant
Pune
Sourabh is a data warehousing and big data professional with over 11 years of experience providing solutions to complex business problems. He specializes in consulting and implementing big data solutions using platforms like Cloudera, Spark, and Kafka. His expertise spans ETL processes with Informatica, data virtualization using Denodo, and managing data governance initiatives. He has extensive experience in leading end-to-end big data projects within Agile frameworks.
Experience
Architect
Cognizant
Providing big data solutions and contributing to projects as an individual contributor. Designing and building data pipelines for ingesting data into the Data Hub, along with solutions for exposing that data to users. Running PoCs and PoTs for different tools and technologies. Connecting with data stewards and the data governance board to understand incoming data and identify sensitive data. Contributing to and driving RFP responses for different business groups. Creating Spark applications and Impala/Kudu SQL for data processing. Driving the cluster migration effort from CDH to CDP.
Tech Lead
Principal Global Services Pvt. Ltd
Designing and implementing solutions on big data technologies and exposing them to end users. Coordinating with architects on design reviews. Setting up Flume configurations to pull near-real-time updates from a messaging queue and push them to the data reservoir (HDFS). Encrypting personally identifiable information and personal health information using HP Voltage. Creating Sqoop jobs for the initial load of tables into the data reservoir (HDFS). Creating Hive tables and views on top of files placed in the data reservoir (HDFS). Developing Spark jobs as per requirements. Creating virtualized views from multiple sources (Oracle, DB2, Hive, files, etc.) as per business requirements. Providing demos to business users after every sprint to show the business value the data delivers. Interacting with business analysts and business partners to understand the actual use of the data, and helping them gather business requirements by providing technical details about the data sources.
Associate Consultant
Principal Global Services Pvt. Ltd
Technical Associate
Tech Mahindra Ltd
DTI (Data Transformation and Integration layer) is the central component between the Data Fabric layer and the Data Source layer, playing the critical role of updating the Data Fabric repositories with enterprise data from various ATT data sources. DTI uses best-of-breed technologies to extract data from a variety of source systems (relational, mainframe, flat files, etc.) in near real time where possible, or via a batch interface, and to update the Data Fabric repositories directly. Receiving real-time feeds from GoldenGate sources, loading them into DTI databases, applying business rules for transformation and integration, and loading the data in near real time into Data Fabric target databases using GoldenGate Change Data Capture Extracts and Replicats. Receiving batch feeds in various formats (flat file, mainframe EBCDIC, XML, etc.), identifying daily deltas, then integrating, transforming, and loading them into target databases using Informatica. The Call Center Performance Management program aggregates data from various systems into a comprehensive set of reporting tools to improve business operations and individual performance.
Education
Truba Institute of Engg. & Information Technology
Bachelor of Computer Science and Engineering
Graduated from Rajiv Gandhi University (Bhopal).