Professional Summary
Experience working with major Hadoop distributions such as Cloudera CDH 5.x and
Hortonworks HDP 2.2 and above
Worked on Cloudera Impala and Apache Spark for real-time analytical processing
Experience in writing Pig and Hive scripts and extending their core functionality by
writing custom UDFs (a sketch follows this list)
Good knowledge of file formats like SequenceFile, RCFile, ORC, and Parquet, and
compression techniques like Gzip, Snappy, and LZO
Worked on Apache Flume for collecting and aggregating large amounts of log data
and storing it in HDFS for further analysis
Experience in writing both time-driven and data-driven workflows using Oozie
Hands-on experience with Spark SQL queries and DataFrames: importing data from
data sources, performing transformations and read/write operations, and saving the
results to an output directory in HDFS (see the sketches after this list)
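
For illustration, a minimal sketch of a custom Hive UDF of the kind referenced above, written here in Scala; the class name and normalization behavior are hypothetical rather than taken from a specific project:

import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical UDF: trims and upper-cases a string column.
// Hive discovers the evaluate method by reflection.
class NormalizeText extends UDF {
  def evaluate(input: Text): Text =
    if (input == null) null
    else new Text(input.toString.trim.toUpperCase)
}

Packaged into a JAR, such a class is registered in Hive with ADD JAR followed by CREATE TEMPORARY FUNCTION before it can be used in queries.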
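
And a minimal sketch of the Spark SQL/DataFrame flow above, using the Spark 2.x SparkSession entry point (on Spark 1.x it would be SQLContext); the paths, column names, and aggregation are hypothetical:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sum}

object SalesEtl {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("sales-etl").getOrCreate()

    // Import: read raw CSV from a data source in HDFS.
    val sales = spark.read.option("header", "true").csv("hdfs:///raw/sales")

    // Transform: cast the amount column and aggregate revenue per store.
    val byStore = sales
      .withColumn("amount", col("amount").cast("double"))
      .groupBy("store_id")
      .agg(sum("amount").as("revenue"))

    // Save the results to an output directory in HDFS.
    byStore.write.mode("overwrite").parquet("hdfs:///out/revenue_by_store")

    spark.stop()
  }
}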
PROFESSIONAL EXPERIENCE
Client: AT&T Inc:
Role: Hadoop Consultant
Description: Due to high maintenance costs and low performance, AT&T began migrating
their traditional data from an RDBMS (Oracle) to HDFS. Moreover, as the volume of data
grows day by day, they wanted to improve the performance of data analysis.
• Played a senior Hadoop developer role and was involved in all phases of the project,
from POCs through implementation.
• Involved in data migration using Sqoop with JDBC drivers for Oracle and IBM DB2
connectors.
• Exported the analyzed data to relational databases using Sqoop for visualization and
report generation for the BI team.
• Created a data model for structuring and storing the data efficiently. Implemented
partitioning of tables in HBase.
• Involved in creating Hive tables, loading data, and writing Hive queries that run
internally as MapReduce jobs.
• Worked with various Hadoop file formats, including ORC and Parquet.
• Involved in integration of Hive and HBase.
• Tested Apache Tez, an extensible framework for building high-performance batch and
interactive data processing applications, on Hive jobs.
• Wrote a Java API for transactions on HBase tables (see the client-call sketch after
this list) and was involved in building Oozie workflows.
• Wrote Scala code in Spark to read files using XML parsing (POC; see the sketch after this list).
• Designed and documented standard operating procedures using Confluence.
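
A minimal sketch of the HBase client calls behind the transaction work above. The original API was written in Java; Scala is used here for consistency with the other examples, and the table, column family, and row key are hypothetical:

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
import org.apache.hadoop.hbase.util.Bytes

object HBaseSketch {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()
    val connection = ConnectionFactory.createConnection(conf)
    val table = connection.getTable(TableName.valueOf("customers"))
    try {
      // Write one cell: row key -> column family "info", qualifier "name".
      val put = new Put(Bytes.toBytes("cust-001"))
      put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Alice"))
      table.put(put)

      // Read the same cell back.
      val result = table.get(new Get(Bytes.toBytes("cust-001")))
      println(Bytes.toString(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))))
    } finally {
      table.close()
      connection.close()
    }
  }
}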
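
And a POC-style sketch of the XML reading in Spark with Scala, assuming one self-contained XML record per input line; the record fields and paths are hypothetical:

import org.apache.spark.sql.SparkSession
import scala.xml.XML

object XmlParsePoc {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("xml-parse-poc").getOrCreate()

    // Each line is assumed to hold one complete XML record.
    val lines = spark.sparkContext.textFile("hdfs:///raw/xml")

    // Parse with scala.xml and extract two fields per record.
    val parsed = lines.map { line =>
      val rec = XML.loadString(line)
      ((rec \ "id").text, (rec \ "value").text)
    }

    parsed.saveAsTextFile("hdfs:///out/xml-parsed")
    spark.stop()
  }
}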
Environment: Hortonworks, Hadoop, Hive, Sqoop, HBase, MapReduce, HDFS, Pig,
Tez, Cassandra, Java, Oracle 11g/10g, FileZilla, Unix Shell Scripting, Spark, Scala
• Involved in copying data generated by various telematics devices to HDFS for further
processing using Flume.
• Loaded data from the Linux file system to HDFS and created a separate directory for
every four-hour window.
• Used Oozie and ZooKeeper operational services for coordinating the cluster and
scheduling workflows.
• Modeled Impala partitions extensively to separate data for faster processing, following
best practices for tuning.
• Loaded data from different data sources (Teradata, DB2) into HDFS using Sqoop and
loaded it into partitioned Impala tables.
• Stored customer data in HBase for further transactions and historical trip data in
Impala.
• Hands-on experience exporting results into relational databases using Sqoop for
visualization and for generating reports for the BI team using MSTR.
Client: IRC (Retail)
Role: Hadoop Consultant
Description: In this project, we collect sales data to analyze and generate reports that
help drive business growth.
• Loaded data from the Linux file system to HDFS using a shell script.
• Wrote MapReduce code to convert unstructured data into structured data and to
insert data from HDFS into HBase (see the mapper sketch after this list).
• Performed data analysis on large datasets of product, period, store, and sales data.
• Used Eclipse for writing code and Git for version control.
• Involved in creating Impala tables, loading them with data, and writing Impala queries
for real-time analytical processing.
• Monitored the health of MapReduce programs running on the cluster.
• Developed Impala queries to process the data and generate data cubes for
visualization and reporting.
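
A minimal sketch of the structuring step described above, as a Hadoop MapReduce mapper. It is written in Scala for consistency with the earlier examples (such jobs are more commonly written in Java), and the pipe-delimited field layout is hypothetical:

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.Mapper

// Map-only step: turn raw delimited lines into structured key/value pairs
// that a later step can load into HBase.
class StructureMapper extends Mapper[LongWritable, Text, Text, Text] {
  private val outKey = new Text()
  private val outValue = new Text()

  override def map(key: LongWritable, value: Text,
                   context: Mapper[LongWritable, Text, Text, Text]#Context): Unit = {
    val fields = value.toString.split('|') // split(Char) avoids the regex pitfall of "|"
    if (fields.length >= 3) {
      outKey.set(fields(0))                       // e.g. store id as the key
      outValue.set(fields(1) + "\t" + fields(2))  // remaining fields, tab-separated
      context.write(outKey, outValue)
    }
  }
}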
Environment: Hadoop, Linux, CDH4, CDH5, MapReduce, HDFS, Impala, Pig, Shell
Scripting, Java, NoSQL, Eclipse, Oracle, Git.
Academic Details:
Bachelor's in Computer Science Engineering, GPA: 3.82/4.0
JNT University, Hyderabad, INDIA
Certifications
Oracle Certified Professional (Java SE 6 Programmer)