
Vijaya Rama Krishna

[email protected]
(407) 476-6397

Summary:

• Over 5 years of experience as a Data Engineer with demonstrated expertise in building and deploying data pipelines using open-source Hadoop-based technologies such as Apache Spark, Hive, Hadoop, Python, and PySpark.
• Hands-on experience in developing Spark applications using PySpark DataFrames, RDDs, and Spark SQL.
• Worked with GCP services including Cloud Storage, Dataproc, Dataflow, BigQuery, Cloud Composer, and Cloud Pub/Sub.
• Expert in using Cloud Pub/Sub to replicate data in real time from source systems to BigQuery.
• Good knowledge of GCP service accounts, billing projects, authorized views, datasets, GCS buckets, and gsutil commands.
• Experienced in building and deploying Spark applications on Hortonworks Data Platform and AWS EMR.
• Experienced in working with AWS services such as EMR, S3, EC2, IAM, Lambda, CloudFormation, and CloudWatch.
• Worked on structured and semi-structured data storage formats such as Parquet, ORC, CSV, and JSON.
• Developed automation scripts using the AWS Python Boto3 SDK.
• Proficient in developing UNIX shell scripts.
• Experienced in working with the Snowflake cloud data warehouse and Snowflake data modeling.
• Built ELT workflows using Python and Snowflake COPY utilities to load data into Snowflake (see the sketch after this list).
• Experienced in working with the RDBMS databases Oracle and SQL Server.
• Developed complex SQL queries and performed SQL query performance tuning.
• Experienced in working with CI/CD pipelines such as Jenkins and Bamboo.
• Experienced in working with the source code management tools Git and Bitbucket.
• Worked on application monitoring tools such as Splunk, Elasticsearch, Logstash, and Kibana for application logging and monitoring.
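
A minimal sketch of the Snowflake ELT pattern mentioned above, assuming the snowflake-connector-python package, an internal table stage, and placeholder connection details, file path, and table name:

    # Minimal ELT sketch: stage a local extract, then bulk-load it with COPY INTO.
    # Credentials, account, file path, and table name are illustrative placeholders.
    import snowflake.connector

    conn = snowflake.connector.connect(
        user="ETL_USER",
        password="********",
        account="my_account",
        warehouse="ETL_WH",
        database="ANALYTICS",
        schema="RAW",
    )
    cur = conn.cursor()
    try:
        # Upload the file to the table's internal stage, then COPY it into the table.
        cur.execute("PUT file:///tmp/orders.csv @%ORDERS")
        cur.execute(
            "COPY INTO ORDERS FROM @%ORDERS "
            "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)"
        )
    finally:
        cur.close()
        conn.close()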

Certification:

Google Cloud Platform Certified Professional Data Engineer.


PROFESSIONAL EXPERIENCE:

GCP Data Engineer


Citi Bank, Tampa, FL | Jul 2020 – Present
Responsibilities:

• Built multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP, and coordinated tasks among the team.
• Designed and implemented the various layers of the data lake and designed star schemas in BigQuery.
• Used Cloud Functions with Python to load data into BigQuery for CSV files arriving in GCS buckets (illustrated in the sketch after this list).
• Processed and loaded bounded and unbounded data from Cloud Pub/Sub topics into BigQuery using Cloud Dataflow with Python.
• Designed pipelines with Apache Beam, Kubeflow, and Dataflow, and orchestrated jobs in GCP.
• Developed and demonstrated a POC to migrate on-prem workloads to Google Cloud Platform using GCS, BigQuery, Cloud SQL, and Cloud Dataproc.
• Documented the inventory of modules, infrastructure, storage, and components of the existing on-prem data warehouse for analysis, and identified the technologies and strategies required for the Google Cloud migration.
• Designed, developed, and implemented performant ETL pipelines using the Python API of Apache Spark (PySpark).
• Worked on a GCP POC to migrate data and applications from on-prem to Google Cloud.
• Exposure to IAM roles in GCP.
• Created firewall rules to access Dataproc clusters from other machines.
• Set up GCP firewall rules to control ingress and egress traffic to and from VM instances based on the specified configuration, and used Cloud CDN (content delivery network) to deliver content from GCP cache locations, drastically improving user experience and latency.
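
A rough sketch of the GCS-to-BigQuery load described above, assuming a Cloud Function on a GCS finalize trigger and the google-cloud-bigquery client; the dataset and table names are placeholders:

    # Cloud Function (GCS finalize trigger) that loads a newly arrived CSV into BigQuery.
    # The target table is a placeholder; schema autodetection is assumed.
    from google.cloud import bigquery

    def load_csv_to_bq(event, context):
        bucket = event["bucket"]
        name = event["name"]
        if not name.endswith(".csv"):
            return  # ignore non-CSV objects

        client = bigquery.Client()
        job_config = bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.CSV,
            skip_leading_rows=1,
            autodetect=True,
            write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
        )
        load_job = client.load_table_from_uri(
            f"gs://{bucket}/{name}", "my_dataset.landing_table", job_config=job_config
        )
        load_job.result()  # wait for the load job to finish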

Environment: GCP, Cloud SQL, BigQuery, Cloud Dataproc, GCS, Cloud Composer, Informatica PowerCenter 10.1, Talend 6.4 for Big Data, Hadoop, Hive, Teradata, SAS, Spark, Python, Java, SQL Server.

Data Engineer
PWC | Tampa, FL | Dec 2019 – Jun 2020
Project: US-ASR NQO Consult & Advisory
Responsibilities:

• Implemented scalable infrastructure and a platform for large-scale data ingestion, aggregation, integration, and analytics in Hadoop using Spark and Hive.
• Developed streamlined workflows using high-performance API services dealing with large amounts of structured and unstructured data.
• Developed Spark jobs in Python to perform data transformations, creating DataFrames and using Spark SQL.
• Processed unstructured data in JSON format into structured data in Parquet format by performing several transformations using PySpark (see the sketch after this list).
• Developed Spark applications using Spark libraries to perform ETL transformations, thereby eliminating the need for separate ETL tools.
• Developed end-to-end data pipelines in Spark using Python to ingest, transform, and analyze data.
• Created Hive tables using HiveQL, loaded the data into the Hive tables, and analyzed the data by developing Hive queries.
• Created and executed unit test cases to validate that transformations and processing functions worked as expected.
• Scheduled multiple jobs using the Control-M workflow engine.
• Wrote shell scripts to automate application deployments.
• Implemented solutions to switch schemas based on dates so that the transformations would be automated.
• Developed custom functions and UDFs in Python to incorporate methods and functionality into Spark.
• Developed data validation scripts in Hive and Spark and performed validation using Jupyter Notebook by spinning up a query cluster on AWS EMR.
• Executed Hadoop and Spark jobs on AWS EMR using data stored in Amazon S3.
• Implemented Spark RDD transformations to map business analysis and applied actions on top of the transformations.
• Worked with data serialization formats for converting complex objects into sequences of bits using the Parquet, Avro, JSON, and CSV formats.
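
A brief PySpark sketch of the JSON-to-Parquet transformation referenced above; the input/output paths and column names are placeholders:

    # Read semi-structured JSON, apply simple transformations, write partitioned Parquet.
    # Paths and column names are illustrative placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("json_to_parquet").getOrCreate()

    raw = spark.read.json("s3://my-bucket/raw/events/")

    cleaned = (
        raw.filter(F.col("event_type").isNotNull())          # drop malformed records
           .withColumn("event_date", F.to_date("event_ts"))  # derive a partition column
           .select("event_id", "event_type", "event_date", "payload")
    )

    (cleaned.write.mode("overwrite")
            .partitionBy("event_date")
            .parquet("s3://my-bucket/curated/events/"))

    spark.stop()
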
Environment: Hadoop, Hive, Zookeeper, Sqoop, Spark, Control-M, Python, Bamboo, SQL, Bitbucket, AWS, Linux.

Data Engineer
Thomson Reuters | India | Jun 2016 – Jan 2019
Responsibilities:

• Created and managed nodes that utilize Java JARs, Python, and shell scripts for scheduling jobs to customize data ingestion.
• Developed Pig scripts for change data capture and delta record processing between newly arrived data and data already existing in HDFS.
• Developed and performed Sqoop imports from Oracle to load data into HDFS.
• Created partitions and buckets based on state to further process the data using bucket-based Hive joins.
• Created Hive tables to store the processed results in tabular format.
• Scheduled MapReduce jobs in the production environment using the Oozie scheduler and Autosys.
• Developed Kafka producers and brokers for message handling (a producer sketch follows this list).
• Imported data into Hadoop using Kafka and implemented Oozie jobs for daily imports.
• Configured a Kafka ingestion pipeline to transmit logs from the web servers to Hadoop.
• Worked on POCs for stream processing using Apache NiFi.
• Worked on Hortonworks Hadoop solutions with real-time streaming using Apache NiFi.
• Analyzed Hadoop logs using Pig scripts to identify errors introduced by the team.
• Performed MySQL queries for efficient retrieval of ingested data using MySQL Workbench.
• Implemented data ingestion and transformation using automated Oozie workflows.
• Created and generated audit reports to flag security threats and track all user activity using various Hadoop components.
• Designed various plots showing HDFS analytics and other operations performed on the environment.
• Worked with the infrastructure team on testing the environment after patches, upgrades, and migrations.
• Developed multiple Java scripts for delivering end-to-end support while maintaining product integrity.
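
A small producer sketch illustrating the Kafka ingestion above (the original producers may well have been written in Java; Python with the kafka-python package is used here to match the other examples, and the broker list and topic name are placeholders):

    # Minimal Kafka producer: publish a JSON-encoded web-server log record to a topic.
    # Broker addresses and topic name are illustrative placeholders.
    import json
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=["broker1:9092", "broker2:9092"],
        value_serializer=lambda record: json.dumps(record).encode("utf-8"),
    )

    # In practice this would run inside a log-tailing/ingestion loop.
    producer.send("weblogs", {"host": "web01", "status": 200, "path": "/index.html"})
    producer.flush()
    producer.close()
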
Environment: HDFS, Hive, MapReduce, Pig, Spark, Kafka, Sqoop, Scala, Oozie, Maven, GitHub, Java,
Python, MySQL, Linux.

Junior Data Engineer Intern


Analytics Quotient Services India Pvt. Ltd. | India | Aug 2015 – May 2016
Responsibilities:
• Designed and implemented code changes in existing modules (Java, Python, shell scripts) for enhancements.
• Designed the user interface and the business logic for customer registration and maintenance.
• Integrated web services and worked with data across different servers.
• Involved in the design and development of SOA services using web services.
• Created, developed, modified, and maintained database objects, PL/SQL packages, functions, stored procedures, triggers, views, and materialized views to extract data from different sources.
• Extracted data from various locations and loaded it into Oracle tables using SQL*Loader.
• Developed PL/SQL procedures and UNIX scripts for automating UNIX jobs and running files in batch mode.
• Using Informatica PowerCenter Designer, analyzed the source data and extracted and transformed it from various source systems (Oracle, SQL Server, and flat files) by incorporating business rules using the different objects and functions that the tool supports.
• Used Oracle OLTP databases as one of the main sources for ETL processing.
• Managed the ETL process by pulling large volumes of data from various data sources into the staging database using BCP from MS Access and Excel.
• Responsible for detecting errors in ETL operations and rectifying them.
• Incorporated error redirection during ETL loads in SSIS packages.
• Implemented various types of SSIS transformations in packages, including Aggregate, Fuzzy Lookup, Conditional Split, Row Count, and Derived Column.
• Implemented the Master-Child Package technique to manage large ETL projects efficiently.
• Involved in unit testing and system testing of the ETL process.

Environment: MS SQL Server, SQL, SSIS, MySQL, Unix, Oracle, Java, Python, Shell.
