
Shekhar Nagle

Nagpur, India
+919766697897 | [email protected]

Introduction
Shekhar is an accomplished Data Engineer and Data Scientist. Apart from working full-time at Meriye LLC,
he also contributes to the data science community by running the full-fledged ‘Boffins Data Science
Academy’ in IT Park, Nagpur, India (www.boffinsacademy.com).

Technology Highlights
• Hadoop, MapReduce, Hive, Sqoop, Pig, HDFS; strong Python, Core Java, and Apache Spark
• EMC SAN Storage Operations (Symmetrix: DMX & VMAX)
• Red Hat Enterprise Linux administration, networking & Fibre Channel deployment

Certifications
1. Red Hat Enterprise Linux 6 System Administrator (Number: 120-185-236)
2. VMware Data Center Virtualization (VCP 5.1) (Candidate ID: VMW-01187270N-00364382 - Verification
Code: 11372426-86B4-3EAB47D1F12B)

Education
1. BTech Computer Science and Engineering from MPSTME NMIMS Mumbai – CGPA: 2.93/4
2. Diploma in Electronics & Telecommunication Engineering – 72%
3. SSC from Maharashtra State Board
________________________________________________

Professional Experience
Engineering Tech: Sr Data Scientist

Roles & Projects:


Machine Learning Engineer

Technologies: pandas, NumPy, seaborn, scikit-learn, Python, etc.


Environment: MySQL Workbench 5.7, Python 3.6.3, Jupyter Notebook 5.5.0, Apache Spark 2.2, Tableau 10.4,
Hive 2.3.0, Sqoop 1.4.6, Hadoop 2.7.1, JIRA

• Used Logistic Regression to predict slot availability based on various parameters (see the sketch after this list)
• Collaborated with the business analyst on project requirements and explored data from the database using SQL queries, search techniques, web services, etc.
• Prepared data using dimensionality-reduction techniques (PCA, t-SNE) to reduce the number of features, and cleaned the data using Python libraries
• Applied advanced statistical techniques (Bayesian methods, sampling, and experimental design) while running machine learning algorithms on heterogeneous data
• Used advanced analytical tools and programming languages such as Python (NumPy, pandas, SciPy) for data analysis
• Constructed and evaluated various datasets by building machine learning models with algorithms and statistical modelling techniques such as clustering, classification, regression, decision trees, support vector machines, anomaly detection, sequential pattern discovery, and text mining from Python libraries (scikit-learn)
• Performed post-pruning in machine learning to reduce the complexity of the final classifier, improving predictive performance by reducing overfitting, using Python libraries (scikit-learn)
• Performed predictive analytics and machine learning, especially supervised (SVM, Logistic Regression, boosting), unsupervised (K-Means, LDA, EM), and ensemble (Random Forests) methods
• Iterated as required to improve the scalability, reliability, and performance of our streaming data pipelines, which were built on top of Spark
• Imported and exported data between various RDBMSs such as MySQL, Oracle, and mainframes and the Hadoop Distributed File System (HDFS): transformed the data in Hadoop MapReduce and exported it back into an RDBMS using Sqoop
• Worked with the Hadoop ecosystem, including HDFS, MapReduce, Hive, and Spark, to manage data processing and storage for big data applications running on clustered systems
• Read data in different formats: API (JSON), XML, CSV, Rich Text Format (.rtf), Open Document Text (.odt), HTML (.htm, .html), Parquet, and Avro
• Deployed the Spark ecosystem, including Spark SQL, Spark DataFrames, MLlib, GraphX, Spark Streaming, and the Spark Core API, which increase productivity and can be seamlessly combined into complex workflows
• Visualized graphs and reports on datasets using the matplotlib, seaborn, and pandas packages in Python to identify missing values, outliers, and correlations between features for analytical models
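
Below is a minimal sketch of the kind of pipeline described above: PCA for dimensionality reduction feeding a logistic-regression classifier in scikit-learn. The feature matrix and slot-availability labels are synthetic stand-ins, not the project's actual data.

    # Illustrative pipeline: PCA -> Logistic Regression (scikit-learn).
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 20))           # 500 samples, 20 raw features
    y = (X[:, 0] + X[:, 1] > 0).astype(int)  # toy "slot available" label

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = Pipeline([
        ("pca", PCA(n_components=5)),        # reduce 20 features to 5
        ("clf", LogisticRegression()),
    ])
    model.fit(X_train, y_train)
    print("test accuracy:", model.score(X_test, y_test))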

Apache Spark and Hadoop Developer


Hadoop Project:

Cluster Specifications
• Data Size: 10 TB
• Cluster Architecture: fully distributed
• Package Used: CDH4
• Cluster Capacity: 39.5 TB (~40 TB)
• No. of Nodes: 26 DataNodes + 3 masters + NFS backup for the NameNode

MapReduce programming for various use cases; installing and maintaining a Hadoop cluster
• Trained peers and got them acquainted with Hadoop, HDFS, MapReduce, Sqoop, Pig, and Hive
- Analyzed Big Data requirements and translated them into Hadoop-centric technologies
- Worked on setting up the Hadoop cluster and administering it
- Worked on pulling data from a MySQL database into HDFS using Sqoop (see the sketch after this list)
- Provided ad-hoc queries and data metrics to business users using Hive and Pig
- Participated in developing capacity plans for new and existing Hadoop systems
• Installed Hadoop clusters and related applications; installed a pseudo-node cluster and a multi-node cluster with Hive, Pig, HBase, Flume, and Sqoop
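
A minimal sketch of the Sqoop pull described above, driven from Python via subprocess; the host, database, table, and paths are hypothetical placeholders, not values from the actual project.

    # Import a MySQL table into HDFS with Sqoop (illustrative values).
    import subprocess

    cmd = [
        "sqoop", "import",
        "--connect", "jdbc:mysql://db-host/sales",  # hypothetical database
        "--username", "etl_user",
        "--password-file", "/user/etl/.sqoop.pw",   # avoid inline passwords
        "--table", "orders",
        "--target-dir", "/data/raw/orders",         # HDFS destination
        "--num-mappers", "4",                       # parallel map tasks
    ]
    subprocess.run(cmd, check=True)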

Hadoop Cluster Deployment, Administration


• Installed Hadoop clusters and related applications and certified them; installed a multi-node, fully distributed cluster with MapReduce, Hive, Pig, HBase, Flume, and Sqoop
• Configured property files such as core-site.xml, hdfs-site.xml, and mapred-site.xml based on the job requirements (see the sketch after this list)
• Enabled the Hadoop task-monitoring feature; designed, developed, and managed data on the Hadoop cluster
• Commissioned and decommissioned nodes on the cluster, rebalanced data, and managed the cluster's nodes
• Performed end-to-end performance tuning of Hadoop clusters and Hadoop MapReduce routines against very large data sets
• Provided HDFS support and maintenance, maintained backups of the NameNode, and recovered from NameNode failures
• Set up a standby NameNode, performed DataNode scanning, and maintained and monitored clusters; loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop
• Performed Hadoop performance tuning and monitoring (tool: Ganglia)
• Monitored Hadoop cluster job performance and capacity planning, as well as cluster connectivity and security
• Configured access control lists (ACLs) and SSH; excellent understanding of delegation tokens
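
A minimal sketch of generating one of the property files named above (hdfs-site.xml) from Python; the property names are real Hadoop settings, but the values and output path are illustrative assumptions.

    # Write a Hadoop-style <configuration> property file.
    import xml.etree.ElementTree as ET

    props = {
        "dfs.replication": "3",               # block replication factor
        "dfs.namenode.name.dir": "/data/nn",  # NameNode metadata directory
    }

    config = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(config, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value

    ET.ElementTree(config).write("hdfs-site.xml", encoding="utf-8",
                                 xml_declaration=True)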

High Availability
• Fair Scheduler
• Load balancing of data across the cluster and performance tuning of various jobs running on the cluster
• Implementation of Kerberos authentication (cross-realm authentication)
• Backup and Disaster Recovery (BDR)
• Cloudera Management Services - health checks/alerts
• Navigator integration (data audit and access control)
• Importing data from MySQL to HDFS using Sqoop

Solution Provided
MapReduce processing on a distributed framework can be achieved through Java, Pig, and Hive; Sqoop, a
map-only processor, is the easiest way to import the data from an RDBMS. A minimal sketch of the
map-reduce idea follows.
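
The project itself used Java, Pig, and Hive; purely as an illustration of the same map/reduce split, here is a word-count pair of Python scripts for Hadoop Streaming (an assumption, not the project's actual code).

    # mapper.py - emit (word, 1) pairs, one per line.
    import sys

    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

    # reducer.py - sum counts per word; Streaming sorts by key, so runs
    # of the same word arrive contiguously.
    import sys

    current, total = None, 0
    for line in sys.stdin:
        line = line.rstrip("\n")
        if not line:
            continue
        word, count = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{total}")
            current, total = word, 0
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")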

JAVA/SQL DEVELOPER
• Created requirement, design, and change-control documents for the SDLC
• Extracted data from the warehouse to load into dimension and fact tables in a star schema
• Wrote SQL queries, stored procedures, functions, and triggers on MS SQL Server
• Object-oriented design and analysis
• Developed web applications using Java, JSP, servlets, MySQL, HTML, CSS, and AJAX
• Created a Java extension to convert files into other file formats
• Retrieved data from the database to create reports in Java using the Aspose.Cells and/or Aspose.PDF libraries
• Involved in the installation and configuration of SQL Server management tools using the SQL Server setup program
• Involved in Transact-SQL coding, writing queries, cursors, functions, and triggers as required
• Created and modified tables, stored procedures, views, indexes, user-defined functions, and triggers as required (see the sketch after this list)
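
A minimal sketch of calling one of the stored procedures described above from Python with mysql-connector-python; the connection details and procedure name (get_report_rows) are hypothetical.

    # Call a stored procedure and print its result set.
    import mysql.connector

    conn = mysql.connector.connect(
        host="localhost", user="report_user", password="...", database="reports"
    )
    cur = conn.cursor()
    cur.callproc("get_report_rows", ("2018-01-01",))  # one IN parameter
    for result in cur.stored_results():               # iterate result sets
        for row in result.fetchall():
            print(row)
    cur.close()
    conn.close()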

Redhat Enterprise Linux Experience


• Hands-on experience in systems setup, configuration, upgrades, maintenance, performance monitoring, and troubleshooting on Red Hat Linux 5.0/6
• Implemented and installed a backup server
• Experience supporting Apache in Linux environments
• Efficiently implemented network file sharing by configuring NFS to share files and resources across the network
• Scheduled various cron jobs for backups, databases, and proprietary jobs using cron, and troubleshot daily job problems
• Supported Apache web servers running virtual hosts, as well as Linux servers
• Experience installing Linux (RHEL) from ISO images in a VMware environment
• Performed Linux kernel and memory upgrades, managed swap areas, and performed Red Hat Linux Kickstart installations
• Installed and configured sudo access for users to obtain root privileges
• User administration and management, file permissions, and archiving
• Installed and administered NFS services using the automounter
• Regular disk management, such as adding/replacing hard drives on existing servers/workstations, partitioning according to requirements, creating new file systems or growing existing ones over the hard drives, and managing file systems
• Created users, assigned groups and home directories, set quotas and permissions; administered file systems and diagnosed file-access problems
• Created/managed file systems with Logical Volume Manager
• Configured file and printer servers
• SAMBA - installation and troubleshooting
• SQUID-3 - deployed a proxy server using Squid 3 to reduce network bandwidth usage by blocking unwanted sites
• Installed and configured SSH to enable secure access to the servers
• Network configuration of Ethernet, ifconfig, and networking tools
• Advanced package management tools such as RPM and YUM (able to set up and implement YUM servers and clients)
• Good with cut, grep, sed, and pipe commands
________________________________________________

Engineering Project
• Use Case: Twitter analysis using Flume, HDFS, and Hive (see the sketch below)
• Derived sentiments of Twitter users about a client's product
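
In the project, tweets were landed in HDFS via Flume and queried with Hive; purely as an illustrative sketch of the sentiment step, here is a tiny Python word-list scorer (the file path, field name, and word lists are hypothetical).

    # Score tweet sentiment with a small word list (illustrative only).
    import json

    POSITIVE = {"good", "great", "love", "awesome"}
    NEGATIVE = {"bad", "poor", "hate", "awful"}

    def score(text: str) -> int:
        words = text.lower().split()
        return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

    with open("tweets.json") as f:          # one JSON tweet per line
        for line in f:
            tweet = json.loads(line)
            print(score(tweet["text"]), tweet["text"][:60])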
________________________________________________

Contribution to the Data Science Community


Founder at Boffins Data Science Academy (www.boffinsacademy.com)

• Students, professionals, and aspirants in Nagpur are deprived of proper education and employment opportunities, from the university syllabus through to the availability of jobs in the region
• With Boffins, I now have the opportunity to educate Data Science aspirants and to bring them and corporate companies together on one platform
• The syllabus designed for the two programs, Data Engineer and Data Scientist, creates a unique talent pool that IT companies across India can hire from through in-campus interviews
