Learning Hunk
By Dmitry Anoshin and Sergey Sheypak
About this ebook
About This Book
- Explore your data in Hadoop and NoSQL data stores
- Create and optimize your reporting experience with advanced data visualizations and data analytics
- A comprehensive developer's guide that helps you create outstanding analytical solutions efficiently
Who This Book Is For
If you are a Hadoop developer who wants to build efficient real-time Operational Intelligence solutions based on Hadoop deployments or various NoSQL data stores using Hunk, this book is for you. Some familiarity with Splunk is assumed.
What You Will Learn
- Deploy and configure Hunk on top of Cloudera Hadoop
- Create and configure Virtual Indexes for datasets
- Make your data presentable using the wide variety of data visualization components and knowledge objects
- Design a data model using Hunk best practices
- Add more flexibility to your analytics solution via extended SDK and custom visualizations
- Discover data using MongoDB as a data source
- Integrate Hunk with AWS Elastic MapReduce to improve scalability
In Detail
Hunk is the big data analytics platform that lets you rapidly explore, analyse, and visualize data in Hadoop and NoSQL data stores. It provides a single, fluid user experience, designed to show you insights from your big data without the need for specialized skills, fixed schemas, or months of development. Hunk goes beyond typical data analysis methods and gives you the power to rapidly detect patterns and find anomalies across petabytes of raw data.
This book focuses on exploring, analysing, and visualizing big data in Hadoop and NoSQL data stores with this powerful full-featured big data analytics platform.
You will begin by learning the Hunk architecture and the Hunk Virtual Index before moving on to analyzing and visualizing data using the Splunk Search Processing Language (SPL). Next, you will meet Hunk Apps, which can easily integrate with NoSQL data stores such as MongoDB or Sqrrl. You will also discover Hunk knowledge objects, build a semantic layer on top of Hadoop, and explore data using the friendly user interface of Hunk Pivot. You will connect MongoDB and explore data in the data store. Finally, you will go through report acceleration techniques and analyze data in the AWS Cloud.
Style and approach
A step-by-step guide starting right from the basics and deep diving into the more advanced and technical aspects of Hunk.
Read more from Dmitry Anoshin
- Learning Hunk: A quick, practical guide to rapidly visualizing and analyzing your Hadoop data using Hunk
- Mastering Business Intelligence with MicroStrategy
- Tableau 2019.x Cookbook: Over 115 recipes to build end-to-end analytical solutions using Tableau
- SAP Lumira Essentials
- Azure Data Factory Cookbook: A data engineer's guide to building and managing ETL and ELT pipelines with data integration
Book preview
Learning Hunk - Dmitry Anoshin
Table of Contents
Learning Hunk
Credits
About the Authors
About the Reviewer
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Meet Hunk
Big data analytics
The big problem
The elegant solution
Supporting SPL
Intermediate results
Getting to know Hunk
Splunk versus Hunk
Hunk architecture
Connecting to Hadoop
Advanced Hunk deployment
Native versus virtual indexes
Native indexes
Virtual index
External result provider
Computation models
Data streaming
Data reporting
Mixed mode
Hunk security
One Hunk user to one Hadoop user
Many Hunk users to one Hadoop user
Hunk user(s) to the same Hadoop user with different queues
Setting up Hadoop
Starting and using a virtual machine with CDH5
SSH user
MySQL
Starting the VM and cluster in VirtualBox
Big data use case
Importing data from RDBMS to Hadoop using Sqoop
Telecommunications – SMS, Call, and Internet dataset from dandelion.eu
Milano grid map
CDR aggregated data import process
Periodical data import from MySQL using Sqoop and Oozie
Problems to solve
Summary
2. Explore Hadoop Data with Hunk
Setting up Hunk
Extracting Hunk to a VM
Setting up Hunk variables and configuration files
Running Hunk for the first time
Setting up a data provider and virtual index for CDR data
Setting up a connection to Hadoop
Setting up a virtual index for data stored in Hadoop
Accessing data through a virtual index
Exploring data
Creating reports
The top five browsers report
Top referrers
Site errors report
Creating alerts
Creating a dashboard
Controlling security with Hunk
The default Hadoop security
One Hunk user to one Hadoop user
Summary
3. Meeting Hunk Features
Knowledge objects
Field aliases
Calculated fields
Field extractions
Tags
Event type
Workflow actions
Macros
Data model
Add auto-extracting fields
Adding GeoIP attributes
Other ways to add attributes
Introducing Pivot
Summary
4. Adding Speed to Reports
Big data performance issues
Hunk report acceleration
Creating a virtual index
Streaming mode
Creating an acceleration search
What's going on in Hadoop?
Report acceleration summaries
Reviewing summary details
Managing report accelerations
Hunk accelerations limits
Summary
5. Customizing Hunk
What we are going to do with the Splunk SDK
Supported languages
Solving problems
REST API
The implementation plan
The conclusion
Dashboard customization using Splunk Web Framework
Functionality
A description of time-series aggregated CDR data
Source data
Creating a virtual index for Milano CDR
Creating a virtual index for the Milano grid
Creating a virtual index using sample data
Implementation
Querying the visualization
Downloading the application
Custom Google Maps
Page layout
Linear gradients and bins for the activity value
Custom map components
Other components
The final result
Summary
6. Discovering Hunk Integration Apps
What is Mongo?
Installation
Installing the Mongo app
Mongo provider
Creating a virtual index
Inputting data from the recommendation engine backend
Data schemas
Data mechanics
Counting by shop in a single collection
Counting events in all collections
Counting events in shops for observed days
Summary
7. Exploring Data in the Cloud
An introduction to Amazon EMR and S3
Amazon EMR
Setting up an Amazon EMR cluster
Amazon S3
S3 as a data provider for Hunk
The advantages of EMR and S3
Integrating Hunk with EMR and S3
Method 1: BYOL
Setting up the Hunk AMI
Adding a license
Configuring the data provider
Configuring a virtual index
Setting up a provider and virtual index in the configuration file
Exploring data
Method 2: Hunk–hourly pricing
Provisioning a Hunk instance using the Cloud formation template
Provisioning a Hunk instance using the EC2 Console
Converting Hunk from an hourly rate to a license
Summary
Index
Learning Hunk
Copyright © 2015 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: December 2015
Production reference: 1181215
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78217-482-0
www.packtpub.com
Credits
Authors
Dmitry Anoshin
Sergey Sheypak
Reviewers
Jigar Bhatt
Neil Mehta
Acquisition Editors
Hemal Desai
Reshma Raman
Content Development Editor
Anish Sukumaran
Technical Editor
Shivani Kiran Mistry
Copy Editor
Stephen Copestake
Project Coordinator
Izzat Contractor
Proofreader
Safis Editing
Indexer
Hemangini Bari
Graphics
Jason Monteiro
Production Coordinator
Nilesh Mohite
Cover Work
Nilesh Mohite
About the Authors
Dmitry Anoshin is a data-centric technologist and a recognized expert in building and implementing big data and analytics solutions. He has a successful track record when it comes to implementing business and digital intelligence projects in numerous industries, including retail, finance, marketing, and e-commerce.
Dmitry possesses in-depth knowledge of digital/business intelligence, ETL, data warehousing, and big data technologies. He has extensive experience in the data integration process and is proficient in using various data warehousing methodologies. Dmitry has consistently exceeded project expectations while working in the financial, machine tool, and retail industries.
He has completed a number of multinational full BI/DI solution life cycle implementation projects. With expertise in data modeling, Dmitry also has a background and business experience in multiple relational databases, OLAP systems, and NoSQL databases.
In addition, he has reviewed SAP BusinessObjects Reporting Cookbook, Creating Universes with SAP BusinessObjects, and Learning SAP BusinessObjects Dashboards, all published by Packt Publishing, and is the author of SAP Lumira Essentials, also from Packt Publishing.
I would like to tell my wife Sveta how much I love her. I dedicate this book to my wife and children, Vasily and Anna. Thank you for your never-ending support that keeps me going.
Sergey Sheypak started his so-called big data practice in 2010 as a Teradata PS consultant. He led the Teradata Master Data Management deployment at Sberbank, Russia (which has 110 million customers). Later, Sergey switched to AsterData and Hadoop practices. In 2012, Sergey joined the Research and Development team at MegaFon (one of the top three telecom companies in Russia, with 70 million customers). While leading the Hadoop team at MegaFon, Sergey built ETL processes from the existing Oracle DWH to HDFS. Automated end-to-end tests and acceptance tests were introduced as a mandatory part of the Hadoop development process. Scoring geospatial analysis systems based on specific telecom data were developed and launched. Sergey now works as an independent consultant in Sweden.
About the Reviewer
Jigar Bhatt is a computer engineering undergraduate from the National Institute of Technology, Surat. He specializes in big data technologies and has a deep interest in data science and machine learning. He has also engineered several cloud-based Android applications. He is currently working as a full-time software developer at a renowned start-up, focusing on building and optimizing cloud platforms and ensuring profitable business intelligence round the clock.
Apart from academics, he finds adventurous sports