
Acquire Data Using CLI and Flume

Course Road Map

Module 1: Big Data Management System
Module 2: Data Acquisition and Storage
    Lesson 5: Introduction to the Hadoop Distributed File System (HDFS)
    Lesson 6: Acquire Data Using CLI, Fuse-DFS, and Flume
    Lesson 7: Acquire and Access Data Using Oracle NoSQL Database
    Lesson 8: Primary Administrative Tasks for Oracle NoSQL Database
Module 3: Data Access and Processing
Module 4: Data Unification and Analysis
Module 5: Using and Managing Oracle Big Data Appliance

6-2
Objectives

After completing this lesson, you should be able to:


• Describe uses of the command-line interface (CLI)
• Define Flume
• Describe the data-flow mechanism of Flume
• Identify the options for configuring Flume

6-3
Viewing File System Contents Using the CLI
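
The listing commands from this slide did not survive in the text, so the following is only a minimal sketch of how HDFS contents are commonly viewed with the `hadoop fs` shell. The directory `/user/oracle` and the `run` helper are assumptions made for illustration. By default the script prints the commands instead of executing them, so it can be read without a cluster.

```shell
#!/bin/sh
# DRY_RUN=1 (default) prints each command; set DRY_RUN=0 to run against a cluster.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print or execute a command depending on DRY_RUN.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical HDFS directory; substitute a path that exists on your cluster.
HDFS_DIR="${HDFS_DIR:-/user/oracle}"

view_hdfs() {
  run hadoop fs -ls "$1"       # list the entries in one directory
  run hadoop fs -ls -R "$1"    # list the directory tree recursively
  run hadoop fs -du -h "$1"    # show space used per entry, human readable
}

view_hdfs "$HDFS_DIR"
```

Run with `DRY_RUN=0` on a machine that has the Hadoop client configured to execute the commands for real.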

6-4
Loading Data Using the CLI

Put files into HDFS:


$ hadoop fs -put ?site.xml
*site.xml /u01/bigdatasql_config/bigdatalite
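
As a hedged illustration of the load step, the sketch below creates a target directory, copies one local file into it with `-put`, and lists the directory to confirm the copy. The file name `core-site.xml`, the target `/user/oracle/config`, and the `run` helper are assumptions, not values taken from the slide. The script defaults to printing the commands rather than running them.

```shell
#!/bin/sh
# DRY_RUN=1 (default) prints each command; set DRY_RUN=0 to run against a cluster.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print or execute a command depending on DRY_RUN.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical local file and HDFS target directory.
LOCAL_FILE="${LOCAL_FILE:-core-site.xml}"
HDFS_TARGET="${HDFS_TARGET:-/user/oracle/config}"

load_into_hdfs() {
  run hadoop fs -mkdir -p "$2" &&   # create the target directory if needed
  run hadoop fs -put "$1" "$2" &&   # copy the local file into HDFS
  run hadoop fs -ls "$2"            # confirm that the file arrived
}

load_into_hdfs "$LOCAL_FILE" "$HDFS_TARGET"
```

Chaining the steps with `&&` stops the sequence at the first failure, so the final listing only runs when the copy succeeded.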

6-5
What is Flume?

• Is a distributed service for collecting, aggregating, and moving large amounts of data to a centralized data store
• Is an open source project of the Apache Software Foundation
• Has the following features:
– Simple
– Reliable
– Fault tolerant and highly available
– Used for online analytic applications

6-6
Flume: Architecture

A sink is responsible for delivering the event to the next agent or terminal repository (like HDFS) in the flow.

Sink types:
• logger
• avro
• hdfs
• file_roll
• org.apache.flume.sink.kafka.KafkaSink

Diagram: Web Server -> Agent (Source -> Channel -> Sink) -> HDFS

6-7
Flume Channels (Hold Events)

• Memory channel
• JDBC channel
• File channel
• Custom channel

Diagram: Web Server -> Agent (Source -> Channel -> Sink) -> HDFS
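
To make the channel choices concrete, here is a small sketch that prints the properties for one channel of a given type. The agent name `a1`, the channel name `c1`, the capacities, and the `/var/flume/...` directories are assumptions for illustration. The property keys are those of the standard Flume memory, file, and JDBC channels.

```shell
#!/bin/sh
# Print channel properties for a hypothetical agent "a1" with channel "c1".
channel_conf() {
  case "$1" in
    memory)
      # Fast, but queued events are lost if the agent process dies.
      cat <<'EOF'
a1.channels = c1
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
EOF
      ;;
    file)
      # Durable: events are persisted to local disk (example paths).
      cat <<'EOF'
a1.channels = c1
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /var/flume/checkpoint
a1.channels.c1.dataDirs = /var/flume/data
EOF
      ;;
    jdbc)
      # Durable: events are persisted in a database.
      cat <<'EOF'
a1.channels = c1
a1.channels.c1.type = jdbc
EOF
      ;;
    *)
      echo "unknown channel type: $1" >&2
      return 1
      ;;
  esac
}

channel_conf memory
```

Calling `channel_conf file` instead prints the file channel variant; the trade-off between the two is speed versus durability of queued events.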

6-8
Flume: Data Flows

HTTPD -> 1. Agent -> 2. Processor -> 3. Collector -> HDFS

1. Agent (Source -> Sink)
   Source: tail Apache HTTPD logs
   Sink: downstream processor node
2. Processor (Source -> Regex -> Sink)
   Source: upstream agent node
   Regex: extract the browser name from the log string and attach it to the event
   Sink: downstream collector node
3. Collector (Source -> Sink)
   Source: upstream processor node
   Sink: hdfs://namenode/weblogs/%{browser}/
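
The three-tier flow can be approximated in Flume NG properties as sketched below. This is an illustration under stated assumptions, not the configuration behind the slide: the agent names (`agent`, `processor`, `collector`), host names, port 4545, the log path, and the browser regex are all hypothetical. Tiers are linked with Avro sinks and sources, and the `regex_extractor` interceptor stores its match in a `browser` header that the HDFS sink path refers to as `%{browser}`.

```shell
#!/bin/sh
# Print a three-agent Flume configuration (all names and hosts are examples).
flow_conf() {
  cat <<'EOF'
# Tier 1: tail the web server log and forward events to the processor.
agent.sources = tail
agent.channels = ch
agent.sinks = toProcessor
agent.sources.tail.type = exec
agent.sources.tail.command = tail -F /var/log/httpd/access_log
agent.sources.tail.channels = ch
agent.channels.ch.type = memory
agent.sinks.toProcessor.type = avro
agent.sinks.toProcessor.hostname = processor-host
agent.sinks.toProcessor.port = 4545
agent.sinks.toProcessor.channel = ch

# Tier 2: extract the browser name into an event header, then forward.
processor.sources = fromAgent
processor.channels = ch
processor.sinks = toCollector
processor.sources.fromAgent.type = avro
processor.sources.fromAgent.bind = 0.0.0.0
processor.sources.fromAgent.port = 4545
processor.sources.fromAgent.channels = ch
processor.sources.fromAgent.interceptors = browser
processor.sources.fromAgent.interceptors.browser.type = regex_extractor
processor.sources.fromAgent.interceptors.browser.regex = (Firefox|Chrome|Safari|MSIE)
processor.sources.fromAgent.interceptors.browser.serializers = s1
processor.sources.fromAgent.interceptors.browser.serializers.s1.name = browser
processor.channels.ch.type = memory
processor.sinks.toCollector.type = avro
processor.sinks.toCollector.hostname = collector-host
processor.sinks.toCollector.port = 4545
processor.sinks.toCollector.channel = ch

# Tier 3: write events to HDFS, one directory per browser header value.
collector.sources = fromProcessor
collector.channels = ch
collector.sinks = hdfsOut
collector.sources.fromProcessor.type = avro
collector.sources.fromProcessor.bind = 0.0.0.0
collector.sources.fromProcessor.port = 4545
collector.sources.fromProcessor.channels = ch
collector.channels.ch.type = memory
collector.sinks.hdfsOut.type = hdfs
collector.sinks.hdfsOut.hdfs.path = hdfs://namenode/weblogs/%{browser}/
collector.sinks.hdfsOut.hdfs.fileType = DataStream
collector.sinks.hdfsOut.channel = ch
EOF
}

flow_conf
```

Each tier would run as its own `flume-ng agent` process, started with the matching agent name. Events that do not match the regex carry no `browser` header, so a production setup would need to decide how to route them.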

6-9
Configuring Flume

1. Create a configuration file (flume.conf).


2. Store the file in the flume-ng/conf directory.
3. Configure individual components.
4. (Optional) Edit flume-env.sh.
5. Verify the installation by running the following command:
$ flume-ng help
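
The verification and start-up steps can be sketched as a short script. The agent name `a1` and the `run` helper are assumptions; the configuration directory defaults to the `flume-ng/conf` location named in step 2. The script prints the commands by default instead of executing them.

```shell
#!/bin/sh
# DRY_RUN=1 (default) prints each command; set DRY_RUN=0 to execute it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print or execute a command depending on DRY_RUN.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Directory holding flume.conf and flume-env.sh (from step 2 above).
FLUME_CONF_DIR="${FLUME_CONF_DIR:-flume-ng/conf}"
# Hypothetical agent name; it must match the prefix used inside flume.conf.
AGENT_NAME="${AGENT_NAME:-a1}"

# Step 5: confirm that the flume-ng launcher is installed and on the PATH.
verify_install() {
  run flume-ng help
}

# Start one agent with the configuration file created in step 1.
start_agent() {
  run flume-ng agent \
    --conf "$FLUME_CONF_DIR" \
    --conf-file "$FLUME_CONF_DIR/flume.conf" \
    --name "$1" \
    -Dflume.root.logger=INFO,console
}

verify_install && start_agent "$AGENT_NAME"
```

The `-Dflume.root.logger=INFO,console` option sends the agent's log output to the terminal, which is convenient while testing a new configuration.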

6 - 10
Exploring a flume*.conf File
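
The file shown on this slide is not reproduced in the text. As a stand-in, the sketch below prints a minimal single-agent configuration in the usual layout: name the components, configure each one, then bind the source and sink to the channel. The agent name `a1`, the component names `r1`, `c1`, and `k1`, and the netcat port are assumptions for illustration.

```shell
#!/bin/sh
# Print a minimal flume.conf for a hypothetical agent named "a1".
example_conf() {
  cat <<'EOF'
# Name the components of this agent.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: listen for lines of text on a local TCP port.
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Channel: buffer events in memory.
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Sink: write events to the agent log.
a1.sinks.k1.type = logger

# Bind the source and the sink to the channel.
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
EOF
}

example_conf
```

Note that a source takes `channels` (plural) because it can feed several channels, while a sink takes `channel` (singular) because it drains exactly one.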

6 - 11
Additional Resources

• http://flume.apache.org/index.html

6 - 12
Summary

In this lesson, you should have learned to:


• Describe uses of the command-line interface (CLI)
• Define Flume
• Describe the data-flow mechanism of Flume
• Identify the options for configuring Flume

6 - 13
