
A Convergence of Key Trends

• Companies have always kept large amounts of information


• But until recently, they stored most of that information on
tape
• While it’s true that the amount of data in the world keeps
growing, the real change has been in the ways that we
access that data and use it to create value
• Today, you have technologies like Hadoop, for example,
that make it functionally practical to access a tremendous
amount of data, and then extract value from it
• The availability of lower-cost hardware makes it easier and
more feasible to retrieve and process information, quickly
and at lower costs than ever before
• So it’s the convergence (combination) of several trends (more
data plus less expensive, faster hardware) that is driving
this transformation
• Today, we have raw speed at an affordable price: the
cost/benefit ratio has really been a game changer for us
• The two scenarios described by Lucas are not fantasies:
o Yesterday, the cost of real-time data analysis was
prohibitive (very high price)
o Today, real-time analytics have become affordable
• As a result, market-leading companies are already using
Big Data Analytics to improve sales revenue, increase …
• Before moving on, it’s worth repeating that not
all new Big Data technology is open source
• For example, SAP successfully entered the Big
Data market with SAP HANA, an in-memory
database platform for real-time analytics and
applications
• Products like SAP HANA are reminders that
suppliers of proprietary solutions, such as SAP,
SAS, Oracle, IBM, and Teradata, are playing, and
will obviously continue to play, significant roles
in the evolution of Big Data analytics
Relatively speaking
• Big Data, as you might expect, is a relative term
• Although many people define Big Data by
volume, definitions of Big Data that are based on
volume can be troublesome, since some people
define volume by the number of occurrences
(in database terminology, the rows in a table)
• Some people define volume based on the
number of interesting pieces of information for
each occurrence (in database terminology, the
columns in a table; in analytics terminology, the
features or dimensions), and some people define
volume by the combination of depth and width
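The three ways of measuring volume described above can be made concrete with a small sketch. The table and the helper name volume_measures are hypothetical and exist only for illustration: depth is the number of occurrences (rows), width is the number of features (columns), and the combined measure is taken here, as an assumption, to be their product.

```python
# Hypothetical toy table: each dict is one occurrence (row),
# each key is one feature (column). The values are made up.
rows = [
    {"customer": "A", "amount": 12.5, "region": "East"},
    {"customer": "B", "amount": 7.0, "region": "West"},
    {"customer": "C", "amount": 3.2, "region": "East"},
    {"customer": "A", "amount": 9.9, "region": "East"},
]


def volume_measures(table):
    """Return volume measured by depth, by width, and by both combined."""
    depth = len(table)  # number of occurrences
    width = len({key for row in table for key in row})  # distinct features
    return {"depth": depth, "width": width, "cells": depth * width}


measures = volume_measures(rows)
print(measures)
```

The same data set can look small by one measure and large by another, which is why volume-based definitions of Big Data are hard to compare.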
• The industry has an evolving definition of Big Data that
currently rests on three dimensions:
1. Volume
2. Variety
3. Velocity
• Data volume can be measured by the sheer quantity of
transactions, events, or amount of history that creates the data
volume, but the volume is often further exacerbated by the
attributes, dimensions, or predictive variables
• Typically, analytics have used smaller data sets called samples to
create predictive models
• Oftentimes, the business use case or predictive insight has been
severely blunted since the data volume has purposely been
limited due to storage or computational processing constraints
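A rough sketch of why limiting analysis to a sample can blunt an insight: when the event of interest is rare, a small sample may contain very few instances of it, or none. The population size, event count, and sample size below are invented for illustration, and a fixed seed is used only to make the run repeatable.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100,000 transactions, 100 of which are rare events (1)
population = [1] * 100 + [0] * 99_900
random.shuffle(population)

# A small sample, as would be used under storage or compute constraints
sample = random.sample(population, 500)

full_rate = sum(population) / len(population)  # rate computed on all the data
sample_rate = sum(sample) / len(sample)        # rate estimated from the sample

print(f"full data: {full_rate:.4%} of transactions are rare events")
print(f"sample   : {sample_rate:.4%} ({sum(sample)} rare events seen)")
```

With these made-up numbers, a 500-row sample is expected to contain only about 0.5 rare events on average, so the sample-based estimate is unstable, while the full data gives the rate directly.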
• Data variety is the assortment (mixture) of data
• Traditionally, data, especially operational data, is
“structured” as it is put into a database based on the
type of data (e.g., character, numeric, floating point)
• Over the past couple of decades, data has increasingly
become “unstructured” as the sources of data have
proliferated beyond operational applications
• Oftentimes text, audio, video, image, geospatial, and
Internet data (including click streams and log files) are
considered unstructured data
• However, since many of the sources of this data are
programs, the data is in actuality “semi-structured”
• Semi-structured data is often a combination of different types of
data that has some pattern or structure that is not as strictly
defined as structured data
• For example, call center logs may contain customer name + date
of call + complaint where the complaint information is
unstructured and not easily synthesized into a data store
• Data velocity is about the speed at which data is created,
accumulated, ingested, and processed
• The increasing pace of the world has put demands on businesses
to process information in real time or with near-real-time
responses
• This may mean that data is processed on the fly, while it is
“streaming” by, to make quick, real-time decisions, or it may be
that monthly batch processes are run interday to produce more
timely decisions
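The difference between batch and on-the-fly processing can be sketched as follows. This is a minimal illustration, not a real streaming framework: the function names and the readings are hypothetical. The batch version needs all the data before it can answer, while the streaming version yields an updated answer as each value arrives.

```python
from typing import Iterable, Iterator, List


def batch_average(events: List[float]) -> float:
    """Batch style: wait until all events are collected, then compute once."""
    return sum(events) / len(events)


def streaming_average(events: Iterable[float]) -> Iterator[float]:
    """Streaming style: update the result as each event arrives."""
    count = 0
    total = 0.0
    for value in events:
        count += 1
        total += value
        yield total / count  # an up-to-date answer is available immediately


readings = [10.0, 20.0, 30.0, 40.0]  # made-up sensor readings

running = list(streaming_average(readings))
print(running)                  # one running average per event
print(batch_average(readings))  # a single answer at the end
```

Both approaches agree once all the data has been seen; the streaming version simply makes intermediate results available earlier, which is what real-time decisions depend on.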
A Wider Variety of Data
• The variety of data sources continues to increase
• Traditionally, internally focused operational systems, such as
ERP (enterprise resource planning) and CRM (customer relationship
management) applications, were the major source of data used in
analytic processing
• Today, a wider variety of data sources is available, such as:
o Internet data (e.g., click stream, social media, social
networking links)
o Primary research (e.g., surveys, experiments, observations)
o Secondary research (e.g., competitive and marketplace data,
industry reports, consumer data, business data)
o Location data (e.g., mobile device data, geospatial data)
o Image data (e.g., video, satellite image, surveillance)
o Supply chain data (e.g., EDI, vendor catalogs and pricing,
quality information)
o Device data (e.g., sensors, PLCs, RF devices, LIMs, telemetry)
Expanding Universe of Unstructured Data
• Unstructured data is basically information that either
does not have a predefined data model or does not
fit well into a relational database
• Unstructured information is typically text heavy, but may
contain data such as dates, numbers, and facts as well
• The term semi-structured data is used to describe
structured data that does not fit into a formal structure
of data models
• However, semi-structured data does contain tags that
separate semantic elements, and these tags make it
possible to enforce hierarchies within the data

Cubing – Education usage

• Questions:
1. I want to remember …
2. Something I learned today
3. One word to sum up what I learned
4. Something I already knew
5. I’m still confused about …
6. An “aha” moment that I had today
• RANDOM.ORG offers true random numbers to anyone on the
Internet
• It is used mostly for holding drawings, lotteries, and sweepstakes,
to drive online games, and for scientific applications, art, and
music

• Big Data analytics uses a wide variety of advanced analytics, as listed
in Figure, to provide:

• Deeper insights
o Rather than looking at segments, classifications, regions,
groups, or other summary levels, you will have insights into all
the individuals, all the products, all the parts, all the events, all
the transactions, etc.
• Broader insights
o The world is complex: Operating a business in a global,
connected economy is very complex given constantly evolving
and changing conditions
o As humans, we simplify conditions so we can process events
and understand what is happening
o But our best-laid plans often go astray because of this
estimating or approximating
• Big Data analytics takes into account all the data, including new
data sources, to understand the complex, evolving, and
interrelated conditions and to produce more accurate insights
• Frictionless actions
o Increased reliability and accuracy that will allow the deeper and
broader insights to be automated into systematic actions
