Making big data simple with Databricks

We are Databricks, the company behind Spark.
• Founded by the creators of Apache Spark in 2013
• 75% of Spark code contributed by Databricks in 2014
• Created Databricks on top of Spark to make big data simple

WORKING WITH BIG DATA IS DIFFICULT

"Through 2017, 60% of big-data projects will fail to go beyond piloting and experimentation and will be abandoned."
Gartner

PROBLEM

Building infrastructure and data pipelines is complex.

Your difficult journey to finding value in data

1. Building a cluster
2. Import and explore data with disparate tools: data exploration, ETL, data warehousing, advanced analytics, dashboards & reports
3. Build and deploy data applications: production deployment

The result: long delays and high costs.

3 main causes of this problem:

Infrastructure is complex to build and maintain
• Expensive upfront investment
• Months to build
• Dedicated DevOps to operate

Tools are slow, clunky, and disparate
• Not user-friendly
• Long time to compute answers
• Lots of integration required

Re-engineering of prototypes for deployment
• Duplicated effort
• Complexity to achieve production quality

SOLUTION

Databricks is built on top of Spark to make big data simple.


A complete solution, from ingest to production

Instant and secure infrastructure, fast and easy-to-use tools in a single platform, and a seamless transition to production:
• Notebooks with one-click visualization
• Built-in ML and graph libraries
• Job scheduler
• Easy connection to diverse data sources
• Real-time query engine
• Customizable dashboards & 3rd-party apps

The result: short time to value and lower costs.

Four components of Databricks
Make Big Data simple

Cluster Manager: managed Spark clusters
• Easily provision clusters
• Harness the power of Spark
• Import data seamlessly

Notebooks & Dashboards: interactive workspace with notebooks
• Explore data and develop code in Java, Python, Scala, or SQL
• Collaborate with the entire team
• Point-and-click visualization
• Publish customized dashboards

Jobs: production pipeline scheduler
• Schedule production workflows
• Implement complete pipelines
• Monitor progress and results

3rd-Party Apps: third-party applications
• Connect powerful BI tools
• Leverage a growing ecosystem of applications

Databricks benefits

Higher productivity
• Maintenance-free infrastructure
• Real-time processing
• Easy-to-use tools

Faster deployment of data pipelines
• Zero-management Spark clusters
• Instant transition from prototype to production

Data democratization
• One shared repository
• Seamless collaboration
• Easy to build sophisticated dashboards and notebooks

A few examples of Databricks in action

Prepare data
• Import data using APIs or connectors
• Cleanse malformed data
• Aggregate data to create a data warehouse

Perform analytics
• Explore large data sets in real time
• Find hidden patterns with regression analysis
• Publish customized dashboards

Build data products
• Rapid prototyping
• Implement advanced analytics algorithms
• Create and monitor robust production pipelines

CUSTOMER CASE STUDIES

Customer testimonials

"Without Databricks and the real-time insights from Spark, we wouldn't be able to maintain our database at the pace needed for our customers."
Darian Shirazi, CEO, Radius Intelligence

"We condensed the 6 months we had planned for the initial prototype-to-production process to just about a couple of weeks with Databricks."
Rob Ferguson, Director of Engineering, Automatic Labs

"Databricks is used by over a third of our staff. After implementation, the amount of analysis performed has increased sixfold, meaning more questions are being asked and more hypotheses tested."
Jaka Jančar, CTO, Celtra

Radius Intelligence
Gathering customer insights for marketers

CHALLENGE: Complex data integration
• 25 million businesses
• Over 100 billion points of data

RESULT: Sped up the data pipeline
• Entire data set processed in hours instead of days
• Deploy weekly updates to customers instead of monthly

BENEFIT: Higher productivity

Automatic Labs
IoT for drivers: making car sensor data useful

CHALLENGE: Product idea validation
• Ingest billions of data points
• Rapidly test hypotheses
• Iterate on ideas in real time

RESULT: Shorter time from idea to product
• 3 weeks with Databricks vs. 2 months with the previous solution

BENEFIT: Faster deployment of data pipelines

Celtra
Building and serving rich digital ads across platforms

CHALLENGE: Analytics specialist bottleneck
• Billions of data points, the operational data of the entire company
• Huge backlog of analytics projects

RESULT: Enabled self-service for non-specialists
• Grew the number of analysts by 4x
• Increased analytics projects completed by 6x in four months

BENEFIT: Data democratization

Sharethrough
Intelligent ad placement

CHALLENGE: Slow performance, costly DevOps
• Terabyte-scale clickstream data, long delays in new-feature prototyping
• Two full-time engineers to maintain infrastructure

RESULT: Faster answers, zero management
• Prototyped new features in record time
• Reduced system downtime with faster root-cause analysis
• Dramatically easier to maintain than Hive

BENEFIT: Higher productivity

Yesware
Delivering tools and analytics for sales teams

CHALLENGE: Infrastructure pain, slow and clunky tools
• 6 months to set up Pig, with a very problematic pipeline
• Too slow to extend reporting history beyond 1 month
• Need to develop machine-learning algorithms

RESULT: Instant infrastructure, full suite of capabilities
• 3 weeks to set up a robust Spark pipeline with Databricks
• Doubled the data processed in a fraction of the time
• Built-in machine-learning libraries

BENEFIT: Faster deployment of data pipelines

A few of our customers

WHAT’S NEW?

What’s new with Databricks
• Databricks is now generally available (announced on June 15th, 2015)
• Upcoming features during the second half of 2015:
  • R-language notebooks: analyze large-scale data sets using R in the Databricks environment
  • Access control and private notebooks: manage permissions to view and execute code at an individual level
  • Version control: track changes to source code in the Databricks platform
  • Spark Streaming support: enabling fault-tolerant real-time processing

What’s new with Spark
• The general availability of Spark 1.4 was announced on June 10th, 2015
• Spark 1.4 is the largest Spark release to date: more than 220 contributors and 1,200 commits
• Key new features introduced in Spark 1.4:
  • New R language API (SparkR)
  • Expanded DataFrame API: window functions, statistical and mathematical functions, and support for missing data
  • API to build complete machine learning pipelines
  • UI visualizations for debugging and monitoring programs: interactive event timeline for jobs, DAG visualization, and visual monitoring for Spark Streaming

Data science made easy with Apache Spark
From ingest to production

✓ Unified
✓ Fast at any scale
✓ Flexible
✓ No lock-in
✓ Zero management
✓ Real-time and collaborative
✓ Instant to production
✓ Open and extensible

Benefits: higher productivity, faster deployment of data pipelines, and data democratization.

Databricks is available today

Contact [email protected], or sign up for a trial at https://databricks.com/registration

Thank you
