From the course: Big Data Analytics with Hadoop and Apache Spark

The combined power of Spark and Hadoop Distributed File System (HDFS)

- [Kumaran Ponnambalam] Data engineers often use stacks to leverage the power of multiple technologies. For example, there is often a need not just for scalable storage but also for fast processing. Many teams find themselves using the combination of Hadoop for storage and Spark for compute, because it provides unparalleled scalability and performance for analytics pipelines. To harness this power, it is important to understand how Hadoop and Spark work with each other and to use the levers available.

My name is Kumaran Ponnambalam. In this course, I will show you how to build scalable, high-performance analytics pipelines with Apache Hadoop and Spark. I will discuss only the key tools and best practices for taking advantage of this combination. We will use a Hortonworks Sandbox for this course, along with Zeppelin notebooks for our examples.

You need prior familiarity with both Apache Hadoop and Spark, because this course focuses only on using the two together. If you want to learn the essentials of these technologies, please refer to other courses and resources. That being said, let's explore how to maximize the combined power of Hadoop and Spark.