Launch Spark on Windows - Simplified

using Kubernetes on Docker

Estimated reading time: 15 minutes, plus about an hour for the lab setup.

A couple of years back, setting up Spark on Windows was a big task. Thanks to the new world of containers (Docker) and orchestration (Kubernetes), it has become a piece of cake.

In the last couple of weeks, a few dear friends asked me whether there is a way to launch the Spark shell using Kubernetes or Docker, and that prompted me to write this article.

My answer is YES, there is. How? Simply by running Kubernetes on Docker.

Let's get started.

Step 1:

The first and most important step is to have the Docker engine (Docker Desktop) installed and running on your machine. I tried this setup on Windows 8 and Windows 10, and it worked exactly the same way on both. Of course, you will need to sign up for an account on https://hub.docker.com/

See here for setup instructions: https://docs.docker.com/docker-for-windows/install/

Once the setup is done, launch the Windows command prompt to run the commands given in the following steps.

Step 2: docker ps


By default, no containers are up and running. Our goal is to launch minikube on Docker and then use it to set up Spark.
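With nothing running, docker ps prints just the header row, which should look roughly like this:

CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES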

If you want to learn more about minikube, here is the link: https://kubernetes.io/docs/setup/learning-environment/minikube/

Step 3: minikube start
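If minikube picks a different driver on your machine (for example Hyper-V or VirtualBox), you can pin it to Docker explicitly. Note this assumes a recent minikube release; older releases used --vm-driver instead:

minikube start --driver=docker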


Step 4: docker ps

You should see the minikube container running.


Step 5: kubectl cluster-info
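As an extra sanity check, you can also list the cluster nodes; a single node named minikube in Ready status is what you should see:

kubectl get nodes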


Step 6: Create the deployments and services for the Spark master and worker

kubectl apply -f https://raw.githubusercontent.com/big-data-europe/docker-spark/master/k8s-spark-cluster.yaml
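Before moving on, it is worth confirming that the objects from the YAML came up. Assuming the file defines deployments and services for the master and worker (the spark-master service name is what the shell connects to in Step 7), these commands should show them running:

kubectl get deployments,services
kubectl get pods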


Step 7: Launch Spark Shell

kubectl run spark-base --rm -it --labels="app=spark-client" --image bde2020/spark-base:2.4.5-hadoop2.7 -- bash ./spark/bin/spark-shell --master spark://spark-master:7077 --conf spark.driver.host=spark-client
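A quick note on the flags: --rm deletes the pod when you exit the shell, and the app=spark-client label together with spark.driver.host=spark-client is what lets the workers reach back to the driver (the Step 6 YAML is expected to define a matching spark-client service). While the shell is open, you can inspect the client pod from a second command prompt; -l filters by that label:

kubectl get pods -l app=spark-client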


Step 8: Try out some RDD commands 

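The original screenshots are not reproduced here, so here is a minimal sketch of the kind of RDD commands you can try at the scala> prompt; the numbers are just illustrative:

val rdd = sc.parallelize(1 to 100)   // distribute a local range as an RDD
rdd.count()                          // res0: Long = 100
rdd.map(_ * 2).take(5)               // res1: Array[Int] = Array(2, 4, 6, 8, 10)
rdd.filter(_ % 2 == 0).reduce(_ + _) // sum of the even numbers: 2550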

Press Ctrl + C to exit the Spark shell.

Enjoy !!
