Setting up a Spark Standalone Cluster on a Local Machine

In addition to running on the Mesos or YARN cluster managers, Apache Spark also provides a simple standalone deploy mode, which can be launched on a single machine as well. To install Spark in standalone mode, we simply need a compiled version of Spark that matches the Hadoop version we are using. If Hadoop is not installed on the machine, or if Spark will not be using an external HDFS, then you can choose any version. You can download your preferred version of Spark from here.

Once you have downloaded and unpacked Spark, there are a few simple configurations to make in the following files.

  • conf/slaves
Open the <SPARK_HOME>/conf/slaves file in a text editor and add "localhost" on a new line.


  • conf/spark-env.sh
Create a new file <SPARK_HOME>/conf/spark-env.sh and add the following. Spark by default ships with a template for spark-env.sh, found in the conf directory under the name "spark-env.sh.template". You can use that template as a starting point by renaming it to "spark-env.sh" and modifying it.
    • SPARK_HOME=/home/supun/Supun/Softwares/Spark/spark-1.2.1-bin-hadoop2.4
    • SPARK_WORKER_MEMORY=2g
    • SPARK_WORKER_INSTANCES=2
    • SPARK_WORKER_DIR=/home/supun/Supun/Softwares/Spark/worker
    • SPARK_MASTER_WEBUI_PORT=5001
    • SPARK_WORKER_WEBUI_PORT=5002
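
Since spark-env.sh is just a shell script that Spark sources at startup, one way to create it from the bundled template is sketched below. The paths and port values are the ones used in this post and are only illustrative; adjust them for your own machine.

```shell
# From the Spark installation directory: copy the bundled template,
# then append the settings described above (values are illustrative).
cd /home/supun/Supun/Softwares/Spark/spark-1.2.1-bin-hadoop2.4
cp conf/spark-env.sh.template conf/spark-env.sh
cat >> conf/spark-env.sh <<'EOF'
SPARK_WORKER_MEMORY=2g
SPARK_WORKER_INSTANCES=2
SPARK_WORKER_DIR=/home/supun/Supun/Softwares/Spark/worker
SPARK_MASTER_WEBUI_PORT=5001
SPARK_WORKER_WEBUI_PORT=5002
EOF
```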

Here, SPARK_WORKER_MEMORY is the amount of memory to allocate to each worker node. SPARK_WORKER_INSTANCES is the number of worker instances to run; here I have created only two workers. SPARK_WORKER_DIR is the directory where the workers store job-related files such as logs.

SPARK_MASTER_WEBUI_PORT is the port of the web-based dashboard of the master. (If this is not set, Spark defaults to 8080. If port 8080 is in use by some other application, Spark increments the port by one and tries 8081, and so on.) SPARK_WORKER_WEBUI_PORT is the starting port for the web-based dashboards of the worker nodes. If there is more than one worker node, port numbers are assigned automatically, starting from the value set here. (e.g. in our scenario, the port numbers will be 5002 and 5003, since there are two workers.)

Now the configurations are all done. To start the Spark cluster master, navigate to <SPARK_HOME>/sbin and execute the following.
     ./start-master.sh

If all the configurations have been done correctly, then once the master is started you should be able to access the master's web UI at http://localhost:5001
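
To double-check that the master JVM actually came up, you can also list the running Java processes with jps (a tool that ships with the JDK); a process named Master should appear.

```shell
# List running JVMs; a "Master" process should be present once
# start-master.sh succeeds. If it is missing, check the master log
# under <SPARK_HOME>/logs for the reason.
jps
```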

When starting the worker nodes, the master tries to access the workers through SSH. Therefore, first check whether SSH is installed on the machine by executing "which ssh" and "which sshd". If ssh/sshd are not installed, install them using:
      sudo apt-get install ssh

Now, with SSH installed, try the following command (on Ubuntu) to check whether SSH can access localhost without a password.
      ssh localhost

If this asks for a password (the local machine user's password), then execute the following.
      ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
      cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

Then, to start the worker nodes, execute the following from the same directory (<SPARK_HOME>/sbin).
      ./start-slaves.sh
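
If a worker fails to start, its log is usually the quickest diagnostic. By default Spark writes worker logs under <SPARK_HOME>/logs; the exact file name includes your user name and host name, so the wildcard pattern below is only a sketch based on Spark's default log naming.

```shell
# Tail the worker logs; the file name pattern is an assumption based on
# Spark's default log naming and may differ on your setup.
tail -n 50 logs/spark-*-org.apache.spark.deploy.worker.Worker-*.out
```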


Once the worker nodes are up, you should be able to see them listed in the master web UI. You can also access the web UI of each worker node (http://localhost:5002 and http://localhost:5003 in this scenario).

Finally, to run any application on this Spark cluster, use the master URL displayed at the top-left corner of the master web UI.
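
For example, submitting the SparkPi example bundled with Spark might look like the following. The master URL (spark://localhost:7077) and the examples jar path are assumptions based on this post's Spark 1.2.1 layout; substitute the URL shown in your own master web UI and the examples jar shipped with your build.

```shell
# From <SPARK_HOME>: submit the bundled SparkPi example to the standalone master.
# The master URL and jar path below are assumptions; use the URL shown at the
# top of your master web UI.
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://localhost:7077 \
  lib/spark-examples-1.2.1-hadoop2.4.0.jar 100
```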




