Setting up a Fully Distributed Hadoop Cluster

Here i will discuss on how to setup a fully distributed hadoop cluster with 1-master and 2 salves. Here the three nodes are setup in three different machines.

Updating Hostnames

To start off the things, lets first give hostnames to the three nodes. Edit the /etc/hosts file with following command.
sudo gedit /etc/hosts

Add following hostname and against the ip addresses of all three nodes. Do this for the all three nodes.    hadoop.master    hadoop.slave.1    hadoop.slave.2

Once you do that, update the /etc/hostname file to include hadoop.master/hadoop.slave.1/hadoop.slave.2 as the hostname of each of the machines respectively.


For security concerns, one might prefer to have a separate user for Hadoop. In order to create a separate user execute the following command in the terminal:
sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
Give a desired password..

Then restart the machine.
sudo reboot

Install SSH

Hadoop needs to copy files between the nodes. For that it should be able to acces each node with ssh, without having to give username/password. Therefore, first we need to install ssh client and server.
sudo apt install openssh-client
sudo apt install openssh-server

Generate a key
ssh-keygen -t rsa -b 4096

Copy the key for each node
ssh-copy-id -i $HOME/.ssh/ hduser@hadoop.master
ssh-copy-id -i $HOME/.ssh/ hduser@hadoop.slave.1
ssh-copy-id -i $HOME/.ssh/ hduser@hadoop.slave.2

Try sshing to all the nodes. eg:
ssh hadoop.slave.1

You should be able to ssh to all the nodes, without proving the user credentials. Repeat this step in all three nodes.

Configuring Hadoop

To configure hadoop, change the following configurations:

Define hadoop master url in <hadoop_home>/etc/hadoop/core-site.xml , in all nodes.

Create two directories /home/wso2/Desktop/hadoop/localDirs/name and /home/wso2/Desktop/hadoop/localDirs/data (and make hduser the owner, if you create a separate user for hadop) . Give read/write rights to that folder.

Modify the <hadoop_home>/etc/hadoop/hdfs-site.xml as follows, in all nodes.

<hadoop_home>/etc/hadoop/mapred-site.xml (all nodes)

Add the hostname of master node, to <hadoop_home>/etc/hadoop/masters file, in all nodes.

Add hostname of slave nodes  to <hadoop_home>/etc/hadoop/slaves file, in all nodes.

(Only in Master) We need to format the namenodes, before we start hadoop. For that, in the master node, navigate to <hadoop_home>/etc/hadoop/bin/ directory and execute the following.
./hdfs namenode -format

Finally, start the hadoop server, by navigating to <hadoop_home>/etc/hadoop/sbin/ directory, and execute the following:

If everything goes well, hdfs should be started. And you can browse the webUI of the namenode from the URL: http://localhost:50070/dfshealth.jsp.