Setting up a Fully Distributed Hadoop Cluster
In this post I will discuss how to set up a fully distributed Hadoop cluster with one master and two slaves, where the three nodes run on three different machines.
Updating Hostnames
To start things off, let's first give hostnames to the three nodes. Edit the /etc/hosts file with the following command:
sudo gedit /etc/hosts
Add the following hostname-to-IP mappings for the three nodes. Do this on all three nodes.
192.168.2.14 hadoop.master
192.168.2.15 hadoop.slave.1
192.168.2.16 hadoop.slave.2
Once you have done that, update the /etc/hostname file on each machine so that it contains hadoop.master, hadoop.slave.1, or hadoop.slave.2 respectively as the hostname.
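For example, on the master node you could edit the file the same way (using gedit as above):
sudo gedit /etc/hostname
and make its only line read:
hadoop.master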
Optional:
For security reasons, you might prefer to run Hadoop as a separate user. To create a separate user, execute the following commands in the terminal:
sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
Give it a password of your choice.
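If hduser also needs to run the administrative commands in the rest of this guide (an assumption; only required if you do the whole setup as hduser), add it to the sudo group:
sudo adduser hduser sudo  # optional: grants sudo rights to hduser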
Then restart the machine.
sudo reboot
Install SSH
Hadoop needs to copy files between the nodes. For that, it should be able to access each node over SSH without having to provide a username/password. Therefore, first install the SSH client and server:
sudo apt install openssh-client
sudo apt install openssh-server
Generate a key (accept the default file location and leave the passphrase empty, so that SSH can log in without prompting):
ssh-keygen -t rsa -b 4096
Copy the key to each node:
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@hadoop.master
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@hadoop.slave.1
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@hadoop.slave.2
Try SSHing to all the nodes, e.g.:
ssh hadoop.slave.1
You should be able to SSH to all the nodes without providing user credentials. Repeat this step on all three nodes.
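As a quick check (a minimal sketch, assuming the hostnames defined earlier), the following prints each remote hostname without prompting for a password:
for h in hadoop.master hadoop.slave.1 hadoop.slave.2; do ssh "$h" hostname; done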
Configuring Hadoop
To configure Hadoop, make the following changes. First, define the Hadoop master URL in <hadoop_home>/etc/hadoop/core-site.xml, on all nodes:
<property>
  <name>fs.default.name</name>
  <value>hdfs://hadoop.master:9000</value>
</property>
Create the two directories /home/wso2/Desktop/hadoop/localDirs/name and /home/wso2/Desktop/hadoop/localDirs/data (and make hduser the owner, if you created a separate user for Hadoop). Give read/write permissions on those folders.
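For example (a sketch, assuming the paths above and the optional hduser user in the hadoop group):
sudo mkdir -p /home/wso2/Desktop/hadoop/localDirs/name /home/wso2/Desktop/hadoop/localDirs/data
sudo chown -R hduser:hadoop /home/wso2/Desktop/hadoop/localDirs  # only if you created the hduser account
sudo chmod -R 750 /home/wso2/Desktop/hadoop/localDirs  # owner gets read/write; adjust as needed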
Modify <hadoop_home>/etc/hadoop/hdfs-site.xml as follows, on all nodes:
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>/home/wso2/Desktop/hadoop/localDirs/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/home/wso2/Desktop/hadoop/localDirs/data</value>
</property>
Then edit <hadoop_home>/etc/hadoop/mapred-site.xml (all nodes):
<property>
  <name>mapreduce.job.tracker</name>
  <value>hadoop.master:5431</value>
</property>
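Note that many Hadoop distributions only ship this file as a template (an assumption about your version); in that case, create it first:
cp <hadoop_home>/etc/hadoop/mapred-site.xml.template <hadoop_home>/etc/hadoop/mapred-site.xml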
Add the hostname of the master node to the <hadoop_home>/etc/hadoop/masters file, on all nodes:
hadoop.master
Add the hostnames of the slave nodes to the <hadoop_home>/etc/hadoop/slaves file, on all nodes:
hadoop.slave.1
hadoop.slave.2
(Only on the master) We need to format the namenode before we start Hadoop. For that, on the master node, navigate to the <hadoop_home>/bin/ directory and execute the following:
./hdfs namenode -format
Finally, start Hadoop by navigating to the <hadoop_home>/sbin/ directory and executing the following:
./start-dfs.sh
If everything goes well, HDFS should be started, and you can browse the web UI of the namenode at http://localhost:50070/dfshealth.jsp.
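To confirm the daemons are running, you can also run jps on each node; the master should list NameNode and SecondaryNameNode, and the slaves should list DataNode.
jps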