Setting up a Hadoop File System on a Local Machine

In this article I will be describing how to set up a Hadoop file system (HDFS) on the local machine and run it in pseudo-distributed mode. First, download Hadoop from here.
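For example, you can download and unpack Hadoop 2.2.0 from the Apache archive; the mirror URL and target directory below are assumptions, so adjust them to your setup.

     wget https://archive.apache.org/dist/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz
     tar -xzf hadoop-2.2.0.tar.gz -C /home/supun/Supun/Softwares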

With the archive extracted to your preferred location, we need to set environment variables pointing to it. For that, open the ~/.bashrc file and add the following two lines.

     export HADOOP_HOME=/home/supun/Supun/Softwares/hadoop-2.2.0
     export PATH=$HADOOP_HOME/bin:$PATH
Here /home/supun/Supun/Softwares/hadoop-2.2.0 is the location where my Hadoop distribution was extracted. I will be referring to this location as HADOOP_HOME from here onwards.
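For the new variables to take effect in the current shell, reload ~/.bashrc and check that HADOOP_HOME resolves:

     source ~/.bashrc
     echo $HADOOP_HOME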

Configuring:


Now we need to make a few small configuration changes in the following files. Note that in Hadoop 2.x these files live under HADOOP_HOME/etc/hadoop rather than HADOOP_HOME/conf as in Hadoop 1.x. Open each file and add the corresponding snippet to it.

HADOOP_HOME/etc/hadoop/core-site.xml

        <configuration>
                <property>
                        <name>fs.defaultFS</name>
                        <value>hdfs://localhost:9000</value>
                </property>
        </configuration>

HADOOP_HOME/etc/hadoop/hdfs-site.xml

        <configuration>
                <property>
                        <name>dfs.replication</name>
                        <value>1</value>
                </property>
        </configuration>
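Optionally, you can also control where HDFS keeps its data; by default it ends up under /tmp (via hadoop.tmp.dir) and can be lost on a reboot. A sketch of the extra properties, using /home/supun/hdfs as a placeholder storage root, to be added inside the same <configuration> element:

                <property>
                        <name>dfs.namenode.name.dir</name>
                        <value>/home/supun/hdfs/namenode</value>
                </property>
                <property>
                        <name>dfs.datanode.data.dir</name>
                        <value>/home/supun/hdfs/datanode</value>
                </property>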

HADOOP_HOME/etc/hadoop/mapred-site.xml
        
        <configuration>
                <property>
                        <name>mapred.job.tracker</name>
                        <value>localhost:9001</value>
                </property>
        </configuration>
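One thing to note: Hadoop 2.2.0 ships this file only as mapred-site.xml.template, so if mapred-site.xml does not exist yet, create it from the template first (assuming the etc/hadoop layout above):

     cp $HADOOP_HOME/etc/hadoop/mapred-site.xml.template $HADOOP_HOME/etc/hadoop/mapred-site.xml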



Optional: 


Check whether ssh is installed on the machine by executing "which ssh" and "which sshd". If ssh/sshd is not installed, install it using:

     sudo apt-get install ssh

Once the software is installed, try the following command (Ubuntu) to check whether ssh can access localhost without a password.

     ssh localhost

If this asks for a password (the local machine user's password), then execute the following.

     ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

     cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
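Recent OpenSSH releases disable DSA keys by default, so if the commands above do not help, an RSA key works the same way; tightening the permissions on authorized_keys can also be needed:

     ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
     cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
     chmod 0600 ~/.ssh/authorized_keys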


Start in Pseudo-Distributed Mode:


Before starting HDFS, you need to format the namenode. For that, navigate to the HADOOP_HOME/bin directory and execute the following.

     hdfs namenode -format

Then start HDFS by navigating to HADOOP_HOME/sbin and executing the following.

     ./start-dfs.sh

If that did not work, try executing "./start-dfs.sh -upgrade" instead.
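If the startup succeeded, the jps command that ships with the JDK should list the NameNode, DataNode, and SecondaryNameNode processes:

     jps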

If everything went well, HDFS should now be running, and you can browse the namenode web UI at http://localhost:50070/dfshealth.jsp. Please refer to [1] for further details on setting up HDFS in different modes.

Now you can use the Hadoop shell commands to manage files in this HDFS. Refer to [2] and [3] for the Hadoop commands.
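For example, a few basic file operations (the paths and file name here are just placeholders):

     hdfs dfs -mkdir -p /user/supun
     hdfs dfs -put localfile.txt /user/supun/
     hdfs dfs -ls /user/supun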

References:
