Guide to install and run Hadoop on Windows

Adithya S.T.
Republic of Coders — India
4 min read · Feb 4, 2021

Hadoop is a software framework from the Apache Software Foundation that is used to store and process big data. In this article I’ve compiled the steps to install and run Hadoop on Windows.

Prerequisites:

Install the Java Development Kit (JDK 8): https://www.oracle.com/java/technologies/javase/javase-jdk8-downloads.html

Install Visual C++ and other runtimes: https://www.computerbase.de/downloads/systemtools/all-in-one-runtimes/

1. Download Hadoop:

Download Hadoop 3.1.0: https://mirrors.huaweicloud.com/apache/hadoop/core/hadoop-3.1.0/

Open WinRAR as Administrator

Extract the tar file (this guide uses E:\hadoop-3.1.0 as the Hadoop directory)

2. Setup System Environment variables:

Search for “environment” in the Start menu search bar

Click on Environment Variables

Click on New and create a new variable called HADOOP_HOME, then paste the path of the Hadoop directory (for example E:\hadoop-3.1.0, not its bin folder) in the variable value

Click on New and create a new variable called JAVA_HOME, then paste the path of the JDK installation folder (the folder that contains bin, not the bin folder itself) in the variable value

Select the Path variable and click on Edit

Click on New and add the bin folders of Java and Hadoop: %JAVA_HOME%\bin and %HADOOP_HOME%\bin
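Before moving on, it can help to confirm that both variables are visible to new processes. The script below is only a sketch and is not part of the original setup: it assumes Python 3 is installed, check_home is a hypothetical helper name, and it assumes each variable points at the installation folder (the one that contains bin), using java.exe and hadoop.cmd as marker files.

```python
import os
from pathlib import Path


def check_home(name, marker, env=os.environ):
    """Return (ok, message) for one *_HOME variable.

    `marker` is a file expected inside <home>/bin, for example java.exe.
    """
    value = env.get(name)
    if not value:
        return False, f"{name} is not set"
    home = Path(value)
    # A frequent mistake is pointing the variable at the bin folder itself.
    if home.name.lower() == "bin":
        return False, f"{name} points at a bin folder; use its parent directory"
    if not (home / "bin" / marker).is_file():
        return False, f"{name}={value} has no bin/{marker}"
    return True, f"{name}={value} looks fine"


if __name__ == "__main__":
    for name, marker in (("JAVA_HOME", "java.exe"), ("HADOOP_HOME", "hadoop.cmd")):
        ok, message = check_home(name, marker)
        print("OK  " if ok else "FAIL", message)
```

Run it from a newly opened Command Prompt, since windows that were already open do not pick up changed environment variables.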

3. Configurations:

Open the etc\hadoop folder in the Hadoop directory

Open core-site.xml using Notepad and add this property inside the configuration element of the file

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>

Open the mapred-site.xml file with Notepad and add this property inside the configuration element

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

Create a folder called ‘data’ in the Hadoop directory

Create two folders called datanode and namenode inside the data folder

Open hdfs-site.xml using Notepad and add the configuration below

Note: Set the namenode and datanode paths to the locations of the namenode and datanode folders you just created

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>E:\hadoop-3.1.0\data\namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>E:\hadoop-3.1.0\data\datanode</value>
</property>
</configuration>
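Since the two paths differ on every machine, one option is to generate the hdfs-site.xml text instead of editing it by hand. The following is a minimal sketch, assuming Python 3 is available (it is not part of the original setup); render_hdfs_site is a hypothetical helper name, and the paths in the demo are the ones used in this article.

```python
from xml.sax.saxutils import escape


def render_hdfs_site(namenode_dir, datanode_dir, replication=1):
    """Return hdfs-site.xml text for a single-node setup with the given folders."""
    props = {
        "dfs.replication": str(replication),
        "dfs.namenode.name.dir": str(namenode_dir),
        "dfs.datanode.data.dir": str(datanode_dir),
    }
    lines = ["<configuration>"]
    for name, value in props.items():
        lines += [
            "<property>",
            f"<name>{escape(name)}</name>",
            # escape() protects characters such as & that may appear in paths
            f"<value>{escape(value)}</value>",
            "</property>",
        ]
    lines.append("</configuration>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Paths from this article; replace them with your own folders.
    print(render_hdfs_site(r"E:\hadoop-3.1.0\data\namenode",
                           r"E:\hadoop-3.1.0\data\datanode"))
```

The printed text can be pasted into hdfs-site.xml in place of the existing configuration element.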

Open yarn-site.xml and replace its configuration element with the following

<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
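A typo in any of these XML files usually shows up only when the daemons start, so it can be worth reading the properties back first. This is a sketch under the assumption that Python 3 is installed (the original steps do not require it); site_properties is a hypothetical helper, and the sample string simply repeats the core-site.xml property from above.

```python
import xml.etree.ElementTree as ET


def site_properties(xml_data):
    """Return {name: value} for every <property> in a Hadoop *-site.xml document.

    `xml_data` may be str or bytes; malformed XML raises ET.ParseError.
    """
    root = ET.fromstring(xml_data)
    if root.tag != "configuration":
        raise ValueError(f"expected <configuration>, found <{root.tag}>")
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        if name:
            props[name] = value
    return props


if __name__ == "__main__":
    sample = """<configuration>
    <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
    </property>
    </configuration>"""
    print(site_properties(sample))
```

For a real file, pass its contents, for example the bytes read from core-site.xml, hdfs-site.xml, mapred-site.xml or yarn-site.xml in the etc\hadoop folder.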

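Open hadoop-env.cmd using Notepad and set JAVA_HOME to the path of your JDK folder in the line that begins with set JAVA_HOME=. If that path contains spaces (for example under C:\Program Files), the scripts may fail; using the short form C:\Progra~1 is a common workaround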

4. Install Windows OS specific files:

Download the bin folder from: https://github.com/s911415/apache-hadoop-3.1.0-winutils

Replace the bin folder currently in the Hadoop directory with the downloaded one
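If later steps fail with errors about native libraries, the bin folder is a good first place to look. The sketch below assumes Python 3 is installed and that winutils.exe and hadoop.dll are the files that matter; that file list and the helper name missing_native_files are assumptions for illustration, not something stated by the download page.

```python
import os
from pathlib import Path

# Assumed list of Windows-specific files expected in <Hadoop directory>\bin.
REQUIRED = ("winutils.exe", "hadoop.dll")


def missing_native_files(hadoop_dir, required=REQUIRED):
    """Return the names from `required` that are absent from <hadoop_dir>/bin."""
    bin_dir = Path(hadoop_dir) / "bin"
    return [name for name in required if not (bin_dir / name).is_file()]


if __name__ == "__main__":
    home = os.environ.get("HADOOP_HOME", "")
    if not home:
        print("HADOOP_HOME is not set")
    else:
        missing = missing_native_files(home)
        print("missing: " + ", ".join(missing) if missing else "bin folder looks complete")
```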

5. Verify:

Open a new Command Prompt (so that the new environment variables are picked up) and verify that Hadoop is installed by running the following command:

hadoop version

6. Format the namenode:

hdfs namenode -format

7. Change the directory to the sbin folder:

cd E:\hadoop-3.1.0\sbin

8. Start datanode and namenode:

start-dfs.cmd

Two separate cmd windows will open for namenode and datanode

9. Start yarn:

start-yarn.cmd

Two separate cmd windows will open for yarn resource manager and yarn node manager

Note: If you get the error NoClassDefFoundError: org/apache/hadoop/yarn/server/timelineservice/collector/TimelineCollectorManager, copy the “hadoop-yarn-server-timelineservice-3.x.x” jar from ~\hadoop-3.x.x\share\hadoop\yarn\timelineservice to the ~\hadoop-3.x.x\share\hadoop\yarn folder.

If the error says that permissions are set incorrectly, open Command Prompt as Administrator and run the commands again.

10. Open Hadoop in the browser:

Address for namenode information:

localhost:9870

Address for nodemanager:

localhost:8042
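If a page does not load, it helps to know whether anything is listening on that port at all. The following sketch assumes Python 3 is installed (not part of the original setup); is_listening is a hypothetical helper, and the two ports are the ones listed above.

```python
import socket


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    for label, port in (("namenode", 9870), ("nodemanager", 8042)):
        state = "up" if is_listening("localhost", port) else "not reachable"
        print(f"{label} on localhost:{port}: {state}")
```

A port that is not reachable usually means the matching cmd window from steps 8 or 9 has closed or is showing an error.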

Hadoop is now installed

Cheers✌
