Guide to install and run Hadoop on Windows

Published in

Republic of Coders — India

4 min read · Feb 4, 2021

Hadoop is a software framework from the Apache Software Foundation that is used to store and process big data. In this article I've compiled the steps to install and run Hadoop on Windows.

Prerequisites:

Install Java Development Kit: https://www.oracle.com/java/technologies/javase/javase-jdk8-downloads.html

Install Visual C++ and other runtimes: https://www.computerbase.de/downloads/systemtools/all-in-one-runtimes/

1. Download Hadoop:

Download Hadoop 3.1.0: https://mirrors.huaweicloud.com/apache/hadoop/core/hadoop-3.1.0/

Open WinRAR as administrator

Extract the tar file

2. Set up system environment variables:

Search for “environment” in the Start menu search bar

Click on Environment variables

Click on New, create a new variable called HADOOP_HOME and paste the path of the extracted Hadoop folder (the one that contains bin, for example E:\hadoop-3.1.0) in variable value

Click on New, create a new variable called JAVA_HOME and paste the path of the JDK installation folder (the one that contains bin) in variable value

Select the Path variable and click on Edit

Add the paths of the Java and Hadoop bin folders here
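
Before moving on, it can help to confirm that the variables are visible to new processes. The following is only a minimal sketch in Python (not part of the original steps). It assumes HADOOP_HOME and JAVA_HOME each point at an installation root that contains a bin folder, and that those folders hold hadoop.cmd and java.exe respectively; the function name check_hadoop_env is made up for this example.

```python
import os
from pathlib import Path


def check_hadoop_env(env=None):
    """Return a list of problems found in the Hadoop-related variables."""
    env = os.environ if env is None else env
    problems = []
    # Assumption: each variable points at an install root with a bin folder.
    expected = {
        "HADOOP_HOME": Path("bin") / "hadoop.cmd",
        "JAVA_HOME": Path("bin") / "java.exe",
    }
    for name, relative in expected.items():
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not (Path(value) / relative).is_file():
            problems.append(f"{name} does not contain {relative}")
    return problems


if __name__ == "__main__":
    # Open a new terminal first so the updated variables are picked up.
    for problem in check_hadoop_env():
        print(problem)
```

If the script prints nothing, both variables resolved to folders with the expected files.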

3. Configurations:

Open the etc\hadoop folder in the Hadoop directory

Open core-site.xml using Notepad and add this property inside the configuration element of the file

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
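
To double-check what Hadoop will read from this file, a small Python sketch can parse the XML and pull out the NameNode host and port. This is an illustration only: the helper names read_properties and namenode_address are made up here, and the sample text mirrors the fs.defaultFS property from core-site.xml rather than reading your actual file.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Sample contents mirroring the fs.defaultFS property in core-site.xml.
CORE_SITE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>"""


def read_properties(xml_text):
    """Map each <name> to its <value> in a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name"): prop.findtext("value")
        for prop in root.findall("property")
    }


def namenode_address(xml_text):
    """Return (host, port) parsed from the fs.defaultFS URI."""
    uri = urlparse(read_properties(xml_text)["fs.defaultFS"])
    return uri.hostname, uri.port


print(namenode_address(CORE_SITE))  # ('localhost', 9000)
```

A parse error from ET.fromstring is a quick hint that a tag in the file is missing or unbalanced.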

Open the mapred-site.xml file with Notepad and add this property inside the configuration element

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

Create a folder called ‘data’ in the Hadoop directory

Create two folders called datanode and namenode inside the data folder
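
If you would rather script the folder creation than click through Explorer, something like the sketch below should work. It assumes Python is available; the function name create_data_dirs is made up for this example, and the Hadoop directory you pass in is whatever folder you extracted Hadoop to.

```python
from pathlib import Path


def create_data_dirs(hadoop_dir):
    """Create data/namenode and data/datanode under hadoop_dir; return both paths."""
    data = Path(hadoop_dir) / "data"
    created = []
    for name in ("namenode", "datanode"):
        target = data / name
        # parents=True also creates the data folder; exist_ok makes reruns harmless.
        target.mkdir(parents=True, exist_ok=True)
        created.append(target)
    return created
```

For example, create_data_dirs(r"E:\hadoop-3.1.0") would create both folders under that directory and return their paths, which are the values needed in hdfs-site.xml.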

Open hdfs-site.xml using Notepad and copy in the configuration below

Note: The namenode and datanode values must be the paths of the namenode and datanode folders you just created

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>E:\hadoop-3.1.0\data\namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>E:\hadoop-3.1.0\data\datanode</value>
</property>
</configuration>
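
Since the paths in this file depend on where Hadoop was extracted, one option is to generate the XML instead of typing it by hand. The sketch below is a rough illustration in Python: render_site_xml is a made-up helper name, E:\hadoop-3.1.0 is just the example location used in this article, and the replication value of 1 is an assumption for a single-machine setup.

```python
import xml.etree.ElementTree as ET


def render_site_xml(properties):
    """Render a dict of property names and values as Hadoop *-site.xml text."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    if hasattr(ET, "indent"):  # pretty-printing needs Python 3.9 or newer
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Example install location from this article; adjust to your own folder.
hadoop_dir = r"E:\hadoop-3.1.0"
print(render_site_xml({
    "dfs.replication": 1,  # assumption: single-machine setup
    "dfs.namenode.name.dir": hadoop_dir + r"\data\namenode",
    "dfs.datanode.data.dir": hadoop_dir + r"\data\datanode",
}))
```

The printed text can be pasted into hdfs-site.xml; the same helper would work for the other *-site.xml files because they share the configuration/property layout.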

Open yarn-site.xml and replace its configuration with the following

<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>

Open hadoop-env.cmd using Notepad and set JAVA_HOME to the path of your JDK installation folder