How to Install Hive on Windows 10

Installing Hive on Windows 10 is not an easy process. You need to know the prerequisites and have a basic understanding of how Hive works. Hadoop is well known to people who work in data science and data warehousing, since the framework is built to handle large data sets.

Structure

In this article, we will see how to install Hive (a data query and processing tool) and configure it on top of the Hadoop framework.

Prerequisites to successfully perform Hive Installation

Before you start installing and configuring Hive, make sure the following software is available in your local environment. Hive will not work properly without it.

  • Java 
  • Hadoop
  • Yarn 
  • Apache Derby

Install Hive on Windows 10 [step-by-step guide]

  1. Check whether Java is available on your machine. Follow the steps below to verify.
  • Open a CMD window.
  • Enter the command below and press ENTER.
C:\Users\Administrator>java -version
  • The output shows the installed Java version.
  • If your Java version is outdated, update it by following the next steps.
  • In the search bar at the bottom left, enter the keyword “About Java”.
  • Open the About Java app from the search results. A pop-up window appears.
  • Click the link shown in the text. You will be redirected to the Java web page.
  • Click the red “Agree and Start Free Download” button.
  • An exe file will be downloaded and saved in your Downloads folder.
  • Run the exe file by double-clicking it.
  • You will get a prompt stating that an old version is present on the system.
  • Choose Uninstall.
  • Choose Next. Once the new version is installed, you will see a success message.
  • Choose Close to shut the window.
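
The version check above can also be scripted. The following is a minimal sketch and not part of the original steps: it assumes Python 3.7 or later is installed and that java is on the PATH, and the function names parse_java_major and installed_java_major are made up for illustration. Note that java -version writes its banner to standard error, and that Java 8 reports itself as 1.8.

```python
import re
import subprocess


def parse_java_major(version_output):
    """Return the major Java version found in `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    # Java 8 and older report "1.x" (for example 1.8.0_271); newer ones "11.0.2".
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def installed_java_major():
    """Run `java -version`; return the major version, or None if Java is missing."""
    try:
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    except FileNotFoundError:
        return None
    # The version banner is written to stderr, not stdout.
    return parse_java_major(result.stderr + result.stdout)


if __name__ == "__main__":
    print("Java major version:", installed_java_major())
```

A result of 8 matches the JDK 8 prerequisite mentioned in the Hadoop section.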

Install Hadoop 3.3.0 in Windows 10

  1. Download Hadoop 3.3.0 for your Windows 10 machine.
  • On the Hadoop download page, click the green button in the top right corner that says “Download tar.gz”.
  • Once downloaded, check whether Java is installed.

NOTE: We already installed or updated Java in the previous step. Java JDK 8 is a prerequisite for the Hadoop installation.

  • Before we proceed with the installation steps, make sure Java is installed directly under your root drive (C:\). If not, move the Java folder from C:\Program Files to C:\Java.

NOTE: This avoids conflicts when setting environment variables, since the space in “Program Files” can break the Hadoop scripts.

  1. In your Windows 10 System Settings, search for environment settings.
  • Choose the option Edit the system environment variables.
  • Click the button that says Environment Variables.
  • Add a new variable by clicking New.
  • Enter the name as JAVA_HOME.
  • Enter the path where Java is located. For us, this is C:\Java\{jdknamewithversion}.
  • Once the new variable is set, edit the Path variable.
  • To do so, select the Path variable and click Edit.
  • Click New and paste the path of the JDK bin folder, C:\Java\{jdknamewithversion}\bin.
  • Check that Java is working as expected by entering the command javac in a command line window.
  • Now, go to the folder where the Hadoop tar.gz file was downloaded.
  • Extract hadoop-3.3.0.tar.gz. You will get another tar file.
  • Extract hadoop-3.3.0.tar. Once done, you will see the extracted folder.
  • Copy the folder hadoop-3.3.0 to your C:\ drive.
  • Go to C:\hadoop-3.3.0\etc\hadoop. You will edit the following 5 files in this folder.
    • core-site.xml
    • hadoop-env.cmd
    • hdfs-site.xml
    • mapred-site.xml
    • yarn-site.xml
  • Open these files in the Notepad editor.
  • Enter the code below in the core-site.xml file.
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
  • Enter the code below in the mapred-site.xml file.
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
  • Enter the code below in the yarn-site.xml file.
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
  • Create 2 folders, “datanode” and “namenode”, in your C:\hadoop-3.3.0\data folder before updating hdfs-site.xml. The folder paths will look like this.
C:\hadoop-3.3.0\data\datanode
C:\hadoop-3.3.0\data\namenode
  • Enter the code below in the hdfs-site.xml file.
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///C:/hadoop-3.3.0/data/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///C:/hadoop-3.3.0/data/datanode</value>
  </property>
</configuration>
  • Set the JDK path in the hadoop-env.cmd file by pointing its JAVA_HOME line at your JDK folder, for example: set JAVA_HOME=C:\Java\{jdknamewithversion}
  • Save all the files updated above.
  • Next, set the Hadoop path variables from the Windows 10 system settings.
  • Choose the option Edit the system environment variables.
  • Click the button that says Environment Variables.
  • Create a new variable HADOOP_HOME.
  • Set its value to C:\hadoop-3.3.0.
  • Edit the Path variable and add the path of the Hadoop bin folder.
  • Enter C:\hadoop-3.3.0\bin and click OK.
  • Add another path for the sbin folder, C:\hadoop-3.3.0\sbin, in the same way.
  • Now, go to the bin folder under your hadoop-3.3.0 folder.
  • Copy the Hadoop configuration files into this folder. Please refer to the Configuration zip file to copy the files.
  • Delete the existing bin folder and copy the bin folder from this configuration.zip to C:\hadoop-3.3.0\.
  • You have now installed Hadoop. To verify the installation, open CMD as administrator and enter the namenode command shown in the screenshot.
[Screenshot: namenode command]
  • The CMD window displays the resulting message.
  • The next step is to start all the services. If your installation is successful, go to your sbin directory and enter the command start-dfs.
  • The namenode and datanode windows will open after you execute the above command.
  • Then, enter the command start-yarn. Two YARN windows will open and keep running.

NOTE: If none of the four windows above shuts down automatically, your installation and configuration are successful.
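
If one of the service windows does close right away, a typo in one of the *-site.xml files is one possible cause. The sketch below is an illustration only: it assumes Python 3, and the helper name read_site_properties is hypothetical. It reads the name and value pairs from Hadoop-style configuration XML so you can compare them with the values entered earlier.

```python
import xml.etree.ElementTree as ET


def read_site_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config."""
    root = ET.fromstring(xml_text)
    if root.tag != "configuration":
        raise ValueError("root element must be <configuration>")
    properties = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is None or value is None:
            raise ValueError("each <property> needs a <name> and a <value>")
        properties[name.strip()] = value.strip()
    return properties


if __name__ == "__main__":
    sample = """
    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>
    """
    print(read_site_properties(sample))
```

To check a real file, pass the file's contents to the function. A malformed file raises an error at the parsing step, which already narrows down where the problem is.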

  • Enter the command jps. You will see the processes running for all four services.
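
The jps check can be automated along these lines. This is a sketch under assumptions: it assumes Python 3.7 or later and that jps (shipped with the JDK) is on the PATH, and the function names are hypothetical. The four process names are the ones a single-node setup like this one normally shows: NameNode, DataNode, ResourceManager, and NodeManager.

```python
import subprocess

EXPECTED = {"NameNode", "DataNode", "ResourceManager", "NodeManager"}


def missing_services(jps_output):
    """Return the expected Hadoop processes that do not appear in jps output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        # Each jps line looks like "<pid> <ClassName>"; skip anything else.
        if len(parts) >= 2:
            running.add(parts[1])
    return EXPECTED - running


def check_running_services():
    """Run jps and return the set of missing services (raises if jps is absent)."""
    result = subprocess.run(["jps"], capture_output=True, text=True)
    return missing_services(result.stdout)
```

An empty set from check_running_services() means all four services were found.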
  1. To access Hadoop, open your browser and enter localhost:9870. The Hadoop page opens.
  1. To check YARN, enter localhost:8088 in a new window.
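
If a page does not load, it helps to know whether anything is listening on the port at all. The following is a minimal sketch, assuming Python 3; the helper name port_is_open is hypothetical, and the port numbers are the ones used in the two steps above.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 9870 is the Hadoop web page and 8088 the YARN page, as used above.
    for name, port in (("Hadoop", 9870), ("YARN", 8088)):
        state = "reachable" if port_is_open("localhost", port) else "not reachable"
        print(f"{name} web UI on port {port}: {state}")
```

A port that is not reachable usually means the corresponding service window has closed or never started.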
  1. Now, you are ready to install Hive. Download the package from https://downloads.apache.org/hive/hive-3.1.2/ by clicking the apache-hive-3.1.2-bin.tar.gz link.
  2. Extract the archive using the 7-Zip extractor. Once extracted, you will see a hive-3.1.2.tar file. Extract it again.
  3. Just as we set environment variables for Hadoop, we need to set the environment variables and path for Hive too.
    • Create the following variables and their values.
      • HIVE_HOME: C:\hadoop-3.3.0\apache-hive-3.1.2\
      • DERBY_HOME: C:\hadoop-3.3.0\db-derby-10.14.2.0\
      • HIVE_LIB: %HIVE_HOME%\lib
      • HIVE_BIN: %HIVE_HOME%\bin  
      • HADOOP_USER_CLASSPATH_FIRST: true
    • Set each variable to the value shown after the “:”.
  4. Copy all Derby libraries (.jar files) from the Derby package into the Hive lib directory: C:\hadoop-3.3.0\apache-hive-3.1.2\lib
  1. Locate hive-site.xml in the Hive conf directory (create the file if it does not exist) and enter the configuration shown in the screenshot.
[Screenshot: hive-site.xml contents]
  1. Start the Hadoop services with start-dfs and start-yarn, as we saw in the Hadoop section above.
  2. Start the Derby service with: C:\hadoop-3.3.0\db-derby-10.14.2.0\bin\StartNetworkServer -h 0.0.0.0
  3. Start the Hive service: go to your Hive bin directory in the command line and enter hive. If that command doesn’t work, an error message will be shown.

NOTE: This happens because Hive 3.x.x does not ship the command (.cmd) scripts needed on Windows 10. You can download them from https://github.com/HadiFadl/Hive-cmd. Also, replace guava-19.0.jar in the Hive lib folder with guava-27.0-jre.jar from Hadoop’s hdfs\lib folder.
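
Before swapping the jar, it can help to see which Guava versions each lib folder holds. This is a minimal sketch, assuming Python 3; find_guava_jars is a hypothetical helper, and the two folder paths are assumptions based on the install locations used in this guide.

```python
from pathlib import Path


def find_guava_jars(lib_dir):
    """Return the sorted names of guava-*.jar files directly inside lib_dir."""
    return sorted(p.name for p in Path(lib_dir).glob("guava-*.jar"))


if __name__ == "__main__":
    # Assumed locations, following the install paths used in this guide.
    hive_lib = r"C:\hadoop-3.3.0\apache-hive-3.1.2\lib"
    hdfs_lib = r"C:\hadoop-3.3.0\share\hadoop\hdfs\lib"
    print("Hive lib:", find_guava_jars(hive_lib))
    print("Hadoop hdfs lib:", find_guava_jars(hdfs_lib))
```

After the replacement, the Hive lib folder should list only the newer Guava jar.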

  1. Once done, run the command hive again. It should execute successfully.
  2. Initialize the Metastore after starting the Hive service.

NOTE: You will need to use the Cygwin tool to execute Linux commands on Windows.

  • Create the 2 folders: C:\cygdrive and E:\cygdrive.
  • Open the command window and enter the commands shown in the screenshot.
[Screenshot: commands]
  • Specify the environment variables as shown in the screenshot.
[Screenshot: environment variable commands]
  • Enter the command shown in the screenshot to initialize the Metastore.
[Screenshot: Metastore initialization command]
  • Now, open the command window and enter the command shown in the screenshot.
[Screenshot: command]
  • Open another command window and type hive. You should be able to start the Hive service successfully.
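
Since most start-up errors in this guide trace back to environment variables, a quick check like the one below may save some time. It is a sketch under assumptions: it assumes Python 3, the function name find_env_problems is hypothetical, and the variable list covers only the main variables created in this guide.

```python
import os

REQUIRED_VARS = ("JAVA_HOME", "HADOOP_HOME", "HIVE_HOME", "DERBY_HOME")


def find_env_problems(env):
    """Return a list of human-readable problems with the given environment mapping."""
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name, "").strip()
        if not value:
            problems.append(f"{name} is not set")
        elif " " in value:
            # Paths with spaces (such as C:\Program Files) tend to break the scripts.
            problems.append(f"{name} contains a space: {value}")
    return problems


if __name__ == "__main__":
    for problem in find_env_problems(os.environ) or ["no problems found"]:
        print(problem)
```

Run it from a newly opened command window, because windows opened before the variables were edited do not see the new values.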

Is installing Hive on Windows 10 a complicated process?

As we saw, version 3.x.x of Hive is a little difficult to install on a Windows machine because it lacks Windows command support. You will need a tool that supports Linux-based commands to initialize the Metastore and start the Hive services.

If you set the environment variables and paths correctly in the Hadoop and Hive configuration, the services will start without errors.

In this article, you got an overview of the steps to install Hive on Windows 10.
