Sunday, 2 August 2015

First Hive script on Hadoop cluster

Dear Viewers,

This post will help you write and execute your first Hive script from the Linux terminal. Before we go ahead, you have to start Hadoop; you can start it by calling the start-all.sh file available in the Hadoop directory. To run a Hive script, you must also have Hive available on your system.

In my case, Hadoop is available at /usr/local/hadoop-2.6.0 and Hive at /usr/local/hive-0.11.0.




Steps:
1. Run Hadoop
[pavan@Pavan ~]$ /usr/local/hadoop-2.6.0/sbin/start-all.sh
[pavan@Pavan ~]$ jps
6945 ResourceManager
9958 Jps
6784 SecondaryNameNode
7072 NodeManager
6609 DataNode
6472 NameNode

2. Create Hive script
You may follow the sequence given below while creating your first Hive script:

A. Drop the table if it exists
B. Create a new table with appropriate fields
C. Describe the table schema
D. Load data from a local file into the Hive table
E. Select data from the Hive table

3. Open a file, put all of the above commands into it, and save it with a ".sql" or ".hql" extension.


4. The contents of the FirstHive.hql file are shown below:
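The original screenshot of the script is not available here, so the following is a minimal sketch of such a script following steps A through E above. The table name (student), its columns, the comma delimiter, and the input path are all assumptions for illustration, not the exact contents of the original file:

```sql
-- FirstHive.hql: minimal sketch (table name, columns, and path are assumptions)

-- A. Drop the table if it exists
DROP TABLE IF EXISTS student;

-- B. Create a new table with appropriate fields
CREATE TABLE student (
    rollno INT,
    name   STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- C. Describe the table schema
DESCRIBE student;

-- D. Load data from a local file into the Hive table
LOAD DATA LOCAL INPATH '/home/pavan/student.txt' INTO TABLE student;

-- E. Select data from the Hive table
SELECT * FROM student;
```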



5. The contents of student.txt, the input file to be loaded into the Hive table:
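The original input file is not shown here; a hypothetical comma-delimited example with a roll number and a name per line might look like this (all values are made up for illustration):

```text
101,Anil
102,Bharati
103,Chetan
```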


6. Run the Hive script
Syntax: $ hive -f hivescriptfilename

[pavan@Pavan ~]$ hive -f FirstHive.hql


7. Cross-check the output from the Hive shell
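The original post showed this check as a screenshot. One way to do it (assuming a table named student, which is a hypothetical name for illustration) is to run an ad-hoc query straight from the terminal with the -e option:

```shell
# Query the table without a script file; the table name 'student' is an assumption
$ hive -e "SELECT * FROM student;"
```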


Congratulations! You have successfully prepared and executed a Hive script file.


Reference:
1. http://www.edureka.co/blog/how-to-run-hive-scripts/
