This post will help you run the Hadoop MapReduce word count program in Java.

Prerequisites:
1. A running Hadoop environment on your Linux box. (I have Hadoop 2.6.0)
2. Java installed on your Linux box. (I have Java 1.7.0)
3. External jar - hadoop-core-1.2.1.jar
4. A text input file. (I have Inputfile.txt)

Flow:
1. Prepare three Java source files: WordCount.java, WordMapper.java, WordReducer.java.
2. WordCount.java is the main class; you may also refer to this as the driver class. Its source code refers to WordMapper.class and WordReducer.class.
3. WordMapper.java splits up the input and, as output, generates a <key, value> pair for each word, i.e. <word, count>.
4. WordReducer.java accepts the mapper's output as its input: it combines the pairs emitted by WordMapper.class and generates the final output, which is also a <key, value> pair indicating how many times each word occurred.
5. Compile the source files, making use of the external jar file.
6. After successful compilation, create a jar file by putting together all the .class files.
7. Run your program. The syntax to follow is shown below.
$ hadoop jar jar-file-name Driver/main-class-name Input-file-name-on-hdfs Output-file-directory-on-hdfs

8. On HDFS, output files written by a mapper are named part-m-00000, while files written by a reducer are named part-r-00000. Since the final result of this job comes from the reducer, open the file part-r-00000 from the terminal to see the final output.
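The flow above can be sketched in plain Java. This is not the actual Hadoop code (the real WordMapper and WordReducer would extend Hadoop's Mapper and Reducer classes and need hadoop-core-1.2.1.jar on the classpath); it is a minimal standalone illustration, with hypothetical names, of how the map phase emits <word, 1> pairs and the reduce phase sums them into the final counts:

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Standalone sketch of the map/reduce word-count logic described above.
public class WordCountSketch {

    // Map phase: split each input line into words and emit a (word, 1) pair per word.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: combine all emitted pairs, summing the counts per distinct word.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : new String[] {"hello hadoop", "hello world"}) {
            pairs.addAll(map(line));
        }
        // prints {hadoop=1, hello=2, world=1}
        System.out.println(reduce(pairs));
    }
}
```

On a real cluster, Hadoop handles the shuffle between these two phases (grouping all pairs with the same key before they reach the reducer); here that grouping is done by the merge call inside reduce.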