
Friday, 1 February 2019

Two-day hands-on workshop on “Apache Spark with Hadoop”, 28 - 29 March 2019.


Dear all,
Greetings from PICT !!!

The Department of Computer Engineering is organizing a two-day hands-on workshop on “Apache Spark with Hadoop” during 28th - 29th March 2019.

The workshop aims to explore Spark with Hadoop to process big data. Spark is one of the most popular data processing tools today and is used by large companies such as Amazon and eBay.

CONVENER AND COORDINATORS 
Prof. Swarupkumar Suradkar
Mob: +91 7249324171

Prof. Pavan Jaiswal
Email: prjaiswal@pict.edu
Mob: +91 9545200881

Prof. Bhumesh Masram
Mob: +91 7045449943

Registration Fee
Rs. 800/-
Registration is open for all. 

Last date of registration: 27th March 2019
Online registration is mandatory to participate in the workshop
Link for online registration:

Registration is on a first-come, first-served basis.
The course package includes course material, a certificate, breakfast, tea and lunch.
Venue: Seminar Hall of Computer Engg. Department (A1-010), PICT, Pune.

Thank you. 


COURSE CONTENTS
1. Introduction
·       What is Hadoop, Apache Spark
·       Understanding Big Data
2. Hadoop 2.7.x single node cluster set up
3. Hadoop Distributed File System (HDFS)
·       HDFS commands
·       Exploring HDFS administration
4. Configuring & demonstrating HIVE
·       HIVE architecture, table creation
·       Optimized Row Columnar (ORC) and Parquet file formats
·       Partitioning
5. Apache Spark
·       Apache Spark architecture
·       RDD (Resilient Distributed Dataset)
·       Creation of RDDs from HDFS files
·       Concept of partitioning
·       Transformations: map, flatMap, filter, sample, union, intersection, distinct, groupByKey, reduceByKey, sortByKey, join, cartesian, coalesce, repartition
·       Actions: reduce, collect, count, first, take, takeSample, saveAsTextFile; caching
·       Spark DataFrame: introduction to Catalyst and Tungsten; operations on DataFrames: map, filter, where, createTempView, distinct, select, selectExpr, toPandas, toLocalIterator, dropna, etc.; creating a DataFrame from HIVE tables, DataFrame to HIVE table conversion, org.apache.spark.sql.types and org.apache.spark.sql.functions
·       Spark Streaming, Spark Dataset
·       Spark SQL: select, create, where, HIVE UDFs, writing your own UDF, MLlib package using PySpark (a short PySpark sketch covering a few of these operations follows the course contents)
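For reference, a minimal PySpark word-count sketch is given below. It exercises a few of the transformations, actions, DataFrame operations and Spark SQL listed above; the input path, application name and view/column names are placeholders chosen for illustration and are not part of the workshop material.

from pyspark.sql import SparkSession

# Start a Spark session (locally or against the Hadoop/YARN cluster).
spark = SparkSession.builder.appName("WordCountSketch").getOrCreate()
sc = spark.sparkContext

# Create an RDD from a text file on HDFS; the path is a placeholder.
lines = sc.textFile("hdfs:///user/demo/Inputfile.txt")

# Transformations (lazy): flatMap -> map -> reduceByKey build <word, count> pairs.
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))

# Actions trigger the actual computation.
print(counts.take(10))
print("distinct words:", counts.count())

# The same data as a DataFrame, queried through Spark SQL.
df = counts.toDF(["word", "total"])
df.createTempView("word_counts")
spark.sql("SELECT word, total FROM word_counts ORDER BY total DESC LIMIT 5").show()

spark.stop()

The sketch can be run with spark-submit or pasted into a pyspark shell (in the shell, the SparkSession already exists as spark).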


Wednesday, 21 February 2018

HANDS-ON WORKSHOP on "BIG DATA ANALYSIS USING HADOOP"

Dear All,

The Department of Computer Engineering at PICT Pune announces a

TWO-DAY HANDS-ON WORKSHOP on "BIG DATA ANALYSIS USING HADOOP"
Date: 25th & 26th February 2018.

Considering current industry requirements, the workshop will cover the following:

Hadoop configuration and administration, HDFS, MapReduce programming, Hive and Sqoop.

Registration Fee :

Rs. 500/- per participant (for all faculty, students and industry participants), which includes a registration kit, course material, certificate, breakfast and lunch.


For pre-registration, fill in the following registration form:


For further information, please contact:

Prof. Swarup Suradkar
7249324171

Mr. Pavan R. Jaiswal
9545200881

Prof. Bhumesh Masram
7045449943

Convener:
Prof. Swarup Suradkar

Please share this within your network.

Thank you.







Monday, 21 September 2015

Hadoop MapReduce WordCount Program in Java

Dear Viewers,
 
This post will help you to run the Hadoop MapReduce Word Count program in Java.

Prerequisite:
1. Running Hadoop environment on your Linux box. (I have Hadoop 2.6.0)
2. Java installed on your Linux box. (I have Java 1.7.0)
3. External jar - hadoop-core-1.2.1.jar
4. Text input file  (I have Inputfile.txt)

Flow:
1. Prepare 3 Java source code files, namely WordCount.java, WordMapper.java and WordReducer.java.
2. WordCount.java is the main class. You may also refer to this as the Driver class. Its source code refers to WordMapper.class and WordReducer.class.
3. WordMapper.java splits up the user input and, as output, generates <key,value> pairs, that is, <word, its count>.
4. WordReducer.java accepts the output of the Mapper as its input. It combines the output provided by WordMapper.class and generates the final output, which is also a <key,value> pair. The final output indicates how many times each word has occurred.
5. Compile the source code files, making use of the external jar file.
6. After successful compilation, create a jar file by putting together all the .class files.
7. Run your program. The syntax to be followed while running this program is as below.
$ hadoop jar jar-file-name Driver/main-class-name Input-file-name-on-hdfs Output-file-directory-on-hdfs
8. The output file on HDFS generated by the Mapper class will have the name “part-m-00000” and the one from the Reducer class will have the name “part-r-00000”. So open the file part-r-00000 from the terminal to see the final output.

Sunday, 20 September 2015

Hadoop MapReduce WordCount Program in Python

Dear Viewers,
This post will help you to run the Hadoop MapReduce Word Count program in Python.



Prerequisite:
1. Running Hadoop environment on your Linux box. (I have Hadoop 2.6.0)
2. Python installed on your Linux box. (I have Python 2.7.3)
3. External jar - hadoop-streaming-2.6.0.jar
4. Text input file (I have Employee.txt)



Flow:
  1. Prepare 2 Python source code files, namely Mapper.py and Reducer.py (a minimal sketch of these two files is given after this list).
  2. The Mapper.py file splits up the user input and, as output, generates <key,value> pairs, that is, <word, its count>.
  3. Reducer.py accepts the output of the Mapper as its input. It combines the output provided by Mapper.py and generates the final output, which is also a <key,value> pair. The final output indicates how many times each word has occurred.
  4. To run these files from the Linux terminal, change their permissions to executable. To do so, you may use the command $ chmod +x *.py
  5. The syntax to be followed while running this program is as below. '\' on the terminal indicates line continuation.
    $ hadoop jar hadoop-streaming-2.6.0.jar \
    -file filename -mapper mapperfile \
    -file filename -reducer reducerfile \
    -input inputfilename \
    -output outputfiledirectory
  6. The output file on HDFS will have the name “part-00000”. So open this file from the terminal to see the final output.
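For reference, a minimal sketch of what such Mapper.py and Reducer.py streaming scripts typically look like is given below. The file names come from the flow above, but this is a standard word-count sketch, not necessarily the exact code from the original post; it relies on Hadoop sorting the Mapper output by key before it reaches the Reducer.

#!/usr/bin/env python
# Mapper.py - reads lines from standard input and emits one <word TAB 1> pair per word.
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print("%s\t%s" % (word, 1))

#!/usr/bin/env python
# Reducer.py - reads the sorted <word TAB count> pairs from standard input
# and sums the counts for each word (equal words arrive together because
# the framework sorts the Mapper output by key).
import sys

current_word = None
current_count = 0

for line in sys.stdin:
    word, count = line.strip().split('\t', 1)
    count = int(count)
    if word == current_word:
        current_count += count
    else:
        if current_word is not None:
            print("%s\t%s" % (current_word, current_count))
        current_word = word
        current_count = count

if current_word is not None:
    print("%s\t%s" % (current_word, current_count))

Once both files are executable, they plug into the command in step 5 as -file Mapper.py -mapper Mapper.py -file Reducer.py -reducer Reducer.py, together with the HDFS input file and output directory.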

Monday, 10 August 2015

Sqoop import and export between MySQL and HDFS

Dear Viewers,

The following script will help you to perform import and export between MySQL and HDFS using Sqoop.

# Script Starts here SqoopJobs.sh ########

#!/bin/bash
# Title: Shell script to perform import and export between MySQL and HDFS (and vice versa) using Sqoop
# Author: Pavan Jaiswal
# Note: This program can be written as a shell script or in Java. I have chosen a shell script here.

Friday, 7 August 2015

Data analysis using Hive script

Dear Viewers,

This post will help you to run a Hive script on a Hive table. The Hive script is written for the title given below.

A Hive script can be saved with the extension '.hql' or '.sql'. Copy-paste the code below into a file "HiveScript.hql". In a Hive script, a single-line comment starts with '--'.


Friday, 31 July 2015

Hadoop 2.6.0 Single Node Setup on Fedora

Dear viewers,

This post will help you with a single node setup of Hadoop 2.6.0 on Fedora or similar systems.

Steps:

1. Install Java if it does not already exist (version 1.6+).
Use: sudo yum install java-package-name
After installation, cross-check it with the commands below.

[pavan@Pavan ~]$ java -version
java version "1.7.0_b147-icedtea"
OpenJDK Runtime Environment (fedora-2.1.fc17.6-x86_64)
OpenJDK 64-Bit Server VM (build 22.0-b10, mixed mode)

[pavan@Pavan ~]$ which java
/usr/bin/java

[pavan@Pavan ~]$ whereis java
java: /bin/java /usr/bin/java /etc/java /lib64/java /usr/lib64/java /usr/share/java /usr/share/man/man1/java.1.gz