Dear all,
Greetings from PICT!
The Department of Computer Engineering is organizing a two-day hands-on workshop on “Apache Spark with Hadoop” on 28th and 29th March 2019.
The workshop aims to explore how Spark and Hadoop can be used together to process big data. Spark is one of the most popular data processing tools today and is used by large companies such as Amazon and eBay.
CONVENER AND COORDINATORS
Prof. Swarupkumar Suradkar
Email: sssuradkar@pict.edu
Mob: +91 7249324171
Prof. Pavan Jaiswal
Email: prjaiswal@pict.edu
Mob: +91 9545200881
Prof. Bhumesh Masram
Email: bpmasram@pict.edu
Mob: +91 7045449943
Registration Fee
Rs. 800/-
Registration is open for all.
Last date of registration: 27th March 2019
Online registration is mandatory to participate in the workshop.
Link for online registration:
https://goo.gl/EVgkfR
Registration is on a first-come, first-served basis.
The course package includes course material, a certificate, breakfast, tea and lunch.
Venue: Seminar Hall of Computer Engg. Department (A1-010), PICT, Pune.
Thank you.
COURSE CONTENTS
1. Introduction
· What are Hadoop and Apache Spark?
· Understanding Big Data
2. Hadoop 2.7.x single-node cluster setup
3. Hadoop Distributed File System (HDFS)
· HDFS commands
· Exploring HDFS administration
4. Configuring and demonstrating Hive
· Hive architecture and table creation
· Optimized Row Columnar (ORC) file format
· Parquet file format
· Partitioning
5. Apache Spark
· Apache Spark Architecture
· RDD (Resilient Distributed Dataset)
· Creating RDDs from HDFS files
· Concept of partitioning
· Transformations: map, flatMap, filter, sample, union, intersection, distinct, groupByKey, reduceByKey, sortByKey, join, cartesian, coalesce, repartition
· Actions: reduce, collect, count, first, take, takeSample, saveAsTextFile
· Caching
· Spark DataFrame: introduction to the Catalyst optimizer and Tungsten; DataFrame operations such as map, filter, where, createTempView, distinct, select, selectExpr, toPandas, toLocalIterator and dropna; creating DataFrames from Hive tables and converting DataFrames to Hive tables; org.apache.spark.sql.types and org.apache.spark.sql.functions
· Spark Streaming and Spark Datasets
· Spark SQL: select, create, where, Hive UDFs, writing your own UDFs
· MLlib package using PySpark