Friday, 1 February 2019

Two-day hands-on workshop on “Apache Spark with Hadoop”, 28–29 March 2019.

Dear all,
Greetings from PICT!

The Department of Computer Engineering is organizing a two-day hands-on workshop on “Apache Spark with Hadoop” during 28th–29th March 2019.

The workshop aims to explore Spark with Hadoop for processing big data. Spark is one of the most popular big-data processing tools today and is used by large companies such as Amazon and eBay.

Prof. Swarupkumar Suradkar
Mob:+91 7249324171

Prof. Pavan Jaiswal
Mob:+91 9545200881 

Prof. Bhumesh Masram
Mob: +91 7045449943

Registration Fee
Rs. 800/-
Registration is open for all. 

Last date of registration: 27th March 2019
Online registration is mandatory to participate in the workshop
Link for online registration:

Registration is on a first-come, first-served basis.
The course package includes course material, a certificate, breakfast, tea, and lunch.
Venue: Seminar Hall of Computer Engg. Department (A1-010), PICT, Pune.

Thank you. 

               1.     Introduction
·       What is Hadoop? What is Apache Spark?
·       Understanding Big Data
               2.     Hadoop 2.7.x single-node cluster set-up
               3.     Hadoop Distributed File System (HDFS)
·       HDFS commands
·       Exploring HDFS administration
               4.     Configuring and demonstrating HIVE
·       HIVE architecture, table creation
·       Optimized Row Columnar (ORC) file format
·       Parquet file format
·       Partitioning
               5.     Apache Spark
·       Apache Spark architecture
·       RDD (Resilient Distributed Dataset)
·       Creating RDDs from HDFS files
·       Concept of partitioning
·       Transformations: map, flatMap, filter, sample, union, intersection, distinct, groupByKey, reduceByKey, sortByKey, join, cartesian, coalesce, repartition
·       Actions: reduce, collect, count, first, take, takeSample, saveAsTextFile; caching
·       Spark DataFrame: introduction to Catalyst and Tungsten; operations on DataFrames: map, filter, where, createTempView, distinct, select, selectExpr, toPandas, toLocalIterator, dropna, etc.; creating DataFrames from HIVE tables, DataFrame-to-HIVE-table conversion; org.apache.spark.sql.types and org.apache.spark.sql.functions
·       Spark Streaming, Spark Dataset
·       Spark SQL: select, create, where, HIVE UDFs, writing your own UDF; the MLlib package using PySpark