
Overview

Getting professionals ready to become Big Data experts!

 
During this interactive training on Zoom, you will learn about the core components of Big Data, such as Hadoop, Spark, Pig, Hive & Sqoop.
 
Further, you will gain hands-on experience with the main pillars of the Big Data ecosystem, from parallel processing frameworks like MapReduce & Spark to distributed storage with HDFS and Big Data administration with Ambari.
 
By the end of the training, you will have an in-depth understanding of, and hands-on experience with, Big Data platforms like Cloudera & Hortonworks.

Tools Covered

What are the Pre-Requisites?
  • Degree Levels: Bachelors, Masters
  • Field of Studies: Information Technology
  • Prior Job Experience: N/A
  • Other Programs:
Who Should Attend?
  • Executives who want to build a Big Data Analytics department in their start-ups/organizations.
  • People who are working in the Big Data Analytics domain and want to advance their careers.
  • Graduate or Master's students with an IT, CS, or SE background who want to start their career in the Big Data Analytics domain.
What are the Takeaways?
  • Communication skills: Develop effective communication skills to explain findings and recommendations to stakeholders with different backgrounds and levels of expertise.
  • Machine learning on big data: Learn to apply machine learning algorithms on big data using tools such as Spark MLlib and TensorFlow (a short MLlib sketch follows this list).
  • Hands-on experience: Gain practical experience in big data analytics through real-world projects and case studies.
  • Data processing and analysis: Learn to process, analyze, and visualize big data using tools such as Pig, Hive, and Apache Zeppelin.
  • Understanding of big data concepts: Gain a deep understanding of fundamental concepts such as Hadoop, Spark, and NoSQL databases.
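
As a taste of the machine-learning takeaway above, here is a minimal PySpark MLlib sketch; the toy data, column names, and app name are invented for illustration and are not course material.

```python
# A minimal sketch of applying MLlib on a Spark DataFrame: assemble features
# into a vector column, fit a logistic regression, and inspect predictions.
# All data and names here are invented placeholders.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()

# Toy dataset: two numeric features and a binary label.
df = spark.createDataFrame(
    [(0.0, 1.1, 0.0), (2.0, 1.0, 1.0), (2.1, 3.3, 1.0), (0.1, 0.2, 0.0)],
    ["f1", "f2", "label"],
)

# MLlib estimators expect all features assembled into one vector column.
assembled = VectorAssembler(inputCols=["f1", "f2"], outputCol="features").transform(df)

model = LogisticRegression(featuresCol="features", labelCol="label").fit(assembled)
model.transform(assembled).select("label", "prediction").show()
```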
Certifications Included
  • SPARK Fundamentals
  • Accessing Hadoop Data Using HIVE
  • Moving Data into Hadoop
  • MapReduce & YARN
  • Big Data 101

Course Outline

  • What is Big Data?
  • 4 V’s of Big Data.
  • What is Data Discovery?
  • What is Hadoop & its History?
  • Components of a Hadoop Cluster (Master Node, Data Node, NameNode, Job Tracker, Task Tracker)
  • How HDFS Works (see the sketch after this list)
  • Sandbox tour – Understanding Ambari
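
To make "How HDFS Works" concrete before the labs, here is a minimal sketch of writing to and reading from HDFS with PySpark; the NameNode address (namenode:8020) and the /tmp/demo path are placeholders for whatever your sandbox exposes.

```python
# A minimal sketch of HDFS from a client's point of view: the NameNode maps
# each file to blocks replicated across DataNodes, and Spark resolves block
# locations through it transparently. Host, port, and paths are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hdfs-sketch").getOrCreate()

# Write a small DataFrame to an HDFS path.
df = spark.createDataFrame([(1, "hadoop"), (2, "spark")], ["id", "tool"])
df.write.mode("overwrite").parquet("hdfs://namenode:8020/tmp/demo")

# Read it back; no client-side knowledge of block placement is needed.
spark.read.parquet("hdfs://namenode:8020/tmp/demo").show()
```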
 
  • Sandbox Configuration & Overview
  • Parallel Processing Basics
  • Introduction to Apache Hive (see the Hive sketch after this list)
  • Hive Data Loading
  • Hive Table Location
  • Hive Views & Hive use for XML
  • Block Compression and Storage Formats in Hive
  • HDFS Commands
  • What is MapReduce?
  • Hive Alignment with SQL
  • Hive Managed Tables
  • Hive Bucketing & Partitioning
  • Hive Supported File Formats
  • HDFS Data Ingestion (Lab)
  • How MapReduce works
  • Built-In and External SerDes in Hive (Lab)
  • Hive vs. Impala
  • Introduction to YARN Architecture
  • YARN Application Manager
  • YARN Performance Measuring
  • Containers Concept in Hadoop
  • Data Ingestion with Kafka-Confluent
  • Hive complex data types (Array, Map, Struct)
  • Impala Architecture
  • YARN Resource Manager
  • YARN Schedulers
  • YARN System Health
  • Project 01: Building a sentiment analysis application for tweets
  • Tez DAGs
  • PIG Architecture
  • PIG Commands
  • PIG Joins
  • PIG Execution Mechanism
  • Introduction to Apache Tez
  • Introduction to Apache Pig
  • PIG-Latin
  • Loading Data in PIG
  • Debugging Using PIG
  • Pig integration with Hive – HCatalog
  • Tez vs MapReduce
  • Pig vs. Hive
  • Grunt Shell & PIG Scripting (Lab)
  • PIG Filter
  • Introduction to Apache Sqoop
  • Migrating data with Sqoop (Lab)
  • Installing NiFi as a service (Lab)
  • Understanding the NiFi UI and creating data flows (Lab)
  • Sqoop Architecture
  • Introduction to Data Flow
  • FlowFiles, Processors and Connectors
  • Apache NiFi as a Data Flow tool
  • NiFi Templates
  • Introduction to Apache Spark
  • Spark Driver
  • Spark Core Abstraction – RDDs, DataFrames, Datasets
  • Spark Actions (Collect, First, Take, Count, Reduce, Save-as-text)
  • Scala vs. PySpark
  • Spark vs. MapReduce
  • Spark Context
  • Transformations vs. Actions (see the word-count sketch after this list)
  • Lazy Execution
  • Spark as an in-memory processing engine (Lab)
  • Spark Architecture
  • Spark Executors
  • Spark Transformations (Map, FlatMap, Filter, Distinct)
  • SparkContext, HiveContext, SQLContext
  • Troubleshooting Jobs in Spark UI
  • Introduction to Streaming Analytics
  • Spark Streaming
  • What are Messaging (Pub/Sub) systems
  • Topic, Partitions and Offsets
  • Kafka as a messaging system (Lab)
  • Bounded data vs. Unbounded data
  • Structured Streaming (see the Kafka streaming sketch after this list)
  • Introduction to Apache Kafka
  • Kafka Brokers
  • Intro to Databricks (Spark on the cloud)
  • Spark as a stream processing engine
  • Streaming Analytics in Spark (Lab)
  • Kafka – Core capabilities and Use cases
  • Kafka Producers and Consumers
  • Databricks Delta Lake Implementation / Medallion Architecture
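
For the Hive portion of the outline above, here is a minimal sketch of creating and querying a partitioned, Hive-managed table from PySpark; it assumes a reachable Hive metastore, and the demo database and tweets table are invented placeholders.

```python
# A minimal sketch of Hive-managed tables via Spark SQL. enableHiveSupport()
# routes table metadata through the Hive metastore; all names are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

spark.sql("CREATE DATABASE IF NOT EXISTS demo")

# A managed table, partitioned by date, as in the partitioning topics above.
spark.sql("""
    CREATE TABLE IF NOT EXISTS demo.tweets (id INT, txt STRING)
    PARTITIONED BY (dt STRING)
""")

# Static-partition insert, then a SQL-style aggregate (Hive aligns with SQL).
spark.sql("INSERT INTO demo.tweets PARTITION (dt='2024-01-01') VALUES (1, 'hello hive')")
spark.sql("SELECT dt, COUNT(*) AS n FROM demo.tweets GROUP BY dt").show()
```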
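For "Transformations vs. Actions" and "Lazy Execution", here is a minimal word-count sketch: the transformations only build a lineage, and no job runs until the final action forces execution.

```python
# A minimal sketch of lazy execution with RDDs: flatMap/map/reduceByKey are
# transformations that define a plan; collect() is the action that runs it.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-sketch").getOrCreate()
sc = spark.sparkContext

lines = sc.parallelize(["big data", "big spark", "data data"])

words = lines.flatMap(lambda line: line.split())   # transformation: split lines
pairs = words.map(lambda w: (w, 1))                # transformation: (word, 1)
counts = pairs.reduceByKey(lambda a, b: a + b)     # transformation: sum per word

# Only now does Spark schedule a job and execute the whole lineage.
print(counts.collect())   # e.g. [('big', 2), ('data', 3), ('spark', 1)]
```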
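For "Structured Streaming" and the Kafka topics, here is a minimal sketch of reading an unbounded Kafka stream with Spark; the broker address and topic name are placeholders, and the spark-sql-kafka connector package is assumed to be on the classpath.

```python
# A minimal sketch of Structured Streaming over Kafka: subscribe to a topic,
# cast the raw bytes, aggregate, and print each micro-batch to the console.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("stream-sketch").getOrCreate()

stream = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "tweets")                     # placeholder topic
    .load()
)

# Kafka keys/values arrive as bytes; cast to strings before aggregating.
counts = (
    stream.select(col("key").cast("string"), col("value").cast("string"))
    .groupBy("key")
    .count()
)

query = counts.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()
```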
 
  • Components of a Big Data platform
  • Building batch mode and real time big data pipelines – case studies (Lab)
  • SQL vs. NoSQL
  • Next Steps
  • Big Data Architectures
  • Realm of NoSQL databases
  • MongoDB as a NoSQL database
  • Databricks Spark Structured Streaming Implementation
  • Lambda and Kappa Architecture
  • NoSQL database types
  • Up and running with MongoDB (Lab; see the PyMongo sketch after this list)
  • Intro to NoSQL, ELK & Cassandra
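
For "Up and running with MongoDB", here is a minimal PyMongo sketch of the document model; the connection string, database, and field names are placeholders.

```python
# A minimal sketch of MongoDB's document model with PyMongo: insert
# schemaless JSON-like documents, filter with a query, then aggregate.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
db = client["demo"]

db.tweets.insert_many([
    {"user": "a", "text": "big data", "likes": 3},
    {"user": "b", "text": "spark streaming", "likes": 5},
])

# Filter (likes > 2) with a projection that drops the _id field.
for doc in db.tweets.find({"likes": {"$gt": 2}}, {"_id": 0}):
    print(doc)

# Aggregation pipeline: total likes across all documents.
total = db.tweets.aggregate([{"$group": {"_id": None, "likes": {"$sum": "$likes"}}}])
print(list(total))
```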

Our Methodology

Industry Use Cases

With real-world projects and immersive content built in partnership with top-tier companies, you'll master the tech skills employers want.

Technical Support

Our knowledgeable mentors guide your learning and are focused on answering your questions, motivating you and keeping you on track.

Career Mentorship

You’ll have access to resume support, portfolio review and optimization to help you advance your career and land a high-paying role.

Frequently Asked Questions

Will I get a certificate after this course?

Yes, you will be awarded a course completion certificate by Dice Analytics if you pass the course.

How much hands-on will be performed in this course?

Since our trainings are led by industry experts, the workshop content is designed to be roughly 70-75% hands-on, supported by theory.

What are the PC requirements?

For this professional workshop, you need a PC with a minimum of 4 GB RAM; 8 GB is recommended.

What If I miss any of the lectures?

Don't worry! We have you covered. Recorded lectures will be shared after each session, in case you miss a lecture due to personal or professional commitments or want to revise the concepts.
