
Overview

Getting professionals ready to become Big Data experts!

 
During this interactive training on Zoom, you will learn about the core components of Big Data, such as Hadoop, Spark, Pig, Hive & Sqoop.
 
Further, you will gain hands-on experience with the main pillars of the Big Data ecosystem, from parallel processing frameworks like MapReduce & Spark to distributed storage with HDFS and Big Data administration with Ambari.
 
By the end of the training, you will have an in-depth understanding of, and hands-on experience with, Big Data platforms like Cloudera & Hortonworks.

Tools Covered

What are the Pre-Requisites?
  • Degree Levels: Bachelors, Masters
  • Field of Studies: Information Technology
  • Prior Job Experience: N/A
  • Other Programs:
Who Should Attend?
  • Executives who want to build a Big Data Analytics department in their start-ups/organizations.
  • People who are working in the Big Data Analytics domain and want to advance their careers.
  • Graduate or Master's students with an IT, CS, or SE background who want to start their career in the Big Data Analytics domain.
What are the Takeaways?
  • Communication skills: Develop effective communication skills to explain findings and recommendations to stakeholders with different backgrounds and levels of expertise.
  • Machine learning on big data: Learn to apply machine learning algorithms on big data using tools such as Spark MLlib and TensorFlow (a short MLlib sketch follows this list).
  • Hands-on experience: Gain practical experience in big data analytics through real-world projects and case studies.
  • Data processing and analysis: Learn to process, analyze, and visualize big data using tools such as Pig, Hive, and Apache Zeppelin.
  • Understanding of big data concepts: Gain a deep understanding of fundamental concepts such as Hadoop, Spark, and NoSQL databases.
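
As a taste of the machine-learning takeaway above, here is a minimal PySpark MLlib sketch; the toy data, column names, and app name are invented for illustration and are not course material.

```python
# A minimal sketch of applying MLlib on a Spark DataFrame: assemble features
# into a vector column, fit a logistic regression, and inspect predictions.
# All data and names here are invented placeholders.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()

# Toy dataset: two numeric features and a binary label.
df = spark.createDataFrame(
    [(0.0, 1.1, 0.0), (2.0, 1.0, 1.0), (2.1, 3.3, 1.0), (0.1, 0.2, 0.0)],
    ["f1", "f2", "label"],
)

# MLlib estimators expect all features assembled into one vector column.
assembled = VectorAssembler(inputCols=["f1", "f2"], outputCol="features").transform(df)

model = LogisticRegression(featuresCol="features", labelCol="label").fit(assembled)
model.transform(assembled).select("label", "prediction").show()
```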
Certifications Included
  • SPARK Fundamentals
  • Accessing Hadoop Data Using HIVE
  • Moving Data into Hadoop
  • MapReduce & YARN
  • Big Data 101

Course Outline

  • What is Big Data?
  • 4 V’s of Big Data.
  • What is Data Discovery?
  • What is Hadoop & its History?
  • Components of a Hadoop Cluster (Master Node, Data Node, NameNode, Job Tracker, Task Tracker)
  • How HDFS Works (see the sketch after this list)
  • Sandbox tour – Understanding Ambari
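
To make "How HDFS Works" concrete before the labs, here is a minimal sketch of writing to and reading from HDFS with PySpark; the NameNode address (namenode:8020) and the /tmp/demo path are placeholders for whatever your sandbox exposes.

```python
# A minimal sketch of HDFS from a client's point of view: the NameNode maps
# each file to blocks replicated across DataNodes, and Spark resolves block
# locations through it transparently. Host, port, and paths are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hdfs-sketch").getOrCreate()

# Write a small DataFrame to an HDFS path.
df = spark.createDataFrame([(1, "hadoop"), (2, "spark")], ["id", "tool"])
df.write.mode("overwrite").parquet("hdfs://namenode:8020/tmp/demo")

# Read it back; no client-side knowledge of block placement is needed.
spark.read.parquet("hdfs://namenode:8020/tmp/demo").show()
```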
 
  • Sandbox Configuration & Overview
  • Parallel Processing Basics
  • Introduction to Apache Hive (see the Hive sketch after this list)
  • Hive Data Loading
  • Hive Table Location
  • Hive Views & Hive use for XML
  • Block Compression and Storage Formats in Hive
  • HDFS Commands
  • What is MapReduce?
  • Hive Alignment with SQL
  • Hive Managed Tables
  • Hive Bucketing & Partitioning
  • Hive Supported File Formats
  • HDFS Data Ingestion (Lab)
  • How MapReduce works
  • Built-In and External SerDes in Hive (Lab)
  • Hive vs. Impala
  • Introduction to YARN Architecture
  • YARN Application Manager
  • YARN Performance Measuring
  • Containers Concept in Hadoop
  • Data Ingestion with Kafka-Confluent
  • Hive complex data types (Array, Map, Struct)
  • Impala Architecture
  • YARN Resource Manager
  • YARN Schedulers
  • YARN System Health
  • Project 01: Building a sentiment analysis application for tweets
  • Tez DAGs
  • PIG Architecture
  • PIG Commands
  • PIG Joins
  • PIG Execution Mechanism
  • Introduction to Apache Tez
  • Introduction to Apache Pig
  • PIG-Latin
  • Loading Data in PIG
  • Debugging Using PIG
  • Pig integration with Hive – HCatalog
  • Tez vs MapReduce
  • Pig vs. Hive
  • Grunt Shell & PIG Scripting (Lab)
  • PIG Filter
  • Introduction to Apache Sqoop
  • Migrating data with Sqoop (Lab)
  • Installing NiFi as a service (Lab)
  • Understanding the NiFi UI and creating data flows (Lab)
  • Sqoop Architecture
  • Introduction to Data Flow
  • FlowFiles, Processors and Connectors
  • Apache NiFi as a Data Flow tool
  • NiFi Templates
  • Introduction to Apache Spark
  • Spark Driver
  • Spark Core Abstraction – RDDs, DataFrames, Datasets
  • Spark Actions (Collect, First, Take, Count, Reduce, Save-as-text)
  • Scala vs. PySpark
  • Spark vs. MapReduce
  • Spark Context
  • Transformations vs. Actions (see the word-count sketch after this list)
  • Lazy Execution
  • Spark as an in-memory processing engine (Lab)
  • Spark Architecture
  • Spark Executors
  • Spark Transformations (Map, FlatMap, Filter, Distinct)
  • SparkContext, HiveContext, SQLContext
  • Troubleshooting Jobs in Spark UI
  • Introduction to Streaming Analytics
  • Spark Streaming
  • What are Messaging (Pub/Sub) systems
  • Topic, Partitions and Offsets
  • Kafka as a messaging system (Lab)
  • Bounded data vs. Unbounded data
  • Structured Streaming (see the Kafka streaming sketch after this list)
  • Introduction to Apache Kafka
  • Kafka Brokers
  • Intro to Databricks (Spark on the cloud)
  • Spark as a stream processing engine
  • Streaming Analytics in Spark (Lab)
  • Kafka – Core capabilities and Use cases
  • Kafka Producers and Consumers
  • Databricks Delta Lake Implementation / Medallion Architecture
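
For the Hive portion of the outline above, here is a minimal sketch of creating and querying a partitioned, Hive-managed table from PySpark; it assumes a reachable Hive metastore, and the demo database and tweets table are invented placeholders.

```python
# A minimal sketch of Hive-managed tables via Spark SQL. enableHiveSupport()
# routes table metadata through the Hive metastore; all names are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

spark.sql("CREATE DATABASE IF NOT EXISTS demo")

# A managed table, partitioned by date, as in the partitioning topics above.
spark.sql("""
    CREATE TABLE IF NOT EXISTS demo.tweets (id INT, txt STRING)
    PARTITIONED BY (dt STRING)
""")

# Static-partition insert, then a SQL-style aggregate (Hive aligns with SQL).
spark.sql("INSERT INTO demo.tweets PARTITION (dt='2024-01-01') VALUES (1, 'hello hive')")
spark.sql("SELECT dt, COUNT(*) AS n FROM demo.tweets GROUP BY dt").show()
```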
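For "Transformations vs. Actions" and "Lazy Execution", here is a minimal word-count sketch: the transformations only build a lineage, and no job runs until the final action forces execution.

```python
# A minimal sketch of lazy execution with RDDs: flatMap/map/reduceByKey are
# transformations that define a plan; collect() is the action that runs it.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-sketch").getOrCreate()
sc = spark.sparkContext

lines = sc.parallelize(["big data", "big spark", "data data"])

words = lines.flatMap(lambda line: line.split())   # transformation: split lines
pairs = words.map(lambda w: (w, 1))                # transformation: (word, 1)
counts = pairs.reduceByKey(lambda a, b: a + b)     # transformation: sum per word

# Only now does Spark schedule a job and execute the whole lineage.
print(counts.collect())   # e.g. [('big', 2), ('data', 3), ('spark', 1)]
```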
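For "Structured Streaming" and the Kafka topics, here is a minimal sketch of reading an unbounded Kafka stream with Spark; the broker address and topic name are placeholders, and the spark-sql-kafka connector package is assumed to be on the classpath.

```python
# A minimal sketch of Structured Streaming over Kafka: subscribe to a topic,
# cast the raw bytes, aggregate, and print each micro-batch to the console.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("stream-sketch").getOrCreate()

stream = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "tweets")                     # placeholder topic
    .load()
)

# Kafka keys/values arrive as bytes; cast to strings before aggregating.
counts = (
    stream.select(col("key").cast("string"), col("value").cast("string"))
    .groupBy("key")
    .count()
)

query = counts.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()
```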
 
  • Components of a Big Data platform
  • Building batch mode and real time big data pipelines – case studies (Lab)
  • SQL vs. NoSQL
  • Next Steps
  • Big Data Architectures
  • Realm of NoSQL databases
  • MongoDB as a NoSQL database
  • Databricks Spark Structured Streaming Implementation
  • Lambda and Kappa Architecture
  • NoSQL database types
  • Up and running with MongoDB (Lab; see the PyMongo sketch after this list)
  • Intro to NoSQL, ELK & Cassandra
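
For "Up and running with MongoDB", here is a minimal PyMongo sketch of the document model; the connection string, database, and field names are placeholders.

```python
# A minimal sketch of MongoDB's document model with PyMongo: insert
# schemaless JSON-like documents, filter with a query, then aggregate.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
db = client["demo"]

db.tweets.insert_many([
    {"user": "a", "text": "big data", "likes": 3},
    {"user": "b", "text": "spark streaming", "likes": 5},
])

# Filter (likes > 2) with a projection that drops the _id field.
for doc in db.tweets.find({"likes": {"$gt": 2}}, {"_id": 0}):
    print(doc)

# Aggregation pipeline: total likes across all documents.
total = db.tweets.aggregate([{"$group": {"_id": None, "likes": {"$sum": "$likes"}}}])
print(list(total))
```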

Our Methodology

Industry Use Cases

With real-world projects and immersive content built in partnership with top-tier companies, you'll master the tech skills employers want.

Technical Support

Our knowledgeable mentors guide your learning and are focused on answering your questions, motivating you and keeping you on track.

Career Mentorship

You’ll have access to resume support, portfolio review and optimization to help you advance your career and land a high-paying role.

Frequently Asked Questions

Will I get a certificate after this course?

Yes, you will be awarded a course completion certificate by Dice Analytics if you pass the course.

How much hands-on will be performed in this course?

Since our trainings are led by industry experts, the workshop content is designed to be roughly 70-75% hands-on, supported by theory.

What are the PC requirements?

For this professional workshop, you need a PC with a minimum of 4 GB RAM; 8 GB is recommended.

What If I miss any of the lectures?

Don't worry! We have you covered. Recorded lectures will be shared after each session, in case you miss a lecture due to personal or professional commitments or want to revise the concepts.
