Course Hadoop for Big Data

  • Course Hadoop for Big Data : Content

    In the course Hadoop for Big Data participants learn how to use Apache Hadoop for the storage and processing of large amounts of data.

    Hadoop Architecture

    In the course Hadoop for Big Data the architecture of Hadoop is explained in depth. Hadoop offers a simple programming model for processing data in a distributed environment across a cluster of computers.


    The Hadoop Distributed File System (HDFS) is used as the file system within a Hadoop cluster. In the course Hadoop for Big Data HDFS is explained in detail. HDFS is a horizontally scalable file system that spans a cluster of servers. The data is stored in a distributed manner and the file system automatically replicates the data across the cluster.
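
    The distributed storage and replication described above can be illustrated with a small sketch. It is not HDFS code: the function names are hypothetical, the block size of 128 MB and replication factor of 3 are assumed here as commonly used defaults, and the round-robin placement is a simplification (real HDFS uses a rack-aware placement policy).

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # assumption: a commonly used HDFS block size (128 MB)
REPLICATION = 3                 # assumption: a commonly used HDFS replication factor


def place_blocks(file_size, nodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Split a file into blocks and assign each block to `replication` distinct nodes.

    Placement is simple round-robin for illustration only; it is not the
    placement policy that HDFS itself uses.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    num_blocks = math.ceil(file_size / block_size)
    placement = {}
    for block in range(num_blocks):
        # Consecutive nodes (wrapping around) hold the replicas of this block.
        placement[block] = [nodes[(block + r) % len(nodes)] for r in range(replication)]
    return placement


def surviving_blocks(placement, failed_node):
    """Return the blocks that still have at least one replica after one node fails."""
    return [b for b, holders in placement.items()
            if any(node != failed_node for node in holders)]


if __name__ == "__main__":
    cluster = ["node1", "node2", "node3", "node4"]  # hypothetical node names
    layout = place_blocks(300 * 1024 * 1024, cluster)
    for block, holders in layout.items():
        print(f"block {block}: {holders}")
    print("blocks still readable without node1:", surviving_blocks(layout, "node1"))
```

    Because every block is held by several nodes, the loss of a single node in this sketch leaves all blocks readable, which is the purpose of replication.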


    MapReduce, an important programming model for processing data in parallel across the cluster, is given extensive attention.
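
    As a rough illustration of the MapReduce idea, the sketch below counts words in three phases: map, shuffle and reduce. It runs in a single Python process and does not use the Hadoop API; all function names are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, between the map and reduce steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over a list of input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big cluster", "data node"]))
```

    In a real cluster the map and reduce steps run in parallel on many machines and the framework performs the shuffle; the sketch only shows how the data flows between the phases.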


    Finally, attention is paid to tools and utilities that are often used in combination with Hadoop, such as ZooKeeper, Sqoop, Oozie and Pig.

  • Course Hadoop for Big Data : Training

    Audience Course Hadoop for Big Data

    The course Hadoop for Big Data is intended for developers, data analysts and others who want to learn how to process data with Hadoop.

    Prerequisites training Hadoop for Big Data

    Prior knowledge of programming in Java and of databases helps in following the course, but prior knowledge of Java or Hadoop is not required.

    Realization Course Hadoop for Big Data

    The theory is presented with slides and clarified with illustrative demos. There is ample opportunity to practice, and theory and practice alternate. The course times are from 9.30 to 16.30.

    Official Certificate Course Hadoop for Big Data

    Participants receive an official Hadoop for Big Data certificate after successful completion of the course.

  • Course Hadoop for Big Data : Modules

    Module 1 : Hadoop Intro

    Big Data Handling
    NoSQL
    Comparison to Relational DB
    Hadoop Eco-System
    Hadoop Distributions
    Pseudo-Distributed Installation
    Namenode Safemode
    Namenode High Availability
    Secondary Namenode
    Hadoop Filesystem Shell

    Module 2 : Java API

    Create via Put method
    Read via Get method
    Update via Put method
    Delete via Delete method
    Create Table
    Drop Table
    Scan API
    Scan Caching
    Scan Batching

    Module 3 : HDFS

    Hadoop Environment
    Hadoop Stack
    Hadoop Yarn
    Distributed File System
    HDFS Architecture
    Parallel Operations
    Working with Partitions
    RDD Partitions
    HDFS Data Locality
    DAG (Directed Acyclic Graph)

    Module 4 : HBase Key Design

    Storage Model
    Querying Granularity
    Table Design
    Tall-Narrow Tables
    Flat-Wide Tables
    Column Family
    Column Qualifier
    Storage Unit
    Querying Data by Timestamp
    Querying Data by Row-ID
    Types of Keys and Values
    SQL Access

    Module 5 : MapReduce

    MapReduce Model
    MapReduce Theory
    YARN and MapReduce 2.0 Daemons
    MapReduce on YARN single node
    MapReduce framework
    Tool and ToolRunner
    Running MapReduce Locally
    Running MapReduce on Cluster
    Packaging MapReduce Jobs
    MapReduce CLASSPATH
    Decomposing into MapReduce
    MapReduce Job

    Module 6 : Submitting Jobs

    Using JobControl class
    Joining data-sets
    User Defined Functions
    Logs and Web UI
    Input and Output Formats
    Anatomy of Mappers
    Reducers and Combiners
    Partitioners and Counters
    Speculative Execution
    Distributed Cache
    YARN Components

    Module 7 : Hadoop Streaming

    Implement a Streaming Job
    Contrast with Java Code
    Create counts in Streaming App
    Text Processing Use Case
    Key Value Pairs
    $yarn command
    Using Pipes

    Module 8 : Utilities

    Introduce Oozie
    Deploy and Run Oozie Workflow
    Pig Overview
    Execution Modes
    Developing Pig Script

    Module 9 : Hive

    Hive Concepts
    Hive Clients
    Table Creation and Deletion
    Loading Data into Hive
  • Course Hadoop for Big Data : General

    Course Forms

    All our courses are classroom courses in which students are guided through the material by an experienced trainer with in-depth knowledge of the subject. Theory is always interspersed with exercises.


    We also offer custom classes in which the course content is adjusted to your wishes. On request we can also discuss your own practical cases.

    Course times

    The course times are from 9.30 to 16.30, but we are flexible. If other times are more convenient, for example because participants have to bring children to daycare, we can agree on different course times in consultation.


    We provide the computers on which the course is held. The software required for the course is already installed on these computers, so you do not have to bring a laptop. If you prefer to work on your own laptop, you are welcome to bring it. The required software is then installed at the start of the course.


    Our courses are generally given with open source software such as Eclipse, IntelliJ, Tomcat, PyCharm, Anaconda and NetBeans. You will receive the digital course material to take home after the course.


    The course includes lunch, which we have in a restaurant within walking distance of the course room.


    The courses are planned at various places in the country. A course takes place at a location if at least 3 people register for that location. If there are registrations for different locations, the course will take place at our main location in Houten, just south of Utrecht. A course at our main location also takes place with 2 registrations, and regularly with 1 registration. We also give courses at the customer's location if preferred.


    At the end of each course, participants are requested to evaluate the course in terms of course content, course material, trainer and location. The evaluation form can be found at https://www.klantenvertellen.nl/reviews/1039545/spiraltrain?lang=en. The evaluations of previous participants and previous courses can also be found there.


    The intellectual property rights of the published course content, also referred to as an information sheet, belong to SpiralTrain. Publishing the course information or the information sheet in written or digital form is not allowed without the explicit permission of SpiralTrain. The course content is understood to mean the description of the course in sentences as well as the division of the course into modules and the topics within those modules.
