Hadoop For Administrators Training Course

Apache Hadoop stands as the leading framework for processing Big Data across server clusters. This course, spanning three days with an optional fourth day, equips attendees with a comprehensive understanding of the business advantages and use cases associated with Hadoop and its ecosystem. Participants will learn to plan for cluster deployment and scalability, as well as how to install, maintain, monitor, troubleshoot, and optimise Hadoop systems. Practical skills include performing bulk data loads on clusters, exploring various Hadoop distributions, and installing and managing tools within the Hadoop ecosystem. The course concludes with a focus on securing the cluster using Kerberos.

“...The materials were very well prepared and covered thoroughly. The Lab was very helpful and well organised”
- Andrew Nguyen, Principal Integration DW Engineer, Microsoft Online Advertising

Audience

Hadoop administrators

Format

A blend of lectures and hands-on labs, with an approximate split of 60% lectures and 40% practical laboratory work.

This course is available as onsite live training in Botswana or online live training.

Course Outline

Introduction
- History and core concepts of Hadoop
- The Hadoop ecosystem
- Various distributions
- High-level architecture overview
- Common Hadoop myths
- Hadoop challenges (hardware and software)
- Labs: Discuss your Big Data projects and challenges
Planning and installation
- Selecting software and Hadoop distributions
- Sizing the cluster and planning for future growth
- Selecting appropriate hardware and network infrastructure
- Rack topology design
- Installation procedures
- Multi-tenancy considerations
- Directory structure and log management
- Benchmarking performance
- Labs: Perform cluster installation and run performance benchmarks
HDFS operations
- Core concepts (horizontal scaling, replication, data locality, and rack awareness)
- Nodes and daemons (NameNode, Secondary NameNode, HA Standby NameNode, DataNode)
- Health monitoring protocols
- Administration via command-line and browser interfaces
- Adding storage capacity and replacing defective drives
- Labs: Familiarise yourself with HDFS command lines
Data ingestion
- Using Flume for logs and other data ingestion into HDFS
- Using Sqoop for importing data from SQL databases to HDFS, and exporting back to SQL
- Implementing Hadoop data warehousing with Hive
- Copying data between clusters using distcp
- Leveraging S3 as a complementary solution to HDFS
- Best practices and architectures for data ingestion
- Labs: Set up and utilise Flume and Sqoop
MapReduce operations and administration
- Parallel computing before MapReduce: comparing HPC with Hadoop administration
- Managing MapReduce cluster loads
- Nodes and daemons (JobTracker, TaskTracker)
- Walk-through of the MapReduce UI
- MapReduce configuration options
- Job configuration specifics
- Strategies for optimising MapReduce performance
- Preparing for MapReduce success: guidance for programmers
- Labs: Execute MapReduce examples
YARN: New architecture and capabilities
- YARN design goals and implementation architecture
- New actors: ResourceManager, NodeManager, Application Master
- Installing YARN
- Job scheduling within YARN
- Labs: Investigate job scheduling mechanisms
Advanced topics
- Hardware monitoring techniques
- Comprehensive cluster monitoring
- Adding and removing servers, and upgrading Hadoop versions
- Backup, recovery, and business continuity planning
- Oozie job workflows
- Hadoop High Availability (HA)
- Hadoop Federation
- Securing your cluster with Kerberos
- Labs: Set up monitoring systems
Optional tracks
- Cloudera Manager for cluster administration, monitoring, and routine tasks; installation and usage. In this track, all exercises and labs are conducted within the Cloudera distribution environment (CDH5).
- Ambari for cluster administration, monitoring, and routine tasks; installation and usage. In this track, all exercises and labs are performed within the Ambari cluster manager and Hortonworks Data Platform (HDP 2.0).
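
As a rough illustration of the "Sizing the cluster and planning for future growth" topic in the outline above, the sketch below estimates how many DataNodes a given data volume might need. It is not taken from the course materials. The replication factor of 3 matches the HDFS default; the 25% allowance for temporary and intermediate data, the 24 TB of usable disk per node, and the function name `estimate_datanodes` are all assumptions made for the example.

```python
import math


def estimate_datanodes(
    logical_tb: float,
    replication: int = 3,              # HDFS default replication factor
    temp_overhead: float = 0.25,       # assumed allowance for temporary data
    usable_tb_per_node: float = 24.0,  # assumed usable disk per DataNode
    annual_growth: float = 0.0,        # e.g. 0.5 means 50% growth per year
    years: int = 0,
) -> int:
    """Estimate the number of DataNodes needed to store a data set."""
    if logical_tb < 0 or usable_tb_per_node <= 0 or replication < 1:
        raise ValueError("sizes must be positive and replication at least 1")
    # Project the logical data volume forward with compound growth.
    projected_tb = logical_tb * (1 + annual_growth) ** years
    # Every block is stored `replication` times, plus headroom for temp data.
    raw_tb = projected_tb * replication * (1 + temp_overhead)
    # Round up: a partly filled node is still a whole node.
    return math.ceil(raw_tb / usable_tb_per_node)


if __name__ == "__main__":
    print(estimate_datanodes(100))                              # 16
    print(estimate_datanodes(100, annual_growth=0.5, years=2))  # 36
```

Real sizing also depends on compute, memory, and network requirements, so a storage-only estimate like this is best treated as a starting point.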

Requirements

Comfort with basic Linux system administration
Basic scripting skills

Prior knowledge of Hadoop and Distributed Computing is not required, as these topics will be introduced and explained during the course.

Lab environment

Zero Install: There is no need to install Hadoop software on your personal machines! A functional Hadoop cluster will be provided for use by all students.

Students will require the following tools:

An SSH client (Linux and Mac systems come with SSH clients built-in; for Windows, PuTTY is recommended)
A browser to access the cluster. We recommend using the Firefox browser with the FoxyProxy extension installed.
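
The page does not say how the browser proxy is configured. One common arrangement, assumed here, is to open an SSH dynamic port forward (a local SOCKS proxy) to a gateway host and point FoxyProxy at it. The sketch below only builds the command line; the user name, host name, port 1080, and the helper `socks_tunnel_command` are hypothetical.

```python
import shlex


def socks_tunnel_command(user: str, gateway_host: str, local_port: int = 1080) -> list[str]:
    """Build an ssh command that opens a local SOCKS proxy on local_port."""
    if not 1 <= local_port <= 65535:
        raise ValueError("local_port must be between 1 and 65535")
    # -N: do not run a remote command, only forward traffic.
    # -D: dynamic (SOCKS) port forwarding on the given local port.
    return ["ssh", "-N", "-D", str(local_port), f"{user}@{gateway_host}"]


if __name__ == "__main__":
    # Hypothetical account and host; substitute the details supplied for the lab.
    cmd = socks_tunnel_command("student", "gateway.example.com")
    print(shlex.join(cmd))  # ssh -N -D 1080 student@gateway.example.com
```

With a tunnel like this running, FoxyProxy would be set to use a SOCKS proxy at localhost on the chosen port, so that the cluster's web interfaces open in the browser.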

21 Hours

Need help picking the right course?
southafrica@nobleprog.co.za or +27 (0)10 005 5793

Testimonials (1)

Hands on exercises. Class should have been 5 days, but the 3 days helped to clear up a lot of questions that I had from working with NiFi already

James - BHG Financial

Course - Apache NiFi for Administrators

Related Courses

Informatica with Big Data (BDM)

Apache NiFi for Administrators

Apache NiFi for Developers

Python, Spark, and Hadoop for Big Data

Related Categories

Hadoop
