Pentaho Data Integration Fundamentals Training Course
Pentaho Data Integration is an open-source data integration tool for defining jobs and data transformations.
In this instructor-led, live training, participants will learn how to use Pentaho Data Integration's powerful ETL capabilities and rich GUI to manage an entire big data lifecycle and maximize the value of data within their organization.
By the end of this training, participants will be able to:
- Create, preview, and run basic data transformations containing steps and hops
- Configure and secure the Pentaho Enterprise Repository
- Harness disparate sources of data and generate a single, unified version of the truth in an analytics-ready format
- Provide results to third-party applications for further processing
Audience
- Data analysts
- ETL developers
Format of the course
- Part lecture, part discussion, exercises and heavy hands-on practice
Course Outline
Introduction
Installing and Configuring Pentaho
Overview of Pentaho Features and Architecture
Understanding Pentaho's In-Memory Caching
Navigating the User Interface
Connecting to a Data Source
Configuring the Pentaho Enterprise Repository
Transforming Data
Viewing the Transformation Results
Resolving Transformation Errors
Processing a Data Stream
Reusing Transformations
Scheduling Transformations
Securing Pentaho
Integrating with Third-party Applications (Hadoop, NoSQL, etc.)
Analytics and Reporting
Pentaho Design Patterns and Best Practices
Troubleshooting
Summary and Conclusion
Requirements
- An understanding of relational databases
- An understanding of data warehousing
- An understanding of ETL (Extract, Transform, Load) concepts
Testimonials (1)
It's a hands-on session.
Vorraluck Sarechuer - Total Access Communication Public Company Limited (dtac)
Course - Talend Open Studio for ESB
Related Courses
Data Engineering Integration for Developers
21 Hours
This course is designed for software engineers. It equips participants with the skills to accelerate Data Engineering Integration through techniques such as high-volume data ingestion, incremental loading, complex transformations, advanced file processing, dynamic mappings, and Python scripting. You will explore how to repurpose application logic for Data Engineering scenarios, covering monitoring, troubleshooting, and industry best practices.
Learning Objectives
Upon successful completion of this course, learners will be able to:
- Ingest large volumes of data into Hive and HDFS
- Execute incremental loads within Mass Ingestion
- Conduct both initial and incremental data loads
- Integrate with relational databases using SQOOP
- Apply transformations across diverse computing engines
- Run mappings in JDBC mode with Spark
- Implement stateful computing and windowing functions
- Handle complex file structures
- Parse hierarchical data using the Spark engine
- Run profiles and select sampling options on the Spark engine
- Execute Dynamic Mappings
- Generate audits for mappings
- Monitor logs via the REST Operations Hub
- Monitor logs through Log Aggregation and perform troubleshooting
- Run mappings within a Databricks environment
- Create mappings to access Delta Lake tables
- Optimize performance for Spark and Databricks jobs
KNIME Analytics Platform - Comprehensive Training
35 Hours
The "KNIME Analytics Platform" course offers a comprehensive overview of this free data analysis platform. The programme covers an introduction to data processing and analysis, installation and configuration of KNIME, workflow construction, methodologies for creating business models and data modelling. The course also discusses advanced data analysis tools, workflow import and export, tool integration, ETL processes, data mining, visualisation, extensions, and integrations with tools such as R, Java, Python, Gephi, and Neo4j. The conclusion covers reporting, integration with BIRT, and KNIME WebPortal.
Oracle GoldenGate
14 Hours
This instructor-led live training, conducted in Botswana (online or onsite), is designed for system administrators and developers who wish to set up, deploy, and manage Oracle GoldenGate for data transformation.
By the end of this training, participants will be able to:
- Install and configure Oracle GoldenGate.
- Comprehend database replication using Oracle GoldenGate.
- Understand the Oracle GoldenGate architecture.
- Configure and execute database replication and migration tasks.
- Optimize Oracle GoldenGate performance and resolve technical issues.
Pentaho Open Source BI Suite Community Edition (CE)
28 Hours
The Pentaho Open Source BI Suite Community Edition (CE) is a comprehensive business intelligence package designed to support data integration, reporting, dashboard creation, and data loading capabilities.
Through this instructor-led live training, participants will discover how to fully leverage the capabilities of the Pentaho Open Source BI Suite Community Edition (CE).
Upon completing this training, participants will be equipped to:
- Install and configure the Pentaho Open Source BI Suite Community Edition (CE)
- Grasp the core concepts, tools, and features of Pentaho CE
- Generate reports using Pentaho CE
- Integrate third-party data sources into Pentaho CE
- Utilise big data and analytics functionalities within Pentaho CE
Audience
- Programmers
- BI Developers
Course Format
- A blend of lectures, discussions, exercises, and extensive hands-on practice
Note
- To arrange a customized training session for this course, please contact us.
Pentaho Data Integration Advanced
21 Hours
Pentaho Data Integration serves as a robust platform for constructing enterprise-grade ETL processes and data pipelines.
This instructor-led live training, available either online or at your premises, is designed for advanced engineers aiming to master high-performance, enterprise-scale, and heavily automated PDI solutions.
Upon completing this course, participants will be able to:
- Architect large-scale ETL pipelines with sophisticated orchestration.
- Optimise complex transformations for peak performance.
- Implement scripting, automation, and hybrid integration patterns.
- Design robust, maintainable, production-ready workflows.
Format of the Course
- Expert-led demonstrations and architectural discussion.
- Extensive lab work on advanced real-world ETL challenges.
- Hands-on development in a production-like environment.
Course Customization Options
- Contact us if you require a customized version of this training.
Pentaho Data Integration Intermediate
21 Hours
Pentaho Data Integration serves as a robust platform for extracting, transforming, and loading data.
This instructor-led live training, available both online and on-site, is designed for intermediate practitioners looking to deepen their PDI capabilities to handle more complex transformation scenarios.
Upon completing this course, participants will be equipped to:
- Design multi-step transformations with enhanced performance.
- Utilise variables, parameters, and reusable components effectively.
- Integrate PDI with databases, APIs, and external systems.
- Implement best practices for maintaining and scaling ETL pipelines.
Course Format
- Interactive demonstrations coupled with instructor explanations.
- Guided exercises and scenario-based practice.
- Practical application within a real-world ETL project environment.
Customisation Options
- Should you require a bespoke version of this course, please reach out to us for customisation.
Talend Administration Center (TAC)
14 Hours
This instructor-led, live training in Botswana (online or onsite) targets system administrators, data scientists, and business analysts who wish to set up Talend Administration Center to deploy and manage the organisation's roles and tasks.
By the end of this training, participants will be able to:
- Install and configure Talend Administration Center.
- Understand and implement Talend management fundamentals.
- Build, deploy, and run business projects or tasks in Talend.
- Monitor the security of datasets and develop business routines based on the TAC framework.
- Obtain a broader comprehension of big data applications.
Talend Big Data Integration
28 Hours
This instructor-led, live training in Botswana (online or onsite) is aimed at technical professionals who wish to deploy Talend Open Studio for Big Data to simplify the process of reading and crunching through Big Data.
By the end of this training, participants will be able to:
- Install and configure Talend Open Studio for Big Data.
- Connect with Big Data systems such as Cloudera, Hortonworks, MapR, Amazon EMR and Apache.
- Understand and set up Open Studio's big data components and connectors.
- Configure parameters to automatically generate MapReduce code.
- Use Open Studio's drag-and-drop interface to run Hadoop jobs.
- Prototype big data pipelines.
- Automate big data integration projects.
Talend Data Stewardship
14 Hours
This instructor-led live training in Botswana (available online or onsite) is designed for data analysts at beginner to intermediate levels who wish to deepen their understanding and skills in managing and improving data quality using Talend Data Stewardship.
Upon completion of this training, participants will be able to:
- Develop a thorough understanding of the role data stewardship plays in maintaining data quality.
- Utilise Talend Data Stewardship to oversee data quality tasks.
- Create, assign, and manage tasks within Talend Data Stewardship, including the customisation of workflows.
- Leverage the tool’s reporting and monitoring features to track data quality and stewardship activities.
Talend Open Studio for ESB
21 Hours
In this instructor-led live training in Botswana, participants will learn how to use Talend Open Studio for ESB to create, connect, mediate, and manage services and their interactions.
By the end of this training, participants will be able to:
- Integrate, enhance, and deliver ESB technologies as single packages in a variety of deployment environments.
- Understand and utilize the most commonly used components of Talend Open Studio.
- Integrate any application, database, API, or Web services.
- Seamlessly integrate heterogeneous systems and applications.
- Embed existing Java code libraries to extend projects.
- Leverage community components and code to extend projects.
- Rapidly integrate systems, applications, and data sources within a drag-and-drop Eclipse environment.
- Reduce development time and maintenance costs by generating optimized, reusable code.