Databricks Migration Workshop: From Stored Procedures to Lakehouse (5-Day Intensive) Training Course

Databricks serves as a unified Lakehouse platform that integrates Spark, Delta Lake, and governance capabilities (Unity Catalog) to facilitate scalable data engineering and analytical workflows.

This instructor-led, live training session (available online or on-site) is designed for intermediate-level technology managers with a data engineering background who aim to migrate complex procedural OLAP logic to a Lakehouse architecture utilizing Databricks, Spark, Delta Lake, Unity Catalog, and native workflows.

Upon finishing this training, participants will be capable of:

Explaining Lakehouse architecture and the Bronze to Silver to Gold (Medallion) pattern.
Converting stored-procedure logic into Spark DataFrame and notebook implementations.
Designing and executing incremental ingestion, merge, and optimization routines using Delta Lake.
Constructing end-to-end orchestrated pipelines with Databricks Workflows, incorporating version control, testing, and governance.

Course Format

Intensive, instructor-led sessions featuring focused demonstrations and explanations.
Daily practical labs utilizing representative datasets and migration exercises.
Guided code reviews, performance tuning clinics, and practice with workflow orchestration.

Course Customization Options

This course can be customized to fit your specific environment, datasets, and governance requirements; please contact us to arrange customization.

This course is available as on-site live training in Botswana or as online live training.

Course Outline

Introduction, Objectives, and Migration Strategy

Course goals, alignment with participant profiles, and success criteria
High-level migration approaches and risk considerations
Setting up workspaces, repositories, and lab datasets

Day 1 — Migration Fundamentals and Architecture

Lakehouse concepts, Delta Lake overview, and Databricks architecture
SMP versus MPP differences and their implications for migration
Medallion (Bronze to Silver to Gold) design and Unity Catalog overview

Day 1 Lab — Translating a Stored Procedure

Practical migration of a sample stored procedure to a notebook
Mapping temp tables and cursors to DataFrame transformations
Validation and comparison with the original output
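
As one illustration of the cursor-to-DataFrame mapping in this lab, the sketch below contrasts a cursor-style running total with a window-function equivalent. The column names (customer_id, order_date, amount) and both function names are hypothetical, and the PySpark half assumes a runtime where pyspark is installed, such as a Databricks cluster.

```python
from itertools import groupby
from operator import itemgetter


def running_total_cursor(rows):
    """Procedural reference: walk rows one by one, as a cursor loop would."""
    ordered = sorted(rows, key=itemgetter("customer_id", "order_date"))
    out = []
    for _, group in groupby(ordered, key=itemgetter("customer_id")):
        total = 0
        for row in group:
            total += row["amount"]
            out.append({**row, "running_total": total})
    return out


def running_total_df(df):
    """Set-based equivalent: one window expression, no row-by-row loop."""
    # Imported lazily so this module still loads where pyspark is absent.
    from pyspark.sql import Window, functions as F

    w = (
        Window.partitionBy("customer_id")
        .orderBy("order_date")
        .rowsBetween(Window.unboundedPreceding, Window.currentRow)
    )
    return df.withColumn("running_total", F.sum("amount").over(w))
```

Running both versions on the same sample and comparing the results is one possible way to approach the validation step listed above.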

Day 2 — Advanced Delta Lake & Incremental Loading

ACID transactions, commit logs, versioning, and time travel
Auto Loader, MERGE INTO patterns, upserts, and schema evolution
OPTIMIZE, VACUUM, Z-ORDER, partitioning, and storage tuning
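
To make the MERGE INTO upsert pattern concrete, here is a minimal sketch that assembles the statement from a list of key columns. The helper name build_merge_sql and the table names are assumptions for illustration only; the generated SQL uses the Delta Lake UPDATE SET * and INSERT * shorthand, which expects source and target columns to line up.

```python
def build_merge_sql(target, source, keys):
    """Return a Delta Lake MERGE statement that upserts source into target."""
    if not keys:
        raise ValueError("at least one join key is required")
    on_clause = " AND ".join(f"t.{k} = s.{k}" for k in keys)
    return (
        f"MERGE INTO {target} AS t "
        f"USING {source} AS s "
        f"ON {on_clause} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )


# Hypothetical table names; on a cluster the statement would run via spark.sql(stmt).
stmt = build_merge_sql("silver.customers", "bronze_customers_batch", ["customer_id"])
```

Identifiers are interpolated directly into the string, so in this sketch they should come from trusted configuration rather than user input.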

Day 2 Lab — Incremental Ingestion & Optimization

Implementing Auto Loader ingestion and MERGE workflows
Applying OPTIMIZE, Z-ORDER, and VACUUM; validating results
Measuring improvements in read/write performance
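
The maintenance steps in this lab could be scripted along the following lines. This is a sketch under stated assumptions: the helper name, table name, and Z-ORDER column are hypothetical, and the default of 168 hours simply mirrors Delta Lake's 7-day default retention threshold.

```python
def maintenance_statements(table, zorder_cols=(), retain_hours=168):
    """Build OPTIMIZE and VACUUM statements for one Delta table."""
    optimize = f"OPTIMIZE {table}"
    if zorder_cols:
        # Z-ordering co-locates related values to improve data skipping.
        optimize += f" ZORDER BY ({', '.join(zorder_cols)})"
    vacuum = f"VACUUM {table} RETAIN {retain_hours} HOURS"
    return [optimize, vacuum]


# Hypothetical table and column; each statement would be passed to spark.sql().
for statement in maintenance_statements("gold.sales", ["customer_id"]):
    print(statement)
```

Timing a representative query before and after these statements gives a simple basis for the read-performance comparison.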

Day 3 — SQL in Databricks, Performance & Debugging

Analytical SQL features: window functions, higher-order functions, JSON/array handling
Reading the Spark UI, DAGs, shuffles, stages, tasks, and diagnosing bottlenecks
Query tuning patterns: broadcast joins, hints, caching, and spill reduction

Day 3 Lab — SQL Refactoring & Performance Tuning

Refactoring a heavy SQL process into optimized Spark SQL
Using Spark UI traces to identify and resolve skew and shuffle issues
Benchmarking before/after results and documenting tuning steps
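
When reading per-task metrics from the Spark UI, one rough heuristic for spotting skew is to compare the largest task with the median task. The sketch below assumes the task sizes (records or bytes of shuffle read per task) have been copied out by hand; the function names and the threshold of 5.0 are illustrative assumptions, not Spark defaults.

```python
from statistics import median


def skew_ratio(task_sizes):
    """Ratio of the largest task to the median task; 1.0 means evenly spread."""
    if not task_sizes:
        raise ValueError("task_sizes must not be empty")
    mid = median(task_sizes)
    if mid == 0:
        return float("inf") if max(task_sizes) > 0 else 1.0
    return max(task_sizes) / mid


def is_skewed(task_sizes, threshold=5.0):
    """Flag a stage whose slowest task handles far more data than a typical one."""
    return skew_ratio(task_sizes) > threshold
```

Recording the ratio before and after a tuning change is one way to document the effect alongside wall-clock benchmarks.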

Day 4 — Tactical PySpark: Replacing Procedural Logic

Spark execution model: driver, executors, lazy evaluation, and partitioning strategies
Transforming loops and cursors into vectorized DataFrame operations
Modularization, UDFs/pandas UDFs, widgets, and reusable libraries

Day 4 Lab — Refactoring Procedural Scripts

Refactoring a procedural ETL script into modular PySpark notebooks
Introducing parametrization, unit-style tests, and reusable functions
Code review and application of best-practice checklists
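
As a small example of a reusable, unit-testable function, procedural IF/ELSE branching can be expressed as an ordered rule list that compiles to a single SQL CASE expression. The helper name, rule list, and column name below are hypothetical; in a notebook the resulting string could be applied with pyspark.sql.functions.expr.

```python
def build_case_expr(rules, default):
    """Compile ordered (condition_sql, label) pairs into one CASE expression."""
    if not rules:
        raise ValueError("at least one rule is required")
    branches = " ".join(f"WHEN {cond} THEN '{label}'" for cond, label in rules)
    return f"CASE {branches} ELSE '{default}' END"


# Hypothetical business rule: classify orders by amount, first match wins.
TIER_RULES = [("amount >= 1000", "high"), ("amount >= 100", "medium")]
tier_expr = build_case_expr(TIER_RULES, default="low")

# In a notebook (assumes a DataFrame df with an amount column):
#   df.withColumn("tier", F.expr(tier_expr))
```

Because the function only builds a string, it can be tested without a Spark session. Labels containing single quotes would need escaping, which this sketch does not handle.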

Day 5 — Orchestration, End-to-End Pipeline & Best Practices

Databricks Workflows: job design, task dependencies, triggers, and error handling
Designing incremental Medallion pipelines with quality rules and schema validation
Integration with Git (GitHub/Azure DevOps), CI, and testing strategies for PySpark logic
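
Schema validation between Medallion layers can start as a plain comparison of column names and types. The sketch below assumes both schemas are lists of (column, type) pairs, which is the shape PySpark's DataFrame.dtypes returns; the function name and the sample columns are hypothetical.

```python
def schema_drift(expected, actual):
    """Compare two lists of (column, type) pairs and report the differences."""
    exp, act = dict(expected), dict(actual)
    shared = set(exp) & set(act)
    return {
        "missing": sorted(set(exp) - set(act)),
        "unexpected": sorted(set(act) - set(exp)),
        "type_changed": sorted(c for c in shared if exp[c] != act[c]),
    }


# Hypothetical contract for a Silver table versus an incoming batch.
expected = [("id", "bigint"), ("amount", "double"), ("ts", "timestamp")]
actual = [("id", "bigint"), ("amount", "string"), ("note", "string")]
report = schema_drift(expected, actual)
```

A pipeline task could fail fast when any of the three lists is non-empty, or log the report and continue, depending on the quality rules agreed for that layer.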

Day 5 Lab — Build a Complete End-to-End Pipeline

Assembling a Bronze to Silver to Gold pipeline orchestrated with Workflows
Implementing logging, auditing, retries, and automated validations
Running the full pipeline, validating outputs, and preparing deployment notes
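
Databricks Workflows lets you configure retries per task, but it can also be useful to retry a single flaky call inside a task and log each attempt. The following is a minimal sketch with exponential backoff; the function name, logger name, and default values are assumptions chosen for illustration.

```python
import logging
import time

logger = logging.getLogger("pipeline")


def run_with_retries(step, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call step() until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            result = step()
            logger.info("step succeeded on attempt %d", attempt)
            return result
        except Exception as exc:
            logger.warning("attempt %d failed: %s", attempt, exc)
            if attempt == max_attempts:
                raise  # surface the final failure to the orchestrator
            # Back off 1x, 2x, 4x ... the base delay between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
```

The sleep parameter is injectable so that tests can run without real delays; re-raising on the last attempt lets the surrounding job mark the task as failed.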

Operationalization, Governance, and Production Readiness

Unity Catalog governance, lineage, and access-control best practices
Cost management, cluster sizing, autoscaling, and job concurrency patterns
Deployment checklists, rollback strategies, and runbook creation

Final Review, Knowledge Transfer, and Next Steps

Participant presentations of migration work and lessons learned
Gap analysis, recommended follow-up activities, and training materials handoff
References, further learning paths, and support options

Requirements

A solid understanding of data engineering concepts
Experience with SQL and stored procedures (Synapse / SQL Server)
Familiarity with ETL orchestration concepts (ADF or similar tools)

Audience

Technology managers with a data engineering background
Data engineers transitioning procedural OLAP logic to Lakehouse patterns
Platform engineers responsible for Databricks adoption

Duration: 35 hours
