Thank you for sending your enquiry! One of our team members will contact you shortly.
Thank you for sending your booking! One of our team members will contact you shortly.
Course Outline
AI Sovereignty and Local LLM Deployment
- Risks associated with cloud LLMs: data retention, training on user inputs, and foreign jurisdiction issues.
- Ollama architecture: model server, registry, and OpenAI-compatible API.
- Comparison with vLLM, llama.cpp, and Text Generation Inference.
- Model licensing: terms for Llama, Mistral, Qwen, and Gemma.
Installation and Hardware Configuration
- Installing Ollama on Linux with CUDA and ROCm support.
- CPU-only fallback options and AVX/AVX2 optimization.
- Docker deployment and persistent volume mapping.
- Multi-GPU configurations and VRAM allocation strategies.
Model Management
- Retrieving models from the Ollama registry: example 'ollama pull llama3'.
- Importing GGUF models from HuggingFace and TheBloke.
- Understanding quantization levels: trade-offs between Q4_K_M, Q5_K_M, and Q8_0.
- Switching models and understanding limits for concurrent model loading.
Custom Modelfiles
- Writing Modelfile syntax: FROM, PARAMETER, SYSTEM, and TEMPLATE commands.
- Tuning temperature, top_p, and repeat_penalty parameters.
- Engineering system prompts for specific role-based behaviours.
- Creating and publishing custom models to the local registry.
API Integration
- Utilizing the OpenAI-compatible /v1/chat/completions endpoint.
- Handling streaming responses and JSON mode.
- Integrating with LangChain, LlamaIndex, and custom applications.
- Implementing authentication and rate limiting via reverse proxy.
Performance Optimization
- Managing context window sizing and KV cache.
- Handling batch inference and parallel requests.
- Allocating CPU threads and understanding NUMA awareness.
- Monitoring GPU utilization and memory pressure.
Security and Compliance
- Establishing network isolation for model serving endpoints.
- Implementing input filtering and output moderation pipelines.
- Maintaining audit logs for prompts and completions.
- Verifying model provenance and hash integrity.
Requirements
- Intermediate knowledge of Linux and container administration.
- High-level understanding of machine learning concepts and transformer models.
- Familiarity with REST APIs and JSON.
Target Audience
- AI engineers and developers seeking alternatives to cloud LLM APIs.
- Organizations handling sensitive data that precludes the use of cloud models.
- Government and defence teams requiring isolated, air-gapped language models.
14 Hours