LLM Instruction Tuning Workshop

This intermediate-level workshop extends MLOps concepts to large language model systems, focusing on instruction tuning workflows with LoRA/QLoRA.

Introduction

Large Language Models (LLMs) have revolutionized natural language processing, but fine-tuning them for specific tasks traditionally requires substantial computational resources. Parameter-Efficient Fine-Tuning (PEFT) techniques like LoRA (Low-Rank Adaptation) make it practical to adapt LLMs on consumer-grade hardware by training only a small fraction of the parameters.

In this workshop, you'll learn how to apply MLOps principles to LLM systems:

Parameter-efficient fine-tuning with LoRA/QLoRA
Experiment tracking with MLflow
Model evaluation and versioning in MLflow Model Registry
Containerization and Kubernetes-native deployment
LLM serving lifecycle management with vLLM

Overview of the Exercises

The workshop follows the lifecycle of an LLM serving application:

Setup & Exploration — Verify your environment, load base models (TinyLlama/Phi-2), and explore tokenization
Data Preparation — Load and format the Dolly 15K instruction dataset, create train/validation splits
LoRA Tuning — Configure and run parameter-efficient fine-tuning with MLflow experiment tracking
Evaluation — Calculate perplexity, compare base vs. fine-tuned model outputs, navigate MLflow experiments
Versioning & Packaging — Register models in MLflow Model Registry, merge LoRA weights, create Docker images
Deployment & Serving — Deploy to OpenShift/Kubernetes, configure resource limits, test the endpoint

Model: TinyLlama-1.1B

We use TinyLlama-1.1B as our base model — a compact 1.1 billion parameter LLM that fits on consumer GPUs with 4-bit quantization. With LoRA, we can fine-tune it using only ~2-4 GB of GPU memory.

Model	Parameters	FP16 Memory	4-bit Quantized Memory
TinyLlama-1.1B	1.1B	~2.2 GB	~0.6 GB
Phi-2	2.7B	~5.4 GB	~1.5 GB

Directory Structure

The workshop materials are organized for a complete MLOps workflow:

02_llm_instruction_tuning/
├── notebooks/
│   ├── 01_setup_exploration.ipynb
│   ├── 02_data_preparation.ipynb
│   ├── 03_lora_tuning.ipynb
│   ├── 04_evaluation.ipynb
│   ├── 05_versioning_packaging.ipynb
│   └── 06_deployment_serving.ipynb
├── scripts/
│   ├── build_and_push.sh
│   ├── mlflow_register.py
│   └── test_client.py
├── k8s/
│   ├── deployment.yaml
│   └── service.yaml
├── data/              # Datasets (gitignored)
├── models/            # Model outputs (gitignored)
├── Dockerfile
├── requirements.txt
├── environment.yml
└── README.md

Prerequisites

OpenShift AI notebook environment (or local GPU with 8GB+ VRAM)
Basic understanding of Python and machine learning
Familiarity with MLOps concepts from the beginner workshop
Access to an OpenShift/Kubernetes cluster (for Exercise 6)

Technologies Used

Technology	Purpose
Hugging Face Transformers	Model loading and inference
PEFT	LoRA/QLoRA parameter-efficient fine-tuning
bitsandbytes	4-bit quantization for memory efficiency
MLflow	Experiment tracking and model registry
vLLM	Optimized LLM serving engine
FastAPI	REST API serving framework
Docker / OpenShift	Containerization and orchestration

Getting Started

Launch an OpenShift AI notebook environment
Navigate to labs/02_intermediate/02_llm_instruction_tuning/
Install dependencies: pip install -r requirements.txt
Begin with Exercise 1 below

Hands-On Sessions

Start with the setup exploration, then proceed through the exercises in order:

Workshop Timing (1-Day Format)

Time	Activity
9:00–9:30	Introduction & Environment Setup
9:30–10:30	Exercise 1: Setup & Exploration
10:30–10:45	Break
10:45–12:00	Exercise 2: Data Preparation
12:00–1:00	Lunch
1:00–2:15	Exercise 3: LoRA Tuning
2:15–2:30	Break
2:30–3:45	Exercise 4: Evaluation
3:45–4:00	Break
4:00–5:00	Exercises 5 & 6: Packaging & Deployment