Skip to content

Exercise 1: Setup & Exploration

Objective

In this exercise, you will: 1. Launch and verify your OpenShift AI notebook environment 2. Explore the base model architecture (TinyLlama or Phi-2) 3. Understand GPU memory constraints and resource limitations 4. Perform basic tokenization and text generation 5. Set up MLflow tracking for your experiments

Prerequisites

  • Access to an OpenShift AI notebook environment
  • The repository cloned and the current working directory set to labs/02_intermediate/02_llm_instruction_tuning
  • Dependencies installed (pip install -r requirements.txt)

Step 1: Environment Check

Let's start by checking our environment and verifying GPU availability.

import torch
import transformers
import mlflow

# Check GPU
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA devices: {torch.cuda.device_count()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

# Check library versions
print(f"Transformers: {transformers.__version__}")
print(f"PyTorch: {torch.__version__}")

Step 2: Load the Base Model

Load TinyLlama, a small but capable 1.1B parameter model:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
print(f"Model loaded: {model_name}")
print(f"Parameters: {model.num_parameters() / 1e6:.1f}M")

Step 3: Explore Tokenization

Understand how the tokenizer converts text to tokens:

text = "MLOps combines machine learning and operations."
tokens = tokenizer(text)
decoded = tokenizer.decode(tokens.input_ids[0])

print(f"Original: {text}")
print(f"Token IDs: {tokens.input_ids[0]}")
print(f"Decoded: {decoded}")
print(f"Vocab size: {tokenizer.vocab_size}")

Step 4: Test Text Generation

Try generating text with the base model before fine-tuning:

prompt = "What is machine learning?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.7,
    do_sample=True,
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Step 5: Set Up MLflow Tracking

Configure MLflow to track experiments:

mlflow.set_tracking_uri("file://./mlruns")
mlflow.set_experiment("llm-lora-tuning")

# Log a test run
with mlflow.start_run(run_name="setup-test"):
    mlflow.log_param("model_name", model_name)
    mlflow.log_param("num_parameters", model.num_parameters())
    mlflow.log_metric("vocab_size", tokenizer.vocab_size)
    print(f"MLflow run: {mlflow.active_run().info.run_id}")

Summary

In this exercise, you: 1. Verified your GPU environment and installed dependencies 2. Loaded TinyLlama, a 1.1B parameter language model 3. Explored how the tokenizer converts text to tokens 4. Generated text with the base model 5. Configured MLflow for experiment tracking