Exercise 1: Setup & Exploration
Objective
In this exercise, you will:
1. Launch and verify your OpenShift AI notebook environment
2. Explore the base model architecture (TinyLlama or Phi-2)
3. Understand GPU memory constraints and resource limitations
4. Perform basic tokenization and text generation
5. Set up MLflow tracking for your experiments
Prerequisites
- Access to an OpenShift AI notebook environment
- The repository cloned, with the current working directory set to labs/02_intermediate/02_llm_instruction_tuning
- Dependencies installed (pip install -r requirements.txt)
Step 1: Environment Check
Let's start by checking our environment and verifying GPU availability.
```python
import torch
import transformers
import mlflow

# Check GPU availability
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA devices: {torch.cuda.device_count()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    # total_memory is reported in bytes; dividing by 1e9 gives decimal GB
    print(f"Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

# Check library versions
print(f"Transformers: {transformers.__version__}")
print(f"PyTorch: {torch.__version__}")
print(f"MLflow: {mlflow.__version__}")
```
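The Memory value printed above is easier to interpret next to a rough estimate of what a model needs. A common rule of thumb (an assumption here, not something this lab measures) is parameters × bytes per parameter for the weights alone, and roughly 16 bytes per parameter for full fine-tuning with Adam in mixed precision, before activations. The sketch below uses a hypothetical helper, estimate_memory_gb, with TinyLlama's approximate 1.1B parameters:

```python
# Approximate bytes per parameter (rules of thumb; activations, KV cache
# and framework overhead are NOT included)
BYTES_PER_PARAM = {
    "float32": 4,
    "float16": 2,
    "bfloat16": 2,
    "int8": 1,
    # fp16 weights + fp16 grads + fp32 master weights + two fp32 Adam moments
    "full_finetune_adam_mixed": 16,
}


def estimate_memory_gb(num_params: int, mode: str) -> float:
    """Return an approximate memory footprint in decimal GB (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[mode] / 1e9


n = 1_100_000_000  # roughly TinyLlama's 1.1B parameters
for mode in BYTES_PER_PARAM:
    print(f"{mode:>26}: {estimate_memory_gb(n, mode):5.1f} GB")
```

Under these assumptions, float16 weights for a 1.1B model need only about 2.2 GB, while full fine-tuning approaches 18 GB before activations, which is one motivation for parameter-efficient methods such as LoRA.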
Step 2: Load the Base Model
Load TinyLlama, a small but capable 1.1B-parameter chat model. The device_map="auto" option requires the accelerate package:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

# Load the weights in float16 (2 bytes per parameter) and let accelerate
# place them across the available devices
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

print(f"Model loaded: {model_name}")
print(f"Parameters: {model.num_parameters() / 1e6:.1f}M")
```
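To connect the printed parameter count to the architecture, the sketch below counts the weights of a Llama-style decoder by hand. The config values are assumptions chosen to match TinyLlama's published config (hidden size 2048, 22 layers, 32 query heads sharing 4 key/value heads, MLP size 5632, 32,000-token vocabulary, untied input and output embeddings); compare them with model.config before relying on the result. LlamaConfig here is a local stand-in dataclass and llama_param_count is a hypothetical helper, not a transformers API.

```python
from dataclasses import dataclass


@dataclass
class LlamaConfig:
    # Assumed values mirroring TinyLlama-1.1B; compare with model.config
    vocab_size: int = 32000
    hidden_size: int = 2048
    intermediate_size: int = 5632
    num_hidden_layers: int = 22
    num_attention_heads: int = 32
    num_key_value_heads: int = 4
    tie_word_embeddings: bool = False


def llama_param_count(cfg: LlamaConfig) -> int:
    """Count weights in a Llama-style decoder (no biases, RMSNorm weights only)."""
    h = cfg.hidden_size
    head_dim = h // cfg.num_attention_heads
    kv_dim = cfg.num_key_value_heads * head_dim  # grouped-query attention

    attention = h * h + 2 * h * kv_dim + h * h  # q, k, v, o projections
    mlp = 3 * h * cfg.intermediate_size  # gate, up, down projections
    norms = 2 * h  # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms

    embeddings = cfg.vocab_size * h
    lm_head = 0 if cfg.tie_word_embeddings else cfg.vocab_size * h
    final_norm = h
    return embeddings + cfg.num_hidden_layers * per_layer + final_norm + lm_head


total = llama_param_count(LlamaConfig())
print(f"Estimated parameters: {total:,} (~{total / 1e6:.1f}M)")
```

With these assumed values the count is 1,100,048,384 (about 1100.0M). If model.num_parameters() reports something different, one of the assumed config values does not match the loaded model. Most of the weights sit in the 22 MLP blocks, not in the embeddings.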
Step 3: Explore Tokenization
Understand how the tokenizer converts text to tokens:
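```python
text = "MLOps combines machine learning and operations."
# For a single string without return_tensors, input_ids is a flat list of ints
tokens = tokenizer(text)
token_ids = tokens.input_ids
# Llama-style tokenizers usually prepend a BOS token (<s>), visible when decoding
decoded = tokenizer.decode(token_ids)
print(f"Original: {text}")
print(f"Token IDs: {token_ids}")
print(f"Tokens: {tokenizer.convert_ids_to_tokens(token_ids)}")
print(f"Decoded: {decoded}")
print(f"Vocab size: {tokenizer.vocab_size}")
```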
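TinyLlama reuses a SentencePiece-based byte-pair encoding (BPE) tokenizer, which is why a word like "MLOps" can come out as several subword tokens. The toy sketch below shows the core BPE idea of applying ranked merges to a character sequence. The merge list and the bpe_encode helper are hypothetical; real tokenizers learn tens of thousands of merges and add details such as whitespace markers and byte fallback that are omitted here.

```python
# Hypothetical merge rules in priority order (rank 0 is applied first)
MERGES = [("M", "L"), ("O", "p"), ("Op", "s"), ("ML", "Ops")]


def bpe_encode(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Split a word into characters, then repeatedly apply the best-ranked merge."""
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    symbols = list(word)
    while len(symbols) > 1:
        pairs = list(zip(symbols, symbols[1:]))
        # Pick the adjacent pair with the lowest rank (unknown pairs rank last)
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no applicable merge left
        merged, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


print(bpe_encode("MLOps", MERGES))  # ['MLOps']
print(bpe_encode("MOps", MERGES))  # ['M', 'Ops']
```

A word covered by the merges collapses into one token, while an unfamiliar combination falls back to smaller pieces. This is the same behavior you can observe in the Tokens line printed by the real tokenizer.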
Step 4: Test Text Generation
Try generating text with the base model before fine-tuning:
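```python
prompt = "What is machine learning?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample up to 100 new tokens; temperature only takes effect when do_sample=True
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.7,
    do_sample=True,
)

# outputs[0] holds the prompt tokens followed by the generated tokens
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```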
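Setting temperature=0.7 rescales the model's next-token scores before sampling: the logits are divided by the temperature, so values below 1 sharpen the distribution and values above 1 flatten it. The stdlib-only sketch below illustrates that step on made-up logits for a four-token vocabulary. The helper names are hypothetical, and this is not the transformers implementation, which can combine temperature with other filters such as top-k and top-p.

```python
import math
import random


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Divide logits by temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max to avoid overflow in exp()
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits: list[float], temperature: float, rng: random.Random) -> int:
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a 4-token vocabulary
for t in (0.3, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])

rng = random.Random(0)
print("sampled index:", sample_token(logits, 0.7, rng))
```

As the temperature approaches 0, nearly all probability mass goes to the highest-scoring token, so sampling approaches greedy decoding (do_sample=False).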
Step 5: Set Up MLflow Tracking
Configure MLflow to store experiment runs in a local mlruns directory:
```python
from pathlib import Path

import mlflow

# Store runs in ./mlruns, passed as an absolute file:// URI
mlflow.set_tracking_uri(Path("mlruns").resolve().as_uri())
mlflow.set_experiment("llm-lora-tuning")

# Log a test run
with mlflow.start_run(run_name="setup-test") as run:
    mlflow.log_param("model_name", model_name)
    mlflow.log_param("num_parameters", model.num_parameters())
    mlflow.log_metric("vocab_size", tokenizer.vocab_size)
    print(f"MLflow run: {run.info.run_id}")
```
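Once you start logging training settings (for example LoRA hyperparameters), one log_param call per value gets repetitive. mlflow.log_params accepts a flat dictionary, so one option is to flatten a nested config into dotted keys first. The sketch below is stdlib-only; flatten_params and the example LoRA values are hypothetical, not part of this lab's code.

```python
from typing import Any


def flatten_params(config: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into dotted keys, e.g. {"lora": {"r": 8}} -> {"lora.r": 8}."""
    flat: dict[str, Any] = {}
    for key, value in config.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten_params(value, name))
        else:
            flat[name] = value
    return flat


# Hypothetical settings, for illustration only
config = {
    "model_name": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    "generation": {"max_new_tokens": 100, "temperature": 0.7},
    "lora": {"r": 8, "alpha": 16},
}
print(flatten_params(config))
# In the notebook, inside an active run: mlflow.log_params(flatten_params(config))
```

Dotted keys keep related settings grouped together when you compare runs in the MLflow UI.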
Summary
In this exercise, you:
1. Verified your GPU environment and library versions
2. Loaded TinyLlama, a 1.1B-parameter language model
3. Explored how the tokenizer converts text to tokens
4. Generated text with the base model
5. Configured MLflow for experiment tracking