Fine-Tuning Large Language Models: A Practical Guide with Code

📅 December 12, 2025 🏷️ AI & Machine Learning

Fine-tuning adapts a large language model to a specific task, and a well-tuned small model can often match or beat a much larger general-purpose one at a fraction of the inference cost. This guide covers practical techniques with working code.

Why Fine-Tune LLMs?

  • Customize model behavior for specific domains
  • Improve performance on specialized tasks
  • Reduce inference costs with smaller, focused models
  • Add company-specific knowledge and terminology

1. Setting Up the Environment


# Install required packages
pip install torch transformers datasets peft accelerate bitsandbytes

# QLoRA (4-bit quantization) needs a recent bitsandbytes
pip install -U bitsandbytes
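
Before going further, it's worth confirming that PyTorch can actually see a GPU, since everything below assumes CUDA is available. A minimal check using only standard torch calls:

import torch

# Fine-tuning on CPU is impractical; confirm a CUDA device is visible
print(torch.cuda.is_available())      # should print True
print(torch.cuda.get_device_name(0))  # e.g. an A100 or RTX 4090
print(torch.cuda.get_device_properties(0).total_memory / 1e9)  # VRAM in GB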

2. Preparing Your Dataset


from datasets import Dataset

# Prepare instruction-following dataset
training_data = [
    {
        "instruction": "Summarize the following text:",
        "input": "The quick brown fox jumps over the lazy dog...",
        "output": "A fox jumps over a dog."
    },
    {
        "instruction": "Translate to French:",
        "input": "Hello, how are you?",
        "output": "Bonjour, comment allez-vous?"
    }
]

def format_prompt(example):
    return {
        "text": f"""### Instruction:
{example['instruction']}

### Input:
{example['input']}

### Response:
{example['output']}"""
    }

dataset = Dataset.from_list(training_data)
dataset = dataset.map(format_prompt)
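
Before training, print one formatted example to verify the template rendered as intended; a malformed prompt template is the most common silent failure at this stage:

# Sanity check: inspect the first formatted training example
print(dataset[0]["text"])
# ### Instruction:
# Summarize the following text:
# ...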

3. LoRA Fine-Tuning (Efficient)


import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from trl import SFTTrainer

# Load base model (gated on the Hugging Face Hub; requires accepting Meta's license)
model_name = "meta-llama/Llama-2-7b-hf"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# LoRA configuration
lora_config = LoraConfig(
    r=16,                    # Rank
    lora_alpha=32,           # Alpha for scaling
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

# Apply LoRA
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Prints something like: trainable params: 16,777,216 || all params: 6,755,192,832 || trainable%: 0.2484

# Training arguments
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    warmup_steps=100,
    learning_rate=2e-4,
    fp16=True,
    logging_steps=10,
    save_strategy="epoch",
    optim="adamw_torch"
)

# Train (note: newer trl releases move tokenizer/dataset_text_field/max_seq_length
# into SFTConfig; this signature matches older trl versions)
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    args=training_args,
    tokenizer=tokenizer,
    dataset_text_field="text",
    max_seq_length=512
)

trainer.train()
model.save_pretrained("./fine-tuned-model")
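
It also helps to save the tokenizer next to the adapter so inference code doesn't have to fetch it from the base repo (a standard transformers call, same directory as above):

tokenizer.save_pretrained("./fine-tuned-model")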

4. QLoRA (4-bit Quantized Fine-Tuning)


from transformers import BitsAndBytesConfig

# 4-bit quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True
)

# Load 4-bit quantized model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto"
)

# Prepare for k-bit training
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, lora_config)

# Train as before...
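
Two QLoRA-specific adjustments are worth calling out: prepare_model_for_kbit_training already enables gradient checkpointing, and bitsandbytes' paged 8-bit optimizer pages optimizer state to CPU memory to avoid OOM spikes. A sketch of adjusted TrainingArguments; the batch-size values are illustrative starting points for an 8-12GB card, not tuned numbers:

training_args = TrainingArguments(
    output_dir="./results-qlora",
    num_train_epochs=3,
    per_device_train_batch_size=2,    # smaller batch for an 8-12GB card
    gradient_accumulation_steps=8,    # keeps the effective batch size at 16
    learning_rate=2e-4,
    fp16=True,
    logging_steps=10,
    save_strategy="epoch",
    optim="paged_adamw_8bit"          # paged optimizer from bitsandbytes
)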

5. Inference with Fine-Tuned Model


from peft import PeftModel

# Load base model in half precision and apply the LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)
model = PeftModel.from_pretrained(base_model, "./fine-tuned-model")
model = model.merge_and_unload()  # Merge LoRA weights into the base model

# Generate
def generate_response(prompt):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=200,
        temperature=0.7,
        do_sample=True
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

response = generate_response("Summarize the benefits of exercise:")
print(response)
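
Since training used the "### Instruction / ### Input / ### Response" template, prompts at inference time should follow the same layout or quality will degrade. A minimal helper (build_prompt is my own name, not a library function):

def build_prompt(instruction, input_text=""):
    # Mirror the training template so the model sees the format it learned
    return (f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            f"### Response:\n")

response = generate_response(build_prompt(
    "Summarize the following text:",
    "Exercise improves cardiovascular health, mood, and sleep quality."
))
print(response)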

Fine-Tuning Comparison

Method             VRAM Required   Training Time   Quality
Full Fine-Tuning   80GB+           Long            Best
LoRA               16-24GB         Medium          Very Good
QLoRA (4-bit)      8-12GB          Medium          Good

(Approximate figures for a 7B-parameter model such as Llama-2-7B.)

Start with QLoRA for experimentation, then scale up as needed for production!

🏷️ Tags:
llm · fine-tuning · lora · qlora · gpt · machine learning · transformers
