This project fine-tunes the Llama 3.2 3B Instruct model using the Unsloth framework for efficient LoRA-based adaptation. It trains on a small Taskd dataset to teach the model how to answer questions about the AI automation company Taskd, and deploys the fine-tuned model using vLLM to create an OpenAI-compatible API server for inference.
| Component | Description |
|---|---|
| Llama 3.2 3B | Meta’s 3-billion parameter instruction-tuned model optimized for chat and reasoning tasks. |
| Unsloth | Framework providing optimized, authentication-free Llama variants and efficient fine-tuning with LoRA. |
| vLLM | High-throughput and memory-efficient inference engine supporting OpenAI-compatible APIs for local deployment. |
| LoRA | Parameter-efficient fine-tuning technique that trains lightweight adapters instead of full model weights. |
| Weights & Biases (W&B) | Optional experiment tracker for monitoring training metrics and comparing fine-tuned runs. |
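For orientation, below is a minimal sketch of how Unsloth loads the base model and attaches LoRA adapters. The values shown (rank, alpha, target modules) are illustrative assumptions; the actual configuration lives in fine_tune_taskd-v1.py.

```python
# Hypothetical sketch of the Unsloth + LoRA setup; values are illustrative.
from unsloth import FastLanguageModel

# Load Unsloth's authentication-free, 4-bit quantized Llama 3.2 3B Instruct
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach lightweight LoRA adapters instead of updating all 3B weights
model = FastLanguageModel.get_peft_model(
    model,
    r=64,                                   # LoRA rank (see --lora_rank below)
    lora_alpha=64,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```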
Before you begin, ensure you have the following installed:
- Python 3.10+
- Git
- pip (Python package manager)
- vLLM (can be installed via pip)
Optional but recommended:
- GPU with CUDA for faster training
- A Weights & Biases (W&B) account for run tracking
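If you want to confirm the GPU prerequisite before training, a quick check with PyTorch (assumed to be installed alongside the training dependencies) looks like this:

```python
# Sanity-check the Python version and CUDA availability (assumes torch is installed).
import sys
import torch

print(f"Python: {sys.version.split()[0]}")             # should be 3.10+
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
```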
`uv` is a fast Python package and environment manager:

```bash
pip install uv
```
```bash
git clone https://github.com/Arpnik/taskd-technical-challenge.git
uv venv .venv
source .venv/bin/activate  # for macOS / Linux
```
Update your Weights & Biases API key in the script: `wandb.login(key="xyz")`
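If you prefer not to hardcode the key, one option is to read it from an environment variable (a sketch; wandb also picks up WANDB_API_KEY automatically when it is set):

```python
# Read the W&B key from the environment instead of hardcoding it in the script.
import os
import wandb

wandb.login(key=os.environ["WANDB_API_KEY"])
```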
Run the training script with default settings:

```bash
python fine_tune_taskd-v1.py
```

To override defaults via command-line arguments:

```bash
python fine_tune_taskd-v1.py \
  --epochs 200 \
  --lr 2e-4 \
  --lora_rank 128 \
  --lr_type cosine \
  --weight_decay 0.02 \
  --run_name "taskd-llama-v5" \
  --max_new_tokens 256
```

| Argument | Description | Default |
|---|---|---|
| `--epochs` | Number of training epochs | 150 |
| `--lr` | Learning rate | 1e-4 |
| `--lora_rank` | LoRA rank | 64 |
| `--lr_type` | LR scheduler type | cosine |
| `--weight_decay` | Optimizer weight decay | 0.01 |
| `--run_name` | W&B run name | taskd-llama-finetune-v4 |
| `--max_new_tokens` | Max tokens to generate during testing | 512 |
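The arguments above map to a fairly standard argparse setup. A sketch of how the defaults might be wired up (the actual parser in fine_tune_taskd-v1.py may differ):

```python
# Hypothetical argparse wiring matching the defaults in the table above.
import argparse

parser = argparse.ArgumentParser(description="Fine-tune Llama 3.2 3B on the Taskd dataset")
parser.add_argument("--epochs", type=int, default=150, help="Number of training epochs")
parser.add_argument("--lr", type=float, default=1e-4, help="Learning rate")
parser.add_argument("--lora_rank", type=int, default=64, help="LoRA rank")
parser.add_argument("--lr_type", type=str, default="cosine", help="LR scheduler type")
parser.add_argument("--weight_decay", type=float, default=0.01, help="Optimizer weight decay")
parser.add_argument("--run_name", type=str, default="taskd-llama-finetune-v4", help="W&B run name")
parser.add_argument("--max_new_tokens", type=int, default=512, help="Max tokens generated during testing")
args = parser.parse_args()
```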
The fine-tuned model will be saved in ./taskd_lora_model/ by default and the training loss curve will look something like this:
After fine-tuning your model, you can test it in two ways:
This script directly loads your fine-tuned LoRA model (taskd_lora_model/) using Unsloth, runs inference, and extracts clean multiline assistant responses.
```bash
python test_fine_tuned_model.py [--max_new_tokens 512] [--temperature 0.01]
```

| Flag | Default | Description |
|---|---|---|
| `--max_new_tokens` | 512 | Maximum number of tokens to generate |
| `--temperature` | 0.01 | Controls randomness (lower = more deterministic) |
- Loads your LoRA fine-tuned model in 4-bit precision for fast inference.
- Applies the same Llama-3.1 chat template used during training.
- Cleans raw model outputs using a regex-based extractor for multiline responses.
- Tests a list of example prompts automatically to verify model performance.
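A rough sketch of that flow, assuming the adapter was saved to taskd_lora_model/ (the actual helper functions and prompt-cleaning code in test_fine_tuned_model.py may differ):

```python
# Sketch: load the LoRA adapter in 4-bit, apply the chat template, and generate.
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="taskd_lora_model",   # path to the saved LoRA adapter
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)                        # enable fast generation
tokenizer = get_chat_template(tokenizer, chat_template="llama-3.1")

messages = [{"role": "user", "content": "What does Taskd do?"}]
input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to("cuda")

outputs = model.generate(
    input_ids=input_ids, max_new_tokens=512, temperature=0.01, do_sample=True
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```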
If you’re running your fine-tuned model using vLLM (for efficient inference serving), you can use this client script to query the model through its OpenAI-compatible API.
First, launch your vLLM server:
```bash
vllm serve ./taskd-technical-challenge/taskd_merged_model --port 8000
```

Then, in another terminal, run:

```bash
python run_inference_on_vllm.py
```

Once started, the client enters interactive Q&A mode.
| Flag | Default | Description |
|---|---|---|
| `--host` | `http://localhost:8000` | vLLM API endpoint |
| `--max_tokens` | 512 | Maximum response length |
| `--temperature` | 0.01 | Sampling temperature (creativity control) |
- Sends your prompts to the vLLM server via its /v1/chat/completions endpoint.
- Returns the assistant’s message using OpenAI-compatible JSON schema.
- Handles connection errors gracefully and supports adjustable generation parameters.
- The chatbot does not remember previous prompts between messages when using this script.
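Because the server speaks the OpenAI API, you can also query it with the official openai Python client instead of the bundled script. A minimal sketch (the served model name defaults to the path passed to vllm serve; adjust to your setup):

```python
# Query the vLLM server through its OpenAI-compatible chat completions API.
from openai import OpenAI

# vLLM ignores the API key unless the server was started with --api-key
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="./taskd-technical-challenge/taskd_merged_model",  # served model name
    messages=[{"role": "user", "content": "What services does Taskd offer?"}],
    max_tokens=512,
    temperature=0.01,
)
print(response.choices[0].message.content)
```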
In a new terminal, verify the server is running and the model is loaded:
```bash
curl http://localhost:8000/v1/models
```

Expected response format:

```json
{
  "object": "list",
  "data": [
    {
      "id": "unsloth/Llama-3.2-3B-Instruct",
      "object": "model",
      "created": <timestamp>,
      "owned_by": "vllm",
      "root": "unsloth/Llama-3.2-3B-Instruct",
      "parent": null,
      "permission": [...]
    }
  ]
}
```

Install jq for JSON formatting (optional but recommended):

```bash
apt install jq
```

Send a test request to the chat completion endpoint:
```bash
curl -s http://localhost:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "unsloth/Llama-3.2-3B-Instruct",
    "messages": [
      {"role": "user", "content": "Say hi"}
    ]
  }' | jq -r '.choices[0].message.content'
```

Expected response: A friendly greeting from the model.
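The same two checks can be scripted from Python with the requests library (a sketch; adjust the host, port, and model name to your deployment):

```python
# Programmatic health check: list served models, then send a test chat request.
import requests

base = "http://localhost:8000/v1"

models = requests.get(f"{base}/models").json()
print([m["id"] for m in models["data"]])     # served model IDs

payload = {
    "model": "unsloth/Llama-3.2-3B-Instruct",
    "messages": [{"role": "user", "content": "Say hi"}],
}
reply = requests.post(f"{base}/chat/completions", json=payload).json()
print(reply["choices"][0]["message"]["content"])
```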
Out of Memory Error:
- Reduce `--gpu-memory-utilization` to 0.7 or 0.8
- Reduce `--max-model-len` to 2048
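If you load the model through vLLM's Python API rather than vllm serve, the equivalent knobs are constructor arguments (a sketch; the values shown are the suggested reductions, not defaults):

```python
# Reduce GPU memory pressure when constructing the engine directly in Python.
from vllm import LLM

llm = LLM(
    model="./taskd-technical-challenge/taskd_merged_model",
    gpu_memory_utilization=0.8,   # corresponds to --gpu-memory-utilization
    max_model_len=2048,           # corresponds to --max-model-len
)
```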
Connection Refused:
- Ensure you're using `0.0.0.0` as the host
- Check that port 8000 is not already in use
- Wait for the model to fully load before making requests
Model Download Fails:
- Verify internet connectivity
- Confirm you're using `unsloth/Llama-3.2-3B-Instruct` (no HuggingFace login needed)
