Everything you need to know to start fine-tuning LLMs in the privacy of your home

Got a modern Nvidia or AMD graphics card? Custom Llamas are only a few commands and a little data prep away

Fine-tuning is easy, data prep not so much

With all of that out of the way, we need to talk about data. As it turns out, fine-tuning the model isn't the hard part; the hard part is preparing and cleaning your dataset so that the model actually does what you want it to.

So where do you get the data to fine-tune your model? Well, with something like an email assistant or customer service chatbot, you don't have to look far. Simply opening the sent folder in your email client will give you a decent starting place for finding organic data. If you're using a local chat assistant like Continue, it may automatically generate training data that can be used to fine-tune models.
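
If you want a rough idea of what harvesting that sent folder might look like, here's a minimal Python sketch that pulls plain-text bodies out of a local mbox export of your sent mail. The file path and the use of the subject line as a stand-in for the inbound message are assumptions for illustration; real mail archives are messier and usually need proper reply-to-thread matching.

import json
import mailbox

# Assumption: sent mail has been exported as a local mbox file.
SENT_MBOX = "Sent.mbox"  # hypothetical path

def plain_text_body(msg):
    """Return the first text/plain part of a message, or None."""
    parts = msg.walk() if msg.is_multipart() else [msg]
    for part in parts:
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)
            if payload:
                return payload.decode(errors="replace")
    return None

records = []
for msg in mailbox.mbox(SENT_MBOX):
    body = plain_text_body(msg)
    if not body or len(body.strip()) < 40:  # skip empty or trivial messages
        continue
    records.append({
        "instruction": "generate an appropriate response to this email",
        "input": msg.get("Subject", ""),  # crude stand-in for the inbound mail
        "output": body.strip(),
    })

with open("organic_dataset.json", "w") as f:
    json.dump(records, f, indent=2)

print(f"Wrote {len(records)} candidate samples for manual review")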

While high-quality organic data is ideal, it may not encompass the full range of scenarios you're likely to run into. For example, let's say you'd like the model to generate responses to incoming emails that redirect the sender to a different department or team. If this is something that only happens very occasionally, you may not have a lot of organic data to train with.

This is where using LLMs to generate synthetic or partially synthetic data can come in handy. Doing this is fairly straightforward: feed a few examples into a model and ask it to generate new data that mimics them. You may need to play around with the prompt until you find something that works.
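
To make that concrete, here's a minimal sketch of the idea against the OpenAI-compatible API that local servers such as llama.cpp or Ollama can expose. The endpoint URL, model name, and prompt wording are all assumptions you'd adapt to your own setup.

import json
from openai import OpenAI

# Assumption: a local OpenAI-compatible server is listening here.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

seed_examples = [
    {"input": "I'm having trouble getting Product X to work properly.",
     "output": "Can you tell me more about what isn't working?"},
    {"input": "Can I get a refund on my order?",
     "output": "I'm sorry to hear that. Could you share your order number?"},
]

prompt = (
    "Here are examples of customer messages and support replies:\n"
    + json.dumps(seed_examples, indent=2)
    + "\n\nWrite five new input/output pairs in the same JSON format, "
    "covering messages that should be redirected to another team."
)

response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",  # hypothetical model name
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)  # review and hand-edit before using

Treat whatever comes back as raw material rather than finished training data; it needs the same review and cleaning as everything else.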

You can either do this for both the inputs and outputs, or generate only the inputs and write the responses yourself. In our testing, we found that fully synthetic data generally lacked nuance, semi-synthetic data produced good results, and fully organic data worked best.

Regardless of whether your dataset is organic, synthetic, or a mix of the two, you'll want to take the time to clean it by removing things like personally identifiable information and bad samples. For example, if you're fine-tuning a model for customer service or support, you might want to strip out agents' and users' names, phone numbers, and other identifying details.

While fine-tuning is most useful for changing the behavior of the model, it will still pick up on details, such as names, that appear consistently throughout the dataset.
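
As a starting point for that cleaning pass, a couple of regular expressions will catch the obvious offenders. This is only a rough sketch: the patterns below are deliberately crude assumptions, and serious PII scrubbing generally calls for a dedicated tool or a manual pass.

import json
import re

# Deliberately simple patterns -- illustrative, not production-grade PII detection.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def scrub(text):
    """Replace obvious email addresses and phone numbers with placeholders."""
    text = EMAIL_RE.sub("<EMAIL>", text)
    return PHONE_RE.sub("<PHONE>", text)

with open("organic_dataset.json") as f:  # hypothetical filename
    records = json.load(f)

for record in records:
    for key in ("instruction", "input", "output"):
        record[key] = scrub(record[key])

with open("cleaned_dataset.json", "w") as f:
    json.dump(records, f, indent=2)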

Once you've gathered your data (you don't actually need much; even 100 samples might be enough to change the model's behavior in a noticeable way), you need to format it in a way the model can make sense of. There are lots of ways of doing this, but for our purposes we found this JSON template, which uses the Alpaca data format, to work pretty well.

[
    {
        "instruction": "generate an appropriate response to this chat message",
        "input": "I'm having trouble getting Product X to work properly.",
        "output": "Can you tell me more about what isn't working?"
    },
    ...
]
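
Before you point a trainer at the file, it's worth a quick sanity check that every record has the three fields the Alpaca format expects. Here's a short sketch (the filename is an assumption):

import json

REQUIRED_KEYS = {"instruction", "input", "output"}

with open("cleaned_dataset.json") as f:  # hypothetical filename
    records = json.load(f)

for i, record in enumerate(records):
    missing = REQUIRED_KEYS - record.keys()
    if missing:
        raise ValueError(f"record {i} is missing keys: {missing}")
    if not record["instruction"].strip() or not record["output"].strip():
        raise ValueError(f"record {i} has an empty instruction or output")

print(f"{len(records)} records look structurally sound")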

Setting up Axolotl

There are numerous frameworks out there for fine-tuning LLMs, such as Unsloth and Hugging Face's Transformers Trainer. However, for this hands-on, we're going to be using Axolotl.

The open source project aims to abstract away much of the complexity associated with fine-tuning popular language models, and it boasts support for a wide range of training techniques. So if you start with QLoRA on your workstation or gaming PC and later decide you want to scale up to a full fine-tune in the cloud, you can.

Another benefit of Axolotl is that it provides a fairly large library of sample templates for fine-tuning popular LLMs, so you don't have to start from scratch figuring out which hyperparameters to use for a given model or dataset.

Prerequisites

  1. A GPU with at least 16 GB of VRAM. Any reasonably modern Nvidia GPU should work just fine. For those on Team Red, you'll want an AMD Radeon RX 7900 or better for the job. We tested with an RTX 3090 Ti 24 GB, RTX 6000 Ada Generation 48 GB, AMD Radeon RX 7900 XT 20 GB, and Radeon Pro W7900 48 GB.
  2. For this guide, we're going to keep things simple and use Ubuntu Desktop 24.04.  
  3. The latest GPU drivers and CUDA (Nvidia) or ROCm (AMD) binaries for your card. Since this process can be a bit of a headache if you've never done it before, we'll go over setting these up on Ubuntu 24.04.
  4. We also assume you're comfortable using the command line on a Linux system, as the following instructions involve running commands in a terminal.

Because setup is slightly different for Nvidia and AMD cards, we'll be breaking those out into separate sections.
