You can now fine-tune OpenAI's GPT-3.5 for specific tasks – it may even beat GPT-4
And work out cheaper than top-end model
Developers can now fine-tune OpenAI's GPT-3.5 Turbo model to improve its performance on specific tasks – potentially making it more effective, and cheaper to run, than OpenAI's ostensibly more advanced GPT-4 model.
Fine-tuning allows users to better shape the behaviors and capabilities of an already-trained large language model by further training it on carefully chosen, custom data. For example, a health-and-wellness chatbot powered by a language model fine-tuned on additional medical advice is more likely to generate accurate and useful responses than a general off-the-shelf system.
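For a sense of what that custom data looks like, here is a minimal sketch of building a training file in the chat-formatted JSONL layout OpenAI's fine-tuning endpoint accepts; the file name and the example content are our own invention, not real medical guidance:

```python
import json

# Illustrative only: chat-formatted training examples in the JSONL
# layout OpenAI's fine-tuning endpoint expects, one JSON object per
# line. The content here is made up and is not medical advice.
examples = [
    {"messages": [
        {"role": "system", "content": "You are a cautious health-and-wellness assistant."},
        {"role": "user", "content": "How much water should I drink a day?"},
        {"role": "assistant", "content": "Roughly two liters is a common guideline, but needs vary; check with a clinician if you're unsure."},
    ]},
    # ... more curated examples would follow in a real training set
]

with open("wellness.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```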
In some cases, therefore, it may be better for organizations to fine-tune OpenAI's GPT-3.5 Turbo than use GPT-4, the latter of which OpenAI has billed as a superior model.
"Early tests have shown a fine-tuned version of GPT-3.5 Turbo can match, or even outperform, base GPT-4-level capabilities on certain narrow tasks," OpenAI said. The machine-learning super lab said it should be possible to, for example, guide the model during fine-tuning so that it consistently generates text in a specific language, tone, or structure.
Without fine-tuning, developers have to come up with better input prompts to instruct the large language model on how to behave and complete tasks.
Every time the model is run, OpenAI charges users for the number of tokens it has to process in the input prompt as well as the number of tokens generated in its output. A token is a portion of a word; for English text, four or so characters roughly equals one token. Fine-tuning can help reduce these costs if developers can squeeze the same performance from the model using a shorter input prompt.
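Token counts can be checked locally with OpenAI's tiktoken library; a quick sketch, using a made-up prompt:

```python
# Counting billable tokens with OpenAI's tiktoken library
# (pip install tiktoken). The prompt here is just an example.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
prompt = "Summarize the following customer review in one sentence."
print(len(prompt), "characters ->", len(enc.encode(prompt)), "tokens")
# English text averages around four characters per token, so every
# instruction trimmed from a prompt means fewer billed input tokens
# on every single call.
```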
A customized GPT-3.5 Turbo model can save developers money in the long run if it's cheaper to run and, in some use cases, at least as effective as GPT-4 out of the box, OpenAI claimed.
We can see that being the case: GPT-4 is more expensive to use than GPT-3.5 Turbo; if a fine-tuned GPT-3.5 Turbo model continues to be cheaper than GPT-4, then, well, there's your savings. GPT-4 is supposed to be more powerful than GPT-3.5 – though bear in mind regressions are possible, as we previously detailed – and a fine-tuned GPT-3.5 model may be able to catch up with or overtake the general-purpose GPT-4.
Also, don't forget: GPT-4 and GPT-3.5 are at the heart of the ChatGPT bot; they are also accessible via OpenAI's API. As always with LLMs, take them with a pinch of salt: they are likely to make stuff up or get things wrong so confidently, you may not even notice.
- Hallucinating ChatGPT finds a role playing Dungeons & Dragons
- Humans stressed out by content moderation? Just use AI, says OpenAI
- Our AI habit is already changing the way we build datacenters
- ChatGPT's odds of getting code questions correct are worse than a coin flip
A quick look at OpenAI's pricing page shows that a fine-tuned GPT-3.5 Turbo model costs users $0.012 per 1,000 tokens to process inputs and $0.016 per 1,000 tokens to generate outputs, which is cheaper than GPT-4's base rates of $0.03 and $0.06 per 1,000 tokens for the same. Bear in mind that fine-tuning GPT-3.5 Turbo incurs an additional training cost, estimated at $0.008 per 1,000 training tokens.
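Plugging those rates into a rough per-call comparison shows where the savings could come from. The token counts below are hypothetical: a verbose GPT-4 prompt versus a short prompt to a fine-tuned model with the instructions baked in:

```python
# Back-of-the-envelope per-call costs using the rates quoted above.
def call_cost(in_tok, out_tok, in_rate, out_rate):
    """Cost of one API call, with rates quoted per 1,000 tokens."""
    return in_tok / 1000 * in_rate + out_tok / 1000 * out_rate

gpt4  = call_cost(1500, 300, in_rate=0.030, out_rate=0.060)  # $0.0630
tuned = call_cost(200,  300, in_rate=0.012, out_rate=0.016)  # $0.0072

print(f"GPT-4 with a long prompt:           ${gpt4:.4f} per call")
print(f"Fine-tuned 3.5 with a short prompt: ${tuned:.4f} per call")
```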
That said, it's difficult to get a proper apples-to-apples comparison from OpenAI since the operational cost of a model is dependent on the size of the context window – the maximum number of tokens a model can process per input query – which differs depending on the model configuration. Here is the pricing for GPT-4, which offers 8,000 and 32,000-token context windows:
| GPT-4 model | Input | Output |
|---|---|---|
| 8K context | $0.03 / 1K tokens | $0.06 / 1K tokens |
| 32K context | $0.06 / 1K tokens | $0.12 / 1K tokens |
Yet the context window size for a fine-tuned GPT-3.5 Turbo model is not given; it may be less than the base model's 16,000 tokens. The Register has asked OpenAI for clarification; it did not offer an answer.
Below is the pricing for the base GPT-3.5 Turbo model, with 4,000 and 16,000-token context windows, without fine-tuning:
| GPT-3.5 model | Input | Output |
|---|---|---|
| 4K context | $0.0015 / 1K tokens | $0.002 / 1K tokens |
| 16K context | $0.003 / 1K tokens | $0.004 / 1K tokens |
OpenAI estimated that fine-tuning a model on a training file of 100,000 tokens for three epochs (three passes over the data) will cost $2.40: that's 100 blocks of 1,000 tokens, and 100 × 3 × $0.008 = $2.40.
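For the mechanics, the flow at launch looked like this with OpenAI's then-current Python SDK: upload a chat-formatted JSONL file, start a fine-tuning job, then call the resulting model once the job completes. A minimal sketch, in which the API key, file name, and fine-tuned model ID are placeholders:

```python
import openai

openai.api_key = "sk-..."  # placeholder: your API key here

# 1. Upload the chat-formatted JSONL training file.
training_file = openai.File.create(
    file=open("wellness.jsonl", "rb"),
    purpose="fine-tune",
)

# 2. Start a fine-tuning job against GPT-3.5 Turbo. Jobs run
#    asynchronously; in practice you would poll
#    openai.FineTuningJob.retrieve(job.id) until it reports success.
job = openai.FineTuningJob.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)
print("Job started:", job.id)

# 3. Once the job finishes, the custom model gets an "ft:" prefixed
#    name, used with the chat completions endpoint like any other model.
response = openai.ChatCompletion.create(
    model="ft:gpt-3.5-turbo-0613:my-org::abc123",  # placeholder model ID
    messages=[{"role": "user", "content": "Any tips for better sleep?"}],
)
print(response.choices[0].message.content)
```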
"Fine-tuning GPT models can make them better for specific applications, but it requires a careful investment of time and effort. We recommend first attempting to get good results with prompt engineering, prompt chaining (breaking complex tasks into multiple prompts), and function calling," it said.
Meanwhile, running a fine-tuned GPT-3.5 Turbo model can cost up to eight times as much as the base GPT-3.5 Turbo model. It's $0.002 per 1,000 tokens to generate output from normal GPT-3.5 Turbo, and $0.016 per 1,000 tokens for a fine-tuned GPT-3.5 Turbo. Below is the full pricing:
| Fine-tuned model | Training | Input usage | Output usage |
|---|---|---|---|
| babbage-002 | $0.0004 / 1K tokens | $0.0016 / 1K tokens | $0.0016 / 1K tokens |
| davinci-002 | $0.0060 / 1K tokens | $0.0120 / 1K tokens | $0.0120 / 1K tokens |
| GPT-3.5 Turbo | $0.0080 / 1K tokens | $0.0120 / 1K tokens | $0.0160 / 1K tokens |
Companies will have to figure out whether it's worth paying upfront to fine-tune a model for a specific task, or whether crafting a more efficient prompt is enough to keep down the cost of running the model in production.
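One way to frame that decision is to amortize the one-off training bill over expected traffic. A hypothetical sketch, reusing the quoted rates and the made-up token counts from earlier:

```python
import math

# Hypothetical break-even: how many calls before the one-off training
# bill is repaid by cheaper per-call pricing versus GPT-4. Rates come
# from OpenAI's pricing page; token counts are made up for illustration.
train_cost = 100_000 / 1000 * 3 * 0.008                  # $2.40

gpt4_call  = 1500 / 1000 * 0.030 + 300 / 1000 * 0.060    # $0.0630
tuned_call =  200 / 1000 * 0.012 + 300 / 1000 * 0.016    # $0.0072

break_even = math.ceil(train_cost / (gpt4_call - tuned_call))
print(f"Fine-tuning pays for itself after roughly {break_even} calls")
# With these made-up numbers, about 44 calls, which is trivial for any
# production workload; the sums favor fine-tuning whenever a tuned
# GPT-3.5 can genuinely stand in for GPT-4.
```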
And yes, it appears fine-tuned models are private to their respective developers, and training data for fine-tuning will be moderated.
OpenAI plans to offer fine-tuning capabilities for GPT-4 later this year. We'll wait and see what pricing is like on that. ®