Who needs GitHub Copilot when you can roll your own AI code assistant at home

Here's how to get started with the open source tool Continue


Hands on Code assistants have gained considerable attention as an early use case for generative AI – especially following the launch of Microsoft's GitHub Copilot. But, if you don't relish the idea of letting Microsoft loose on your code or paying $10/month for the privilege, you can always build your own.

While Microsoft was among the first to commercialize an AI code assistant and integrate it into an IDE, it's far from the only option out there. In fact, there are numerous large language models (LLMs) trained specifically with code generation in mind.

What's more, there's a good chance the computer you're sitting in front of right now is capable of running these models. The trick is integrating them into an IDE in a way that's actually useful.

This is where apps like Continue come into play. The open source code assistant is designed to plug into popular IDEs like JetBrains or Visual Studio Code and connect to popular LLM runners you might already be familiar with – like Ollama, Llama.cpp, and LM Studio.

Like other popular code assistants, Continue supports code completion and generation, as well as the ability to optimize, comment, or refactor your code for different use cases. Continue also sports an integrated chatbot with RAG functionality, which effectively allows you to talk to your codebase.

We'll be looking at using Continue with Ollama in this guide, but the app also works with several proprietary models – including OpenAI and Anthropic – via their respective APIs, if you'd rather pay per token than a fixed monthly price.

Here's what you'll need:

  1. A machine capable of running modest LLMs. A system with a relatively recent processor will work, but for best performance, we recommend an Nvidia, AMD, or Intel GPU with at least 6GB of vRAM. If you're more of a Mac person, any Apple Silicon system, including the original M1, should work just fine – though we do recommend at least 16GB of memory for best results.
  2. This guide also assumes you have the Ollama model runner set up and running on your machine. If you don't, you can find our guide here, which should have you up and running in less than ten minutes. For those with Intel Integrated or Arc graphics, you can find a guide for deploying Ollama with IPEX-LLM here.
  3. A compatible IDE. At the time of writing, Continue supports both JetBrains and Visual Studio Code. If you'd like to skip Microsoft's telemetry entirely, as we do, the open source community build – VSCodium – works just fine too.

Installing Continue

For this guide, we'll be deploying Continue in VSCodium. To get started, launch the IDE and open the extensions panel. From there, search for and install "Continue."

After a few seconds, Continue's initial setup wizard should launch, directing you to choose whether you'd like to host your models locally or tap into another provider's API.

In this case, we're going to host our models locally via Ollama, so we'll select "Local models." This will configure Continue to use three models out of the box: Llama 3 8B for chat and code generation, Starcoder2 3B for tab autocompletion, and the nomic-embed-text embedding model for codebase search. We'll discuss how to change these out for alternative ones in a bit, but for now these offer a good starting place.

If for whatever reason Continue skips past the launch wizard, don't worry, you can pull these models manually using Ollama by running the following in your terminal:

ollama pull llama3
ollama pull nomic-embed-text
ollama pull starcoder2:3b
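If you'd like to confirm the pulls landed, here's a minimal Python sketch that compares the three models above against what an Ollama server reports. It assumes Ollama is listening on its default port, 11434, and uses its /api/tags endpoint. The helper names are ours, for illustration only.

```python
import json
import urllib.request

# The three models pulled in the commands above.
REQUIRED = ["llama3", "nomic-embed-text", "starcoder2:3b"]


def normalize(name):
    """Ollama lists untagged pulls as 'name:latest', so add the tag if missing."""
    return name if ":" in name else f"{name}:latest"


def missing_models(required, installed):
    """Return the required models that do not appear in the installed list."""
    have = {normalize(n) for n in installed}
    return [m for m in required if normalize(m) not in have]


def installed_models(host="http://localhost:11434"):
    """Ask a running Ollama server which models it holds (GET /api/tags)."""
    with urllib.request.urlopen(f"{host}/api/tags") as resp:
        data = json.load(resp)
    return [m["name"] for m in data.get("models", [])]
```

With Ollama running, missing_models(REQUIRED, installed_models()) should come back as an empty list once all three pulls have finished.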

For more information on setting up and deploying models with Ollama, check out our quick start guide here.

Telemetry warning:

Before we continue, it's worth noting that by default, Continue collects anonymized telemetry data.

You can opt out of this by modifying the .continue file located in your home directory or by unticking the "Continue: Telemetry Enabled" box in VS Code settings.
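If you'd rather script the opt-out, something like the following sketch could do it. It rests on two assumptions you should check against your own install: that the config lives at .continue/config.json in your home directory, and that the relevant key is "allowAnonymousTelemetry". The function names are hypothetical.

```python
import json
from pathlib import Path

# Assumed location of Continue's JSON config; adjust if yours differs.
CONFIG_PATH = Path.home() / ".continue" / "config.json"


def disable_telemetry(config):
    """Return a copy of the config with anonymous telemetry switched off."""
    updated = dict(config)
    # Assumed key name; verify it against Continue's documentation.
    updated["allowAnonymousTelemetry"] = False
    return updated


def apply_to_file(path=CONFIG_PATH):
    """Read the config file, flip the setting, and write it back."""
    config = json.loads(path.read_text())
    path.write_text(json.dumps(disable_telemetry(config), indent=2))
```

It's worth backing up the file before calling apply_to_file(), since the rewrite will not preserve any hand formatting.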

More information on Continue's data gathering policies can be found here.

Ask and you will receive. Will it work? That's another story

With the installation out of the way, we can start digging into the various ways to integrate Continue into your workflow. The first of these is arguably the most obvious: generating code snippets from scratch.

If, for example, you wanted to generate a basic web page for a project, you'd press Ctrl-I or Command-I on your keyboard and enter your prompt in the action bar.

In this case, our prompt was "Generate a simple landing page in HTML with inline CSS." Upon submitting our prompt, Continue loads the relevant model – this can take a few seconds depending on your hardware – and presents us with a code snippet to accept or reject.

Code generated in Continue will appear in VS Code in green blocks which you can approve or reject.
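Under the hood, a request like this ends up as a prompt sent to the model runner. As a rough illustration only, and not Continue's actual implementation, here's how the same prompt could be sent straight to Ollama's /api/generate endpoint from Python. The default port and the llama3 model name are assumptions based on the setup above.

```python
import json
import urllib.request


def build_generate_request(prompt, model="llama3"):
    """Build the JSON body for Ollama's /api/generate endpoint."""
    # stream=False asks for one complete reply instead of a token stream.
    return {"model": model, "prompt": prompt, "stream": False}


def generate(prompt, model="llama3", host="http://localhost:11434"):
    """Send the prompt to a running Ollama server and return the reply text."""
    body = json.dumps(build_generate_request(prompt, model)).encode("utf-8")
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]
```

Calling generate("Generate a simple landing page in HTML with inline CSS.") against a local server should return the model's answer as a string, minus the accept and reject controls the IDE provides.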

Reworking your code

Continue can also be used to refactor, comment, optimize, or otherwise edit your existing code.

For example, let's say you've got a Python script for running an LLM in PyTorch that you want to refactor to run on an Apple Silicon Mac. You'd start by selecting your document, hitting Ctrl-I on your keyboard and prompting the assistant to do just that.

After a few seconds, Continue passes along the model's recommendations for what changes it thinks you should make – with new code highlighted in green and code marked for removal highlighted in red.

In addition to refactoring existing code, this functionality can also be useful for generating comments and/or docstrings after the fact. These functions can be found under "Continue" in the right-click context menu.

Tab auto completion

While code generation can be useful for quickly mocking up proofs of concept or refactoring existing code, it can still be a little hit and miss depending on what model you're using.

Anyone who's ever asked ChatGPT to generate a block of code will know that sometimes it just starts hallucinating packages or functions. These hallucinations do become pretty obvious, since bad code tends to fail rather spectacularly. But, as we've previously discussed, these hallucinated packages can become a security threat if suggested frequently enough.

If letting an AI model write your code for you is a bridge too far, Continue also supports code completion functionality. That at least gives you more control over what edits or changes the model does or doesn't make.

This functionality works a bit like tab completion in the terminal. As you type, Continue will automatically feed your code into a model – like Starcoder2 or Codestral – and offer suggestions for how to complete a string or function.

The suggestions appear in gray and are updated with each keystroke. If Continue guesses correctly, you can accept the suggestion by pressing the Tab key on your keyboard.
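Code models typically handle this with a technique called fill-in-the-middle, in which the text before and after the cursor is wrapped in special tokens and the model predicts what belongs in between. The sketch below uses the token names published for the StarCoder family. Continue's real prompt template may differ, and other models such as Codestral use different tokens, so treat this as an illustration of the idea and nothing more.

```python
def split_at_cursor(source, cursor):
    """Split a document into the text before and after the cursor position."""
    return source[:cursor], source[cursor:]


def build_fim_prompt(prefix, suffix):
    """Wrap prefix and suffix in StarCoder-style fill-in-the-middle tokens."""
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"


# Example: the cursor sits right after "return " in a half-written function.
source = "def add(a, b):\n    return \n"
cursor = source.index("return ") + len("return ")
prefix, suffix = split_at_cursor(source, cursor)
prompt = build_fim_prompt(prefix, suffix)
```

Whatever the model generates after the final token is the candidate completion that would appear in gray.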

Chatting with your codebase

Along with code generation and prediction, Continue features an integrated chatbot with RAG-style functionality. You can learn more about RAG in our hands-on guide here, but in the case of Continue, it uses a combination of Llama 3 8B and the nomic-embed-text embedding model to make your codebase searchable.

Continue features an integrated chatbot that ties into your LLM of choice.
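To give a feel for the retrieval half of RAG, here's a toy sketch of how chunks of code can be ranked against a question once both have been turned into vectors. In a real setup the vectors would come from an embedding model such as nomic-embed-text. The two-dimensional vectors and file names in the usage note are made up, and Continue's own indexing is considerably more involved.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunks, k=2):
    """Return the text of the k chunks most similar to the query.

    chunks is a list of (text, vector) pairs.
    """
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, chunk[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]
```

For example, given chunks of [("auth.py", [1.0, 0.0]), ("README", [0.0, 1.0]), ("login.py", [0.9, 0.1])] and a query vector of [1.0, 0.0], top_k returns auth.py and login.py. The best matches are then pasted into the chat model's prompt as context.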

This functionality is admittedly a bit of a rabbit hole, but here are a couple of examples of how it can be used to speed up your workflow:

Changing out models

How reliable Continue actually is in practice really depends on what models you're using, as the plug-in itself is really more of a framework for integrating LLMs and code models into your IDE. While it dictates how you interact with these models, it has no control over the actual quality of the generated code.

The good news is Continue isn't married to any one model or technology. As we mentioned earlier, it plugs into all manner of LLM runners and APIs. If a new model is released that's optimized for your go-to programming language, there's nothing stopping you – other than your hardware, of course – from taking advantage of it.

And since we're using Ollama as our model server, swapping out models is, for the most part, a relatively straightforward task. For example, if you'd like to swap out Llama 3 for Google's Gemma 2 9B and Starcoder2 for Codestral you'd run the following commands:

ollama pull gemma2
ollama pull codestral

Note: At 22 billion parameters and with a context window of 32,000 tokens, Codestral is a pretty hefty model to run at home even when quantized to 4-bit precision. If you're having trouble with it crashing, you may want to look at something smaller like DeepSeek Coder's 1B or 7B variants.
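To put a rough number on that, here's the back-of-the-envelope arithmetic for the weights alone. This is an estimate only: real 4-bit quantization formats carry some overhead per weight, and the context window needs additional memory on top.

```python
def weight_memory_gb(params, bits_per_weight):
    """Approximate memory for model weights, in gigabytes (10^9 bytes)."""
    bytes_total = params * bits_per_weight / 8
    return bytes_total / 1e9


codestral_4bit = weight_memory_gb(22e9, 4)    # 22B parameters at 4 bits each
codestral_16bit = weight_memory_gb(22e9, 16)  # the same model at 16 bits each
```

That works out to about 11GB at 4-bit precision versus 44GB at 16-bit, which is why even the quantized model is a squeeze for the 6GB GPUs mentioned earlier.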

To swap out the model used for the chatbot and code generator you can select it from Continue's selection menu. Alternatively, you can cycle through downloaded models using Ctrl-'

Changing out the model used for the tab autocomplete functionality is a little trickier and requires tweaking the plug-in's config file.

After pulling down your model of choice, click on the gear icon in the lower right corner of the Continue sidebar and modify the "title" and "model" entries under the "tabAutocompleteModel" section. If you're using Codestral, that section should look something like this:

  "tabAutocompleteModel": {
    "title": "codestral",
    "provider": "ollama",
    "model": "codestral"
  },
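If you swap models often, the same edit can be scripted. The sketch below builds the section shown above from a model name. The helper name is hypothetical, and it works on a config that has already been loaded as a Python dictionary, so reading and writing the file is left to you.

```python
import json


def set_autocomplete_model(config, model, provider="ollama"):
    """Return a copy of the config with tabAutocompleteModel pointing at model."""
    updated = dict(config)
    updated["tabAutocompleteModel"] = {
        "title": model,
        "provider": provider,
        "model": model,
    }
    return updated


# Example: switch a minimal config over to Codestral and render it as JSON.
new_config = set_autocomplete_model({"models": []}, "codestral")
rendered = json.dumps(new_config, indent=2)
```

Printing rendered shows the "tabAutocompleteModel" entries in the same shape as the hand-edited version, ready to be written back to the config file.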

Fine-tuning a custom code model

By default, Continue automatically collects data on how you build your software. The data can be used to fine-tune custom models based on your particular style and workflows.

To be clear, this data is stored locally under .continue/dev_data in your home directory, and, from what we understand, isn't included in the telemetry data Continue gathers by default. But, if you're concerned, we recommend turning that off.

The specifics of fine-tuning large language models are beyond the scope of this article, but you can find out more about the kind of data collected by the app and how it can be utilized in this blog post.

We hope to explore fine-tuning in more detail in a future hands-on, so be sure to share your thoughts on local AI tools like Continue as well as what you'd like to see us try next in the comments section. ®

Editor's Note: The Register was provided an RTX 6000 Ada Generation graphics card by Nvidia and an Arc A770 GPU by Intel to support stories like this. Neither supplier had any input as to the contents of this and other articles.
