Running a (not-so) large LLM locally

Today, it is possible to run a surprisingly capable large language model (LLM) at home on a modest consumer-grade machine. The full LLMs that power services like ChatGPT, Gemini, Bard, or CoPilot have hundreds of billions or even trillions of parameters, which means they require an absurd amount of memory to run.

When I was attending the Fully Connected conference in May, I spoke with a couple of people who turned me on to ways to run LLMs locally. The methods ranged from insanely easy to merely rather easy. Once I became aware of them, I started to see articles and YouTube videos about them as well. I thought I would share my experience with Ollama, which is insanely easy to get started with.

Ollama is an open-source framework for downloading and running LLMs on your own machine. I love two things about this. First, it means that you can preserve your privacy. All of the data and computation remains in an environment that you control completely. Second, using a local machine means that you know the costs – and they are limited to electricity (and what you might have already spent on your hardware). I hate waking up in the morning to realize that my cloud job ran overnight and resulted in a big compute bill.

I’ve been using Ollama on Linux extensively, but for this article I thought I would check whether it works just as well under Windows 11.

Getting started is really, truly this easy:

  1. Download and run the Windows installer from https://ollama.com/download/OllamaSetup.exe
  2. Open a terminal and run Ollama, telling it which LLM you would like to use.
  3. Chat away!

To use the rather nice 4B-parameter Phi3 model from Microsoft, it is as easy as:

PS C:\...> ollama run phi3

The first time that you do this, Ollama will download the model and print out some information while you wait. This version of the Phi3 model is around 2.4GB in total. You may have time for a coffee while you think about ordering a bigger disk!

pulling manifest
pulling b26e6713dc74... 100% ▕█████████████████████████████████████████████████████▏ 2.4 GB
pulling fa8235e5b48f... 100% ▕█████████████████████████████████████████████████████▏ 1.1 KB
pulling 542b217f179c... 100% ▕█████████████████████████████████████████████████████▏  148 B
pulling 8dde1baf1db0... 100% ▕█████████████████████████████████████████████████████▏   78 B
pulling f91db7a2deb9... 100% ▕█████████████████████████████████████████████████████▏  485 B
verifying sha256 digest
writing manifest
removing any unused layers
success

Once the model is downloaded, Ollama sits waiting for your question. Let’s ask it why grass is green.

>>> why is grass green?
Grass appears green to us primarily due to the way it reflects light. The color of an object, in a basic sense, depends on which wavelengths of visible light it absorbs and which ones it reflects. Plants contain chlorophyll, a pigment crucial for photosynthesis—the process by which plants convert sunlight into energy.

Chlorophyll is particularly efficient at absorbing blue and red light frequencies but does not absorb green wavelengths effectively. Instead, it reflects these green light waves back to our eyes more than other colors do. This reflection of the green spectrum makes grass appear predominantly green.

It's also worth mentioning that different types of plants have varying shades of green based on their chlorophyll content and can sometimes contain other pigments, which might give them slightly different hues under certain conditions (e.g., autumn leaves are not purely green because they reflect more red light as well).

This mechanism is an evolutionary adaptation that helps plants optimize the process of photosynthesis while also providing a deterrent to herbivores, who may find bright green foliage less appetizing than other colors.

>>> Send a message (/? for help)

To me, this seems like a pretty high quality answer to this question. The Phi3 model is not a specialist in botany or physics. It has rather a lot of general knowledge that you can now access locally. Asking it about topics like digital preservation, or even relationships and dating, gives remarkably thoughtful and complete answers.
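You aren’t limited to the interactive prompt, either. Ollama also serves a local HTTP API (on port 11434 by default), so you can ask the same question from a script. Here is a minimal sketch using only Python’s standard library; it assumes the Ollama server is running on your machine and that you have already pulled the phi3 model.

```python
import json
import urllib.request

# Ollama listens on localhost:11434 by default; /api/generate is its
# one-shot completion endpoint.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> dict:
    """Build the JSON payload for a non-streaming generate call."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model: str, prompt: str) -> str:
    """POST a prompt to the local Ollama server and return the reply text."""
    payload = json.dumps(build_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(ask("phi3", "why is grass green?"))
```

With `"stream": False` the server returns one JSON object containing the whole answer; leave streaming on if you want tokens as they are generated, the way the interactive prompt shows them.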

I am super impressed with the quality of the Ollama implementation. It is very easy to download, install, and use. It is also very sophisticated and supports a wide range of applications. For example, it will use a GPU to run a model more efficiently, but it will seamlessly blend this with the CPU and regular RAM if the model doesn’t fit in GPU memory. That means that your machine can run larger models than you might expect.

I want to highlight that a model with 4, 8, or 13 billion parameters is not as capable as one with hundreds of billions or more. Phi3 or a 13B Llama model is not the same as Claude or GPT-4o. But they are surprisingly capable scaled-down versions of these larger models and benefit substantially from their big brothers.

I may follow up with a longer post on using Ollama and my experiences with the various LLMs that are in the Ollama library including the multimodal ones that can handle a mix of images and text. There is also a growing ecosystem of tools and systems to work with a mix of cloud and local models. I’ll be writing about some of them over the coming weeks.
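As a taste of the multimodal side, the same local API accepts base64-encoded images alongside the prompt when the model supports them. The sketch below assumes a multimodal model such as llava has been pulled, the server is running, and a file named photo.jpg exists (both the model choice and the filename are illustrative).

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_image_request(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Multimodal models take base64-encoded images alongside the prompt."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }

def describe(model: str, prompt: str, image_path: str) -> str:
    """Send an image plus a question to the local Ollama server."""
    with open(image_path, "rb") as f:
        payload = build_image_request(model, prompt, f.read())
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    # "llava" is one of the multimodal models in the Ollama library.
    print(describe("llava", "What is in this picture?", "photo.jpg"))
```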
