Using a Local Large Language Model (LLM): Running Ollama on Your Laptop
You can now run powerful LLMs like Llama 3.1 directly on your laptop using Ollama. No cloud, no cost. Just install, pull a model, and start chatting, all in a local shell.
Large Language Models (LLMs) have revolutionized how we interact with data and systems, but many assume you need significant cloud resources or specialized hardware to run them. Today, I want to walk you through getting started with Ollama, an approachable tool that lets you run large language models locally on your laptop.
Why Run LLMs Locally?
Before diving into the details of working with a local LLM, let’s address why you might want to run LLMs on your own hardware:
- Privacy: Your data never leaves your machine
- Cost control: No cloud, no usage-based billing
- Offline usage: Work anywhere without an internet connection
- Learning: Better understanding of how these models work
- Customization: Fine-tune models for your specific needs
Installing Ollama
On macOS, we can use Homebrew:
brew install ollama
For Linux users, the official installation script works well:
curl -fsSL https://ollama.com/install.sh | sh
Windows users can download the installer from https://ollama.com/download/windows.
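Whichever platform you’re on, you can confirm the install by checking the version. The version number you see will differ from the example below:
ollama --version
ollama version is 0.6.5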
You might be wondering why I’m not using a Docker container, which is typically my preferred choice for tasks like this. Currently, Docker Desktop and OrbStack do not support GPU passthrough, and allowing Ollama access to the GPU is essential for optimal performance. While running on the CPU is possible, performance will not be great. For those looking for a solution, Docker Model Runner is an excellent (and emerging) option. It enables GPU acceleration for large language models (LLMs) on Docker Desktop. I’ll be blogging about this topic in the near future.
Starting the Service
Once installed, we need to start the Ollama service:
ollama serve &
On Linux and macOS, the ampersand (&) runs the Ollama process in the background, freeing up your terminal for further commands. Windows users should open a new terminal instead.
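If you want to confirm the service is listening before moving on, a quick request to its default port should come back with a short status message:
curl http://localhost:11434
Ollama is running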
Launching an Interactive Session
Let’s start with a simple interaction using the llama3.1 model at the command line. Since we specified the llama3.1 model, Ollama will download this model to your local machine.
ollama run llama3.1
pulling manifest
pulling 667b0c1932bc... 100% ▕███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 4.9 GB
pulling 948af2743fc7... 100% ▕███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 1.5 KB
pulling 0ba8f0e314b4... 100% ▕███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 12 KB
pulling 56bb8bd477a5... 100% ▕███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 96 B
pulling 455f34728c9b... 100% ▕███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 487 B
verifying sha256 digest
writing manifest
success
>>> Send a message (/? for help)
Interacting with a Model at the CLI
Once the model is downloaded, you will be dropped into the Ollama CLI, where you can interact with the LLM by typing at the prompt and pressing enter.
This launches an interactive session where you can chat with the model. In the output below, you can see that I asked the LLM, “Please give me a short sentence describing what a large language model is.”
Type /bye when you’re done to exit.
Please give me a short sentence describing what a large language model is.
A large language model is a computer program that can process and understand human language
to generate responses, answers, or text based on the input it receives.
>>> /bye
Listing and Managing Models
Once you’ve pulled and run a model on your local machine, you can check its status and confirm that it is loaded by running:
ollama ps
NAME               ID              SIZE      PROCESSOR    UNTIL
llama3.1:latest    46e0c10c039e    7.0 GB    100% GPU     4 minutes from now
This command lists all currently running models and shows information such as their ID, processor usage, and how long each model will stay loaded in memory.
To see what models are available locally:
ollama list
NAME               ID              SIZE      MODIFIED
llama3.1:latest    46e0c10c039e    4.9 GB    44 hours ago
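If you want more detail than ollama list provides, such as a model’s architecture, parameter count, and context length, the ollama show command prints that information for a given model:
ollama show llama3.1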
Downloading Additional Models
Ollama makes it easy to access various LLMs, allowing you to experiment with different models depending on your use case. Here’s how you can pull models to your local machine:
ollama pull llama3
In this case, we’re pulling the llama3 model, but you can substitute the model name with any available model, such as llama3.1, nomic-embed-text, or others. Ollama will download the model and its associated files to your local storage.
During the download process, you’ll see progress logs similar to the following:
pulling manifest
pulling 6a0746a1ec1a... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 4.7 GB
pulling 4fa551d4f938... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 12 KB
pulling 8ab4849b038c... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 254 B
pulling 577073ffcc6c... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 110 B
pulling 3f8eb4da87fa... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 485 B
verifying sha256 digest
writing manifest
success
Using ollama list again, we can see that the additional model has been downloaded.
ollama list
NAME               ID              SIZE      MODIFIED
llama3:latest      365c0bd3c000    4.7 GB    44 minutes ago
llama3.1:latest    46e0c10c039e    4.9 GB    44 hours ago
Choosing Between Models
Ollama supports a range of models, each with different capabilities and optimizations. For instance, the llama3.1 model is optimized for general-purpose tasks, while models like nomic-embed-text are designed for more specific applications, such as text embedding for semantic search. We’ll be using nomic-embed-text later in this blog series.
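For example, to pull that embedding model now so it’s ready when we need it:
ollama pull nomic-embed-text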
Head over to https://ollama.com/search for a listing of available models.
Exploring the REST API
Ollama provides a REST API for easy integration into applications. After pulling and starting a model, you can interact with it programmatically. The API is accessible at http://localhost:11434 by default. You can send requests to perform actions like generating completions, embedding text, or managing models. When you submit a request, the API responds with a streaming response by default, returning partial output as it is generated, which improves responsiveness in interactive applications.
curl -k http://localhost:11434/api/generate \
-H "Content-Type: application/json" \
-d '{
"model": "llama3.1",
"prompt": "Please give me a short sentence describing what a large language model is."
}'
{"model":"llama3.1","created_at":"2025-04-18T00:24:01.921459Z","response":"A","done":false}
{"model":"llama3.1","created_at":"2025-04-18T00:24:01.996083Z","response":" large","done":false}
{"model":"llama3.1","created_at":"2025-04-18T00:24:02.063884Z","response":" language","done":false}
{"model":"llama3.1","created_at":"2025-04-18T00:24:02.129147Z","response":" model","done":false}
{"model":"llama3.1","created_at":"2025-04-18T00:24:02.189396Z","response":" (","done":false}
{"model":"llama3.1","created_at":"2025-04-18T00:24:02.250582Z","response":"LL","done":false}
{"model":"llama3.1","created_at":"2025-04-18T00:24:02.31109Z","response":"M","done":false}
{"model":"llama3.1","created_at":"2025-04-18T00:24:02.374554Z","response":")","done":false}
{"model":"llama3.1","created_at":"2025-04-18T00:24:02.437303Z","response":" is","done":false}
{"model":"llama3.1","created_at":"2025-04-18T00:24:02.498758Z","response":" a","done":false}
{"model":"llama3.1","created_at":"2025-04-18T00:24:02.561764Z","response":" computer","done":false}
{"model":"llama3.1","created_at":"2025-04-18T00:24:02.624437Z","response":" program","done":false}
{"model":"llama3.1","created_at":"2025-04-18T00:24:02.688192Z","response":" that","done":false}
{"model":"llama3.1","created_at":"2025-04-18T00:24:02.756787Z","response":" uses","done":false}
{"model":"llama3.1","created_at":"2025-04-18T00:24:02.829267Z","response":" artificial","done":false}
{"model":"llama3.1","created_at":"2025-04-18T00:24:02.895286Z","response":" intelligence","done":false}
{"model":"llama3.1","created_at":"2025-04-18T00:24:02.956524Z","response":" to","done":false}
{"model":"llama3.1","created_at":"2025-04-18T00:24:03.016778Z","response":" process","done":false}
{"model":"llama3.1","created_at":"2025-04-18T00:24:03.07894Z","response":" and","done":false}
{"model":"llama3.1","created_at":"2025-04-18T00:24:03.139738Z","response":" generate","done":false}
{"model":"llama3.1","created_at":"2025-04-18T00:24:03.201174Z","response":" human","done":false}
{"model":"llama3.1","created_at":"2025-04-18T00:24:03.262151Z","response":"-like","done":false}
{"model":"llama3.1","created_at":"2025-04-18T00:24:03.3223Z","response":" text","done":false}
{"model":"llama3.1","created_at":"2025-04-18T00:24:03.385279Z","response":" based","done":false}
{"model":"llama3.1","created_at":"2025-04-18T00:24:03.446721Z","response":" on","done":false}
{"model":"llama3.1","created_at":"2025-04-18T00:24:03.507238Z","response":" patterns","done":false}
{"model":"llama3.1","created_at":"2025-04-18T00:24:03.568208Z","response":" learned","done":false}
{"model":"llama3.1","created_at":"2025-04-18T00:24:03.629634Z","response":" from","done":false}
{"model":"llama3.1","created_at":"2025-04-18T00:24:03.691328Z","response":" massive","done":false}
{"model":"llama3.1","created_at":"2025-04-18T00:24:03.752396Z","response":" amounts","done":false}
{"model":"llama3.1","created_at":"2025-04-18T00:24:03.812789Z","response":" of","done":false}
{"model":"llama3.1","created_at":"2025-04-18T00:24:03.873268Z","response":" data","done":false}
{"model":"llama3.1","created_at":"2025-04-18T00:24:03.934768Z","response":".","done":false}
{"model":"llama3.1","created_at":"2025-04-18T00:24:03.996892Z","response":"","done":true,"done_reason":"stop","context":[128006,882,128007,271,36227,757,264,2875,11914,23524,1148,264,3544,4221,1646,374,30,128009,128006,78191,128007,271,32,3544,4221,1646,320,4178,44,8,374,264,6500,2068,430,5829,21075,11478,311,1920,323,7068,3823,12970,1495,3196,389,12912,9687,505,11191,15055,315,828,13],"total_duration":2377351459,"load_duration":18090834,"prompt_eval_count":23,"prompt_eval_duration":280934292,"eval_count":34,"eval_duration":2077071583}
The response includes the answer along with metadata such as token counts and processing durations.
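If you would rather receive the complete answer as a single JSON document instead of a stream of partial tokens, set the stream parameter to false in the request body. Here is the same request with streaming disabled:
curl http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1",
    "prompt": "Please give me a short sentence describing what a large language model is.",
    "stream": false
  }'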
Performance Considerations
The performance of these models significantly depends on your hardware, particularly the capability of your GPU. While Ollama is designed for ease of use, it is important to keep a few key factors in mind when assessing performance.
- GPU Usage: Ollama runs on both CPU-only and GPU-equipped systems. On Apple Silicon Macs, it uses the built-in GPU (via Metal) for accelerated inference. Running Ollama with a GPU is preferred as it significantly improves responsiveness; you may experience slower response times on systems with only a CPU.
- Disk Space: Models can range from a few GBs to tens of GBs. Be mindful of how much disk space each model consumes on your system, and remove models that are no longer being used.
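When you no longer need a model, removing it is a one-liner. For example, to delete the llama3 model we pulled earlier and reclaim its disk space:
ollama rm llama3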
Integrating with Databases
I’m a database pro and am particularly interested in how LLMs can enhance database operations. For example, with embedding models like nomic-embed-text, you can:
- Generate embeddings from your database’s text fields
- Store these vectors in Azure SQL Database, and soon SQL Server 2025
- Perform semantic similarity searches
- Build intelligent query interfaces for your data using natural language prompts
This bridges the gap between traditional structured data and natural language processing. I’ll be blogging on this topic in some upcoming posts.
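As a small preview, here is what generating an embedding looks like through the same REST API, assuming you have already pulled the nomic-embed-text model. The response is a JSON document containing an embedding array of floating-point values that you could then store in a vector column:
curl http://localhost:11434/api/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nomic-embed-text",
    "prompt": "Ollama makes it easy to run large language models locally."
  }'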
Conclusion
Running large language models (LLMs) locally with Ollama democratizes access to AI technologies. What once required specialized knowledge and advanced hardware is now accessible to anyone with a reasonably powerful laptop.
In future posts, I will explore how to integrate these locally hosted models with Azure SQL Database. This will enable hybrid solutions that combine structured data with natural language AI.