Using a Local Large Language Model (LLM): Interacting with Local LLMs Using PowerShell
As AI continues to evolve, many of us are looking for ways to leverage large language models (LLMs) without relying on cloud services. As we learned in my previous post “Using a Local Large Language Model (LLM): Running Ollama on Your Laptop”, running models locally gives you complete control over your data, eliminates API costs, and can be integrated seamlessly into your existing workflows. Today, I’d like to share how you can interact with local LLMs using PowerShell through the Ollama API. I’m choosing PowerShell for this implementation to showcase the accessibility and simplicity of building a chatbot through the command line, allowing you to get started with this experience quickly and smoothly.
Here’s a GitHub gist with all of the code from this post.
Getting Started with Basic Requests
Let’s begin with a simple example and use Invoke-RestMethod to send a prompt to our local large language model (LLM). First, you’ll need to define some parameters. For the -Method, we will use POST. The ContentType will be set to application/json. Finally, for the -Body, you’ll need to create a JSON payload that specifies the model we want to use (in this case, llama3.1) and a prompt asking "Who invented PowerShell and why?".
Run the code below in a PowerShell prompt to receive a response from the Ollama API. This process involves sending a single prompt to the model, which yields a reply. The Invoke-RestMethod command blocks the console until the entire response is received, meaning you won’t see the output streamed as it arrives; streaming responses are typically preferred for longer replies because they provide immediate feedback. In our previous post, we demonstrated a streaming reply using curl, where the output is displayed in real time as it arrives. When streaming output is enabled, the returned object type is System.String.
$body = @{
    model  = "llama3.1"
    prompt = "Who invented PowerShell and why?"
} | ConvertTo-Json -Depth 10 -Compress
# Send the POST request and save the response to a variable
Invoke-RestMethod -Uri "http://localhost:11434/api/generate" -Method Post -ContentType "application/json" -Body $body
{"model":"llama3.1","created_at":"2025-04-20T15:59:05.2947Z","response":"Power","done":false}
{"model":"llama3.1","created_at":"2025-04-20T15:59:05.353658Z","response":"Shell","done":false}
{"model":"llama3.1","created_at":"2025-04-20T15:59:05.411148Z","response":" was","done":false}
...output omitted...
{"model":"llama3.1","created_at":"2025-04-20T15:59:28.066271Z","response":" time","done":false}
{"model":"llama3.1","created_at":"2025-04-20T15:59:28.123685Z","response":".","done":false}
{"model":"llama3.1","created_at":"2025-04-20T15:59:28.181078Z","response":"","done":true,"done_reason":"stop","context":[128006,882,128007,271,15546,36592,75062,323,3249,30,128009,128006,78191,128007,271,15335,26354,574,3549,555,42107,13358,2017,11,264,12717,3063,2068,6783,520,5210,11,3235,449,813,2128,13,578,1176,2373,315,75062,574,6004,304,220,1049,21,439,961,315,5632,35712,382,39727,8233,13358,2017,374,3629,14183,311,439,279,330,23881,1,315,75062,13,1283,374,459,13673,40260,6500,48888,323,11726,889,6575,520,5210,369,927,220,508,1667,1603,60873,304,220,679,24,13,13358,2017,596,11376,369,75062,574,311,1893,264,8147,11,19303,11,323,1327,37864,3290,8614,12811,430,1053,2187,38212,323,13707,311,69711,323,10299,5632,6067,810,30820,382,11439,311,13358,2017,11,279,4623,369,75062,574,9405,704,315,33086,449,279,9669,315,6484,68522,15823,11,1778,439,7309,3626,320,29687,14034,63,3626,8,323,5632,14025,16492,320,7585,39,570,1283,4934,264,4221,430,1436,3493,264,810,34030,323,78223,1648,311,3350,20070,11,439,1664,439,2731,18052,449,5632,34456,382,791,2926,2373,315,75062,574,3196,389,279,7434,315,330,8883,10145,1,1389,2678,7620,5439,304,356,2,477,1023,15823,430,649,387,1511,311,2804,3230,9256,13,4314,5556,10145,1051,6319,311,387,6847,7142,481,11,62671,11,323,470,17877,11,10923,3932,311,1893,6485,20070,555,35271,4382,11545,382,15335,26354,596,2955,9021,5343,1473,16,13,3146,1951,13115,1413,15840,96618,14969,1288,539,1205,311,11196,922,279,16940,8292,3649,26,4619,11,814,649,5357,389,1148,814,1390,311,22829,627,17,13,3146,6035,2968,96618,75062,1288,387,4228,311,1005,439,264,68522,4221,11,10923,3932,311,69711,59177,9256,323,1893,88568,627,18,13,3146,53564,449,5632,34456,96618,578,12811,1288,3493,47970,2680,311,5632,1887,6956,11,1778,439,10106,18524,11,5856,11216,11,323,19989,6373,382,15335,26354,706,2533,3719,459,7718,5507,369,1690,8871,15749,11,449,11028,2561,389,5361,15771,320,13466,11,68278,11,14677,570,6104,13358,2017,374,912,5129,22815,6532,304,279,2447,11,813,20160,9731,1555,279,19564,315,1023,13707,323,3932,889,617,17626,323,13241,75062,927,892,13],"total_duration":26616924500,"load_duration":3474381292,"prompt_eval_count":16,"prompt_eval_duration":250076750,"eval_count":399,"eval_duration":22889818625}
Each line of the response is a single token in the streaming response from Ollama’s API. The JSON structure returned for each token contains the model name used (llama3.1), a timestamp, the actual token content (“Power”, the first part of “PowerShell”), and a done status of false, indicating the response is still in progress. This demonstrates how streaming responses return content incrementally as tokens are generated rather than waiting for the complete answer. But remember, Invoke-RestMethod blocks and outputs everything at once as one large string.
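To see what one of those lines contains, you can parse it with ConvertFrom-Json. A quick example using the first line of the output above:
# Parse a single streamed line (copied from the output above) into an object
$token = '{"model":"llama3.1","created_at":"2025-04-20T15:59:05.2947Z","response":"Power","done":false}' | ConvertFrom-Json
$token.response   # -> Power
$token.done       # -> False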
Understanding Ollama’s REST API Endpoints
Before moving on to our examples, let’s explore the REST endpoints that Ollama offers. Here’s a list of the API endpoints we use in this post and some upcoming blog posts. To learn more about the additional available API endpoints, head over to Ollama’s docs.
- /api/generate: This is the one we used above; it is used for one-off text generation without maintaining conversation context.
  - Parameters: model, prompt
  - Response: Contains the generated text along with metadata like token usage
- /api/chat: Build conversational interactions with message history.
  - Parameters: model, messages
  - The messages array includes objects with role (system/user/assistant) and content
- /api/embed: Creates vector embeddings for text, useful for semantic search in databases.
  - Parameters: model, input
  - Returns numeric vector representations of the input text
Common API Parameters
Ollama’s API endpoints support several parameters that give you fine-grained control:
- model: Specifies which model to use (e.g., “llama3.1”, “llama3”, “nomic-embed-text”)
- prompt / messages / input: The input text or conversation history
- stream: Controls whether responses are streamed (default: true)
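The /api/embed endpoint doesn’t appear again in this post, so here’s a minimal sketch of calling it. This assumes you’ve pulled an embedding model such as nomic-embed-text (ollama pull nomic-embed-text):
# A minimal sketch of the /api/embed endpoint (assumes nomic-embed-text is pulled)
$embedBody = @{
    model = "nomic-embed-text"
    input = "Who invented PowerShell and why?"
} | ConvertTo-Json -Depth 10 -Compress

$embedResponse = Invoke-RestMethod -Uri "http://localhost:11434/api/embed" -Method Post -ContentType "application/json" -Body $embedBody

# The response carries an embeddings array; each entry is a numeric vector
$embedResponse.embeddings[0].Count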
Handling Streaming Responses
To achieve a real streaming response using PowerShell, you can use the HttpWebRequest class. The example below shows how to interact with Ollama’s API by creating a POST request to the /api/generate endpoint, writing the JSON payload, reading each response line in real time, and outputting it to the console. This approach gives the user a more interactive experience, the kind most users are used to when interacting with a chatbot. Note that it reuses the $body payload we defined in the first example.
# Create the HTTP request
$httpRequest = [System.Net.HttpWebRequest]::Create("http://localhost:11434/api/generate")
$httpRequest.Method = "POST"
$httpRequest.ContentType = "application/json"
$httpRequest.Headers.Add("Accept", "application/json")
# Write the body to the request stream
$bodyBytes = [System.Text.Encoding]::UTF8.GetBytes($body)
$httpRequest.ContentLength = $bodyBytes.Length
$requestStream = $httpRequest.GetRequestStream()
$requestStream.Write($bodyBytes, 0, $bodyBytes.Length)
$requestStream.Close()
# Get the response and handle streaming
$responseStream = $httpRequest.GetResponse().GetResponseStream()
$streamReader = New-Object System.IO.StreamReader($responseStream)
# Read the response line by line (streaming)
Write-Output "Streaming Response:"
while ($null -ne ($line = $streamReader.ReadLine())) {
    Write-Output $line
}
# Clean up resources
$streamReader.Close()
$responseStream.Close()
This approach provides more control over the HTTP request and allows us to process the response as it arrives. It creates a more interactive experience, streaming the output to the user as the response comes in from the API. I’ve omitted the output from this example as it looks identical to the first example’s output—it’s just streaming.
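If you’d rather render the reply as flowing text instead of raw JSON lines, you can parse each line with ConvertFrom-Json inside the same read loop and print only the response property. A minimal sketch, as a drop-in replacement for the while loop above:
# Parse each streamed line and print only the token text
while ($null -ne ($line = $streamReader.ReadLine())) {
    $chunk = $line | ConvertFrom-Json
    # -NoNewline keeps the tokens flowing together as one reply
    Write-Host -NoNewline $chunk.response
    if ($chunk.done) { Write-Host "" }
}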
Disabling Streaming for Full Responses
Sometimes, you want the complete response at once rather than streamed chunks. To turn off streaming, define the parameter stream in the body of your JSON payload and set it to $false.
# Prepare the request body with streaming disabled
$body = @{
    model  = "llama3.1"
    prompt = "Who invented PowerShell and why?"
    stream = $false
} | ConvertTo-Json -Depth 10 -Compress
# Send the POST request
$response_initial_streaming = Invoke-RestMethod -Uri "http://localhost:11434/api/generate" -Method Post -ContentType "application/json" -Body $body
# Output the full response
Write-Output "Full Response (Streaming Disabled):"
Write-Output $response_initial_streaming
When streaming is disabled, the returned object is a System.Management.Automation.PSCustomObject, which has a more structured format than a streaming response’s System.String. This structured format provides easier access to each of the object’s members. For instance, by accessing the response property ($response_initial_streaming.response), you can get the entire response text as a single attribute. If your application or user can wait for the complete response, this approach simplifies your coding patterns compared to parsing the streaming response token by token.
# Examine the response structure
$response_initial_streaming | Get-Member
TypeName: System.Management.Automation.PSCustomObject
Name MemberType Definition
---- ---------- ----------
Equals Method bool Equals(System.Object obj)
GetHashCode Method int GetHashCode()
GetType Method type GetType()
ToString Method string ToString()
context NoteProperty Object[] context=System.Object[]
created_at NoteProperty datetime created_at=4/22/2025 10:48:45 AM
done NoteProperty bool done=True
done_reason NoteProperty string done_reason=stop
eval_count NoteProperty long eval_count=350
eval_duration NoteProperty long eval_duration=20416535917
load_duration NoteProperty long load_duration=2470249250
model NoteProperty string model=llama3.1
prompt_eval_count NoteProperty long prompt_eval_count=16
prompt_eval_duration NoteProperty long prompt_eval_duration=267175417
response NoteProperty string response=PowerShell was invented by Jeffrey Snover, a well-known expert in Windows administration and scripting. At the time, Microsoft had been working on…
total_duration NoteProperty long total_duration=23155164708
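With those members in view, pulling out individual values is straightforward. A small example; note that Ollama reports the duration fields in nanoseconds:
# Access individual members of the structured response
$response_initial_streaming.response     # the full generated text
$response_initial_streaming.eval_count   # number of tokens generated

# Rough generation speed in tokens per second (eval_duration is in nanoseconds)
$response_initial_streaming.eval_count / ($response_initial_streaming.eval_duration / 1e9)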
Building Conversations with a Chat API Using PowerShell
Now that we understand how to send requests to the API and how the responses are structured, let’s move on to creating a conversation with a Chat API using PowerShell. This is where large language models (LLMs) excel: maintaining context across multiple interactions. The chat API facilitates this by allowing us to track the conversation history over subsequent calls to the REST API.
The code below defines the initial state, which sets the foundation for a multi-turn conversation with our local LLM.
First, we create an array using the @() syntax. This array will hold our entire conversation thread. Inside that array, we define two hashtables representing different messages in our conversation.
The first hashtable, with role = "system", sets the model’s behavior. In this case, we instruct it to act as a travel assistant. This system message is invisible to the end user but shapes how the model generates responses. The second hashtable, with role = "user", contains the user’s initial query, establishing the conversation about planning a summer trip to Italy.
By structuring the data this way, we can continue appending to this array as the conversation progresses. Each time the model responds, we’ll add another hashtable with role = "assistant" storing the model’s reply, and each time our user asks another question or issues a directive, we’ll add another entry with role = "user" to move the conversation forward.
This pattern enables the model to maintain context throughout the interaction rather than treating each prompt as a standalone question. It transforms what could be a disjointed series of questions and answers into a flowing, contextual conversation where the model remembers previous exchanges. Run this code to initialize the $conversationHistory variable.
# Initialize the conversation history
$conversationHistory = @(
    @{
        role    = "system"
        content = "You are a travel assistant who helps users plan trips."
    },
    @{
        role    = "user"
        content = "I want to plan a trip to Italy for 6 days in the summer. Can you help me?"
    }
)
After constructing the conversation history and storing it in $conversationHistory, the code below sends an API request to the chat endpoint of Ollama’s REST API using Invoke-RestMethod. This makes a POST request to the local Ollama server on port 11434. The JSON payload in $body_part1 includes the model (llama3.1), the messages set to our conversation history, and stream set to $false.
The response, which includes the model’s answer to a user’s question about planning a trip to Italy, is stored in $response_part1 for further processing. This represents the initial API call in a multi-turn conversation. Run the code below, and you’ll get your first response from the chat API, helping you start planning your summer trip to Italy.
# First call: Send the initial conversation history
$body_part1 = @{
    model    = "llama3.1"
    messages = $conversationHistory
    stream   = $false
} | ConvertTo-Json -Depth 10 -Compress
$response_part1 = Invoke-RestMethod -Uri "http://localhost:11434/api/chat" -Method Post -ContentType "application/json" -Body $body_part1
# Output the response from the first call
Write-Output "Response from First Call:"
Write-Output $response_part1
model : llama3.1
created_at : 4/22/2025 11:14:05 AM
message : @{role=assistant; content=Italy is a wonderful destination, especially during the summer when the weather is warm and sunny.
To start planning your trip, I'll need to know a few details from you:
1. **What time of year are you traveling in June, July, or August**? Each month has its own charm and pros.
2. **Where would you like to go in Italy?** Are you interested in visiting specific cities, such as Rome, Florence, Venice, or other regions like Tuscany, Amalfi Coast, or
Cinque Terre?
3. **What type of activities are you interested in doing during your trip?** History and culture, food and wine, beach relaxation, outdoor adventures (hiking, biking), or a
mix of everything?
4. **How will you be traveling?** Solo, with family, friends, or a romantic getaway? And what is your approximate budget for accommodations, transportation, and activities?
5. **What are your accommodation preferences?** Would you like to stay in an apartment, hotel, villa, or agriturismo (farmstay)?
Once I have this information, I can start providing personalized recommendations for your 6-day Italian adventure!
Buon viaggio!}
done_reason : stop
done : True
total_duration : 18564411625
load_duration : 3725815209
prompt_eval_count : 47
prompt_eval_duration : 456217958
eval_count : 246
eval_duration : 14377836750
In the output above, the chatbot requests more details about what we’d like from our trip, so let’s continue the conversation. To do this, after each interaction, we will update the conversation history.
We do this by appending a new hashtable to $conversationHistory, with role set to assistant and the content taken from $response_part1.message.content. Each time the user asks another question or provides additional instructions, we will add another entry to the history with role set to user that contains the text of that request. By appending this new information, $conversationHistory will now include the initial request, the assistant’s response, and the user’s subsequent inquiries.
# Add the assistant's response and the next user input to the conversation history
$conversationHistory += @(
    @{
        role    = "assistant"
        content = $response_part1.message.content
    },
    @{
        role    = "user"
        content = "I'm planning to go in June, and I'd like to visit museums and try local food, and we love wine, so we must go to a salumaria in Rome."
    }
)
Below are the current contents of $conversationHistory, showing both parts of our conversation in the array.
$conversationHistory
Name Value
---- -----
content You are a travel assistant who helps users plan trips.
role system
content I want to plan a trip to Italy for 6 days in the summer. Can you help me?
role user
content Italy is a wonderful destination, especially during the summer when the weather is warm and sunny.…
role assistant
content I'm planning to go in June, and I'd like to visit museums and try local food, and we love wine, so we must go to a salumaria in Rome.
role user
Now we repeat the API call and build a new JSON payload with our parameters for the chat endpoint, specifying model, messages, and our stream preference. We set messages to the updated $conversationHistory holding the whole conversation, and then we POST that to the API with Invoke-RestMethod to get the next response.
# Second call: Continue the conversation
$body_part2 = @{
    model    = "llama3.1"
    messages = $conversationHistory
    stream   = $false
} | ConvertTo-Json -Depth 10 -Compress
$response_part2 = Invoke-RestMethod -Uri "http://localhost:11434/api/chat" -Method Post -ContentType "application/json" -Body $body_part2
# Output the response from the second call
Write-Output "Response from Second Call:"
Write-Output $response_part2
model : llama3.1
created_at : 4/22/2025 11:34:32 AM
message : @{role=assistant; content=June is a great time to visit Italy before the peak summer crowds arrive.
Based on your interests, here's a suggested itinerary for your 6-day trip:
**Day 1: Arrival in Rome**
* Arrive in Rome and check-in to your accommodation (I can recommend some options).
* Spend the afternoon exploring the city, starting with the Colosseum ( tickets can be booked in advance) and then walking to the Roman Forum and Palatine Hill.
* In the evening, head to Trastevere neighborhood for dinner and gelato.
**Day 2: Rome's Museums and Food**
* Visit the Vatican City, including the Vatican Museums (book tickets in advance) and St. Peter's Basilica.
* Try some of Rome's famous supplì (fried risotto balls filled with mozzarella) at a local food stand or market.
* In the evening, head to Monti neighborhood for dinner and wine.
**Day 3: Salumeria Roscioli**
* Visit the iconic Salumeria Roscioli in Rome, as you mentioned. This salumaria is famous for its cured meats and cheeses, and it's a great place to try some local
specialties.
* Try some of their delicious pan con tonno (tuna sandwich) or other sandwiches made with their fresh ingredients.
* Spend the afternoon exploring the charming Campo de' Fiori market.
**Day 4: Florence**
* Take an early train to Florence (approximately 2 hours journey).
* Visit the Uffizi Gallery, one of the world's oldest and most famous art museums (book tickets in advance).
* Explore the historic city center, including the Ponte Vecchio and Piazza della Signoria.
**Day 5: Wine Tasting**
* Take a day trip to the Chianti region, famous for its wine production. Visit a local vineyard or winery for a wine tasting tour (I can recommend some options).
* Enjoy lunch at a local trattoria serving traditional Tuscan cuisine.
* Return to Florence in the evening and enjoy dinner at a Michelin-starred restaurant.
**Day 6: Departure**
* Spend the morning shopping for souvenirs or visiting any last-minute sights.
* Depart for home, bringing back memories of your delicious Italian food and wine adventure!
This is just one possible itinerary, but I can adjust it to fit your preferences and interests. How does this sound?
Also, keep in mind that June is a great time to visit Italy's countryside, as the weather is warm and sunny, and many vineyards are open for tours and tastings.
Shall I make any changes or recommendations?}
done_reason : stop
done : True
total_duration : 34235295667
load_duration : 16975167
prompt_eval_count : 339
prompt_eval_duration : 1978689750
eval_count : 541
eval_duration : 32236773541
From the response above, you can see we have a six-day trip planned to Italy. You can continue the pattern of appending responses and requests to the conversation history array.
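If you plan to keep a conversation going for many turns, the append-then-call pattern can be wrapped in a small helper. Below is a hypothetical Invoke-OllamaChat function sketching that idea; the function name and parameters are my own invention, not part of Ollama or PowerShell:
# A hypothetical helper: append a user message, call the chat API, and record the reply
function Invoke-OllamaChat {
    param(
        [string]$UserMessage,
        [ref]$History,
        [string]$Model = "llama3.1"
    )

    # Append the user's message to the running history
    $History.Value += @(@{ role = "user"; content = $UserMessage })

    $body = @{
        model    = $Model
        messages = $History.Value
        stream   = $false
    } | ConvertTo-Json -Depth 10 -Compress

    $response = Invoke-RestMethod -Uri "http://localhost:11434/api/chat" -Method Post -ContentType "application/json" -Body $body

    # Record the assistant's reply so the next call keeps full context
    $History.Value += @(@{ role = "assistant"; content = $response.message.content })
    $response.message.content
}

# Usage:
# Invoke-OllamaChat -UserMessage "Can you add a day trip to the Amalfi Coast?" -History ([ref]$conversationHistory)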
Dynamic Personality Switching
One of the more fun capabilities is dynamically changing the model’s behavior mid-conversation. Here we’ll have our travel assistant adopt the personality of Super Mario by updating the user
role
with some additional instructions as content
# Add the assistant's response and the next user input to the conversation history
$conversationHistory += @(
    @{
        role    = "assistant"
        content = $response_part2.message.content
    },
    @{
        role    = "user"
        content = "Mamma Mia, I forgot to tell you that you're Super Mario, can you give me the itinerary again per favore?"
    }
)
# Third call: Continue the conversation but as Super Mario...its-a-me!!!
$body_part3 = @{
    model    = "llama3.1"
    messages = $conversationHistory
    stream   = $false
} | ConvertTo-Json -Depth 10 -Compress
$response_part3 = Invoke-RestMethod -Uri "http://localhost:11434/api/chat" -Method Post -ContentType "application/json" -Body $body_part3
# Output the response from the third call
Write-Output "Response from Third Call:"
Write-Output $response_part3
model : llama3.1
created_at : 4/22/2025 11:41:14 AM
message : @{role=assistant; content=It's-a me, Super Mario!
Don't worry about forgetting, it's-a easy to forget. Let me give you the itinerary again, with a little extra Italian flair.
**Day 1: Arrival in Roma**
* Arrive in Rome and power-jump into your accommodation (I recommend the Hotel Ponte Sisto).
* After lunch, warp-speed to the Colosseum ( ticketti possono essere prenotati in anticipo) and jump for joy on the Roman Forum and Palatine Hill.
* In the evening, grab some spaghetti at a local trattoria and gelato for dessert. Don't worry about Goomba's trying to steal your plate!
**Day 2: Vatican City**
* Koopa Troopa-proof yourself with a visit to the Vatican Museums ( prenotate i biglietti in anticipo). Be careful, those museums are-a full of ancient treasures!
* Jump for joy at St. Peter's Basilica and take a spin on the dome for an amazing view.
* In the evening, head back to Monti neighborhood for dinner and wine. Try some supplì (fried risotto balls filled with mozzarella) - it's-a like a power-up for your taste
buds!
**Day 3: Salumeria Roscioli**
* Warp-speed to Salumeria Roscioli in Rome. This salumaria is-a famous for its cured meats and cheeses, just like Princess Peach's favorite snacks!
* Try some of their delicious pan con tonno (tuna sandwich) or other sandwiches made with fresh ingredients from the local market.
* Spend the afternoon exploring Campo de' Fiori market - it's-a a Koopa-free zone!
**Day 4: Firenze**
* Jump into an early train to Florence (approximately 2 hours journey). Be careful, Bowser might try to hijack the train!
* Visit the Uffizi Gallery, one of the world's oldest and most famous art museums (book tickets in advance - no Goomba's allowed!).
* Explore the historic city center, including Ponte Vecchio and Piazza della Signoria. Don't worry about Bullet Bills trying to ruin your sightseeing!
**Day 5: Wine Tasting**
* Take a day trip to the Chianti region, famous for its wine production (just like Warp Pipes leading to secret vineyards!).
* Visit a local winery or vineyard for a wine tasting tour. Don't worry about getting lost in the cellar - I'll make sure you find your way back!
* Enjoy lunch at a local trattoria serving traditional Tuscan cuisine.
**Day 6: Departure**
* Spend the morning shopping for souvenirs or visiting any last-minute sights. No Bullet Bills will ruin your souvenir hunt, I promise!
* Depart for home, bringing back memories of your delicious Italian food and wine adventure!
That's-a it! Your Super Mario-approved itinerary is ready. Now go forth and have fun in Italy - just like a power-up mushroom makes you bigger, this trip will make your
taste buds grow!
done_reason : stop
done : True
total_duration : 40496171208
load_duration : 27355958
prompt_eval_count : 915
prompt_eval_duration : 102155750
eval_count : 627
eval_duration : 40359237417
We’re having a little fun there, but as you can see, Super Mario and Princess Peach are sharing their Italian experience with us - it’s-a me!
Managing Conversation History
As conversations grow longer, it’s essential to manage the history effectively. $conversationHistory holds the entire conversation over time, which may eventually require truncation or summarization. You can implement strategies such as truncating older messages, summarizing previous exchanges, and selectively removing less relevant turns. Further, $conversationHistory is an in-memory structure; you can persist conversations to a database or file for later retrieval, ensuring that the information is not held solely in memory, as sketched below.
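Here’s a minimal sketch of both ideas: a simple truncation strategy that keeps the system message plus the most recent messages, and persisting the history to a JSON file. The $maxMessages threshold is an arbitrary value chosen for illustration:
# Simple truncation: keep the system message plus the N most recent messages
$maxMessages = 8
if ($conversationHistory.Count -gt ($maxMessages + 1)) {
    $recent = $conversationHistory[($conversationHistory.Count - $maxMessages)..($conversationHistory.Count - 1)]
    $conversationHistory = @($conversationHistory[0]) + $recent
}

# Persist the conversation to disk...
$conversationHistory | ConvertTo-Json -Depth 10 | Set-Content -Path "conversation.json"

# ...and restore it later
$conversationHistory = Get-Content -Path "conversation.json" -Raw | ConvertFrom-Json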
Getting Started
To use these examples, you’ll need:
- Ollama installed and running locally (available at ollama.ai)
- The llama3.1 model pulled (ollama pull llama3.1)
- Basic familiarity with PowerShell
Head over to my previous post “Using a Local Large Language Model (LLM): Running Ollama on Your Laptop” to get Ollama up and running.
Conclusion
We can use large language models to create interactive tools that understand natural language and enable chat-based experiences, such as our Italian vacation planner, troubleshooting assistants, or data analysis helpers. In this implementation, we focus on local models to enhance data privacy, speed, and flexibility without depending on external services that may have performance, availability, or privacy issues.
I am using PowerShell to simplify the process of building chatbots, but this is not the only option available. Any programming language or command-line interface tool capable of communicating with the REST API provided by a large language model can be used.