In the age of AI, we often rely on cloud giants like OpenAI or Anthropic. But what if you could have a powerful AI running entirely on your laptop? No internet required, no subscription fees, and 100% data privacy.
Today, we are going to build a Local GPT using two powerful tools: Ollama (to run the AI) and Streamlit (to build the interface).
1. The Concepts: What & Why? 🤔
What is “Local GPT”?
A Local GPT is a Large Language Model (LLM) that runs on your own hardware (CPU or GPU) rather than on a remote server. Instead of sending your prompt to the cloud and waiting for a response, your own computer does the math and generates the text.
Why do we need Local GPTs?
- 🛡️ Privacy: Your data never leaves your machine. Perfect for sensitive documents or personal journals.
- 💰 Cost: Once you have the hardware, running the model is free.
- ✈️ Offline Access: You can code, write, or brainstorm on a plane without Wi-Fi.
- ⚡ Latency: For smaller models, local inference can be incredibly snappy.
2. Prerequisites: The Setup 🛠️
Before we write the Python code, we need to set up the “brain” of our operation.
Step 1: Install Ollama
Ollama is the tool that actually runs the AI models on your computer in the background.
💡 Action: Go to the Official Ollama Website, download the installer for your OS, and run it.
Step 2: Download (“Pull”) a Model
Just like a video game console needs a game cartridge, Ollama needs a model to work. You download the model weights from the internet once, and they live on your machine from then on.
Open your terminal (or Command Prompt) and run one of these commands:
```bash
# To download the popular Llama 3 model (approx. 4.7 GB)
ollama pull llama3

# OR a lighter, faster model (good for older laptops)
ollama pull mistral
```
Once the progress bar hits 100%, you are ready to code!
3. The Code: Building the App 💻
We will build the interface using Streamlit. Create a file named app.py and let’s break it down section by section.
Section 1: Imports & Page Config
First, we import our libraries and set up the browser tab settings.
```python
import streamlit as st
from ollama import Client

st.set_page_config(
    page_title="Local Ollama Chat",
    page_icon="👾",
    layout="centered"  # Options: "centered" or "wide"
)
```
- `import streamlit`: the framework that handles the UI (buttons, chat inputs).
- `from ollama import Client`: the bridge that connects Python to your local AI model.
- `layout="centered"`: keeps the chat in a neat middle column, similar to ChatGPT.
Section 2: Sidebar & Connection
This section connects to the Ollama server (running at localhost:11434) and checks which models you have installed.
```python
with st.sidebar:
    st.title("🤖 Configuration")

    # Allow manual URL configuration
    ollama_url = st.text_input("Ollama URL", value="http://localhost:11434")
    client = Client(host=ollama_url)

    # Fetch available models
    try:
        models_info = client.list()
        # Extract model names from the response
        model_names = [m['model'] for m in models_info['models']]
        st.success(f"Connected to Ollama! Found {len(model_names)} models.")
    except Exception as e:
        st.error(f"Failed to connect: {e}")
        st.info("💡 Tip: Try changing 'localhost' to '127.0.0.1'.")
        model_names = []

    if model_names:
        selected_model = st.selectbox("Select a Model", model_names, index=0)
    else:
        selected_model = "llama3"
        st.warning("No models detected. Using default 'llama3'.")
```
- `Client(host=...)`: initializes the connection.
- `try/except`: makes the app robust. If Ollama isn't running, the app won't crash; instead, it shows a helpful error message.
- `st.selectbox`: lets you swap between `llama3` and `mistral` instantly.
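To see what the list comprehension is doing, here is a hypothetical `client.list()` response, with its shape inferred purely from the `m['model']` access above (the exact fields and values are assumptions for illustration, not the library's documented schema):

```python
# Hypothetical shape of client.list()'s response, inferred from the
# m['model'] access in the sidebar code (fields are assumptions).
models_info = {
    "models": [
        {"model": "llama3:latest"},
        {"model": "mistral:latest"},
    ]
}

# Same comprehension as in the sidebar: pull out just the names
model_names = [m["model"] for m in models_info["models"]]
print(model_names)  # → ['llama3:latest', 'mistral:latest']
```

Those names are exactly what populates the `st.selectbox` dropdown.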
Section 3: Chat Logic & Memory
Streamlit reruns the whole script on every interaction. We need Session State to remember the conversation history.
```python
st.title("👾 Local Chatbot")

# Initialize chat history
if "messages" not in st.session_state:
    st.session_state.messages = []

# Display chat history
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])
```
- `st.session_state`: acts as the app's short-term memory.
- The loop: ensures that when you send a new message, the old messages stay on the screen.
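Stripped of the Streamlit calls, the memory pattern is just a list of role/content dictionaries that outlives each rerun. A minimal plain-Python sketch (the `rerun` function and `session_state` dict are stand-ins invented here to mimic Streamlit's behavior):

```python
# Plain-Python sketch of the chat-memory pattern (no Streamlit needed).
# session_state mimics st.session_state: a dict that persists across reruns.
session_state = {}

def rerun(user_input):
    """Simulate one Streamlit rerun: restore history, then append the new turn."""
    if "messages" not in session_state:
        session_state["messages"] = []  # first run: start with empty history
    session_state["messages"].append({"role": "user", "content": user_input})
    # (the assistant reply would be appended here after generation)
    return [m["content"] for m in session_state["messages"]]

rerun("Hello")
print(rerun("How are you?"))  # → ['Hello', 'How are you?']
```

Because the list lives outside the function, the "Hello" message survives the second call, just as your chat history survives each Streamlit rerun.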
Section 4: Processing the Response
Finally, we handle the user input and stream the AI’s reply.
```python
if prompt := st.chat_input("What is on your mind?"):
    # 1. Add user message to history
    st.session_state.messages.append({"role": "user", "content": prompt})

    # 2. Display user message
    with st.chat_message("user"):
        st.markdown(prompt)

    # 3. Generate response
    with st.chat_message("assistant"):
        message_placeholder = st.empty()
        full_response = ""
        try:
            stream = client.chat(
                model=selected_model,
                messages=[
                    {"role": m["role"], "content": m["content"]}
                    for m in st.session_state.messages
                ],
                stream=True,
            )
            # Stream the response word by word
            for chunk in stream:
                content = chunk['message']['content']
                full_response += content
                message_placeholder.markdown(full_response + "▌")
            # Final update without cursor
            message_placeholder.markdown(full_response)

            # 4. Save assistant response to history
            st.session_state.messages.append(
                {"role": "assistant", "content": full_response}
            )
        except Exception as e:
            st.error(f"An error occurred: {str(e)}")
```
- `stream=True`: this is critical. Instead of waiting for the full answer, we receive it token by token.
- `"▌"`: appended to simulate a typing cursor, making the app feel alive.
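You can test the consumption loop without a model by faking the stream. The chunks below mimic the `{'message': {'content': ...}}` shape that the loop above reads — an assumption taken from that code, not the library's documented chunk format:

```python
# Fake token stream in the chunk shape the app's loop reads
# ({'message': {'content': ...}} is assumed from the code above).
def fake_stream():
    for token in ["Hel", "lo", ", ", "world", "!"]:
        yield {"message": {"content": token}}

full_response = ""
for chunk in fake_stream():
    full_response += chunk["message"]["content"]
    # In the app, message_placeholder.markdown(full_response + "▌") runs here,
    # repainting the partial answer with the fake cursor after every token.

print(full_response)  # → Hello, world!
```

The placeholder is redrawn on every iteration, which is why the answer appears to "type itself" on screen.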
Conclusion
And there you have it! With less than 100 lines of Python, you have a private, secure, and free AI chatbot running on your own machine.
Next Steps:
- Run the app: `streamlit run app.py`
- Try pulling different models like `codellama` for programming help!
- Star the repo on GitHub: iamharisai/localgpt to get updates on new implementations like Chainlit and RAG!
Happy Coding! 🚀