Build Your Own Local GPT

Privacy, Speed & Python

In the age of AI, we often rely on cloud giants like OpenAI or Anthropic. But what if you could have a powerful AI running entirely on your laptop? No internet required, no subscription fees, and 100% data privacy.

Today, we are going to build a Local GPT using two powerful tools: Ollama (to run the AI) and Streamlit (to build the interface).

1. The Concepts: What & Why? 🤔

What is “Local GPT”?

A Local GPT is a Large Language Model (LLM) that runs on your own hardware (CPU or GPU) rather than on a remote server. Instead of sending your prompt to the cloud and waiting for a response, your own computer does the math and generates the text.

Why do we need Local GPTs?

  • 🛡️ Privacy: Your data never leaves your machine. Perfect for sensitive documents or personal journals.
  • 💰 Cost: Once you have the hardware, running the model is free.
  • ✈️ Offline Access: You can code, write, or brainstorm on a plane without Wi-Fi.
  • ⚡ Latency: For smaller models, local inference can be incredibly snappy.

2. Prerequisites: The Setup 🛠️

Before we write the Python code, we need to set up the “brain” of our operation.

Step 1: Install Ollama

Ollama is the tool that actually runs the AI models on your computer in the background.

💡 Action: Go to the Official Ollama Website, download the installer for your OS, and run it.

Step 2: Download (“Pull”) a Model

Just like a video game console needs a game cartridge, Ollama needs a model to work. You only need to download the model weights from the internet once.

Open your terminal (or Command Prompt) and run one of these commands:

# To download the popular Llama3 model (approx 4.7GB)
ollama pull llama3

# OR for a lighter, faster model (good for older laptops)
ollama pull mistral

Once the progress bar hits 100%, you are ready to code!
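Before wiring up the UI, it can help to sanity-check that the Ollama server is actually reachable. Here is a small sketch using only the Python standard library; it hits Ollama's `/api/tags` endpoint (which lists your downloaded models). The helper name `check_ollama` is just for illustration:

```python
import urllib.request
import urllib.error

def check_ollama(url="http://localhost:11434", timeout=2):
    """Return True if an Ollama server responds at the given URL."""
    try:
        # /api/tags lists locally available models
        with urllib.request.urlopen(url + "/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

if check_ollama():
    print("Ollama is running and ready.")
else:
    print("Ollama not reachable -- is the server running?")
```

If this prints the "not reachable" message, start the Ollama app (or `ollama serve`) before continuing.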


3. The Code: Building the App 💻

We will build the interface using Streamlit. First, install the two Python packages we need with pip install streamlit ollama. Then create a file named app.py and let’s break it down section by section.

Section 1: Imports & Page Config

First, we import our libraries and set up the browser tab settings.

import streamlit as st
from ollama import Client

st.set_page_config(
    page_title="Local Ollama Chat", 
    page_icon="👾",
    layout="centered" # Options: "centered" or "wide"
)
  • import streamlit as st: The framework that handles the UI (buttons, chat inputs).
  • from ollama import Client: The bridge that connects Python to your local AI model.
  • layout="centered": keeps the chat in a neat middle column, similar to ChatGPT.

Section 2: Sidebar & Connection

This section connects to the Ollama server (running at localhost:11434) and checks which models you have installed.

with st.sidebar:
    st.title("🤖 Configuration")
    
    # Allow manual URL configuration
    ollama_url = st.text_input("Ollama URL", value="http://localhost:11434")
    client = Client(host=ollama_url)

    # Fetch available models
    try:
        models_info = client.list()
        # Extract model names from the response
        model_names = [m['model'] for m in models_info['models']]
        st.success(f"Connected to Ollama! Found {len(model_names)} models.")
    except Exception as e:
        st.error(f"Failed to connect: {e}")
        st.info("💡 Tip: Try changing 'localhost' to '127.0.0.1'.")
        model_names = []

    if model_names:
        selected_model = st.selectbox("Select a Model", model_names, index=0)
    else:
        selected_model = "llama3" 
        st.warning("No models detected. Using default 'llama3'.")
  • Client(host=...): Initializes the connection.
  • try/except: This makes the app robust. If Ollama isn’t running, the app won’t crash; instead, it shows a helpful error message.
  • st.selectbox: Allows you to swap between llama3 and mistral instantly.
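One caveat: older versions of the Ollama Python client returned the model name under a 'name' key rather than 'model'. A defensive extraction helper (the name extract_model_names is illustrative) can handle both shapes:

```python
def extract_model_names(models_info):
    """Pull model names out of a client.list() response,
    handling both the newer 'model' key and the older 'name' key."""
    names = []
    for m in models_info.get("models", []):
        name = m.get("model") or m.get("name")
        if name:
            names.append(name)
    return names

# Simulated response mixing both key styles
sample = {"models": [{"model": "llama3:latest"}, {"name": "mistral:latest"}]}
print(extract_model_names(sample))
```

You could drop this in place of the one-line list comprehension above if you need to support older client versions.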

Section 3: Chat Logic & Memory

Streamlit reruns the whole script on every interaction. We need Session State to remember the conversation history.

st.title("👾 Local Chatbot")

# Initialize chat history
if "messages" not in st.session_state:
    st.session_state.messages = []

# Display chat history
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])
  • st.session_state: Acts as the app’s short-term memory.
  • The Loop: Ensures that when you send a new message, the old messages stay on the screen.
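Under the hood, the history stored in st.session_state.messages is just a plain list of role/content dicts, which happens to be the exact shape the Ollama chat API expects. A minimal sketch of what accumulates as the conversation grows:

```python
# Each turn is a dict with a "role" ("user" or "assistant") and "content"
history = [
    {"role": "user", "content": "Hi there"},
    {"role": "assistant", "content": "Hello! How can I help?"},
]

# A new user message is simply appended before being sent to the model
history.append({"role": "user", "content": "Tell me a joke"})
print(len(history))  # 3
```

Because the stored format matches the API format, Section 4 can pass the history to the model with no conversion.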

Section 4: Processing the Response

Finally, we handle the user input and stream the AI’s reply.

if prompt := st.chat_input("What is on your mind?"):
    # 1. Add user message to history
    st.session_state.messages.append({"role": "user", "content": prompt})
    
    # 2. Display user message
    with st.chat_message("user"):
        st.markdown(prompt)

    # 3. Generate response
    with st.chat_message("assistant"):
        message_placeholder = st.empty()
        full_response = ""
        
        try:
            stream = client.chat(
                model=selected_model,
                messages=[
                    {"role": m["role"], "content": m["content"]}
                    for m in st.session_state.messages
                ],
                stream=True,
            )
            
            # Stream the response word by word
            for chunk in stream:
                content = chunk['message']['content']
                full_response += content
                message_placeholder.markdown(full_response + "▌")
                
            # Final update without cursor
            message_placeholder.markdown(full_response)
            
            # 4. Save assistant response to history
            st.session_state.messages.append({"role": "assistant", "content": full_response})
            
        except Exception as e:
            st.error(f"An error occurred: {str(e)}")
  • stream=True: This is critical. Instead of waiting for the full answer, we get it token-by-token.
  • "▌": We add this character to simulate a typing cursor, making the app feel alive.

Conclusion

And there you have it! With fewer than 100 lines of Python, you have a private, secure, and free AI chatbot running on your own machine.

Next Steps:

  1. Run the app: streamlit run app.py
  2. Try pulling different models like codellama for programming help!
  3. Star the repo on GitHub: iamharisai/localgpt to get updates on new implementations like Chainlit and RAG!

Happy Coding! 🚀

