Build a Local Waifubot Terminal Chat in Python — No API Keys, No Cloud, No Bullshit
Tired of cloud dependencies, subscriptions, and rate limits? Want your own affectionate AI companion running locally, offline, and async? This walkthrough shows you how to build a waifubot terminal chat using Ollama, LLaMA 3, and Python. No fluff. Just code.
Step 1: Install Ollama (One-Time Setup)
Ollama lets you run LLMs locally with ease.
Go to Ollama’s download page
Download the installer for your OS (Windows, macOS, or Linux)
Install and open the Ollama app
In a terminal, pull a model:
ollama pull llama3
This downloads the LLaMA 3 model (the 8B variant by default) to your machine.
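Optional sanity check: once you have the requests library installed (see Step 2), you can confirm the local server answers with a quick one-off script. This is just a sketch, not part of the tutorial files; it assumes Ollama is running on its default port 11434 and that llama3 has finished pulling.

```python
# quick_check.py — optional: verify the local Ollama server responds.
# Assumes Ollama is running on its default port (11434) and llama3 is pulled.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Say hi in five words.", "stream": False},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```

If this prints a short greeting, everything is wired up and you can move on.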
🧰 Step 2: Create Your PyCharm Project
Open PyCharm → New Project → name it waifu_terminal_chat
Inside the project, create a file: chat.py
Create a requirements.txt file and add:
requests
PyCharm will prompt you to install it — accept and let it install.
Step 3: Write Your Chat Script
Paste this into chat.py:
```python
import requests
import json
import threading
import time

# Initialize conversation history
conversation_history = []

# Global variables for async operation
is_working = False
current_reply = ""


def talk_to_waifu(prompt, history):
    global is_working, current_reply

    # Build the full prompt with conversation history
    full_prompt = "This is a conversation with Potatoe, a loving waifubot:\n\n"

    # Add previous conversation history (keep last 6 messages for context)
    for message in history[-6:]:
        full_prompt += f"{message}\n"

    # Add current prompt
    full_prompt += f"Human: {prompt}\nPotatoe:"

    # Stream the reply from the local Ollama server
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3", "prompt": full_prompt},
        stream=True
    )

    full_reply = ""
    for line in response.iter_lines():
        if line:
            try:
                chunk = line.decode("utf-8")
                data = json.loads(chunk)
                full_reply += data.get("response", "")
            except Exception as e:
                print("Error decoding chunk:", e)

    current_reply = (prompt, full_reply)  # Store both input and reply
    is_working = False
    return full_reply


def start_waifu_conversation(prompt):
    """Start the waifu conversation in a daemon thread"""
    global is_working
    is_working = True
    thread = threading.Thread(
        target=talk_to_waifu,
        args=(prompt, conversation_history),  # pass the prompt, not the global
        daemon=True
    )
    thread.start()


print("Waifu: Hello darling~ Ready to chat? Type 'exit' to leave")

# Initial system prompt to set up the character
initial_prompt = "Your name is Potatoe. You're affectionate, playful, and always supportive."
conversation_history.append(f"System: {initial_prompt}")

while True:
    if is_working:
        print("Waifu: Thinking... ")
        time.sleep(0.5)
        continue

    if current_reply:
        user_input, reply = current_reply
        print(f"Waifu: {reply}")

        # Add both user input and bot response to history
        conversation_history.append(f"Human: {user_input}")
        conversation_history.append(f"Potatoe: {reply}")

        # Optional: limit history size to prevent it from growing too large
        if len(conversation_history) > 20:  # Keep last 20 messages
            conversation_history = conversation_history[-20:]

        current_reply = ""
        continue

    user_input = input("You: ")

    if user_input.lower() in ["exit", "quit"]:
        print("Waifu: Bye bye~ I'll miss you! ")
        break

    # Clean wrapper call to kick off the background request
    start_waifu_conversation(user_input)
```

Notes
Running llama3 locally needs a reasonably modern machine (roughly 8 GB of RAM for the 8B model), so don't expect it to run smoothly on your vintage Pentium 3.
The code is modular: the request logic lives in its own functions, so it's easy to reuse or extend.
The model call runs asynchronously: start_waifu_conversation() launches a daemon thread and the main loop polls a flag until the reply is ready (see the stripped-down sketch below).
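If the flag-and-poll pattern looks unfamiliar, here is a stripped-down illustration of the same mechanism with the HTTP request replaced by a sleep. It is not part of chat.py, it just isolates what the worker thread and the main loop are doing.

```python
# Minimal sketch of the daemon-thread + polling pattern used in chat.py.
import threading
import time

is_working = False
result = ""

def slow_call(prompt):
    global is_working, result
    time.sleep(2)                 # stand-in for the blocking HTTP request
    result = f"echo: {prompt}"
    is_working = False            # signal the main loop that we're done

is_working = True
threading.Thread(target=slow_call, args=("hello",), daemon=True).start()

while is_working:
    print("thinking...")          # main thread stays responsive while waiting
    time.sleep(0.5)

print(result)
```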
The code runs locally and offline:
- No API keys
- No payments
- No subscription needed

The chat passes a short memory window (the last few exchanges) with each call, so Potatoe remembers recent context; the snippet below shows the prompt that actually gets sent.
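For reference, this is roughly the prompt Potatoe receives on each turn, built the same way as in talk_to_waifu(). The sample history lines are made up for illustration.

```python
# Illustration only: how the memory window becomes part of the prompt.
history = [
    "System: Your name is Potatoe. You're affectionate, playful, and always supportive.",
    "Human: good morning!",
    "Potatoe: Good morning, darling~",
]

prompt = "This is a conversation with Potatoe, a loving waifubot:\n\n"
for message in history[-6:]:      # only the most recent messages are kept
    prompt += f"{message}\n"
prompt += "Human: what should we do today?\nPotatoe:"

print(prompt)
```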