Local LLMs, No API Keys, No BS: Build Your Own Waifubot Terminal Chat in Python


Tired of cloud dependencies, subscriptions, and rate limits? Want your own affectionate AI companion running locally, offline, and async? This walkthrough shows you how to build a waifubot terminal chat using Ollama, LLaMA 3, and Python. No fluff. Just code.


Step 1: Install Ollama (One-Time Setup)

Download Ollama

Ollama lets you run LLMs locally with ease.

Go to Ollama's download page

Download the installer for your OS (Windows/macOS)

Install and open the Ollama app

In a terminal, pull a model with the Ollama CLI:

ollama pull llama3

This downloads the LLaMA 3 model locally.
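Before writing any Python, you can sanity-check the model straight from the terminal (type /bye or press Ctrl+D to leave the interactive prompt):

ollama run llama3

The script below won't use this interactive mode; it talks to Ollama's local HTTP API at http://localhost:11434 instead.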


🧰 Step 2: Create Your PyCharm Project

Open PyCharm → New Project → name it waifu_terminal_chat

Inside the project, create a file: chat.py

Create a requirements.txt file and add:

requests

PyCharm will prompt you to install it — accept and let it install.
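If PyCharm doesn't prompt you, install it manually from the project terminal instead:

pip install -r requirements.txt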


Step 3: Write Your Chat Script

Paste this into chat.py:

import requests
import json
import threading
import time

# Initialize conversation history
conversation_history = []

# Global variables for async operation
is_working = False
current_reply = ""


def talk_to_waifu(prompt, history):
    global is_working, current_reply

    # Build the full prompt with conversation history
    full_prompt = "This is a conversation with Potatoe, a loving waifubot:\n\n"

    # Add previous conversation history
    for message in history[-6:]:  # Keep last 6 messages for context
        full_prompt += f"{message}\n"

    # Add current prompt
    full_prompt += f"Human: {prompt}\nPotatoe:"

    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3", "prompt": full_prompt},
        stream=True
    )

    full_reply = ""
    for line in response.iter_lines():
        if line:
            try:
                chunk = line.decode("utf-8")
                data = json.loads(chunk)
                full_reply += data.get("response", "")
            except Exception as e:
                print("Error decoding chunk:", e)

    current_reply = (prompt, full_reply)  # Store both input and reply
    is_working = False
    return full_reply


def start_waifu_conversation(prompt):
    """Start the waifu conversation in a daemon thread"""
    global is_working
    is_working = True
    thread = threading.Thread(
        target=talk_to_waifu,
        args=(prompt, conversation_history),
        daemon=True
    )
    thread.start()


print("Waifu: Hello darling~ Ready to chat? Type 'exit' to leave")

# Initial system prompt to set up the character
initial_prompt = "Your name is Potatoe. You're affectionate, playful, and always supportive."
conversation_history.append(f"System: {initial_prompt}")

while True:
    if is_working:
        print("Waifu: Thinking... ")
        time.sleep(0.5)
        continue

    if current_reply:
        user_input, reply = current_reply
        print(f"Waifu: {reply}")
        # Add both user input and bot response to history
        conversation_history.append(f"Human: {user_input}")
        conversation_history.append(f"Potatoe: {reply}")
        # Optional: Limit history size to prevent it from growing too large
        if len(conversation_history) > 20:  # Keep last 20 messages
            conversation_history = conversation_history[-20:]
        current_reply = ""
        continue

    user_input = input("You: ")
    if user_input.lower() in ["exit", "quit"]:
        print("Waifu: Bye bye~ I'll miss you! ")
        break

    # Clean wrapper function call
    start_waifu_conversation(user_input)
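Run it from PyCharm's Run button or straight from the terminal:

python chat.py

Type a message, wait through the "Thinking..." lines while the model generates, and type exit or quit when you're done.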

Notes

This code needs a reasonably modern machine (LLaMA 3 wants several gigabytes of free RAM), so don't expect it to run smoothly on your vintage Pentium 3.

The code is modular and wrapped into functions.

The calls to the model run asynchronously: start_waifu_conversation spawns a daemon thread for the request, so the main loop keeps running while the model generates a reply (see the sketch after these notes).
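If the threading part looks opaque, here is the pattern on its own, stripped of the chat logic. This is just an illustrative sketch (the names done and slow_call are not from chat.py): a daemon thread does the slow work and flips a flag, while the main loop polls that flag and stays responsive.

import threading
import time

done = False  # the worker flips this when the slow call finishes

def slow_call():
    global done
    time.sleep(2)  # stand-in for the blocking HTTP request to Ollama
    done = True

# daemon=True means the thread won't keep the program alive on exit
threading.Thread(target=slow_call, daemon=True).start()

while not done:
    print("Thinking...")  # the main loop keeps running while the worker blocks
    time.sleep(0.5)
print("Done!")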

The code runs locally and offline:

  • No API keys
  • No payments
  • No subscription needed

The chat also keeps a short memory context: the last six history entries are sent with each request, and the stored history is capped at twenty entries (a minimal sketch of that sliding window follows).
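The sliding-window idea in isolation, as a tiny runnable sketch rather than the exact chat.py code:

history = [f"message {i}" for i in range(30)]

# Only the most recent entries go into the prompt...
context = "\n".join(history[-6:])

# ...and the stored history itself is capped so it never grows unbounded.
if len(history) > 20:
    history = history[-20:]

print(context)
print(len(history))  # 20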
