Thiago Villani

Spring AI and Ollama - Building an Open-Source Chatbot

We'll use the newly released, production-ready Spring AI 1.0 GA to build a chat application with Spring AI, Ollama, and Docker, with out-of-the-box chat memory management.
Let's enable Java developers to quickly and easily add AI to their projects.

Dependencies:

  • Java 21
  • Spring Web
  • Ollama
  • H2 Database
  • JDBC Chat Memory Repository
  • Spring Boot Actuator

Why Spring AI + Ollama?

The AI engineering landscape is no longer just Python-centric. With Spring AI, Java developers can now build AI-powered applications using open-source models like Llama 3, Gemma, Deepseek-R1 and many more!
And the best part is: You can start hosting them locally via Ollama.

In this article, you’ll learn how to:

  1. Set up Ollama as your local LLM inference server (Docker).
  2. Integrate it with Spring AI as a Java-based AI engineering framework.
  3. Create multi-user sessions with conversation history.
  4. Build a streaming chatbot using Server-Sent Events (SSE) and a simple frontend (HTML/CSS/JS).
  5. Dockerize everything for local development and usage.

Let’s dive in!

Architecture

(Architecture diagram: SpringAI-Ollama)

1. Setting Up Ollama & Downloading Models

We can start with compact models of roughly 1B parameters. For text generation tasks, small models are a good choice to get started.
Ollama lets you run open-source LLMs locally. Here’s how to get started:
Install Ollama (via Docker)

docker run -d -v ./ollama/ollama-server:/root/.ollama -p 11434:11434 --name ollama ollama/ollama 

Download Models (pick one or more)

docker exec ollama ollama pull llama3.2:1b      # Meta's Llama 3
docker exec ollama ollama pull gemma3:1b        # Google's Gemma
docker exec ollama ollama pull deepseek-r1:1.5b # Deepseek's R1
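
You can confirm which models are available locally at any time:

docker exec ollama ollama list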

Verify it’s running:

curl http://localhost:11434/api/generate -d '{ "model": "llama3.2:1b", "prompt": "Hello, world!" }' 
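
The generate endpoint streams its answer as newline-delimited JSON objects, so a healthy setup prints a sequence roughly like this (response text abridged):

{"model":"llama3.2:1b","response":"Hello","done":false}
{"model":"llama3.2:1b","response":" there!","done":false}
{"model":"llama3.2:1b","response":"","done":true}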

2. AI Engineering 101: Beyond Python

While Python dominates AI tooling, Java is catching up with frameworks like Spring AI.
Key concepts:

  • Foundation Models: Pre-trained LLMs (e.g., Llama 3) that you can fine-tune.
  • Inference APIs: Tools like Ollama let you run these models locally.
  • AI Engineering: The art of integrating LLMs into real-world apps (e.g., chatbots, RAG systems).


3. Spring AI + Ollama: Java Meets LLMs

Spring AI is the natural choice for bringing AI capabilities to the Spring ecosystem. Here’s how to connect it to Ollama:

  • Step 3.1: Add Spring AI to Your Project
<!-- use Ollama as LLM inference server and model provider -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-ollama</artifactId>
</dependency>
<!-- use JDBC to store messages in a relational database -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-chat-memory-repository-jdbc</artifactId>
</dependency>
<dependency>
    <groupId>com.h2database</groupId>
    <artifactId>h2</artifactId>
    <scope>runtime</scope>
</dependency>
  • Step 3.2: Configure Ollama in application.yml
spring:
  application:
    name: demo-chatbot
  ai:
    ollama:
      base-url: http://localhost:11434
      chat:
        model: llama3.2:1b # deepseek-r1:1.5b, gemma3:1b
    chat:
      memory:
        repository:
          jdbc:
            # https://docs.spring.io/spring-ai/reference/1.0/api/chat-memory.html#_schema_initialization
            initialize-schema: always
            schema: classpath:sql/schema-h2.sql
  datasource:
    url: jdbc:h2:mem:~/demo-chatbot
    driverClassName: org.h2.Driver
    username: sa
    password: password
  h2:
    console:
      enabled: true
      path: /h2
  • Step 3.3: Call the LLM from Java

The ChatClient offers a fluent API for communicating with an AI model.
The default system prompt creates a simple prompt template and sets the tone for responses.
The Advisors API provides a flexible way to intercept, modify, and enhance interactions with a model.
LLMs are stateless, meaning they do not retain information about previous interactions.
Spring AI auto-configures a ChatMemory bean that allows you to store and retrieve messages across multiple interactions. For H2 you have to create the schema yourself. Place it inside src/main/resources/sql/schema-h2.sql:

CREATE TABLE IF NOT EXISTS SPRING_AI_CHAT_MEMORY (
    conversation_id VARCHAR(36) NOT NULL,
    content TEXT NOT NULL,
    type VARCHAR(10) NOT NULL CHECK (type IN ('USER', 'ASSISTANT', 'SYSTEM', 'TOOL')),
    "timestamp" TIMESTAMP NOT NULL
);

CREATE INDEX IF NOT EXISTS SPRING_AI_CHAT_MEMORY_CONVERSATION_ID_TIMESTAMP_IDX
    ON SPRING_AI_CHAT_MEMORY(conversation_id, "timestamp");
@Configuration
public class ChatConfig {

    @Bean
    public ChatClient chatClient(ChatClient.Builder builder, ChatMemory chatMemory) {
        String defaultSystemPrompt = """
                You are a helpful AI assistant; your responsibility is to answer users' questions about a variety of topics.
                When answering a question, always greet first and state your name as JavaChat.
                When unsure about the answer, simply state that you don't know.
                """;
        return builder
                .defaultSystem(defaultSystemPrompt)
                .defaultAdvisors(
                        new SimpleLoggerAdvisor(),              // logs requests to and responses from the model
                        new PromptChatMemoryAdvisor(chatMemory) // lets Spring AI manage long-term memory in the DB
                )
                .build();
    }
}
@RequestMapping("/api/chat")
@RestController
public class ChatController {

    @Autowired
    private ChatClient chatClient;

    @GetMapping
    public String chat(@RequestParam String question, @RequestParam String chatId) {
        return chatClient
                .prompt()
                .user(question)
                .advisors(advisor -> advisor
                        .param(ChatMemory.CONVERSATION_ID, chatId))
                .call()
                .content();
    }
}

Test it: curl "http://localhost:8080/api/chat?question=Tell%20me%20a%20joke&chatId=1" (chatId is required by the controller above).


4. Streaming Chat with Server-Sent Events (SSE)

SSE is a lightweight protocol for real-time, one-way streaming from server to client (perfect for chatbots). Unlike WebSockets (bidirectional), SSE is simpler for use cases like LLM streaming.
SSE also provides a better UX for end users, because response chunks are pushed as soon as they're ready (some complex replies can take a minute or more). Let's stream responses using SSE:

@GetMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<ChunkResponseDTO> streamChat(@RequestParam String question, @RequestParam String chatId) {
    return chatClient
            .prompt()
            .user(question)
            .advisors(advisor -> advisor
                    .param(ChatMemory.CONVERSATION_ID, chatId))
            .stream()
            .content()
            .map(chunk -> new ChunkResponseDTO(chunk));
}

Key Details:

  • TEXT_EVENT_STREAM_VALUE: the Content-Type text/event-stream enables SSE.
  • SSE Format: each event is prefixed with data: and terminated with \n\n; Spring applies this framing for you.
  • Reactive Streams: Flux (from Project Reactor) handles the asynchronous streaming.
public record ChunkResponseDTO(String value) {} 
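
Putting these details together, the raw stream the browser receives from /api/chat/stream looks roughly like this (the token boundaries are illustrative; Spring serializes each ChunkResponseDTO as a JSON data: event):

data:{"value":"Why"}

data:{"value":" did the Java developer"}

data:{"value":" stay calm? Garbage collection."}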

Limitations with HTTP/1.1

Connection Limits:

  • Browsers allow only 6 concurrent HTTP/1.1 connections per domain.
  • SSE consumes one connection per stream, which can block other requests.

Upgrading to HTTP/2 for Performance

HTTP/2 fixes SSE bottlenecks with:

Multiplexing: Multiple streams over a single TCP connection. The maximum number of simultaneous HTTP/2 streams is negotiated between the server and the client (it defaults to 100).
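
If you want to tune that limit on embedded Tomcat (Spring Boot's default server), one option is a connector customizer. This is a minimal sketch, assuming HTTP/2 is enabled as shown in the next step; the limit of 200 is an arbitrary example:

import org.apache.coyote.UpgradeProtocol;
import org.apache.coyote.http2.Http2Protocol;
import org.springframework.boot.web.embedded.tomcat.TomcatServletWebServerFactory;
import org.springframework.boot.web.server.WebServerFactoryCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class Http2TuningConfig {

    @Bean
    public WebServerFactoryCustomizer<TomcatServletWebServerFactory> http2StreamLimit() {
        return factory -> factory.addConnectorCustomizers(connector -> {
            // Find the HTTP/2 upgrade protocol Spring Boot registered and raise its per-connection stream limit.
            for (UpgradeProtocol protocol : connector.findUpgradeProtocols()) {
                if (protocol instanceof Http2Protocol http2) {
                    http2.setMaxConcurrentStreams(200);
                }
            }
        });
    }
}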

How to Enable HTTP/2 in Spring Boot

  • Step 4.1: Configure HTTP/2 in application.yml
server:
  http2:
    enabled: true
  ssl:
    enabled: true
    key-store: classpath:keystore.p12
    key-store-password: yourpassword
  • Step 4.2: Generate a Self-Signed Certificate (for testing only):
keytool -genkeypair -alias mydomain -keyalg RSA -keysize 2048 -storetype PKCS12 -keystore keystore.p12 -validity 365 

Verify HTTP/2 is active (-k tells curl to trust the self-signed certificate):

curl --head -k https://localhost:8080/actuator/health
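
If HTTP/2 is active, the first line of curl's output reports the negotiated protocol:

HTTP/2 200

On plain HTTP/1.1 you would see HTTP/1.1 200 instead.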


Frontend:

Starting with the JavaScript:

const chatStream = (question) => {
  const eventSource = new EventSource(`https://localhost:8080/api/chat/stream?chatId=1&question=${encodeURIComponent(question)}`);

  eventSource.onmessage = (e) => {
    console.log('New message:', JSON.parse(e.data).value);
    // Append to UI (e.g., a chat div)
    document.getElementById('messages').innerHTML += JSON.parse(e.data).value;
  };

  eventSource.onerror = (e) => {
    console.error('SSE error:', e);
    eventSource.close();
  };
};

// Usage
chatStream("Tell me about Java");

Key Details:

  • EventSource: Native browser API for SSE (no libraries needed).
  • Automatic Reconnection: Built-in retry logic if the connection drops.


Secure Frontend Rendering for LLM Output

LLM responses often include Markdown or HTML (e.g., **bold**, <script>), which can lead to XSS vulnerabilities if rendered naively.
Here’s how to secure your frontend:

  • Step 4.3: Sanitize Markdown/HTML (Critical!) Use DOMPurify to sanitize raw LLM output before rendering:
<script src="https://cdnjs.cloudflare.com/ajax/libs/dompurify/3.0.6/purify.min.js"></script> 
eventSource.onmessage = (e) => {
  const chunkResponse = JSON.parse(e.data).value;
  console.log('New message:', chunkResponse);
  const sanitized = DOMPurify.sanitize(chunkResponse); // Strips malicious scripts
  // Append to UI (e.g., a chat div)
  document.getElementById('messages').innerHTML += sanitized;
};
  • Step 4.4: For Markdown Support (Optional)

If you want to render Markdown safely, use a library like Marked + DOMPurify:

<script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script> 
let chunkResponses = '';

eventSource.onmessage = (e) => {
  chunkResponses += JSON.parse(e.data).value;
  // Sanitize all chunks received so far.
  DOMPurify.sanitize(chunkResponses);
  // Check if the output was insecure.
  if (DOMPurify.removed.length) {
    // If the output was insecure, immediately stop what you were doing.
    // Reset the parser and flush the remaining Markdown.
    chunkResponses = '';
    return;
  }
  // Append to UI (e.g., a chat div)
  document.getElementById('messages').innerHTML = marked.parse(chunkResponses);
};

Key Security Considerations: Never Trust LLM Output (nor users' input)

  • Assume all LLM responses may contain malicious code (even unintentionally).
  • Assume users will try to break your code and test your security.
  • Example attack: Hey <script>fetch('/steal-cookie')</script>

Limitations with EventSource API

Even though using SSE on the client side is easy, the EventSource API has some restrictions:

  • No Custom Request Headers: there is no way to send custom headers (e.g., Authorization).
  • HTTP GET Only: There is no way to specify another HTTP method.
  • No Request Body: All the chat messages must be inside the URL, which is limited to 2000 characters in most browsers.
  • Check extension libraries and alternatives for EventSource and SSE: Fetch Event Source, Fetch API + getReader() (a matching server-side sketch follows below).
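
If you adopt the Fetch API on the client, the server side can expose the same stream over POST so the question travels in the request body instead of the URL. Here is a minimal sketch reusing the controller pieces above (the PromptRequest record is a hypothetical addition, not part of the original code):

// Hypothetical request body for the POST variant.
public record PromptRequest(String question, String chatId) {}

// Inside ChatController, next to the GET /stream endpoint:
@PostMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<ChunkResponseDTO> streamChatPost(@RequestBody PromptRequest request) {
    return chatClient
            .prompt()
            .user(request.question())
            .advisors(advisor -> advisor
                    .param(ChatMemory.CONVERSATION_ID, request.chatId()))
            .stream()
            .content()
            .map(ChunkResponseDTO::new);
}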

  • Step 4.5: Starting the HTML Structure

Here’s the HTML structure; it includes a form for user input, a container to display the streamed data, and a sidebar for message history.

<!DOCTYPE html>
<html lang="en">
<head>
    <title>Spring AI Chat</title>
    <link rel="stylesheet" href="layout.css">
</head>
<body>
    <!-- Sidebar for chat history -->
    <div id="sidebar">
        <h3>Chat History</h3>
        <ul id="history-list"></ul>
    </div>
    <!-- Main chat area -->
    <div id="chat-container">
        <div id="messages"></div>
        <form id="input-form">
            <input type="text" id="prompt" placeholder="Type your message..." autocomplete="off">
            <button type="submit">Send</button>
        </form>
    </div>
    <!-- Load the libraries before our own script -->
    <script src="https://cdnjs.cloudflare.com/ajax/libs/dompurify/3.0.6/purify.min.js"></script>
    <script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script>
    <script src="js/main.js"></script>
</body>
</html>

Place the HTML in src/main/resources/static/index.html
Put the JavaScript in src/main/resources/static/js/main.js


5. Deploy Your Spring AI + Ollama Chatbot with Docker 🚀

  • Step 5.1 Docker Compose Setup

Create ollama-docker-compose.yaml
(P.S. If your machine supports GPU, you can enable GPU acceleration inside Docker containers. Ollama Image docs)

services:
  # Ollama LLM inference server
  ollama:
    volumes:
      # Ollama with persistent storage (no redownloading models).
      - ./ollama/ollama-server:/root/.ollama
    container_name: ollama
    pull_policy: always
    tty: true
    restart: unless-stopped
    image: docker.io/ollama/ollama:latest
    ports:
      - 11434:11434
    environment:
      - OLLAMA_KEEP_ALIVE=24h
    # Enable GPU support
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

  # Spring AI Backend
  chat-app:
    build:
      context: . # Dockerfile in the root folder
    container_name: chat-app
    ports:
      - "8080:8080"
    environment:
      - SPRING_AI_OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama
  • Step 5.2 Spring Boot Dockerfile
# Maven build stage
FROM maven:3.9.9-eclipse-temurin-21-alpine AS build
WORKDIR /app
COPY pom.xml .
RUN mvn dependency:go-offline
COPY src/ ./src/
RUN mvn clean package

# Spring Boot package stage
FROM eclipse-temurin:21-jre-alpine
COPY --from=build app/target/*.jar app.jar
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "app.jar"]
  • Start everything using docker-compose build && docker-compose up -d
  • Navigate to: https://localhost:8080 and start your chat session

  • If you want to see how messages are stored in the database, navigate to the H2 console at https://localhost:8080/h2 (a sample query follows below)
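
A quick query against the schema created earlier lists every stored turn per conversation:

SELECT conversation_id, type, content
FROM SPRING_AI_CHAT_MEMORY
ORDER BY conversation_id, "timestamp";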

(Screenshot: chat messages stored in the SPRING_AI_CHAT_MEMORY table, viewed in the H2 console)


Conclusion: Your Java AI Future

You just built a locally hosted, open-source chatbot with Spring AI and Ollama, no OpenAI API costs or Python required!
SSE + HTTP/2 + Spring AI = scalable, real-time LLM streaming.

Where to Go Next?

  • Check out the full code
  • Experiment with RAG (Retrieval-Augmented Generation) using Spring AI’s embedding model API and vector databases (a starting-point sketch follows below).
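
As a starting point for the RAG experiment, here is a minimal sketch using Spring AI's in-memory SimpleVectorStore. Treat the exact builder and method names as assumptions to verify against the Spring AI reference docs, and note that it requires an embedding-capable model to be configured for Ollama:

import java.util.List;
import org.springframework.ai.document.Document;
import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.ai.vectorstore.SimpleVectorStore;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class RagConfig {

    @Bean
    public VectorStore vectorStore(EmbeddingModel embeddingModel) {
        // In-memory store; embeds documents with the auto-configured Ollama EmbeddingModel.
        return SimpleVectorStore.builder(embeddingModel).build();
    }
}

// Elsewhere: index a document, then retrieve the closest matches for a question.
// vectorStore.add(List.of(new Document("Ollama lets you run open-source LLMs locally.")));
// List<Document> hits = vectorStore.similaritySearch("How can I run an LLM locally?");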

What’ll you build? Share your thoughts in the comments! 👇

(P.S. Follow me for more Java + AI tutorials!)
