We'll use the newly released, production-ready Spring AI 1.0 GA to build a chat application with Spring AI, Ollama, and Docker, with chat memory management available out of the box.
Let's enable Java developers to quickly and easily add AI to their projects.
Dependencies:
- Java 21
- Spring Web
- Ollama
- H2 Database
- JDBC Chat Memory Repository
- Spring Boot Actuator
Why Spring AI + Ollama?
The AI engineering landscape is no longer just Python-centric. With Spring AI, Java developers can now build AI-powered applications using open-source models like Llama 3, Gemma, Deepseek-R1 and many more!
And the best part: you can host them locally via Ollama.
In this article, you’ll learn how to:
- Set up Ollama as your local LLM inference server (Docker).
- Integrate it with Spring AI as a Java-based AI engineering framework.
- Create multi-user sessions with conversation history.
- Build a streaming chatbot using Server-Sent Events (SSE) and an easy Frontend (HTML/CSS/JS).
- Dockerize everything for local development and usage.
Let’s dive in!
Architecture
1. Setting Up Ollama & Downloading Models
We can start with compact models of roughly 1B parameters. For text-generation tasks, small models are a good choice to get started.
Ollama lets you run open-source LLMs locally. Here’s how to get started:
Install Ollama (via Docker)
docker run -d -v ./ollama/ollama-server:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
Download Models (pick one or more)
docker exec ollama ollama pull llama3.2:1b      # Meta's Llama 3
docker exec ollama ollama pull gemma3:1b        # Google's Gemma
docker exec ollama ollama pull deepseek-r1:1.5b # Deepseek's R1
Verify it’s running:
curl http://localhost:11434/api/generate -d '{ "model": "llama3.2:1b", "prompt": "Hello, world!" }'
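If Ollama is up, you should get back a stream of JSON objects, one per generated token, roughly like the sketch below (the exact fields depend on your Ollama version):

{"model":"llama3.2:1b","response":"Hello","done":false}
{"model":"llama3.2:1b","response":" there","done":false}
{"model":"llama3.2:1b","response":"","done":true}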
2. AI Engineering 101: Beyond Python
While Python dominates AI tooling, Java is catching up with frameworks like Spring AI.
Key concepts:
- Foundation Models: pre-trained LLMs (e.g., Llama 3) that you can fine-tune.
- Inference APIs: tools like Ollama let you run these models locally.
- AI Engineering: the art of integrating LLMs into real-world apps (e.g., chatbots, RAG systems).
3. Spring AI + Ollama: Java Meets LLMs
Spring AI is the go-to framework for bringing AI capabilities to the Spring ecosystem. Here’s how to connect it to Ollama:
- Step 3.1: Add Spring AI to Your Project
<!-- Use Ollama as the LLM inference server and model provider -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-ollama</artifactId>
</dependency>

<!-- Use JDBC to store messages in a relational database -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-chat-memory-repository-jdbc</artifactId>
</dependency>
<dependency>
    <groupId>com.h2database</groupId>
    <artifactId>h2</artifactId>
    <scope>runtime</scope>
</dependency>
- Step 3.2: Configure Ollama in application.yml
spring:
  application:
    name: demo-chatbot
  ai:
    ollama:
      base-url: http://localhost:11434
      chat:
        model: llama3.2:1b # deepseek-r1:1.5b, gemma3:1b
    chat:
      memory:
        repository:
          jdbc:
            # https://docs.spring.io/spring-ai/reference/1.0/api/chat-memory.html#_schema_initialization
            initialize-schema: always
            schema: classpath:sql/schema-h2.sql
  datasource:
    url: jdbc:h2:mem:~/demo-chatbot
    driverClassName: org.h2.Driver
    username: sa
    password: password
  h2:
    console:
      enabled: true
      path: /h2
- Step 3.3: Call the LLM from Java
The ChatClient offers a fluent API for communicating with an AI model.
The default system prompt creates a simple prompt template and sets the tone for responses.
The Advisors API provides a flexible way to intercept, modify, and enhance interactions with a model.
Large Language Models (LLMs) are stateless, meaning they do not retain information about previous interactions. Spring AI auto-configures a ChatMemory bean that allows you to store and retrieve messages across multiple interactions. For H2 you have to create the schema yourself. Place it in src/main/resources/sql/schema-h2.sql:
CREATE TABLE IF NOT EXISTS SPRING_AI_CHAT_MEMORY (
    conversation_id VARCHAR(36) NOT NULL,
    content TEXT NOT NULL,
    type VARCHAR(10) NOT NULL CHECK (type IN ('USER', 'ASSISTANT', 'SYSTEM', 'TOOL')),
    "timestamp" TIMESTAMP NOT NULL
);

CREATE INDEX IF NOT EXISTS SPRING_AI_CHAT_MEMORY_CONVERSATION_ID_TIMESTAMP_IDX
    ON SPRING_AI_CHAT_MEMORY(conversation_id, "timestamp");
@Configuration
public class ChatConfig {

    @Bean
    public ChatClient chatClient(ChatClient.Builder builder, ChatMemory chatMemory) {
        String defaultSystemPrompt = """
                You are a helpful AI assistant. Your responsibility is to answer users' questions about a variety of topics.
                When answering a question, always greet first and state your name as JavaChat.
                When unsure about the answer, simply state that you don't know.
                """;
        return builder
                .defaultSystem(defaultSystemPrompt)
                .defaultAdvisors(
                        new SimpleLoggerAdvisor(),              // simply logs requests to and responses from the model
                        new PromptChatMemoryAdvisor(chatMemory) // lets Spring AI manage long-term memory in the DB
                )
                .build();
    }
}
@RequestMapping("/api/chat") @RestController public class ChatController { @Autowired private ChatClient chatClient; @GetMapping public String chat(@RequestParam String question, @RequestParam String chatId) { return chatClient .prompt() .user(question) .advisors(advisor -> advisor .param(ChatMemory.CONVERSATION_ID, chatId)) .call() .content(); } }
Test it (chatId is a required parameter): curl "http://localhost:8080/api/chat?question=Tell%20me%20a%20joke&chatId=1"
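To see the chat memory in action from Java, here is a minimal sketch (not part of the controller above) that reuses the injected ChatClient with the same conversation id across two calls; the id "user-42" is just an illustrative value:

String chatId = "user-42"; // any stable id per user or session (illustrative)

// First call: introduce a fact the model should remember.
chatClient.prompt()
        .user("My name is Alice, please remember it.")
        .advisors(a -> a.param(ChatMemory.CONVERSATION_ID, chatId))
        .call()
        .content();

// Second call with the same conversation id: the memory advisor adds the
// previous messages to the prompt, so the answer should mention "Alice".
String answer = chatClient.prompt()
        .user("What is my name?")
        .advisors(a -> a.param(ChatMemory.CONVERSATION_ID, chatId))
        .call()
        .content();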
4. Streaming Chat with Server-Sent Events (SSE)
SSE is a lightweight protocol for real-time, one-way streaming from server to client (perfect for chatbots). Unlike WebSockets (bidirectional), SSE is simpler for use cases like LLM streaming.
SSE also provides a better UX for end users, because partial responses are published as soon as they're ready (some complex replies can take a minute or more). Let’s stream responses using SSE:
@GetMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE) public Flux<ChunkResponseDTO> streamChat(@RequestParam String question, @RequestParam String chatId) { return chatClient .prompt() .user(question) .advisors(advisor -> advisor .param(ChatMemory.CONVERSATION_ID, chatId)) .stream() .content() .map(chunk -> new ChunkResponseDTO(chunk)); }
Key Details:
- TEXT_EVENT_STREAM_VALUE: the text/event-stream content type enables SSE.
- SSE Format: each message must end with \n\n and be prefixed with data: for compliance.
- Reactive Streams: Flux (from Project Reactor) handles asynchronous streaming.
public record ChunkResponseDTO(String value) {}
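Spring handles the SSE framing for you: each Flux element is serialized and sent as a data: line followed by a blank line. Assuming Jackson serializes ChunkResponseDTO as a {"value": ...} object, the raw stream would look roughly like this:

data: {"value":"Why did"}

data: {"value":" the Java developer"}

data: {"value":" go broke? ..."}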
Limitations with HTTP/1.1
Connection Limits:
- Browsers allow only 6 concurrent HTTP/1.1 connections per domain.
- SSE consumes one connection per stream, which can block other requests.
Upgrading to HTTP/2 for Performance
HTTP/2 fixes SSE bottlenecks with:
Multiplexing: Multiple streams over a single TCP connection. The maximum number of simultaneous HTTP streams is negotiated between the server and the client (defaults to 100)
How to Enable HTTP/2 in Spring Boot
- Step 4.1: Configure HTTP/2 in application.yml
server:
  http2:
    enabled: true
  ssl:
    enabled: true
    key-store: classpath:keystore.p12
    key-store-password: yourpassword
- Step 4.2: Generate a Self-Signed Certificate (for testing only):
keytool -genkeypair -alias mydomain -keyalg RSA -keysize 2048 -storetype PKCS12 -keystore keystore.p12 -validity 365
Verify HTTP/2 is active (-k tells curl to trust the self-signed certificate):
curl --head -k https://localhost:8080/actuator/health
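If HTTP/2 is enabled (and your curl build supports it), the first line of the response headers should report the protocol, something like:

HTTP/2 200
...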
Frontend:
Starting with the JavaScript:
const chatStream = (question) => {
  const eventSource = new EventSource(`https://localhost:8080/api/chat/stream?chatId=1&question=${encodeURIComponent(question)}`);

  eventSource.onmessage = (e) => {
    console.log('New message:', JSON.parse(e.data).value);
    // Append to UI (e.g., a chat div)
    document.getElementById('messages').innerHTML += JSON.parse(e.data).value;
  };

  eventSource.onerror = (e) => {
    console.error('SSE error:', e);
    eventSource.close();
  };
};

// Usage
chatStream("Tell me about Java");
Key Details:
- EventSource: native browser API for SSE (no libraries needed).
- Automatic reconnection: built-in retry logic if the connection drops.
Secure Frontend Rendering for LLM Output
LLM responses often include Markdown or HTML (e.g., **bold**, <script>), which can lead to XSS vulnerabilities if rendered naively. Here’s how to secure your frontend:
- Step 4.3: Sanitize Markdown/HTML (Critical!) Use DOMPurify to sanitize raw LLM output before rendering:
<script src="https://cdnjs.cloudflare.com/ajax/libs/dompurify/3.0.6/purify.min.js"></script>
eventSource.onmessage = (e) => {
  const chunkResponse = JSON.parse(e.data).value;
  console.log('New message:', chunkResponse);

  const sanitized = DOMPurify.sanitize(chunkResponse); // Strips malicious scripts
  // Append to UI (e.g., a chat div)
  document.getElementById('messages').innerHTML += sanitized;
};
- Step 4.4: For Markdown Support (Optional)
If you want to render Markdown safely, use a library like Marked + DOMPurify:
<script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script>
let chunkResponses = '';

eventSource.onmessage = (e) => {
  chunkResponses += JSON.parse(e.data).value;

  // Sanitize all chunks received so far.
  DOMPurify.sanitize(chunkResponses);

  // Check if the output was insecure.
  if (DOMPurify.removed.length) {
    // If the output was insecure, immediately stop what you were doing.
    // Reset the parser and flush the remaining Markdown.
    chunkResponses = '';
    return;
  }

  // Append to UI (e.g., a chat div)
  document.getElementById('messages').innerHTML = marked.parse(chunkResponses);
};
Key Security Considerations: Never Trust LLM Output (nor users' input)
- Assume all LLM responses may contain malicious code (even unintentionally).
- Assume users will try to break your code and test your security.
- Example attack: Hey <script>fetch('/steal-cookie')</script>
Limitations with EventSource API
Even though using SSE on the client side is easy, the EventSource API has some restrictions:
- No Custom Request Headers: Custom request headers are not allowed.
- HTTP GET Only: There is no way to specify another HTTP method.
- No Request Body: All the chat messages must be inside the URL, which is limited to 2000 characters in most browsers.
- Check extension libraries for EventSource and SSE: Fetch Event Source, Fetch API + getReader()
- Step 4.5: Starting the HTML Structure
Here’s the HTML structure: it includes a form for user input, a container to display the streamed data, and a sidebar for message history.
<!DOCTYPE html>
<html lang="en">
<head>
    <title>Spring AI Chat</title>
    <link rel="stylesheet" href="layout.css">
</head>
<body>
    <!-- Sidebar for chat history -->
    <div id="sidebar">
        <h3>Chat History</h3>
        <ul id="history-list"></ul>
    </div>

    <!-- Main chat area -->
    <div id="chat-container">
        <div id="messages"></div>
        <form id="input-form">
            <input type="text" id="prompt" placeholder="Type your message..." autocomplete="off">
            <button type="submit">Send</button>
        </form>
    </div>

    <script src="js/main.js"></script>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/dompurify/3.0.6/purify.min.js"></script>
    <script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script>
</body>
</html>
Place the HTML in src/main/resources/static/index.html
Put the JavaScript in src/main/resources/static/js/main.js
5. Deploy Your Spring AI + Ollama Chatbot with Docker 🚀
- Step 5.1 Docker Compose Setup
Create ollama-docker-compose.yaml
(P.S. If your machine has a supported GPU, you can enable GPU acceleration inside Docker containers; see the Ollama image docs.)
services:
  # Ollama LLM inference server
  ollama:
    volumes:
      # Ollama with persistent storage (no re-downloading models).
      - ./ollama/ollama-server:/root/.ollama
    container_name: ollama
    pull_policy: always
    tty: true
    restart: unless-stopped
    image: docker.io/ollama/ollama:latest
    ports:
      - 11434:11434
    environment:
      - OLLAMA_KEEP_ALIVE=24h
    # Enable GPU support
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

  # Spring AI Backend
  chat-app:
    build:
      context: . # Dockerfile in the root folder
    container_name: chat-app
    ports:
      - "8080:8080"
    environment:
      - SPRING_AI_OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama
- Step 5.2 Spring Boot Dockerfile
# Maven build stage
FROM maven:3.9.9-eclipse-temurin-21-alpine as build
WORKDIR /app
COPY pom.xml .
RUN mvn dependency:go-offline
COPY src/ ./src/
RUN mvn clean package

# Spring Boot package stage
FROM eclipse-temurin:21-jre-alpine
COPY --from=build app/target/*.jar app.jar
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "app.jar"]
- Start everything using docker-compose -f ollama-docker-compose.yaml build && docker-compose -f ollama-docker-compose.yaml up -d
- Navigate to https://localhost:8080 and start your chat session.
- If you want to see how messages are stored in the database, open the H2 console at https://localhost:8080/h2.
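In the H2 console, a quick query against the table created earlier shows the stored conversation, for example:

SELECT conversation_id, type, content, "timestamp"
FROM SPRING_AI_CHAT_MEMORY
ORDER BY "timestamp";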
Conclusion: Your Java AI Future
You just built a locally hosted, open-source chatbot with Spring AI and Ollama, no OpenAI API costs or Python required!
SSE + HTTP/2 + Spring AI = scalable, real-time LLM streaming.
Where to Go Next?
- Check out the full code.
- Experiment with RAG (Retrieval-Augmented Generation) using Spring AI’s embedding model API and vector databases.
What’ll you build? Share your thoughts in the comments! 👇
(P.S. Follow me for more Java + AI tutorials!)