A Java-based server that uses Apache Tika to extract content and metadata from files (PDF, DOCX, TXT, etc.) in a local files-to-extract directory. It supports HTML (with CSS styling) and plain-text extraction, file listing, and metadata retrieval through MCP-compliant tools and REST APIs. Built with Spring Boot, Jetty, and the MCP SDK.
java html pdf parser mcp extractor pdf-extractor html-extraction html-extractor pdf-extraction mcp-server modelcontextprotocol extractor-to-html
Updated Aug 30, 2025 - Java
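
The file-listing and metadata-retrieval features in the description can be illustrated with a minimal sketch. It uses only the Java standard library, not Apache Tika or the MCP SDK, and it is not the repository's actual code: the class name `FileLister`, the methods `listFiles` and `basicMetadata`, and the metadata keys are hypothetical names chosen for illustration. The directory name `files-to-extract` is taken from the description.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: lists files in an extraction directory and reads
// basic file-system metadata. A Tika-backed server would add parsed
// metadata (author, page count, ...) on top of this.
public class FileLister {

    // Returns the names of regular files directly under dir, sorted by name.
    // Subdirectories are skipped.
    static List<String> listFiles(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .map(p -> p.getFileName().toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // Returns file-system level metadata only: file name and size in bytes.
    static Map<String, Object> basicMetadata(Path file) throws IOException {
        Map<String, Object> meta = new LinkedHashMap<>();
        meta.put("name", file.getFileName().toString());
        meta.put("size", Files.size(file));
        return meta;
    }

    public static void main(String[] args) throws IOException {
        // Directory name assumed from the project description.
        Path dir = Path.of(args.length > 0 ? args[0] : "files-to-extract");
        if (!Files.isDirectory(dir)) {
            System.out.println("Directory not found: " + dir);
            return;
        }
        for (String name : listFiles(dir)) {
            System.out.println(basicMetadata(dir.resolve(name)));
        }
    }
}
```

Running `java FileLister.java` from a folder that contains a `files-to-extract` directory prints one metadata map per file. In a server like the one described, each of these methods would presumably back both an MCP tool and a REST endpoint, though that wiring is not shown here.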