The Lucene UnifiedHighlighter is the highest-performing highlighter, especially for large documents. In this Lucene tutorial, learn to highlight the search terms found in indexed documents/files.
1. Prerequisites
We assume that you have already created a Lucene index by reading some text files and writing them into the index location. If not, first follow the Lucene example that writes text files into an index.
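For orientation, here is a minimal sketch of that indexing step. It is not the code from the linked example. It assumes the field names "path" and "contents" (the names read by the search code in section 3), the two demo folders used in this tutorial, and the Lucene dependencies from section 2. The class name IndexTextFilesSketch and the helper toDocument are hypothetical. Note that "contents" is stored, because by default UnifiedHighlighter reads the original text from the stored field.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Stream;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

final class IndexTextFilesSketch {

  // Builds one Lucene document per file.
  // Assumption: field names "path" and "contents" match the search code in section 3.
  static Document toDocument(String path, String contents) {
    Document doc = new Document();
    // Not tokenized; stored so the file name can be printed next to a fragment.
    doc.add(new StringField("path", path, Field.Store.YES));
    // Tokenized and stored; the highlighter re-reads this stored text.
    doc.add(new TextField("contents", contents, Field.Store.YES));
    return doc;
  }

  public static void main(String[] args) throws IOException {
    Path inputDir = Paths.get("c:/temp/lucene/inputFiles");
    Path indexDir = Paths.get("c:/temp/lucene/indexedFiles");

    // Collect the regular files in the input folder (non-recursive).
    List<Path> files;
    try (Stream<Path> listing = Files.list(inputDir)) {
      files = listing.filter(Files::isRegularFile).toList();
    }

    // Closing the writer commits the added documents.
    try (Directory dir = FSDirectory.open(indexDir);
         IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
      for (Path file : files) {
        writer.addDocument(toDocument(file.toString(), Files.readString(file)));
      }
    }
  }
}
```

If the index was built without storing "contents", the highlighter has no text to build fragments from, so it is worth checking this before moving on.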
2. Maven
Start with adding these Lucene dependencies. We are using Lucene 9.10.0 and Java 21.
<properties>
  <maven.compiler.source>21</maven.compiler.source>
  <maven.compiler.target>21</maven.compiler.target>
  <lucene.version>9.10.0</lucene.version>
</properties>

<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-core</artifactId>
  <version>${lucene.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-analysis-common</artifactId>
  <version>${lucene.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-queryparser</artifactId>
  <version>${lucene.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-highlighter</artifactId>
  <version>${lucene.version}</version>
</dependency>

Please note that we will be using these two folders for the demo:
- 'c:/temp/lucene/inputFiles' contains all the text files that we want to index.
- 'c:/temp/lucene/indexedFiles' contains the Lucene indexed documents. We will search the index inside it.
3. Highlighting Fragments with UnifiedHighlighter
The following Java example uses UnifiedHighlighter to highlight the matched query terms in Lucene search results.
In this example:
- An IndexSearcher is used to search the index.
- A QueryParser is used to parse the search query.
- The highlighter is configured with a passage formatter that wraps the highlighted terms in <b> tags.
- The search results are retrieved and the highlighted text fragments are printed in the console.
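To make the formatter step concrete, the sketch below shows what "wrapping the highlighted terms in <b> tags" does to a piece of text. It uses only the Java standard library and a simple case-insensitive whole-word regex. This is an illustration of the output shape, not how Lucene locates matches (Lucene relies on the analyzer and term offsets). The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TagWrapSketch {

  // Wraps every case-insensitive whole-word occurrence of term in preTag/postTag.
  static String wrap(String text, String term, String preTag, String postTag) {
    Pattern pattern = Pattern.compile("\\b" + Pattern.quote(term) + "\\b", Pattern.CASE_INSENSITIVE);
    Matcher matcher = pattern.matcher(text);
    StringBuilder out = new StringBuilder();
    int last = 0;
    while (matcher.find()) {
      // Copy the untouched text before the match, then the tagged match itself.
      out.append(text, last, matcher.start())
         .append(preTag).append(matcher.group()).append(postTag);
      last = matcher.end();
    }
    // Copy whatever follows the final match.
    return out.append(text, last, text.length()).toString();
  }

  public static void main(String[] args) {
    String sentence = "Questions explained agreeable preferred strangers too him her son.";
    // prints: <b>Questions</b> explained agreeable preferred strangers too him her son.
    System.out.println(wrap(sentence, "questions", "<b>", "</b>"));
  }
}
```

When such fragments are rendered in a browser the tags appear as bold text, while on the console they are printed literally.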
import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.uhighlight.DefaultPassageFormatter;
import org.apache.lucene.search.uhighlight.UnifiedHighlighter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class LuceneUnifiedHighlighterExample {

  //This contains the lucene indexed documents
  private static final String INDEX_DIR = "c:/temp/lucene/indexedFiles";
  private static final String SEARCH_QUERY = "Questions";

  public static void main(String[] args) throws Exception {
    //Get directory reference
    Directory dir = FSDirectory.open(Paths.get(INDEX_DIR));

    //Index reader - an interface for accessing a point-in-time view of a lucene index
    IndexReader reader = DirectoryReader.open(dir);

    //Create lucene searcher. It searches over a single IndexReader.
    IndexSearcher searcher = new IndexSearcher(reader);

    //Analyzer; it should match the one used when the index was written
    Analyzer analyzer = new StandardAnalyzer();

    //Query parser to be used for creating TermQuery
    QueryParser qp = new QueryParser("contents", analyzer);

    //Create the query
    Query query = qp.parse(SEARCH_QUERY);

    //Search the lucene documents
    TopDocs hits = searcher.search(query, 10, Sort.INDEXORDER);
    System.out.println("Search terms found in :: " + hits.totalHits.value + " files");

    UnifiedHighlighter highlighter = new UnifiedHighlighter(searcher, analyzer);
    //UnifiedHighlighter takes a PassageFormatter, so use DefaultPassageFormatter
    //(SimpleHTMLFormatter belongs to the older Highlighter API).
    //Arguments: pre tag, post tag, ellipsis between passages, HTML-escape the text
    highlighter.setFormatter(new DefaultPassageFormatter("<b>", "</b>", "... ", false));

    //One fragment per hit, in the same order as hits.scoreDocs
    String[] fragments = highlighter.highlight("contents", query, hits);
    for (String f : fragments) {
      System.out.println(f);
    }

    //To get which fragment belong to which doc/file
    /*for (int i = 0; i < hits.scoreDocs.length; i++) {
      int docid = hits.scoreDocs[i].doc;
      Document doc = searcher.doc(docid);
      String filePath = doc.get("path");
      System.out.println(filePath);
      System.out.println(fragments[i]);
    }*/

    reader.close();
    dir.close();
  }
}

The program output:
Search terms found in :: 3 files
Questions Girl private rich in do up or both.
Questions explained agreeable preferred strangers too him her son.
Questions Or neglected agreeable of discovery concluded oh it sportsman.

Happy Learning !!
 
  
 