Java Search Engine Framework

Solutions
•  Regular expressions (can be slow and memory-hungry)
•  Lucene (full-text search engine library)
•  Solr (standalone full-text search server)
•  SolrJ (Java client for Solr)

Regular expression
•  (what it is) a sequence of symbols (hence a string) that identifies a set of strings
•  (what it does) defines a function that takes a string as input and returns a yes/no value, depending on whether or not the string follows a given pattern.
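The yes/no function described above can be sketched in a few lines of Java. This is only an illustration: the class name, the method name, and the pattern "eur.usd" are assumptions chosen for the example.

```java
import java.util.regex.Pattern;

class RegexPredicate {
    // Assumed example pattern: "eur", then any single character, then "usd"
    private static final Pattern PAIR = Pattern.compile("eur.usd");

    /** Returns true (yes) when the whole string follows the pattern, false (no) otherwise. */
    static boolean followsPattern(String s) {
        return PAIR.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(followsPattern("eur&usd"));
        System.out.println(followsPattern("eurusd"));
    }
}
```

Note that matches() requires the entire input to follow the pattern, while find() accepts a match anywhere inside the input.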

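Regular expression (example)
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // "." matches any single character, so the pattern accepts "eur&usd", "eur/usd", ...
    Pattern p = Pattern.compile("eur.usd");
    // Lower-case the input first, because the pattern is written in lower case
    Matcher m = p.matcher("In quel ramo del lago di eUr&uSd".toLowerCase());
    if (m.find()) {
        // found! But where in the string?
    }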
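The comment in the example asks where in the string the match was found. A minimal sketch of one way to answer it, assuming a hypothetical helper named firstMatchStart and using the CASE_INSENSITIVE flag instead of lower-casing the input:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchPosition {
    /** Returns the start index of the first match, or -1 when nothing matches. */
    static int firstMatchStart(String regex, String text) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text);
        // After a successful find(), start() and end() delimit the matched region
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        String text = "In quel ramo del lago di eUr&uSd";
        System.out.println(firstMatchStart("eur.usd", text));
    }
}
```

Matcher.end() gives the index just past the match, and Matcher.group() returns the matched text itself.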

Lucene
•  Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.
•  Apache Software Foundation
•  Stable release: 4.3.0 / May 6, 2013
•  Development status: Active
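Full-text libraries such as Lucene are built around an inverted index (a map from each term to the documents that contain it), which is what lets them avoid scanning every document the way a regular expression does. The toy sketch below illustrates only that idea in plain Java. The class and method names are hypothetical, and this is neither Lucene's API nor its implementation.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class TinyInvertedIndex {
    // term -> sorted ids of the documents containing that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Splits the text on non-word characters and records each lower-cased token. */
    void add(int docId, String text) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Looks a single term up directly, without scanning any document text. */
    Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }

    public static void main(String[] args) {
        TinyInvertedIndex idx = new TinyInvertedIndex();
        idx.add(1, "EURUSD rate update");
        idx.add(2, "GBPUSD rate update");
        System.out.println(idx.search("rate"));
        System.out.println(idx.search("eurusd"));
    }
}
```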

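Lucene (example)
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.core.KeywordAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.Version;

    Analyzer analyzer = null;
    Directory index = null;
    IndexWriterConfig config = null;
    IndexWriter w = null;

    // analyzer = new StandardAnalyzer(Version.LUCENE_43);  // alternative: tokenizes the text
    analyzer = new KeywordAnalyzer();   // treats each field value as a single token
    index = new RAMDirectory();         // in-memory index
    config = new IndexWriterConfig(Version.LUCENE_43, analyzer);
    w = new IndexWriter(index, config);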

Lucene (example 2)
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.StringField;

    // Adds one document with three stored, non-tokenized fields
    private void addDoc(long time, String value, String flag) throws Exception {
        Document doc = new Document();
        doc.add(new StringField("time", String.valueOf(time), Field.Store.YES));
        doc.add(new StringField("value", value, Field.Store.YES));
        doc.add(new StringField("flag", flag, Field.Store.YES));
        w.addDocument(doc);
    }

→  w.commit(); // to be run at the end of the batch

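Lucene (example 3)
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;

    IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(index));

    // Option 1: search several fields by default
    QueryParser queryParser = new MultiFieldQueryParser(
            Version.LUCENE_43,
            new String[] {"time", "value", "flag"},
            analyzer);
    // Option 2: search a single default field
    // QueryParser queryParser = new QueryParser(Version.LUCENE_43, "value", analyzer);

    // Field names are case-sensitive: the field was indexed as "value"
    TopDocs hits = searcher.search(queryParser.parse("value:(+eurusd)"), 50);
    System.out.println(hits.totalHits);
    for (ScoreDoc scoreDoc : hits.scoreDocs) {
        Document doc = searcher.doc(scoreDoc.doc);
        System.out.println(doc.toString());
    }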

Solr
•  Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Jetty.
•  Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language. Solr's powerful external configuration allows it to be tailored to almost any type of application without Java coding, and it has an extensive plugin architecture when more advanced customization is required.
•  Apache Software Foundation
•  Stable release: 4.3.0 / May 6, 2013
•  Development status: Active
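Because the API is plain HTTP, a query is just a URL. The sketch below only builds such a URL with the standard library. The host, port, and core name "collection1" are assumptions for a local test installation, and the helper name selectUrl is hypothetical. The parameters q, rows, and wt are standard Solr query parameters.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class SolrSelectUrl {
    /** Builds a select URL that asks for at most `rows` results in JSON format. */
    static String selectUrl(String coreUrl, String query, int rows) {
        try {
            return coreUrl + "/select?q=" + URLEncoder.encode(query, "UTF-8")
                    + "&rows=" + rows + "&wt=json";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed local core URL; adjust to the actual installation
        System.out.println(selectUrl("http://localhost:8983/solr/collection1", "name:doc1", 10));
    }
}
```

Fetching the resulting URL with any HTTP client (a browser, curl, or java.net.HttpURLConnection) should return the matching documents as JSON, provided a Solr server is running at that address.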

SolrJ
•  SolrJ is a Java client for accessing Solr.
•  It offers a Java interface to add, update, and query the Solr index.
•  Last version: 1.4.X

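SolrJ (example)
    import java.util.ArrayList;
    import java.util.Collection;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrInputDocument;

    SolrServer server = new HttpSolrServer("http://localhost:8983/solr/");
    server.deleteByQuery("*:*"); // CAUTION: deletes everything!

    SolrInputDocument doc1 = new SolrInputDocument();
    doc1.addField("id", 23425);
    doc1.addField("name", "doc1");
    doc1.addField("price", 100980);

    SolrInputDocument doc2 = new SolrInputDocument();
    doc2.addField("id", 63432);
    doc2.addField("name", "doc2");
    doc2.addField("price", 205345);

    // Send both documents in one request, then make them searchable
    Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
    docs.add(doc1);
    docs.add(doc2);
    server.add(docs);
    server.commit();

    // Both clauses are required: name ends with "c1" and price equals 100980
    SolrQuery query = new SolrQuery();
    query.setQuery("+name:*c1 +price:100980");
    QueryResponse rsp = server.query(query);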

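SolrJ (example, continued)
    import java.util.List;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.SolrDocumentList;

    // Option 1: read the results as generic documents
    SolrDocumentList docsr = rsp.getResults();
    for (SolrDocument document : docsr) {
        Object id = document.getFieldValue("id");
        System.out.println(id);
    }

    // Option 2: map the results onto annotated beans (see the Product class)
    List<Product> products = rsp.getBeans(Product.class);
    for (Product product : products) {
        System.out.println(product.getId());
    }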

SolrJ (Product class)
    import org.apache.solr.client.solrj.beans.Field;

    public class Product {
        private String id;

        public String getId() {
            return id;
        }

        @Field("id") // binds the Solr field "id" to this setter
        public void setId(String id) {
            this.id = id;
        }

        // … the same for price and name attributes.
    }

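SolrJ (file indexing)
    import java.io.File;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
    import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public static void indexPdfWithSolrJ(String fileName, String solrId) throws Exception {
        String urlString = "http://localhost:8983/solr";
        SolrServer solr = new HttpSolrServer(urlString);

        // Send the PDF to the extracting request handler
        ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
        up.addFile(new File(fileName), "application/pdf");
        up.setParam("literal.id", solrId); // sets the document's id field
        up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        solr.request(up);

        QueryResponse rsp = solr.query(new SolrQuery("*:*"));
        System.out.println(rsp);
    }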

References
•  Lucene & Solr: http://lucene.apache.org/solr/
•  SolrJ: http://wiki.apache.org/solr/Solrj
•  Tika: http://tika.apache.org/
