Lucene - SPLessons

Lucene Search Operations

Searching is another of the core features provided by Lucene. Its flow is similar to that of the indexing process. A basic search in Lucene is performed using the following classes, which can be regarded as the foundation classes for all search-related operations. IndexSearcher is the most important, core component of the search process. The following diagram illustrates the searching procedure.

The indexing process

Indexing is the process of converting text data into a format that supports fast searching. A simple analogy is the index you would find at the end of a book: that index points you to the location of topics that appear in the book. Lucene stores the input data in a data structure called an inverted index, which is stored on the file system or in memory as a set of index files. Most web search engines use an inverted index. It lets users perform fast keyword look-ups and find the documents that match a given query. Before the text data is added to the index, it is processed by an analyzer.
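To make the idea of an inverted index concrete, here is a minimal sketch in plain Java. It is a toy illustration only and not how Lucene lays out its index files; the class name `ToyInvertedIndex` and its methods are hypothetical, and the "analysis" step is reduced to lowercasing and splitting on non-alphanumeric characters.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Toy inverted index: maps each term to the sorted set of document ids containing it.
// Hypothetical illustration; Lucene's real index format is far more elaborate.
class ToyInvertedIndex {
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    private int nextDocId = 0;

    // Adds a document and returns the id assigned to it.
    int addDocument(String text) {
        int docId = nextDocId++;
        // Simplified analysis: lowercase, then split on anything that is not a letter or digit.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    // Keyword look-up: one map access instead of a scan over every document.
    List<Integer> search(String term) {
        TreeSet<Integer> docs = postings.get(term);
        return docs == null ? new ArrayList<>() : new ArrayList<>(docs);
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addDocument("Java search with Lucene");
        index.addDocument("Indexing mail in Java");
        index.addDocument("Cooking for beginners");
        System.out.println(index.search("java")); // prints [0, 1]
    }
}
```

The look-up cost depends on the number of distinct terms rather than on the total amount of text, which is why this structure suits keyword search.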

Analysis

Analysis is the conversion of text data into the fundamental unit of searching, which is called a term. During analysis, the text data goes through several operations: extracting the words, removing common words, ignoring punctuation, reducing words to their root form, changing words to lowercase, and so on. Analysis happens just before indexing and query parsing. Analysis converts text data into tokens, and these tokens are added as terms to the Lucene index.
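The analysis steps listed above can be sketched in plain Java as follows. This is a simplified stand-in and not one of Lucene's analyzers: the class `ToyAnalyzer`, its stop-word list, and its crude suffix-stripping stemmer are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical analyzer sketch: text in, list of terms out.
class ToyAnalyzer {
    // Stop-word list chosen for this example only.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "the", "of", "to", "is"));

    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        // Extract words and drop punctuation by splitting on non-alphanumeric characters.
        for (String raw : text.split("[^A-Za-z0-9]+")) {
            if (raw.isEmpty()) {
                continue;
            }
            String token = raw.toLowerCase(Locale.ROOT); // change to lowercase
            if (STOP_WORDS.contains(token)) {
                continue;                                // remove common words
            }
            terms.add(stem(token));                      // reduce to a rough root form
        }
        return terms;
    }

    // Very naive stemmer: strips a trailing "ing" or "s". Real stemmers are far more careful.
    static String stem(String token) {
        if (token.length() > 5 && token.endsWith("ing")) {
            return token.substring(0, token.length() - 3);
        }
        if (token.length() > 3 && token.endsWith("s")) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    public static void main(String[] args) {
        // prints [index, email, search]
        System.out.println(analyze("The Indexing of Emails, and searching!"));
    }
}
```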

TermQuery

TermQuery is the most basic query type for searching an index. A TermQuery is constructed from a single term. The term value is matched case-sensitively, but that is not the whole story: the terms passed for searching must be consistent with the terms produced by analysis of the documents, because analyzers perform many operations on the original text before the index is built. For example, if the analyzer lowercased the text at indexing time, the search term must be lowercase too. The following is an example.

[java]
// Search mails having the word "java" in the subject field
Searcher indexSearcher = new IndexSearcher(indexDirectory);
Term term = new Term("subject", "java");
Query termQuery = new TermQuery(term);
// Fetch the top 10 matching documents
TopDocs topDocs = indexSearcher.search(termQuery, 10);
[/java]
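The point about keeping search terms consistent with analyzed terms can be shown with a small plain-Java sketch. It does not use Lucene; the class `TermMatchDemo` is hypothetical, and it assumes an index-time analysis that only splits on whitespace and lowercases.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical demo: exact term matching against terms produced by index-time analysis.
class TermMatchDemo {
    private final Set<String> indexedTerms = new HashSet<>();

    // Assumed index-time analysis: split on whitespace and lowercase each word.
    void indexSubject(String subject) {
        for (String word : subject.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                indexedTerms.add(word.toLowerCase(Locale.ROOT));
            }
        }
    }

    // Exact, case-sensitive comparison, in the spirit of a single-term lookup.
    boolean matchesExactTerm(String term) {
        return indexedTerms.contains(term);
    }

    public static void main(String[] args) {
        TermMatchDemo demo = new TermMatchDemo();
        demo.indexSubject("Java Performance Tips");
        System.out.println(demo.matchesExactTerm("Java")); // prints false: the index holds "java"
        System.out.println(demo.matchesExactTerm("java")); // prints true
    }
}
```

The mismatch in the first lookup comes from the analysis step rather than from the matching itself, which is why the search term needs the same normalization as the indexed text.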

RangeQuery

All terms are arranged lexicographically in the index. Lucene's RangeQuery lets users search for terms within a range. The range is specified using a starting term and an ending term, which may be either included or excluded.

[java]
/* RangeQuery example: search mails from 01/06/2009 to 06/06/2009, both inclusive */
Term begin = new Term("date", "20090601");
Term end = new Term("date", "20090606");
// The third argument makes both end points inclusive
Query query = new RangeQuery(begin, end, true);
[/java]
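Because terms are compared lexicographically, dates are written as `yyyyMMdd` strings so that string order agrees with chronological order. The sketch below illustrates inclusive and exclusive range selection over sorted terms using a plain Java `TreeSet`; it is not Lucene code, and the class `TermRangeDemo` and its sample dates are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical demo: selecting a lexicographic range from a sorted set of terms.
class TermRangeDemo {
    // Returns the terms between begin and end; the flag applies to both end points.
    static NavigableSet<String> termsInRange(NavigableSet<String> terms,
                                             String begin, String end, boolean inclusive) {
        return terms.subSet(begin, inclusive, end, inclusive);
    }

    public static void main(String[] args) {
        // TreeSet keeps strings in lexicographic order, like the terms of an index.
        NavigableSet<String> dates = new TreeSet<>(Arrays.asList(
                "20090531", "20090601", "20090603", "20090606", "20090607"));

        // prints [20090601, 20090603, 20090606]
        System.out.println(termsInRange(dates, "20090601", "20090606", true));
        // prints [20090603]
        System.out.println(termsInRange(dates, "20090601", "20090606", false));
    }
}
```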

Displaying search results

IndexSearcher returns an array of references to ranked search results, that is, documents that match a given query. You can choose the number of top search results to retrieve by specifying it in the IndexSearcher's search method. Customized paging can be built on top of this. You can add a custom Web application or desktop application to display the search results. The primary classes involved in retrieving the search results are ScoreDoc and TopDocs.

[java]
/* First parameter is the query to be executed and
   second parameter indicates the number of search results to fetch */
TopDocs topDocs = indexSearcher.search(query, 20);
System.out.println("Total hits " + topDocs.totalHits);

// Get an array of references to matched documents
ScoreDoc[] scoreDocsArray = topDocs.scoreDocs;
for (ScoreDoc scoreDoc : scoreDocsArray) {
    // Retrieve the matched document and show relevant details
    Document doc = indexSearcher.doc(scoreDoc.doc);
    System.out.println("\nSender: " + doc.getField("sender").stringValue());
    System.out.println("Subject: " + doc.getField("subject").stringValue());
    System.out.println("Email file location: " + doc.getField("emailDoc").stringValue());
}
[/java]
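One possible way to build paging on top of a top-N search is to fetch enough ranked hits to cover the requested page and then slice out that page. The plain-Java sketch below shows only the slicing step; the class `PagingDemo` and its `page` helper are hypothetical and operate on any ranked list, not on Lucene's own result types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical paging helper over an already ranked list of hits.
class PagingDemo {
    // Returns the hits for a 1-based page number, or an empty list past the last page.
    static <T> List<T> page(List<T> rankedHits, int pageNumber, int pageSize) {
        if (pageNumber < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNumber and pageSize must be at least 1");
        }
        int from = (pageNumber - 1) * pageSize;
        if (from >= rankedHits.size()) {
            return new ArrayList<>();
        }
        int to = Math.min(from + pageSize, rankedHits.size());
        return new ArrayList<>(rankedHits.subList(from, to));
    }

    public static void main(String[] args) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            hits.add(i); // stand-ins for ranked document ids
        }
        System.out.println(page(hits, 3, 10)); // prints [20, 21, 22, 23, 24]
    }
}
```

With this approach, showing page 3 at 10 results per page means asking the search for at least the top 30 hits before slicing.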

Removing documents from an index

Applications often need to update the index with the latest data and remove older data. For example, in the case of web search engines, the index needs to be updated regularly as new Web pages are added, and Web pages that no longer exist need to be removed. Lucene provides the IndexReader class, which lets you perform these operations on an index.

[java]
// Delete all the mails from the index received in May 2009.
IndexReader indexReader = IndexReader.open(indexDirectory);
indexReader.deleteDocuments(new Term("month", "05"));
// Close associated index files and save deletions to disk
indexReader.close();
[/java]

Summary

Key Points

  • Lucene provides search capabilities for the Eclipse IDE and Nutch, and is used by companies such as IBM and HP.
  • Lucene supports parsing of rich query expressions entered by users.
  • Lucene supports PhraseQuery, WildcardQuery, RangeQuery, FuzzyQuery, and other query types.