Lucene Sorting

Sorting In Lucene

Sorting means arranging data in a particular order so that information is easier to find. For example, names and contact details may be sorted alphabetically so that a person looking for a name can quickly check whether it is present. An index is a concise, well-organized, and heavily cross-referenced guide to content. It is a key tool for reaching content and returning to it: it points to the places where essential information lives and deliberately leaves out references to irrelevant information. Searching is one of the core features provided by Lucene, and its flow is similar to that of the indexing process. A basic search in Lucene is built on a small set of foundation classes that all search-related operations use.

Sorting By Relevance

Relevance describes how closely the information in the catalog database matches the search string the user entered and the search filters the user selected. Keywords are the word or words typed into the search box on the search screen; they are the significant words that help to find the items you are looking for in the index. Keywords are also called search terms, and the two names mean the same thing. A database record is relevant to the user's keywords because the keywords appear in the record. The more often the keywords appear in a record, the more relevant that record is considered to be. Lucene returns results with the most relevant hit at the top.

[java]
private void sortUsingRelevance(String searchQuery)
      throws IOException, ParseException {
   searcher = new Searcher(indexDir);
   long startTime = System.currentTimeMillis();

   // create a term to search the file name field
   Term term = new Term(LuceneConstants.FILE_NAME, searchQuery);
   // create the fuzzy query object for that term
   Query query = new FuzzyQuery(term);
   // track scores so that scoreDoc.score is filled in for sorted searches
   searcher.setDefaultFieldSortScoring(true, false);
   // do the search, most relevant hits first
   TopDocs hits = searcher.search(query, Sort.RELEVANCE);
   long endTime = System.currentTimeMillis();

   System.out.println(hits.totalHits + " documents found. Time :"
      + (endTime - startTime) + "ms");
   for (ScoreDoc scoreDoc : hits.scoreDocs) {
      Document doc = searcher.getDocument(scoreDoc);
      System.out.print("Score: " + scoreDoc.score + " ");
      System.out.println("File: " + doc.get(LuceneConstants.FILE_PATH));
   }
   searcher.close();
}
[/java]
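The idea that a record becomes more relevant the more often the keywords appear in it can be sketched in plain Java without Lucene. This is only a toy illustration: the class `ToyRelevance`, its methods, and the sample records are hypothetical, and Lucene's real scoring formula is more elaborate than a raw occurrence count.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy class, not part of Lucene: ranks records by keyword count.
class ToyRelevance {

    // Count non-overlapping occurrences of keyword in text, ignoring case.
    static int countOccurrences(String text, String keyword) {
        String haystack = text.toLowerCase();
        String needle = keyword.toLowerCase();
        if (needle.isEmpty()) {
            return 0;
        }
        int count = 0;
        int from = 0;
        while ((from = haystack.indexOf(needle, from)) >= 0) {
            count++;
            from += needle.length();
        }
        return count;
    }

    // Return record names ordered by descending keyword count.
    // List.sort is stable, so records with equal counts keep their input order.
    static List<String> rankByKeyword(Map<String, String> records, String keyword) {
        List<String> names = new ArrayList<>(records.keySet());
        names.sort(Comparator.comparingInt(
                (String name) -> countOccurrences(records.get(name), keyword)).reversed());
        return names;
    }

    public static void main(String[] args) {
        // Made-up sample records (name -> contents).
        Map<String, String> records = new LinkedHashMap<>();
        records.put("a.txt", "lucene index");
        records.put("b.txt", "lucene search with lucene sort in lucene");
        records.put("c.txt", "sorting a lucene result, lucene style");

        // prints [b.txt, c.txt, a.txt]
        System.out.println(rankByKeyword(records, "lucene"));
    }
}
```

Here `b.txt` contains the keyword three times, `c.txt` twice, and `a.txt` once, so they are listed in that order, with the most relevant record at the top.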

Sorting By IndexOrder

Index order is the sorting mode in Lucene in which documents are returned in the order in which they were indexed: the first document indexed is shown first in the search results. It is selected by passing Sort.INDEXORDER to the search.

[java]
private void sortUsingIndex(String searchQuery)
      throws IOException, ParseException {
   searcher = new Searcher(indexDir);
   long startTime = System.currentTimeMillis();

   // create a term to search the file name field
   Term term = new Term(LuceneConstants.FILE_NAME, searchQuery);
   // create the fuzzy query object for that term
   Query query = new FuzzyQuery(term);
   // track scores so that scoreDoc.score is filled in for sorted searches
   searcher.setDefaultFieldSortScoring(true, false);
   // do the search, hits in the order they were indexed
   TopDocs hits = searcher.search(query, Sort.INDEXORDER);
   long endTime = System.currentTimeMillis();

   System.out.println(hits.totalHits + " documents found. Time :"
      + (endTime - startTime) + "ms");
   for (ScoreDoc scoreDoc : hits.scoreDocs) {
      Document doc = searcher.getDocument(scoreDoc);
      System.out.print("Score: " + scoreDoc.score + " ");
      System.out.println("File: " + doc.get(LuceneConstants.FILE_PATH));
   }
   searcher.close();
}
[/java]
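The difference between the two sort modes can be sketched in plain Java, using four of the hits from this lesson's sample output. The class `SortOrderSketch` and its `Hit` type are hypothetical stand-ins, the `docId` values are assumed from the order in which the files appear in the index-order output, and the tie-break by index position is an assumption that happens to agree with that output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not Lucene code: contrasts index order with relevance order.
class SortOrderSketch {

    // One search hit: position in the index (assumed), file name, and score.
    static final class Hit {
        final int docId;
        final String file;
        final double score;

        Hit(int docId, String file, double score) {
            this.docId = docId;
            this.file = file;
            this.score = score;
        }
    }

    // Four hits taken from the lesson's sample output.
    static List<Hit> sampleHits() {
        return Arrays.asList(
                new Hit(3, "record3.txt", 1.3179655),
                new Hit(0, "record1.txt", 0.790779),
                new Hit(2, "record2.txt", 0.790779),
                new Hit(1, "record10.txt", 0.2635932));
    }

    // Index order: the document that was indexed first comes first.
    static List<String> byIndexOrder(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingInt((Hit h) -> h.docId));
        return fileNames(sorted);
    }

    // Relevance order: highest score first; equal scores fall back to index order.
    static List<String> byRelevance(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed()
                .thenComparingInt(h -> h.docId));
        return fileNames(sorted);
    }

    private static List<String> fileNames(List<Hit> hits) {
        List<String> names = new ArrayList<>();
        for (Hit hit : hits) {
            names.add(hit.file);
        }
        return names;
    }

    public static void main(String[] args) {
        // prints [record1.txt, record10.txt, record2.txt, record3.txt]
        System.out.println(byIndexOrder(sampleHits()));
        // prints [record3.txt, record1.txt, record2.txt, record10.txt]
        System.out.println(byRelevance(sampleHits()));
    }
}
```

The same hits appear in both lists; only the ordering rule changes, which is what switching between Sort.INDEXORDER and Sort.RELEVANCE does in the lesson's code.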

Example

The example is structured as follows.

LuceneConstants.java

[java]
package com.splessons;

public class LuceneConstants {
   public static final String CONTENTS = "contents";
   public static final String FILE_NAME = "filename";
   public static final String FILE_PATH = "filepath";
   public static final int MAX_SEARCH = 10;
}
[/java]

This class provides the constants used across the application.

Searcher.java

[java]
package com.splessons;

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Searcher {

   IndexSearcher indexSearcher;
   QueryParser queryParser;
   Query query;

   public Searcher(String indexDirectoryPath) throws IOException {
      // open the index stored on disk at the given path
      Directory indexDirectory = FSDirectory.open(new File(indexDirectoryPath));
      indexSearcher = new IndexSearcher(indexDirectory);
      queryParser = new QueryParser(Version.LUCENE_36,
         LuceneConstants.CONTENTS,
         new StandardAnalyzer(Version.LUCENE_36));
   }

   // parse a query string and search the contents field
   public TopDocs search(String searchQuery)
         throws IOException, ParseException {
      query = queryParser.parse(searchQuery);
      return indexSearcher.search(query, LuceneConstants.MAX_SEARCH);
   }

   // search with a ready-made query
   public TopDocs search(Query query)
         throws IOException, ParseException {
      return indexSearcher.search(query, LuceneConstants.MAX_SEARCH);
   }

   // search with a ready-made query and an explicit sort order
   public TopDocs search(Query query, Sort sort)
         throws IOException, ParseException {
      return indexSearcher.search(query, LuceneConstants.MAX_SEARCH, sort);
   }

   public void setDefaultFieldSortScoring(boolean doTrackScores,
         boolean doMaxScores) {
      indexSearcher.setDefaultFieldSortScoring(doTrackScores, doMaxScores);
   }

   public Document getDocument(ScoreDoc scoreDoc)
         throws CorruptIndexException, IOException {
      return indexSearcher.doc(scoreDoc.doc);
   }

   public void close() throws IOException {
      indexSearcher.close();
   }
}
[/java]

FSDirectory is Lucene's file-system-based Directory implementation; FSDirectory.open opens the index stored at the given path. The standard analyzer is the analyzer commonly used as the default when no other is chosen. It provides grammar-based tokenization (based on the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29) and works well for most languages.

LuceneTester.java

[java]
package com.splessons;

import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.TopDocs;

public class LuceneTester {

   String indexDir = "D:\\Splessons\\Lucene\\Index";
   String dataDir = "D:\\Splessons\\Lucene\\Data";
   // the Indexer class is not shown in this lesson
   Indexer indexer;
   Searcher searcher;

   public static void main(String[] args) {
      LuceneTester tester;
      try {
         tester = new LuceneTester();
         tester.sortUsingRelevance("cord3.txt");
         tester.sortUsingIndex("cord3.txt");
      } catch (IOException e) {
         e.printStackTrace();
      } catch (ParseException e) {
         e.printStackTrace();
      }
   }

   private void sortUsingRelevance(String searchQuery)
         throws IOException, ParseException {
      searcher = new Searcher(indexDir);
      long startTime = System.currentTimeMillis();

      // create a term to search the file name field
      Term term = new Term(LuceneConstants.FILE_NAME, searchQuery);
      // create the fuzzy query object for that term
      Query query = new FuzzyQuery(term);
      searcher.setDefaultFieldSortScoring(true, false);
      // do the search, most relevant hits first
      TopDocs hits = searcher.search(query, Sort.RELEVANCE);
      long endTime = System.currentTimeMillis();

      System.out.println(hits.totalHits + " documents found. Time :"
         + (endTime - startTime) + "ms");
      for (ScoreDoc scoreDoc : hits.scoreDocs) {
         Document doc = searcher.getDocument(scoreDoc);
         System.out.print("Score: " + scoreDoc.score + " ");
         System.out.println("File: " + doc.get(LuceneConstants.FILE_PATH));
      }
      searcher.close();
   }

   private void sortUsingIndex(String searchQuery)
         throws IOException, ParseException {
      searcher = new Searcher(indexDir);
      long startTime = System.currentTimeMillis();

      // create a term to search the file name field
      Term term = new Term(LuceneConstants.FILE_NAME, searchQuery);
      // create the fuzzy query object for that term
      Query query = new FuzzyQuery(term);
      searcher.setDefaultFieldSortScoring(true, false);
      // do the search, hits in the order they were indexed
      TopDocs hits = searcher.search(query, Sort.INDEXORDER);
      long endTime = System.currentTimeMillis();

      System.out.println(hits.totalHits + " documents found. Time :"
         + (endTime - startTime) + "ms");
      for (ScoreDoc scoreDoc : hits.scoreDocs) {
         Document doc = searcher.getDocument(scoreDoc);
         System.out.print("Score: " + scoreDoc.score + " ");
         System.out.println("File: " + doc.get(LuceneConstants.FILE_PATH));
      }
      searcher.close();
   }
}
[/java]

Output: The result will be as follows.

[java]
10 documents found. Time :31ms
Score: 1.3179655 File: D:\Splessons\Lucene\Data\record3.txt
Score: 0.790779 File: D:\Splessons\Lucene\Data\record1.txt
Score: 0.790779 File: D:\Splessons\Lucene\Data\record2.txt
Score: 0.790779 File: D:\Splessons\Lucene\Data\record4.txt
Score: 0.790779 File: D:\Splessons\Lucene\Data\record5.txt
Score: 0.790779 File: D:\Splessons\Lucene\Data\record6.txt
Score: 0.790779 File: D:\Splessons\Lucene\Data\record7.txt
Score: 0.790779 File: D:\Splessons\Lucene\Data\record8.txt
Score: 0.790779 File: D:\Splessons\Lucene\Data\record9.txt
Score: 0.2635932 File: D:\Splessons\Lucene\Data\record10.txt
10 documents found. Time :0ms
Score: 0.790779 File: D:\Splessons\Lucene\Data\record1.txt
Score: 0.2635932 File: D:\Splessons\Lucene\Data\record10.txt
Score: 0.790779 File: D:\Splessons\Lucene\Data\record2.txt
Score: 1.3179655 File: D:\Splessons\Lucene\Data\record3.txt
Score: 0.790779 File: D:\Splessons\Lucene\Data\record4.txt
Score: 0.790779 File: D:\Splessons\Lucene\Data\record5.txt
Score: 0.790779 File: D:\Splessons\Lucene\Data\record6.txt
Score: 0.790779 File: D:\Splessons\Lucene\Data\record7.txt
Score: 0.790779 File: D:\Splessons\Lucene\Data\record8.txt
Score: 0.790779 File: D:\Splessons\Lucene\Data\record9.txt
[/java]

Key Points

Relevance refers to the connection between the information in the catalog database and the search string the user entered.
The standard analyzer is the analyzer commonly used as the default when no other is chosen.
