Lucene - SPLessons

Lucene StandardAnalyzer


Description

StandardAnalyzer is the core class of the Lucene analyzer library. It chains StandardTokenizer with StandardFilter, LowerCaseFilter, and StopFilter, using a list of English stop words. It is the most sophisticated of the basic analyzers, since it can handle input such as email addresses, names, and numbers. The declaration of the StandardAnalyzer class is as follows.
[java]
public final class StandardAnalyzer extends StopwordAnalyzerBase
[/java]
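The filter chain described above can be approximated in plain Java. The sketch below is a simplified stand-in, not Lucene code: it splits on non-alphanumeric characters (the real StandardTokenizer applies much richer rules), lowercases each token, and drops stop words. The class name AnalyzerPipelineSketch and the abbreviated STOP_WORDS list are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Simplified stand-in for the StandardAnalyzer chain; this is NOT Lucene code.
class AnalyzerPipelineSketch {

    // Hypothetical, abbreviated stop list; Lucene ships its own default English set.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "is", "of", "the", "to"));

    // Tokenizer stand-in: split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // LowerCaseFilter and StopFilter stand-ins: lowercase, then drop stop words.
    static List<String> analyze(String text) {
        List<String> result = new ArrayList<>();
        for (String token : tokenize(text)) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(lower)) {
                result.add(lower);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints: [lucene, simple, yet, powerful, java, based, search, library]
        System.out.println(analyze("Lucene is simple yet powerful java based search library."));
    }
}
```

The order of the steps matters: lowercasing before the stop-word check lets a lowercase stop list also match capitalized words such as "Is" or "The".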

Constructors And Methods

Description

The following are the methods and constructors of StandardAnalyzer.
Methods
  • void setMaxTokenLength(int length): Sets the maximum allowed token length.
  • protected ReusableAnalyzerBase.TokenStreamComponents createComponents(String fieldName, Reader reader): Creates the TokenStreamComponents (the tokenizer and its filter chain) used to tokenize text from the given reader.
Constructors
  • StandardAnalyzer(Version matchVersion, Set stopWords): Builds an analyzer that uses the given stop words.
  • StandardAnalyzer(Version matchVersion): Builds an analyzer that uses the default stop words.
  • StandardAnalyzer(Version matchVersion, Reader stopwords): Builds an analyzer that uses the stop words read from the given reader.
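To make the stop-word and token-length options more concrete, here is a plain-Java sketch that does not use Lucene. It assumes the stop-word Reader supplies one word per line and that tokens longer than the configured maximum are discarded, as the Lucene 3.x Javadoc describes. The class and method names (StopWordOptionsSketch, loadStopWords, filter) are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration only; the names here are hypothetical and this is not Lucene code.
class StopWordOptionsSketch {

    // Reads one stop word per line (assumed format), trimming whitespace
    // and skipping blank lines.
    static Set<String> loadStopWords(Reader reader) {
        Set<String> words = new HashSet<>();
        try (BufferedReader in = new BufferedReader(reader)) {
            String line;
            while ((line = in.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    // Keeps tokens that are not stop words and are at most maxTokenLength long.
    static List<String> filter(List<String> tokens, Set<String> stopWords, int maxTokenLength) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (token.length() <= maxTokenLength && !stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> stopWords = loadStopWords(new StringReader("yet\nbased\n"));
        List<String> tokens = Arrays.asList(
                "lucene", "simple", "yet", "powerful", "java", "based", "search", "library");
        // Prints: [lucene, simple, java, search]
        System.out.println(filter(tokens, stopWords, 6));
    }
}
```

With a maximum length of 6, "powerful" and "library" are dropped for being too long, while "yet" and "based" are dropped as stop words.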

Example

Following is an example.

LuceneConstants.java
[java]
package com.splessons;

public class LuceneConstants {
    public static final String CONTENTS = "contents";
    public static final String FILE_NAME = "filename";
    public static final String FILE_PATH = "filepath";
    public static final int MAX_SEARCH = 10;
}
[/java]
The above class provides the constants used by the sample application.

LuceneTester.java
[java]
package com.splessons;

import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.util.Version;

public class LuceneTester {

    public static void main(String[] args) {
        LuceneTester tester = new LuceneTester();
        try {
            tester.displayTokenUsingStandardAnalyzer();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private void displayTokenUsingStandardAnalyzer() throws IOException {
        String text = "Lucene is simple yet powerful java based search library.";
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
        TokenStream tokenStream = analyzer.tokenStream(
                LuceneConstants.CONTENTS, new StringReader(text));
        // TermAttribute exposes the text of the current token
        // (deprecated in Lucene 3.x in favor of CharTermAttribute).
        TermAttribute term = tokenStream.addAttribute(TermAttribute.class);
        tokenStream.reset();
        // Print every token the analyzer emits.
        while (tokenStream.incrementToken()) {
            System.out.print("[" + term.term() + "] ");
        }
        tokenStream.end();
        tokenStream.close();
    }
}
[/java]
The above class runs a sample sentence through StandardAnalyzer and prints each resulting token. Note that the stop word "is" and the trailing period do not appear in the output, and that "Lucene" is lowercased.

Output: Compile and run the code; the console shows the following result.
[java]
[lucene] [simple] [yet] [powerful] [java] [based] [search] [library]
[/java]

Summary

Key Points

  • The SimpleAnalyzer class splits the text of a document at non-letter characters and then lowercases the resulting tokens.
  • StopAnalyzer removes common words such as a, an, the, etc.