Lucene StandardAnalyzer

Description

StandardAnalyzer is the principal analyzer class in the Lucene analysis library. It chains StandardTokenizer with StandardFilter, LowerCaseFilter, and StopFilter, using a list of English stop words. It is the most sophisticated of the built-in analyzers, since it is designed to handle text that contains items such as email addresses, names, and numbers. The declaration of the StandardAnalyzer class is as follows.

[java]
public final class StandardAnalyzer extends StopwordAnalyzerBase
[/java]
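The chain described above (tokenize, lowercase, remove stop words) can be pictured with a small standalone sketch. It does not use Lucene at all: the class name AnalyzerChainSketch, the regex-based splitting, and the short stop-word list are assumptions made for illustration only, and the real StandardTokenizer follows far more detailed rules.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AnalyzerChainSketch {

    // Illustrative subset of English stop words (an assumption,
    // not Lucene's full default list).
    static final Set<String> STOP_WORDS = new HashSet<String>(
            Arrays.asList("a", "an", "and", "is", "of", "the", "to"));

    // Stage 1: split on anything that is not a letter or digit.
    // This is a crude stand-in for StandardTokenizer.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String piece : text.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Stages 2 and 3: lowercase each token, then drop stop words.
    static List<String> analyze(String text) {
        List<String> result = new ArrayList<String>();
        for (String token : tokenize(text)) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(lower)) {
                result.add(lower);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Analyzer is a part of Lucene."));
    }
}
```

Running main prints [analyzer, part, lucene]. The order of the stages matters: lowercasing happens before the stop-word check, so "The" is removed even though the stop list contains only lowercase entries.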

Constructors And Methods

The following are the methods and constructors of StandardAnalyzer.

Methods	Description
void setMaxTokenLength(int length)	Sets the maximum allowed token length.
protected ReusableAnalyzerBase.TokenStreamComponents createComponents(String fieldName, Reader reader)	Creates the TokenStreamComponents (the tokenizer and filter chain) used to analyze the given field.

Constructors	Description
StandardAnalyzer(Version matchVersion, Set<?> stopWords)	Builds an analyzer with the given stop words.
StandardAnalyzer(Version matchVersion)	Builds an analyzer with the default stop words (STOP_WORDS_SET).
StandardAnalyzer(Version matchVersion, Reader stopwords)	Builds an analyzer with the stop words read from the given reader.

Example

Following is an example.

LuceneConstants.java

[java]
package com.splessons;

public class LuceneConstants {
   public static final String CONTENTS = "contents";
   public static final String FILE_NAME = "filename";
   public static final String FILE_PATH = "filepath";
   public static final int MAX_SEARCH = 10;
}
[/java]

The above class provides the constants used by the sample application.

LuceneTester.java

[java]
package com.splessons;

import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.util.Version;

public class LuceneTester {

   public static void main(String[] args) {
      LuceneTester tester = new LuceneTester();
      try {
         tester.displayTokenUsingStandardAnalyzer();
      } catch (IOException e) {
         e.printStackTrace();
      }
   }

   private void displayTokenUsingStandardAnalyzer() throws IOException {
      String text = "Lucene is simple yet powerful java based search library.";
      Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
      TokenStream tokenStream = analyzer.tokenStream(
         LuceneConstants.CONTENTS, new StringReader(text));
      // TermAttribute is deprecated in Lucene 3.x in favor of CharTermAttribute.
      TermAttribute term = tokenStream.addAttribute(TermAttribute.class);
      tokenStream.reset();   // prepare the stream before consuming it
      while (tokenStream.incrementToken()) {
         System.out.print("[" + term.term() + "] ");
      }
      tokenStream.end();     // finish end-of-stream processing
      tokenStream.close();   // release the underlying reader
   }
}
[/java]

The above class runs a sample sentence through StandardAnalyzer and prints each token that the analyzer produces.

Output: Compile and run the code. The result in the console will be as follows. The tokens are lowercased, the stop word "is" is removed, and the trailing period is dropped.

[java]
[lucene] [simple] [yet] [powerful] [java] [based] [search] [library]
[/java]

Key Points

SimpleAnalyzer splits the text of a document at non-letter characters and then lowercases the resulting tokens.
StopAnalyzer removes common words such as a, an, the, etc.
