Uses of Class
org.apache.lucene.analysis.TokenStream

Packages that use TokenStream
org.apache.lucene.analysis API and code to convert text into indexable/searchable tokens. 
org.apache.lucene.analysis.br Analyzer for Brazilian. 
org.apache.lucene.analysis.cjk Analyzer for Chinese, Japanese and Korean. 
org.apache.lucene.analysis.cn Analyzer for Chinese. 
org.apache.lucene.analysis.cz Analyzer for Czech. 
org.apache.lucene.analysis.de Analyzer for German. 
org.apache.lucene.analysis.el Analyzer for Greek. 
org.apache.lucene.analysis.fr Analyzer for French. 
org.apache.lucene.analysis.ngram Tokenizers and token filters that break text into character n-grams. 
org.apache.lucene.analysis.nl Analyzer for Dutch. 
org.apache.lucene.analysis.payloads
Provides various convenience classes for creating payloads on Tokens. 
org.apache.lucene.analysis.ru Analyzer for Russian. 
org.apache.lucene.analysis.sinks
Implementations of the SinkTokenizer that might be useful. 
org.apache.lucene.analysis.snowball TokenFilter and Analyzer implementations that use Snowball stemmers. 
org.apache.lucene.analysis.standard A fast grammar-based tokenizer constructed with JFlex. 
org.apache.lucene.analysis.th Analyzer for Thai. 
org.apache.lucene.document The logical representation of a Document for indexing and searching. 
org.apache.lucene.index.memory High-performance single-document main memory Apache Lucene fulltext search index. 
org.apache.lucene.search.highlight The highlight package contains classes to provide "keyword in context" features typically used to highlight search terms in the text of results pages. 
org.apache.lucene.wikipedia.analysis Tokenizer that is aware of Wikipedia syntax. 
 

Uses of TokenStream in org.apache.lucene.analysis
 

Subclasses of TokenStream in org.apache.lucene.analysis
 class CachingTokenFilter
          This class can be used if the Tokens of a TokenStream are intended to be consumed more than once.
 class CharTokenizer
          An abstract base class for simple, character-oriented tokenizers.
 class ISOLatin1AccentFilter
          A filter that replaces accented characters in the ISO Latin 1 character set (ISO-8859-1) by their unaccented equivalent.
 class KeywordTokenizer
          Emits the entire input as a single token.
 class LengthFilter
          Removes words that are too long or too short from the stream.
 class LetterTokenizer
          A LetterTokenizer is a tokenizer that divides text at non-letters.
 class LowerCaseFilter
          Normalizes token text to lower case.
 class LowerCaseTokenizer
          LowerCaseTokenizer performs the function of LetterTokenizer and LowerCaseFilter together.
 class PorterStemFilter
          Transforms the token stream as per the Porter stemming algorithm.
 class SinkTokenizer
          A SinkTokenizer can be used to cache Tokens for use in an Analyzer.
 class StopFilter
          Removes stop words from a token stream.
 class TeeTokenFilter
          Works in conjunction with the SinkTokenizer to provide the ability to set aside tokens that have already been analyzed.
 class TokenFilter
          A TokenFilter is a TokenStream whose input is another token stream.
 class Tokenizer
          A Tokenizer is a TokenStream whose input is a Reader.
 class WhitespaceTokenizer
          A WhitespaceTokenizer is a tokenizer that divides text at whitespace.
 
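As the subclasses above illustrate, an analysis chain is assembled by nesting a Tokenizer inside one or more TokenFilters; each filter wraps the stream below it. A minimal sketch against the 2.x Token-returning next() API (the sample text and class name are arbitrary):

import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.*;

public class ChainDemo {
    public static void main(String[] args) throws IOException {
        // A Tokenizer produces the raw stream; each TokenFilter wraps another
        // TokenStream, so analysis steps compose by nesting constructors.
        TokenStream stream = new StopFilter(
                new LowerCaseFilter(
                        new WhitespaceTokenizer(new StringReader("The Quick Brown Fox"))),
                StopAnalyzer.ENGLISH_STOP_WORDS);
        for (Token t = stream.next(); t != null; t = stream.next()) {
            System.out.println(t.termText());   // prints: quick, brown, fox
        }
        stream.close();
    }
}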

Fields in org.apache.lucene.analysis declared as TokenStream
protected  TokenStream TokenFilter.input
          The source of tokens for this filter.
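Subclasses pull tokens from this field and transform them before passing them on. A hypothetical filter as a sketch (UpperCaseFilter is an invented name, not a Lucene class; it assumes the 2.x Token-returning next() API):

import java.io.IOException;
import org.apache.lucene.analysis.*;

// Invented example class: re-emits each token from the wrapped stream upper-cased.
public class UpperCaseFilter extends TokenFilter {
    public UpperCaseFilter(TokenStream input) {
        super(input);   // stores the wrapped stream in the protected 'input' field
    }

    public Token next() throws IOException {
        Token t = input.next();   // delegate to the source of tokens
        if (t == null) {
            return null;          // end of stream
        }
        return new Token(t.termText().toUpperCase(),
                t.startOffset(), t.endOffset());
    }
}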
 

Methods in org.apache.lucene.analysis that return TokenStream
 TokenStream WhitespaceAnalyzer.reusableTokenStream(String fieldName, Reader reader)
           
 TokenStream StopAnalyzer.reusableTokenStream(String fieldName, Reader reader)
           
 TokenStream KeywordAnalyzer.reusableTokenStream(String fieldName, Reader reader)
           
 TokenStream SimpleAnalyzer.reusableTokenStream(String fieldName, Reader reader)
           
 TokenStream PerFieldAnalyzerWrapper.reusableTokenStream(String fieldName, Reader reader)
           
 TokenStream Analyzer.reusableTokenStream(String fieldName, Reader reader)
          Creates a TokenStream that is allowed to be re-used from the previous time that the same thread called this method.
 TokenStream WhitespaceAnalyzer.tokenStream(String fieldName, Reader reader)
           
 TokenStream StopAnalyzer.tokenStream(String fieldName, Reader reader)
          Filters LowerCaseTokenizer with StopFilter.
 TokenStream KeywordAnalyzer.tokenStream(String fieldName, Reader reader)
           
 TokenStream SimpleAnalyzer.tokenStream(String fieldName, Reader reader)
           
 TokenStream PerFieldAnalyzerWrapper.tokenStream(String fieldName, Reader reader)
           
abstract  TokenStream Analyzer.tokenStream(String fieldName, Reader reader)
          Creates a TokenStream which tokenizes all the text in the provided Reader.
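These factory methods are how consumers obtain a stream: tokenStream returns a fresh chain on every call, while reusableTokenStream may hand the same instance back to the same thread, so a caller must exhaust one stream before requesting another. A minimal sketch using SimpleAnalyzer (field name and text are arbitrary):

import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.*;

public class AnalyzerDemo {
    public static void main(String[] args) throws IOException {
        Analyzer analyzer = new SimpleAnalyzer();
        // The field name lets analyzers vary behavior per field
        // (see PerFieldAnalyzerWrapper); "contents" is arbitrary here.
        TokenStream ts = analyzer.tokenStream("contents",
                new StringReader("It's fast, it's furious"));
        for (Token t = ts.next(); t != null; t = ts.next()) {
            System.out.println(t.termText());
        }
        ts.close();
    }
}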
 

Constructors in org.apache.lucene.analysis with parameters of type TokenStream
CachingTokenFilter(TokenStream input)
           
ISOLatin1AccentFilter(TokenStream input)
           
LengthFilter(TokenStream in, int min, int max)
          Build a filter that removes words that are too long or too short from the text.
LowerCaseFilter(TokenStream in)
           
PorterStemFilter(TokenStream in)
           
StopFilter(TokenStream in, Set stopWords)
          Constructs a filter which removes words from the input TokenStream that are named in the Set.
StopFilter(TokenStream input, Set stopWords, boolean ignoreCase)
          Construct a token stream filtering the given input.
StopFilter(TokenStream input, String[] stopWords)
          Construct a token stream filtering the given input.
StopFilter(TokenStream in, String[] stopWords, boolean ignoreCase)
          Constructs a filter which removes words from the input TokenStream that are named in the array of words.
TeeTokenFilter(TokenStream input, SinkTokenizer sink)
           
TokenFilter(TokenStream input)
          Construct a token stream filtering the given input.
 

Uses of TokenStream in org.apache.lucene.analysis.br
 

Subclasses of TokenStream in org.apache.lucene.analysis.br
 class BrazilianStemFilter
          A filter that stems Brazilian Portuguese words; based on GermanStemFilter.
 

Methods in org.apache.lucene.analysis.br that return TokenStream
 TokenStream BrazilianAnalyzer.tokenStream(String fieldName, Reader reader)
          Creates a TokenStream which tokenizes all the text in the provided Reader.
 

Constructors in org.apache.lucene.analysis.br with parameters of type TokenStream
BrazilianStemFilter(TokenStream in)
           
BrazilianStemFilter(TokenStream in, Set exclusiontable)
           
 

Uses of TokenStream in org.apache.lucene.analysis.cjk
 

Subclasses of TokenStream in org.apache.lucene.analysis.cjk
 class CJKTokenizer
          CJKTokenizer was modified from StopTokenizer, which does a decent job for most European languages.
 

Methods in org.apache.lucene.analysis.cjk that return TokenStream
 TokenStream CJKAnalyzer.tokenStream(String fieldName, Reader reader)
          Gets a token stream from the input Reader.
 

Uses of TokenStream in org.apache.lucene.analysis.cn
 

Subclasses of TokenStream in org.apache.lucene.analysis.cn
 class ChineseFilter
          A filter that uses a stop word table; digits are not allowed in tokens.
 class ChineseTokenizer
          Extracts tokens from the stream using Character.getType(), treating each Chinese character as a single token. The difference between ChineseTokenizer and CJKTokenizer is that they have different token parsing logic.
 

Methods in org.apache.lucene.analysis.cn that return TokenStream
 TokenStream ChineseAnalyzer.tokenStream(String fieldName, Reader reader)
          Creates a TokenStream which tokenizes all the text in the provided Reader.
 

Constructors in org.apache.lucene.analysis.cn with parameters of type TokenStream
ChineseFilter(TokenStream in)
           
 

Uses of TokenStream in org.apache.lucene.analysis.cz
 

Methods in org.apache.lucene.analysis.cz that return TokenStream
 TokenStream CzechAnalyzer.tokenStream(String fieldName, Reader reader)
          Creates a TokenStream which tokenizes all the text in the provided Reader.
 

Uses of TokenStream in org.apache.lucene.analysis.de
 

Subclasses of TokenStream in org.apache.lucene.analysis.de
 class GermanStemFilter
          A filter that stems German words.
 

Methods in org.apache.lucene.analysis.de that return TokenStream
 TokenStream GermanAnalyzer.tokenStream(String fieldName, Reader reader)
          Creates a TokenStream which tokenizes all the text in the provided Reader.
 

Constructors in org.apache.lucene.analysis.de with parameters of type TokenStream
GermanStemFilter(TokenStream in)
           
GermanStemFilter(TokenStream in, Set exclusionSet)
          Builds a GermanStemFilter that uses an exclusion table.
 

Uses of TokenStream in org.apache.lucene.analysis.el
 

Subclasses of TokenStream in org.apache.lucene.analysis.el
 class GreekLowerCaseFilter
          Normalizes token text to lower case according to the given ("greek") charset.
 

Methods in org.apache.lucene.analysis.el that return TokenStream
 TokenStream GreekAnalyzer.tokenStream(String fieldName, Reader reader)
          Creates a TokenStream which tokenizes all the text in the provided Reader.
 

Constructors in org.apache.lucene.analysis.el with parameters of type TokenStream
GreekLowerCaseFilter(TokenStream in, char[] charset)
           
 

Uses of TokenStream in org.apache.lucene.analysis.fr
 

Subclasses of TokenStream in org.apache.lucene.analysis.fr
 class ElisionFilter
          Removes elisions from a token stream.
 class FrenchStemFilter
          A filter that stems French words.
 

Methods in org.apache.lucene.analysis.fr that return TokenStream
 TokenStream FrenchAnalyzer.tokenStream(String fieldName, Reader reader)
          Creates a TokenStream which tokenizes all the text in the provided Reader.
 

Constructors in org.apache.lucene.analysis.fr with parameters of type TokenStream
ElisionFilter(TokenStream input)
          Constructs an elision filter with standard stop words.
ElisionFilter(TokenStream input, Set articles)
          Constructs an elision filter with a Set of stop words.
ElisionFilter(TokenStream input, String[] articles)
          Constructs an elision filter with an array of stop words.
FrenchStemFilter(TokenStream in)
           
FrenchStemFilter(TokenStream in, Set exclusiontable)
           
 

Uses of TokenStream in org.apache.lucene.analysis.ngram
 

Subclasses of TokenStream in org.apache.lucene.analysis.ngram
 class EdgeNGramTokenFilter
          Tokenizes the given token into n-grams of given size(s).
 class EdgeNGramTokenizer
          Tokenizes the input from an edge into n-grams of given size(s).
 class NGramTokenFilter
          Tokenizes the input into n-grams of the given size(s).
 class NGramTokenizer
          Tokenizes the input into n-grams of the given size(s).
 

Constructors in org.apache.lucene.analysis.ngram with parameters of type TokenStream
EdgeNGramTokenFilter(TokenStream input)
           
EdgeNGramTokenFilter(TokenStream input, EdgeNGramTokenFilter.Side side, int minGram, int maxGram)
          Creates an EdgeNGramTokenFilter that generates n-grams of sizes in the given range.
EdgeNGramTokenFilter(TokenStream input, String sideLabel, int minGram, int maxGram)
          Creates an EdgeNGramTokenFilter that generates n-grams of sizes in the given range.
NGramTokenFilter(TokenStream input)
          Creates NGramTokenFilter with default min and max n-grams.
NGramTokenFilter(TokenStream input, int minGram, int maxGram)
          Creates NGramTokenFilter with given min and max n-grams.
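A sketch of the edge n-gram filter from the constructors above; with Side.FRONT and the range 1-3, the single token "lucene" yields "l", "lu", "luc":

import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.*;
import org.apache.lucene.analysis.ngram.EdgeNGramTokenFilter;

public class EdgeNGramDemo {
    public static void main(String[] args) throws IOException {
        // Front-edge n-grams of length 1 to 3 for each incoming token.
        TokenStream ngrams = new EdgeNGramTokenFilter(
                new WhitespaceTokenizer(new StringReader("lucene")),
                EdgeNGramTokenFilter.Side.FRONT, 1, 3);
        for (Token t = ngrams.next(); t != null; t = ngrams.next()) {
            System.out.println(t.termText());   // l, lu, luc
        }
    }
}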
 

Uses of TokenStream in org.apache.lucene.analysis.nl
 

Subclasses of TokenStream in org.apache.lucene.analysis.nl
 class DutchStemFilter
          A filter that stems Dutch words.
 

Methods in org.apache.lucene.analysis.nl that return TokenStream
 TokenStream DutchAnalyzer.tokenStream(String fieldName, Reader reader)
          Creates a TokenStream which tokenizes all the text in the provided Reader.
 

Constructors in org.apache.lucene.analysis.nl with parameters of type TokenStream
DutchStemFilter(TokenStream _in)
           
DutchStemFilter(TokenStream _in, Set exclusiontable)
          Builds a DutchStemFilter that uses an exclusion table.
DutchStemFilter(TokenStream _in, Set exclusiontable, Map stemdictionary)
           
 

Uses of TokenStream in org.apache.lucene.analysis.payloads
 

Subclasses of TokenStream in org.apache.lucene.analysis.payloads
 class NumericPayloadTokenFilter
          Assigns a payload to a token based on the Token.type().
 class TokenOffsetPayloadTokenFilter
          Encodes the Token's start and end offsets (Token.setStartOffset(int), Token.setEndOffset(int)) as its payload: the first 4 bytes are the start offset, the next 4 bytes the end offset.
 class TypeAsPayloadTokenFilter
          Makes the Token.type() a payload.
 

Constructors in org.apache.lucene.analysis.payloads with parameters of type TokenStream
NumericPayloadTokenFilter(TokenStream input, float payload, String typeMatch)
           
TokenOffsetPayloadTokenFilter(TokenStream input)
           
TypeAsPayloadTokenFilter(TokenStream input)
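A sketch of attaching a numeric payload via the constructor above: tokens whose type matches typeMatch receive the given float as their payload ("word" is the default Token type emitted by the core tokenizers):

import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.*;
import org.apache.lucene.analysis.payloads.NumericPayloadTokenFilter;

public class PayloadDemo {
    public static void main(String[] args) throws IOException {
        // Attach the payload 3.0f to every token of type "word".
        TokenStream ts = new NumericPayloadTokenFilter(
                new WhitespaceTokenizer(new StringReader("hello world")),
                3.0f, "word");
        for (Token t = ts.next(); t != null; t = ts.next()) {
            System.out.println(t.termText() + " -> " + t.getPayload());
        }
    }
}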
           
 

Uses of TokenStream in org.apache.lucene.analysis.ru
 

Subclasses of TokenStream in org.apache.lucene.analysis.ru
 class RussianLetterTokenizer
          A RussianLetterTokenizer is a tokenizer that extends LetterTokenizer by additionally looking up letters in a given "russian charset".
 class RussianLowerCaseFilter
          Normalizes token text to lower case according to the given ("russian") charset.
 class RussianStemFilter
          A filter that stems Russian words.
 

Methods in org.apache.lucene.analysis.ru that return TokenStream
 TokenStream RussianAnalyzer.tokenStream(String fieldName, Reader reader)
          Creates a TokenStream which tokenizes all the text in the provided Reader.
 

Constructors in org.apache.lucene.analysis.ru with parameters of type TokenStream
RussianLowerCaseFilter(TokenStream in, char[] charset)
           
RussianStemFilter(TokenStream in, char[] charset)
           
 

Uses of TokenStream in org.apache.lucene.analysis.sinks
 

Subclasses of TokenStream in org.apache.lucene.analysis.sinks
 class DateRecognizerSinkTokenizer
          Attempts to parse the Token.termBuffer() as a Date using a DateFormat.
 class TokenRangeSinkTokenizer
          Counts the tokens as they go by and saves to the internal list those between lower and upper, exclusive of upper.
 class TokenTypeSinkTokenizer
          Adds a Token to the sink if its Token.type() matches the typeToMatch passed in.
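These sinks are used with TeeTokenFilter from org.apache.lucene.analysis: the tee passes tokens through to its consumer while copying each one into the sink, which can then be replayed without re-analyzing the text. A sketch, assuming the no-argument SinkTokenizer constructor:

import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.*;

public class TeeSinkDemo {
    public static void main(String[] args) throws IOException {
        SinkTokenizer sink = new SinkTokenizer();
        TokenStream source = new TeeTokenFilter(
                new WhitespaceTokenizer(new StringReader("one two three")), sink);
        while (source.next() != null) {
            // first pass: consume (e.g. index) the primary stream;
            // the tee caches a copy of every token in the sink
        }
        for (Token t = sink.next(); t != null; t = sink.next()) {
            System.out.println(t.termText());   // second pass, no re-analysis
        }
    }
}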
 

Uses of TokenStream in org.apache.lucene.analysis.snowball
 

Subclasses of TokenStream in org.apache.lucene.analysis.snowball
 class SnowballFilter
          A filter that stems words using a Snowball-generated stemmer.
 

Methods in org.apache.lucene.analysis.snowball that return TokenStream
 TokenStream SnowballAnalyzer.tokenStream(String fieldName, Reader reader)
          Constructs a StandardTokenizer filtered by a StandardFilter, a LowerCaseFilter and a StopFilter.
 

Constructors in org.apache.lucene.analysis.snowball with parameters of type TokenStream
SnowballFilter(TokenStream in, String name)
          Construct the named stemming filter.
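The name argument selects the Snowball-generated stemmer class. A sketch using the English stemmer (sample words are arbitrary):

import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.*;
import org.apache.lucene.analysis.snowball.SnowballFilter;

public class SnowballDemo {
    public static void main(String[] args) throws IOException {
        // "English" selects the Snowball-generated English stemmer.
        TokenStream stemmed = new SnowballFilter(
                new LowerCaseTokenizer(new StringReader("running runs ran")),
                "English");
        for (Token t = stemmed.next(); t != null; t = stemmed.next()) {
            System.out.println(t.termText());   // run, run, ran
        }
    }
}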
 

Uses of TokenStream in org.apache.lucene.analysis.standard
 

Subclasses of TokenStream in org.apache.lucene.analysis.standard
 class StandardFilter
          Normalizes tokens extracted with StandardTokenizer.
 class StandardTokenizer
          A grammar-based tokenizer constructed with JFlex.
 

Methods in org.apache.lucene.analysis.standard that return TokenStream
 TokenStream StandardAnalyzer.reusableTokenStream(String fieldName, Reader reader)
           
 TokenStream StandardAnalyzer.tokenStream(String fieldName, Reader reader)
          Constructs a StandardTokenizer filtered by a StandardFilter, a LowerCaseFilter and a StopFilter.
 

Constructors in org.apache.lucene.analysis.standard with parameters of type TokenStream
StandardFilter(TokenStream in)
          Constructs a StandardFilter filtering the given input stream in.
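A sketch of roughly the chain that StandardAnalyzer.tokenStream assembles (StandardAnalyzer's default stop set matches StopAnalyzer.ENGLISH_STOP_WORDS; the input text is arbitrary):

import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.*;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

public class StandardChainDemo {
    public static void main(String[] args) throws IOException {
        // StandardTokenizer -> StandardFilter -> LowerCaseFilter -> StopFilter
        TokenStream ts = new StopFilter(
                new LowerCaseFilter(
                        new StandardFilter(
                                new StandardTokenizer(new StringReader("The XY&Z Corporation")))),
                StopAnalyzer.ENGLISH_STOP_WORDS);
        for (Token t = ts.next(); t != null; t = ts.next()) {
            System.out.println(t.termText());   // xy&z, corporation
        }
    }
}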
 

Uses of TokenStream in org.apache.lucene.analysis.th
 

Subclasses of TokenStream in org.apache.lucene.analysis.th
 class ThaiWordFilter
          A TokenFilter that uses java.text.BreakIterator to break each Thai Token into separate Tokens, one for each Thai word.
 

Methods in org.apache.lucene.analysis.th that return TokenStream
 TokenStream ThaiAnalyzer.tokenStream(String fieldName, Reader reader)
           
 

Constructors in org.apache.lucene.analysis.th with parameters of type TokenStream
ThaiWordFilter(TokenStream input)
           
 

Uses of TokenStream in org.apache.lucene.document
 

Methods in org.apache.lucene.document that return TokenStream
 TokenStream Fieldable.tokenStreamValue()
          The value of the field as a TokenStream, or null.
 TokenStream Field.tokenStreamValue()
          The value of the field as a TokenStream, or null.
 

Methods in org.apache.lucene.document with parameters of type TokenStream
 void Field.setValue(TokenStream value)
          Expert: change the value of this field.
 

Constructors in org.apache.lucene.document with parameters of type TokenStream
Field(String name, TokenStream tokenStream)
          Create a tokenized and indexed field that is not stored.
Field(String name, TokenStream tokenStream, Field.TermVector termVector)
          Create a tokenized and indexed field that is not stored, optionally with storing term vectors.
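These constructors accept pre-analyzed tokens, bypassing the Analyzer at indexing time. A minimal sketch (field name and tokenizer choice are arbitrary):

import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class TokenStreamFieldDemo {
    public static void main(String[] args) {
        // The field is tokenized and indexed from the stream, but not stored.
        TokenStream ts = new WhitespaceTokenizer(new StringReader("alpha beta"));
        Document doc = new Document();
        doc.add(new Field("body", ts));
        // doc would now be passed to IndexWriter.addDocument(doc)
    }
}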
 

Uses of TokenStream in org.apache.lucene.index.memory
 

Subclasses of TokenStream in org.apache.lucene.index.memory
 class SynonymTokenFilter
          Injects additional tokens for synonyms of token terms fetched from the underlying child stream; the child stream must deliver lowercase tokens for synonyms to be found.
 

Methods in org.apache.lucene.index.memory that return TokenStream
 TokenStream MemoryIndex.keywordTokenStream(Collection keywords)
          Convenience method; creates and returns a token stream that generates a token for each keyword in the given collection, "as is", without any transforming text analysis.
 TokenStream PatternAnalyzer.tokenStream(String fieldName, Reader reader)
          Creates a token stream that tokenizes all the text in the given Reader; this implementation forwards to tokenStream(String, String) and is therefore less efficient than calling that method directly.
 TokenStream PatternAnalyzer.tokenStream(String fieldName, String text)
          Creates a token stream that tokenizes the given string into token terms (aka words).
 

Methods in org.apache.lucene.index.memory with parameters of type TokenStream
 void MemoryIndex.addField(String fieldName, TokenStream stream)
          Equivalent to addField(fieldName, stream, 1.0f).
 void MemoryIndex.addField(String fieldName, TokenStream stream, float boost)
          Iterates over the given token stream and adds the resulting terms to the index; equivalent to adding a tokenized, indexed, termVectorStored, unstored Lucene Field.
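A sketch of feeding a MemoryIndex from a TokenStream and running an ad-hoc query against the single in-memory document (field name and text are arbitrary):

import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.memory.MemoryIndex;
import org.apache.lucene.search.TermQuery;

public class MemoryIndexDemo {
    public static void main(String[] args) {
        Analyzer analyzer = new SimpleAnalyzer();
        MemoryIndex index = new MemoryIndex();
        index.addField("content",
                analyzer.tokenStream("content", new StringReader("quick brown fox")));
        // search returns a relevance score > 0 on a match, 0.0f otherwise
        float score = index.search(new TermQuery(new Term("content", "fox")));
        System.out.println(score);
    }
}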
 

Constructors in org.apache.lucene.index.memory with parameters of type TokenStream
SynonymTokenFilter(TokenStream input, SynonymMap synonyms, int maxSynonyms)
          Creates an instance for the given underlying stream and synonym table.
 

Uses of TokenStream in org.apache.lucene.search.highlight
 

Methods in org.apache.lucene.search.highlight that return TokenStream
static TokenStream TokenSources.getAnyTokenStream(IndexReader reader, int docId, String field, Analyzer analyzer)
          A convenience method that tries a number of approaches to getting a token stream.
static TokenStream TokenSources.getTokenStream(IndexReader reader, int docId, String field)
           
static TokenStream TokenSources.getTokenStream(IndexReader reader, int docId, String field, Analyzer analyzer)
           
static TokenStream TokenSources.getTokenStream(TermPositionVector tpv)
           
static TokenStream TokenSources.getTokenStream(TermPositionVector tpv, boolean tokenPositionsGuaranteedContiguous)
          Low-level API.
 

Methods in org.apache.lucene.search.highlight with parameters of type TokenStream
 String Highlighter.getBestFragment(TokenStream tokenStream, String text)
          Highlights chosen terms in a text, extracting the most relevant section.
 String[] Highlighter.getBestFragments(TokenStream tokenStream, String text, int maxNumFragments)
          Highlights chosen terms in a text, extracting the most relevant sections.
 String Highlighter.getBestFragments(TokenStream tokenStream, String text, int maxNumFragments, String separator)
          Highlights terms in the text, extracting the most relevant sections and concatenating the chosen fragments with a separator (typically "...").
 TextFragment[] Highlighter.getBestTextFragments(TokenStream tokenStream, String text, boolean mergeContiguousFragments, int maxNumFragments)
          Low-level API to get the most relevant (formatted) sections of the document.
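The token stream supplies the offsets the highlighter needs, so the text must be re-analyzed (or the stream rebuilt via TokenSources) with offsets that line up with the raw string. A minimal sketch using the default formatter, which wraps matches in <B> tags:

import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.*;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;

public class HighlightDemo {
    public static void main(String[] args) throws IOException {
        String text = "The quick brown fox jumps over the lazy dog";
        Query query = new TermQuery(new Term("body", "fox"));

        // Re-analyze the raw text so token offsets line up with it.
        TokenStream tokens = new SimpleAnalyzer().tokenStream("body",
                new StringReader(text));
        Highlighter highlighter = new Highlighter(new QueryScorer(query));
        System.out.println(highlighter.getBestFragment(tokens, text));
        // e.g. "The quick brown <B>fox</B> jumps over the lazy dog"
    }
}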
 

Uses of TokenStream in org.apache.lucene.wikipedia.analysis
 

Subclasses of TokenStream in org.apache.lucene.wikipedia.analysis
 class WikipediaTokenizer
          Extension of StandardTokenizer that is aware of Wikipedia syntax.
 



Copyright © 2000-2008 Apache Software Foundation. All Rights Reserved.