Packages that use TokenStream | |
---|---|
org.apache.lucene.analysis | API and code to convert text into indexable/searchable tokens. |
org.apache.lucene.analysis.br | Analyzer for Brazilian. |
org.apache.lucene.analysis.cjk | Analyzer for Chinese, Japanese and Korean. |
org.apache.lucene.analysis.cn | Analyzer for Chinese. |
org.apache.lucene.analysis.cz | Analyzer for Czech. |
org.apache.lucene.analysis.de | Analyzer for German. |
org.apache.lucene.analysis.el | Analyzer for Greek. |
org.apache.lucene.analysis.fr | Analyzer for French. |
org.apache.lucene.analysis.ngram | Tokenizers and token filters that produce n-grams of given sizes. |
org.apache.lucene.analysis.nl | Analyzer for Dutch. |
org.apache.lucene.analysis.payloads | Provides various convenience classes for creating payloads on Tokens. |
org.apache.lucene.analysis.ru | Analyzer for Russian. |
org.apache.lucene.analysis.sinks | Implementations of SinkTokenizer that may be useful. |
org.apache.lucene.analysis.snowball | TokenFilter and Analyzer implementations that use Snowball
stemmers. |
org.apache.lucene.analysis.standard | A fast grammar-based tokenizer constructed with JFlex. |
org.apache.lucene.analysis.th | Analyzer for Thai. |
org.apache.lucene.document | The logical representation of a Document for indexing and searching. |
org.apache.lucene.index.memory | High-performance single-document main memory Apache Lucene fulltext search index. |
org.apache.lucene.search.highlight | The highlight package contains classes to provide "keyword in context" features typically used to highlight search terms in the text of results pages. |
org.apache.lucene.wikipedia.analysis | Tokenizer that is aware of Wikipedia syntax. |
Uses of TokenStream in org.apache.lucene.analysis |
---|
Subclasses of TokenStream in org.apache.lucene.analysis | |
---|---|
class |
CachingTokenFilter
This class can be used if the Tokens of a TokenStream are intended to be consumed more than once. |
class |
CharTokenizer
An abstract base class for simple, character-oriented tokenizers. |
class |
ISOLatin1AccentFilter
A filter that replaces accented characters in the ISO Latin 1 character set (ISO-8859-1) by their unaccented equivalent. |
class |
KeywordTokenizer
Emits the entire input as a single token. |
class |
LengthFilter
Removes words that are too long or too short from the stream. |
class |
LetterTokenizer
A LetterTokenizer is a tokenizer that divides text at non-letters. |
class |
LowerCaseFilter
Normalizes token text to lower case. |
class |
LowerCaseTokenizer
LowerCaseTokenizer performs the function of LetterTokenizer and LowerCaseFilter together. |
class |
PorterStemFilter
Transforms the token stream as per the Porter stemming algorithm. |
class |
SinkTokenizer
A SinkTokenizer can be used to cache Tokens for use in an Analyzer. |
class |
StopFilter
Removes stop words from a token stream. |
class |
TeeTokenFilter
Works in conjunction with the SinkTokenizer to provide the ability to set aside tokens that have already been analyzed. |
class |
TokenFilter
A TokenFilter is a TokenStream whose input is another token stream. |
class |
Tokenizer
A Tokenizer is a TokenStream whose input is a Reader. |
class |
WhitespaceTokenizer
A WhitespaceTokenizer is a tokenizer that divides text at whitespace. |
Fields in org.apache.lucene.analysis declared as TokenStream | |
---|---|
protected TokenStream |
TokenFilter.input
The source of tokens for this filter. |
Methods in org.apache.lucene.analysis that return TokenStream | |
---|---|
TokenStream |
WhitespaceAnalyzer.reusableTokenStream(String fieldName,
Reader reader)
|
TokenStream |
StopAnalyzer.reusableTokenStream(String fieldName,
Reader reader)
|
TokenStream |
KeywordAnalyzer.reusableTokenStream(String fieldName,
Reader reader)
|
TokenStream |
SimpleAnalyzer.reusableTokenStream(String fieldName,
Reader reader)
|
TokenStream |
PerFieldAnalyzerWrapper.reusableTokenStream(String fieldName,
Reader reader)
|
TokenStream |
Analyzer.reusableTokenStream(String fieldName,
Reader reader)
Creates a TokenStream that is allowed to be re-used from the previous time that the same thread called this method. |
TokenStream |
WhitespaceAnalyzer.tokenStream(String fieldName,
Reader reader)
|
TokenStream |
StopAnalyzer.tokenStream(String fieldName,
Reader reader)
Filters LowerCaseTokenizer with StopFilter. |
TokenStream |
KeywordAnalyzer.tokenStream(String fieldName,
Reader reader)
|
TokenStream |
SimpleAnalyzer.tokenStream(String fieldName,
Reader reader)
|
TokenStream |
PerFieldAnalyzerWrapper.tokenStream(String fieldName,
Reader reader)
|
abstract TokenStream |
Analyzer.tokenStream(String fieldName,
Reader reader)
Creates a TokenStream which tokenizes all the text in the provided Reader. |
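To make the shape of this API concrete, here is a minimal sketch of obtaining and draining a TokenStream from one of these analyzers. The field name and sample text are arbitrary, and the Token-based iteration of this Lucene generation is assumed, where next() returns null at end of stream:

```java
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;

public class PrintTokens {
    public static void main(String[] args) throws IOException {
        Analyzer analyzer = new SimpleAnalyzer();
        // tokenStream() tokenizes all text read from the Reader.
        TokenStream stream = analyzer.tokenStream("body",
                new StringReader("The Quick Brown Fox"));
        // next() returns null once the stream is exhausted.
        for (Token t = stream.next(); t != null; t = stream.next()) {
            System.out.println(t.termText());
        }
        stream.close();
    }
}
```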
Constructors in org.apache.lucene.analysis with parameters of type TokenStream | |
---|---|
CachingTokenFilter(TokenStream input)
|
|
ISOLatin1AccentFilter(TokenStream input)
|
|
LengthFilter(TokenStream in,
int min,
int max)
Build a filter that removes words that are too long or too short from the text. |
|
LowerCaseFilter(TokenStream in)
|
|
PorterStemFilter(TokenStream in)
|
|
StopFilter(TokenStream in,
Set stopWords)
Constructs a filter which removes words from the input TokenStream that are named in the Set. |
|
StopFilter(TokenStream input,
Set stopWords,
boolean ignoreCase)
Construct a token stream filtering the given input. |
|
StopFilter(TokenStream input,
String[] stopWords)
Construct a token stream filtering the given input. |
|
StopFilter(TokenStream in,
String[] stopWords,
boolean ignoreCase)
Constructs a filter which removes words from the input TokenStream that are named in the array of words. |
|
TeeTokenFilter(TokenStream input,
SinkTokenizer sink)
|
|
TokenFilter(TokenStream input)
Construct a token stream filtering the given input. |
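The filter constructors above all take the TokenStream they will wrap, which is how analysis chains are composed. A sketch, with the stop list and chain order chosen only for illustration:

```java
import java.io.StringReader;

import org.apache.lucene.analysis.LowerCaseTokenizer;
import org.apache.lucene.analysis.PorterStemFilter;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;

public class FilterChain {
    public static TokenStream build(String text) {
        // Each TokenFilter wraps the stream handed to its constructor,
        // so tokens flow tokenizer -> StopFilter -> PorterStemFilter.
        return new PorterStemFilter(
                new StopFilter(
                        new LowerCaseTokenizer(new StringReader(text)),
                        StopAnalyzer.ENGLISH_STOP_WORDS));
    }
}
```

Because each filter pulls from the stream it wraps, tokens move through the chain lazily, one at a time.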
Uses of TokenStream in org.apache.lucene.analysis.br |
---|
Subclasses of TokenStream in org.apache.lucene.analysis.br | |
---|---|
class |
BrazilianStemFilter
A stem filter for Brazilian Portuguese, based on GermanStemFilter. |
Methods in org.apache.lucene.analysis.br that return TokenStream | |
---|---|
TokenStream |
BrazilianAnalyzer.tokenStream(String fieldName,
Reader reader)
Creates a TokenStream which tokenizes all the text in the provided Reader. |
Constructors in org.apache.lucene.analysis.br with parameters of type TokenStream | |
---|---|
BrazilianStemFilter(TokenStream in)
|
|
BrazilianStemFilter(TokenStream in,
Set exclusiontable)
|
Uses of TokenStream in org.apache.lucene.analysis.cjk |
---|
Subclasses of TokenStream in org.apache.lucene.analysis.cjk | |
---|---|
class |
CJKTokenizer
CJKTokenizer was modified from StopTokenizer, which does a decent job for most European languages. |
Methods in org.apache.lucene.analysis.cjk that return TokenStream | |
---|---|
TokenStream |
CJKAnalyzer.tokenStream(String fieldName,
Reader reader)
get token stream from input |
Uses of TokenStream in org.apache.lucene.analysis.cn |
---|
Subclasses of TokenStream in org.apache.lucene.analysis.cn | |
---|---|
class |
ChineseFilter
A filter with a stop word table; no digits are allowed in tokens. |
class |
ChineseTokenizer
Extracts tokens from the stream using Character.getType(), treating each Chinese character as a single token. The difference between ChineseTokenizer and CJKTokenizer is that they have different token parsing logic. |
Methods in org.apache.lucene.analysis.cn that return TokenStream | |
---|---|
TokenStream |
ChineseAnalyzer.tokenStream(String fieldName,
Reader reader)
Creates a TokenStream which tokenizes all the text in the provided Reader. |
Constructors in org.apache.lucene.analysis.cn with parameters of type TokenStream | |
---|---|
ChineseFilter(TokenStream in)
|
Uses of TokenStream in org.apache.lucene.analysis.cz |
---|
Methods in org.apache.lucene.analysis.cz that return TokenStream | |
---|---|
TokenStream |
CzechAnalyzer.tokenStream(String fieldName,
Reader reader)
Creates a TokenStream which tokenizes all the text in the provided Reader. |
Uses of TokenStream in org.apache.lucene.analysis.de |
---|
Subclasses of TokenStream in org.apache.lucene.analysis.de | |
---|---|
class |
GermanStemFilter
A filter that stems German words. |
Methods in org.apache.lucene.analysis.de that return TokenStream | |
---|---|
TokenStream |
GermanAnalyzer.tokenStream(String fieldName,
Reader reader)
Creates a TokenStream which tokenizes all the text in the provided Reader. |
Constructors in org.apache.lucene.analysis.de with parameters of type TokenStream | |
---|---|
GermanStemFilter(TokenStream in)
|
|
GermanStemFilter(TokenStream in,
Set exclusionSet)
Builds a GermanStemFilter that uses an exclusion table. |
Uses of TokenStream in org.apache.lucene.analysis.el |
---|
Subclasses of TokenStream in org.apache.lucene.analysis.el | |
---|---|
class |
GreekLowerCaseFilter
Normalizes token text to lower case using the given ("greek") charset. |
Methods in org.apache.lucene.analysis.el that return TokenStream | |
---|---|
TokenStream |
GreekAnalyzer.tokenStream(String fieldName,
Reader reader)
Creates a TokenStream which tokenizes all the text in the provided Reader. |
Constructors in org.apache.lucene.analysis.el with parameters of type TokenStream | |
---|---|
GreekLowerCaseFilter(TokenStream in,
char[] charset)
|
Uses of TokenStream in org.apache.lucene.analysis.fr |
---|
Subclasses of TokenStream in org.apache.lucene.analysis.fr | |
---|---|
class |
ElisionFilter
Removes elisions from a token stream. |
class |
FrenchStemFilter
A filter that stems French words. |
Methods in org.apache.lucene.analysis.fr that return TokenStream | |
---|---|
TokenStream |
FrenchAnalyzer.tokenStream(String fieldName,
Reader reader)
Creates a TokenStream which tokenizes all the text in the provided Reader. |
Constructors in org.apache.lucene.analysis.fr with parameters of type TokenStream | |
---|---|
ElisionFilter(TokenStream input)
Constructs an elision filter with standard stop words. |
|
ElisionFilter(TokenStream input,
Set articles)
Constructs an elision filter with a Set of stop words. |
|
ElisionFilter(TokenStream input,
String[] articles)
Constructs an elision filter with an array of stop words. |
|
FrenchStemFilter(TokenStream in)
|
|
FrenchStemFilter(TokenStream in,
Set exclusiontable)
|
Uses of TokenStream in org.apache.lucene.analysis.ngram |
---|
Subclasses of TokenStream in org.apache.lucene.analysis.ngram | |
---|---|
class |
EdgeNGramTokenFilter
Tokenizes the given token into n-grams of given size(s). |
class |
EdgeNGramTokenizer
Tokenizes the input from an edge into n-grams of given size(s). |
class |
NGramTokenFilter
Tokenizes the input into n-grams of the given size(s). |
class |
NGramTokenizer
Tokenizes the input into n-grams of the given size(s). |
Constructors in org.apache.lucene.analysis.ngram with parameters of type TokenStream | |
---|---|
EdgeNGramTokenFilter(TokenStream input)
|
|
EdgeNGramTokenFilter(TokenStream input,
EdgeNGramTokenFilter.Side side,
int minGram,
int maxGram)
Creates EdgeNGramTokenFilter that can generate n-grams in the sizes of the given range |
|
EdgeNGramTokenFilter(TokenStream input,
String sideLabel,
int minGram,
int maxGram)
Creates EdgeNGramTokenFilter that can generate n-grams in the sizes of the given range |
|
NGramTokenFilter(TokenStream input)
Creates NGramTokenFilter with default min and max n-grams. |
|
NGramTokenFilter(TokenStream input,
int minGram,
int maxGram)
Creates NGramTokenFilter with given min and max n-grams. |
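For illustration, a sketch wrapping a tokenizer in one of these filters. EdgeNGramTokenFilter.Side.FRONT is assumed here; the String sideLabel constructor listed above is an alternative:

```java
import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.ngram.EdgeNGramTokenFilter;

public class EdgeNGrams {
    public static TokenStream build(String text) {
        // Emits leading-edge n-grams of length 1..3 for each token,
        // e.g. "lucene" -> "l", "lu", "luc" (useful for prefix matching).
        return new EdgeNGramTokenFilter(
                new WhitespaceTokenizer(new StringReader(text)),
                EdgeNGramTokenFilter.Side.FRONT, 1, 3);
    }
}
```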
Uses of TokenStream in org.apache.lucene.analysis.nl |
---|
Subclasses of TokenStream in org.apache.lucene.analysis.nl | |
---|---|
class |
DutchStemFilter
A filter that stems Dutch words. |
Methods in org.apache.lucene.analysis.nl that return TokenStream | |
---|---|
TokenStream |
DutchAnalyzer.tokenStream(String fieldName,
Reader reader)
Creates a TokenStream which tokenizes all the text in the provided Reader. |
Constructors in org.apache.lucene.analysis.nl with parameters of type TokenStream | |
---|---|
DutchStemFilter(TokenStream _in)
|
|
DutchStemFilter(TokenStream _in,
Set exclusiontable)
Builds a DutchStemFilter that uses an exclusion table. |
|
DutchStemFilter(TokenStream _in,
Set exclusiontable,
Map stemdictionary)
|
Uses of TokenStream in org.apache.lucene.analysis.payloads |
---|
Subclasses of TokenStream in org.apache.lucene.analysis.payloads | |
---|---|
class |
NumericPayloadTokenFilter
Assigns a payload to a token based on the Token.type(). |
class |
TokenOffsetPayloadTokenFilter
Encodes the Token's start offset (Token.setStartOffset(int)) and end offset (Token.setEndOffset(int)) as the payload: the first 4 bytes are the start offset, the last 4 bytes the end offset. |
class |
TypeAsPayloadTokenFilter
Makes the Token.type() a payload. |
Constructors in org.apache.lucene.analysis.payloads with parameters of type TokenStream | |
---|---|
NumericPayloadTokenFilter(TokenStream input,
float payload,
String typeMatch)
|
|
TokenOffsetPayloadTokenFilter(TokenStream input)
|
|
TypeAsPayloadTokenFilter(TokenStream input)
|
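A sketch of wiring one of these payload filters into a chain; "word" is the default token type emitted by most tokenizers, and the payload value here is arbitrary:

```java
import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.payloads.NumericPayloadTokenFilter;

public class PayloadChain {
    public static TokenStream build(String text) {
        // Attaches the payload 3.0f to every token whose type() equals
        // "word"; the payloads can later be read back at scoring time.
        return new NumericPayloadTokenFilter(
                new WhitespaceTokenizer(new StringReader(text)),
                3.0f, "word");
    }
}
```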
Uses of TokenStream in org.apache.lucene.analysis.ru |
---|
Subclasses of TokenStream in org.apache.lucene.analysis.ru | |
---|---|
class |
RussianLetterTokenizer
A RussianLetterTokenizer is a tokenizer that extends LetterTokenizer by additionally looking up letters in a given "russian charset". |
class |
RussianLowerCaseFilter
Normalizes token text to lower case using the given ("russian") charset. |
class |
RussianStemFilter
A filter that stems Russian words. |
Methods in org.apache.lucene.analysis.ru that return TokenStream | |
---|---|
TokenStream |
RussianAnalyzer.tokenStream(String fieldName,
Reader reader)
Creates a TokenStream which tokenizes all the text in the provided Reader. |
Constructors in org.apache.lucene.analysis.ru with parameters of type TokenStream | |
---|---|
RussianLowerCaseFilter(TokenStream in,
char[] charset)
|
|
RussianStemFilter(TokenStream in,
char[] charset)
|
Uses of TokenStream in org.apache.lucene.analysis.sinks |
---|
Subclasses of TokenStream in org.apache.lucene.analysis.sinks | |
---|---|
class |
DateRecognizerSinkTokenizer
Attempts to parse the Token.termBuffer() as a Date using a DateFormat. |
class |
TokenRangeSinkTokenizer
Counts the tokens as they go by and saves to the internal list those between lower and upper, exclusive of upper. |
class |
TokenTypeSinkTokenizer
Adds a Token to the sink if its Token.type() matches the passed-in typeToMatch. |
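A sketch of the tee/sink pattern these classes implement: the tee forwards tokens to its own consumer unchanged while the sink caches the matching ones for reuse. The single-argument TokenTypeSinkTokenizer constructor and the getTokens() accessor are assumed here:

```java
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.SinkTokenizer;
import org.apache.lucene.analysis.TeeTokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.sinks.TokenTypeSinkTokenizer;

public class TeeSinkDemo {
    public static void main(String[] args) throws IOException {
        // The sink retains copies of tokens whose type matches "word".
        SinkTokenizer sink = new TokenTypeSinkTokenizer("word");
        TokenStream source = new TeeTokenFilter(
                new WhitespaceTokenizer(new StringReader("one two three")),
                sink);
        // Drain the primary stream; as a side effect the sink fills up.
        while (source.next() != null) { /* consume */ }
        // The cached tokens can now be reused, e.g. for a second field.
        System.out.println(sink.getTokens().size() + " tokens cached");
    }
}
```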
Uses of TokenStream in org.apache.lucene.analysis.snowball |
---|
Subclasses of TokenStream in org.apache.lucene.analysis.snowball | |
---|---|
class |
SnowballFilter
A filter that stems words using a Snowball-generated stemmer. |
Methods in org.apache.lucene.analysis.snowball that return TokenStream | |
---|---|
TokenStream |
SnowballAnalyzer.tokenStream(String fieldName,
Reader reader)
Constructs a StandardTokenizer filtered by a StandardFilter, a LowerCaseFilter and a StopFilter. |
Constructors in org.apache.lucene.analysis.snowball with parameters of type TokenStream | |
---|---|
SnowballFilter(TokenStream in,
String name)
Construct the named stemming filter. |
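For illustration, wiring the named stemming filter into a chain; "English" is one of the Snowball stemmer names this constructor accepts:

```java
import java.io.StringReader;

import org.apache.lucene.analysis.LowerCaseTokenizer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.snowball.SnowballFilter;

public class SnowballChain {
    public static TokenStream build(String text) {
        // "English" selects the Snowball-generated English stemmer;
        // other stemmer names ("German", "French", ...) work the same way.
        return new SnowballFilter(
                new LowerCaseTokenizer(new StringReader(text)),
                "English");
    }
}
```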
Uses of TokenStream in org.apache.lucene.analysis.standard |
---|
Subclasses of TokenStream in org.apache.lucene.analysis.standard | |
---|---|
class |
StandardFilter
Normalizes tokens extracted with StandardTokenizer. |
class |
StandardTokenizer
A grammar-based tokenizer constructed with JFlex. |
Methods in org.apache.lucene.analysis.standard that return TokenStream | |
---|---|
TokenStream |
StandardAnalyzer.reusableTokenStream(String fieldName,
Reader reader)
|
TokenStream |
StandardAnalyzer.tokenStream(String fieldName,
Reader reader)
Constructs a StandardTokenizer filtered by a StandardFilter, a LowerCaseFilter and a StopFilter. |
Constructors in org.apache.lucene.analysis.standard with parameters of type TokenStream | |
---|---|
StandardFilter(TokenStream in)
Construct filtering in. |
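A sketch of the standard chain, pairing StandardTokenizer with the StandardFilter constructor above; the sample text is arbitrary:

```java
import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

public class StandardChain {
    public static TokenStream build(String text) {
        // StandardFilter normalizes tokens emitted by StandardTokenizer,
        // e.g. stripping possessive 's and dots from acronyms.
        return new StandardFilter(
                new StandardTokenizer(new StringReader(text)));
    }
}
```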
Uses of TokenStream in org.apache.lucene.analysis.th |
---|
Subclasses of TokenStream in org.apache.lucene.analysis.th | |
---|---|
class |
ThaiWordFilter
A TokenFilter that uses java.text.BreakIterator to break each Thai Token into separate Tokens, one for each Thai word. |
Methods in org.apache.lucene.analysis.th that return TokenStream | |
---|---|
TokenStream |
ThaiAnalyzer.tokenStream(String fieldName,
Reader reader)
|
Constructors in org.apache.lucene.analysis.th with parameters of type TokenStream | |
---|---|
ThaiWordFilter(TokenStream input)
|
Uses of TokenStream in org.apache.lucene.document |
---|
Methods in org.apache.lucene.document that return TokenStream | |
---|---|
TokenStream |
Fieldable.tokenStreamValue()
The value of the field as a TokenStream, or null. |
TokenStream |
Field.tokenStreamValue()
The value of the field as a TokenStream, or null. |
Methods in org.apache.lucene.document with parameters of type TokenStream | |
---|---|
void |
Field.setValue(TokenStream value)
Expert: change the value of this field. |
Constructors in org.apache.lucene.document with parameters of type TokenStream | |
---|---|
Field(String name,
TokenStream tokenStream)
Create a tokenized and indexed field that is not stored. |
|
Field(String name,
TokenStream tokenStream,
Field.TermVector termVector)
Create a tokenized and indexed field that is not stored, optionally with storing term vectors. |
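A sketch of the pre-analyzed field these constructors enable; the tokenizer choice and text are illustrative:

```java
import java.io.StringReader;

import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class PreAnalyzedField {
    public static Document build() {
        Document doc = new Document();
        // A Field built from a TokenStream is indexed from the supplied
        // tokens directly; no Analyzer runs at indexing time, and the
        // field value is not stored.
        doc.add(new Field("body",
                new WhitespaceTokenizer(new StringReader("pre analyzed text"))));
        return doc;
    }
}
```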
Uses of TokenStream in org.apache.lucene.index.memory |
---|
Subclasses of TokenStream in org.apache.lucene.index.memory | |
---|---|
class |
SynonymTokenFilter
Injects additional tokens for synonyms of token terms fetched from the underlying child stream; the child stream must deliver lowercase tokens for synonyms to be found. |
Methods in org.apache.lucene.index.memory that return TokenStream | |
---|---|
TokenStream |
MemoryIndex.keywordTokenStream(Collection keywords)
Convenience method; creates and returns a token stream that generates a token for each keyword in the given collection, "as is", without any transforming text analysis. |
TokenStream |
PatternAnalyzer.tokenStream(String fieldName,
Reader reader)
Creates a token stream that tokenizes all the text in the given Reader; this implementation forwards to tokenStream(String, String) and is less efficient than calling that method directly. |
TokenStream |
PatternAnalyzer.tokenStream(String fieldName,
String text)
Creates a token stream that tokenizes the given string into token terms (aka words). |
Methods in org.apache.lucene.index.memory with parameters of type TokenStream | |
---|---|
void |
MemoryIndex.addField(String fieldName,
TokenStream stream)
Equivalent to addField(fieldName, stream, 1.0f). |
void |
MemoryIndex.addField(String fieldName,
TokenStream stream,
float boost)
Iterates over the given token stream and adds the resulting terms to the index; equivalent to adding a tokenized, indexed, termVectorStored, unstored, Lucene Field. |
Constructors in org.apache.lucene.index.memory with parameters of type TokenStream | |
---|---|
SynonymTokenFilter(TokenStream input,
SynonymMap synonyms,
int maxSynonyms)
Creates an instance for the given underlying stream and synonym table. |
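Putting addField to use, a sketch that indexes one TokenStream in RAM and queries it; MemoryIndex.search(Query) returning a relevance score is assumed from this contrib module's API, and the field name and text are arbitrary:

```java
import java.io.StringReader;

import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.memory.MemoryIndex;
import org.apache.lucene.search.TermQuery;

public class MemoryIndexDemo {
    public static void main(String[] args) {
        MemoryIndex index = new MemoryIndex();
        // addField consumes the TokenStream and indexes the terms in RAM.
        index.addField("content",
                new SimpleAnalyzer().tokenStream("content",
                        new StringReader("single document held in memory")));
        // search() returns a relevance score > 0 iff the query matches.
        float score = index.search(new TermQuery(new Term("content", "memory")));
        System.out.println("score = " + score);
    }
}
```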
Uses of TokenStream in org.apache.lucene.search.highlight |
---|
Methods in org.apache.lucene.search.highlight that return TokenStream | |
---|---|
static TokenStream |
TokenSources.getAnyTokenStream(IndexReader reader,
int docId,
String field,
Analyzer analyzer)
A convenience method that tries a number of approaches to getting a token stream. |
static TokenStream |
TokenSources.getTokenStream(IndexReader reader,
int docId,
String field)
|
static TokenStream |
TokenSources.getTokenStream(IndexReader reader,
int docId,
String field,
Analyzer analyzer)
|
static TokenStream |
TokenSources.getTokenStream(TermPositionVector tpv)
|
static TokenStream |
TokenSources.getTokenStream(TermPositionVector tpv,
boolean tokenPositionsGuaranteedContiguous)
Low-level API. |
Methods in org.apache.lucene.search.highlight with parameters of type TokenStream | |
---|---|
String |
Highlighter.getBestFragment(TokenStream tokenStream,
String text)
Highlights chosen terms in a text, extracting the most relevant section. |
String[] |
Highlighter.getBestFragments(TokenStream tokenStream,
String text,
int maxNumFragments)
Highlights chosen terms in a text, extracting the most relevant sections. |
String |
Highlighter.getBestFragments(TokenStream tokenStream,
String text,
int maxNumFragments,
String separator)
Highlights terms in the text , extracting the most relevant sections and concatenating the chosen fragments with a separator (typically "..."). |
TextFragment[] |
Highlighter.getBestTextFragments(TokenStream tokenStream,
String text,
boolean mergeContiguousFragments,
int maxNumFragments)
Low-level API to get the most relevant (formatted) sections of the document. |
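A sketch tying these methods together: the TokenStream supplies term offsets into the original text so the highlighter can locate matches. QueryScorer and the default formatter (which wraps matches in <B>...</B>) are assumed, and the field name, query and text are illustrative:

```java
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;

public class HighlightDemo {
    public static void main(String[] args) throws IOException {
        String text = "Lucene highlights matching terms in context";
        Query query = new TermQuery(new Term("body", "lucene"));
        // Re-analyze the stored text to recover token offsets.
        TokenStream tokens = new SimpleAnalyzer().tokenStream("body",
                new StringReader(text));
        Highlighter highlighter = new Highlighter(new QueryScorer(query));
        // Returns the best fragment with matches marked up,
        // or null if no terms matched.
        System.out.println(highlighter.getBestFragment(tokens, text));
    }
}
```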
Uses of TokenStream in org.apache.lucene.wikipedia.analysis |
---|
Subclasses of TokenStream in org.apache.lucene.wikipedia.analysis | |
---|---|
class |
WikipediaTokenizer
Extension of StandardTokenizer that is aware of Wikipedia syntax. |