net.sf.jabb.util.text.word
Class AnalyzedText

java.lang.Object
  extended by net.sf.jabb.util.text.word.AnalyzedText

public class AnalyzedText
extends Object

Information about a text after analysis: the original text, the list of segmented words or characters, the de-duplicated set of those words or characters, the text length category, and the keyword matching results.

Author:
Zhengmao HU (James)

Field Summary
protected  TextAnalyzer analyzer
           
protected  Object lengthCategory
           
protected  Map<Object,org.apache.commons.lang.mutable.MutableInt> matchedKeywords
           
protected  String text
           
protected  Set<String> uniqueWords
           
protected  List<String> words
           
 
Constructor Summary
AnalyzedText(TextAnalyzer analyzer, String text)
          Constructor.
 
Method Summary
 Object getLengthCategory()
           
 Map<Object,org.apache.commons.lang.mutable.MutableInt> getMatchedKeywords()
           
 String getText()
           
 Set<String> getUniqueWords()
           
 List<String> getWords()
           
 String toString()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

analyzer

protected TextAnalyzer analyzer

text

protected String text

words

protected List<String> words

uniqueWords

protected Set<String> uniqueWords

lengthCategory

protected Object lengthCategory

matchedKeywords

protected Map<Object,org.apache.commons.lang.mutable.MutableInt> matchedKeywords

Constructor Detail

AnalyzedText

public AnalyzedText(TextAnalyzer analyzer,
                    String text)
Constructor.

Parameters:
analyzer - The analyzer
text - The text to be analyzed

Method Detail

getText

public String getText()
Returns:
The original text supplied for analysis.

getWords

public List<String> getWords()
Returns:
All words or characters that make up the original text, in order of appearance.

getUniqueWords

public Set<String> getUniqueWords()
Returns:
The distinct words or characters that make up the original text.
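
The relationship between getWords() and getUniqueWords() can be illustrated with plain JDK collections. This is only a sketch: the class name UniqueWordsSketch and the helper uniqueOf are hypothetical, and the use of LinkedHashSet (which keeps first-appearance order) is an assumption, since this page does not state which Set implementation or iteration order AnalyzedText uses.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class UniqueWordsSketch {
    // Hypothetical helper, not part of AnalyzedText: de-duplicates the
    // segmented words. LinkedHashSet is an assumption made here so that
    // the first-appearance order is kept.
    static Set<String> uniqueOf(List<String> words) {
        return new LinkedHashSet<String>(words);
    }

    public static void main(String[] args) {
        // Stand-in for the list that getWords() might return.
        List<String> words = Arrays.asList("to", "be", "or", "not", "to", "be");
        Set<String> unique = uniqueOf(words);
        // prints "6 words, 4 unique"
        System.out.println(words.size() + " words, " + unique.size() + " unique");
    }
}
```

The list keeps every occurrence in order, while the set holds each word or character only once.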

getLengthCategory

public Object getLengthCategory()
Returns:
文本长度类别
Category according to the length of the original text.

getMatchedKeywords

public Map<Object,org.apache.commons.lang.mutable.MutableInt> getMatchedKeywords()
Returns:
For each keyword found in the text, its attachment (as the Map key) and its number of occurrences (as the Map value).
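
One way to consume a map of this shape is sketched below. The class MatchedKeywordsSketch and its helpers are hypothetical and are not part of this library. The helpers accept any Map whose values are Number instances, on the assumption that commons-lang's MutableInt extends Number; the sample data uses Integer values so the sketch needs only the JDK.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MatchedKeywordsSketch {
    // Sums the occurrence counts over all matched attachments.
    static int totalOccurrences(Map<?, ? extends Number> matched) {
        int total = 0;
        for (Number count : matched.values()) {
            total += count.intValue();
        }
        return total;
    }

    // Returns the attachment with the highest count, or null for an empty map.
    static Object mostFrequent(Map<?, ? extends Number> matched) {
        Object best = null;
        int bestCount = -1;
        for (Map.Entry<?, ? extends Number> entry : matched.entrySet()) {
            int count = entry.getValue().intValue();
            if (count > bestCount) {
                bestCount = count;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Stand-in for the result of getMatchedKeywords(); the attachment
        // names "spam" and "promo" are made up for illustration.
        Map<Object, Integer> matched = new LinkedHashMap<Object, Integer>();
        matched.put("spam", 3);
        matched.put("promo", 1);
        // prints "total=4, top=spam"
        System.out.println("total=" + totalOccurrences(matched)
                + ", top=" + mostFrequent(matched));
    }
}
```

Because the keys are attachments rather than the keyword strings themselves, several keywords that share one attachment would presumably be counted under the same key.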

toString

public String toString()
Overrides:
toString in class Object


Copyright © 2012. All Rights Reserved.