net.sf.jabb.util.text.word
Class TextAnalyzer

java.lang.Object
  extended by net.sf.jabb.util.text.word.TextAnalyzer
Direct Known Subclasses:
FastTextAnalyzer, MmsegTextAnalyzer

public abstract class TextAnalyzer
extends Object

Text analyzer. The result of the analysis is held in an AnalyzedText object.

Author:
Zhengmao HU (James)

Field Summary
protected  String dictionaryPath
           
protected  Map<String,? extends Object> keywordDefinitions
           
protected  TreeMap<Integer,? extends Object> lengthDefinitions
           
static int TYPE_FAST
          Segments text using KeywordMatcher and a custom dictionary (experimental, not yet complete)
static int TYPE_MMSEG_COMPLEX
          Segments text using com.chenlb.mmseg4j.ComplexSeg
static int TYPE_MMSEG_MAXWORD
          Segments text using com.chenlb.mmseg4j.MaxWordSeg
static int TYPE_MMSEG_SIMPLE
          Segments text using com.chenlb.mmseg4j.SimpleSeg
 
Constructor Summary
protected TextAnalyzer(String dictionaryPath, Map<String,? extends Object> keywordDefinitions, Map<Integer,? extends Object> lengthDefinitions)
          Constructor for internal use only.
 
Method Summary
 AnalyzedText analyze(String text)
          Analyzes the text immediately, without lazy mode.
 AnalyzedText analyze(String text, boolean lazy)
          Analyzes the text.
static TextAnalyzer createInstance(int type)
          Creates an instance of TextAnalyzer.
static TextAnalyzer createInstance(int type, Map<String,? extends Object> keywordDefinitions, Map<Integer,? extends Object> lengthDefinitions)
          Creates an instance of TextAnalyzer.
static TextAnalyzer createInstance(int type, String dictionaryPath, Map<String,? extends Object> keywordDefinitions, Map<Integer,? extends Object> lengthDefinitions)
          Creates an instance of TextAnalyzer.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

TYPE_MMSEG_SIMPLE

public static final int TYPE_MMSEG_SIMPLE
Segments text using com.chenlb.mmseg4j.SimpleSeg

See Also:
Constant Field Values

TYPE_MMSEG_MAXWORD

public static final int TYPE_MMSEG_MAXWORD
Segments text using com.chenlb.mmseg4j.MaxWordSeg

See Also:
Constant Field Values

TYPE_MMSEG_COMPLEX

public static final int TYPE_MMSEG_COMPLEX
Segments text using com.chenlb.mmseg4j.ComplexSeg

See Also:
Constant Field Values

TYPE_FAST

public static final int TYPE_FAST
Segments text using KeywordMatcher and a custom dictionary (experimental, not yet complete)

See Also:
Constant Field Values

dictionaryPath

protected String dictionaryPath

keywordDefinitions

protected Map<String,? extends Object> keywordDefinitions

lengthDefinitions

protected TreeMap<Integer,? extends Object> lengthDefinitions
Constructor Detail

TextAnalyzer

protected TextAnalyzer(String dictionaryPath,
                       Map<String,? extends Object> keywordDefinitions,
                       Map<Integer,? extends Object> lengthDefinitions)
Constructor for internal use only.

Parameters:
dictionaryPath - path to the dictionary file
keywordDefinitions - definitions of keywords
lengthDefinitions - definitions of text length categories
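
This page does not say how lengthDefinitions is interpreted. The constructor accepts any Map while the field is a TreeMap, which suggests an ordered lookup by length. The following is a minimal sketch under that assumption: each key is treated as an inclusive upper bound on text length, and the value is the category. LengthCategorySketch, toSorted and categorize are hypothetical names, not part of the library.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; not the library's implementation.
class LengthCategorySketch {
    // Copies the definitions into a TreeMap so that keys can be searched in order.
    static TreeMap<Integer, Object> toSorted(Map<Integer, ? extends Object> defs) {
        return new TreeMap<Integer, Object>(defs);
    }

    // Assumption: a key is an inclusive upper bound on the text length.
    // Returns the category of the smallest bound >= length, or null if none exists.
    static Object categorize(TreeMap<Integer, ? extends Object> defs, int length) {
        Map.Entry<Integer, ? extends Object> entry = defs.ceilingEntry(length);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        Map<Integer, String> defs = new HashMap<Integer, String>();
        defs.put(10, "short");
        defs.put(100, "medium");
        defs.put(Integer.MAX_VALUE, "long");

        TreeMap<Integer, Object> sorted = toSorted(defs);
        System.out.println(categorize(sorted, 7));   // short
        System.out.println(categorize(sorted, 11));  // medium
        System.out.println(categorize(sorted, 500)); // long
    }
}
```

If the real semantics use lower bounds instead, floorEntry would take the place of ceilingEntry.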
Method Detail

createInstance

public static TextAnalyzer createInstance(int type,
                                          String dictionaryPath,
                                          Map<String,? extends Object> keywordDefinitions,
                                          Map<Integer,? extends Object> lengthDefinitions)
Creates an instance of TextAnalyzer.

Parameters:
type - TYPE_MMSEG_SIMPLE | TYPE_MMSEG_COMPLEX | TYPE_MMSEG_MAXWORD | TYPE_FAST
dictionaryPath - path to the dictionary file; if null, the dictionary file at the default location is used
keywordDefinitions - definitions of keywords
lengthDefinitions - definitions of text length categories
Returns:
A new instance of TextAnalyzer.
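
The factory method selects an implementation from the type constant. The following is a minimal sketch of that pattern in plain Java, not the library's code. AnalyzerSketch and its nested Mmseg and Fast classes are hypothetical stand-ins for TextAnalyzer, MmsegTextAnalyzer and FastTextAnalyzer. The constant values, the DEFAULT_DICTIONARY placeholder, and the exception thrown for an unknown type are all assumptions, since this page does not show them. The keyword and length definitions are left out for brevity.

```java
// Hypothetical stand-in for TextAnalyzer; illustrates factory dispatch only.
abstract class AnalyzerSketch {
    // Assumed values; the real ones are listed under "Constant Field Values".
    static final int TYPE_MMSEG_SIMPLE = 1;
    static final int TYPE_MMSEG_COMPLEX = 2;
    static final int TYPE_MMSEG_MAXWORD = 3;
    static final int TYPE_FAST = 4;

    // Placeholder for "the dictionary file at the default location" (assumption).
    static final String DEFAULT_DICTIONARY = "<default>";

    protected final String dictionaryPath;

    protected AnalyzerSketch(String dictionaryPath) {
        // A null path falls back to the default dictionary.
        this.dictionaryPath = dictionaryPath == null ? DEFAULT_DICTIONARY : dictionaryPath;
    }

    // Chooses the implementation class from the type constant.
    static AnalyzerSketch createInstance(int type, String dictionaryPath) {
        switch (type) {
            case TYPE_MMSEG_SIMPLE:
            case TYPE_MMSEG_COMPLEX:
            case TYPE_MMSEG_MAXWORD:
                return new Mmseg(type, dictionaryPath);
            case TYPE_FAST:
                return new Fast(dictionaryPath);
            default:
                // Assumed behavior for an unknown type.
                throw new IllegalArgumentException("Unknown analyzer type: " + type);
        }
    }

    // Shorter overload that delegates with a null dictionary path.
    static AnalyzerSketch createInstance(int type) {
        return createInstance(type, null);
    }

    // Stand-in for MmsegTextAnalyzer; remembers which mmseg4j mode was requested.
    static class Mmseg extends AnalyzerSketch {
        final int mode;
        Mmseg(int mode, String dictionaryPath) {
            super(dictionaryPath);
            this.mode = mode;
        }
    }

    // Stand-in for FastTextAnalyzer.
    static class Fast extends AnalyzerSketch {
        Fast(String dictionaryPath) {
            super(dictionaryPath);
        }
    }
}
```

A caller never names the subclass directly, so a new segmentation strategy only needs a new constant and one more case in the switch.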

createInstance

public static TextAnalyzer createInstance(int type,
                                          Map<String,? extends Object> keywordDefinitions,
                                          Map<Integer,? extends Object> lengthDefinitions)
Creates an instance of TextAnalyzer.

Parameters:
type - TYPE_MMSEG_SIMPLE | TYPE_MMSEG_COMPLEX | TYPE_MMSEG_MAXWORD | TYPE_FAST
keywordDefinitions - definitions of keywords
lengthDefinitions - definitions of text length categories
Returns:
A new instance of TextAnalyzer.

createInstance

public static TextAnalyzer createInstance(int type)
Creates an instance of TextAnalyzer.

Parameters:
type - TYPE_MMSEG_SIMPLE | TYPE_MMSEG_COMPLEX | TYPE_MMSEG_MAXWORD | TYPE_FAST
Returns:
A new instance of TextAnalyzer.

analyze

public AnalyzedText analyze(String text,
                            boolean lazy)
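Analyzes the text.

Parameters:
text - the text to analyze
lazy - whether to defer the analysis (deferred means that the substantive analysis is not performed until the result is actually used)
Returns:
the analysis result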
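
The lazy flag defers the real work until the result is first used. The following is a minimal sketch of that idea in plain Java, not the library's implementation. LazyResultSketch is a hypothetical stand-in for AnalyzedText, and its whitespace-based segment helper is only a placeholder for a real segmenter.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for AnalyzedText; illustrates deferred analysis only.
class LazyResultSketch {
    private final String text;
    private List<String> words; // null until the analysis has run
    int runs = 0;               // counts how many times the analysis actually ran

    LazyResultSketch(String text, boolean lazy) {
        this.text = text;
        if (!lazy) {
            getWords(); // eager mode: analyze immediately
        }
    }

    // Runs the analysis on first access and caches the result afterwards.
    List<String> getWords() {
        if (words == null) {
            words = segment(text);
            runs++;
        }
        return words;
    }

    // Placeholder segmentation: splits on whitespace (assumption, not mmseg4j).
    private static List<String> segment(String s) {
        String trimmed = s.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        LazyResultSketch lazy = new LazyResultSketch("a b c", true);
        System.out.println(lazy.runs);       // 0: nothing analyzed yet
        System.out.println(lazy.getWords()); // [a, b, c]: triggers the analysis
        System.out.println(lazy.runs);       // 1
    }
}
```

Deferring the work in this way avoids the cost of segmentation for results that are created but never read.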

analyze

public AnalyzedText analyze(String text)
Analyzes the text immediately, without lazy mode.

Parameters:
text - the text to analyze
Returns:
the analysis result


Copyright © 2012. All Rights Reserved.