public class TextVectorizationEngine extends VectorizationEngine
Fields inherited from class VectorizationEngine: conf, configProps, inputFormat, normalizeData, outputFilename, outputFormat, printStats, reader, shuffleOn, split, writer

| Constructor and Description |
|---|
| TextVectorizationEngine() |
| Modifier and Type | Method and Description |
|---|---|
| void | execute() |
Currently the stock input format / RecordReader (RR) gives us a vector that is already converted
- TODO: separate this into a transform plugin
Thoughts
- Inside the vectorization engine is a great place to put a pluggable transformation system [ TODO: v2 ]
- example: MNIST binarization could be a pluggable transform
- example: custom thresholding on blocks of pixels
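The pluggable transform idea in the notes above could look roughly like the sketch below. It is only an illustration: `TransformSketch`, `Transform`, `BinarizeTransform`, and `Pipeline` are hypothetical names invented here, the 127.0 threshold is an arbitrary choice, and nothing in it reflects the actual signatures of `addTransform` or `applyTransforms` on `VectorizationEngine`, which this page does not show.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these types belong to the documented API.
final class TransformSketch {

    /** A single pluggable step that maps one feature vector to another. */
    interface Transform {
        double[] apply(double[] input);
    }

    /** Binarizes each value: 1.0 if strictly above the threshold, else 0.0. */
    static final class BinarizeTransform implements Transform {
        private final double threshold;

        BinarizeTransform(double threshold) {
            this.threshold = threshold;
        }

        @Override
        public double[] apply(double[] input) {
            double[] out = new double[input.length];
            for (int i = 0; i < input.length; i++) {
                out[i] = input[i] > threshold ? 1.0 : 0.0;
            }
            return out;
        }
    }

    /** Holds transforms and applies them in the order they were added. */
    static final class Pipeline {
        private final List<Transform> transforms = new ArrayList<>();

        Pipeline add(Transform t) {
            transforms.add(t);
            return this;
        }

        double[] applyAll(double[] input) {
            double[] current = input;
            for (Transform t : transforms) {
                current = t.apply(current);
            }
            return current;
        }
    }

    public static void main(String[] args) {
        // MNIST-style pixel intensities in [0, 255], binarized at an assumed threshold.
        Pipeline pipeline = new Pipeline().add(new BinarizeTransform(127.0));
        double[] pixels = {0.0, 128.0, 255.0, 30.0};
        System.out.println(Arrays.toString(pipeline.applyAll(pixels)));
        // prints [0.0, 1.0, 1.0, 0.0]
    }
}
```

Because each step only sees a vector and returns a vector, a custom block-thresholding transform could be added to the same pipeline without changing the engine itself.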
Text Pipeline specific stuff
- Right now the TF-IDF code has 2 major issues:
1.
Methods inherited from class VectorizationEngine: addTransform, applyTransforms, initialize

public void execute() throws IOException

execute in class VectorizationEngine

Throws: IOException

Copyright © 2016. All rights reserved.