| Package | Description |
|---|---|
| org.canova.cli.transforms.image | |
| org.canova.cli.transforms.text.nlp | |
| Modifier and Type | Class and Description |
|---|---|
| class | NormalizeTransform. For raw images such as JPEGs we need to perform transforms (normalize); here we first need to scan across the dataset to get the min / max. Since this is an image-specific normalizer, we find the min and max across all "columns" / pixels in the image collection, as opposed to looking only across columns between images in the collection, because pixel intensity is linked across the image grid of pixels. Questions: are we able to do this in a way that will parallelize well later? Probably not; it most likely requires a v2 refactor for MR (MapReduce). Label semantics: NOTE: don't normalize the LABEL! 1. … |
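The NormalizeTransform notes describe two passes: scan the whole collection for one min and one max shared by every pixel, then rescale, leaving the label alone. The following is a minimal sketch of that idea, not Canova's actual implementation. It assumes each image is a flattened `double[]` of pixel intensities with its label held separately, and that the target range is [0, 1]. The names `GlobalMinMaxNormalizer`, `LabeledImage`, `collectStatistics`, and `normalize` are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch, not Canova's NormalizeTransform: two-pass min-max
 * normalization where one min and one max are shared by every pixel of
 * every image in the collection, instead of being tracked per column.
 */
public class GlobalMinMaxNormalizer {

    /** A labeled image: flattened pixel intensities plus a label that is never normalized. */
    public static final class LabeledImage {
        public final double[] pixels;
        public final double label;

        public LabeledImage(double[] pixels, double label) {
            this.pixels = pixels;
            this.label = label;
        }
    }

    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

    /** Pass 1: scan the whole collection to find the global pixel min and max. */
    public void collectStatistics(List<LabeledImage> images) {
        for (LabeledImage image : images) {
            for (double p : image.pixels) {
                min = Math.min(min, p);
                max = Math.max(max, p);
            }
        }
    }

    /** Pass 2: rescale every pixel into [0, 1] using the global range; the label is copied unchanged. */
    public LabeledImage normalize(LabeledImage image) {
        double range = max - min;
        double[] scaled = new double[image.pixels.length];
        for (int i = 0; i < scaled.length; i++) {
            // A constant collection (max == min) has no range to scale by, so map it to 0.
            scaled[i] = range == 0.0 ? 0.0 : (image.pixels[i] - min) / range;
        }
        return new LabeledImage(scaled, image.label);
    }

    public static void main(String[] args) {
        List<LabeledImage> images = Arrays.asList(
                new LabeledImage(new double[] {0, 128, 255}, 3),
                new LabeledImage(new double[] {64, 64, 192}, 7));

        GlobalMinMaxNormalizer normalizer = new GlobalMinMaxNormalizer();
        normalizer.collectStatistics(images); // pass 1
        for (LabeledImage image : images) {   // pass 2
            LabeledImage n = normalizer.normalize(image);
            System.out.println(Arrays.toString(n.pixels) + " label=" + n.label);
        }
    }
}
```

Because the min and max come from the whole collection, the second image's 64 maps to 64/255 rather than to 0, which is what a per-image rescale would produce.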
| Modifier and Type | Class and Description |
|---|---|
| class | TfidfTextVectorizerTransform. Design notes: dropped the inheritance hierarchy of the previous implementation, which would not fit a parallel framework's scale needs; we don't yet know what the principal design factors are going to be, so we are shying away from putting more interfaces in place; we need to focus on a two-pass mentality: 1. … |
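The design note about a "two-pass mentality" is cut off, so what follows is only an assumption about what it refers to: a common way to structure TF-IDF is to collect corpus-wide document frequencies in a first pass and compute per-document weights in a second. This sketch is not Canova's TfidfTextVectorizerTransform. It assumes whitespace tokenization, raw term counts for TF, and IDF = ln(N / df); the names `TwoPassTfidf`, `collectStatistics`, and `vectorize` are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch, not Canova's TfidfTextVectorizerTransform: pass 1 builds
 * corpus-wide document frequencies, pass 2 turns each document into TF-IDF weights.
 */
public class TwoPassTfidf {

    private final Map<String, Integer> documentFrequency = new HashMap<>();
    private int documentCount = 0;

    /** Assumed tokenizer: lowercase, then split on whitespace. */
    static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).trim().split("\\s+");
    }

    /** Pass 1: count how many documents contain each term. */
    public void collectStatistics(List<String> documents) {
        for (String doc : documents) {
            documentCount++;
            Set<String> seenInThisDoc = new HashSet<>();
            for (String term : tokenize(doc)) {
                // Count a term at most once per document; skip the empty token split() yields for blank input.
                if (!term.isEmpty() && seenInThisDoc.add(term)) {
                    documentFrequency.merge(term, 1, Integer::sum);
                }
            }
        }
    }

    /** Pass 2: weight = raw term count * ln(N / df); terms unseen in pass 1 are dropped. */
    public Map<String, Double> vectorize(String document) {
        Map<String, Integer> termCounts = new HashMap<>();
        for (String term : tokenize(document)) {
            if (!term.isEmpty()) {
                termCounts.merge(term, 1, Integer::sum);
            }
        }
        Map<String, Double> weights = new HashMap<>();
        for (Map.Entry<String, Integer> entry : termCounts.entrySet()) {
            Integer df = documentFrequency.get(entry.getKey());
            if (df == null) {
                continue;
            }
            double idf = Math.log((double) documentCount / df);
            weights.put(entry.getKey(), entry.getValue() * idf);
        }
        return weights;
    }

    public static void main(String[] args) {
        List<String> corpus = Arrays.asList("the cat sat", "the dog sat", "the cat ran");
        TwoPassTfidf tfidf = new TwoPassTfidf();
        tfidf.collectStatistics(corpus);          // pass 1
        for (String doc : corpus) {               // pass 2
            System.out.println(doc + " -> " + tfidf.vectorize(doc));
        }
    }
}
```

A term that appears in every document, such as "the" in this corpus, gets an IDF of ln(1) = 0 and therefore zero weight. Only pass 1 needs to see the whole corpus; once its counts are fixed, pass 2 can process each document independently.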
Copyright © 2015. All rights reserved.