A B C D E G H I L M N O P R S T V W 

A

action - Variable in class org.canova.cli.driver.CommandLineInterfaceDriver
 
addRecord(Collection<Writable>) - Method in class org.canova.cli.shuffle.Shuffler
 
addTransform() - Method in class org.canova.cli.vectorization.VectorizationEngine
These two methods stub out the future vector transform system.

We want to separate the transform logic from the InputFormat/RecordReader. Examples: a "thresholding" function that binarizes the vector entries; a sampling function that takes a larger image and down-samples it into a small vector.
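
As an illustration of the "thresholding" example, here is a minimal sketch of a binarizing function kept separate from any input format or record reader. The class name `ThresholdSketch`, the method signature, and the use of a plain `double[]` are assumptions made for this example; they are not part of the Canova API.

```java
import java.util.Arrays;

// Hypothetical sketch; not part of org.canova.cli.
final class ThresholdSketch {

    /** Returns a new vector holding 1.0 where the entry is >= threshold, else 0.0. */
    static double[] binarize(double[] vector, double threshold) {
        double[] out = new double[vector.length];
        for (int i = 0; i < vector.length; i++) {
            out[i] = vector[i] >= threshold ? 1.0 : 0.0;
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed input: 8-bit pixel intensities.
        double[] pixels = {0.0, 12.0, 200.0, 255.0};
        System.out.println(Arrays.toString(binarize(pixels, 128.0)));
    }
}
```

Because the function only sees a vector, it could be applied after any reader that produces numeric entries.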

applyTransforms() - Method in class org.canova.cli.vectorization.VectorizationEngine
 
args - Variable in class org.canova.cli.subcommands.BaseSubCommand
 
args - Variable in class org.canova.cli.subcommands.Vectorize
 
AudioVectorizationEngine - Class in org.canova.cli.vectorization
 
AudioVectorizationEngine() - Constructor for class org.canova.cli.vectorization.AudioVectorizationEngine
 

B

BaseSubCommand - Class in org.canova.cli.subcommands
Basic sub command with the command line parser
BaseSubCommand(String[]) - Constructor for class org.canova.cli.subcommands.BaseSubCommand
 
binarize(String) - Method in class org.canova.cli.csv.schema.CSVSchemaColumn
 
binarize(String) - Method in class org.canova.cli.csv.transforms.Transforms
 

C

cache - Variable in class org.canova.cli.transforms.text.nlp.TfidfTextVectorizerTransform
 
close() - Method in class org.canova.cli.records.reader.LineRecordReader
 
collectStatistics(Collection<Writable>) - Method in class org.canova.cli.transforms.image.NormalizeTransform
 
collectStatistics(Collection<Writable>) - Method in class org.canova.cli.transforms.text.nlp.TfidfTextVectorizerTransform
Collect stats from the raw record (first pass). Schema: Writable[0]: go dogs, go 1 Writable[1]: label_A 1.
collectStatistics(Collection<Writable>) - Method in interface org.canova.cli.transforms.Transform
 
columnType - Variable in class org.canova.cli.csv.schema.CSVSchemaColumn
 
CommandLineInterfaceDriver - Class in org.canova.cli.driver
Command line interface driver
CommandLineInterfaceDriver() - Constructor for class org.canova.cli.driver.CommandLineInterfaceDriver
 
computeDatasetStatistics() - Method in class org.canova.cli.csv.schema.CSVInputSchema
We call this method once we've scanned the entire dataset once to gather column stats
computeStatistics() - Method in class org.canova.cli.csv.schema.CSVSchemaColumn
 
conf - Variable in class org.canova.cli.vectorization.VectorizationEngine
 
configProps - Variable in class org.canova.cli.subcommands.Vectorize
 
configProps - Variable in class org.canova.cli.vectorization.VectorizationEngine
 
configurationFile - Variable in class org.canova.cli.subcommands.Vectorize
 
convertTextRecordToTFIDFVector(String) - Method in class org.canova.cli.transforms.text.nlp.TfidfTextVectorizerTransform
 
copy(String) - Method in class org.canova.cli.csv.schema.CSVSchemaColumn
 
copy(String) - Method in class org.canova.cli.csv.transforms.Transforms
 
createInputFormat() - Method in class org.canova.cli.subcommands.Vectorize
Creates an input format
createOutputFormat() - Method in class org.canova.cli.subcommands.Vectorize
 
createReader(InputSplit, Configuration) - Method in class org.canova.cli.formats.input.TextInputFormat
 
createReader(InputSplit) - Method in class org.canova.cli.formats.input.TextInputFormat
 
createTokenizerFactory(Configuration) - Method in class org.canova.cli.transforms.text.nlp.TfidfTextVectorizerTransform
 
createVectorizationEngine() - Method in class org.canova.cli.subcommands.Vectorize
Creates a vectorization engine
CSVInputSchema - Class in org.canova.cli.csv.schema
 
CSVInputSchema() - Constructor for class org.canova.cli.csv.schema.CSVInputSchema
 
CSVSchemaColumn - Class in org.canova.cli.csv.schema
 
CSVSchemaColumn(String, CSVSchemaColumn.ColumnType, CSVSchemaColumn.TransformType) - Constructor for class org.canova.cli.csv.schema.CSVSchemaColumn
 
CSVSchemaColumn.ColumnType - Enum in org.canova.cli.csv.schema
 
CSVSchemaColumn.TransformType - Enum in org.canova.cli.csv.schema
 
CSVVectorizationEngine - Class in org.canova.cli.vectorization
Vectorization Engine - takes CSV input and converts it to a transformed vector output in a standard format - uses the input CSV schema and the collected statistics from a pre-pass
CSVVectorizationEngine() - Constructor for class org.canova.cli.vectorization.CSVVectorizationEngine
 

D

DatasetSummaryStatistics - Class in org.canova.cli.csv.statistics
Tracks statistics about the dataset being vectorized; right now focused only on the CLI + CSV conversion. Things to track: range, average, and median for each column. Future ideas: an outlier list, automatically flagging (statistically) suspect data, and suggesting training settings based on the dataset.
DatasetSummaryStatistics() - Constructor for class org.canova.cli.csv.statistics.DatasetSummaryStatistics
 
debugLoadedConfProperties() - Method in class org.canova.cli.subcommands.Vectorize
Don't change the print statements; they are part of the application's console output UI
debugPringDatasetStatistics() - Method in class org.canova.cli.csv.schema.CSVInputSchema
 
debugPrintColumns() - Method in class org.canova.cli.csv.schema.CSVInputSchema
 
debugPrintColumns() - Method in class org.canova.cli.csv.schema.CSVSchemaColumn
 
debugPrintVocabList() - Method in class org.canova.cli.transforms.text.nlp.TfidfTextVectorizerTransform
 
DEFAULT_INPUT_FORMAT_CLASSNAME - Static variable in class org.canova.cli.subcommands.Vectorize
 
DEFAULT_OUTPUT_FORMAT_CLASSNAME - Static variable in class org.canova.cli.subcommands.Vectorize
 
DEFAULT_VECTORIZATION_ENGINE_CLASSNAME - Static variable in class org.canova.cli.subcommands.Vectorize
 
delimiter - Variable in class org.canova.cli.csv.schema.CSVInputSchema
 
doMain(String[]) - Method in class org.canova.cli.driver.CommandLineInterfaceDriver
 
doWithTokens(Tokenizer) - Method in class org.canova.cli.transforms.text.nlp.TfidfTextVectorizerTransform
 

E

evaluateColumnValue(String) - Method in class org.canova.cli.csv.schema.CSVSchemaColumn
This method collects dataset statistics about the column that we'll need later to 1.
evaluateInputRecord(String) - Method in class org.canova.cli.csv.schema.CSVInputSchema
 
evaluateStatistics() - Method in class org.canova.cli.transforms.image.NormalizeTransform
 
evaluateStatistics() - Method in class org.canova.cli.transforms.text.nlp.TfidfTextVectorizerTransform
This is where we'll take the dataset stats learned from the first pass and setup for the transform pass
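
The two-pass flow implied by `collectStatistics`, `evaluateStatistics`, and `transform` can be sketched as follows. This is an illustrative stand-in only: the class `TwoPassVocabSketch`, its use of `String` documents instead of `Collection<Writable>`, and the simple tokenizer are assumptions, not the actual `TfidfTextVectorizerTransform` implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of a two-pass transform; not the Canova implementation.
final class TwoPassVocabSketch {
    private final Map<String, Integer> docFrequency = new TreeMap<>();
    private final Map<String, Integer> termIndex = new TreeMap<>();

    /** Pass 1: count, per term, how many documents contain it. */
    void collectStatistics(String document) {
        for (String term : new TreeSet<>(tokenize(document))) {
            docFrequency.merge(term, 1, Integer::sum);
        }
    }

    /** Between passes: freeze the vocabulary, one output column per term (sorted order). */
    void evaluateStatistics() {
        termIndex.clear();
        for (String term : docFrequency.keySet()) {
            termIndex.put(term, termIndex.size());
        }
    }

    /** Pass 2: map a document to a term-count vector using the frozen vocabulary. */
    double[] transform(String document) {
        double[] vector = new double[termIndex.size()];
        for (String term : tokenize(document)) {
            Integer column = termIndex.get(term);
            if (column != null) {
                vector[column] += 1.0;
            }
        }
        return vector;
    }

    // Assumed tokenizer: lower-case, split on anything that is not a letter or digit.
    private static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```

A caller would run `collectStatistics` over every record, call `evaluateStatistics` once, and then run `transform` over every record again.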
evaluateStatistics() - Method in interface org.canova.cli.transforms.Transform
 
execute() - Method in interface org.canova.cli.subcommands.SubCommand
Execute the input
execute() - Method in class org.canova.cli.subcommands.Vectorize
 
execute() - Method in class org.canova.cli.vectorization.AudioVectorizationEngine
 
execute() - Method in class org.canova.cli.vectorization.CSVVectorizationEngine
This is where our custom vectorization engine does its thing
execute() - Method in class org.canova.cli.vectorization.ImageVectorizationEngine
In this case we are assuming that the Image input format gave us basically raw pixels. Thoughts: the vectorization engine is a great place to put a pluggable transformation system [TODO: v2]. Examples: MNIST binarization could be a pluggable transform; custom thresholding on blocks of pixels.
execute() - Method in class org.canova.cli.vectorization.TextVectorizationEngine
Currently the stock input format / RecordReader gives us a vector already converted. TODO: separate this into a transform plugin.

Thoughts: the vectorization engine is a great place to put a pluggable transformation system [TODO: v2]. Examples: MNIST binarization could be a pluggable transform; custom thresholding on blocks of pixels.

Text Pipeline specific stuff - so right now the TF-IDF stuff has 2 major issues 1.

execute() - Method in class org.canova.cli.vectorization.VectorizationEngine
 
execute() - Method in class org.canova.cli.vectorization.VideoVectorizationEngine
In this case we are assuming that the Image input format gave us basically raw pixels. Thoughts: the vectorization engine is a great place to put a pluggable transformation system [TODO: v2]. Examples: MNIST binarization could be a pluggable transform; custom thresholding on blocks of pixels.

G

getColumnSchemaByName(String) - Method in class org.canova.cli.csv.schema.CSVInputSchema
 
getColumnSchemas() - Method in class org.canova.cli.csv.schema.CSVInputSchema
 
getConf() - Method in class org.canova.cli.records.reader.LineRecordReader
 
getCurrentDirectoryLabelPath() - Method in class org.canova.cli.records.reader.LineRecordReader
 
getLabelCount(String) - Method in class org.canova.cli.csv.schema.CSVSchemaColumn
 
getLabelID(String) - Method in class org.canova.cli.csv.schema.CSVSchemaColumn
 
getLabelID(String) - Method in class org.canova.cli.transforms.text.nlp.TfidfTextVectorizerTransform
 
getNumberOfLabelsSeen() - Method in class org.canova.cli.transforms.text.nlp.TfidfTextVectorizerTransform
 
getTransformedVectorSize() - Method in class org.canova.cli.csv.schema.CSVInputSchema
Returns how many columns a newly transformed vector should have
getVocabularySize() - Method in class org.canova.cli.transforms.text.nlp.TfidfTextVectorizerTransform
 

H

hasNext() - Method in class org.canova.cli.records.reader.LineRecordReader
 
hasNext() - Method in class org.canova.cli.shuffle.Shuffler
 

I

idf(double, double) - Static method in class org.canova.cli.transforms.text.nlp.NLPUtils
Calc the IDF component
ImageVectorizationEngine - Class in org.canova.cli.vectorization
Reads from InputFormats where (generally, but up to the InputFormat) each Writable in the Collection is a pixel. Writes back out to the OutputFormat, where we assume the last element is the double representing the class index.
ImageVectorizationEngine() - Constructor for class org.canova.cli.vectorization.ImageVectorizationEngine
 
initialize(InputSplit) - Method in class org.canova.cli.records.reader.LineRecordReader
 
initialize(Configuration, InputSplit) - Method in class org.canova.cli.records.reader.LineRecordReader
Need to look at the lines in a set of files in directories
initialize(Configuration) - Method in class org.canova.cli.transforms.text.nlp.TfidfTextVectorizerTransform
 
initialize(InputSplit, InputFormat, OutputFormat, RecordReader, RecordWriter, Properties, String, Configuration) - Method in class org.canova.cli.vectorization.VectorizationEngine
 
INPUT_FORMAT - Static variable in class org.canova.cli.subcommands.Vectorize
 
inputFormat - Variable in class org.canova.cli.vectorization.VectorizationEngine
 
invalidDataEntries - Variable in class org.canova.cli.csv.schema.CSVSchemaColumn
 

L

label(String) - Method in class org.canova.cli.csv.schema.CSVSchemaColumn
 
label(String) - Method in class org.canova.cli.csv.transforms.Transforms
 
LineRecordReader - Class in org.canova.cli.records.reader
Hardcoded for serial only access (for now)
LineRecordReader() - Constructor for class org.canova.cli.records.reader.LineRecordReader
 
loadConfigFile() - Method in class org.canova.cli.subcommands.Vectorize
 

M

main(String[]) - Static method in class org.canova.cli.driver.CommandLineInterfaceDriver
 
maxValue - Variable in class org.canova.cli.csv.schema.CSVSchemaColumn
 
maxValue - Variable in class org.canova.cli.transforms.image.NormalizeTransform
 
MIN_WORD_FREQUENCY - Static variable in class org.canova.cli.transforms.text.nlp.TfidfTextVectorizerTransform
 
minValue - Variable in class org.canova.cli.csv.schema.CSVSchemaColumn
 
minValue - Variable in class org.canova.cli.transforms.image.NormalizeTransform
 
minWordFrequency - Variable in class org.canova.cli.transforms.text.nlp.TfidfTextVectorizerTransform
 

N

name - Variable in class org.canova.cli.csv.schema.CSVSchemaColumn
 
next() - Method in class org.canova.cli.records.reader.LineRecordReader
Major difference here: we ALWAYS append the label as a string. We've also kicked the responsibility of indexing labels up a level, to the vectorization engine for text. NEED to look at the file iterator (iter).
next() - Method in class org.canova.cli.shuffle.Shuffler
 
NLPUtils - Class in org.canova.cli.transforms.text.nlp
 
NLPUtils() - Constructor for class org.canova.cli.transforms.text.nlp.NLPUtils
 
normalize(String) - Method in class org.canova.cli.csv.schema.CSVSchemaColumn
 
normalize(String) - Method in class org.canova.cli.csv.transforms.Transforms
 
NORMALIZE_DATA_FLAG - Static variable in class org.canova.cli.subcommands.Vectorize
 
normalizeData - Variable in class org.canova.cli.subcommands.Vectorize
 
normalizeData - Variable in class org.canova.cli.vectorization.VectorizationEngine
 
NormalizeTransform - Class in org.canova.cli.transforms.image
For raw images like JPEGs we need to perform transforms (normalize); here we need to scan across the dataset first to get min / max. Since this is an image-specific normalizer, we find min and max across all "columns" / pixels in the image collection, as opposed to just looking across columns between images in the collection, because pixel intensity is linked across the image grid of pixels. Questions: are we able to do this in a way that will parallelize well later? Probably not; it most likely requires a v2 refactor for MR. Label Semantics - NOTE: don't normalize the LABEL! 1.
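
The scan-then-scale idea described for this normalizer can be sketched as below. This is a hedged illustration, not the `NormalizeTransform` source: the class `ImageNormalizeSketch`, the `double[]` record type, the [0, 1] target range, and the convention that the label is the last element of each record are all assumptions for the example.

```java
// Hypothetical sketch of dataset-wide min/max normalization; not the Canova implementation.
final class ImageNormalizeSketch {
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

    /** First pass: fold one record's pixels into the global min/max (last element = label, skipped). */
    void collect(double[] record) {
        for (int i = 0; i < record.length - 1; i++) {
            min = Math.min(min, record[i]);
            max = Math.max(max, record[i]);
        }
    }

    /** Second pass: scale pixels in place to [0, 1]; the label in the last slot is left untouched. */
    void normalize(double[] record) {
        double range = max - min;
        for (int i = 0; i < record.length - 1; i++) {
            // A zero range means every pixel seen was identical; map them all to 0.
            record[i] = range == 0.0 ? 0.0 : (record[i] - min) / range;
        }
    }
}
```

A single min and max are shared by every pixel position, which matches the note that pixel intensity is linked across the whole image grid rather than per column.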
NormalizeTransform() - Constructor for class org.canova.cli.transforms.image.NormalizeTransform
 

O

org.canova.cli.csv.schema - package org.canova.cli.csv.schema
 
org.canova.cli.csv.statistics - package org.canova.cli.csv.statistics
 
org.canova.cli.csv.transforms - package org.canova.cli.csv.transforms
 
org.canova.cli.driver - package org.canova.cli.driver
 
org.canova.cli.formats.input - package org.canova.cli.formats.input
 
org.canova.cli.records.reader - package org.canova.cli.records.reader
 
org.canova.cli.shuffle - package org.canova.cli.shuffle
 
org.canova.cli.subcommands - package org.canova.cli.subcommands
 
org.canova.cli.transforms - package org.canova.cli.transforms
 
org.canova.cli.transforms.image - package org.canova.cli.transforms.image
 
org.canova.cli.transforms.text.nlp - package org.canova.cli.transforms.text.nlp
 
org.canova.cli.vectorization - package org.canova.cli.vectorization
 
OUTPUT_FILENAME_KEY - Static variable in class org.canova.cli.subcommands.Vectorize
 
OUTPUT_FORMAT - Static variable in class org.canova.cli.subcommands.Vectorize
 
outputFilename - Variable in class org.canova.cli.vectorization.VectorizationEngine
 
outputFormat - Variable in class org.canova.cli.vectorization.VectorizationEngine
 
outputVectorFilename - Variable in class org.canova.cli.subcommands.Vectorize
 

P

parseSchemaFile(String) - Method in class org.canova.cli.csv.schema.CSVInputSchema
 
PRINT_STATS_FLAG - Static variable in class org.canova.cli.subcommands.Vectorize
 
printStats - Variable in class org.canova.cli.vectorization.VectorizationEngine
 
printUsage() - Static method in class org.canova.cli.driver.CommandLineInterfaceDriver
 
printUsage() - Static method in class org.canova.cli.subcommands.Vectorize
 

R

reader - Variable in class org.canova.cli.vectorization.VectorizationEngine
 
readFields(DataInput) - Method in class org.canova.cli.formats.input.TextInputFormat
 
recordLabels - Variable in class org.canova.cli.csv.schema.CSVSchemaColumn
 
recordLabels - Variable in class org.canova.cli.transforms.text.nlp.TfidfTextVectorizerTransform
 
records - Variable in class org.canova.cli.shuffle.Shuffler
 
relation - Variable in class org.canova.cli.csv.schema.CSVInputSchema
 

S

setConf(Configuration) - Method in class org.canova.cli.records.reader.LineRecordReader
 
SHUFFLE_DATA_FLAG - Static variable in class org.canova.cli.subcommands.Vectorize
 
shuffleOn - Variable in class org.canova.cli.vectorization.VectorizationEngine
 
Shuffler - Class in org.canova.cli.shuffle
Record ordering shuffler. Just a good old-fashioned way to shuffle the output of the records. Yes, it is memory based (and bound) for now.
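
A memory-bound shuffler of this kind can be sketched in a few lines. The method names `addRecord`, `hasNext`, and `next` mirror the `Shuffler` entries in this index, but the class `InMemoryShufflerSketch`, the generic record type, the seed parameter, and the shuffle-on-first-read behavior are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Random;

// Hypothetical sketch of an in-memory shuffler; not the Canova implementation.
final class InMemoryShufflerSketch<T> {
    private final List<T> records = new ArrayList<>();
    private final Random random;
    private boolean started = false;
    private int cursor = 0;

    InMemoryShufflerSketch(long seed) {
        this.random = new Random(seed);
    }

    /** Buffers a record; every record is held in memory until it is read back. */
    void addRecord(T record) {
        if (started) {
            throw new IllegalStateException("cannot add records after reading has started");
        }
        records.add(record);
    }

    boolean hasNext() {
        return cursor < records.size();
    }

    /** Shuffles once on the first call, then returns the records in shuffled order. */
    T next() {
        if (!started) {
            Collections.shuffle(records, random);
            started = true;
        }
        if (cursor >= records.size()) {
            throw new NoSuchElementException();
        }
        return records.get(cursor++);
    }
}
```

The whole dataset must fit in the heap, which is the "memory based (and bound)" trade-off the description mentions.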
Shuffler() - Constructor for class org.canova.cli.shuffle.Shuffler
We could probably infer this, but I'm lazy
SKIP_HEADER_KEY - Static variable in class org.canova.cli.vectorization.CSVVectorizationEngine
 
split - Variable in class org.canova.cli.vectorization.VectorizationEngine
 
STOP_WORDS - Static variable in class org.canova.cli.transforms.text.nlp.TfidfTextVectorizerTransform
 
stopWords - Variable in class org.canova.cli.transforms.text.nlp.TfidfTextVectorizerTransform
 
SubCommand - Interface in org.canova.cli.subcommands
A subcommand used for handling input

T

TextInputFormat - Class in org.canova.cli.formats.input
 
TextInputFormat() - Constructor for class org.canova.cli.formats.input.TextInputFormat
 
TextVectorizationEngine - Class in org.canova.cli.vectorization
 
TextVectorizationEngine() - Constructor for class org.canova.cli.vectorization.TextVectorizationEngine
 
tf(int) - Static method in class org.canova.cli.transforms.text.nlp.NLPUtils
Term frequency https://en.wikipedia.org/wiki/Tf%E2%80%93idf
tfidf(double, double) - Static method in class org.canova.cli.transforms.text.nlp.NLPUtils
Return tf * idf
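
For orientation, one common log-scaled formulation of these three quantities is sketched below. The method shapes follow the signatures listed in this index (`tf(int)`, `idf(double, double)`, `tfidf(double, double)`), but the class `TfidfSketch`, the parameter meanings, and the specific formulas are assumptions; the actual `NLPUtils` code may differ.

```java
// Hypothetical sketch of a log-scaled TF-IDF; not necessarily what NLPUtils computes.
final class TfidfSketch {

    /** Log-scaled term frequency: 1 + log10(count), or 0 when the term is absent. */
    static double tf(int count) {
        return count > 0 ? 1.0 + Math.log10(count) : 0.0;
    }

    /** Inverse document frequency: log10(totalDocs / docsWithTerm), or 0 for empty inputs. */
    static double idf(double totalDocs, double docsWithTerm) {
        return (totalDocs > 0 && docsWithTerm > 0) ? Math.log10(totalDocs / docsWithTerm) : 0.0;
    }

    /** Combined weight: the product of the two components. */
    static double tfidf(double tf, double idf) {
        return tf * idf;
    }

    public static void main(String[] args) {
        // Assumed example: a term seen 10 times in a document, present in 10 of 100 documents.
        double weight = tfidf(tf(10), idf(100, 10));
        System.out.println(weight);
    }
}
```

Terms that appear in every document get an IDF of 0 under this formulation, so they contribute nothing to the final vector.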
TfidfTextVectorizerTransform - Class in org.canova.cli.transforms.text.nlp
Design notes: dropped the inheritance hierarchy of the previous implementation, which would not fit a parallel framework's scale needs. We don't yet know what the principal design factors are going to be, so we are shying away from putting more interfaces in place. Need to focus on a two-pass mentality: 1.
TfidfTextVectorizerTransform() - Constructor for class org.canova.cli.transforms.text.nlp.TfidfTextVectorizerTransform
 
TOKENIZER - Static variable in class org.canova.cli.transforms.text.nlp.TfidfTextVectorizerTransform
 
tokenizerFactory - Variable in class org.canova.cli.transforms.text.nlp.TfidfTextVectorizerTransform
 
totalRecords - Variable in class org.canova.cli.transforms.image.NormalizeTransform
 
transform - Variable in class org.canova.cli.csv.schema.CSVSchemaColumn
 
transform(Collection<Writable>) - Method in class org.canova.cli.transforms.image.NormalizeTransform
Transform a specific incoming vector in place. TODO: is the label getting normalized here?
transform(Collection<Writable>) - Method in class org.canova.cli.transforms.text.nlp.TfidfTextVectorizerTransform
Transform the raw record with the stats we've learned from the first pass. Schema: Writable[0]: go dogs, go 1 Writable[1]: label_A 1.
Transform - Interface in org.canova.cli.transforms
 
transform(Collection<Writable>) - Method in interface org.canova.cli.transforms.Transform
 
transformColumnValue(String) - Method in class org.canova.cli.csv.schema.CSVSchemaColumn
 
Transforms - Class in org.canova.cli.csv.transforms
 
Transforms() - Constructor for class org.canova.cli.csv.transforms.Transforms
 

V

validCommandLineParameters - Variable in class org.canova.cli.subcommands.Vectorize
 
valueOf(String) - Static method in enum org.canova.cli.csv.schema.CSVSchemaColumn.ColumnType
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum org.canova.cli.csv.schema.CSVSchemaColumn.TransformType
Returns the enum constant of this type with the specified name.
values() - Static method in enum org.canova.cli.csv.schema.CSVSchemaColumn.ColumnType
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum org.canova.cli.csv.schema.CSVSchemaColumn.TransformType
Returns an array containing the constants of this enum type, in the order they are declared.
VECTORIZATION_ENGINE - Static variable in class org.canova.cli.subcommands.Vectorize
 
VectorizationEngine - Class in org.canova.cli.vectorization
 
VectorizationEngine() - Constructor for class org.canova.cli.vectorization.VectorizationEngine
 
Vectorize - Class in org.canova.cli.subcommands
Vectorize Command.
Vectorize() - Constructor for class org.canova.cli.subcommands.Vectorize
 
Vectorize(String[]) - Constructor for class org.canova.cli.subcommands.Vectorize
 
vectorize(String, String, CSVInputSchema) - Method in class org.canova.cli.vectorization.CSVVectorizationEngine
Use statistics collected from a previous pass to vectorize (or drop) each column
vectorizeToWritable(String, String, CSVInputSchema) - Method in class org.canova.cli.vectorization.CSVVectorizationEngine
Use statistics collected from a previous pass to vectorize (or drop) each column
VideoVectorizationEngine - Class in org.canova.cli.vectorization
 
VideoVectorizationEngine() - Constructor for class org.canova.cli.vectorization.VideoVectorizationEngine
 

W

wordFrequenciesForSentence(String) - Method in class org.canova.cli.transforms.text.nlp.TfidfTextVectorizerTransform
 
write(DataOutput) - Method in class org.canova.cli.formats.input.TextInputFormat
 
writer - Variable in class org.canova.cli.vectorization.VectorizationEngine
 

Copyright © 2016. All rights reserved.