SG++-Doxygen-Documentation
Configuration structure used for all kinds of SampleProviders, including default values.
#include <DataSourceConfig.hpp>
Public Attributes
size_t | batchSize = 0
datadriven::DataTransformationConfig | dataTransformationConfig
size_t | epochs = 1
The number of epochs to train on.
std::string | filePath = ""
Valid path to a file on disk.
DataSourceFileType | fileType = DataSourceFileType::NONE
The type of the input file. Use NONE for auto-detection or for generated artificial datasets.
bool | hasTargets = true
Whether the file has targets (i.e. supervised learning).
bool | isCompressed = false
The dataset is gzip compressed.
size_t | numBatches = 1
The number of batches to split the dataset into for batch learning. If 1, the entire dataset is taken.
int64_t | randomSeed = -1
Seed for the shuffling PRNG.
std::vector< double > | readinClasses = std::vector<double>()
The set of classes (targets) to be read in from the data file. If empty, all classes/targets are considered (default).
std::vector< size_t > | readinColumns = std::vector<size_t>()
The set of columns (dimensions) to be read in from the data file. If empty, all columns are read in (default).
size_t | readinCutoff = -1
The number of (valid) lines of the source file after which reading stops.
DataSourceShufflingType | shuffling = DataSourceShufflingType::sequential
The type of shuffling to be applied to the data.
double | validationPortion = 0.3
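As a rough illustration of how these attributes and defaults fit together, the sketch below mirrors the documented fields in a local stand-in struct. `DataSourceConfigSketch`, `FileTypeSketch`, `ShufflingSketch` and `makeExampleConfig` are hypothetical names introduced only for this example; they are not part of SG++, the enums list only the enumerators named on this page, and `dataTransformationConfig` is left out because its type is not described here.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for the SG++ enums; only the enumerators
// mentioned on this page are reproduced.
enum class FileTypeSketch { NONE };
enum class ShufflingSketch { sequential };

// Local mirror of the documented attributes and their defaults.
// This is NOT sgpp::datadriven::DataSourceConfig itself.
struct DataSourceConfigSketch {
  std::size_t batchSize = 0;
  std::size_t epochs = 1;
  std::string filePath = "";
  FileTypeSketch fileType = FileTypeSketch::NONE;
  bool hasTargets = true;
  bool isCompressed = false;
  std::size_t numBatches = 1;
  std::int64_t randomSeed = -1;
  std::vector<double> readinClasses;
  std::vector<std::size_t> readinColumns;
  std::size_t readinCutoff = static_cast<std::size_t>(-1);
  ShufflingSketch shuffling = ShufflingSketch::sequential;
  double validationPortion = 0.3;
};

// Starts from the defaults and overrides a few fields: a gzip-compressed
// file that is processed in 4 batches for 2 epochs.
inline DataSourceConfigSketch makeExampleConfig(const std::string& path) {
  DataSourceConfigSketch config;
  config.filePath = path;
  config.isCompressed = true;
  config.numBatches = 4;
  config.epochs = 2;
  return config;
}
```

With the real library, the same overrides would be applied to the struct from `DataSourceConfig.hpp`; the fields that are not set explicitly keep the defaults listed above.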
size_t sgpp::datadriven::DataSourceConfig::batchSize = 0
datadriven::DataTransformationConfig sgpp::datadriven::DataSourceConfig::dataTransformationConfig
size_t sgpp::datadriven::DataSourceConfig::epochs = 1
The number of epochs to train on.
Referenced by sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig().
std::string sgpp::datadriven::DataSourceConfig::filePath = ""
Valid path to a file on disk. Empty for generated artificial datasets.
Referenced by sgpp::datadriven::DataSourceBuilder::crossValidationFromConfig(), sgpp::datadriven::DataSource::DataSource(), sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig(), and sgpp::datadriven::DataSourceBuilder::withPath().
DataSourceFileType sgpp::datadriven::DataSourceConfig::fileType = DataSourceFileType::NONE
The type of the input file. Use NONE for auto-detection or for generated artificial datasets.
Referenced by sgpp::datadriven::DataSourceBuilder::crossValidationAssemble(), sgpp::datadriven::DataSourceBuilder::crossValidationFromConfig(), sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig(), sgpp::datadriven::DataSourceBuilder::splittingAssemble(), sgpp::datadriven::DataSourceBuilder::splittingFromConfig(), sgpp::datadriven::DataSourceBuilder::withFileType(), and sgpp::datadriven::DataSourceBuilder::withPath().
bool sgpp::datadriven::DataSourceConfig::hasTargets = true
Whether the file has targets (i.e. supervised learning).
Referenced by sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig().
bool sgpp::datadriven::DataSourceConfig::isCompressed = false
The dataset is gzip compressed.
size_t sgpp::datadriven::DataSourceConfig::numBatches = 1
The number of batches to split the dataset into for batch learning. If 1, the entire dataset is taken.
Referenced by sgpp::datadriven::DataSource::end(), sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig(), sgpp::datadriven::DataSource::getNextSamples(), and sgpp::datadriven::DataSourceBuilder::inBatches().
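To make the effect of `numBatches` concrete, here is a minimal sketch of one way a dataset of a given size could be divided into that many batches. The helper `batchSizes` is hypothetical, and the rule for distributing the remainder (one extra sample for each of the first batches) is an assumption for illustration; this page does not state how SG++ handles it.

```cpp
#include <cstddef>
#include <vector>

// Returns the size of each batch when numSamples samples are split into
// numBatches parts. Assumption: leftover samples go to the first batches.
inline std::vector<std::size_t> batchSizes(std::size_t numSamples,
                                           std::size_t numBatches) {
  if (numBatches == 0) {
    numBatches = 1;  // guard against division by zero in this sketch
  }
  std::vector<std::size_t> sizes(numBatches, numSamples / numBatches);
  const std::size_t remainder = numSamples % numBatches;
  for (std::size_t i = 0; i < remainder; ++i) {
    ++sizes[i];
  }
  return sizes;
}
```

With `numBatches = 1` the single batch holds every sample, which matches the documented "take the entire dataset" behavior.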
int64_t sgpp::datadriven::DataSourceConfig::randomSeed = -1
Seed for the shuffling PRNG.
Referenced by sgpp::datadriven::DataShufflingFunctorFactory::buildDataShufflingFunctor(), and sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig().
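The role of a shuffling seed can be sketched as follows: a fixed seed yields a reproducible permutation of the sample indices. The function `shuffledIndices` is hypothetical, and treating a negative seed (such as the default -1) as "draw a fresh seed" is an assumption made for this example; the page does not say how SG++ interprets -1.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Returns a permutation of 0..numSamples-1 produced by a seeded PRNG.
// Assumption: a negative seed requests a non-deterministic seed.
inline std::vector<std::size_t> shuffledIndices(std::size_t numSamples,
                                                std::int64_t randomSeed) {
  std::vector<std::size_t> indices(numSamples);
  std::iota(indices.begin(), indices.end(), std::size_t{0});

  const std::uint64_t seed =
      randomSeed < 0 ? static_cast<std::uint64_t>(std::random_device{}())
                     : static_cast<std::uint64_t>(randomSeed);
  std::mt19937_64 generator(seed);
  std::shuffle(indices.begin(), indices.end(), generator);
  return indices;
}
```

Calling the function twice with the same non-negative seed gives the same order on a given standard library, which is the usual reason for exposing a seed in a configuration.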
std::vector<double> sgpp::datadriven::DataSourceConfig::readinClasses = std::vector<double>()
Specifies the set of classes (targets) to be read in from the data file. Any line with a class not contained in this vector is skipped. If hasTargets=false, this is ignored. If empty, all classes/targets are considered (default).
Referenced by sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig().
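The filtering rules described for `readinClasses` can be written as a small predicate. This is a sketch under assumptions: `keepLine` is a hypothetical helper, and class values are compared with exact floating-point equality, which may differ from what SG++ does internally.

```cpp
#include <algorithm>
#include <vector>

// Decides whether a line with the given target value is read in.
inline bool keepLine(double target, const std::vector<double>& readinClasses,
                     bool hasTargets) {
  if (!hasTargets) {
    return true;  // the class filter is ignored when there are no targets
  }
  if (readinClasses.empty()) {
    return true;  // empty vector: all classes are considered
  }
  // Assumption: exact comparison of class values.
  return std::find(readinClasses.begin(), readinClasses.end(), target) !=
         readinClasses.end();
}
```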
std::vector<size_t> sgpp::datadriven::DataSourceConfig::readinColumns = std::vector<size_t>()
Specifies the set of columns (dimensions) to be read in from the data file. Indices start at 0, and order matters. Any column not contained in this vector is ignored as a dimension. If empty, all columns are read in (default).
Referenced by sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig().
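A minimal sketch of the column selection described for `readinColumns`, applied to one parsed row. The helper `selectColumns` is hypothetical, and throwing on an out-of-range index is a choice made for this example rather than documented SG++ behavior.

```cpp
#include <cstddef>
#include <vector>

// Picks the requested columns from a row, in the order they are listed.
inline std::vector<double> selectColumns(
    const std::vector<double>& row,
    const std::vector<std::size_t>& readinColumns) {
  if (readinColumns.empty()) {
    return row;  // empty vector: keep all columns
  }
  std::vector<double> selected;
  selected.reserve(readinColumns.size());
  for (std::size_t column : readinColumns) {
    // at() throws std::out_of_range if the index exceeds the row width.
    selected.push_back(row.at(column));
  }
  return selected;
}
```

Because order matters, requesting columns `{2, 0}` yields the third column first and the first column second.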
size_t sgpp::datadriven::DataSourceConfig::readinCutoff = -1
The number of (valid) lines of the source file after which reading stops.
Referenced by sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig().
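Since `readinCutoff` is a `size_t`, the default of -1 does not stay negative: converting -1 to an unsigned type yields the largest representable value. The sketch below checks this at compile time; the helper `reachedCutoff` is hypothetical, and reading the default as "effectively no cutoff" is an inference from that conversion, not something this page states.

```cpp
#include <cstddef>
#include <limits>

// -1 converted to size_t wraps around to the maximum value.
constexpr std::size_t kDefaultCutoff = static_cast<std::size_t>(-1);
static_assert(kDefaultCutoff == std::numeric_limits<std::size_t>::max(),
              "size_t(-1) is the largest size_t value");

// Assumption: reading stops once the number of valid lines reaches the cutoff.
inline bool reachedCutoff(std::size_t validLinesRead,
                          std::size_t readinCutoff) {
  return validLinesRead >= readinCutoff;
}
```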
DataSourceShufflingType sgpp::datadriven::DataSourceConfig::shuffling = DataSourceShufflingType::sequential
The type of shuffling to be applied to the data.
Referenced by sgpp::datadriven::DataShufflingFunctorFactory::buildDataShufflingFunctor(), and sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig().
double sgpp::datadriven::DataSourceConfig::validationPortion = 0.3
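This page gives no description for `validationPortion`. Judging by the name alone, it is presumably the fraction of samples held out for validation. Under that assumption, the split sizes could be computed as sketched below; `splitSizes` is a hypothetical helper, and rounding the validation count down is an arbitrary choice for this example.

```cpp
#include <cmath>
#include <cstddef>
#include <utility>

// Returns {training size, validation size} for a dataset of numSamples.
// Assumption: validationPortion is a fraction in [0, 1], rounded down.
inline std::pair<std::size_t, std::size_t> splitSizes(
    std::size_t numSamples, double validationPortion) {
  const auto numValidation = static_cast<std::size_t>(
      std::floor(validationPortion * static_cast<double>(numSamples)));
  return {numSamples - numValidation, numValidation};
}
```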