SG++-Doxygen-Documentation
sgpp::datadriven::DataSourceConfig Struct Reference

Configuration structure, including default values, used for all kinds of SampleProviders.

#include <DataSourceConfig.hpp>

Public Attributes

size_t batchSize = 0
 
datadriven::DataTransformationConfig dataTransformationConfig
 
size_t epochs = 1
 The number of epochs to train on.
 
std::string filePath = ""
 Valid path to a file on disk.
 
DataSourceFileType fileType = DataSourceFileType::NONE
 The type of the input file: NONE for auto-detection or for generated artificial datasets.
 
bool hasTargets = true
 Whether the file has targets (i.e. supervised learning).
 
bool isCompressed = false
 Whether the dataset is gzip-compressed.
 
size_t numBatches = 1
 How many batches the dataset is split into for batch learning; if 1, the entire dataset is used.
 
int64_t randomSeed = -1
 Seed for the shuffling PRNG.
 
std::vector< double > readinClasses = std::vector<double>()
 Specifies the set of classes (targets) to be read in from the data file. Any line whose class is not contained in this vector is skipped. Ignored if hasTargets=false. If empty, all classes/targets are considered (default).
 
std::vector< size_t > readinColumns = std::vector<size_t>()
 Specifies the set of columns (dimensions) to be read in from the data file. Indices start at 0, and order matters. Any column not contained in this vector is ignored as a dimension. If empty, all columns are read in (default).
 
size_t readinCutoff = -1
 The number of (valid) lines of the source file after which reading stops.
 
DataSourceShufflingType shuffling = DataSourceShufflingType::sequential
 The type of shuffling to be applied to the data.
 
double validationPortion = 0.3
 

Detailed Description

Configuration structure, including default values, used for all kinds of SampleProviders.

Member Data Documentation

size_t sgpp::datadriven::DataSourceConfig::epochs = 1

The number of epochs to train on.

Referenced by sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig().

std::string sgpp::datadriven::DataSourceConfig::filePath = ""

Valid path to a file on disk.

bool sgpp::datadriven::DataSourceConfig::hasTargets = true

Whether the file has targets (i.e. supervised learning).

Referenced by sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig().

size_t sgpp::datadriven::DataSourceConfig::numBatches = 1

How many batches the dataset is split into for batch learning; if 1, the entire dataset is used.

Referenced by sgpp::datadriven::DataSource::end(), sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig(), sgpp::datadriven::DataSource::getNextSamples(), and sgpp::datadriven::DataSourceBuilder::inBatches().
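As an illustration of what splitting into numBatches batches means, the sketch below computes per-batch sample counts. The helper `batchSizes` is hypothetical and not part of SG++; in particular, spreading the remainder over the first batches is an assumption, and DataSource may distribute samples differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not SG++ API): sizes of numBatches contiguous batches
// covering numSamples samples. The remainder numSamples % numBatches is
// assumed to go to the first batches, one extra sample each.
std::vector<std::size_t> batchSizes(std::size_t numSamples, std::size_t numBatches) {
  assert(numBatches >= 1 && "numBatches must be at least 1");
  std::vector<std::size_t> sizes(numBatches, numSamples / numBatches);
  for (std::size_t i = 0; i < numSamples % numBatches; ++i) {
    ++sizes[i];
  }
  return sizes;
}
```

With numBatches = 1 (the default), the single batch contains the entire dataset.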

int64_t sgpp::datadriven::DataSourceConfig::randomSeed = -1

Seed for the shuffling PRNG.

std::vector<double> sgpp::datadriven::DataSourceConfig::readinClasses = std::vector<double>()

Specifies the set of classes (targets) to be read in from the data file. Any line whose class is not contained in this vector is skipped. Ignored if hasTargets=false. If empty, all classes/targets are considered (default).

Referenced by sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig().

std::vector<size_t> sgpp::datadriven::DataSourceConfig::readinColumns = std::vector<size_t>()

Specifies the set of columns (dimensions) to be read in from the data file. Indices start at 0, and order matters. Any column not contained in this vector is ignored as a dimension. If empty, all columns are read in (default).

Referenced by sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig().
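The readinClasses and readinColumns rules above can be illustrated on a single, already-parsed line. This is a minimal sketch, not the SG++ parser: `ParsedLine` and `applyReadinFilters` are hypothetical, and it assumes the column indices refer to the feature values only (whether the target column is counted is not stated here).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical parsed line: feature values plus a class target.
struct ParsedLine {
  std::vector<double> features;
  double target = 0.0;
};

// Applies the documented readinClasses / readinColumns rules to one line.
// Returns std::nullopt if the line is skipped.
std::optional<std::vector<double>> applyReadinFilters(
    const ParsedLine& line, bool hasTargets,
    const std::vector<double>& readinClasses,
    const std::vector<std::size_t>& readinColumns) {
  // readinClasses is ignored without targets; empty means "accept all classes".
  if (hasTargets && !readinClasses.empty() &&
      std::find(readinClasses.begin(), readinClasses.end(), line.target) ==
          readinClasses.end()) {
    return std::nullopt;  // class not requested: skip the whole line
  }
  // readinColumns empty means "read all columns".
  if (readinColumns.empty()) return line.features;
  std::vector<double> selected;
  selected.reserve(readinColumns.size());
  // Order matters: the output follows the order given in readinColumns.
  for (std::size_t col : readinColumns) {
    assert(col < line.features.size() && "column indices start at 0");
    selected.push_back(line.features[col]);
  }
  return selected;
}
```

For example, readinColumns = {2, 0} on features {1.0, 2.0, 3.0} yields {3.0, 1.0}.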

size_t sgpp::datadriven::DataSourceConfig::readinCutoff = -1

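The number of (valid) lines of the source file after which reading stops. Because the member is a size_t, the default of -1 converts to the largest representable size_t value.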

Referenced by sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig().
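The sketch below shows the effect of the size_t default and of a cutoff on valid lines. `readUpToCutoff` and the validity predicate are hypothetical; what SG++ counts as a "valid" line is not specified on this page.

```cpp
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// The documented default `size_t readinCutoff = -1` converts -1 to the
// largest std::size_t value (unsigned conversion is modulo 2^N).
constexpr std::size_t kDefaultCutoff = static_cast<std::size_t>(-1);
static_assert(kDefaultCutoff == std::numeric_limits<std::size_t>::max(),
              "-1 converted to size_t is the maximum size_t value");

// Hypothetical helper: collects lines accepted by isValid and stops reading
// once `cutoff` valid lines have been collected.
template <typename Pred>
std::vector<std::string> readUpToCutoff(const std::vector<std::string>& lines,
                                        Pred isValid, std::size_t cutoff) {
  std::vector<std::string> kept;
  for (const auto& line : lines) {
    if (kept.size() >= cutoff) break;  // cutoff reached: stop reading
    if (isValid(line)) kept.push_back(line);
  }
  return kept;
}
```

With the default cutoff, every valid line is read in practice.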

double sgpp::datadriven::DataSourceConfig::validationPortion = 0.3

The documentation for this struct was generated from the following file: