SG++-Doxygen-Documentation
sgpp::datadriven::ARFFTools Class Reference

Class that provides functionality to read ARFF files.

#include <ARFFTools.hpp>

Static Public Member Functions

static Dataset readARFF (std::istream &stream, bool hasTargets=true, size_t instanceCutoff=-1, std::vector< size_t > selectedCols=std::vector< size_t >(), std::vector< double > selectedTargets=std::vector< double >())
 Sequentially reads an ARFF file.
 
static Dataset readARFFFromFile (const std::string &filename, bool hasTargets=true, size_t instanceCutoff=-1, std::vector< size_t > selectedCols=std::vector< size_t >(), std::vector< double > selectedTargets=std::vector< double >())
 Wrapper for file input.
 
static Dataset readARFFFromString (const std::string &content, bool hasTargets=true, size_t instanceCutoff=-1, std::vector< size_t > selectedCols=std::vector< size_t >(), std::vector< double > selectedTargets=std::vector< double >())
 Wrapper for string input.
 
static void readARFFSize (std::istream &stream, size_t &numberInstances, size_t &dimension, bool hasTargets, std::vector< double > selectedTargets)
 Reads the size of an ARFF file.
 
static void readARFFSizeFromFile (const std::string &filename, size_t &numberInstances, size_t &dimension, bool hasTargets=true, std::vector< double > selectedTargets=std::vector< double >())
 Wrapper for file input.
 
static void readARFFSizeFromString (const std::string &content, size_t &numberInstances, size_t &dimension, bool hasTargets=true, std::vector< double > selectedTargets=std::vector< double >())
 Wrapper for string input.
 

Detailed Description

Class that provides functionality to read ARFF files.

Member Function Documentation

◆ readARFF()

Dataset sgpp::datadriven::ARFFTools::readARFF ( std::istream &  stream,
bool  hasTargets = true,
size_t  instanceCutoff = -1,
std::vector< size_t >  selectedCols = std::vector<size_t>(),
std::vector< double >  selectedTargets = std::vector<double>() 
)
static

Sequentially reads an ARFF file.

Parameters
stream: contains the raw data. Note: after this function exits, the stream will be at EOF. For further use it must be cleared and reset.
hasTargets: whether the ARFF file has a column for targets (supervised learning).
instanceCutoff: maximum number of instances to include in the returned Dataset. May not be reached if the ARFF file contains fewer than instanceCutoff rows that are valid with respect to selectedTargets. If the value is -1, i.e. the maximum value of the (unsigned) size_t, all valid rows are included.
selectedCols: which columns are written to the DataMatrix as dimensions. Order matters, i.e. selectedCols = [0, 3, 2] results in a DataMatrix with dim0 = column 0, dim1 = column 3, dim2 = column 2. If hasTargets=true, the last column must not be specified here, as it is written to the target vector, not the DataMatrix. If empty (default), all columns (except possibly the target column) are used in ascending order.
selectedTargets: filter for targets. Only applicable if hasTargets=true. Only rows whose target entry (last column) equals one of the entries in selectedTargets are written to the dataset; all other rows are skipped. The floating-point comparison uses a precision of 0.001. This parameter is intended for classification with integer values as classes. If empty (default), all targets are admissible and all rows are written as instances to the dataset.
Returns
ARFF as Dataset
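
The interplay of instanceCutoff, selectedCols, and selectedTargets can be illustrated with a minimal, library-independent sketch. The struct SelectedData and the function selectRows are hypothetical names that are not part of SG++; they operate on in-memory rows rather than on a stream, and the strict "less than 0.001" comparison is an assumption about how the documented precision is applied.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the (data matrix, target vector) pair of a Dataset.
struct SelectedData {
  std::vector<std::vector<double>> data;
  std::vector<double> targets;
};

// Hypothetical helper mirroring the documented parameter semantics.
// Each input row holds the feature columns followed by the target (last column).
SelectedData selectRows(const std::vector<std::vector<double>>& rows,
                        std::size_t instanceCutoff = static_cast<std::size_t>(-1),
                        const std::vector<std::size_t>& selectedCols = {},
                        const std::vector<double>& selectedTargets = {}) {
  SelectedData out;
  for (const auto& row : rows) {
    // The -1 default wraps to the largest size_t, so it never triggers here.
    if (out.data.size() >= instanceCutoff) break;
    if (row.empty()) continue;
    const double target = row.back();

    // An empty filter admits every row; otherwise the target must match
    // one of the selected targets within the assumed 0.001 tolerance.
    bool admissible = selectedTargets.empty();
    for (double t : selectedTargets) {
      if (std::fabs(target - t) < 0.001) {
        admissible = true;
        break;
      }
    }
    if (!admissible) continue;

    std::vector<double> point;
    if (selectedCols.empty()) {
      // Default: all columns except the target, in ascending order.
      point.assign(row.begin(), row.end() - 1);
    } else {
      // Explicit selection: the order of selectedCols defines the dimensions.
      for (std::size_t c : selectedCols) point.push_back(row.at(c));
    }
    out.data.push_back(point);
    out.targets.push_back(target);
  }
  return out;
}
```

With this sketch, selectedCols = {2, 0} and selectedTargets = {1.0} keep only the rows of class 1 and store column 2 as the first dimension and column 0 as the second.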

References sgpp::datadriven::Dataset::getData(), sgpp::datadriven::Dataset::getTargets(), readARFFSize(), sgpp::base::DataVector::set(), sgpp::base::DataMatrix::set(), and sgpp::datadriven::StringTokenizer::tokenize().

Referenced by readARFFFromFile(), and readARFFFromString().

◆ readARFFFromFile()

Dataset sgpp::datadriven::ARFFTools::readARFFFromFile ( const std::string &  filename,
bool  hasTargets = true,
size_t  instanceCutoff = -1,
std::vector< size_t >  selectedCols = std::vector<size_t>(),
std::vector< double >  selectedTargets = std::vector<double>() 
)
static

Wrapper for file input.

See readARFF() for more details.

References readARFF().

Referenced by hpx_main(), main(), and sgpp::datadriven::ArffFileSampleProvider::readFile().

◆ readARFFFromString()

Dataset sgpp::datadriven::ARFFTools::readARFFFromString ( const std::string &  content,
bool  hasTargets = true,
size_t  instanceCutoff = -1,
std::vector< size_t >  selectedCols = std::vector<size_t>(),
std::vector< double >  selectedTargets = std::vector<double>() 
)
static

Wrapper for string input.

See readARFF() for more details.

◆ readARFFSize()

void sgpp::datadriven::ARFFTools::readARFFSize ( std::istream &  stream,
size_t &  numberInstances,
size_t &  dimension,
bool  hasTargets,
std::vector< double >  selectedTargets 
)
static

Reads the size of an ARFF file.

Parameters
stream: contains the raw data. Note: after this function exits, the stream will be at EOF. For further use it must be cleared and reset.
[out] numberInstances: number of instances in the dataset.
[out] dimension: number of columns (dimensions) in the dataset.
hasTargets: whether the ARFF file has a column for targets (supervised learning). If true, dimension = number of columns - 1.
selectedTargets: (see readCSVPartial). If this vector is not empty, numberInstances reflects only the number of instances that are admissible with respect to selectedTargets. If empty (the default of the FromFile and FromString wrappers), all targets are admissible and all rows are counted as instances.
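
The note on the stream parameter matters whenever the same stream is passed to a second reader, for example first to determine the size and then to read the data. The following is a minimal sketch of that pattern using only the standard library. countDataLines and rewindStream are hypothetical helpers, not SG++ functions, and the simplified ARFF handling (two spellings of the data marker, '%' comment lines skipped) is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: counts the non-empty, non-comment lines that follow
// the "@DATA" marker. Like the documented functions, it consumes the stream
// and leaves it at end-of-file.
std::size_t countDataLines(std::istream& stream) {
  std::string line;
  bool inData = false;
  std::size_t count = 0;
  while (std::getline(stream, line)) {
    if (!inData) {
      // ARFF keywords are case-insensitive; this sketch checks two spellings only.
      if (line.rfind("@DATA", 0) == 0 || line.rfind("@data", 0) == 0) {
        inData = true;
      }
    } else if (!line.empty() && line[0] != '%') {
      ++count;
    }
  }
  return count;
}

// Makes an exhausted stream readable from the beginning again.
void rewindStream(std::istream& stream) {
  stream.clear();                  // reset eofbit/failbit, otherwise seekg fails
  stream.seekg(0, std::ios::beg);  // move the read position back to the start
}
```

Calling countDataLines twice in a row on the same stream yields 0 the second time; calling rewindStream in between restores the original result.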


Referenced by readARFF(), readARFFSizeFromFile(), and readARFFSizeFromString().

◆ readARFFSizeFromFile()

void sgpp::datadriven::ARFFTools::readARFFSizeFromFile ( const std::string &  filename,
size_t &  numberInstances,
size_t &  dimension,
bool  hasTargets = true,
std::vector< double >  selectedTargets = std::vector<double>() 
)
static

Wrapper for file input.

See readARFFSize() for more details.

References readARFFSize().

◆ readARFFSizeFromString()

void sgpp::datadriven::ARFFTools::readARFFSizeFromString ( const std::string &  content,
size_t &  numberInstances,
size_t &  dimension,
bool  hasTargets = true,
std::vector< double >  selectedTargets = std::vector<double>() 
)
static

Wrapper for string input.

See readARFFSize() for more details.

References readARFFSize().
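
The wrapper pattern used by the FromFile and FromString variants can be sketched as follows: each wrapper constructs a suitable std::istream and delegates to the stream-based routine. All three functions below are hypothetical and only count lines; they are not the SG++ implementation, and throwing std::runtime_error for an unreadable file is an assumption, since this page does not state how a missing file is handled.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stream-based core routine: counts the lines in a stream.
std::size_t countLines(std::istream& stream) {
  std::size_t n = 0;
  std::string line;
  while (std::getline(stream, line)) ++n;
  return n;
}

// String wrapper: exposes the content as a stream and delegates.
std::size_t countLinesFromString(const std::string& content) {
  std::istringstream stream(content);
  return countLines(stream);
}

// File wrapper: opens the file as a stream and delegates.
// The error handling is an assumption made for this sketch.
std::size_t countLinesFromFile(const std::string& filename) {
  std::ifstream stream(filename);
  if (!stream.is_open()) {
    throw std::runtime_error("cannot open " + filename);
  }
  return countLines(stream);
}
```

Keeping the parsing logic in the stream-based function means both wrappers stay a few lines long and behave identically apart from where the bytes come from.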


The documentation for this class was generated from the following files: