SG++-Doxygen-Documentation
|
Class that provides functionality to read ARFF files. More...
#include <ARFFTools.hpp>
Static Public Member Functions | |
static Dataset | readARFF (std::istream &stream, bool hasTargets=true, size_t instanceCutoff=-1, std::vector< size_t > selectedCols=std::vector< size_t >(), std::vector< double > selectedTargets=std::vector< double >()) |
Sequentially reads an ARFF file. More... | |
static Dataset | readARFFFromFile (const std::string &filename, bool hasTargets=true, size_t instanceCutoff=-1, std::vector< size_t > selectedCols=std::vector< size_t >(), std::vector< double > selectedTargets=std::vector< double >()) |
Wrapper from input type: File. More... | |
static Dataset | readARFFFromString (const std::string &content, bool hasTargets=true, size_t instanceCutoff=-1, std::vector< size_t > selectedCols=std::vector< size_t >(), std::vector< double > selectedTargets=std::vector< double >()) |
Wrapper from input type: String. More... | |
static void | readARFFSize (std::istream &stream, size_t &numberInstances, size_t &dimension, bool hasTargets, std::vector< double > selectedTargets) |
Reads the size of an ARFF file. More... | |
static void | readARFFSizeFromFile (const std::string &filename, size_t &numberInstances, size_t &dimension, bool hasTargets=true, std::vector< double > selectedTargets=std::vector< double >()) |
Wrapper from input type: File. More... | |
static void | readARFFSizeFromString (const std::string &content, size_t &numberInstances, size_t &dimension, bool hasTargets=true, std::vector< double > selectedTargets=std::vector< double >()) |
Wrapper from input type: String. More... | |
Class that provides functionality to read ARFF files.
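static Dataset | readARFF (std::istream &stream, bool hasTargets=true, size_t instanceCutoff=-1, std::vector< size_t > selectedCols=std::vector< size_t >(), std::vector< double > selectedTargets=std::vector< double >()) |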
Sequentially reads an ARFF file.
stream | contains the raw data. Note: After this function exits, stream will be at EOF. For further use it should be cleared and reset |
hasTargets | whether the ARFF file has a column for targets (supervised learning) |
instanceCutoff | maximal number of instances to include in the returned Dataset. May not be reached if the ARFF file has fewer than instanceCutoff rows that are valid w.r.t. selectedTargets. If the value is -1, i.e. the maximal value of the (unsigned) size_t, all valid rows are included |
selectedCols | which columns are written to the DataMatrix as dimensions. Order matters, i.e. selectedCols = [0, 3, 2] will result in a DataMatrix with dim0 = column 0, dim1 = column 3, dim2 = column 2. If hasTargets=true, the last column must not be specified here as it is written to the target vector, not the DataMatrix. If empty (default), all columns (except possibly the target column) will be used in ascending order. |
selectedTargets | filter for targets. Only applicable if hasTargets=true. Only rows whose target entry (last column) equals one of the entries in selectedTargets are written to the dataset. All other rows are skipped. Float comparison uses 0.001 precision. This parameter is intended for classification with integer values as classes. If empty (default), all targets are admissible and all rows are written as rows (instances) in the dataset. |
References dataset, sgpp::datadriven::Dataset::getData(), sgpp::datadriven::Dataset::getTargets(), python.statsfileInfo::i, python.utils.data_projections::line, readARFFSize(), sgpp::base::DataVector::set(), sgpp::base::DataMatrix::set(), and sgpp::datadriven::StringTokenizer::tokenize().
Referenced by readARFFFromFile(), and readARFFFromString().
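The parameter semantics described above can be illustrated with a small standalone sketch. It does not use SG++ and is not the library's implementation: the helper names isTargetAdmissible, selectColumns and isUnlimited are hypothetical, and the strict `<` in the tolerance check is an assumption (the documentation only states "0.001 precision").

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: true if `target` matches an entry of `selectedTargets`
// within the documented 0.001 tolerance. An empty filter admits every target.
// Assumption: the comparison is strict (<); the real code may differ.
bool isTargetAdmissible(double target, const std::vector<double>& selectedTargets) {
  if (selectedTargets.empty()) return true;
  for (double t : selectedTargets) {
    if (std::fabs(target - t) < 0.001) return true;
  }
  return false;
}

// Hypothetical helper: picks the entries of one parsed row in the order given
// by `selectedCols`. With an empty selection, all feature columns are used in
// ascending order, and the last column is dropped when it holds the target.
std::vector<double> selectColumns(const std::vector<double>& row,
                                  const std::vector<std::size_t>& selectedCols,
                                  bool hasTargets) {
  std::vector<double> out;
  if (selectedCols.empty()) {
    const std::size_t numFeatures =
        (hasTargets && !row.empty()) ? row.size() - 1 : row.size();
    out.assign(row.begin(), row.begin() + static_cast<std::ptrdiff_t>(numFeatures));
    return out;
  }
  for (std::size_t c : selectedCols) {
    out.push_back(row.at(c));  // at() throws std::out_of_range on a bad index
  }
  return out;
}

// Hypothetical helper: the default instanceCutoff=-1 wraps around to the
// largest size_t value, which is read as "no cutoff".
bool isUnlimited(std::size_t instanceCutoff) {
  return instanceCutoff == std::numeric_limits<std::size_t>::max();
}
```

For a row {10, 11, 12, 13, 1} with hasTargets=true, selectColumns with selectedCols = {0, 3, 2} yields {10, 13, 12}, and the trailing 1 is the target that isTargetAdmissible would check.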
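static Dataset | readARFFFromFile (const std::string &filename, bool hasTargets=true, size_t instanceCutoff=-1, std::vector< size_t > selectedCols=std::vector< size_t >(), std::vector< double > selectedTargets=std::vector< double >()) |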
Wrapper from input type: File.
See readARFF for more details
References python.utils.converter::filename, and readARFF().
Referenced by hpx_main(), main(), and sgpp::datadriven::ArffFileSampleProvider::readFile().
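static Dataset | readARFFFromString (const std::string &content, bool hasTargets=true, size_t instanceCutoff=-1, std::vector< size_t > selectedCols=std::vector< size_t >(), std::vector< double > selectedTargets=std::vector< double >()) |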
Wrapper from input type: String.
See readARFF for more details
References readARFF().
Referenced by sgpp::datadriven::MetaLearner::learnAndTestString(), sgpp::datadriven::MetaLearner::learnReferenceString(), sgpp::datadriven::MetaLearner::learnString(), and sgpp::datadriven::ArffFileSampleProvider::readString().
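The File and String wrappers both delegate to a stream-based reader. A minimal sketch of that pattern follows, with a trivial line counter standing in for the real parser. All three function names are hypothetical, and throwing std::runtime_error when a file cannot be opened is an assumption, not documented SG++ behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the stream-based reader: counts non-empty lines.
std::size_t countLines(std::istream& stream) {
  std::size_t n = 0;
  std::string line;
  while (std::getline(stream, line)) {
    if (!line.empty()) ++n;
  }
  return n;
}

// String wrapper: exposes the content as a stream and delegates.
std::size_t countLinesFromString(const std::string& content) {
  std::istringstream stream(content);
  return countLines(stream);
}

// File wrapper: opens the file as a stream and delegates.
// Assumption: an unreadable file is reported by throwing.
std::size_t countLinesFromFile(const std::string& filename) {
  std::ifstream stream(filename);
  if (!stream.is_open()) {
    throw std::runtime_error("cannot open " + filename);
  }
  return countLines(stream);
}
```

Keeping the parsing logic in one stream-based function means the wrappers only differ in how they construct the stream, and each wrapper's stream is a local object, so the caller never sees it in its end-of-file state.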
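static void | readARFFSize (std::istream &stream, size_t &numberInstances, size_t &dimension, bool hasTargets, std::vector< double > selectedTargets) |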
Reads the size of an ARFF file.
stream | contains the raw data. Note: After this function exits, stream will be at EOF. For further use it should be cleared and reset | |
[out] | numberInstances | number of instances in the dataset |
[out] | dimension | number of columns (dimensions) in the dataset |
hasTargets | whether the ARFF file has a column for targets (supervised learning). If true, dimension = number of columns - 1 | |
selectedTargets | (see readARFF). If this vector is not empty, numberInstances reflects only the number of instances that are admissible with respect to selectedTargets. If empty (default), all targets are admissible and all rows are considered as instances. |
References python.leja::count, python.statsfileInfo::i, and python.utils.data_projections::line.
Referenced by readARFF(), readARFFSizeFromFile(), and readARFFSizeFromString().
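The size computation and the note about the stream ending at EOF can be sketched in standalone C++. This is an illustration under assumptions, not the SG++ parser: sketchReadSize and rewindStream are hypothetical names, the header handling (counting @ATTRIBUTE lines, skipping % comments, case-insensitive keywords) follows the general ARFF format, and the tolerance check uses a strict `<`.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cmath>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Upper-cases a copy of s so ARFF keywords can be matched case-insensitively.
std::string toUpper(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(),
                 [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
  return s;
}

// Hypothetical sketch: counts attribute declarations and admissible data rows.
void sketchReadSize(std::istream& stream, std::size_t& numberInstances,
                    std::size_t& dimension, bool hasTargets,
                    const std::vector<double>& selectedTargets) {
  numberInstances = 0;
  std::size_t columns = 0;
  bool inData = false;
  std::string line;
  while (std::getline(stream, line)) {
    if (!line.empty() && line.back() == '\r') line.pop_back();  // Windows line ends
    if (line.empty() || line[0] == '%') continue;               // blank or comment
    if (!inData) {
      const std::string upper = toUpper(line);
      if (upper.rfind("@ATTRIBUTE", 0) == 0) {
        ++columns;
      } else if (upper.rfind("@DATA", 0) == 0) {
        inData = true;
      }
      continue;
    }
    if (hasTargets && !selectedTargets.empty()) {
      // The target is the last comma-separated entry of the row.
      const double target = std::stod(line.substr(line.find_last_of(',') + 1));
      bool admissible = false;
      for (double t : selectedTargets) {
        if (std::fabs(target - t) < 0.001) admissible = true;
      }
      if (!admissible) continue;
    }
    ++numberInstances;
  }
  // With targets, the last column is not a dimension.
  dimension = (hasTargets && columns > 0) ? columns - 1 : columns;
}

// The loop above leaves the stream at EOF; clear the state flags and seek
// back to the beginning before reading the same stream again.
void rewindStream(std::istream& stream) {
  stream.clear();
  stream.seekg(0, std::ios::beg);
}
```

A caller that wants to size a stream first and then read its contents would call rewindStream between the two passes. Note that seeking only works on seekable streams such as file or string streams.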
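static void | readARFFSizeFromFile (const std::string &filename, size_t &numberInstances, size_t &dimension, bool hasTargets=true, std::vector< double > selectedTargets=std::vector< double >()) |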
Wrapper from input type: File.
See readARFFSize for more details
References python.utils.converter::filename, and readARFFSize().
|
static |