SG++-Doxygen-Documentation
Class that provides functionality to read CSV files.
#include <CSVTools.hpp>
Static Public Member Functions | |
static Dataset | readCSV (std::istream &stream, bool skipFirstLine=false, bool hasTargets=true, size_t instanceCutoff=-1, std::vector< size_t > selectedCols=std::vector< size_t >(), std::vector< double > selectedTargets=std::vector< double >()) |
Sequentially reads a CSV file. | |
static Dataset | readCSVFromFile (const std::string &filename, bool skipFirstLine=false, bool hasTargets=true, size_t instanceCutoff=-1, std::vector< size_t > selectedCols=std::vector< size_t >(), std::vector< double > selectedTargets=std::vector< double >()) |
Wrapper from input type: File. | |
static void | readCSVSize (std::istream &stream, size_t &numberInstances, size_t &dimension, bool skipFirstLine=false, bool hasTargets=true, std::vector< double > selectedTargets=std::vector< double >()) |
Reads the size of a CSV file. | |
static void | readCSVSizeFromFile (const std::string &filename, size_t &numberInstances, size_t &dimension, bool skipFirstLine=false, bool hasTargets=true, std::vector< double > selectedTargets=std::vector< double >()) |
Wrapper from input type: File. | |
readCSV() [static]
Sequentially reads a CSV file.
stream | contains the raw data. Note: after this function exits, stream will be at EOF. To use it again, clear its state flags and reset its read position. |
skipFirstLine | whether to skip the first line while parsing. This accommodates a comment line that describes the data layout. |
hasTargets | whether the CSV has a column for targets (supervised learning) |
instanceCutoff | maximum number of instances to include in the returned Dataset. May not be reached if the CSV file contains fewer than instanceCutoff rows that are valid with respect to selectedTargets. If the value is -1, i.e. the maximum value of the (unsigned) size_t, all valid rows are included. |
selectedCols | which columns are written to the DataMatrix as dimensions. Order matters, i.e. selectedCols = [0, 3, 2] will result in a DataMatrix with dim0 = column 0, dim1 = column 3, dim2 = column 2. If hasTargets=true, the last column must not be specified here, as it is written to the target vector, not the DataMatrix. If empty (default), all columns (except the target column, if present) are used in ascending order. |
selectedTargets | filter for targets. Only applicable if hasTargets=true. Only rows whose target entry (last column) equals one of the entries in selectedTargets are written to the dataset. All other rows are skipped. The floating-point comparison uses a precision of 0.001. This parameter is intended for classification with integer values as classes. If empty (default), all targets are admissible and all rows are written as rows (instances) in the dataset. |
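The semantics of selectedCols, selectedTargets and instanceCutoff described above can be illustrated with a small standalone sketch. It does not use SG++ and is not the library's implementation: the helper names isAdmissibleTarget, pickColumns and kNoCutoff are hypothetical, and the strict "less than 0.001" comparison is an assumption about how the documented precision is applied.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Passing -1 for a size_t parameter wraps around to the largest size_t value,
// which is why -1 can act as "no cutoff".
const std::size_t kNoCutoff = static_cast<std::size_t>(-1);

// Hypothetical helper (not part of SG++): true if target matches one entry of
// selectedTargets within a tolerance of 0.001. An empty filter admits every target.
bool isAdmissibleTarget(double target, const std::vector<double>& selectedTargets) {
  if (selectedTargets.empty()) return true;
  for (double candidate : selectedTargets) {
    if (std::fabs(target - candidate) < 0.001) return true;  // assumed strict comparison
  }
  return false;
}

// Hypothetical helper (not part of SG++): copies the requested columns of one
// parsed CSV row in the order given by selectedCols.
std::vector<double> pickColumns(const std::vector<double>& row,
                                const std::vector<std::size_t>& selectedCols) {
  std::vector<double> picked;
  picked.reserve(selectedCols.size());
  for (std::size_t col : selectedCols) {
    assert(col < row.size());  // a selected column must exist in the row
    picked.push_back(row[col]);
  }
  return picked;
}
```

With selectedCols = {0, 3, 2}, a row holding 10, 11, 12, 13, 99 is reduced to 10, 13, 12, where 99 would be the target column that must not be selected.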
References dataset, sgpp::datadriven::Dataset::getData(), sgpp::datadriven::Dataset::getTargets(), python.statsfileInfo::i, python.utils.data_projections::line, readCSVSize(), sgpp::base::DataVector::set(), and sgpp::base::DataMatrix::set().
Referenced by readCSVFromFile().
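Because the stream is left at end-of-file, a caller who wants to read the same stream a second time has to reset it first. The following sketch shows one way to do that with the standard library only. The helpers rewindStream and drainLines are hypothetical names; drainLines merely stands in for any call that consumes the whole stream.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: makes a fully consumed, seekable stream readable again.
void rewindStream(std::istream& stream) {
  stream.clear();                  // drop the eofbit/failbit left by the last read
  stream.seekg(0, std::ios::beg);  // move the read position back to the start
  assert(!stream.fail());          // seekg fails on streams that cannot seek
}

// Stand-in for a function that reads the stream until end-of-file.
std::size_t drainLines(std::istream& stream) {
  std::size_t lines = 0;
  std::string line;
  while (std::getline(stream, line)) ++lines;
  return lines;
}
```

The order matters: seekg is called after clear because seeking on a stream whose failbit is still set has no effect.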
readCSVFromFile() [static]
Wrapper from input type: File.
See readCSV for more details.
References python.utils.converter::filename, and readCSV().
Referenced by sgpp::datadriven::CSVFileSampleProvider::readFile().
readCSVSize() [static]
Reads the size of a CSV file.
stream | contains the raw data. Note: after this function exits, stream will be at EOF. To use it again, clear its state flags and reset its read position. | |
[out] | numberInstances | number of instances in the dataset |
[out] | dimension | number of columns (dimensions) in the dataset |
skipFirstLine | set to true if the first line of the CSV file is not a data line | |
hasTargets | whether the CSV has a column for targets (supervised learning). If true, dimension = number of columns - 1 | |
selectedTargets | (see readCSV). If this vector is not empty, numberInstances reflects only the number of instances that are admissible with respect to selectedTargets. If empty (default), all targets are admissible and all rows are considered instances. |
References python.leja::count, python.statsfileInfo::i, python.utils.data_projections::line, and sgpp::datadriven::StringTokenizer::tokenize().
Referenced by readCSV(), and readCSVSizeFromFile().
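The relationship between the two output parameters and the flags can be sketched as follows. This is a simplified standalone illustration, not the SG++ implementation: CsvSize and sketchCsvSize are hypothetical names, a comma is assumed as the separator, empty lines are assumed to be skipped, and the selectedTargets filter is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical result type bundling the two [out] parameters.
struct CsvSize {
  std::size_t numberInstances;
  std::size_t dimension;
};

// Hypothetical sketch: counts data rows and derives the dimension from the
// first data row. Assumes comma-separated values.
CsvSize sketchCsvSize(std::istream& stream, bool skipFirstLine = false,
                      bool hasTargets = true) {
  assert(stream.good());  // precondition: the stream must be readable
  CsvSize size{0, 0};
  std::string line;
  if (skipFirstLine) std::getline(stream, line);  // discard the layout/comment line
  while (std::getline(stream, line)) {
    if (line.empty()) continue;  // assumption: blank lines are not instances
    if (size.numberInstances == 0) {
      const std::size_t columns =
          static_cast<std::size_t>(std::count(line.begin(), line.end(), ',')) + 1;
      // With targets, the last column goes to the target vector.
      size.dimension = hasTargets ? columns - 1 : columns;
    }
    ++size.numberInstances;
  }
  return size;
}
```

For a file with a header line and two data rows of three columns each, calling the sketch with skipFirstLine=true and hasTargets=true yields two instances of dimension two.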
readCSVSizeFromFile() [static]
Wrapper from input type: File.
See readCSVSize for more details.
References python.utils.converter::filename, and readCSVSize().