SG++-Doxygen-Documentation
sgpp::datadriven::CSVTools Class Reference

Class that provides functionality to read CSV files.

#include <CSVTools.hpp>

Static Public Member Functions

static Dataset readCSV (std::istream &stream, bool skipFirstLine=false, bool hasTargets=true, size_t instanceCutoff=-1, std::vector< size_t > selectedCols=std::vector< size_t >(), std::vector< double > selectedTargets=std::vector< double >())
 Sequentially reads a CSV file.
 
static Dataset readCSVFromFile (const std::string &filename, bool skipFirstLine=false, bool hasTargets=true, size_t instanceCutoff=-1, std::vector< size_t > selectedCols=std::vector< size_t >(), std::vector< double > selectedTargets=std::vector< double >())
 Wrapper around readCSV() that reads from a file.
 
static void readCSVSize (std::istream &stream, size_t &numberInstances, size_t &dimension, bool skipFirstLine=false, bool hasTargets=true, std::vector< double > selectedTargets=std::vector< double >())
 Reads the size of a CSV file.
 
static void readCSVSizeFromFile (const std::string &filename, size_t &numberInstances, size_t &dimension, bool skipFirstLine=false, bool hasTargets=true, std::vector< double > selectedTargets=std::vector< double >())
 Wrapper around readCSVSize() that reads from a file.
 

Detailed Description

Class that provides functionality to read CSV files.

Member Function Documentation

◆ readCSV()

Dataset sgpp::datadriven::CSVTools::readCSV ( std::istream &  stream,
bool  skipFirstLine = false,
bool  hasTargets = true,
size_t  instanceCutoff = -1,
std::vector< size_t >  selectedCols = std::vector<size_t>(),
std::vector< double >  selectedTargets = std::vector<double>() 
)
static

Sequentially reads a CSV file.

Parameters
stream  Contains the raw data. Note: after this function exits, the stream is at EOF. To use it again, clear its state flags and reset its read position.
skipFirstLine  Whether to skip the first line while parsing. This accommodates a comment line that describes the data layout.
hasTargets  Whether the CSV has a column for targets (supervised learning).
instanceCutoff  Maximum number of instances to include in the returned Dataset. It may not be reached if the CSV file has fewer than instanceCutoff rows that are valid with respect to selectedTargets. If the value is -1, i.e. the maximum value of the (unsigned) size_t, all valid rows are included.
selectedCols  Which columns are written to the DataMatrix as dimensions. Order matters: selectedCols = [0, 3, 2] results in a DataMatrix with dim0 = column 0, dim1 = column 3, dim2 = column 2. If hasTargets=true, the last column must not be specified here, because it is written to the target vector, not the DataMatrix. If empty (default), all columns (except the target column, if present) are used in ascending order.
selectedTargets  Filter for targets. Only applicable if hasTargets=true. Only rows whose target entry (last column) equals one of the entries in selectedTargets are written to the dataset; all other rows are skipped. The floating-point comparison uses a precision of 0.001. This parameter is intended for classification with integer class labels. If empty (default), all targets are admissible and all rows are written as rows (instances) of the dataset.
Returns
CSV as Dataset
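
The interplay of instanceCutoff, selectedCols and selectedTargets described above can be sketched on rows that have already been parsed. This is not the SG++ implementation: SelectedRows, isAdmissible and selectRows are hypothetical names, the sketch covers only the hasTargets=true case, and it assumes that the 0.001 precision means a strict "absolute difference below 0.001" test. The default cutoff relies on the fact that converting -1 to size_t yields the largest representable value.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the result; SG++ returns a Dataset instead.
struct SelectedRows {
  std::vector<std::vector<double>> data;  // one inner vector per instance
  std::vector<double> targets;            // one target per instance
};

// Returns true if target matches one of the admissible values within the
// 0.001 precision mentioned in the documentation (strictness is assumed).
// An empty filter admits every target.
bool isAdmissible(double target, const std::vector<double>& selectedTargets) {
  if (selectedTargets.empty()) return true;
  for (double admissible : selectedTargets) {
    if (std::fabs(target - admissible) < 0.001) return true;
  }
  return false;
}

// Applies selectedTargets, selectedCols and instanceCutoff to parsed rows
// whose last entry is the target.
SelectedRows selectRows(
    const std::vector<std::vector<double>>& rows,
    std::size_t instanceCutoff = static_cast<std::size_t>(-1),
    const std::vector<std::size_t>& selectedCols = {},
    const std::vector<double>& selectedTargets = {}) {
  SelectedRows result;
  for (const auto& row : rows) {
    if (result.data.size() >= instanceCutoff) break;  // cutoff reached
    if (row.empty()) continue;                        // nothing to read
    const double target = row.back();
    if (!isAdmissible(target, selectedTargets)) continue;  // row is skipped

    std::vector<double> instance;
    if (selectedCols.empty()) {
      // Default: all columns except the target, in ascending order.
      instance.assign(row.begin(), row.end() - 1);
    } else {
      // The order of selectedCols defines the order of the dimensions.
      for (std::size_t col : selectedCols) instance.push_back(row.at(col));
    }
    result.data.push_back(instance);
    result.targets.push_back(target);
  }
  return result;
}
```

For a row 1, 2, 3, 4, 0 and selectedCols = [0, 3, 2], the sketch stores the instance 1, 4, 3 with target 0.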

References dataset, sgpp::datadriven::Dataset::getData(), sgpp::datadriven::Dataset::getTargets(), readCSVSize(), sgpp::base::DataVector::set(), and sgpp::base::DataMatrix::set().

Referenced by readCSVFromFile().

◆ readCSVFromFile()

Dataset sgpp::datadriven::CSVTools::readCSVFromFile ( const std::string &  filename,
bool  skipFirstLine = false,
bool  hasTargets = true,
size_t  instanceCutoff = -1,
std::vector< size_t >  selectedCols = std::vector<size_t>(),
std::vector< double >  selectedTargets = std::vector<double>() 
)
static

Wrapper around readCSV() that takes a file name instead of a stream.

See readCSV() for more details.

References readCSV().

Referenced by sgpp::datadriven::CSVFileSampleProvider::readFile().
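
The wrapper pattern itself (open the file, then delegate to the stream-based function) might look like the sketch below. It is illustrative only: countLines stands in for the stream-based reader, countLinesFromFile is a hypothetical name, and throwing std::runtime_error when the file cannot be opened is an assumption, since this page does not state how SG++ reports that case.

```cpp
#include <cstddef>
#include <fstream>
#include <istream>
#include <stdexcept>
#include <string>

// Hypothetical stream-based function standing in for readCSV().
std::size_t countLines(std::istream& stream) {
  std::size_t count = 0;
  std::string line;
  while (std::getline(stream, line)) {
    ++count;
  }
  return count;
}

// Hypothetical file wrapper: opens the file and delegates to the
// stream-based function. The error handling is an assumption.
std::size_t countLinesFromFile(const std::string& filename) {
  std::ifstream file(filename);
  if (!file.is_open()) {
    throw std::runtime_error("cannot open file: " + filename);
  }
  return countLines(file);
}
```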

◆ readCSVSize()

void sgpp::datadriven::CSVTools::readCSVSize ( std::istream &  stream,
size_t &  numberInstances,
size_t &  dimension,
bool  skipFirstLine = false,
bool  hasTargets = true,
std::vector< double >  selectedTargets = std::vector<double>() 
)
static

Reads the size of a CSV file.

Parameters
stream  Contains the raw data. Note: after this function exits, the stream is at EOF. To use it again, clear its state flags and reset its read position.
[out]  numberInstances  Number of instances in the dataset.
[out]  dimension  Number of columns (dimensions) in the dataset.
skipFirstLine  Set to true if the first line of the CSV file is not a data line.
hasTargets  Whether the CSV has a column for targets (supervised learning). If true, dimension = number of columns - 1.
selectedTargets  See readCSV(). If this vector is not empty, numberInstances reflects only the number of instances that are admissible with respect to selectedTargets. If empty (default), all targets are admissible and all rows are counted as instances.

References sgpp::datadriven::StringTokenizer::tokenize().

Referenced by readCSV(), and readCSVSizeFromFile().

◆ readCSVSizeFromFile()

void sgpp::datadriven::CSVTools::readCSVSizeFromFile ( const std::string &  filename,
size_t &  numberInstances,
size_t &  dimension,
bool  skipFirstLine = false,
bool  hasTargets = true,
std::vector< double >  selectedTargets = std::vector<double>() 
)
static

Wrapper around readCSVSize() that takes a file name instead of a stream.

See readCSVSize() for more details.

References readCSVSize().


The documentation for this class was generated from the following files: