SG++
sgpp::datadriven::Scorer Class Reference (abstract)

Base class for supervised learning used to fit a model and quantify accuracy using a sgpp::datadriven::Metric with either testing or cross validation.

#include <Scorer.hpp>

Inheritance diagram for sgpp::datadriven::Scorer:
sgpp::datadriven::CrossValidation sgpp::datadriven::SplittingScorer

Public Member Functions

virtual double calculateScore (ModelFittingBase &model, Dataset &dataset, double *stdDeviation=nullptr)=0
 Train and test a model on a dataset and provide a score to quantify the approximation quality.
 
virtual Scorer * clone () const =0
 Polymorphic clone pattern.
 
Scorer & operator= (const Scorer &rhs)
 Copy assign operator.
 
Scorer & operator= (Scorer &&rhs)=default
 Move assign operator.
 
 Scorer (Metric *metric, ShufflingFunctor *shuffling, int64_t seed=-1)
 Constructor.
 
 Scorer (const Scorer &rhs)
 Copy constructor.
 
 Scorer (Scorer &&rhs)=default
 Move constructor.
 
virtual ~Scorer ()=default
 Virtual destructor.
 

Protected Member Functions

void randomizeIndices (const Dataset &data, std::vector< size_t > &randomizedIndices)
 Helper method to generate an ordering for the samples of the dataset based on the shuffling functor.
 
double refine (ModelFittingBase &model, Dataset &testDataset)
 Refine the model and evaluate the accuracy on the test set.
 
void splitSet (const Dataset &fullDataset, Dataset &trainDataset, Dataset &testDataset, const std::vector< size_t > &randomizedIndices, size_t offset=0)
 Split dataset into testing and training set.
 
double test (ModelFittingBase &model, Dataset &testDataset)
 Evaluate the accuracy on the test set using the metric.
 
double train (ModelFittingBase &model, Dataset &trainDataset, Dataset &testDataset)
 Fit the model on the train dataset and evaluate the accuracy on the test set.
 

Protected Attributes

std::unique_ptr< Metric > metric
 sgpp::datadriven::Metric to be used to quantify accuracy of the fit.
 
std::unique_ptr< ShufflingFunctor > shuffling
 sgpp::datadriven::ShufflingFunctor used to rearrange samples of a dataset in the desired manner, ready to be split into testing and training sets.
 

Detailed Description

Base class for supervised learning used to fit a model and quantify accuracy using a sgpp::datadriven::Metric with either testing or cross validation.

Splits a dataset into testing and training parts, trains the model and measures the accuracy.

Constructor & Destructor Documentation

sgpp::datadriven::Scorer::Scorer ( Metric *  metric,
ShufflingFunctor *  shuffling,
int64_t  seed = -1 
)

Constructor.

Parameters
metric    sgpp::datadriven::Metric used to quantify the approximation quality of a trained model. Scorer will take ownership of this object.
shuffling    sgpp::datadriven::ShufflingFunctor used to rearrange samples of a dataset in the desired manner, ready to be split into testing and training sets. Scorer will take ownership of this object.
seed    Seed for randomization in sgpp::datadriven::ShufflingFunctor. The default of -1 selects a random seed.

References shuffling.
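
The ownership transfer described in the parameter list can be illustrated with stand-in types. This is only a sketch: DemoMetric, DemoShuffling and DemoScorer are hypothetical names, not SG++ classes, and forwarding the seed to the shuffling functor is an assumption based on the constructor referencing shuffling.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical stand-in for Metric (not the SG++ class).
struct DemoMetric {
  static int alive;  // counts live instances to make ownership visible
  DemoMetric() { ++alive; }
  ~DemoMetric() { --alive; }
};
int DemoMetric::alive = 0;

// Hypothetical stand-in for ShufflingFunctor (not the SG++ class).
struct DemoShuffling {
  int64_t seed = -1;
};

// Hypothetical scorer that takes ownership of the raw pointers it is given.
class DemoScorer {
 public:
  DemoScorer(DemoMetric* metric, DemoShuffling* shuffling, int64_t seed = -1)
      : metric(metric), shuffling(shuffling) {
    assert(this->metric && this->shuffling);  // both objects are required
    if (seed != -1) {
      // Assumption: an explicit seed is forwarded to the shuffling functor.
      this->shuffling->seed = seed;
    }
  }
  int64_t seed() const { return shuffling->seed; }

 private:
  std::unique_ptr<DemoMetric> metric;        // deleted with the scorer
  std::unique_ptr<DemoShuffling> shuffling;  // deleted with the scorer
};
```

A caller would pass freshly allocated objects, for example DemoScorer scorer(new DemoMetric(), new DemoShuffling(), 42), and must not delete them afterwards, because the scorer releases them in its destructor.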

sgpp::datadriven::Scorer::Scorer ( const Scorer &  rhs)

Copy constructor.

Parameters
rhs    const reference to the scorer object to copy from.

References metric, and shuffling.

sgpp::datadriven::Scorer::Scorer ( Scorer &&  rhs)
default

Move constructor.

Parameters
rhs    R-value reference to a scorer object to move from.

virtual sgpp::datadriven::Scorer::~Scorer ( )
virtual default

Virtual destructor.

Member Function Documentation

virtual double sgpp::datadriven::Scorer::calculateScore ( ModelFittingBase &  model,
Dataset &  dataset,
double *  stdDeviation = nullptr 
)
pure virtual

Train and test a model on a dataset and provide a score to quantify the approximation quality.

If multiple models are trained, calculate the standard deviation between the different fits.

Parameters
model    A model to be fitted on the training part of the dataset.
dataset    Set of samples to use for fitting and testing the model.
stdDeviation    If multiple models are trained (e.g. for cross validation), the standard deviation between the fits is calculated and stored here.
Returns
Accuracy of the fit as calculated by the metric provided.

Implemented in sgpp::datadriven::CrossValidation, and sgpp::datadriven::SplittingScorer.
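
The optional stdDeviation output can be sketched as follows. aggregateScores is a hypothetical helper, not part of SG++. It assumes that the returned score is the mean over all fits and that the population standard deviation is reported. The documentation above does not state which estimator the implementing classes use.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: averages per-fold scores and, if requested,
// writes their population standard deviation to *stdDeviation.
double aggregateScores(const std::vector<double>& foldScores,
                       double* stdDeviation = nullptr) {
  assert(!foldScores.empty());
  const double count = static_cast<double>(foldScores.size());

  double mean = 0.0;
  for (double s : foldScores) mean += s;
  mean /= count;

  // Only compute the deviation when the caller asked for it.
  if (stdDeviation != nullptr) {
    double sumSq = 0.0;
    for (double s : foldScores) sumSq += (s - mean) * (s - mean);
    *stdDeviation = std::sqrt(sumSq / count);
  }
  return mean;
}
```

Passing nullptr (the default) skips the deviation entirely, which matches the signature's default argument.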

virtual Scorer* sgpp::datadriven::Scorer::clone ( ) const
pure virtual

Polymorphic clone pattern.

Returns
Deep copy of this object. The new object is owned by the caller.

Implemented in sgpp::datadriven::CrossValidation, and sgpp::datadriven::SplittingScorer.
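
A minimal sketch of the polymorphic clone pattern, using hypothetical classes (DemoScorerBase, DemoCrossValidation) rather than the SG++ ones. The derived class implements clone() through its copy constructor, and the caller wraps the returned raw pointer to honor the ownership rule above.

```cpp
#include <cassert>
#include <memory>

// Hypothetical base class mirroring the clone() contract described above.
class DemoScorerBase {
 public:
  virtual ~DemoScorerBase() = default;
  virtual DemoScorerBase* clone() const = 0;  // caller owns the result
  virtual int folds() const = 0;
};

// Hypothetical derived class; clone() copies via the copy constructor.
class DemoCrossValidation : public DemoScorerBase {
 public:
  explicit DemoCrossValidation(int folds) : numFolds(folds) {
    assert(folds > 0);
  }
  DemoScorerBase* clone() const override {
    return new DemoCrossValidation(*this);
  }
  int folds() const override { return numFolds; }

 private:
  int numFolds;
};
```

Code that only holds a base reference can then copy the object without knowing its dynamic type, for example std::unique_ptr<DemoScorerBase> copy(base.clone()).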

Scorer & sgpp::datadriven::Scorer::operator= ( const Scorer &  rhs)

Copy assign operator.

Parameters
rhs    const reference to the scorer object to copy from.
Returns
Reference to this with updated values.

References metric, and shuffling.

Scorer& sgpp::datadriven::Scorer::operator= ( Scorer &&  rhs)
default

Move assign operator.

Parameters
rhs    R-value reference to a scorer object to move from.
Returns
Reference to this with updated values.

void sgpp::datadriven::Scorer::randomizeIndices ( const Dataset &  data,
std::vector< size_t > &  randomizedIndices 
)
protected

Helper method to generate an ordering for the samples of the dataset based on the shuffling functor.

Parameters
data    Dataset to be permuted.
randomizedIndices    Vector with the same size as the dataset. Will be initialized with contiguous values (0 -> vector.size()) and permuted in place.

References shuffling.

Referenced by sgpp::datadriven::CrossValidation::calculateScore(), and sgpp::datadriven::SplittingScorer::calculateScore().
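
The initialize-then-permute behaviour described for randomizedIndices can be sketched with the standard library. demoRandomizeIndices is a hypothetical stand-in: it uses std::shuffle with a seeded std::mt19937, whereas the actual permutation in SG++ is delegated to the configured sgpp::datadriven::ShufflingFunctor.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical stand-in for the shuffling step: fills the vector with
// 0, 1, ..., n-1 and permutes it in place using a seeded generator.
void demoRandomizeIndices(std::size_t numSamples,
                          std::vector<std::size_t>& randomizedIndices,
                          unsigned seed) {
  // The caller must pass a vector that already has one slot per sample.
  assert(randomizedIndices.size() == numSamples);
  std::iota(randomizedIndices.begin(), randomizedIndices.end(),
            std::size_t{0});
  std::mt19937 generator(seed);
  std::shuffle(randomizedIndices.begin(), randomizedIndices.end(), generator);
}
```

With a fixed seed the same ordering is produced on every call, which is what makes a split reproducible.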

double sgpp::datadriven::Scorer::refine ( ModelFittingBase &  model,
Dataset &  testDataset 
)
protected

Refine the model and evaluate the accuracy on the test set.

Includes some verbose output.

Parameters
model    Model which is refined based on the train dataset.
testDataset    Dataset used to quantify accuracy using the metric.
Returns
Accuracy of the fit after refinement.

References sgpp::datadriven::ModelFittingBase::getGrid(), sgpp::base::Grid::getSize(), sgpp::datadriven::ModelFittingBase::refine(), and test().

Referenced by sgpp::datadriven::CrossValidation::calculateScore(), and sgpp::datadriven::SplittingScorer::calculateScore().

void sgpp::datadriven::Scorer::splitSet ( const Dataset &  fullDataset,
Dataset &  trainDataset,
Dataset &  testDataset,
const std::vector< size_t > &  randomizedIndices,
size_t  offset = 0 
)
protected

Split dataset into testing and training set.

Parameters
fullDataset    Full dataset containing the samples to be split into testing and training set.
trainDataset    Dataset where training samples will be stored. Needs to have the correct size.
testDataset    Dataset where testing samples will be stored. Needs to have the correct size.
randomizedIndices    Vector of permuted indices, describing in which order samples will be read from the full dataset.
offset    Offset the testing set by the desired number of samples. Used to generate testing and training portions for cross validation. The samples skipped by the testing set because of the offset will be assigned to the training set.

References sgpp::base::DataVector::get(), sgpp::datadriven::Dataset::getData(), sgpp::datadriven::Dataset::getDimension(), sgpp::datadriven::Dataset::getNumberInstances(), sgpp::base::DataMatrix::getRow(), sgpp::datadriven::Dataset::getTargets(), sgpp::base::DataVector::set(), and sgpp::base::DataMatrix::setRow().

Referenced by sgpp::datadriven::CrossValidation::calculateScore(), and sgpp::datadriven::SplittingScorer::calculateScore().
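
One possible reading of the offset semantics, sketched on plain vectors instead of sgpp::datadriven::Dataset. demoSplitSet is hypothetical. It assumes the test set is the contiguous block of permuted positions starting at offset and that every other sample, including those skipped before the offset, goes to the training set. The exact layout used by SG++ is not stated above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: test samples are the block of test.size() permuted
// positions starting at offset; all remaining samples go to train.
void demoSplitSet(const std::vector<double>& full, std::vector<double>& train,
                  std::vector<double>& test,
                  const std::vector<std::size_t>& randomizedIndices,
                  std::size_t offset = 0) {
  // Output containers must be pre-sized, as the parameter docs require.
  assert(randomizedIndices.size() == full.size());
  assert(train.size() + test.size() == full.size());
  assert(offset + test.size() <= full.size());

  std::size_t trainPos = 0;
  std::size_t testPos = 0;
  for (std::size_t i = 0; i < randomizedIndices.size(); ++i) {
    const double sample = full[randomizedIndices[i]];
    if (i >= offset && i < offset + test.size()) {
      test[testPos++] = sample;
    } else {
      train[trainPos++] = sample;
    }
  }
}
```

For k-fold cross validation under this reading, fold number f would call the function with offset equal to f times the test set size, so each sample is tested exactly once.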

double sgpp::datadriven::Scorer::test ( ModelFittingBase &  model,
Dataset &  testDataset 
)
protected

Evaluate the accuracy on the test set using the metric.

Parameters
model    Model to be evaluated on the test dataset.
testDataset    Dataset used to quantify accuracy using the metric.
Returns
Accuracy of the fit.

References sgpp::datadriven::ModelFittingBase::evaluate(), sgpp::datadriven::Dataset::getData(), sgpp::datadriven::Dataset::getNumberInstances(), sgpp::datadriven::Dataset::getTargets(), and metric.

Referenced by refine(), and train().
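
The evaluate-then-score flow of the test step can be sketched with stand-ins. demoMse and demoTest are hypothetical. Mean squared error is used purely as an example metric, and the model is reduced to a function of one variable. The real method evaluates a ModelFittingBase on the dataset and passes the result to the configured sgpp::datadriven::Metric.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical metric: mean squared error between predictions and targets.
double demoMse(const std::vector<double>& predicted,
               const std::vector<double>& targets) {
  assert(predicted.size() == targets.size() && !targets.empty());
  double sum = 0.0;
  for (std::size_t i = 0; i < targets.size(); ++i) {
    const double diff = predicted[i] - targets[i];
    sum += diff * diff;
  }
  return sum / static_cast<double>(targets.size());
}

// Hypothetical test step: evaluate the model on every test sample,
// then hand predictions and targets to the metric.
double demoTest(const std::function<double(double)>& model,
                const std::vector<double>& samples,
                const std::vector<double>& targets) {
  std::vector<double> predicted;
  predicted.reserve(samples.size());
  for (double x : samples) predicted.push_back(model(x));
  return demoMse(predicted, targets);
}
```

Keeping the metric separate from the evaluation loop is what lets the same test step serve both train() and refine().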

double sgpp::datadriven::Scorer::train ( ModelFittingBase &  model,
Dataset &  trainDataset,
Dataset &  testDataset 
)
protected

Fit the model on the train dataset and evaluate the accuracy on the test set.

Includes some verbose output.

Parameters
model    Model to be fitted based on the train dataset.
trainDataset    Dataset used for fitting the model.
testDataset    Dataset used to quantify accuracy using the metric.
Returns
Accuracy of the fit.

References sgpp::datadriven::ModelFittingBase::fit(), and test().

Referenced by sgpp::datadriven::CrossValidation::calculateScore(), and sgpp::datadriven::SplittingScorer::calculateScore().

Member Data Documentation

std::unique_ptr<Metric> sgpp::datadriven::Scorer::metric
protected

sgpp::datadriven::Metric to be used to quantify accuracy of the fit.

std::unique_ptr<ShufflingFunctor> sgpp::datadriven::Scorer::shuffling
protected

sgpp::datadriven::ShufflingFunctor used to rearrange samples of a dataset in the desired manner, ready to be split into testing and training sets.

Referenced by operator=(), randomizeIndices(), Scorer(), and sgpp::datadriven::SplittingScorer::SplittingScorer().


The documentation for this class was generated from the following files: