SG++
Classification Example MultipleClassRefinement
Helper to create a learner

sgpp::datadriven::LearnerSGDE createSGDELearner(size_t dim, size_t level,
double lambda);

Helper to evaluate the classifiers

std::vector<std::string> doClassification(std::vector<sgpp::base::Grid*> grids,
std::vector<sgpp::base::DataVector*> alphas,
size_t classes);

This example shows how the multiple-class classification refinement strategy is used. For classification, a PDF is approximated for each class with LearnerSGDE, and each new data point is assigned to the class whose PDF yields the highest value at that point. This example is merely a technical demonstration.
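The decision rule described above (assign a point to the class whose estimated density is largest there) can be sketched in plain C++. This is a minimal illustration, not SG++ code: the per-class densities are hypothetical callables standing in for the PDFs that LearnerSGDE approximates, and `Density` and `classify` are names introduced only for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for an evaluated per-class PDF:
// maps a point (vector of coordinates) to a density value.
using Density = std::function<double(const std::vector<double>&)>;

// Returns the index of the class whose density is largest at x.
// Ties are resolved in favour of the lower class index.
std::size_t classify(const std::vector<Density>& classDensities,
                     const std::vector<double>& x) {
  assert(!classDensities.empty());
  std::size_t best = 0;
  double bestValue = classDensities[0](x);
  for (std::size_t c = 1; c < classDensities.size(); ++c) {
    const double value = classDensities[c](x);
    if (value > bestValue) {  // strict '>' keeps the first maximum on ties
      bestValue = value;
      best = c;
    }
  }
  return best;
}
```

For example, with two classes whose densities are x and 1 - x on [0, 1], the point 0.8 is assigned to class 0 and the point 0.2 to class 1.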

int main() {

All parameters are set at the beginning, so that all chosen settings can be seen at a glance.

// Parameter of data set
std::string filepath = "../tests/data/";
std::string filename = "multipleClassesTest.arff";
// classes in ARFF are in [0,(classes-1)]
size_t classes = 4;
// Parameter for initial grid generation
size_t dim = 2;
size_t level = 4;
double lambda = 1e-2;
// Parameter for refinement
size_t numSteps = 5;
size_t numRefinements = 3;
size_t partCombined = 0;
double thresh = 0;
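// Only calculation after here, no additional parameters set
// Read the data set (assumes ARFFTools::readARFFFromFile; older SG++ releases name it readARFF)
sgpp::datadriven::Dataset dataset = sgpp::datadriven::ARFFTools::readARFFFromFile(filepath + filename);
sgpp::base::DataMatrix dataTrain = dataset.getData();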
sgpp::base::DataVector targetTrain = dataset.getTargets();
std::cout << "Read training data: " << dataTrain.getNrows() << std::endl;

Empty DataMatrix objects are created, one per class, to be filled with the data points from the data set. A vector is used so that the number of classes stays flexible.

std::vector<sgpp::base::DataMatrix> dataCl;
std::vector<sgpp::datadriven::LearnerSGDE> learner;
for ( size_t i = 0 ; i < classes ; i++ ) {
dataCl.push_back(sgpp::base::DataMatrix(0, dataTrain.getNcols()));
}

The points are separated into the per-class DataMatrix objects according to their labels. This works for any number of classes, provided the class labels lie in [0, classes-1].

sgpp::base::DataVector row(dataTrain.getNcols());
for ( size_t i = 0 ; i < dataTrain.getNrows() ; i++ ) {
dataTrain.getRow(i, row);
dataCl.at(static_cast<size_t>(targetTrain.get(i))).appendRow(row);
}
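The separation step can be mirrored with standard containers. This sketch is plain C++, not the SG++ DataMatrix API: each row is a `std::vector<double>`, labels are stored as doubles as in an ARFF target vector, and the hypothetical helper `splitByClass` rejects labels outside [0, classes-1] (or non-integral ones) with `std::out_of_range`, similar to how `dataCl.at(...)` above throws for an out-of-range class index.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Row = std::vector<double>;
using Matrix = std::vector<Row>;

// Groups the rows of `data` by label; result[c] holds all rows of class c.
// Labels must be integral values in [0, classes-1]; anything else throws.
std::vector<Matrix> splitByClass(const Matrix& data,
                                 const std::vector<double>& labels,
                                 std::size_t classes) {
  assert(data.size() == labels.size());
  std::vector<Matrix> perClass(classes);  // one initially empty matrix per class
  for (std::size_t i = 0; i < data.size(); ++i) {
    const double label = labels[i];
    // Check before casting: converting a negative or NaN double to size_t is undefined.
    if (label < 0.0 || label >= static_cast<double>(classes) ||
        std::floor(label) != label) {
      throw std::out_of_range("label outside [0, classes-1]");
    }
    perClass[static_cast<std::size_t>(label)].push_back(data[i]);
  }
  return perClass;
}
```

Validating the label before the cast matters because the conversion from a negative or NaN double to an unsigned index is undefined behaviour rather than a guaranteed exception.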

A probability density function is approximated for each class with LearnerSGDE, one learner per class, and each learner is initialized with the data of its class.

for ( size_t i = 0 ; i < classes ; i++ ) {
std::cout << "Data points of class " << std::setw(3) << std::right << i << ": ";
std::cout << std::setw(14) << std::right << dataCl.at(i).getNrows() << " | ";
learner.push_back(createSGDELearner(dim, level, lambda));
learner.back().initialize(dataCl.at(i));
}
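The "one estimator per class" pattern can be illustrated without SG++. Note that this sketch swaps in a different technique: a Gaussian kernel density estimator with a fixed bandwidth `h`, not the sparse-grid density estimation (with regularization parameter lambda) that LearnerSGDE performs. `GaussianKde` and `fitPerClass` are hypothetical names; `initialize` only echoes the role of `LearnerSGDE::initialize` above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one LearnerSGDE: a Gaussian kernel density
// estimator with fixed bandwidth h (a different technique, for illustration only).
class GaussianKde {
 public:
  explicit GaussianKde(double h) : h_(h) { assert(h > 0.0); }

  // Stores the training points of one class.
  void initialize(const std::vector<std::vector<double>>& points) { points_ = points; }

  // Average of product Gaussian kernels centred at the stored points.
  double eval(const std::vector<double>& x) const {
    if (points_.empty()) return 0.0;
    const double pi = std::acos(-1.0);
    const double norm1d = 1.0 / (std::sqrt(2.0 * pi) * h_);
    double sum = 0.0;
    for (const auto& p : points_) {
      assert(p.size() == x.size());
      double k = 1.0;
      for (std::size_t d = 0; d < x.size(); ++d) {
        const double u = (x[d] - p[d]) / h_;
        k *= norm1d * std::exp(-0.5 * u * u);
      }
      sum += k;
    }
    return sum / static_cast<double>(points_.size());
  }

 private:
  double h_;
  std::vector<std::vector<double>> points_;
};

// Builds one estimator per class, mirroring the loop over `classes` above.
std::vector<GaussianKde> fitPerClass(
    const std::vector<std::vector<std::vector<double>>>& dataPerClass, double h) {
  std::vector<GaussianKde> learners;
  learners.reserve(dataPerClass.size());
  for (const auto& classData : dataPerClass) {
    learners.emplace_back(h);
    learners.back().initialize(classData);
  }
  return learners;
}
```

Each estimator sees only the points of its own class, so its value at a query point reflects how typical that point is for that class alone.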

Bundle the grid and surplus vector pointers of all learners; they are needed for refinement and evaluation.

std::vector<sgpp::base::Grid*> grids;
std::vector<sgpp::base::DataVector*> alphas;
for ( size_t i = 0 ; i < classes ; i++ ) {
grids.push_back(learner.at(i).getGrid().get());
alphas.push_back(learner.at(i).getSurpluses().get());
}
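The bundled pointers are non-owning: `get()` on a smart pointer hands out a raw pointer while the learner keeps ownership. The sketch below shows this with a hypothetical `Model` type whose surplus vector is held in a `std::shared_ptr`, loosely analogous to the smart pointers returned by `getGrid()` and `getSurpluses()` above; `bundleSurpluses` is likewise a name introduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical owner standing in for a learner that holds its data via shared_ptr.
struct Model {
  std::shared_ptr<std::vector<double>> surpluses;
};

// Collects non-owning raw pointers. They remain valid only while the owning
// shared_ptrs are alive and are not reset or reassigned.
std::vector<std::vector<double>*> bundleSurpluses(const std::vector<Model>& models) {
  std::vector<std::vector<double>*> out;
  out.reserve(models.size());
  for (const Model& m : models) {
    out.push_back(m.surpluses.get());
  }
  return out;
}
```

Because the bundle holds raw pointers rather than copies, writing through it (as a refinement step would) modifies the objects owned by the learners.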

Helper function that performs the classification, collects the predictions, and prints some error output.
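The core of such a helper (pick the best class per test point, then count mismatches) can be sketched independently of SG++. In this illustration, `evals[c][i]` is assumed to hold the density of class `c` evaluated at test point `i`, and `ClassificationResult` and `evaluate` are hypothetical names, not part of the example's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ClassificationResult {
  std::vector<std::size_t> predictions;  // predicted class per test point
  double errorRate;                      // fraction of misclassified points
};

// evals[c][i]: density of class c at test point i (assumed layout).
// Each point gets the class with the largest value (first one on ties),
// and the prediction is compared with the reference label.
ClassificationResult evaluate(const std::vector<std::vector<double>>& evals,
                              const std::vector<std::size_t>& labels) {
  assert(!evals.empty());
  const std::size_t n = labels.size();
  for (const auto& perClass : evals) assert(perClass.size() == n);
  ClassificationResult result{std::vector<std::size_t>(n, 0), 0.0};
  std::size_t wrong = 0;
  for (std::size_t i = 0; i < n; ++i) {
    double best = evals[0][i];  // start from class 0 instead of a sentinel value
    for (std::size_t c = 1; c < evals.size(); ++c) {
      if (evals[c][i] > best) {
        best = evals[c][i];
        result.predictions[i] = c;
      }
    }
    if (result.predictions[i] != labels[i]) ++wrong;
  }
  result.errorRate = n == 0 ? 0.0 : static_cast<double>(wrong) / static_cast<double>(n);
  return result;
}
```

Initializing the running maximum with the first class's value, rather than a sentinel such as -1000.0, avoids any assumption about the range of the density values, which matters because sparse-grid density estimates are not guaranteed to be non-negative.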

std::vector<std::string> doClassification(std::vector<sgpp::base::Grid*> grids,
std::vector<sgpp::base::DataVector*> alphas,
size_t classes) {
double best_eval = -1000.0;
double eval = 0.0;
sgpp::base::DataVector indices(testData.getNrows());
sgpp::base::DataVector evals(testData.getNrows());
std::vector<std::unique_ptr<sgpp::base::OperationEval>> evalOps;
for (size_t i = 0; i < grids.size(); i++) {
std::unique_ptr<sgpp::base::OperationEval> e(sgpp::op_factory::createOperationEval(*grids.at(i)));
evalOps.push_back(std::move(e));
}