Interaction Terms Aware Sparse Grids

This example compares standard sparse grids with sparse grids that only contain a subset of all possible interaction terms.

It uses the optical digits dataset from the UCI repository, in which each handwritten digit is represented by an 8×8 grid of 64 pixel intensities plus a class label.

import numpy as np
import pysgpp as sg; sg.omp_set_num_threads(4)
import pandas as pd
import sklearn.preprocessing as pre

This function scales all predictors to \([0, 1]\) so that they are suitable for sparse grids, which are defined on the unit hypercube.

def scale(df, scaler=None):
    Y = df.iloc[:, -1]  # save Y (no need to transform the class labels)
    X = df.values
    if scaler:
        X = scaler.transform(X)
    else:
        scaler = pre.MinMaxScaler()
        X = scaler.fit_transform(X)
    index = df.index
    columns = df.columns
    df = pd.DataFrame(data=X, index=index, columns=columns)
    df.iloc[:, -1] = Y
    return scaler, df
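
As a quick illustration (using a hypothetical three-row frame that is not part of the original example), the scaler fitted on one frame can be reused on further data, just as the test split is handled below:

toy = pd.DataFrame({"x0": [0.0, 8.0, 16.0], "x1": [4.0, 2.0, 0.0], "digit": [1, 7, 3]})
scaler, toy_scaled = scale(toy)    # fits a MinMaxScaler on the toy data
_, toy_again = scale(toy, scaler)  # reuses the already fitted scaler
print(toy_scaled)                  # x0 and x1 now lie in [0, 1]; digit is unchanged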

This function downloads the optical digits dataset from the UCI repository and performs the necessary preprocessing steps. The scaler is fitted on the combined training and test data and then applied to both parts, so that both are scaled consistently.

def get_dataset():
    train_url = "https://archive.ics.uci.edu/ml/machine-learning-databases/optdigits/optdigits.tra"
    test_url = "https://archive.ics.uci.edu/ml/machine-learning-databases/optdigits/optdigits.tes"
    print("Loading dataset from UCI repository.")
    columns = ["x{}".format(i) for i in range(0, 64)] + ['digit']
    df_train = pd.read_csv(train_url, header=None, index_col=None)
    df_test = pd.read_csv(test_url, header=None, index_col=None)
    df_train.columns = columns
    df_test.columns = columns
    print("Preprocessing dataset.")
    df_complete = pd.concat([df_train, df_test], ignore_index=True)
    scaler, _ = scale(df_complete)
    _, df_train = scale(df_train, scaler)
    _, df_test = scale(df_test, scaler)
    return df_train, df_test
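
A quick sanity check (not part of the original example): each returned frame should contain the 64 scaled pixel columns plus the digit label; for optdigits the two splits have 3823 and 1797 rows, respectively.

df_tr, df_te = get_dataset()
print(df_tr.shape, df_te.shape)         # expected: (3823, 65) and (1797, 65)
print(sorted(df_tr['digit'].unique()))  # the ten digit classes 0..9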

This function evaluates a sparse grid classification learner for a given set of interaction terms. It first trains the learner on the training part of the dataset and then computes its accuracy on the testing part. If no interaction terms are passed, a standard sparse grid containing all interaction terms is used.

def evaluate(X_tr, y_tr, X_te, y_te, interactions=None):
    grid = sg.RegularGridConfiguration()
    grid.dim_ = 64
    grid.level_ = 2
    grid.type_ = sg.GridType_ModLinear

    adapt = sg.AdaptivityConfiguration()
    adapt.numRefinements_ = 0
    adapt.noPoints_ = 0

    solv = sg.SLESolverConfiguration()
    solv.maxIterations_ = 50
    solv.eps_ = 10e-6
    solv.threshold_ = 10e-6
    solv.type_ = sg.SLESolverType_CG

    final_solv = sg.SLESolverConfiguration()
    final_solv.maxIterations_ = 200
    final_solv.eps_ = 10e-6
    final_solv.threshold_ = 10e-6
    final_solv.type_ = sg.SLESolverType_CG

    regular = sg.RegularizationConfiguration()
    regular.type_ = sg.RegularizationType_Identity
    regular.exponentBase_ = 1.0
    regular.lambda_ = 0.1

    X_tr = sg.DataMatrix(X_tr)
    y_tr = sg.DataVector(y_tr)
    X_te = sg.DataMatrix(X_te)
    y_te = sg.DataVector(y_te)

    if interactions is None:
        estimator = sg.ClassificationLearner(grid, adapt, solv, final_solv, regular)
    else:
        estimator = sg.ClassificationLearner(grid, adapt, solv, final_solv, regular, interactions)
    estimator.train(X_tr, y_tr)
    return estimator.getAccuracy(X_te, y_te)

def main():
    df_tr, df_te = get_dataset()
    X_tr = np.array(df_tr.iloc[:, 0:-1])
    y_tr = df_tr.iloc[:, -1].values
    X_te = np.array(df_te.iloc[:, 0:-1])
    y_te = df_te.iloc[:, -1].values

We first create all possible interactions between pixels whose pairwise \(L_2\) distance is smaller than \(\sqrt{2}\).

    nn = sg.NearestNeighbors(8, 8)
    interactions = nn.getAllInteractions(3, 2**0.5)
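
As an aside (this sketch stands outside main() and is not part of the original example), the following plain-Python snippet illustrates which pairwise terms such a neighbourhood criterion selects, assuming row-major pixel ordering and an inclusive distance threshold; the actual list returned by getAllInteractions is built by SG++ itself and may also contain terms of other orders.

import itertools

# Pixel (r, c) of the 8x8 image corresponds to dimension index 8*r + c.
pixels = [(r, c) for r in range(8) for c in range(8)]
pairs = [
    (8 * r1 + c1, 8 * r2 + c2)
    for (r1, c1), (r2, c2) in itertools.combinations(pixels, 2)
    if ((r1 - r2) ** 2 + (c1 - c2) ** 2) ** 0.5 <= 2 ** 0.5
]
print(len(pairs), "neighbouring pixel pairs, e.g.", pairs[:3])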

We then compare a learner that uses a standard sparse grid with one whose grid contains only the aforementioned interaction terms.

    standard_accuracy = evaluate(X_tr, y_tr, X_te, y_te)
    print("The standard sparse grid achieved an accuracy of {:2.3f}".format(standard_accuracy))
    ia_accuracy = evaluate(X_tr, y_tr, X_te, y_te, interactions)
    print("The interaction aware grid achieved an accuracy of {:2.3f}".format(ia_accuracy))

if __name__ == '__main__':
    main()