SG++-Doxygen-Documentation
Gaussian Weight Priors

This example compares two different Gaussian priors for sparse grid regression.

The first one is the standard prior that assumes a constant, identical variance for each weight. The second, improved prior uses the additional assumption that the true distribution function is sufficiently smooth and then imposes a multivariate Gaussian with the covariance matrix $$\mathbf{\Gamma}_{i,i} = 0.25^{\vert \mathbf{l} \vert_1 - d}$$, where $$d$$ corresponds to the dimension of the grid and $$\mathbf{\vert \mathbf{l} \vert_1}$$ to the level sum of the ith grid point.

1 import requests as r
2 import numpy as np
3 import pandas as pd
4 import sklearn.preprocessing as pre
5 from sklearn.cross_validation import KFold
6 from scipy import stats
7 from zipfile import ZipFile
8 import io
9 import pysgpp as sg; sg.omp_set_num_threads(4)

This function scales all predictors so that they are suitable for sparse grids.

1 def scale(df, scaler=None):
2  Y = df.ix[:,-1] # save Y (don't need to transform it/useless for cat. data!)
3  X = df.values
4  if scaler:
5  X = scaler.transform(X)
6  else:
7  scaler = pre.MinMaxScaler()
8  X = scaler.fit_transform(X)
9  index = df.index
10  columns = df.columns
11  df = pd.DataFrame(data=X, index=index, columns=columns)
12  df.ix[:,-1] = Y
13  return scaler, df

This function performs a Box-Cox transformation, which (hopefully) results in data that is better distributed.

1 def transform_cox(df, lambdas):
2  scaler = pre.MinMaxScaler()
3  for variable in lambdas:
4  lamb = lambdas[variable]
5  if lamb == 1:
6  continue # identity transform
7  data_trans = stats.boxcox(df[variable] + 10e-1)
8  df[variable] = scaler.fit_transform(np.array(data_trans[0]).reshape(-1, 1))
9  return df

This function downloads the PowerPlant dataset and performs the necessary preprocessing steps.

1 def get_dataset():

Download and unzip the dataset.

1  data_url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00294/CCPP.zip"
2  print("Loading power plant dataset from the UCI repository.")
3  resp = r.get(data_url, stream=True)
4  data = ZipFile(io.BytesIO(resp.content))
5  with data.open('CCPP/Folds5x2_pp.xlsx') as xls:
6  df = pd.read_excel(xls)
7  print("Preprocessing dataset.")

Then scale and transform it.

1  _, df = scale(df)
2  lambdas = {'AP': 0,
3  'AT': 1.3353837296219406,
4  u'PE': 1,
5  'RH': 2.4604945158589104,
6  'V': -0.43989911518156471}
7  df = transform_cox(df, lambdas)
8  return df

This function returns a sparse grid regressing learner.

1 def make_estimator(lambda_reg, prior):
2  grid = sg.RegularGridConfiguration()
3  grid.dim_ = 4
4  grid.level_ = 5
5  grid.type_ = sg.GridType_ModLinear
6
7  adapt = sg.AdaptivityConfiguration()
8  adapt.numRefinements_ = 5
9  adapt.noPoints_ = 3
10
11  solv = sg.SLESolverConfiguration()
12  solv.maxIterations_ = 50
13  solv.eps_ = 10e-6
14  solv.threshold_ = 10e-6
15  solv.type_ = sg.SLESolverType_CG
16
17  final_solv = solv
18  final_solv.maxIterations = 200
19
20  regular = sg.RegularizationConfiguration()
21  regular.type_ = sg.RegularizationType_Diagonal
22  regular.exponentBase_ = prior
23  regular.lambda_ = lambda_reg
24
25  estimator = sg.RegressionLearner(grid, adapt, solv, final_solv,regular)
26  return estimator

This function returns the testing error for one estimator.

1 def evaluate_one(estimator, X, y, train, test):
2  train_X = sg.DataMatrix(X[train])
3  train_y = sg.DataVector(y[train])
4  test_X = sg.DataMatrix(X[test])
5  test_y = sg.DataVector(y[test])
6  estimator.train(train_X,train_y)
7  error = estimator.getMSE(test_X, test_y)
8  return error

This function returns the cv-error for one estimator.

1 def evaluate(estimator, cv, X, y):
2  errors = []
3  for (train, test) in cv:
4  error = evaluate_one(estimator, X, y, train, test)
5  errors.append(error)
6  return np.mean(errors)

This function performs a grid search for the best regularization parameters.

1 def grid_search(X, y, prior):
2  cv = KFold(X.shape[0], 10)
3  lambda_grid = np.logspace(-9, -4, num=4)
4  best_result = np.finfo(np.double).max
5  for l in lambda_grid:
6  estimator = make_estimator(l, 1.0)
7  mse = evaluate(estimator, cv, X, y)
8  print("Prior={} with lambda={:2.4e} achieved a RMSE of {:2.4e}.".format(prior, l, np.sqrt(mse)))
9  best_result = min(best_result, mse)
10  return np.sqrt(best_result)
11
12 def main():
13  df = get_dataset()
14  X = np.array(df.ix[:,0:-1])
15  y = (df.ix[:,-1]).values

A parameter of 1 corresponds to the standard ridge prior, one of 0.25 to the improved prior.

1  best_identity = grid_search(X, y, 1.0)
2  best_improved = grid_search(X, y, 0.25)
3  print("\nThe identity matrix achieved an RMSE of {:3.5f}, the diagonal of {:3.5f}.".format(best_identity, best_improved))
4
5 if __name__ == '__main__':
6  main()