Gaussian process regressor returns almost identical std for all datapoints

Question

Ash

February 29, 2020, 15:07

I am using a Gaussian process regressor as the estimator for active learning, and I use its predictive standard deviation to choose the next training instance (the one with the highest std is chosen). However, the std values returned by the regressor are almost identical, as shown below. That doesn't seem right, especially since the algorithm's performance doesn't improve after it has been taught with 20 new instances that it queried.

I use this dataset. I divide the data into training, pool and test sets, and I draw the next most informative instance from the pool set. The pool contains about 40,000 instances, which makes it even stranger that they all get almost identical std values. The following function determines which instance to query:

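import numpy as np

def GP_regression_std(regressor, X_pool):
    # Predictive standard deviation for every instance in the pool
    _, std = regressor.predict(X_pool, return_std=True)
    # Query the instance the model is least certain about
    query_idx = np.argmax(std)
    print(max(std))
    print(min(std))
    print(query_idx)
    return query_idx, X_pool[query_idx]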

The following code calls the function above:

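from modAL.models import ActiveLearner
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.metrics import mean_squared_error

# In this section a Gaussian process regressor is used with the ActiveLearner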
kernel = RBF(length_scale=1.0, length_scale_bounds=(1e-2, 1e3)) \
         + WhiteKernel(noise_level=1, noise_level_bounds=(1e-10, 1e+1))

regressor = ActiveLearner(
    estimator=GaussianProcessRegressor(kernel=kernel),
    query_strategy=GP_regression_std,
    X_training=X_train, y_training=y_train
)

y_pred = regressor.predict(X_test)
print(mean_squared_error(y_test, y_pred))

n_queries = 10
for idx in range(n_queries):
    query_idx, query_instance = regressor.query(X_pool)
    query_instance = query_instance.reshape(1,-1)
    query_label = y_pool[query_idx].reshape(1,-1)
    regressor.teach(query_instance, query_label)
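    # Remove the queried instance from the pool, keeping features and labels aligned
    X_pool = np.delete(X_pool, query_idx, 0)
    y_pool = np.delete(y_pool, query_idx, 0)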

y_pred_final = regressor.predict(X_test)
print(mean_squared_error(y_test, y_pred_final))

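The output is below. The first and last lines are the test MSE before and after the queries; each group of three lines in between is the maximum std, the minimum std and the queried index for one query: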

39.80552437273547
3.1885401799262976
3.188521246966187
2807
3.1881109991791265
3.1880928244904516
21227
3.1876946235906867
3.1876781696380587
16901
3.18729221237594
3.187276617863699
15687
3.1869024420386807
3.186887562308426
17904
3.186524013419068
3.186510422046729
2204
3.186157844820368
3.1861446681566896
22230
3.1858019593432925
3.185789801369966
17653
3.1854569923377287
3.1854453279443624
37200
3.185121554225284
3.185110784757463
27299
39.89055846241666
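
For reference, the max-std selection rule can be reproduced without modAL. The following is only a minimal sketch on synthetic data: the `max_std_query` helper, the sine target, the kernel starting values and all array sizes are made up for illustration and are not my real dataset. It prints the std range over the pool and the fitted kernel, which is one way to see which hyperparameters the optimizer settled on.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel


def max_std_query(gp, X_pool):
    """Return the index, the row and the full std vector for the pool.

    Hypothetical helper: it mirrors the selection rule in the question
    (pick the pool instance with the largest predictive std).
    """
    _, std = gp.predict(X_pool, return_std=True)
    idx = int(np.argmax(std))
    return idx, X_pool[idx], std


# Synthetic data (assumption): training points cover only [0, 3],
# while the pool spans [0, 10].
rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 3.0, size=(15, 1))
y_train = np.sin(X_train).ravel() + 0.05 * rng.standard_normal(15)
X_pool = np.linspace(0.0, 10.0, 200).reshape(-1, 1)

# Same kernel family as in the question, with illustrative starting values.
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel, random_state=0)
gp.fit(X_train, y_train)

idx, x_next, std = max_std_query(gp, X_pool)
print("std range:", std.min(), std.max())
print("queried point:", x_next)
print("fitted kernel:", gp.kernel_)
```

Running the same three prints against the real estimator would show whether the narrow std range in my output goes together with a particular fitted length scale or noise level.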

Topic gaussian-process active-learning scikit-learn

Category Data Science
