Discussion:
[Scikit-learn-general] design of scorer interface
Aaron Staple
2014-10-27 02:33:27 UTC
Permalink
Greetings sklearn developers,

I’m a new sklearn contributor, and I’ve been working on a small project to
allow customization of the scoring metric used when scoring out of bag data
for random forests (see
https://github.com/scikit-learn/scikit-learn/pull/3723). In this PR,
@mblondel and I have been discussing an architectural issue that we would
like others to weigh in on.

While working on my implementation, I’ve run into a bit of difficulty using
the scorer implementation as it exists today - in particular, with the
interface expressed in _BaseScorer. The current _BaseScorer interface is
callable, accepting an estimator (utilized as a Predictor), along with some
prediction data points X, and returning a score. The various _BaseScorer
implementations compute a score by calling estimator.predict(X),
estimator.predict_proba(X), or estimator.decision_function(X) as needed,
possibly applying some transformations to the results, and then applying a
score function.
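
For illustration, here is a stripped-down sketch of that calling convention
(simplified stand-in names, not the actual classes in sklearn/metrics/scorer.py):

class ProbaScorerSketch:
    def __init__(self, score_func, sign=1, **kwargs):
        self._score_func = score_func
        self._sign = sign
        self._kwargs = kwargs

    def __call__(self, estimator, X, y_true):
        # the scorer, not the caller, decides which prediction method to use
        y_proba = estimator.predict_proba(X)
        return self._sign * self._score_func(y_true, y_proba, **self._kwargs)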

The issue I’ve run into is that predicting out of bag samples is a rather
specialized procedure because the model used differs for each training
point, based on how that point was used during fitting. Computing these
predictions is not particularly suited for implementation as a Predictor.
In addition, in the PR we’ve been discussing the idea that a random forest
estimator will make its out of bag predictions available as attributes,
allowing a user of the estimator to subsequently score these provided
predictions. Also, @mblondel mentioned that for his work on multiple-metric
grid search, he is interested in scoring predictions he computes outside of
a Predictor.

The difficulty is that the current scorers take an estimator and data
points, and compute predictions internally. They don’t accept externally
computed predictions.

I’ve written up a series of different generalized options for implementing
a system of scoring externally computed predictions (some are likely
undesirable but are provided as points of comparison):

1) Add a new implementation that’s completely separate from the existing
_BaseScorer class.

2) Use the existing _BaseScorer without changes. This means abusing the
Predictor interface by creating something like a dummy predictor that
ignores X and simply returns predictions that were computed externally for
that same X (a sketch of such a dummy predictor follows this list).

3) Add a private api to _BaseScorer for scoring externally computed
predictions. The private api can be called by a public helper function in
scorer.py.

4) Change the public api of _BaseScorer to make scoring of externally
computed predictions a public operation along with the existing
functionality. Also possibly rename _BaseScorer => BaseScorer.

5) Change the public api of _BaseScorer so that it only handles externally
computed predictions. The existing functionality would be implemented by
the caller (as a callback, since the required type of prediction data is
not known by the caller).
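
To make option 2 concrete, here is a rough sketch of such a dummy predictor
(illustrative names only, not proposed code):

class _PrecomputedPredictions:
    """Pretends to be a Predictor but ignores X entirely."""

    def __init__(self, y_pred):
        self._y_pred = y_pred

    def predict(self, X):
        # X is ignored; y_pred was computed elsewhere for that same X
        return self._y_pred

# an unchanged scorer could then be reused as:
#     score = scorer(_PrecomputedPredictions(oob_pred), X, y_true)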

So far in the PR we’ve been looking at options 2, 3, and 4, with 4 seeming
like a good candidate. Once we decide on one of these options, I’d like to
follow up with stakeholders on the specifics of what the new interface will
look like.

Thanks,
Aaron Staple
Mathieu Blondel
2014-10-27 13:41:01 UTC
Permalink
In addition to out-of-bag scores and multi-metric grid search, there are
also LOO scores in the ridge regression module, as pointed out by Michael.

Option 4 seems like the best option to me.

We keep __call__(self, estimator, X, y) for backward compatibility and
because it is sometimes more convenient. But we also add a new method
get_score(self, y, y_pred=None, y_proba=None, y_decision=None) for computing
scores from pre-computed predictions. For example, this is how we would
implement it in _ProbaScorer:

def get_score(self, y, y_pred=None, y_proba=None, y_decision=None,
              sample_weight=None):
    if y_proba is None:
        raise ValueError("This scorer needs y_proba.")

    if sample_weight is not None:
        return self._sign * self._score_func(y, y_proba,
                                             sample_weight=sample_weight,
                                             **self._kwargs)
    else:
        return self._sign * self._score_func(y, y_proba, **self._kwargs)
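
For example, a call site could then look roughly like this (sketch only:
get_score is the method proposed above and does not exist yet; get_scorer,
oob_score=True and oob_prediction_ are existing scikit-learn names):

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics.scorer import get_scorer

rng = np.random.RandomState(0)
X_train = rng.rand(100, 4)
y_train = X_train[:, 0] + 0.1 * rng.randn(100)

forest = RandomForestRegressor(n_estimators=50, oob_score=True,
                               random_state=0).fit(X_train, y_train)
scorer = get_scorer("mean_squared_error")
# proposed: score the precomputed out-of-bag predictions directly
oob_mse = scorer.get_score(y_train, y_pred=forest.oob_prediction_)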

M.
Andy
2014-10-28 18:23:03 UTC
Permalink
As for the oob scores, I don't currently see how you would use LOO
scores with a scorer.
Is that for generalized cross-validation in RidgeCV?
Michael Eickenberg
2014-10-28 18:43:47 UTC
Permalink
It is true that RidgeGCV only does LOO predictions and thus would need a
scorer that makes sense on a single (y_true, y_pred) pair, such as MSE. (The
way it is implemented now for arbitrary scorers is not correct.) So point
taken, RidgeGCV is an exception.

michael
Andy
2014-10-28 18:21:04 UTC
Permalink
Hi.
Can you give a bit more detail on 3 and 4?
And can you give an example use case?
When do you need scorers and out-of-bag samples? The scorers are used in
GridSearchCV and cross_val_score, but the out-of-bag samples basically
replace cross-validation, so I don't quite understand how these would work
together.

I think it would be great if you could give a use-case and some (pseudo)
code on how it would look with your favourite solution.

Cheers,
Andy
Mathieu Blondel
2014-10-29 02:10:55 UTC
Permalink
Different metrics require different inputs (results of predict,
decision_function, predict_proba). To avoid branching in the grid search
and cross-validation, we thus introduced the scorer API. A scorer knows
what kind of input it needs and calls predict, decision_function,
predict_proba as needed. We would like to reuse the scorer logic for
out-of-bag scores as well, in order to avoid branching. The problem is that
the scorer API is not suitable if the predictions are already available.
RidgeCV works around this by creating a constant predictor but this is in
my opinion an ugly hack. The get_score method I proposed would avoid
branching, although it would require computing y_pred, y_decision and
y_proba.

In the classification case, another idea would be to compute out-of-bag
probabilities. Then a score would be obtained by calling a
get_score_from_proba method. This method would be implemented as follows:

class _PredictScorer(_BaseScorer):
    def get_score_from_proba(self, y, y_proba, classes):
        y_pred = classes[np.argmax(y_proba, axis=1)]
        return self._sign * self._score_func(y, y_pred, **self._kwargs)


class _ProbaScorer(_BaseScorer):
    def get_score_from_proba(self, y, y_proba, classes):
        return self._sign * self._score_func(y, y_proba, **self._kwargs)
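
A forest's oob scoring could then call this along these lines (sketch only:
oob_decision_function_ and classes_ already exist on RandomForestClassifier,
get_score_from_proba is just the proposal above, and the scorer lookup is
assumed to come from scorer.py's get_scorer):

# inside the forest's fit(), roughly:
scorer = get_scorer(self.oob_scoring)  # hypothetical parameter, e.g. "accuracy"
self.oob_score_ = scorer.get_score_from_proba(
    y, self.oob_decision_function_, self.classes_)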

The nice thing about predict_proba is that it consistently returns an array
of shape (n_samples, n_classes). decision_function is more problematic
because it doesn't return an array of shape (n_samples, 2) in the binary
case. There was a discussion a long time ago about adding a predict_score
method that would be more consistent in this regard, but I don't remember
the outcome of that discussion.
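
For instance, on a binary problem:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=20, random_state=0)
print(LogisticRegression().fit(X, y).predict_proba(X).shape)   # (20, 2)
print(LinearSVC().fit(X, y).decision_function(X).shape)        # not (20, 2)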

I don't agree that RidgeCV is an exception. If your labels are binary, it
is perfectly valid to train a regressor on them and want to compute ranking
metrics like AUC or Average Precision. And there is RidgeClassifierCV too.

Mathieu
Aaron Staple
2014-10-29 06:35:39 UTC
Permalink
Following up on Andy’s questions:

The scorer implementation provides a registry of named scorers, and these
scorers may implement specialized logic such as choosing to call an
appropriate predictor method or munging the output of predict_proba. My
task was to make oob scoring support the same set of named scoring metrics
as cv, so my inclination was to use the existing scorers rather than start
from scratch. (Writing a separate implementation would be option #1 in my
list above.)
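
For reference, that registry lookup looks roughly like this (SCORERS and
get_scorer live in sklearn/metrics/scorer.py; treat the snippet as a sketch
rather than exact current code):

from sklearn.metrics.scorer import SCORERS, get_scorer

print(sorted(SCORERS))          # the named metrics GridSearchCV understands
scorer = get_scorer("roc_auc")  # same object resolved from scoring="roc_auc"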

I’ve also written up some examples (copying details from @mblondel’s
example earlier).

For #3, the interface might look something like:

class _BaseScorer(object):
    # (metaclass and constructor omitted here)

    @abstractmethod
    def __call__(self, estimator, X, y_true, sample_weight=None):
        pass

    @abstractmethod
    def _score(self, y_true, y_prediction=None, y_prediction_proba=None,
               y_decision_function=None):
        pass


class _ProbaScorer(_BaseScorer):
    # (existing __call__ implementation unchanged)

    def _score(self, y, y_pred=None, y_proba=None, y_decision=None,
               sample_weight=None):
        if y_proba is None:
            raise ValueError("This scorer needs y_proba.")
        if sample_weight is not None:
            return self._sign * self._score_func(y, y_proba,
                                                 sample_weight=sample_weight,
                                                 **self._kwargs)
        else:
            return self._sign * self._score_func(y, y_proba, **self._kwargs)


And then there would be a function

def getScore(scoring, y_true, y_prediction=None, y_prediction_proba=None,
             y_decision_function=None):
    return lookup(scoring)._score(y_true, y_prediction, y_prediction_proba,
                                  y_decision_function)

(more detail in a possible variation of this at
https://github.com/staple/scikit-learn/blob/3455/sklearn/metrics/scorer.py,
where the __call__ and _score methods share an implementation.)

For #4,

class _BaseScorer(object):
    # (metaclass and constructor omitted here)

    @abstractmethod
    def __call__(self, estimator, X, y_true, sample_weight=None):
        pass

    @abstractmethod
    def get_score(self, y_true, y_prediction=None, y_prediction_proba=None,
                  y_decision_function=None):
        pass


class _ProbaScorer(_BaseScorer):
    # (existing __call__ implementation unchanged)

    def get_score(self, y, y_pred=None, y_proba=None, y_decision=None,
                  sample_weight=None):
        if y_proba is None:
            raise ValueError("This scorer needs y_proba.")
        if sample_weight is not None:
            return self._sign * self._score_func(y, y_proba,
                                                 sample_weight=sample_weight,
                                                 **self._kwargs)
        else:
            return self._sign * self._score_func(y, y_proba, **self._kwargs)
Aaron Staple
2014-11-03 00:44:52 UTC
Permalink
Hi folks,

I went ahead and made a POC for a more complete implementation of option #4:

https://github.com/staple/scikit-learn/commit/e76fa8887cd35ad7a249ee157067cd12c89bdefb

Aaron
Andy
2014-11-03 16:27:40 UTC
Permalink
Cool, I hope I have time to review it in the next couple of days.
Aaron Staple
2014-11-28 08:14:43 UTC
Permalink
Hi Again Folks,

After discussion with Andreas, we decided to move to the PR stage with
option #4 (adding a get_score method to the scorer interface). Andreas
advised me that this PR should include fixing _RidgeGCV.fit so that it
calls the new get_score method.

In the above thread there was some discussion regarding whether or not
ridge cv is a case where the scorer interface should be used at all, and in
particular whether categorical scoring functions are valid for ridge cv. In
the final comment on this topic Mathieu suggested that the scorer interface
should be used, and that ideally categorical scoring functions would be
supported for RidgeCV on the 0-1 prediction domain and for
RidgeClassifierCV.

However, I tried to run a couple of test cases with 0-1 predictions for
RidgeCV and classification with RidgeClassifierCV, and I got some error
messages. It looks like one reason for this is that
LinearModel._center_data can convert the y values to non integers. In
addition, it appears that in the case of multiclass classification the
scorer is applied to the ravel()’ed list of one-vs-all classifiers and not
to the actual class predictions. Am I right in thinking that this can
affect the classification score for some scorers? For example, consider a
simple accuracy scorer and just one prediction. It is possible for some
one-vs-all classifiers to be predicted correctly while the overall class
prediction is wrong - thus the accuracy score over the one-vs-all
classifiers would be nonzero while the overall classification accuracy is
zero. (In addition, if I am reading correctly I believe the y_true and
y_predicted values are possibly being passed incorrectly to the scorer
currently, and are being swapped with each other.)
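
Here is a tiny made-up numeric example of that effect (the numbers are
invented, not taken from the ridge code):

import numpy as np

y_true_ova = np.array([[1, 0, 0]])        # one sample, true class 0, one-vs-all encoding
decision = np.array([[0.6, 0.9, -0.2]])   # per-class scores; argmax picks class 1
y_pred_ova = (decision > 0).astype(int)   # [[1, 1, 0]]

# accuracy over the ravel()'ed one-vs-all outputs: 2 of 3 entries match
print((y_true_ova.ravel() == y_pred_ova.ravel()).mean())   # 0.666...
# accuracy of the actual class prediction: argmax says class 1, which is wrong
print(float(np.argmax(decision, axis=1)[0] == 0))          # 0.0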

Given these observations I wanted to double check 1) that we want to
support classification scorers and not just regression scorers at this
precise location in this code and 2) that I should start using get_score in
this location now, given that I believe at least some additional work will
be needed for support of classification scorers.

Thanks,
Aaron

PS Here are my simple test cases:

>>> import numpy as np
>>> from sklearn.linear_model import RidgeClassifierCV, RidgeCV
>>> clf = RidgeCV(scoring='roc_auc')
>>> y = np.array([0, 1, 1])
>>> X = np.array([[0, 0], [0, 1], [2, 3]])
>>> clf.fit(X, y)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "sklearn/linear_model/ridge.py", line 858, in fit
    estimator.fit(X, y, sample_weight=sample_weight)
  File "sklearn/linear_model/ridge.py", line 801, in fit
    for i in range(len(self.alphas))]
  File "sklearn/metrics/scorer.py", line 157, in __call__
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: continuous format is not supported

>>> clf = RidgeClassifierCV(scoring='roc_auc')
>>> y = np.array([0, 1, 1])
>>> X = np.array([[0, 0], [0, 1], [2, 3]])
>>> clf.fit(X, y)
/usr/local/lib/python2.7/dist-packages/numpy/core/fromnumeric.py:2499: VisibleDeprecationWarning: `rank` is deprecated; use the `ndim` attribute or function instead. To find the rank of a matrix see `numpy.linalg.matrix_rank`.
  VisibleDeprecationWarning)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "sklearn/linear_model/ridge.py", line 1069, in fit
    _BaseRidgeCV.fit(self, X, Y, sample_weight=sample_weight)
  File "sklearn/linear_model/ridge.py", line 858, in fit
    estimator.fit(X, y, sample_weight=sample_weight)
  File "sklearn/linear_model/ridge.py", line 801, in fit
    for i in range(len(self.alphas))]
  File "sklearn/metrics/scorer.py", line 157, in __call__
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: continuous format is not supported
Mathieu Blondel
2014-11-28 10:40:16 UTC
Permalink
Post by Aaron Staple
[...]
However, I tried to run a couple of test cases with 0-1 predictions for
RidgeCV and classification with RidgeClassifierCV, and I got some error
messages. It looks like one reason for this is that
LinearModel._center_data can convert the y values to non integers. [...]
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/ridge.py#L800

Shouldn't this line use the unnormalized y? Otherwise, this is evaluating a
different problem.

BTW, the scorer handling in RidgeCV is currently broken.
Post by Aaron Staple
Given these observations I wanted to double check 1) that we want to
support classification scorers and not just regression scorers at this
precise location in this code and 2) that I should start using get_score in
this location now, given that I believe at least some additional work will
be needed for support of classification scorers.
I was more talking about ranking scorers.

# y contains binary values
y_pred = RandomForestRegressor().fit(X, y).predict(X)
print roc_auc_score(y, y_pred)

# y contains ordinal values
y_pred = RandomForestRegressor().fit(X, y).predict(X)
print ndcg_score(y, y_pred) # not yet in scikit-learn

For me these two use cases are perfectly legitimate. Now, I would really
like to use GridSearchCV to tune the RF hyper-parameters against AUC or
NDCG but the scorer API insists on calling either predict_proba or
decision_function.
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/metrics/scorer.py#L159

If we could detect that an estimator is a regressor, we could call
"predict" instead, but we currently have no way to know that. We can't check
isinstance(estimator, RegressorMixin) since we can't even expect a
third-party regression class to inherit from RegressorMixin (as per our
current API "specification").

M.
Mathieu Blondel
2014-11-28 15:05:08 UTC
Permalink
Here's a proof of concept that introduces a new method "predict_score":
https://github.com/mblondel/scikit-learn/commit/0b06d424ea0fe40148436846c287046549419f03

The role of this method is to get continuous-output predictions from both
classifiers and regressors in a consistent manner. This way the predicted
continuous outputs can be passed to ranking metrics like roc_auc_score. The
advantage of this solution is that third-party code can reimplement
"predict_score" without depending on scikit-learn.

Another solution is to use isinstance(estimator, RegressorMixin) inside the
scorer to detect if an estimator is a regressor and use predict instead of
predict_proba / decision_function. This assumes that the estimator inherits
from RegressorMixin and therefore, the code must depend on scikit-learn.

M.
Michael Eickenberg
2014-11-28 15:29:26 UTC
Permalink
Hi Mathieu,

is that the right name for this behaviour?

When I read the name, I thought you were proposing a function like
`fit_transform` in the sense that by default it would call `predict` and
then score the result with a given scorer and some ground truth information
(e.g. y_true from a cv fold). Any estimator that could do this better than
by following this standard procedure would then get its chance to do so.
The signature of this function would then have to take this ground truth
data and a scorer as optional inputs.

(Secretly I have been wanting this feature but never dared to ask if I can
implement it. The function cross_val_score would benefit from it.)

What you are proposing seems to group/generalize `predict_proba` and
`decision_function` into one. This is useful in many cases, but isn't there
a risk of introducing some uncontrollable magic here if several options are
available per estimator?

Michael
Post by Mathieu Blondel
https://github.com/mblondel/scikit-learn/commit/0b06d424ea0fe40148436846c287046549419f03
The role of this method is to get continuous-output predictions from both
classifiers and regressors in a consistent manner. This way the predicted
continuous outputs can be passed to ranking metrics like roc_auc_score. The
advantage of this solution is that third-party code can reimplement
"predict_score" without depending on scikit-learn.
Another solution is to use isinstance(estimator, RegressorMixin) inside
the scorer to detect if an estimator is a regressor and use predict instead
of predict_proba / decision_function. This assumes that the estimator
inherits from RegressorMixin and therefore, the code must depend on
scikit-learn.
M.
Mathieu Blondel
2014-11-28 15:50:00 UTC
Permalink
On Sat, Nov 29, 2014 at 12:29 AM, Michael Eickenberg wrote:
Post by Michael Eickenberg
Hi Mathieu,
is that the right name for this behaviour?
I agree, the name "predict_score" can be misleading. Another name I had in
mind would be "predict_confidence".
Post by Michael Eickenberg
When I read the name, I thought you were proposing a function like
`fit_transform` in the sense that by default it would call `predict` and
then score the result with a given scorer and some ground truth information
(e.g. y_true from a cv fold). Any estimator that could do better than this
standard procedure would then get its chance to do so.
The signature of this function would then have to take this ground truth
data and a scorer as optional inputs.
(Secretly I have been wanting this feature but never dared to ask if I can
implement it. The function cross_val_score would benefit from it.)
What you are proposing seems to group/generalize `predict_proba` and
`decision_function` into one. This is useful in many cases, but isn't there
a risk of introducing some uncontrollable magic here if several options are
available per estimator?
The scorer API is already choosing decision_function arbitrarily when both
predict_proba and decision_function are available.
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/metrics/scorer.py#L159

However, except on rare occasions (e.g., SVC because of Platt calibration),
predict_proba and decision_function should agree on their predictions
(i.e., when taking the argmax).

This solution is intended to be "duck-typing" friendly. Personally, I think
it would make our lives easier if we could just assume that all regressors
inherit from RegressorMixin.

M.

Mathieu Blondel
2014-11-28 16:16:29 UTC
Permalink
I forgot to mention that in "Ridge", decision_function is an alias for
predict, precisely to allow grid searching against AUC and other ranking
metrics.
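
The pattern is roughly the following; a toy illustration only, not the
actual Ridge code (the class name is made up):

import numpy as np

class TinyRegressor(object):
    def fit(self, X, y):
        # Plain least squares via the pseudo-inverse, just to keep the
        # example self-contained.
        self.coef_ = np.linalg.pinv(np.asarray(X)).dot(np.asarray(y))
        return self

    def predict(self, X):
        return np.asarray(X).dot(self.coef_)

    # Expose the same continuous output under the name threshold-based
    # scorers look for, so ranking metrics like AUC work in grid search.
    decision_function = predict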

M.
Aaron Staple
2014-11-29 02:33:09 UTC
Permalink
Hi Mathieu,

Thanks for the information you’ve provided about the ridge implementation
and your suggestions for scoring rankings.

First off, I’d like to try to contain the scope of the project I’m working
on. Would it be reasonable for me to first add get_score to the scorer,
along with the oob implementation for random forests? This discussion of
the ridge code seems to have raised a new set of design and implementation
questions that could be addressed separately.

Also, I am coming up to speed on your suggestions regarding support for
ranked scoring of regression predictions.

My impression is that in sklearn regressors typically implement predict,
while classifiers typically implement predict as well as predict_proba
and/or decision_function. Currently ThresholdScorer attempts to call
decision_function() and, if that fails, falls back to predict_proba(). What
if ThresholdScorer were extended to also call predict() if neither
decision_function nor predict_proba exists? That way predict() would be
called for regressors without any interface change for estimators.
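
Something along these lines; a rough sketch of the fallback order only, not
the actual ThresholdScorer code (the helper name is made up):

import numpy as np

def _continuous_scores(estimator, X):
    # Prefer decision_function, then predict_proba, and finally fall back
    # to predict() so that regressors can be scored as well.
    if hasattr(estimator, "decision_function"):
        return estimator.decision_function(X)
    if hasattr(estimator, "predict_proba"):
        proba = np.asarray(estimator.predict_proba(X))
        # For binary problems, keep the probability of the positive class.
        if proba.ndim == 2 and proba.shape[1] == 2:
            return proba[:, 1]
        return proba
    return estimator.predict(X)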

I suppose there may be cases where a classifier implements neither
predict_proba nor decision_function; with this change, predict() would be
used there instead of raising an error when a threshold scorer is applied
(the current behavior). In that case - a classifier that supports only
predict, used with a threshold scoring function - how bad would it be if
the predict() classification result were interpreted as a coarse (integer)
ordinal instead of raising an error? I don’t know the answer to this -
again, please excuse my newness to sklearn - but I thought I would at least
raise the possibility, since it doesn’t require an interface change.

Aaron
Mathieu Blondel
2014-11-29 03:57:39 UTC
Permalink
Post by Aaron Staple
Hi Mathieu,
Thanks for the information you’ve provided about the ridge implementation
and your suggestions for scoring rankings.
First off, I’d like to try to contain the scope of the project I’m working
on. Would it be reasonable for me to first add get_score to the scorer,
along with the oob implementation for random forests? This discussion of
the ridge code seems to have raised a new set of design and implementation
questions that could be addressed separately.
I understand, but we must also be careful not to create a half-baked API.
Post by Aaron Staple
Also, I am coming up to speed on your suggestions regarding support for
ranked scoring of regression predictions.
My impression is that in sklearn regressors typically implement predict,
while classifiers typically implement predict as well as predict_proba
and/or decision_function. Currently ThresholdScorer attempts to call
decision_function() and, if that fails, falls back to predict_proba(). What
if ThresholdScorer were extended to also call predict() if neither
decision_function nor predict_proba exists? That way predict() would be
called for regressors without any interface change for estimators.
This is a good suggestion. To summarize, here are our options:

- introduce a new method predict_score / predict_confidence (pro:
  duck-typing friendly, con: one more method)
- use isinstance(estimator, RegressorMixin) to detect regressors (pro:
  simple, con: assumes inheritance)
- make decision_function an alias of predict in all regressors (pro:
  simple, con: can no longer detect classifiers with hasattr(estimator,
  "decision_function"))
- call predict if neither predict_proba nor decision_function is available
  (pro: simple, con: can't raise an exception for classifiers with predict
  only)

What do people think?

Mathieu
Joel Nothman
2014-11-30 07:29:40 UTC
Permalink
So far I only have a strong opinion on not relying on the presence of
decision_function or predict_proba to identify a classifier.

Also, is the distinction we seek precisely between classifiers and
regressors, or between categorical and continuous predictors? (i.e., do we
care that clusterers and classifiers fall together?)
Alexandre Gramfort
2014-11-30 08:39:19 UTC
Permalink
What I suggest:

use isinstance(estimator, RegressorMixin) to know whether we can use predict
safely. If we can't rely on inheritance, call predict when neither
predict_proba nor decision_function is available, and check that the
predicted values are of type float32 or float64.
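
In code, something like the following; a rough sketch of this combined
heuristic (the helper name is made up, and this is not an existing
scikit-learn function):

import numpy as np
from sklearn.base import RegressorMixin

def _scores_for_ranking(estimator, X):
    # Trust the inheritance-based check when it is available.
    if isinstance(estimator, RegressorMixin):
        return estimator.predict(X)
    # Otherwise fall back through the usual methods ...
    if hasattr(estimator, "decision_function"):
        return estimator.decision_function(X)
    if hasattr(estimator, "predict_proba"):
        return estimator.predict_proba(X)
    # ... and finally to predict(), checking that the output is continuous.
    y_pred = np.asarray(estimator.predict(X))
    if y_pred.dtype not in (np.float32, np.float64):
        raise ValueError("predict() did not return float32/float64 values; "
                         "refusing to use it for a ranking metric.")
    return y_pred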

Alex
Andy
2014-12-01 16:10:51 UTC
Permalink
Post by Mathieu Blondel
- introduce a new method predict_score / predict_confidence (pro:
  duck-typing friendly, con: one more method)
- use isinstance(estimator, RegressorMixin) to detect regressors (pro:
  simple, con: assumes inheritance)
- make decision_function an alias of predict in all regressors (pro:
  simple, con: can no longer detect classifiers with hasattr(estimator,
  "decision_function"))
- call predict if neither predict_proba nor decision_function is available
  (pro: simple, con: can't raise an exception for classifiers with predict
  only)
What do people think?
I think I would currently favour the "decision_function" option.
Do we ever make use of the presence of "decision_function" to decide
whether something is a classifier?
I am not aware of any.

On a side note: when do we need to distinguish classifiers and regressors?
We currently use that distinction mainly to switch between stratified
cross-validation and k-fold cross-validation, right?
And it is currently implemented using inheritance (so third-party
estimators are regressors by default).


After fitting, I think the presence of ``classes_`` would be a good way
to detect a classifier, but I guess we need that information before
fitting, for cross-validation.
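
To illustrate the two detection points (a hypothetical helper, not existing
scikit-learn code):

from sklearn.base import ClassifierMixin

def looks_like_classifier(estimator):
    # Before fitting we can only rely on inheritance; third-party
    # estimators that do not inherit ClassifierMixin are treated as
    # regressors by default.
    if isinstance(estimator, ClassifierMixin):
        return True
    # After fitting, a classifier exposes the classes it was trained on.
    return hasattr(estimator, "classes_")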


Cheers,
Andy

Andy
2014-12-01 16:03:05 UTC
Permalink
First off, I’d like to try to contain the scope of the project I’m working
on. Would it be reasonable for me to first add get_score to the scorer,
along with the oob implementation for random forests? This discussion of
the ridge code seems to have raised a new set of design and implementation
questions that could be addressed separately.
+1