Discussion:
[Scikit-learn-general] design of scorer interface
Aaron Staple
2014-10-27 02:33:27 UTC
Permalink
Greetings sklearn developers,

I’m a new sklearn contributor, and I’ve been working on a small project to
allow customization of the scoring metric used when scoring out of bag data
for random forests (see
https://github.com/scikit-learn/scikit-learn/pull/3723). In this PR,
@mblondel and I have been discussing an architectural issue that we would
like others to weigh in on.

While working on my implementation, I’ve run into a bit of difficulty using
the scorer implementation as it exists today - in particular, with the
interface expressed in _BaseScorer. The current _BaseScorer interface is
callable, accepting an estimator (utilized as a Predictor), along with some
prediction data points X, and returning a score. The various _BaseScorer
implementations compute a score by calling estimator.predict(X),
estimator.predict_proba(X), or estimator.decision_function(X) as needed,
possibly applying some transformations to the results, and then applying a
score function.
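
For illustration, here is a stripped-down sketch of that calling convention
(simplified stand-in names, not the actual classes in sklearn/metrics/scorer.py):

class ProbaScorerSketch:
    def __init__(self, score_func, sign=1, **kwargs):
        self._score_func = score_func
        self._sign = sign
        self._kwargs = kwargs

    def __call__(self, estimator, X, y_true):
        # the scorer, not the caller, decides which prediction method to use
        y_proba = estimator.predict_proba(X)
        return self._sign * self._score_func(y_true, y_proba, **self._kwargs)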

The issue I’ve run into is that predicting out of bag samples is a rather
specialized procedure because the model used differs for each training
point, based on how that point was used during fitting. Computing these
predictions is not particularly suited for implementation as a Predictor.
In addition, in the PR we’ve been discussing the idea that a random forest
estimator will make its out of bag predictions available as attributes,
allowing a user of the estimator to subsequently score these provided
predictions. Also, @mblondel mentioned that for his work on multiple-metric
grid search, he is interested in scoring predictions he computes outside of
a Predictor.

The difficulty is that the current scorers take an estimator and data
points, and compute predictions internally. They don’t accept externally
computed predictions.

I’ve written up a series of different generalized options for implementing
a system of scoring externally computed predictions (some are likely
undesirable but are provided as points of comparison):

1) Add a new implementation that’s completely separate from the existing
_BaseScorer class.

2) Use the existing _BaseScorer without changes. This means abusing the
Predictor interface by creating something like a dummy predictor that
ignores X and simply returns predictions that were computed externally for
that same X (a sketch of such a dummy predictor follows this list).

3) Add a private api to _BaseScorer for scoring externally computed
predictions. The private api can be called by a public helper function in
scorer.py.

4) Change the public api of _BaseScorer to make scoring of externally
computed predictions a public operation along with the existing
functionality. Also possibly rename _BaseScorer => BaseScorer.

5) Change the public api of _BaseScorer so that it only handles externally
computed predictions. The existing functionality would be implemented by
the caller (as a callback, since the required type of prediction data is
not known by the caller).
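
To make option 2 concrete, here is a rough sketch of such a dummy predictor
(illustrative names only, not proposed code):

class _PrecomputedPredictions:
    """Pretends to be a Predictor but ignores X entirely."""

    def __init__(self, y_pred):
        self._y_pred = y_pred

    def predict(self, X):
        # X is ignored; y_pred was computed elsewhere for that same X
        return self._y_pred

# an unchanged scorer could then be reused as:
#     score = scorer(_PrecomputedPredictions(oob_pred), X, y_true)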

So far in the PR we’ve been looking at options 2, 3, and 4, with 4 seeming
like a good candidate. Once we decide on one of these options, I’d like to
follow up with stakeholders on the specifics of what the new interface will
look like.

Thanks,
Aaron Staple
Mathieu Blondel
2014-10-27 13:41:01 UTC
Permalink
In addition to out-of-bag scores and multi-metric grid search, there are
also LOO scores in the ridge regression module, as pointed out by Michael.

Option 4 seems like the best option to me.

We keep __call__(self, estimator, X, y) for backward compatibility and
because it is sometimes more convenient. But we also add a new method
get_score(self, y, y_pred=None, y_proba=None, y_decision=None) for computing
scores from pre-computed predictions. For example, this is how we would
implement it in _ProbaScorer:

def get_score(self, y, y_pred=None, y_proba=None, y_decision=None,
              sample_weight=None):
    if y_proba is None:
        raise ValueError("This scorer needs y_proba.")

    if sample_weight is not None:
        return self._sign * self._score_func(y, y_proba,
                                             sample_weight=sample_weight,
                                             **self._kwargs)
    else:
        return self._sign * self._score_func(y, y_proba, **self._kwargs)
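
For example, a call site could then look roughly like this (sketch only:
get_score is the method proposed above and does not exist yet; get_scorer,
oob_score=True and oob_prediction_ are existing scikit-learn names):

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics.scorer import get_scorer

rng = np.random.RandomState(0)
X_train = rng.rand(100, 4)
y_train = X_train[:, 0] + 0.1 * rng.randn(100)

forest = RandomForestRegressor(n_estimators=50, oob_score=True,
                               random_state=0).fit(X_train, y_train)
scorer = get_scorer("mean_squared_error")
# proposed: score the precomputed out-of-bag predictions directly
oob_mse = scorer.get_score(y_train, y_pred=forest.oob_prediction_)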

M.
Andy
2014-10-28 18:23:03 UTC
Permalink
As for the oob scores, I don't currently see how you would use LOO
scores with a scorer.
Is that for generalized cross-validation in RidgeCV?
Michael Eickenberg
2014-10-28 18:43:47 UTC
Permalink
It is true that RidgeGCV only does LOO predictions and thus would need a
scorer that makes sense on a single (y_true, y_pred) pair, such as MSE. (The
way it is implemented now for arbitrary scorers is not correct.) So point
taken, RidgeGCV is an exception.

michael
Andy
2014-10-28 18:21:04 UTC
Permalink
Hi.
Can you give a bit more detail on 3 and 4?
And can you give an example use case?
When do you need scorers and out-of-bag samples? The scorers are used in
GridSearchCV and cross_val_score, but the out-of-bag samples basically
replace cross-validation, so I don't quite understand how these would work
together.

I think it would be great if you could give a use-case and some (pseudo)
code on how it would look with your favourite solution.

Cheers,
Andy
Mathieu Blondel
2014-10-29 02:10:55 UTC
Permalink
Different metrics require different inputs (results of predict,
decision_function, predict_proba). To avoid branching in the grid search
and cross-validation, we thus introduced the scorer API. A scorer knows
what kind of input it needs and calls predict, decision_function,
predict_proba as needed. We would like to reuse the scorer logic for
out-of-bag scores as well, in order to avoid branching. The problem is that
the scorer API is not suitable if the predictions are already available.
RidgeCV works around this by creating a constant predictor but this is in
my opinion an ugly hack. The get_score method I proposed would avoid
branching, although it would require computing y_pred, y_decision and
y_proba.

In the classification case, another idea would be to compute out-of-bag
probabilities. Then a score would be obtained by calling a
get_score_from_proba method. This method would be implemented as follows:

class _PredictScorer(_BaseScorer):
    def get_score_from_proba(self, y, y_proba, classes):
        y_pred = classes[np.argmax(y_proba, axis=1)]
        return self._sign * self._score_func(y, y_pred, **self._kwargs)


class _ProbaScorer(_BaseScorer):
    def get_score_from_proba(self, y, y_proba, classes):
        return self._sign * self._score_func(y, y_proba, **self._kwargs)
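
A forest's oob scoring could then call this along these lines (sketch only:
oob_decision_function_ and classes_ already exist on RandomForestClassifier,
get_score_from_proba is just the proposal above, and the scorer lookup is
assumed to come from scorer.py's get_scorer):

# inside the forest's fit(), roughly:
scorer = get_scorer(self.oob_scoring)  # hypothetical parameter, e.g. "accuracy"
self.oob_score_ = scorer.get_score_from_proba(
    y, self.oob_decision_function_, self.classes_)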

The nice thing about predict_proba is that it consistently returns an array
of shape (n_samples, n_classes). decision_function is more problematic
because it doesn't return an array of shape (n_samples, 2) in the binary
case. There was a discussion a long time ago about adding a predict_score
method that would be more consistent in this regard, but I don't remember
the outcome of that discussion.
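
For instance, on a binary problem:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=20, random_state=0)
print(LogisticRegression().fit(X, y).predict_proba(X).shape)   # (20, 2)
print(LinearSVC().fit(X, y).decision_function(X).shape)        # not (20, 2)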

I don't agree that RidgeCV is an exception. If your labels are binary, it
is perfectly valid to train a regressor on them and want to compute ranking
metrics like AUC or Average Precision. And there is RidgeClassifierCV too.

Mathieu
Aaron Staple
2014-10-29 06:35:39 UTC
Permalink
Following up on Andy’s questions:

The scorer implementation provides a registry of named scorers, and these
scorers may implement specialized logic such as choosing to call an
appropriate predictor method or munging the output of predict_proba. My
task was to make oob scoring support the same set of named scoring metrics
as cv, so my inclination was to use the existing scorers rather than start
from scratch. (Writing a separate implementation would be option #1 in my
list above.)
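
For reference, that registry lookup looks roughly like this (SCORERS and
get_scorer live in sklearn/metrics/scorer.py; treat the snippet as a sketch
rather than exact current code):

from sklearn.metrics.scorer import SCORERS, get_scorer

print(sorted(SCORERS))          # the named metrics GridSearchCV understands
scorer = get_scorer("roc_auc")  # same object resolved from scoring="roc_auc"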

I’ve also written up some examples (copying details from @mblondel’s
example earlier).

For #3, the interface might look something like:

class _BaseScorer(object):
    # (metaclass and constructor omitted here)

    @abstractmethod
    def __call__(self, estimator, X, y_true, sample_weight=None):
        pass

    @abstractmethod
    def _score(self, y_true, y_prediction=None, y_prediction_proba=None,
               y_decision_function=None):
        pass


class _ProbaScorer(_BaseScorer):
    # (existing __call__ implementation unchanged)

    def _score(self, y, y_pred=None, y_proba=None, y_decision=None,
               sample_weight=None):
        if y_proba is None:
            raise ValueError("This scorer needs y_proba.")
        if sample_weight is not None:
            return self._sign * self._score_func(y, y_proba,
                                                 sample_weight=sample_weight,
                                                 **self._kwargs)
        else:
            return self._sign * self._score_func(y, y_proba, **self._kwargs)


And then there would be a function

def getScore(scoring, y_true, y_prediction=None, y_prediction_proba=None,
             y_decision_function=None):
    return lookup(scoring)._score(y_true, y_prediction, y_prediction_proba,
                                  y_decision_function)

(more detail in a possible variation of this at
https://github.com/staple/scikit-learn/blob/3455/sklearn/metrics/scorer.py,
where the __call__ and _score methods share an implementation.)

For #4,

class _BaseScorer(object):
    # (metaclass and constructor omitted here)

    @abstractmethod
    def __call__(self, estimator, X, y_true, sample_weight=None):
        pass

    @abstractmethod
    def get_score(self, y_true, y_prediction=None, y_prediction_proba=None,
                  y_decision_function=None):
        pass


class _ProbaScorer(_BaseScorer):
    # (existing __call__ implementation unchanged)

    def get_score(self, y, y_pred=None, y_proba=None, y_decision=None,
                  sample_weight=None):
        if y_proba is None:
            raise ValueError("This scorer needs y_proba.")
        if sample_weight is not None:
            return self._sign * self._score_func(y, y_proba,
                                                 sample_weight=sample_weight,
                                                 **self._kwargs)
        else:
            return self._sign * self._score_func(y, y_proba, **self._kwargs)
Aaron Staple
2014-11-03 00:44:52 UTC
Permalink
Hi folks,

I went ahead and made a POC for a more complete implementation of option #4:

https://github.com/staple/scikit-learn/commit/e76fa8887cd35ad7a249ee157067cd12c89bdefb

Aaron
Andy
2014-11-03 16:27:40 UTC
Permalink
Cool, I hope I have time to review it in the next couple of days.
Aaron Staple
2014-11-28 08:14:43 UTC
Permalink
Hi Again Folks,

After discussion with Andreas, we decided to move to the PR stage with
option #4 (adding a get_score method to the scorer interface). Andreas
advised me that this PR should include fixing _RidgeGCV.fit so that it
calls the new get_score method.

In the above thread there was some discussion regarding whether or not
ridge cv is a case where the scorer interface should be used at all, and in
particular whether categorical scoring functions are valid for ridge cv. In
the final comment on this topic Mathieu suggested that the scorer interface
should be used, and that ideally categorical scoring functions would be
supported for RidgeCV on the 0-1 prediction domain and for
RidgeClassifierCV.

However, I tried to run a couple of test cases with 0-1 predictions for
RidgeCV and classification with RidgeClassifierCV, and I got some error
messages. It looks like one reason for this is that
LinearModel._center_data can convert the y values to non integers. In
addition, it appears that in the case of multiclass classification the
scorer is applied to the ravel()’ed list of one-vs-all classifiers and not
to the actual class predictions. Am I right in thinking that this can
affect the classification score for some scorers? For example, consider a
simple accuracy scorer and just one prediction. It is possible for some
one-vs-all classifiers to be predicted correctly while the overall class
prediction is wrong - thus the accuracy score over the one-vs-all
classifiers would be nonzero while the overall classification accuracy is
zero. (In addition, if I am reading correctly I believe the y_true and
y_predicted values are possibly being passed incorrectly to the scorer
currently, and are being swapped with each other.)
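
Here is a tiny made-up numeric example of that effect (the numbers are
invented, not taken from the ridge code):

import numpy as np

y_true_ova = np.array([[1, 0, 0]])        # one sample, true class 0, one-vs-all encoding
decision = np.array([[0.6, 0.9, -0.2]])   # per-class scores; argmax picks class 1
y_pred_ova = (decision > 0).astype(int)   # [[1, 1, 0]]

# accuracy over the ravel()'ed one-vs-all outputs: 2 of 3 entries match
print((y_true_ova.ravel() == y_pred_ova.ravel()).mean())   # 0.666...
# accuracy of the actual class prediction: argmax says class 1, which is wrong
print(float(np.argmax(decision, axis=1)[0] == 0))          # 0.0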

Given these observations I wanted to double check 1) that we want to
support classification scorers and not just regression scorers at this
precise location in this code and 2) that I should start using get_score in
this location now, given that I believe at least some additional work will
be needed for support of classification scorers.

Thanks,
Aaron

PS Here are my simple test cases:

>>> import numpy as np
>>> from sklearn.linear_model import RidgeClassifierCV, RidgeCV
>>> clf = RidgeCV(scoring='roc_auc')
>>> y = np.array([0, 1, 1])
>>> X = np.array([[0, 0], [0, 1], [2, 3]])
>>> clf.fit(X, y)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "sklearn/linear_model/ridge.py", line 858, in fit
    estimator.fit(X, y, sample_weight=sample_weight)
  File "sklearn/linear_model/ridge.py", line 801, in fit
    for i in range(len(self.alphas))]
  File "sklearn/metrics/scorer.py", line 157, in __call__
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: continuous format is not supported

>>> clf = RidgeClassifierCV(scoring='roc_auc')
>>> y = np.array([0, 1, 1])
>>> X = np.array([[0, 0], [0, 1], [2, 3]])
>>> clf.fit(X, y)
/usr/local/lib/python2.7/dist-packages/numpy/core/fromnumeric.py:2499: VisibleDeprecationWarning: `rank` is deprecated; use the `ndim` attribute or function instead. To find the rank of a matrix see `numpy.linalg.matrix_rank`.
  VisibleDeprecationWarning)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "sklearn/linear_model/ridge.py", line 1069, in fit
    _BaseRidgeCV.fit(self, X, Y, sample_weight=sample_weight)
  File "sklearn/linear_model/ridge.py", line 858, in fit
    estimator.fit(X, y, sample_weight=sample_weight)
  File "sklearn/linear_model/ridge.py", line 801, in fit
    for i in range(len(self.alphas))]
  File "sklearn/metrics/scorer.py", line 157, in __call__
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: continuous format is not supported
Mathieu Blondel
2014-11-28 10:40:16 UTC
Permalink
Post by Aaron Staple
[...]
However, I tried to run a couple of test cases with 0-1 predictions for
RidgeCV and classification with RidgeClassifierCV, and I got some error
messages. It looks like one reason for this is that
LinearModel._center_data can convert the y values to non integers. [...]
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/ridge.py#L800

Shouldn't this line use the unnormalized y? Otherwise, this is evaluating a
different problem.

BTW, the scorer handling in RidgeCV is currently broken.
Post by Aaron Staple
Given these observations I wanted to double check 1) that we want to
support classification scorers and not just regression scorers at this
precise location in this code and 2) that I should start using get_score in
this location now, given that I believe at least some additional work will
be needed for support of classification scorers.
I was more talking about ranking scorers.

# y contains binary values
y_pred = RandomForestRegressor().fit(X, y).predict(X)
print roc_auc_score(y, y_pred)

# y contains ordinal values
y_pred = RandomForestRegressor().fit(X, y).predict(X)
print ndcg_score(y, y_pred) # not yet in scikit-learn

For me these two use cases are perfectly legitimate. Now, I would really
like to use GridSearchCV to tune the RF hyper-parameters against AUC or
NDCG but the scorer API insists on calling either predict_proba or
decision_function.
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/metrics/scorer.py#L159

If we could detect that an estimator is a regressor, we could call
"predict" instead, but we currently have no way to know that. We can't check
isinstance(estimator, RegressorMixin) since we can't even expect a
third-party regression class to inherit from RegressorMixin (as per our
current API "specification").

M.
Mathieu Blondel
2014-11-28 15:05:08 UTC
Permalink
Here's a proof of concept that introduces a new method "predict_score":
https://github.com/mblondel/scikit-learn/commit/0b06d424ea0fe40148436846c287046549419f03

The role of this method is to get continuous-output predictions from both
classifiers and regressors in a consistent manner. This way the predicted
continuous outputs can be passed to ranking metrics like roc_auc_score. The
advantage of this solution is that third-party code can reimplement
"predict_score" without depending on scikit-learn.

Another solution is to use isinstance(estimator, RegressorMixin) inside the
scorer to detect if an estimator is a regressor and use predict instead of
predict_proba / decision_function. This assumes that the estimator inherits
from RegressorMixin and therefore, the code must depend on scikit-learn.

M.
Michael Eickenberg
2014-11-28 15:29:26 UTC
Permalink
Hi Mathieu,

is that the right name for this behaviour?

When I read the name, I thought you were proposing a function like
`fit_transform` in the sense that by default it would call `predict` and
then score the result with a given scorer and some ground truth information
(e.g. y_true from a cv fold). Any estimator that could do this better than
by following this standard procedure would then get its chance to do so.
The signature of this function would then have to take this ground truth
data and a scorer as optional inputs.

(Secretly I have been wanting this feature but never dared to ask if I can
implement it. The function cross_val_score would benefit from it.)

What you are proposing seems to group/generalize `predict_proba` and
`decision_function` into one. This is useful in many cases, but isn't there
a risk of introducing some uncontrollable magic here if several options are
available per estimator?

Michael
Post by Mathieu Blondel
https://github.com/mblondel/scikit-learn/commit/0b06d424ea0fe40148436846c287046549419f03
The role of this method is to get continuous-output predictions from both
classifiers and regressors in a consistent manner. This way the predicted
continuous outputs can be passed to ranking metrics like roc_auc_score. The
advantage of this solution is that third-party code can reimplement
"predict_score" without depending on scikit-learn.
Another solution is to use isinstance(estimator, RegressorMixin) inside
the scorer to detect if an estimator is a regressor and use predict instead
of predict_proba / decision_function. This assumes that the estimator
inherits from RegressorMixin and therefore, the code must depend on
scikit-learn.
M.
Mathieu Blondel
2014-11-28 15:50:00 UTC
Permalink
On Sat, Nov 29, 2014 at 12:29 AM, Michael Eickenberg wrote:
Post by Michael Eickenberg
Hi Mathieu,
is that the right name for this behaviour?
I agree, the name "predict_score" can be misleading. Another name I had in
mind would be "predict_confidence".
Post by Michael Eickenberg
When I read the name, I thought you were proposing a function like
`fit_transform` in the sense that by default it would call `predict` and
then score the result with a given scorer and some ground truth information
(e.g. y_true from a cv fold). Any estimator that could do better than this
standard procedure would then get its chance to do so.
The signature of this function would then have to take this ground truth
data and a scorer as optional inputs.
(Secretly I have been wanting this feature but never dared to ask if I can
implement it. The function cross_val_score would benefit from it.)
What you are proposing seems to group/generalize `predict_proba` and
`decision_function` into one. This is useful in many cases, but isn't there
a risk of introducing some uncontrollable magic here if several options are
available per estimator?
The scorer API is already choosing decision_function arbitrarily when both
predict_proba and decision_function are available.
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/metrics/scorer.py#L159

However, except on rare occasions (e.g., SVC because of Platt calibration),
predict_proba and decision_function should agree on their predictions
(i.e., when taking the argmax).

This solution is intended to be "duck-typing" friendly. Personally, I think
it would make our lives easier if we could just assume that all regressors
inherit from RegressorMixin.

M.

Mathieu Blondel
2014-11-28 16:16:29 UTC
Permalink
I forgot to mention that in "Ridge", decision_function is an alias for
predict, precisely to allow grid searching against AUC and other ranking
metrics.
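
The pattern is roughly the following; a toy illustration only, not the
actual Ridge code (the class name is made up):

import numpy as np

class TinyRegressor(object):
    def fit(self, X, y):
        # Plain least squares via the pseudo-inverse, just to keep the
        # example self-contained.
        self.coef_ = np.linalg.pinv(np.asarray(X)).dot(np.asarray(y))
        return self

    def predict(self, X):
        return np.asarray(X).dot(self.coef_)

    # Expose the same continuous output under the name threshold-based
    # scorers look for, so ranking metrics like AUC work in grid search.
    decision_function = predict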

M.
Aaron Staple
2014-11-29 02:33:09 UTC
Permalink
Hi Mathieu,

Thanks for the information you’ve provided about the ridge implementation
and your suggestions for scoring rankings.

First off, I’d like to try to contain the scope of the project I’m working
on. Would it be reasonable for me to first add get_score to the scorer,
along with the oob implementation for random forests? This discussion of
the ridge code seems to have raised a new set of design and implementation
questions that could be addressed separately.

Also, I am coming up to speed on your suggestions regarding support for
ranked scoring of regression predictions.

My impression is that in sklearn regressors typically implement predict,
while classifiers typically implement predict as well as predict_proba
and/or decision_function. Currently ThresholdScorer attempts to call
decision_function() and, if that fails, falls back to predict_proba(). What
if ThresholdScorer were extended to also call predict() if neither
decision_function nor predict_proba exists? That way predict() would be
called for regressors without any interface change for estimators.
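
Something along these lines; a rough sketch of the fallback order only, not
the actual ThresholdScorer code (the helper name is made up):

import numpy as np

def _continuous_scores(estimator, X):
    # Prefer decision_function, then predict_proba, and finally fall back
    # to predict() so that regressors can be scored as well.
    if hasattr(estimator, "decision_function"):
        return estimator.decision_function(X)
    if hasattr(estimator, "predict_proba"):
        proba = np.asarray(estimator.predict_proba(X))
        # For binary problems, keep the probability of the positive class.
        if proba.ndim == 2 and proba.shape[1] == 2:
            return proba[:, 1]
        return proba
    return estimator.predict(X)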

I suppose there may be cases where a classifier implements neither
predict_proba nor decision_function; with this change, predict() would be
used there instead of raising an error when a threshold scorer is applied
(the current behavior). In that case - a classifier that supports only
predict, used with a threshold scoring function - how bad would it be if
the predict() classification result were interpreted as a coarse (integer)
ordinal instead of raising an error? I don’t know the answer to this -
again, please excuse my newness to sklearn - but I thought I would at least
raise the possibility, since it doesn’t require an interface change.

Aaron
Mathieu Blondel
2014-11-29 03:57:39 UTC
Permalink
Post by Aaron Staple
Hi Mathieu,
Thanks for the information you’ve provided about the ridge implementation
and your suggestions for scoring rankings.
First off, I’d like to try to contain the scope of the project I’m working
on. Would it be reasonable for me to first add get_score to the scorer,
along with the oob implementation for random forests? This discussion of
the ridge code seems to have raised a new set of design and implementation
questions that could be addressed separately.
I understand, but we must also be careful not to create a half-baked API.
Post by Aaron Staple
Also, I am coming up to speed on your suggestions regarding support for
ranked scoring of regression predictions.
My impression is that in sklearn regressors typically implement predict,
while classifiers typically implement predict as well as predict_proba
and/or decision_function. Currently ThresholdScorer attempts to call
decision_function() and, if that fails, falls back to predict_proba(). What
if ThresholdScorer were extended to also call predict() if neither
decision_function nor predict_proba exists? That way predict() would be
called for regressors without any interface change for estimators.
This is a good suggestion. To summarize, here are our options:

- introduce a new method predict_score / predict_confidence (pro:
  duck-typing friendly, con: one more method)
- use isinstance(estimator, RegressorMixin) to detect regressors (pro:
  simple, con: assumes inheritance)
- make decision_function an alias of predict in all regressors (pro:
  simple, con: can no longer detect classifiers with hasattr(estimator,
  "decision_function"))
- call predict if neither predict_proba nor decision_function is available
  (pro: simple, con: can't raise an exception for classifiers with predict
  only)

What do people think?

Mathieu
Joel Nothman
2014-11-30 07:29:40 UTC
Permalink
So far I only have a strong opinion on not relying on the presence of
decision_function or predict_proba to identify a classifier.

Also, is the distinction we seek precisely between classifiers and
regressors, or between categorical and continuous predictors? (i.e., do we
care that clusterers and classifiers fall together?)
Alexandre Gramfort
2014-11-30 08:39:19 UTC
Permalink
What I suggest:

use isinstance(estimator, RegressorMixin) to know whether we can use predict
safely. If we can't rely on inheritance, call predict when neither
predict_proba nor decision_function is available, and check that the
predicted values are of type float32 or float64.
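
In code, something like the following; a rough sketch of this combined
heuristic (the helper name is made up, and this is not an existing
scikit-learn function):

import numpy as np
from sklearn.base import RegressorMixin

def _scores_for_ranking(estimator, X):
    # Trust the inheritance-based check when it is available.
    if isinstance(estimator, RegressorMixin):
        return estimator.predict(X)
    # Otherwise fall back through the usual methods ...
    if hasattr(estimator, "decision_function"):
        return estimator.decision_function(X)
    if hasattr(estimator, "predict_proba"):
        return estimator.predict_proba(X)
    # ... and finally to predict(), checking that the output is continuous.
    y_pred = np.asarray(estimator.predict(X))
    if y_pred.dtype not in (np.float32, np.float64):
        raise ValueError("predict() did not return float32/float64 values; "
                         "refusing to use it for a ranking metric.")
    return y_pred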

Alex
Andy
2014-12-01 16:10:51 UTC
Permalink
Post by Mathieu Blondel
- introduce a new method predict_score / predict_confidence (pro:
  duck-typing friendly, con: one more method)
- use isinstance(estimator, RegressorMixin) to detect regressors (pro:
  simple, con: assumes inheritance)
- make decision_function an alias of predict in all regressors (pro:
  simple, con: can no longer detect classifiers with hasattr(estimator,
  "decision_function"))
- call predict if neither predict_proba nor decision_function is available
  (pro: simple, con: can't raise an exception for classifiers with predict
  only)
What do people think?
I think I would currently favour the "decision_function" option.
Do we ever make use of the presence of "decision_function" to decide
whether something is a classifier?
I am not aware of any.

On a side note: when do we need to distinguish classifiers and regressors?
We currently use that distinction mainly to switch between stratified
cross-validation and k-fold cross-validation, right?
And it is currently implemented using inheritance (so third-party
estimators are regressors by default).


After fitting, I think the presence of ``classes_`` would be a good way
to detect a classifier, but I guess we need that information before
fitting, for cross-validation.
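
To illustrate the two detection points (a hypothetical helper, not existing
scikit-learn code):

from sklearn.base import ClassifierMixin

def looks_like_classifier(estimator):
    # Before fitting we can only rely on inheritance; third-party
    # estimators that do not inherit ClassifierMixin are treated as
    # regressors by default.
    if isinstance(estimator, ClassifierMixin):
        return True
    # After fitting, a classifier exposes the classes it was trained on.
    return hasattr(estimator, "classes_")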


Cheers,
Andy

Andy
2014-12-01 16:03:05 UTC
Permalink
First off, I’d like to try to contain the scope of the project I’m working
on. Would it be reasonable for me to first add get_score to the scorer,
along with the oob implementation for random forests? This discussion of
the ridge code seems to have raised a new set of design and implementation
questions that could be addressed separately.
+1