Discussion:
[Scikit-learn-general] Hyper-parameter autotuning
Olivier Grisel
2010-06-29 20:05:50 UTC
Permalink
Hi all again,

It would be great if we could set up a standard API for defining hyper-parameter
admissible ranges and settings. That would allow us to
perform automated parameter tuning. For instance one could have:

class MyClassyClassifier(object):

    hyperparameters = {
        'l1': {'type': float, 'range': [0, 1e-3, 1e-2, 1e-1, 1, 1e1, 1e2, 1e3]},
        'hidden_layer_units': {'type': int, 'range': [10, 50, 100, 500]},
        'intercept': {'type': bool},
        'normalize': {'type': bool},
    }

    def __init__(self, l1=10, hidden_layer_units=100, normalize=True,
                 intercept=True):
        pass

    def fit(self, X, y):
        return self

    def predict(self, X):
        return -1

And then have a multiprocessing pool executor that queues cross
validation jobs for any combination of the parameters (like the
grid_search.py script of the libsvm project). I think such a general
API + a default autotuner implementation would lower the barrier to
entry for newcomers and make scikits.learn concretely reach the goal
of "machine learning without learning the machinery".

Any opinion?
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Alexandre Gramfort
2010-06-30 10:42:11 UTC
Permalink
Hi,

you raise a point that we've started to discuss with Gael and Fabian.
Some open questions remain. For example a Lasso solved by coordinate
descent or by LARS does not work the same way when computing
the solution for different lambdas. A naive approach consists in
running the optimization N times in N different jobs, but this is very
suboptimal. This option might however be the only one for
some classifiers, like a regularized LDA.

Something that looks more general to me is to run
each fold of a cross-validation procedure in a different job.
I already do this manually myself.

Also, Fabian and I suggested a fit_crossval method such as

clf.fit_crossval(X, y, crossval_generator, loss_func="optional")

Gael was not fond of it as he wanted to keep the API simple, i.e. fit + predict.

For now there are separate objects for the lasso and elastic-net,
called LassoCV and ElasticNetCV, that support

lasso_cv.fit(X, y, crossval_generator)

The loss is the RMSE.

You can see an example of this in the examples folder and read
the code in glm.py.
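For completeness, a minimal usage sketch based on the call above (the module
paths and the KFold signature are assumed from the current code and may change):

import numpy as np
from scikits.learn.glm import LassoCV
from scikits.learn.cross_val import KFold   # assumed location of the CV generators

X = np.random.randn(50, 10)
y = np.dot(X, np.random.randn(10))

lasso_cv = LassoCV()
lasso_cv.fit(X, y, KFold(50, 5))   # picks the regularization minimizing RMSE across folds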

As you can see I don't have clear thoughts on this. I'm just
sharing some previous reflections.

Alex

Olivier Grisel
2010-06-30 11:29:01 UTC
Permalink
Post by Alexandre Gramfort
Hi,
you raise a point that we've started to discuss with Gael and Fabian.
Some open questions remain. For example a Lasso solved by coordinate
descent or by LARS does not work the same way when computing
the solution for different lambdas. A naive approach consists in
running the optimization N times in N different jobs, but this is very
suboptimal. This option might however be the only one for
some classifiers, like a regularized LDA.
Yes, I agree that some implementations of supervised classifiers /
regressors already have more efficient ways of autotuning their
hyperparameters. In that case we can just treat those models as
hyperparameter free. For the implementations that don't have known
smart ways to do autotuning, we should make it trivial (from an API
standpoint) to run parallel CV.

This could be implemented as a generic adapter that turns any class
with the clf duck type "fit"/"predict" + hyperparameter annotations
into another class. This adapter could be implemented using a python
metaclass or using an explicit "adapter" design pattern.

One could hence have an automatically derived class OneClassSVCCV with
the C hyperparameter automatically dealt with and without having to
rewrite a new class from scratch:

OneClassSVCCV = ParallelAutotuner(svm.OneClassSVC, cv_factory=MyCrossValImpl)

and OneClassSVCCV(**kwargs) would return an object with the usual fit and
predict methods, just hiding the parallel CV machinery that finds the
best value of C at each call to the fit method.
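A possible shape for such an adapter, as a pure sketch (ParallelAutotuner is
hypothetical, the scoring helper is a stand-in, and the CV loop is written
sequentially where a process pool would plug in):

import itertools

def _accuracy(predicted, expected):
    # stand-in score function
    return sum(p == e for p, e in zip(predicted, expected)) / float(len(expected))

class ParallelAutotuner(object):
    """Factory wrapping an annotated estimator class into a self-tuning one."""

    def __init__(self, estimator_class, cv_factory, **fixed_params):
        self.estimator_class = estimator_class
        self.cv_factory = cv_factory
        self.fixed_params = fixed_params

    def __call__(self, **kwargs):
        # calling the adapter plays the role of instantiating the derived class
        return _AutotunedEstimator(self.estimator_class, self.cv_factory,
                                   dict(self.fixed_params, **kwargs))

class _AutotunedEstimator(object):

    def __init__(self, estimator_class, cv_factory, fixed_params):
        self.estimator_class = estimator_class
        self.cv_factory = cv_factory
        self.fixed_params = fixed_params

    def _candidates(self):
        # enumerate candidate settings from the hyperparameter annotations
        annotations = self.estimator_class.hyperparameters
        names = [n for n in sorted(annotations) if n not in self.fixed_params]
        ranges = [annotations[n].get('range', [True, False]) for n in names]
        for values in itertools.product(*ranges):
            yield dict(self.fixed_params, **dict(zip(names, values)))

    def fit(self, X, y):
        best_score, best_params = None, None
        for params in self._candidates():
            scores = [_accuracy(self.estimator_class(**params)
                                .fit(X[train], y[train]).predict(X[test]), y[test])
                      for train, test in self.cv_factory(len(y))]
            mean_score = sum(scores) / len(scores)
            if best_score is None or mean_score > best_score:
                best_score, best_params = mean_score, params
        # refit the best candidate on the full data
        self.best_estimator_ = self.estimator_class(**best_params).fit(X, y)
        return self

    def predict(self, X):
        return self.best_estimator_.predict(X)

# as in the example above:
# OneClassSVCCV = ParallelAutotuner(svm.OneClassSVC, cv_factory=MyCrossValImpl)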

Also, another kind of hyperparameter I forgot: categorical
ranges, for instance the kernel used by an SVM: linear, polynomial or
Gaussian. But in that case this has an impact on the admissible ranges
of other hyperparameters (gamma only makes sense for the Gaussian kernel,
not for the linear one, for instance). Hence for this case it might be
better to treat the different kernel types as different algorithms rather
than as different hyperparameters.

RbfSVRCV = ParallelAutotuner(svm.SVR, kernel='rbf',
    hyperparameters={grid of parameter ranges suitable for RBF SVR})
PolySVRCV = ParallelAutotuner(svm.SVR, kernel='poly',
    hyperparameters={grid of parameter ranges suitable for polynomial SVR})

One should also be able to deal with more structured parameter
ranges: for instance the number of units in a variable number of
hidden layers of a neural net such as stacked RBMs:

'hidden_units': {'type': 'int_2d', 'range': [[100], [500], [100, 100],
[500, 500], [500, 500, 500]]}

Also, this adapter scheme leaves the door open to more Bayesian
approaches to hyperparameter sampling (BayesianAutotuner)
instead of a dumb parallel grid search. One could also have an
EvolutionaryAutotuner that uses GA-style operators on the
hyperparameter grid to focus on the most interesting combinations
when there are many parameters to tune.
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Gael Varoquaux
2010-06-30 11:32:31 UTC
Permalink
Post by Olivier Grisel
hyperparameters = {
    'l1': {'type': float, 'range': [0, 1e-3, 1e-2, 1e-1, 1, 1e1, 1e2, 1e3]},
    'hidden_layer_units': {'type': int, 'range': [10, 50, 100, 500]},
    'intercept': {'type': bool},
    'normalize': {'type': bool},
}
Agreed.

This means establishing a signature-like interface (a dictionary is
fine) and the corresponding base class.

One thing I worry about is that this is a common problem, and that it has
already been solved a good number of times. Of course Traits solves it, but
we can't afford a dependency on it. There is traitlets, in the
IPython codebase, which is a much lighter dependency, but I just looked at
the code: it uses metaclasses and is too abstract for my liking.

The reason I want to be a bit careful with this is that it is very easy
to get in a situation where the specified interface does not match the
actual one that the object implements.

So the question is: how do we make a base class that makes it a bit
easier and safer to ensure that the parameters actually
correspond to the specification?

I should be worrying about organizing the EuroSciPy conference instead
of this problem (or listening to the talk I am in), but this is fun, and
I have tackled such problems a few times, so let me propose a
specification for the base class:


################################################################################
class BaseEstimator(object):

    def __init__(self, **params):
        assert hasattr(self, '_params'), \
            'Estimator class without parameter definition'
        self._set_params(**params)

    def _set_params(self, **params):
        for key, value in params.iteritems():
            assert key in self._params, 'Specified parameter, %s, unknown' % key
            assert isinstance(value, self._params[key])
            setattr(self, key, value)


################################################################################
# Example concrete class

class AnEstimator(BaseEstimator):

    _params = {'l1': float}

    def fit(self, X, Y, **params):
        self._set_params(**params)
        # Go on with the calculations...
        return self


Open problems: how do we make it so that the signatures of fit and
__init__ actually look like real signatures, and don't have '**params'? I
believe that the way to solve this is to look at the decorator module
from Michele Simionato (http://pypi.python.org/pypi/decorator). The
question is: how to apply this to __init__? In Python 2.6, we could
use class decorators, but we need to stick with 2.5. Applying a decorator
at class construction time is not an option because it will not reflect
the parameters added in subclasses, but it would be an option to use a
decorator on the subclass, adapting Simionato's code, and using the fact
that, given the method, you can retrieve the class from its 'im_class'
attribute.
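In the meantime, even without touching the signatures, a cheap safety net is
possible: introspect __init__ with the inspect module (available in 2.5) and
check it against _params, so that spec/implementation drift fails loudly.
check_params_spec below is a hypothetical helper, not existing code:

import inspect

def check_params_spec(estimator_class):
    """Assert that every declared parameter appears in __init__'s signature."""
    argnames = inspect.getargspec(estimator_class.__init__)[0]
    for name in estimator_class._params:
        assert name in argnames, (
            '%s declares parameter %r but __init__ does not accept it'
            % (estimator_class.__name__, name))

class AnotherEstimator(BaseEstimator):

    _params = {'l1': float}

    def __init__(self, l1=0.1):
        self._set_params(l1=l1)

check_params_spec(AnotherEstimator)   # raises if the spec and signature diverge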


Getting all this right requires a little bit of work, and I want to move
slowly here to avoid overdesign and overly constraining designs.
Olivier, you know Python really well, and you seem to be interested in
this problem. Do you want to move forward and write some code as well as
some toy examples? That way we can start a design loop and make sure that
we find a solution that fits everybody.

Alright, back to listening to the talks :)

Gaël
Olivier Grisel
2010-06-30 11:49:21 UTC
Permalink
Agreed,

I can work in a branch to prototype such an approach and ask for a
global review on the list once there is actually some working code.
However I won't do that in the short term since I need to get
my stuff ready for my EuroSciPy talk first :)

I have also contacted the maintainers of http://mlcomp.org and they
will install numpy / scipy and the ATLAS dev headers on their Amazon
EC2 virtual machine workers so that we will be able to launch
scikits.learn jobs on their standard datasets.

This will give me the opportunity to write the glue code for the 20 news
dataset for scikits.learn. 20news is around 40MB IIRC, hence it won't be
part of the source distribution of scikits.learn, and I think it's better to
write a generic adapter for the MLComp format with instructions on how
to download the raw data from there.

I have checked the http://mldata.org effort too, but their version of
the 20 news dataset is not formatted yet (just the raw archive from
UCI, no HDF5 version yet).
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Gael Varoquaux
2010-06-30 12:05:26 UTC
Permalink
Post by Olivier Grisel
I can work in a branch to prototype such an approach and ask for a
global review on the list once there is actually some working code.
However I won't do that in the short term since I need to get
my stuff ready for my EuroSciPy talk first :)
OK, no problem. It is good to know that we have someone thinking about this
problem in the background. I believe that better code comes out of
such maturation.
Post by Olivier Grisel
I have also contacted the maintainers of http://mlcomp.org and they
will install numpy / scipy and the ATLAS dev headers on their Amazon
EC2 virtual machine workers so that we will be able to launch
scikits.learn jobs on their standard datasets.
This will give me the opportunity to write the glue code for a 20 news
dataset for scikits.learn. 20news is around 40MB IIRC hence won't be
part of the source distrib of scikits.learn and I think it's better to
write a generic adapter for the MLComp format with instructions on how
to download the raw data from there.
Very very nice.
Post by Olivier Grisel
I have checked the http://mldata.org effort too, but their version of
the 20 news dataset is not formatted yet (just the raw archive from
UCI, no HDF5 version yet).
I was with these guys last week. They are still ramping up. It will
probably take another 6 months to get going.

Thanks for the effort, all this will be very useful.

Gaël
Alexandre Gramfort
2010-06-30 12:59:33 UTC
Permalink
Just to inspire (or not) future developments, I've pushed to github
some code I've used for grid search:

http://github.com/agramfort/scikit-learn/commit/29a02d737ed4eaa5862216da4dee1e3068f677b8

feedback welcome

Alex


Olivier Grisel
2010-06-30 13:15:50 UTC
Permalink
Post by Alexandre Gramfort
Just to inspire (or not) future developments, I've pushed to github
some code I've used for grid search:
http://github.com/agramfort/scikit-learn/commit/29a02d737ed4eaa5862216da4dee1e3068f677b8
Ok so this is almost exactly what I had in mind :) One would just need
to provide a set of documented yet overridable default parameter
grids for each algorithm and then scikits.learn would be completely
noob friendly.
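For instance, the documented defaults could simply live next to each estimator
as a module-level dict that one can copy and override (the names below are
illustrative, not existing scikits.learn objects):

# hypothetical documented defaults, e.g. alongside the SVC class
DEFAULT_SVC_GRID = {
    'C': [1e-2, 1e-1, 1., 1e1, 1e2],
    'gamma': [1e-4, 1e-3, 1e-2, 1e-1],
}

# a newcomer uses the defaults as-is, an expert copies and overrides part of them
my_grid = dict(DEFAULT_SVC_GRID, C=[0.1, 1., 10.])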

Have you already pushed it to source forge or is it just on your github?
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Alexandre Gramfort
2010-06-30 13:30:59 UTC
Permalink
Post by Olivier Grisel
Ok so this is almost exactly what I had in mind :)
great !

your python jargon had lost me :)
Post by Olivier Grisel
One would just need
to provide a set of documented yet overridable default parameter
grids for each algorithm and then scikits.learn would be completely
noob friendly.
I start to get it.
Post by Olivier Grisel
Have you already pushed it to source forge or is it just on your github?
just on github

shall I put it on SourceForge?

Alex
Olivier Grisel
2010-06-30 14:39:17 UTC
Permalink
Post by Alexandre Gramfort
just on github
shall I put it on SourceForge?
It looks good to me. We can use it as a base for further improvement
or I can fork your branch on github, as you wish. I am not in a hurry
anyway.
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Alexandre Gramfort
2010-06-30 16:08:32 UTC
Permalink
Post by Olivier Grisel
It looks good to me. We can use it as a base for further improvement
or I can fork your branch on github, as you wish. I am not in a hurry
anyway.
yes I would say fork my branch and we'll merge it when we've converged
to something good.

Alex
Fabian Pedregosa
2010-06-30 19:55:59 UTC
Permalink
Hi Alex. Thanks for working on this, the example looks really cool. I'd
just like to ask some questions.

- You added a dependency on joblib. Not that I'm against it, but I
wonder whether you can achieve the same result using threads or the new
multiprocessing module, which is part of the standard library.

- If you have to use joblib, I highly recommend hiding it inside the
GridSearch, so that the module can still be imported and run with job=1
even when joblib is not installed.

- The example is so slow; do you think it could be possible to speed
it up, or is it just intrinsic to nested cross-validation?


- Do you think we could provide some default parameters for
cross_val_factory and loss_func?


Thanks,


PD: Gael moved joblib from bzr to git (amazing)!!
Alexandre Gramfort
2010-06-30 20:14:16 UTC
Permalink
Post by Fabian Pedregosa
Hi Alex. Thanks for working on this, the example looks really cool.
thx
Post by Fabian Pedregosa
I'd
just like to ask some questions.
sure
Post by Fabian Pedregosa
  - You added a dependency on joblib. Not that I'm against it, but I
wonder whether you can achieve the same result using threads or the new
multiprocessing module, which is part of the standard library.
joblib.parallel is a single file that Gael put in joblib to attach it to
a bigger project. Maybe we could copy it to scikits.learn.parallel?
I find the syntax quite simple and it avoids having to deal with
multiprocessing every time we write parallel code.
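For reference, the syntax in question boils down to something like this
(toy function; the exact import path may differ between joblib versions):

from joblib import Parallel, delayed

def slow_square(x):
    return x ** 2

# n_jobs=1 degrades gracefully to a plain sequential loop, so the same code
# runs even where multiprocessing is not usable
results = Parallel(n_jobs=2)(delayed(slow_square)(i) for i in range(10))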
Post by Fabian Pedregosa
- If you have to use joblib, I highly recommend hiding it inside the
GridSearch, so that the module can still be imported and run with job=1
even when joblib is not installed.
that's how joblib.parallel does it.
Post by Fabian Pedregosa
- The example is so slow; do you think it could be possible to speed
it up, or is it just intrinsic to nested cross-validation?
you can speed it up by defining

def crossval_generator(n_samples):
    return KFold(n_samples, 5)
Post by Fabian Pedregosa
  - Do you think we could provide some default parameters for
cross_val_factory and loss_func ?
Yes, definitely. However the loss_func is different depending on whether
you work in regression or classification.

Shall we create a scikits.learn.loss module with the most classical
loss functions?
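Something as small as this would already cover the two common cases (module
and function names are only a suggestion):

# e.g. a hypothetical scikits/learn/loss.py
import numpy as np

def zero_one(y_pred, y_true):
    """Classification loss: fraction of misclassified samples."""
    return np.mean(np.asarray(y_pred) != np.asarray(y_true))

def mean_square_error(y_pred, y_true):
    """Regression loss: mean squared error."""
    return np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2)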
Post by Fabian Pedregosa
Thanks,
my pleasure
Post by Fabian Pedregosa
PD: Gael moved joblib from bzr to git (amazing)!!
amazing :)

Alex
Olivier Grisel
2010-06-30 22:52:47 UTC
Permalink
As for joblib.Parallel vs multiprocessing, I have recently been
playing with multiprocessing and there is just one trick to know to
make it work well without stalling when you hit Ctrl-C:

http://stackoverflow.com/questions/1408356/keyboard-interrupts-with-pythons-multiprocessing-pool

Other than that it works really well. I can give a hand should you
want to reimplement GridSearch on top of multiprocessing. I don't mind
having a copy of Gael's Parallel in scikits either if you like it
better.
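For the record, the trick from that Stack Overflow thread boils down to
something like this (minimal sketch; fit_one is a placeholder for one
cross-validation job and the timeout value is arbitrary):

from multiprocessing import Pool

def fit_one(params):
    # placeholder for one cross-validation job
    return params

if __name__ == '__main__':
    pool = Pool(processes=4)
    try:
        # map_async(...).get(timeout) instead of map(...): a bare map() can
        # swallow KeyboardInterrupt while waiting on its internal condition
        results = pool.map_async(fit_one,
                                 [{'C': c} for c in (0.1, 1., 10.)]).get(9999999)
    except KeyboardInterrupt:
        pool.terminate()
    else:
        pool.close()
    pool.join()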
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Gael Varoquaux
2010-07-01 09:51:24 UTC
Permalink
Post by Olivier Grisel
As for joblib.Parallel vs multiprocessing, I have recently been
playing with multiprocessing and there is just a trick to know to
http://stackoverflow.com/questions/1408356/keyboard-interrupts-with-pythons-multiprocessing-pool
I need to integrate this in joblib. Someone wants to beat me to it?

Gaël
Fabian Pedregosa
2010-07-01 15:28:07 UTC
Permalink
Post by Alexandre Gramfort
Post by Fabian Pedregosa
Hi Alex. Thanks for working on this, the example looks really cool.
thx
Post by Fabian Pedregosa
I'd
just like to ask some questions.
sure
Post by Fabian Pedregosa
- You added a dependency on joblib. Not that I'm against it, but I
wonder whether you can achieve the same result using threads or the new
multiprocessing module, which is part of the standard library.
joblib.parallel is a single file that Gael put in joblib to attach it to
a bigger project. Maybe we could copy it to scikits.learn.parallel?
I find the syntax quite simple and it avoids having to deal with
multiprocessing every time we write parallel code.
Yes, if it's just one file and we could ship it with the scikit it would
make things easier. As Gael suggested scikits.learn.external is a good
name, as it makes explicit that it's an external project.
Post by Alexandre Gramfort
Post by Fabian Pedregosa
- If you have to use joblib, I highly recommend hiding it inside the
GridSearch, so that the module can still be imported and run with job=1
even when joblib is not installed.
that's how joblib.parallel does it.
Post by Fabian Pedregosa
- The example is so slow; do you think it could be possible to speed
it up, or is it just intrinsic to nested cross-validation?
you can speed it up by defining
KFold(n_samples, 5)
OK, BTW it might be useful to be able to pass parameters to
cross_val_factory.
Post by Alexandre Gramfort
Post by Fabian Pedregosa
- Do you think we could provide some default parameters for
cross_val_factory and loss_func ?
yes. Definitely. However the loss_func is different if you work
in regression and classification.
shall we create a scikits.learn.loss with the most classical
loss functions?
Exactly, but maybe into scikits.learn.metric?


However, I feel that this does not solve the model-specificity issue: it
won't be capable of using model-specific optimizations (like using the
computed path in the LAR case, or using the built-in cross-validation
function in libsvm). But frankly, I don't know how we could implement
this without either putting some code into the classifiers or using some
sort of model-specific delegation in the GridSearch object.


Cheers,

~fabian
Gael Varoquaux
2010-07-02 07:15:09 UTC
Permalink
Post by Fabian Pedregosa
Post by Alexandre Gramfort
joblib.parallel is a single file that Gael put in joblib to attach it to
a bigger project. Maybe we could copy it to scikits.learn.parallel?
I find the syntax quite simple and it avoids having to deal with
multiprocessing every time we write parallel code.
Yes, if it's just one file and we could ship it with the scikit it would
make things easier. As Gael suggested scikits.learn.external is a good
name, as it makes explicit that it's an external project.
It's no longer one file, as it now has more advanced error management. But
you can still just grab the directory and stick it in externals: there
are only relative imports.
Post by Fabian Pedregosa
However, I feel that this does not solve the model-specificity issue: it
won't be capable of using model-specific optimizations (like using the
computed path in the LAR case, or using built-in cross-validation
function in libsvm). But frankly, I don't know how we could implement
this without either putting some code into the classifiers or using some
sort of model-specific delegation into the GridSearch object.
Indeed, I think that Alex, Vincent and I agree that we need
problem-specific code in the estimators.

Gaël
Olivier Grisel
2010-07-02 08:17:45 UTC
Permalink
Post by Gael Varoquaux
Post by Fabian Pedregosa
However, I feel that this does not solve the model-specificity issue: it
won't be capable of using model-specific optimizations (like using the
computed path in the LAR case, or using built-in cross-validation
function in libsvm). But frankly, I don't know how we could implement
this without either putting some code into the classifiers or using some
sort of model-specific delegation into the GridSearch object.
Indeed, I think that Alex, Vincent and I agree that we need
problem-specific code in the estimators.
Yes, that's why I said that those implementations that are already able
to tune their own parameters, such as coordinate descent with warm
restarts for the Lasso, can be considered parameter free and don't
need to be wrapped by the GridSearch adapter. In my opinion the
GridSearch is really only interesting for SVMs (with or without
kernels), regularized logistic regression (the liblinear
implementation, not the hypothetical coordinate descent version), and
the future stochastic gradient descent based algorithms (such as RBMs,
MLPs, and so on).
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Matthieu PERROT
2010-07-02 14:02:44 UTC
Permalink
Post by Fabian Pedregosa
However, I feel that this does not solve the model-specificity issue: it
won't be capable of using model-specific optimizations (like using the
computed path in the LAR case, or using built-in cross-validation
function in libsvm). But frankly, I don't know how we could implement
this without either putting some code into the classifiers or using some
sort of model-specific delegation into the GridSearch object.
We need to be careful with the introduction of these model-specific methods,
especially with methods which could also exist elsewhere. Cross-validation
is a good example. In a generic cross-validation scheme, in the case of
a dataset with highly unbalanced classes, we should ensure that each class
is represented according to its relative proportion. In the case of the
built-in cross-validation function of libsvm, this caution is not taken*.
So the behaviour is different, which might be confusing for a beginner. IMHO,
the reasoning should be the same for other built-in or model-specific functions.

A simple solution is to have good docstrings. A better but more complex one is
to standardize the API of these model-specific functions with the general
API.

*: I suggested a patch for this a few years ago but the author
was not really interested.
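To make the point concrete, here is a naive sketch of what a stratified split
looks like (illustrative only, unshuffled, and not the libsvm code):

import numpy as np

def stratified_k_fold(y, k):
    """Yield (train, test) index arrays where every fold keeps roughly the
    class proportions of the full dataset."""
    y = np.asarray(y)
    folds = [[] for _ in range(k)]
    for label in np.unique(y):
        # deal the indices of each class round-robin over the k folds
        for i, idx in enumerate(np.where(y == label)[0]):
            folds[i % k].append(idx)
    for i in range(k):
        test = np.array(sorted(folds[i]))
        train = np.array(sorted(idx for j in range(k) if j != i
                                for idx in folds[j]))
        yield train, test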
--
Matthieu Perrot
Emanuele Olivetti
2010-07-02 15:01:03 UTC
Permalink
On 07/02/2010 04:02 PM, Matthieu PERROT wrote:
...
Post by Matthieu PERROT
In a generic cross-validation scheme, in the case of
a dataset with highly unbalanced classes, we should ensure that each class
is represented according to its relative proportion. In the case of the
built-in cross-validation function of libsvm, this caution is not taken*.
So the behaviour is different, which might be confusing for a beginner.
...

As far as I know, libsvm has done stratified cross-validation since v2.7 [*].
It is not widely advertised, but I had to deal with unbalanced datasets
higher (on average) than plain non-stratified cross-validation. When
I manually did stratified cross-validation I got the same results as with
libsvm CV.

See svm_cross_validation() in svm.cpp.

Best,

E.

[*]: http://www.csie.ntu.edu.tw/~cjlin/libsvm/acknowledgements
Matthieu PERROT
2010-07-02 17:50:56 UTC
Permalink
Post by Emanuele Olivetti
As far as I know libsvm does stratified cross-validation since v2.7 [*].
It is not much advertised but I had to deal with unbalanced datasets
recently and noticed that libsvm cross-validated accuracy was slightly
higher (on average) than plain non-stratified cross-validation. When
I manually did stratified cross-validation I got the same results as with
libsvm CV.
See svm_cross_validation() in svm.cpp.
Best,
E.
[*]: http://www.csie.ntu.edu.tw/~cjlin/libsvm/acknowledgements
Thanks for the correction. But, as far as I remember, the folds
were theoretically stratified but not in practice, so it was possible to
observe folds without any representative of one class (given enough folds
relative to the number of classes).

Regarding the patch I sent, I realized my mistake when rereading my
5-year-old email to the author of libsvm (the version was 2.8 at that time).
It was actually about another issue, related to unbalanced data during the
logistic regression fit used to get posterior probabilities. It may have
changed since then; I have not checked.

Aside from this, the idea about model-specific functions remains valid :).
--
Matthieu Perrot
Gael Varoquaux
2010-07-02 19:49:14 UTC
Permalink
This is why the current plan (which can be changed) is to have subclasses of the non-cross-validated object that implement a specific autotuning strategy in 'fit'. One can think of having several strategies, and the door is always open for the user to implement their own subclass.
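Concretely, something along these lines (purely illustrative: the class name,
the grid and the attributes are hypothetical, only the subclass-overrides-fit
pattern matters; the SVC and KFold import paths are assumptions):

import numpy as np
from scikits.learn.svm import SVC          # import paths assumed
from scikits.learn.cross_val import KFold

class TunedSVC(SVC):
    """SVC whose fit() picks C by cross-validation before the final fit."""

    def fit(self, X, y, cv=None, C_grid=(0.1, 1., 10., 100.)):
        folds = list(cv) if cv is not None else list(KFold(len(y), 5))
        scores = []
        for C in C_grid:
            errors = [np.mean(SVC(C=C).fit(X[train], y[train])
                              .predict(X[test]) != y[test])
                      for train, test in folds]
            scores.append((np.mean(errors), C))
        self.C = min(scores)[1]            # assumed: C is stored as an attribute
        return SVC.fit(self, X, y)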

I'd love to hear your feedback on this proposal.

Gael

Matthieu PERROT
2010-07-05 07:58:16 UTC
Permalink
I was not aware of this plan. Without a doubt, it is a good way for users to
define their own strategies and bypass/hack into the API. But we might need a
more standardized approach to switch between the generic methods from the core
of the scikit and the built-in, specialized ones from specific models. Their
comparison, or the switch from one to another, would be easier this way. We
could use a fit_builtin(X, Y) method for an explicit call and fit(X, Y,
builtin=True) as a more generic way to call the same method. But this
proposal does not cover all cases, because several built-in method calls
could be combined and hidden behind a fit_builtin(X, Y) call. For standard
strategies maybe we should favour fit_strategyname(X, Y) and its related
fit(X, Y, strategy="strategy_name"), shouldn't we?
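In code, the generic spelling could dispatch to the specialized methods like
this (schematic; SomeEstimator and its strategies are placeholders):

class SomeEstimator(object):

    def fit(self, X, y, strategy=None, **strategy_params):
        # fit(X, y) keeps the plain behaviour, while fit(X, y, strategy="builtin")
        # routes to fit_builtin(X, y), and so on for other strategy names
        if strategy is None:
            return self._fit_plain(X, y)
        method = getattr(self, 'fit_' + strategy, None)
        if method is None:
            raise ValueError('unknown fit strategy: %r' % strategy)
        return method(X, y, **strategy_params)

    def _fit_plain(self, X, y):
        # the usual, non cross-validated fit
        return self

    def fit_builtin(self, X, y):
        # delegate to a model-specific routine (e.g. libsvm's built-in CV)
        return self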
--
Matthieu Perrot
Gael Varoquaux
2010-07-01 09:50:50 UTC
Permalink
Post by Fabian Pedregosa
- You added a dependency on joblib. Not that I'm against it, but I
wonder whether you can achieve the same result using threads or the new
multiprocessing module, which is part of the standard library.
Joblib makes error management easier: you get meaningful tracebacks. I
think we should be using it.

On the other hand, we should not depend on it. I think we should simply
add a snapshot of joblib in scikits.learn.external. All the imports are
relative, so it should not be a lot of work.
Post by Fabian Pedregosa
- If you have to use joblib, I highly recommend hiding it inside the
GridSearch, so that the module can still be imported and run with job=1
even when joblib is not installed.
+1.

Gaël
Gael Varoquaux
2010-07-17 14:12:09 UTC
Permalink
Post by Gael Varoquaux
So the question is: how do we make a base class that makes it a bit
easier and safer to ensure that the parameters actually
correspond to the specification?
So, Alex and I actually needed part of this feature yesterday, so we went
ahead and coded it. I was reasonably happy with a pattern that emerged
from that coding, so I blogged about it to sum up our experience and
possible future work:

http://gael-varoquaux.info/blog/?p=134

The place where we needed it was for lambda paths in cross-validated
Lasso and Elastic Net, in particular around glm.py, line 785:
http://github.com/GaelVaroquaux/scikit-learn/blob/master/scikits/learn/glm.py#L785

This means that we now (finally) have a base class for all the estimators
and that everything should derive from it.

What this gives us
======================

* A nice __repr__

* A way to query which parameters define a model (self._get_params())

What this imposes
====================

* To derive from scikits.learn.base_estimator.BaseEstimator

* To use only named arguments with sane defaults in the inits (which I
believe is good policy).

So, as you see, a super light framework. And it avoids using metaclasses.
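For those who have not followed the link yet, the pattern boils down to
something like this condensed sketch (from memory, not a verbatim copy of
base_estimator.py):

import inspect

class BaseEstimator(object):
    """Base class: the model parameters are the named arguments of __init__."""

    def _get_params(self):
        # introspect the __init__ signature: every named argument is a parameter
        args = inspect.getargspec(self.__class__.__init__)[0]
        args.remove('self')
        return dict((name, getattr(self, name)) for name in args)

    def __repr__(self):
        params = ', '.join('%s=%r' % (key, value)
                           for key, value in sorted(self._get_params().items()))
        return '%s(%s)' % (self.__class__.__name__, params)

class ToyLasso(BaseEstimator):

    def __init__(self, alpha=1.0):
        self.alpha = alpha

# repr(ToyLasso(alpha=0.5)) gives "ToyLasso(alpha=0.5)"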

Gaël
Olivier Grisel
2010-07-17 18:01:54 UTC
Permalink
I like it. Thanks for stepping up and implementing it.
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Gael Varoquaux
2010-07-17 20:26:34 UTC
Permalink
If _you_ like it, then I am really proud.

Gael

Fabian Pedregosa
2010-07-20 10:49:02 UTC
Permalink
Post by Gael Varoquaux
* To derive from scikits.learn.base_estimator.BaseEstimator
This looks great. I would however prefer to call the module base.py so
that in the future we can store other base classes there. I'll play with
it and send some feedback :-)

Fabian.
