Olivier Grisel
2012-03-08 19:18:28 UTC
Some fresh news from the hyperparameters tuning front-lines:
http://jmlr.csail.mit.edu/papers/volume13/bergstra12a/bergstra12a.pdf
Some interesting snippets from the conclusion (I have not yet read the
rest of the paper):
"""
We have shown that random experiments are more efficient than grid
experiments for hyper-parameter optimization in the case of several
learning algorithms on several data sets. Our analysis of the
hyper-parameter response surface (Ψ) suggests that random experiments
are more efficient because not all hyper-parameters are equally
important to tune. Grid search experiments allocate too many trials to
the exploration of dimensions that do not matter and suffer from poor
coverage in dimensions that are important.
"""
"""
Random experiments are also easier to carry out than grid experiments
for practical reasons related to the statistical independence of every
trial.
• The experiment can be stopped any time and the trials form a
complete experiment.
• If extra computers become available, new trials can be added to an
experiment without having to adjust the grid and commit to a much
larger experiment.
• Every trial can be carried out asynchronously.
• If the computer carrying out a trial fails for any reason, its trial
can be either abandoned or restarted without jeopardizing the
experiment.
"""
I wonder how this would transpose to scikit-learn models, which often
have far fewer hyper-parameters than the average Deep Belief Network.
Still, it's very interesting food for thought if someone wants to dive
into improving the model selection tooling in the scikit.
Maybe a new GSoC topic? Would anybody be interested as a mentor or candidate?
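
To make the idea concrete, here is a rough, self-contained sketch of
what such a random search could look like (this is not an existing
scikit-learn API, just an illustration): sample C and gamma for an SVC
from log-uniform ranges and score each candidate with a crude shuffled
K-fold. The ranges, the number of trials and the fold assignment are
all arbitrary choices for the example.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

rng = np.random.RandomState(0)
iris = load_iris()
X, y = iris.data, iris.target

n_trials = 20
n_folds = 5
# crude shuffled K-fold: assign each sample a fold id in 0..n_folds-1
fold_ids = rng.permutation(len(y)) % n_folds

best_score, best_params = -np.inf, None
for trial in range(n_trials):
    # log-uniform samples for C and gamma (ranges are illustrative)
    params = {
        "C": 10 ** rng.uniform(-3, 3),
        "gamma": 10 ** rng.uniform(-4, 1),
    }
    scores = []
    for k in range(n_folds):
        train, test = fold_ids != k, fold_ids == k
        clf = SVC(**params).fit(X[train], y[train])
        scores.append(clf.score(X[test], y[test]))
    mean_score = np.mean(scores)
    if mean_score > best_score:
        best_score, best_params = mean_score, params

print("best CV accuracy: %.3f with %r" % (best_score, best_params))

Note that each trial is statistically independent of the others, which
is exactly the practical advantage quoted above: you can stop the loop
at any point and keep the best candidate so far, or farm the trials out
to as many machines as happen to be available.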
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel