Post by Lars Buitinck
Post by James Bergstra
Further to this: I started a project on GitHub to look at how to
combine hyperopt with sklearn.
https://github.com/jaberg/hyperopt-sklearn
I've only wrapped one algorithm so far: Perceptron
https://github.com/jaberg/hyperopt-sklearn/blob/master/hpsklearn/perceptron.py
My idea is that little files like perceptron.py would encode
(a) domain expertise about what values make sense for a particular
hyper-parameter (see the `search_space()` function), and
(b) a sklearn-style fit/predict interface that encapsulates the search
over those hyper-parameters (see `AutoPerceptron`).
I'm not sure what your long-term goals with this project are, but I
see a few potential issues:
1. The values might be problem-dependent rather than
estimator-dependent. In your example, you're optimizing for accuracy,
but you might want to optimize for F1-score instead.
Good point, and if I understand correctly, it's related to your other
point below about GridSearch. I think you are pointing out that the
design of the AutoPerceptron is off the mark for 2 reasons:
1. There is only one line in that class that actually refers to
Perceptron, so why not make the actual estimator a constructor
argument? (I agree, it should be an argument.)
2. The class mainly consists of plumbing, but it is also hard-coded to
compute classification error. This is silly; it would be better to use
either (a) the native loss of the estimator or (b) some specific
user-supplied validation metric.
I agree with both of these points. Let me know if I misunderstood you though.
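Concretely, I take you to be suggesting something in this direction.
This is just a sketch: the class name, the scorer signature, and the
explicit validation split are placeholders rather than a settled API.

from hyperopt import fmin, space_eval, tpe
from sklearn.base import clone


class HyperoptEstimator(object):
    def __init__(self, estimator, space, scorer, max_evals=50):
        self.estimator = estimator  # any sklearn-style estimator (or Pipeline)
        self.space = space          # hyperopt search space over its parameters
        self.scorer = scorer        # scorer(model, X, y) -> float, higher is better
        self.max_evals = max_evals

    def fit(self, X, y, X_valid, y_valid):
        def objective(params):
            model = clone(self.estimator).set_params(**params).fit(X, y)
            return -self.scorer(model, X_valid, y_valid)  # fmin minimizes

        best = fmin(objective, self.space, algo=tpe.suggest,
                    max_evals=self.max_evals)
        self.best_params_ = space_eval(self.space, best)
        self.best_estimator_ = clone(self.estimator).set_params(
            **self.best_params_).fit(X, y)
        return self

    def predict(self, X):
        return self.best_estimator_.predict(X)

This would resolve both points: the estimator is a constructor
argument, and the metric is whatever the user passes in as `scorer`.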
Post by Lars Buitinck
2. The number of estimators is *huge* if you also consider
combinations like SelectKBest(chi2) -> RBFSamples -> SGDClassifier
pipelines (a classifier that I was trying out only yesterday).
Yes, the number of estimators in a search space can be huge. In my
research on visual system models I found that hyperopt was
surprisingly useful, even in the face of daunting configuration
problems. The point of this project, for me, is to see how it stacks
up.
One design aspect that doesn't come through in the current code sample
is that the hard-coded parameter spaces (which I'll come to in a
second) must compose. What I mean is that if someone has written up a
standard SGDClassifier search space, and someone has coded up search
spaces for SelectKBest and RBFSamples, then you should be able to just
string those all together and search the joint space without much
trouble.
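To sketch what I mean by composing: each estimator module contributes
a small function that returns its piece of the space, and a pipeline
space is just the union of the pieces. The helper names and ranges
below are made up; I'm writing parameter names in sklearn's
"step__param" form so that a sampled dict could be fed straight to
Pipeline.set_params().

from hyperopt import hp
from hyperopt.pyll import scope


def select_k_best_space(prefix=''):
    return {prefix + 'k': scope.int(hp.quniform(prefix + 'k', 10, 1000, 10))}


def rbf_sampler_space(prefix=''):
    return {prefix + 'gamma': hp.loguniform(prefix + 'gamma', -6, 2),
            prefix + 'n_components': scope.int(
                hp.quniform(prefix + 'n_components', 50, 500, 50))}


def sgd_classifier_space(prefix=''):
    return {prefix + 'alpha': hp.loguniform(prefix + 'alpha', -10, 0),
            prefix + 'loss': hp.choice(prefix + 'loss',
                                       ['hinge', 'log', 'modified_huber'])}


# The joint space is the union of the pieces; hyperopt samples all of it
# at once, so the optimizer can pick up interactions between stages.
pipeline_space = {}
pipeline_space.update(select_k_best_space('select__'))
pipeline_space.update(rbf_sampler_space('rbf__'))
pipeline_space.update(sgd_classifier_space('sgd__'))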
Your particular case is exactly the sort of case I would hope
eventually to address - it's difficult to give sensible defaults to
each of those modules without knowing both (a) what kind of data they
will process and (b) what's going on in the rest of the pipeline.
Tuning a bunch of interacting variables whose effects can only be
measured by long-running programs is hard for people; automatic
methods don't actually have to be all that efficient to be
competitive.
Post by Lars Buitinck
3. The estimator parameters change sometimes, so this would have to be
kept in sync with scikit-learn.
This is a price I was expecting to have to pay; I don't see any way
around it. Part of the value of this library is encoding parameter
ranges for specific estimators. That tight coupling is not something
to be dodged.
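For example, the Perceptron-specific knowledge I mean is only a few
lines like the following (illustrative ranges, not settled values),
and it is exactly this snippet that has to track scikit-learn's
parameter names:

from hyperopt import hp


def perceptron_space(prefix=''):
    # hard-coded ranges encoding which values tend to make sense;
    # the keys have to match sklearn's Perceptron constructor arguments
    return {
        prefix + 'penalty': hp.choice(prefix + 'penalty',
                                      [None, 'l2', 'l1', 'elasticnet']),
        prefix + 'alpha': hp.loguniform(prefix + 'alpha', -10, 0),  # ~4.5e-5 .. 1
    }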
- James
Post by Lars Buitinck
When I wrote the scikit-learn wrapper for NLTK [1], I chose a strategy
where *no scikit-learn code is imported at all* (except when the user
runs the demo or unit tests). Instead, the user is responsible for
importing it and constructing the appropriate estimator. This makes
the code robust to API changes, and it can handle arbitrarily complex
sklearn.Pipeline objects, as well as estimators that follow the API
conventions but are not in scikit-learn proper.
I think a similar approach can be followed here. While some
suggestions for parameters to try might be shipped as examples, an
estimator- and evaluation-agnostic wrapper class ("meta-estimator") is
a stronger basis for a package like the one you're writing.
scikit-learn's own GridSearch is also implemented like this, to a
large extent.
[1] https://github.com/nltk/nltk/blob/f7f3b73f0f051639d87cfeea43b0aabf6f167b8f/nltk/classify/scikitlearn.py
Thanks, yes, there is a strong similarity between what I'm trying to
do and GridSearch, so it makes sense to use similar strategies for
comparing model outputs. The "AutoPerceptron" class would be improved
by being more generic, like GridSearch.
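In terms of usage, I picture something roughly like this - again only
a sketch, reusing the hypothetical HyperoptEstimator and
pipeline_space from the earlier sketches, with the train/validation
arrays assumed to be prepared by the user (chi2 needs non-negative
features), and sklearn's RBFSampler standing in for the
kernel-approximation step:

from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import f1_score

pipe = Pipeline([('select', SelectKBest(chi2)),
                 ('rbf', RBFSampler()),
                 ('sgd', SGDClassifier())])

search = HyperoptEstimator(
    estimator=pipe,
    space=pipeline_space,  # joint space from the composition sketch above
    scorer=lambda model, X, y: f1_score(y, model.predict(X)),
    max_evals=100)

# X_train, y_train, X_valid, y_valid: whatever split the user has prepared
search.fit(X_train, y_train, X_valid, y_valid)
y_pred = search.predict(X_valid)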
- James