Mathieu - thanks so much for this in-depth response. It clarified a fair
amount of what I was seeing in the gradient boosting source code that
differed from my prior understanding of standard GBMs. It also became clear
from reviewing the code that to implement RGF I would have to modify both
tree.py and _tree.pyx in order to accomplish, as you described it, the
interleaving of tree induction and boosting. I would need to make it
possible for a DecisionTreeRegressor - or perhaps a separate class
implementing BaseDecisionTree - to add one new split at a time, and to add
a function that returns the loss reduction that would result from each
candidate split.
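To make that concrete, here is a rough sketch of the kind of interface I
have in mind; the class and method names (IncrementalTreeRegressor,
best_split_gains, apply_split) are made up for illustration and are not
existing scikit-learn API:

    class IncrementalTreeRegressor:
        """Hypothetical tree grown one split at a time (sketch only).

        Not existing scikit-learn API; it just names the two operations
        RGF would need from the tree layer.
        """

        def best_split_gains(self, X, residuals):
            """Return, for each current leaf, the loss reduction of its
            best candidate split, so the booster can decide which single
            leaf to split next."""
            ...

        def apply_split(self, leaf_id, feature, threshold):
            """Grow the tree by exactly one split at the chosen leaf."""
            ...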
Understandable that scikit-learn wants to focus on more mature algorithms,
so perhaps I'll spend my efforts more on writing a python wrapper for
Johnson and Zhang's implementation of RGF, at least for now. Personally I
do think it is fairly well proven though, as Johnson and Zhang have been
quite successful with it in competitions. To a lesser extent so have I for
that matter :).
One question I do have - you said that Option 3 is the one scikit-learn
implements (setting one separate weight per leaf node of the last
estimator, by line search). Perhaps I'm overlooking something, but to me
the code looks like it is implementing Option 1 (using the same constant
value (`learning_rate`) for all estimators).
Take the loss function LeastSquaresError for simplicity. Unless I'm
mistaken, the predictions are simply updated by this statement:
y_pred[:, k] += learning_rate * tree.predict(X).ravel()
where learning_rate is a constant, and that's it. Isn't that Option 1? I
don't see any line search.
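For concreteness, here is the contrast between the two options as I
understand it, in a self-contained sketch for squared loss with a single
output (my own illustration, not code from scikit-learn):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.RandomState(0)
    X = rng.rand(100, 3)
    y = X[:, 0] + 0.1 * rng.randn(100)

    learning_rate = 0.1
    y_pred = np.full_like(y, y.mean())        # initial constant prediction
    tree = DecisionTreeRegressor(max_depth=3).fit(X, y - y_pred)

    # Option 1: one global shrinkage constant, no per-leaf fitting.
    y_pred_opt1 = y_pred + learning_rate * tree.predict(X)

    # Option 3: one line-searched constant per leaf of the new tree.
    y_pred_opt3 = y_pred.copy()
    leaves = tree.apply(X)                    # leaf id of each sample
    for leaf in np.unique(leaves):
        mask = leaves == leaf
        # Line search: the constant minimizing squared loss over this
        # leaf is the mean residual of the samples it contains.
        gamma = np.mean(y[mask] - y_pred[mask])
        y_pred_opt3[mask] += learning_rate * gamma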
Best,
Ken
Post by Mathieu Blondel
I read the RGF paper. Interesting method but definitely too early to
include it in scikit-learn (we focus on mature algorithms). This also looks
more complicated to implement than gradient boosting, since tree induction
and boosting are interleaved.
The paper also clarified what "fully corrective" means, thanks. To
summarize, here are different strategies for setting the weights in
gradient boosting:
1. using the same constant value (`learning_rate`) for all estimators
2. setting the weight of the last estimator by line search (other weights
are kept fixed)
3. setting one separate weight per leaf node of the last estimator, by
line search
4. refitting all estimators weights (including the past ones)
5. refitting all leaf nodes of all estimators?
Some authors [1] argued that option 1 is sufficient in practice to get
good performance since our goal is to do well on test data, not training
data.
Option 3 is what scikit-learn implements. Option 4 is the fully corrective
case. It basically amounts to least squares for squared loss or to
logistic regression for log loss (using each estimator's predictions as
features). One thing though is that our implementation doesn't store the
estimator weights explicitly (leaves are updated directly). Setting a
penalty (l1 or l2) on the estimator weights (i.e., on the functional)
should prevent overfitting. Option 5 sounds pretty computationally
expensive, although the update doesn't need to be done at every iteration.
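As a sketch of what option 4 could look like for squared loss, assuming
the training-set predictions of each estimator are stored as the columns
of a matrix H (illustrative only, not our implementation):

    import numpy as np

    def fully_corrective_weights(H, y, l2=1.0):
        # H has shape (n_samples, n_estimators): column m holds the
        # training-set predictions of estimator m. Option 4 under squared
        # loss with an l2 penalty is just ridge regression on these
        # "features": min_w ||y - H w||^2 + l2 * ||w||^2
        n_estimators = H.shape[1]
        A = H.T.dot(H) + l2 * np.eye(n_estimators)
        return np.linalg.solve(A, H.T.dot(y))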
I recently re-implemented gradient boosting [2]. One difference in my
implementation is that it is possible to use any base learner (not just
trees). So my implementation stores estimator weights explicitly and uses
option 2 above. The fully corrective updates (option 4) might be easier to
add to my implementation.
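For squared loss, the option 2 line search even has a closed form: with r
the current residual vector and h the predictions of the new base learner,
the optimal scalar weight is <r, h> / <h, h>. A minimal sketch:

    import numpy as np

    def line_search_weight(residuals, h_pred):
        # Option 2 for squared loss: the scalar w minimizing
        # ||residuals - w * h_pred||^2; the weights of all previous
        # estimators stay fixed.
        return np.dot(residuals, h_pred) / np.dot(h_pred, h_pred)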
BTW, the notion of fully corrective updates is also present in the
matching pursuit (-> orthogonal matching pursuit) and Frank-Wolfe
literatures.
Mathieu
[1] "Boosting Algorithms: Regularization, Prediction and Model Fitting",
Peter B Ìuhlmann and Torsten Hothorn (thanks to Peter for telling me about
this paper)
[2]
https://github.com/mblondel/ivalice/blob/master/ivalice/impl/gradient_boosting.py
Post by c TAKES
Yes - in fact my real goal is to implement RGF ultimately, though I had
considered building/forking off the current GradientBoostingRegressor
package as a starting point, A) because I'm new to developing for
scikit-learn and B) to maintain as much consistency as possible with the
rest of the package.
Upon further review though, I don't think there's much point in adding
fully corrective updates to GBR, because without the regularization (the
rest of RGF) it is probably useless and leads to overfitting, as per
http://stat.rutgers.edu/home/tzhang/software/rgf/tpami14-rgf.pdf.
So it would likely be more useful to go ahead and create RGF as an
entirely separate module. But that could take some time, of course.
Is anyone working on RGF for sklearn?
Arnaud, thanks for directing me to your sparse implementation, I will
have a look! It would certainly be excellent to have this work for all
decision tree algorithms.
Ken
On Tue, Sep 16, 2014 at 11:16 AM, Peter Prettenhofer wrote:
Post by Peter Prettenhofer
The only reference I know is the Regularized Greedy Forest paper by
Johnson and Zhang [1].
I haven't read the primary source (by Zhang as well).
[1] http://arxiv.org/abs/1109.0887
Post by Mathieu Blondel
Could you give a reference for gradient boosting with fully corrective
updates?
Since the philosophy of gradient boosting is to fit each tree against
the residuals (or negative gradient) so far, I am wondering how such a
fully corrective update would work...
Mathieu
Post by c TAKES
Is anyone working on making GradientBoostingRegressor work with
sparse matrices?
Or is anyone working on adding an option for fully corrective gradient
boosting, i.e. all trees in the ensemble are re-weighted at each iteration?
These are things I would like to see and may be able to help with if
no one is currently working on them.
--
Peter Prettenhofer