Discussion:
[Scikit-learn-general] GSoC 2014 Proposal - Improving Linear Models (First draft)
Manoj Kumar
2014-03-06 17:41:52 UTC
Hello,

I have prepared a wiki page for the first draft of my GSoC proposal after
several discussions. Please do have a look and give me feedback.
https://github.com/scikit-learn/scikit-learn/wiki/GSoC-2014-Application:-Improved-Linear-Models

Thanks
--
Regards,
Manoj Kumar,
Mech Undergrad
http://manojbits.wordpress.com
Alexandre Gramfort
2014-03-06 20:08:53 UTC
hi Manoj,

looks like a pretty decent proposal to me.

Cheers,
Alex
------------------------------------------------------------------------------
Subversion Kills Productivity. Get off Subversion & Make the Move to
Perforce.
With Perforce, you get hassle-free workflows. Merge that actually works.
Faster operations. Version large binaries. Built-in WAN optimization and
the
freedom to use Git, Perforce or both. Make the move to Perforce.
http://pubads.g.doubleclick.net/gampad/clk?id=122218951&iu=/4140/ostg.clktrk
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
Robert Layton
2014-03-07 00:33:17 UTC
I agree, it is a strong project and important stuff to do. I feel that the
motivation is lacking purpose -- why bother doing this project? (that's not
a rhetorical question). At present, the project feels like "here are some
things to do, so I'll do them", without any real reason why they should be
the ones done.

Something like "linear models are an effective form of classifier for these
reasons, X, Y, Z. The existing scikit-learn implementation has the
following limitations, A, B, C. In this project I will address these
limitations by performing the following, J, K, L."

Good luck!
Manoj Kumar
2014-03-07 04:07:23 UTC
Hi Robert,

Thanks for the feedback. I'll make the motivation stronger.
Lars Buitinck
2014-03-07 09:54:19 UTC
Technical question: do we need separate LR and multinomial LR? If CD
can fit multinomial LR instead of OvA LR, then I'm +1 on merging
MultinomialRegressionCV and LogisticRegressionCV. That may leave more
time for niceties like documentation :)
Vlad Niculae
2014-03-07 10:05:56 UTC
In some cases it might be preferable to fit an OvA model. In those
cases, I think the user code would look nicer and more explicit if it
used the sklearn.multiclass.OneVsRestClassifier meta-estimator.

The downside is that we'll need to go through an ugly deprecation cycle
for a major class in the library.
With the long term in mind I agree with Lars.

My 2c,
Vlad
Manoj Kumar
2014-03-07 10:26:23 UTC
Hi Lars,

I'm sorry, but I don't quite follow. Do you mean that LogisticRegression
itself would handle the multi-output case, instead of the one-vs-all
approach it uses now?
Lars Buitinck
2014-03-07 11:40:59 UTC
No, for multi-output you'd still want OvR, of course. What I'm asking
is whether the CD-based LogisticRegressionCV can do multinomial
regression for the multiclass case, either by default or as an option.
If so, maybe we can get rid of the additional multinomial L-BFGS
estimator. I proposed that estimator primarily because hacking true
multinomial LR into Liblinear is more work than writing a new
estimator.
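For reference, here is a minimal sketch of what an L-BFGS-based multinomial estimator would hand to the optimizer: the mean softmax cross-entropy and its gradient. It is plain Python for readability; the function names, the L2 penalty `alpha`, and the absence of an intercept are illustrative assumptions, not the API discussed in the proposal.

```python
import math


def softmax(z):
    """Numerically stable softmax of a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def multinomial_loss_grad(W, X, y, alpha=0.0):
    """Mean softmax cross-entropy plus 0.5 * alpha * ||W||^2, and its gradient.

    W: n_classes weight vectors (no intercept), each of length n_features.
    X: samples, each of length n_features.
    y: integer labels in range(n_classes).
    Returns (loss, grad), where grad has the same shape as W.
    """
    n_classes, n_features = len(W), len(W[0])
    n_samples = len(X)
    loss = 0.0
    grad = [[0.0] * n_features for _ in range(n_classes)]
    for x, label in zip(X, y):
        scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in W]
        p = softmax(scores)
        loss -= math.log(p[label])
        for k in range(n_classes):
            # d(-log p_label)/d(w_k) = (p_k - 1[k == label]) * x
            coef = p[k] - (1.0 if k == label else 0.0)
            for j in range(n_features):
                grad[k][j] += coef * x[j]
    loss /= n_samples
    # L2 penalty: adds 0.5 * alpha * ||W||^2 to the loss, alpha * W to the gradient.
    loss += 0.5 * alpha * sum(w_j ** 2 for w in W for w_j in w)
    grad = [[g / n_samples + alpha * w_j for g, w_j in zip(g_row, w)]
            for g_row, w in zip(grad, W)]
    return loss, grad
```

An L-BFGS solver such as `scipy.optimize.minimize(..., method="L-BFGS-B", jac=True)` accepts a function returning `(loss, gradient)` like this one, once W is flattened into a 1-D array. The softmax normalizer couples all class vectors, which is what distinguishes this objective from one binary fit per class.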
Manoj Kumar
2014-03-07 12:53:20 UTC
Okay.

Firstly, my timeline currently puts the coordinate-descent based solver
at the end. Do you want me to move it to just after LogisticRegression
(and LogisticRegressionCV) is merged? Then we would be able to see
whether multinomial LR can be done away with or not.

Secondly, since LR and multinomial LR have different cost functions,
wouldn't it be better to keep them separate, just as we have separate
MultiTaskElasticNet and ElasticNet classes? At least that's what Alex
told me when I asked why a separate class is needed when y.ndim > 1.

Sorry for my noobness. I'm still relatively new to the community.
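The two cost functions mentioned above can be written out as follows, under simplifying assumptions that are not taken from the proposal: an L2 penalty with strength alpha, no intercept, and samples x_i with labels y_i in {1, ..., K}.

```latex
% One-vs-rest: K independent binary problems, with s_{ik} = +1 if y_i = k and -1 otherwise
\min_{w_k}\; \frac{1}{n}\sum_{i=1}^{n} \log\!\left(1 + e^{-s_{ik}\, w_k^\top x_i}\right)
  + \frac{\alpha}{2}\lVert w_k\rVert^2, \qquad k = 1, \dots, K

% Multinomial: one joint problem over W = (w_1, \dots, w_K)
\min_{W}\; -\frac{1}{n}\sum_{i=1}^{n} \log \frac{e^{w_{y_i}^\top x_i}}{\sum_{k=1}^{K} e^{w_k^\top x_i}}
  + \frac{\alpha}{2}\sum_{k=1}^{K}\lVert w_k\rVert^2
```

The OvR objective splits into K problems that each involve a single w_k, while the multinomial loss ties every w_k together through the normalizing sum in the denominator.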



Manoj Kumar
2014-03-20 02:51:34 UTC
Hello everyone,

I have updated my final proposal on Melange.
http://www.google-melange.com/gsoc/proposal/public/google/gsoc2014/manojkumar/5662278724616192.
I hope I have addressed all the issues.


Mathieu Blondel
2014-03-07 11:16:53 UTC
I think it will depend on the multiclass LR objective used. Depending on
the objective, we need to learn either n_classes vectors or n_classes - 1
vectors. In the former case, on a binary problem, a multiclass LR will do
twice as much work as a binary LR (two weight vectors instead of one).
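A minimal sketch of the two parameterizations, using made-up weights: because the softmax is unchanged when the same vector is subtracted from every class's weights, one vector can be pinned to zero, leaving n_classes - 1 to learn; and with two classes the reduced model is exactly binary logistic regression. The weights and sample are hypothetical, chosen only for illustration.

```python
import math


def softmax(z):
    """Numerically stable softmax of a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def dot(w, x):
    return sum(a * b for a, b in zip(w, x))


# Hypothetical weights: n_classes = 3 vectors of 2 features, and one sample x.
W = [[0.5, -1.0], [2.0, 0.3], [-0.7, 0.8]]
x = [1.2, -0.4]

# Form 1: one free weight vector per class (n_classes vectors).
p_full = softmax([dot(w, x) for w in W])

# Form 2: subtract the last class's vector from all of them. Every score
# shifts by the same amount, so the probabilities do not change; the last
# vector becomes zero and only n_classes - 1 vectors remain free.
ref = W[-1]
W_reduced = [[a - b for a, b in zip(w, ref)] for w in W]
p_reduced = softmax([dot(w, x) for w in W_reduced])

# With two classes, softmax over (w_0, w_1) equals sigmoid((w_0 - w_1)^T x),
# i.e. a binary logistic regression with the single vector w_0 - w_1.
W2 = W[:2]
p_two = softmax([dot(w, x) for w in W2])
p_sigmoid = sigmoid(dot([a - b for a, b in zip(W2[0], W2[1])], x))

print(p_full)
print(p_reduced)
print(p_two[0], p_sigmoid)
```

Once an L2 penalty is added, the two forms are no longer equivalent, because the penalty is applied to different sets of vectors, so the choice of objective does change the fitted model.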

One advantage of OvA is that it is embarrassingly parallel w.r.t. classes.
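To illustrate the parallelism point, here is a sketch of one-vs-rest built from independent binary fits that are simply mapped over an executor. The helper names (`fit_binary_logreg`, `one_vs_rest_fit`, `one_vs_rest_predict`), the plain gradient-descent solver, and the toy data are assumptions for illustration, not scikit-learn code.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def fit_binary_logreg(X, t, alpha=0.01, lr=0.5, n_iter=300):
    """Binary logistic regression by plain gradient descent (no intercept).

    t holds targets in {0.0, 1.0}; minimizes mean log-loss + 0.5*alpha*||w||^2.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        grad = [alpha * w_j for w_j in w]
        for x, target in zip(X, t):
            p = 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(w, x))))
            for j in range(d):
                grad[j] += (p - target) * x[j] / n
        w = [w_j - lr * g_j for w_j, g_j in zip(w, grad)]
    return w


def one_vs_rest_fit(X, y, n_jobs=None):
    """Fit one binary model per class; no fit depends on any other."""
    classes = sorted(set(y))

    def fit_one(c):
        # Relabel: 1.0 for the current class, 0.0 for every other class.
        t = [1.0 if label == c else 0.0 for label in y]
        return fit_binary_logreg(X, t)

    # The per-class problems share no state, so they can be mapped concurrently.
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        weights = list(pool.map(fit_one, classes))
    return classes, weights


def one_vs_rest_predict(model, X):
    """Predict the class whose binary model gives the highest score."""
    classes, weights = model
    preds = []
    for x in X:
        scores = [sum(a * b for a, b in zip(w, x)) for w in weights]
        preds.append(classes[scores.index(max(scores))])
    return preds


# Toy data: three separated clusters; the trailing 1.0 acts as a bias feature.
X = [[2.0, 2.0, 1.0], [2.5, 1.5, 1.0], [1.5, 2.5, 1.0],
     [-2.0, 2.0, 1.0], [-2.5, 1.5, 1.0], [-1.5, 2.5, 1.0],
     [0.0, -2.0, 1.0], [0.5, -2.5, 1.0], [-0.5, -2.5, 1.0]]
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]

model = one_vs_rest_fit(X, y, n_jobs=3)
print(one_vs_rest_predict(model, X))
```

Threads keep the sketch short, but in CPython pure-Python loops like these will not actually run concurrently because of the GIL; a process pool or GIL-releasing native code is needed for a real speedup. scikit-learn's OneVsRestClassifier exposes an `n_jobs` parameter for the same purpose.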

Requiring users to use OneVsRestClassifier kind of goes against the
principles we have followed so far, IMO.

M.