Discussion:
[Scikit-learn-general] module names normalization
Mathieu Blondel
2010-11-14 17:23:35 UTC
Hello,

We started a discussion on normalizing the module names:
https://github.com/scikit-learn/scikit-learn/pull/14

I'm starting a thread to continue the discussion here on the mailing
list, as it is more convenient and allows everyone to participate.

Currently, there seems to be a consensus for explicit names rather
than acronyms or abbreviations. I think this is a good thing, but we
need to be careful about ridiculously long names.

Some modifications are easy:

* gmm:
from scikits.learn.gaussian_mixture import GaussianMixture

* logistic:
from scikits.learn.glm.logistic_regression import LogisticRegression

Some modifications are more difficult. What to do with fastica, pca,
lda, qda, hmm, sgd, svm? Here are some I like:

* hmm:
from scikits.learn.hidden_markov import GaussianHMM

* lda:
from scikits.learn.linear_discriminant import LDA

or

from scikits.learn.fisher_discriminant import FisherDiscriminant

Another question to normalize is singular vs plural, e.g.

from scikits.learn.gaussian_process import GaussianProcess
vs
from scikits.learn.gaussian_processes import GaussianProcess

I'm +1 for singular.

Another question to normalize is module grouping. In my opinion,
module grouping can potentially be dangerous, so let's be careful
here.

Gael suggests renaming glm to linear_models (or linear_model?). I'm
+1 for linear_model (generalized_linear_model seems too long anyway).

Shall LDA, SVM and SGD go to the linear_model group? I'd say yes for
LDA, no for SVM and SGD as they are quite big modules on their own.
Any opinion?

Please participate in the discussion so we can converge on the best
naming scheme :)

Mathieu
Olivier Grisel
2010-11-14 17:39:14 UTC
Overall +1.

As for the linear module, it's already quite big. Maybe we should
split it instead; for instance we could have:

scikits.learn.coordinate_descent
scikits.learn.least_angle
scikits.learn.bayes
scikits.learn.least_squares (with ridge regression included?)
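
For illustration, user code under such a split could look like this (a
sketch only; these are the class names currently in glm, but their
exact placement here is hypothetical):

from scikits.learn.coordinate_descent import Lasso, ElasticNet
from scikits.learn.least_angle import LARS
from scikits.learn.bayes import BayesianRidge
from scikits.learn.least_squares import LinearRegression, Ridge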
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Fabian Pedregosa
2010-11-14 19:10:00 UTC
Hi Mathieu.

Thanks for writing this proposal! I had long been wanting to do this,
but never found the courage :-)

I'm +1 for all the names you proposed. I also like Olivier's proposal
to break up the glm module.

Cheers
Matthieu Brucher
2010-11-14 23:03:12 UTC
Post by Mathieu Blondel
Some modifications are more difficult. What to do with fastica, pca,
Hi,

PCA could go inside the (soon available?) manifold module, as it is
used to reduce dimensionality. Perhaps everything ICA-related as well?

Matthieu
--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Olivier Grisel
2010-11-14 23:18:11 UTC
Post by Matthieu Brucher
Post by Mathieu Blondel
Some modifications are more difficult. What to do with fastica, pca,
Hi,
PCA could go inside the (soon available?) manifold module, as it is
used to reduce dimensionality. Perhaps everything ICA-related as well?
As I explained in the linear models case, I would rather have more
top-level modules of moderate size and complexity than a few big
modules with many independent algorithms inside.

We can and we must use the documentation to introduce related
algorithms together and explain how their implementations differ while
serving similar purposes, rather than using a deep package/module
hierarchy to achieve that goal. Flat is better than nested :)
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Gael Varoquaux
2010-11-15 00:05:16 UTC
Interesting discussion... It clearly shows that different people have
different points of view.

Here is the way I think of it:

I'd like sub-package names to reflect user goals, rather than
optimisation methods, or abstract classes of problems.

Obviously, this philosophy cannot really work for everything: we have
to balance it with using well-known problems and solutions.

However, I must say that I am not terribly enthusiastic about Olivier's
suggestion of breaking up the GLM in packages named after the
optimisation strategy used to solve a regression problem: for a
non-specialist, it will not be obvious that coordinate_descent.Lasso is
the same thing as least_angle.Lasso, and that they both solve a
regression problem. In addition, having them far apart in the import
code means that the user is less likely to 'guess' that he could/should
change optimisation algorithm depending on his data, for the same kind
of task.

Grouping in classes of problem that are user-oriented seems preferable to
me. And while 'flat is better than nested', the problem is where to put
the branching. I'd prefer having fairly full sub-packages, as long as
they are named with a name that a user can identify. For instance, I
could see someone importing 'scikits.learn.cluster' and tab-completing on
it to see what clustering algorithms are available. This would be a bit
similar to the organisation of scipy. Also, I would favor having fewer
packages to import from and having more content in them, with this
content being imported directly in the __init__ of sub-packages.
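
For instance, the __init__ of a sub-package could simply import its
content, so that everything is one tab-completion away (a sketch; the
file names are hypothetical):

# scikits/learn/cluster/__init__.py
# Pull the estimators up to the sub-package level, so that
# tab-completing on scikits.learn.cluster lists them all.
from .k_means import KMeans
from .mean_shift import MeanShift
from .spectral import SpectralClustering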

On the other hand, too much generality is dangerous. Just after
suggesting linear_model, it appeared to me that it might be too general.
While it is true that PCA can be thought of as a manifold learning
problem, it is also a latent factor analysis problem, a dictionary
learning problem, a matrix factorisation problem... We shouldn't require
our users to understand the 'big picture' of machine learning to use the
scikit.

Here are a few suggestions/gut feelings (I am giving them numbers to
facilitate discussion):

1. 'glm' becomes 'regression' with the same content
2. pca + fastica go in a 'decomposition' sub-package, in which NMF,
sparse-PCA, dictionary learning will go.
3. I agree with 'hmm' -> 'hidden_markov'
4. I am wondering if gmm.py should go in a sub-package called 'mixture',
and be called 'gaussian' in it.
5. I don't know what to do with qda and lda. My gut feeling tells me they
should go together. Any suggestions?
6. svm is the name of an optimisation algorithm, not a class of problems.
On the other hand, it is such a well-known algorithm that people
expect to find it where it currently is.
7. I don't know what to do with sgd. It's a really cool optimization
technique. It's useful in many places, but I can't think of where to
fit it in a task-oriented view (that's why I vote to simply not change
it).
8. I am a bit worried about the profusion of the word 'Gaussian': we could
have 'gaussian_process', 'gaussian_mixture', 'gaussian_graphs'. It
seems that we can avoid the last two with 'mixture.gaussian' and by
using 'covariance' instead of ggm. However, 'Gaussian' and 'linear'
now raise warning flags for me, as being fairly non-informative words.
9. Do Gaussian processes belong to regression? I would tend to think that
they don't, as they are most often not used to solve the same problem
as what is in the glm package, but with a global view of the field,
they are certainly a regression.

'Tis about time that I grab some sleep. How do people feel about the
different alternatives?

Gaël
--
Gael Varoquaux
Research Fellow, INSERM
Associate researcher, INRIA
Laboratoire de Neuro-Imagerie Assistee par Ordinateur
NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
Phone: ++ 33-1-69-08-78-35
Mobile: ++ 33-6-28-25-64-62
http://gael-varoquaux.info
Olivier Grisel
2010-11-15 00:27:04 UTC
Post by Gael Varoquaux
1. 'glm' becomes 'regression' with the same content
But svm and sgd can also be used for regression (linear or not for
SVM) and coordinate_descent might be used for classification in the
future. Furthermore, where do you put LinearSVC and logistic regression
(both currently implemented using liblinear) in that frame? Having
special handling for svm and sgd makes the package layout logic
inconsistent and might hurt the user's understanding of it as well.

It's very hard to model this after user intent, because sometimes the
user's intent is to use a specific algorithm (e.g. Support Vector
Machines or Hidden Markov Models) while at other times she wants to do
some clustering without really knowing what algorithms are available
for such a task.

Maybe we could do the flat approach I introduced above and further add
"task oriented" virtual modules such as "classification",
"regression", "clustering", "density_estimation", "decomposition",
... with import aliases to the implementation-oriented modules. But
then the number of modules might explode (what about time series
forecasting, which can be seen as regression but might not be obvious
to the user, and sequence labeling, which is some kind of structured
prediction ...).
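
For illustration, such a virtual module could be a plain file of import
aliases (a sketch only; the module and class names are hypothetical):

# scikits/learn/regression.py -- "task oriented" virtual module that
# only re-exports estimators from the implementation-oriented modules
from .coordinate_descent import Lasso, ElasticNet
from .least_angle import LassoLARS
from .svm import SVR

A user could then do

from scikits.learn.regression import Lasso

without having to know which solver module the class actually lives in.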

Or maybe the top-level scikits.learn docstring could give the list of
classes that can be used for each of the above tasks.
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Mathieu Blondel
2010-11-15 03:53:41 UTC
The flat layout seems to be the least risky, as you inevitably
get names that don't fit well in any category or that would fit in
more than one category.

Olivier's suggestion has some issues but it seems to be a good
compromise between flat and nested. It is informative for the user and
close to what scipy does. For HMM and CRF, "sequence" or
"sequence_labeling" are two possible choices.

However,
Gael Varoquaux
2010-11-15 07:23:56 UTC
Post by Mathieu Blondel
Post by Olivier Grisel
Post by Gael Varoquaux
1. 'glm' becomes 'regression' with the same content
But svm and sgd can also be used for regression (linear or not for
SVM) and coordinate_descent might be used for classification in the
future. Furthermore, where do you put LinearSVC and logistic regression
(both currently implemented using liblinear) in that frame? Having
special handling for svm and sgd makes the package layout logic
inconsistent and might hurt the user's understanding of it as well.
Yes, agreed. I was putting logistic regression in there, as it is a
regression. There seems to me to be some unity in what is in 'glm', but I
can't seem to find what characterizes it. Does anybody have an idea (at
the end of the mail, I give Hastie's view :P)?
Post by Mathieu Blondel
Post by Olivier Grisel
It's very hard to model this after user intent, because sometimes the
user's intent is to use a specific algorithm (e.g. Support Vector
Machines or Hidden Markov Models) while at other times she wants to do
some clustering without really knowing what algorithms are available
for such a task.
Yes it's hard, but it's also a lot of the value. There are so many
packages (including some that I co-developed :P) where the user coming to
the package with a problem has no clue how to solve it and needs first to
do a lot of reading.
Post by Mathieu Blondel
The flat layout seems to be the least risky, as you inevitably
get names that don't fit well in any category or that would fit in
more than one category.
It's interesting that you call it flat, because it actually feels much
more nested to me. Granted, we are both suggesting to go only 2 levels
deep, so we are not terribly nested. I guess I find your approach more
nested because there is more branching: many modules hanging off
scikits.learn, whereas I envision a few sub-packages hanging off
scikits.learn.

My gold standard is really scipy. While it has a lot of problems, the
import paths do tend to look quite right.

The important question is: what do we expose in the docs and for the
users? Ideally, I'd like the package structure to reflect the docs:
every 'subsection' in the docs would correspond to a sub-package (or a
sub-module for the small ones). This is now working reasonably well for
quite a few sub-sections, and I think it both makes it easy to isolate
code by use case and to link the documentation with the code. The
challenge is to come up with good groupings. I still think that
focuss/coordinate_descent/least_angle is a grouping that is
incomprehensible to anyone who is not a machine-learning geek.

It seems that for 'cluster', 'mixture', 'decomposition', at least we have
a reasonable agreement, right? The supervised problems seem to be the
hardest to split, as they all answer similar use cases. To guide the user,
it would still be great if we could put them in categories... I just
opened up 'The Elements of Statistical Learning' to see how it was done,
and the groupings are (roughly):

- linear regression: least square, ridge, lasso
- linear classifiers: logistic (including penalized), lda
- svm: svc/svm
- nearest neighbors

Where does qda fit in this framework? I am not sure. Hastie puts them next
to LDA in a paragraph where he talks about 'general_discriminant'. I
could live with a subpackage called 'discriminant' with LDA + QDA in it,
and adding RLDA/RQDA to it as we go.

Gaël

PS: I'll be offline all day starting soon, due to no Internet at work.
Please don't start refactoring this today: we need to give it a good
think as we shouldn't be changing the structure too often (ideally never
between two minor releases after 1.0).
Olivier Grisel
2010-11-15 09:07:04 UTC
Post by Gael Varoquaux
Yes, agreed. I was putting logistic regression in there, as it is a
regression. There seems to me to be some unity in what is in 'glm', but I
can't seem to find what characterizes it. Does anybody have an idea (at
the end of the mail, I give Hastie's view :P)?
To me Logistic Regression is not a regression model. It is both a
classifier and a conditional probability density estimator (for
categorical class membership). E.g. it is not able to predict a
continuous housing price like the Lasso or ridge regression.

The unity of glm is that all the models in this module optimize an
objective function of the form loss(y, f(w.x + b)) + r(w) and make
their predictions based only on f(w.x + b) (either for regression or
classification). But in that case we should also include LinearSVC, the
liblinear-based logistic regression and SGD's linear models there, and
the same would hold if we ever implement backprop in SGD.

But if we do that, the module is definitely going to be much too big,
and we are mixing regression and classification anyway, which does not
respect the "user intent" frame either.
Post by Gael Varoquaux
It's interesting that you call it flat, because it actually feels much
more nested to me. Granted, we are both suggesting to go only 2 levels
deep, so we are not terribly nested. I guess I find your approach more
nested because there is more branching: many modules hanging off
scikits.learn, whereas I envision a few sub-packages hanging off
scikits.learn.
Ok, let us call my approach "implementation centric" and your approach
"task centric".

I was afraid of ending up with too nested a module structure if we went
for something such as:

scikits.learn.classification.support_vector.SVC
scikits.learn.classification.support_vector.LinearSVC
scikits.learn.classification.logistic.Logistic  (also based on liblinear)
scikits.learn.classification.stochastic_gradient.SGD

scikits.learn.regression.least_angle.Lasso
scikits.learn.regression.coordinate_descent.Lasso
scikits.learn.regression.coordinate_descent.ElasticNet
scikits.learn.regression.stochastic_gradient.SGD
scikits.learn.regression.support_vector.NuSVR
scikits.learn.regression.support_vector.SVR

But this approach is not possible because it would be too complicated
to split the svm and sgd modules by task. For instance, by changing the
SGD loss parameter from hinge to squared error or huber we can switch
from classification to regression while keeping 99% of the code the
same.
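
Something like this (a sketch; the exact constructor parameters may
differ from the current sgd module):

# one hypothetical SGD estimator, three tasks
clf = SGD(loss='hinge')          # hinge loss: linear SVM-style classifier
reg = SGD(loss='squared_error')  # squared loss: least-squares regression
rob = SGD(loss='huber')          # huber loss: robust regression
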
Post by Gael Varoquaux
It seems that for 'cluster', 'mixture', 'decomposition', at least we have
a reasonable agreement, right? The supervised problems seem to be the
hardest to split, as they all answer similar use cases. To guide the user,
it would still be great if we could put them in categories... I just
opened up 'The Elements of Statistical Learning' to see how it was done,
   - linear regression: least square, ridge, lasso
   - linear classifiers: logistic (including penalized), lda
   - svm: svc/svm
   - nearest neighbors
Yes, and SVM can be used for linear regression and classification, and
most people seem to miss this point. Furthermore, you either focus on
the task (classification or regression) and then choose the impl (least
squares, lasso with LARS), or on the implementation algorithm (SVM) and
then the task (classification or regression). I don't find this
approach very consistent.

svm is not so special that it deserves its own "impl centric" module
while the rest of the modules are "task centric". Technically the SGD
class with hinge loss and L2 penalty is a linear SVM: only the
algorithm is different: stochastic gradient descent on the primal
objective function, while SVC/libsvm optimizes the dual using SMO
(Sequential Minimal Optimization) and liblinear/LinearSVC uses
some sort of coordinate descent to optimize either the primal or the
dual.
Post by Gael Varoquaux
Where does qda fit in this framework? I am not sure. Hastie puts them next
to LDA in a paragraph where he talks about 'general_discriminant'. I
could live with a subpackage called 'discriminant' with LDA + QDA in it,
and adding RLDA/RQDA to it as we go.
Gaël
PS: I'll be offline all day starting soon, due to no Internet at work.
Please don't start refactoring this today: we need to give it a good
think as we shouldn't be changing the structure too often (ideally never
between two minor releases after 1.0).
Be sure I won't start refactoring anything without reaching a consensus first :)
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Peter Prettenhofer
2010-11-15 09:25:29 UTC
My two cents:

I think it's a good idea to stick to the conceptual model of either
"The Elements of Statistical Learning" (2nd edition) [1] or Bishop's
"Pattern Recognition and Machine Learning" [2], because these are, from
my point of view, the textbooks with which most people approach machine
learning. I would prefer the former because it's a) more recent, b)
freely available, and c) puts special emphasis on path algorithms and
regularization, which we provide too.

I'd suggest:
- a package for linear regression (current glm)
- a package for linear classification (LogisticRegression, GNB, LDA, QDA)
- a separate package for SVM
Both Bishop and Hastie et al. treat them separately; I think the
current svm module is diverse enough that it deserves a separate package.

Currently, I've no clue where to place the sgd package. The best place
would be linear classification. However, we can extend the code easily
to regression too...

best,
Peter
--
Peter Prettenhofer
Mathieu Blondel
2010-11-15 10:13:22 UTC
On Mon, Nov 15, 2010 at 4:23 PM, Gael Varoquaux
Post by Gael Varoquaux
It seems that for 'cluster', 'mixture', 'decomposition', at least we have
a reasonable agreement, right? The supervised problems seem to be the
hardest to split, as they all answer similar use cases. To guide the user,
it would still be great if we could put them in categories... I just
opened up 'The Elements of Statistical Learning' to see how it was done,
"mixture" is not something that an algorithm does. So GaussianMixture
should go in either "clustering" or "density_estimation"... Gaussian
Processes for regression can go to regression and Gaussian Processes
for classification can go to classification. We want to think in terms
of functionality, right?

Mathieu
j***@gmail.com
2010-11-15 13:31:01 UTC
Post by Mathieu Blondel
"mixture" is not something that an algorithm does. So GaussianMixture
should go in either "clustering" or "density_estimation"... Gaussian
Processes for regression can go to regression and Gaussian Processes
for classification can go to classification. We want to think in terms
of functionality, right?
I like a task-oriented view in general. But given the problems with
assigning the methods to one task group, an idea would be to import them
twice, e.g. once in regression and once in classification. This way
dir(regression) and dir(classification) would show them. (And I don't
think the docs would increase, but I haven't tried that.)
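
Something like this is what I have in mind (module names hypothetical):

# scikits/learn/regression.py
from .sgd import SGD     # imported once here...

# scikits/learn/classification.py
from .sgd import SGD     # ...and once here, so dir() finds it in both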

For scipy, I think the scipy.stats namespace is too full. Importing
scipy.stats takes forever, because a large part of scipy is imported
as a dependency. If I had a choice I would at least split the basic
stats and the distributions to lower import time.

Josef
Fabian Pedregosa
2010-11-15 13:38:47 UTC
On Mon, Nov 15, 2010 at 8:24 AM, Gael Varoquaux
Post by Gael Varoquaux
It seems that for 'cluster', 'mixture', 'decomposition', at least we have
a reasonable agreement, right? The supervised problems seem to be the
hardest to split, as they all answer similar use cases. To guide the user,
it would still be great if we could put them in categories... I just
opened up 'The Elements of Statistical Learning' to see how it was done,
   - linear regression: least square, ridge, lasso
   - linear classifiers: logistic (including penalized), lda
I don't think you should break glm into these subsections, since some
algorithms like the Lasso have variants used for regression and
classification.

As you said, the current scheme is not bad and blends well in most
cases with the documentation, but we need more explicit naming for
some modules, which brings us back to Blondel's original proposal. My
feeling is that we should use that naming scheme (with glm ->
linear_models), and refactor gmm into mixture as Gael suggested.

Fabian.
Mathieu Blondel
2010-11-15 15:20:18 UTC
On Mon, Nov 15, 2010 at 10:38 PM, Fabian Pedregosa
Post by Fabian Pedregosa
As you said, the current scheme is not bad and blends well in most
cases with the documentation, but we need more explicit naming for
some modules, which brings us back to Blondel's original proposal. My
feeling is that we should use that naming scheme (with glm ->
linear_models), and refactor gmm into mixture as Gael suggested.
Pedregosa is right ;-). The discussion has shifted to module groups,
which is a more complicated story. Maybe we can try to find an
agreement on module names first? There are modules in my first post
for which I didn't see a suggestion yet: fastica, pca, ...

Mathieu
Gael Varoquaux
2010-11-16 00:35:24 UTC
Post by Mathieu Blondel
Pedregosa is right ;-).
I can see some common trends coming up in the discussion, which is good
because it means that there is some blurry latent agreement on a few
things :).
Post by Mathieu Blondel
The discussion has shifted to module groups, which is a more
complicated story. Maybe we can try to find an agreement on module
names first? There are modules in my first post for which I didn't see
a suggestion yet: fastica, pca, ...
You didn't like my 'decomposition' suggestion? Maybe we could go for
'decompose', as in 'cluster' or 'optimize'.

OK, this thread has more food for thought, but it is late and I must
sleep now,

G
Mathieu Blondel
2010-11-16 04:22:56 UTC
On Tue, Nov 16, 2010 at 9:35 AM, Gael Varoquaux
Post by Gael Varoquaux
You didn't like my 'decomposition' suggestion? Maybe we could go for
'decompose', as in 'cluster' or 'optimize'.
I liked it. Even though one could use the shortcut

scikits.learn.decomposition.PCA

I think we still need to define the full path, e.g.

scikits.learn.decomposition.principal_components.PCA

Mathieu
Fabian Pedregosa
2010-11-22 08:31:46 UTC
Thanks everybody for the feedback. If there aren't any objections,
I'll start this week renaming what is clearest to me and what most
people agreed on:

* glm --> linear_model (and glm.lars --> linear_model.least_angle)

* gmm --> mixture, and also rename GMM to GaussianMixture.

* hmm --> hidden_markov

* new "decomposition" module with submodules pca and fastica

I'm not going to take action for now on lda, qda and sgd, as I didn't
see consensus and I do not have an opinion on those right now.

Fabian.
Peter Prettenhofer
2010-11-22 08:46:40 UTC
Concerning the sgd package, I'd propose:

* sgd --> stochastic_gradient

What do you think? Another question is whether it should go into
linear_model... is LogisticRegression also going to be put into
linear_model?

best,
Peter
--
Peter Prettenhofer
Gael Varoquaux
2010-11-22 08:48:59 UTC
Post by Peter Prettenhofer
* sgd --> stochastic_gradient
I find it better: more explicit.
Post by Peter Prettenhofer
what do you think? Another question is, whether it should go into
linear_model...
Will we use this code to solve other problems than linear models at some
point? I can't tell, you are the expert.

G
Peter Prettenhofer
2010-11-22 08:50:50 UTC
Post by Gael Varoquaux
[..]
Will we use this code to solve other problems than linear models at some
point? I can't tell, you are the expert.
No - it will stay linear in the number of parameters.

best,
Peter
--
Peter Prettenhofer
Olivier Grisel
2010-11-22 09:52:10 UTC
Post by Peter Prettenhofer
Post by Gael Varoquaux
[..]
Will we use this code to solve other problems than linear models at some
point? I can't tell, you are the expert.
No - it will stay linear in the number of parameters.
I think Gael is asking whether we plan to implement non-linear models
fitted with stochastic gradient descent, such as multi-layer
perceptrons or autoencoders.

I don't plan to work on this in the short term, and maybe theano and
the future TML project are better tools for such algorithms. So in the
short term I think we can restrict the stochastic_gradient module to
fitting linear models (classification or regression).
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Gael Varoquaux
2010-11-22 09:54:41 UTC
Post by Olivier Grisel
Post by Peter Prettenhofer
Post by Gael Varoquaux
Will we use this code to solve other problems than linear models at some
point? I can't tell, you are the expert.
No - it will stay linear in the number of parameters.
I think Gael is asking whether we plan to implement non-linear models
fitted with stochastic gradient descent, such as multi-layer
perceptrons or autoencoders.
Yes, that was the question.
Post by Olivier Grisel
I don't plan to work on this in the short term, and maybe theano and
the future TML project are better tools for such algorithms. So in the
short term I think we can restrict the stochastic_gradient module to
fitting linear models (classification or regression).
But am I right in thinking that it can be used for these problems? In
that case, I think I would prefer leaving it outside of 'linear_models'.
We could import some of the estimators it exposes for linear models in
linear_models, though.
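
I.e., something like this (a sketch; names hypothetical):

# scikits/learn/linear_model/__init__.py
# sgd stays a top-level module, but its linear estimators
# are also exposed here for discoverability.
from ..sgd import SGD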

Gael
Olivier Grisel
2010-11-22 10:03:01 UTC
Post by Gael Varoquaux
But am I right in thinking that it can be used for these problems? In
that case, I think I would prefer leaving it outside of 'linear_models'.
We could import some of the estimators it exposes for linear models in
linear_models, though.
As an algorithm SGD is indeed often used for non-linear models.
However the cython impl in scikit-learn is using the fact that we are
fitting a linear model to hardcode performance optimizations. It is
also using the fact that the output is a single dimension. It makes
the source code much simpler to read and to relate to the literature
(e.g. Leon Bottou's work).

If we want to implement the generic, non-linear, n-dimensional output
case it will probably be in a new cython file with little reuse of the
existing implementation.
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Gael Varoquaux
2010-11-22 10:05:43 UTC
Post by Olivier Grisel
As an algorithm SGD is indeed often used for non-linear models.
However the cython impl in scikit-learn is using the fact that we are
fitting a linear model to hardcode performance optimizations. It is
also using the fact that the output is a single dimension. It makes
the source code much simpler to read and to relate to the literature
(e.g. Leon Bottou's work).
If we want to implement the generic, non-linear, n-dimensional output
case it will probably be in a new cython file with little reuse of the
existing implementation.
OK. Then it seems to me that moving this code into linear_models
actually makes things clearer.

I'd love to have Alex (Gramfort)'s point of view, but he is on a plane
right now, for a long flight.

G
Alexandre Gramfort
2010-11-24 01:54:10 UTC
Post by Gael Varoquaux
OK. Then it seems to me that moving this code into linear_models
actually makes things clearer.
I agree. I would however keep the sgd stuff in a separate folder as,
for example, the base class is a bit different. Maybe in
linear_model/stochastic:

from linear_model.stochastic import SGD

what do you think?
Post by Gael Varoquaux
I'd love to have Alex (Gramfort)'s point of view, but he is on a plane
right now, for a long flight.
plane landed ...

btw I'm not a huge fan of 'decomposition' for ICA/PCA, but I don't have
any better idea. I would personally keep it as it is, as both methods
are major algorithms. I would not look into decomposition to find ICA,
for example.

Alex
Ron Weiss
2010-11-24 02:11:36 UTC
On Tue, Nov 23, 2010 at 5:54 PM, Alexandre Gramfort
Post by Alexandre Gramfort
btw I'm not a huge fan of decomposition for ICA/PCA but I don't have any
better idea. I would personally keep it as it is as both methods are
major algorithms. I would not look into decomposition to find ICA
for example.
What about transformation or projection instead? FWIW, I like to
think of both PCA and ICA (and LDA) as transformations from one space
to another.

-Ron
Alexandre Gramfort
2010-11-24 11:50:12 UTC
Post by Ron Weiss
What about transformation or projection instead? FWIW, I like to
think of both PCA and ICA (and LDA) as transformations from one space
to another.
I think this confirms a point raised earlier. PCA and ICA can be used
for different purposes and it's hard to make them fit in only one box.

Alex

Fabian Pedregosa
2010-11-22 11:46:24 UTC
On Mon, Nov 22, 2010 at 11:05 AM, Gael Varoquaux
Post by Gael Varoquaux
OK. Then it seems to me that moving this code into linear_models
actually makes things clearer.
+1
Peter Prettenhofer
2010-11-22 10:09:25 UTC
Right, I was referring to the current code, but yes, the package could
be a place to host multi-layer perceptrons and related stuff. What
about the ann package that used to be part of scikits.learn: did it
also use SGD?

best,
Peter
--
Peter Prettenhofer
Matthieu Brucher
2010-11-15 07:12:43 UTC
Post by Olivier Grisel
As I explained in the linear models case, I would rather have more
top-level modules of moderate size and complexity than a few big
modules with many independent algorithms inside.
I guess the manifold module should be a top-level module in any case.

Matthieu
--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Fabian Pedregosa
2010-11-14 23:48:08 UTC
On Mon, Nov 15, 2010 at 12:18 AM, Olivier Grisel
Post by Olivier Grisel
As I explained in the linear models case, I would rather have more
top-level modules of moderate size and complexity than a few big
modules with many independent algorithms inside.
+1