Olivier Grisel
2010-11-29 04:09:41 UTC
Hi again,
I finally found the time to finish watching Gunnar Martinsson's NIPS
tutorial on videolectures.net, and the fast_svd method is indeed able to
recover a fairly accurate approximation of the singular vectors even if
the data is not low rank (as is often the case in practice), as long
as we perform a couple of power iteration steps (q = 3 sounds like a
good default parameter).
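For readers who have not seen the tutorial, here is a minimal sketch of the randomized range finder with power iterations, assuming plain NumPy. The function name randomized_svd_sketch and the oversample parameter are hypothetical names chosen for illustration; this is not the fast_svd code from the scikit.

```python
import numpy as np


def randomized_svd_sketch(M, k, q=3, oversample=10, seed=0):
    """Approximate the top-k singular triplets of M (illustrative sketch only).

    `oversample` and `seed` are assumptions made for this example.
    """
    rng = np.random.default_rng(seed)
    n_features = M.shape[1]

    # Project M onto a random Gaussian subspace slightly larger than k.
    omega = rng.standard_normal((n_features, k + oversample))
    Y = M @ omega

    # q power iterations: each pass multiplies by M M^T, which damps the
    # contribution of the small singular values. QR keeps things stable.
    for _ in range(q):
        Q, _ = np.linalg.qr(Y)
        Z, _ = np.linalg.qr(M.T @ Q)
        Y = M @ Z

    # Orthonormal basis of the approximate range of M.
    Q, _ = np.linalg.qr(Y)

    # Exact SVD of the small projected matrix, then map back.
    B = Q.T @ M
    U_hat, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_hat
    return U[:, :k], s[:k], Vt[:k]
```

With q = 0 the result is only reliable when the spectrum decays quickly; raising q trades a few extra matrix products for accuracy on data that is not low rank.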
Furthermore, in scikit-learn we don't really care whether the singular
values / vectors are exact to the 7th decimal. We are mostly using PCA
/ SVD as a feature extractor / normalizer. Hence I think we could make
the PCA class use the approximate randomized method by default.
We need to investigate further the kmeans SVD seeding strategy, which
could get a really great boost from this method too when k is small.
Also, NNMF can be seeded by two SVDs:
http://www.cs.rpi.edu/~boutsc/files/nndsvd.pdf hence it might be
possible to make NNMF fast by combining both strategies (even though
it is probably not as interesting as the state-of-the-art online
dictionary learning stuff).
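A minimal sketch of an NNDSVD-style seeding, assuming plain NumPy and based on one reading of the linked paper rather than on any existing scikit code. The name nndsvd_sketch is hypothetical. The sketch uses an exact SVD for the first step (an approximate truncated SVD could be substituted), and assumes the second SVD step, on the positive sections of each rank-one term, can be written in closed form as done below.

```python
import numpy as np


def nndsvd_sketch(X, k):
    """Nonnegative (W, H) seed for NNMF from a rank-k SVD (illustrative only)."""
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    W = np.zeros((X.shape[0], k))
    H = np.zeros((k, X.shape[1]))

    # Leading pair: for nonnegative X the vectors can be taken nonnegative.
    W[:, 0] = np.sqrt(S[0]) * np.abs(U[:, 0])
    H[0, :] = np.sqrt(S[0]) * np.abs(Vt[0, :])

    for j in range(1, k):
        x, y = U[:, j], Vt[j, :]
        # Positive and negative parts of the left and right singular vectors.
        x_p, y_p = np.maximum(x, 0), np.maximum(y, 0)
        x_n, y_n = np.maximum(-x, 0), np.maximum(-y, 0)
        m_p = np.linalg.norm(x_p) * np.linalg.norm(y_p)
        m_n = np.linalg.norm(x_n) * np.linalg.norm(y_n)

        # Keep whichever sign pattern carries more weight.
        if m_p >= m_n:
            u, v, sigma = x_p, y_p, m_p
        else:
            u, v, sigma = x_n, y_n, m_n
        if sigma == 0:
            continue  # degenerate pair: leave this component at zero

        u = u / np.linalg.norm(u)
        v = v / np.linalg.norm(v)
        scale = np.sqrt(S[j] * sigma)
        W[:, j] = scale * u
        H[j, :] = scale * v
    return W, H
```

The returned W and H are nonnegative by construction and can be handed to any NNMF update rule as a starting point.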
Vlad, if you decide to start working on sparse PCA, NNMF and friends,
be sure to familiarize yourself with this technique first:
http://videolectures.net/nips09_martinsson_mvll/ (this tutorial is a
bit long but excellent).
The manifold module might also benefit from this: if I understand
correctly, some of those algorithms are based on SVDs of nonlinear
similarity matrices of the raw data (I am not sure though).
The spanning col / rows extraction (a.k.a. skeleton summary extraction)
mentioned in the tutorial is another really interesting unsupervised
algorithm that would be great to have in the scikit.
Good night (whatever your timezone is :)
---------- Forwarded message ----------
From: <***@github.com>
Date: 2010/11/29
Subject: [Scikit-learn-commits] [scikit-learn/scikit-learn] cd8c6b:
one more test for SVD
To: scikit-learn-***@lists.sourceforge.net
Branch: refs/heads/master
Home: https://github.com/scikit-learn/scikit-learn
Commit: cd8c6b00d390b61aaa7d6fd7a391c128cf132e42
https://github.com/scikit-learn/scikit-learn/commit/cd8c6b00d390b61aaa7d6fd7a391c128cf132e42
Author: Olivier Grisel <***@ensta.org>
Date: 2010-11-28 (Sun, 28 Nov 2010)
Changed paths:
M scikits/learn/utils/tests/test_svd.py
Log Message:
-----------
one more test for SVD
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel