David Marek
2012-05-14 22:12:34 UTC
Hi,
I have been working on the multilayer perceptron and I have a basic
implementation running. You can see it at
https://github.com/davidmarek/scikit-learn/tree/gsoc_mlp. The most
important part is the SGD implementation, which can be found here:
https://github.com/davidmarek/scikit-learn/blob/gsoc_mlp/sklearn/mlp/mlp_fast.pyx
I have encountered a few problems and I would like to know your opinion.
1) There are classes like SequentialDataset and WeightVector which are
used in the SGD code for linear_model, but I am not sure whether I should
use them here as well. I have to do more with samples and weights than
just multiply and add them together; I wouldn't be able to use numpy
functions like tanh or do batch updates, would I? What do you think?
Am I missing something that would let me do everything I need with
SequentialDataset? I implemented my own LossFunction because I need a
vectorized version; I think that is the same problem.
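For illustration, this is the kind of vectorized minibatch step I have in
mind (a plain-numpy sketch with made-up shapes, not the SequentialDataset
API):

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(8, 4)        # minibatch of 8 samples, 4 features (hypothetical)
W = rng.randn(4, 3) * 0.1  # input-to-hidden weights
b = np.zeros(3)            # hidden bias

# Forward pass for the whole batch at once: numpy broadcasts the bias
# across rows and applies tanh elementwise. SequentialDataset's
# one-sample-at-a-time iteration would not let me write it this way.
hidden = np.tanh(np.dot(X, W) + b)
print(hidden.shape)        # (8, 3)
```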
2) I used Andreas' implementation as an inspiration and I am not sure
I understand some parts of it:
* Shouldn't the bias vector be initialized with ones instead of
zeros? I guess there is no difference.
* I am not sure why the bias is updated with:
bias_output += lr * np.mean(delta_o, axis=0)
Shouldn't it be:
bias_output += lr / batch_size * np.mean(delta_o, axis=0)?
* Shouldn't the backward step for computing delta_h be:
delta_h[:] = np.dot(delta_o, weights_output.T) * hidden.doutput(x_hidden)
where hidden.doutput is the derivative of the activation function for the
hidden layer?
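To make these questions concrete, here is a minimal numpy sketch of the
backward step as I would write it, assuming a tanh hidden layer (the
variable names and shapes are made up for the example, this is not
Andreas' code):

```python
import numpy as np

rng = np.random.RandomState(0)
batch_size, n_hidden, n_out = 8, 5, 3
lr = 0.1

x_hidden = np.tanh(rng.randn(batch_size, n_hidden))  # hidden activations
delta_o = rng.randn(batch_size, n_out)               # output-layer deltas
weights_output = rng.randn(n_hidden, n_out)
bias_output = np.zeros(n_out)

# Backpropagate the output deltas through the output weights; for tanh
# the derivative of the activation is 1 - tanh(z)**2, which can be
# computed directly from the stored activations.
delta_h = np.dot(delta_o, weights_output.T) * (1.0 - x_hidden ** 2)

# Bias update: note that np.mean already averages over axis 0, i.e. it
# divides the sum of deltas by batch_size.
bias_output += lr * np.mean(delta_o, axis=0)

print(delta_h.shape)      # (8, 5)
print(bias_output.shape)  # (3,)
```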
I hope my questions are not too stupid. Thank you.
David