I don't have great answers to the other questions, though I too am interested in them.
#5) [1] has some Python code and timings mixed into the docs. One such example (stacked denoising autoencoders on MNIST):
By default the code runs 15 pre-training epochs for each layer,
with a batch size of 1. The corruption level for the first layer is
0.1, for the second 0.2 and 0.3 for the third. The pretraining
learning rate is 0.001 and the finetuning learning rate is
0.1. Pre-training takes 585.01 minutes, with an average of 13
minutes per epoch. Fine-tuning is completed after 36 epochs in
444.2 minutes, with an average of 12.34 minutes per epoch. The
final validation score is 1.39% with a testing score of
1.3%. These results were obtained on a machine with an Intel Xeon
E5430 @ 2.66GHz CPU, with a single-threaded GotoBLAS.
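As a quick sanity check on those timings, here is a small sketch that recomputes the per-epoch averages from the quoted totals. The only assumption is that the three corruption levels mean three pre-trained layers, so 3 * 15 = 45 pre-training epochs in total; the variable names are mine, not the tutorial's.

```python
# Figures quoted from the tutorial text above.
n_layers = 3                    # assumed from the three corruption levels 0.1, 0.2, 0.3
pretrain_epochs_per_layer = 15
pretrain_minutes = 585.01
finetune_epochs = 36
finetune_minutes = 444.2

# Pre-training runs the per-layer epoch count once for every layer.
pretrain_epochs = n_layers * pretrain_epochs_per_layer
pretrain_avg = pretrain_minutes / pretrain_epochs
finetune_avg = finetune_minutes / finetune_epochs
total_hours = (pretrain_minutes + finetune_minutes) / 60

print(f"pre-training: {pretrain_epochs} epochs, {pretrain_avg:.2f} min/epoch")
print(f"fine-tuning:  {finetune_epochs} epochs, {finetune_avg:.2f} min/epoch")
print(f"total:        {total_hours:.1f} hours")
```

The averages come out to about 13.00 and 12.34 minutes per epoch, matching the quoted figures, and the whole run is roughly 17 hours on that CPU.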
#6) The size of the NN is typically not num_features * num_classes but rather num_features * num_layers, where num_layers is commonly 3-10 or so. If you want a (multi-class) classifier, you first train the network on a bunch of examples, unsupervised. Then, once the network is built, you feed its outputs to a classifier such as an SVM or a linear model trained with SGD. The idea is that the net provides more meaningful features than you would get from hand-crafted features or the raw input data itself.
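To make the two-stage idea concrete, here is a minimal sketch using scikit-learn. Note the substitutions: it uses a single BernoulliRBM layer as the unsupervised feature learner rather than the stacked denoising autoencoders from [1], and the small built-in digits dataset rather than MNIST. The hyperparameters (64 hidden units, 10 iterations, learning rate 0.05) are arbitrary picks for illustration, not tuned values.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Small 8x8 digit images (64 raw features, 10 classes).
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = Pipeline([
    # The RBM expects inputs scaled to [0, 1].
    ("scale", MinMaxScaler()),
    # Stage 1: unsupervised feature learning (labels are ignored here).
    ("features", BernoulliRBM(n_components=64, learning_rate=0.05,
                              n_iter=10, random_state=0)),
    # Stage 2: a linear classifier trained with SGD on the learned features.
    ("classify", SGDClassifier(random_state=0)),
])
model.fit(X_train, y_train)

# Learned features for the test set: everything in the pipeline except the classifier.
features = model[:-1].transform(X_test)
accuracy = model.score(X_test, y_test)
print(features.shape, accuracy)
```

Swapping SGDClassifier for sklearn.svm.SVC gives the SVM variant; the feature-learning stage stays the same.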
[1] http://deeplearning.net/tutorial/SdA.html#sda