I don't have great answers to the other questions, though I too am interested in them.

#5) [1] has some Python code and timings mixed into the docs. One such example (stacked denoising autoencoders on MNIST):

    By default the code runs 15 pre-training epochs for each layer,
    with a batch size of 1. The corruption level for the first layer is
    0.1, for the second 0.2 and 0.3 for the third. The pretraining
    learning rate is 0.001 and the finetuning learning rate is
    0.1. Pre-training takes 585.01 minutes, with an average of 13
    minutes per epoch. Fine-tuning is completed after 36 epochs in
    444.2 minutes, with an average of 12.34 minutes per epoch. The
    final validation score is 1.39% with a testing score of
    1.3%. These results were obtained on a machine with an Intel Xeon
    E5430 @ 2.66GHz CPU, with a single-threaded GotoBLAS.
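
As a quick sanity check on those numbers, the per-epoch averages can be recomputed from the quoted totals. This is only arithmetic on the figures above; the variable names are mine, and the layer count of 3 is inferred from the three corruption levels listed.

```python
# Figures copied from the quoted tutorial text.
layers = 3                      # inferred from the three corruption levels
pretrain_epochs_per_layer = 15
pretrain_minutes = 585.01
finetune_epochs = 36
finetune_minutes = 444.2

# Pre-training runs every layer for the full number of epochs.
pretrain_epochs = layers * pretrain_epochs_per_layer
pretrain_per_epoch = pretrain_minutes / pretrain_epochs
finetune_per_epoch = finetune_minutes / finetune_epochs
total_hours = (pretrain_minutes + finetune_minutes) / 60

print(f"pre-training: {pretrain_epochs} epochs, {pretrain_per_epoch:.2f} min/epoch")
print(f"fine-tuning:  {finetune_epochs} epochs, {finetune_per_epoch:.2f} min/epoch")
print(f"total:        {total_hours:.2f} hours")
```

So the whole run comes to a bit over 17 hours on that single-threaded CPU setup.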

#6) The size of the NN is typically not num_features * num_classes, but rather num_features * num_layers, where num_layers is commonly around 3-10. If you want a (multi-class) classifier, you first train the network on a bunch of examples, unsupervised. Then, once the network is trained, you feed its outputs to a classifier such as an SVM or a linear model trained with SGD. The idea is that the net provides more meaningful features than hand-crafted features or the raw input data would.
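
To make that two-stage idea concrete, here is a minimal pure-Python sketch, not the tutorial's Theano code. It assumes a single hidden layer, a toy 4-feature dataset I made up, and a logistic regression trained with SGD standing in for the final classifier. All function names and hyperparameters are illustrative.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(x, weights, biases):
    # One sigmoid layer: out[j] = sigmoid(weights[j] . x + biases[j])
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
            for row, bj in zip(weights, biases)]

def pretrain_autoencoder(data, n_hidden, corruption=0.1, lr=0.1, epochs=15, seed=0):
    """Unsupervised stage: train a one-layer denoising autoencoder, batch size 1."""
    rng = random.Random(seed)
    n_in = len(data[0])
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]  # encoder
    V = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]  # decoder
    b, c = [0.0] * n_hidden, [0.0] * n_in
    for _ in range(epochs):
        for x in data:
            # Corrupt the input by zeroing features at random.
            noisy = [0.0 if rng.random() < corruption else xi for xi in x]
            h = layer(noisy, W, b)
            r = layer(h, V, c)  # reconstruction of the clean input
            # Backprop of squared reconstruction error through both sigmoid layers.
            d_out = [(ri - xi) * ri * (1 - ri) for ri, xi in zip(r, x)]
            d_hid = [hj * (1 - hj) * sum(d_out[i] * V[i][j] for i in range(n_in))
                     for j, hj in enumerate(h)]
            for i in range(n_in):
                for j in range(n_hidden):
                    V[i][j] -= lr * d_out[i] * h[j]
                c[i] -= lr * d_out[i]
            for j in range(n_hidden):
                for i in range(n_in):
                    W[j][i] -= lr * d_hid[j] * noisy[i]
                b[j] -= lr * d_hid[j]
    return W, b  # only the encoder is kept as the feature extractor

def train_logistic(features, labels, lr=0.5, epochs=50):
    """Supervised stage: binary logistic regression trained with SGD."""
    w, bias = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for f, y in zip(features, labels):
            g = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + bias) - y
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            bias -= lr * g
    return w, bias

def predict(f, w, bias):
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) + bias > 0 else 0

# Toy data (made up): many unlabeled examples, few labeled ones.
unlabeled = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0],
             [0, 0, 1, 1], [0, 0, 1, 0], [0, 0, 0, 1]]
labeled_x = [[1, 1, 0, 0], [0, 0, 1, 1]]
labeled_y = [0, 1]

W, b = pretrain_autoencoder(unlabeled, n_hidden=2)
features = [layer(x, W, b) for x in labeled_x]   # learned features, not raw input
w, bias = train_logistic(features, labeled_y)
predictions = [predict(f, w, bias) for f in features]
print(features, predictions)
```

The point of the sketch is the data flow: the classifier never sees the raw inputs, only the hidden-layer activations. A real setup would stack several such layers and use a proper library rather than nested Python loops.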

[1] http://deeplearning.net/tutorial/SdA.html#sda