Hacker News | tejask's comments

Yes, this is very much inspired by Geoff's work.


This is an interesting question. Technically, we capture a probability distribution in the code layer (between encoder and decoder). So you can sample from it multiple times and assess uncertainty. However, we have not really studied this.
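Not from the paper itself, but a minimal sketch of what sampling the code layer for uncertainty could look like, assuming a VAE-style Gaussian code where the encoder outputs a mean and log-variance per latent dimension (all names and values here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_code(mu, log_var, n_samples=100):
    """Draw n_samples codes z ~ N(mu, exp(log_var)) via reparameterization."""
    std = np.exp(0.5 * log_var)
    eps = rng.standard_normal((n_samples, mu.shape[0]))
    return mu + eps * std  # shape: (n_samples, latent_dim)

# Toy encoder outputs for one input image (made-up values).
mu = np.array([0.2, -1.0, 0.5])
log_var = np.array([-2.0, 0.0, -4.0])

z = sample_code(mu, log_var)
# Per-dimension spread of the repeated samples is one simple
# uncertainty estimate for each latent variable.
uncertainty = z.std(axis=0)
```

Dimensions with larger encoder variance (here the second one) come out with proportionally larger sample spread, which is the kind of uncertainty assessment the multiple-sampling idea suggests.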


Thanks for the references! I'm glad that many people are doing such things. After looking at the chairs paper, it seems like they render images given pose, shape, view, etc. (a supervised setting). However, in our model, there is a twist: it is trained either completely unsupervised or biased to separate those variables, but it is never given the true values of those parameters ... just raw data.


One of the authors here. You are absolutely right! In fact, I am currently doing something similar, but it is not working as well yet. As far as this work is concerned, we wanted to see how model-free we could go.


I don’t understand much of the paper but it looks awesome! I have two questions: Am I understanding it correctly that one would need to convert the internal representation to a textured triangle mesh in order to use ray tracing in the decoder stage? Is the encoder effectively similar to scene reconstruction via structure from motion?


There are many ways to parametrize the decoder. One way is to constrain it to output an explicit mesh or volumetric representation and express the rendering pipeline so that it's differentiable. The encoder will then effectively learn an "inference algorithm" to get the best output. A feedforward neural network is not enough; recurrent computations will eventually be necessary.
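As a toy illustration of the differentiable-rendering-pipeline idea (my own construction, not the paper's decoder): a "renderer" that splats a Gaussian blob at a latent 2-D position is smooth in its scene parameters, so a pixel-space loss can be pushed back through it to those parameters:

```python
import numpy as np

H = W = 16
ys, xs = np.mgrid[0:H, 0:W]

def render(x, y, sigma=2.0):
    """Render a single Gaussian blob centred at scene position (x, y)."""
    return np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))

target = render(10.0, 6.0)  # "observed" image of a blob at (10, 6)

def loss(x, y):
    """Pixel-space reconstruction error for a candidate scene (x, y)."""
    return np.sum((render(x, y) - target) ** 2)

# Finite-difference gradient at a wrong guess x = 8: it is non-zero and
# points towards the true x = 10, which is what makes gradient-based
# inverse graphics possible with a differentiable renderer.
eps = 1e-4
gx = (loss(8.0 + eps, 6.0) - loss(8.0 - eps, 6.0)) / (2 * eps)
```

In a real system the renderer would be far more complex (occlusion, lighting, mesh rasterization), but the principle is the same: smoothness of the output in the scene parameters is what lets the encoder be trained end to end.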


Can you explain a bit more why the recurrent network structure becomes necessary at some point? Is that because reversing a CNN naturally means rendering by (de)convolution?


In order to approximately learn a "real" graphics engine with support for basic physics, just feed-forward computation might not be sufficient. A more natural way to learn graphics/physics might be to learn the temporal structure more explicitly. On the other hand, it might also be interesting to just add temporal convolution-deconvolution structure in the existing model. This is work in progress though.
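For concreteness, "temporal convolution" here just means convolving along the time axis of a per-frame feature sequence; a minimal sketch (my own toy example, not the work-in-progress model):

```python
import numpy as np

def temporal_conv(seq, kernel):
    """Convolve a (T,) per-frame feature sequence with a (K,) kernel
    ('valid' mode: output length T - K + 1)."""
    T, K = len(seq), len(kernel)
    return np.array([np.dot(seq[t:t + K], kernel) for t in range(T - K + 1)])

frames = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 0.0])  # toy per-frame feature
edge_kernel = np.array([-1.0, 1.0])                 # detects temporal change

motion = temporal_conv(frames, edge_kernel)
# motion is non-zero exactly where the feature changes between frames
```

Stacking such filters (and their deconvolution counterparts on the output side) is one simple way to give an otherwise frame-by-frame conv-deconv model explicit temporal structure.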


In summary, the most interesting part for the general audience might be the following question -- can we learn a 3D rendering engine just from images or videos without any hand-engineering?

Apart from the interesting applications in computer graphics (like rendering novel viewpoints of an object), this can also be used directly for vision applications. This is because computer vision can be thought of as the inverse of computer graphics.

Goal of computer graphics: scene description -> images

and

Goal of vision: images -> scene description.

Therefore, training a neural network to behave like a graphics engine is interesting from both these perspectives. We are a LONG way from even scratching the surface.
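The forward/inverse pairing above can be sketched as a toy analysis-by-synthesis loop (my own construction; real scenes and renderers are vastly harder to invert):

```python
import numpy as np

W = 32

def graphics(length):
    """Scene description (bar length in pixels) -> image (a 1-D 'image')."""
    img = np.zeros(W)
    img[:length] = 1.0
    return img

def vision(img):
    """Image -> scene description, by searching over the forward model
    for the scene whose rendering best matches the observation."""
    candidates = range(W + 1)
    return min(candidates, key=lambda L: np.sum((graphics(L) - img) ** 2))

recovered = vision(graphics(12))  # recovers the original length, 12
```

Here `vision` inverts `graphics` by brute-force search; a neural network trained as an encoder amortizes that search into a single forward pass, which is the appeal of learning the graphics engine and its inverse jointly.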


How long has this idea of making a 3D engine from conv nets been researched?


To the best of my knowledge, not much at all. It is an open question. Besides, a feedforward net is not going to be enough.


The first advice you give is a good reminder for people like me who have tons of unfinished projects.


My programming experience was unexpressed before college, as I was mostly interested in the basic sciences - mainly physics. It was only after high school that I seriously got into CS-type stuff, as I started developing an interest in AI/algorithms. So for me, it is not hard to believe that there could be a lot of people with a hidden aptitude for programming. After all, programming is a means to an end. After high school, understanding the brain and developing AI was my "end", which obviously requires strong programming skills and thus triggered a stronger interest in programming.


I started programming late in high school as well, and only on a TI-83 at that. My first rendezvous with network/GUI programming was in college (though I was a CS major). Now, more than half a decade after graduation and working in Silicon Valley, I'm hacking on more serious projects like writing operating systems. I don't think starting late has affected me negatively too much. Just keep at it; you'll reach your full potential before you know it.


This is awesome!


My thoughts exactly.


For an average teenager who is about to go to college, online education is a less favorable choice, as it would probably be deemed less prestigious due to the issues mentioned in the article. In the near future, online education could primarily serve professionals or people who learn for its own sake (a small group, and one that is difficult to keep engaged over long periods).


Good concept. I would have liked to see a demo page (or video) before subscribing.

