This is a neat approach. Basically it is a combination of:
(1) Fitting 3D stock models to objects in existing photographs using a simple but interactive ray casting approach.
(2) Estimating soft lighting on objects fairly convincingly.
(3) Re-rendering the stock models using the estimated lighting and the textures of the original photographs.
There are real limitations to this, but the automated lighting estimation alone is impressive and has wide applications in the visual effects space.
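To make the lighting-estimation step concrete: a common approach (not necessarily the one this paper uses) is to fit low-order spherical-harmonic lighting coefficients by least squares, given surface normals from the fitted 3D model and the observed pixel intensities. The function names and the order-1 basis below are my own illustration, not from the paper:

```python
import numpy as np

def sh_basis(normals):
    """First four spherical-harmonic basis functions (orders 0 and 1)."""
    x, y, z = normals.T
    return np.stack([np.ones_like(x), x, y, z], axis=1)

def estimate_lighting(normals, intensities):
    """Least-squares fit of SH lighting coefficients to observed shading."""
    B = sh_basis(normals)
    coeffs, *_ = np.linalg.lstsq(B, intensities, rcond=None)
    return coeffs

def relight(normals, coeffs):
    """Shade new (e.g. previously occluded) geometry with the fitted light."""
    return sh_basis(normals) @ coeffs

# Synthetic sanity check: shade random normals with known coefficients,
# then recover those coefficients from the "observed" intensities.
rng = np.random.default_rng(0)
n = rng.normal(size=(500, 3))
n /= np.linalg.norm(n, axis=1, keepdims=True)
true = np.array([0.8, 0.1, 0.3, 0.5])
obs = sh_basis(n) @ true
est = estimate_lighting(n, obs)
```

Once the coefficients are fitted from the visible surface, `relight` can shade the parts of the stock model that were never visible in the photo, which is what makes the re-rendering step look consistent.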
They never seem to mention that in the paper, at least not prominently (I only skimmed it today). But Photoshop already has a built-in tool for this, so I guess they can just rely on standard methods that seem to work fairly well.
Judging from the YouTube videos, the novel part is filling in the regions occluded in the photo (either with textures from the 3D model, or via inpainting); they cite earlier work that already lets you cut out and manipulate objects using 3D models.
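For intuition on the inpainting fallback: the simplest classical hole-filler is diffusion, which iteratively replaces each masked pixel with the average of its neighbours until the hole is smoothly interpolated from its boundary. This is a toy sketch of that idea, not the paper's method and far cruder than Photoshop's Content-Aware Fill:

```python
import numpy as np

def diffusion_inpaint(img, mask, iters=500):
    """Toy hole filling: repeatedly replace masked pixels with the
    mean of their 4-neighbours (discrete Laplace diffusion)."""
    out = img.astype(float).copy()
    out[mask] = out[~mask].mean()  # crude initialisation for the hole
    for _ in range(iters):
        avg = (np.roll(out, 1, 0) + np.roll(out, -1, 0) +
               np.roll(out, 1, 1) + np.roll(out, -1, 1)) / 4.0
        out[mask] = avg[mask]      # only masked pixels are updated
    return out

img = np.ones((16, 16))
img[:, 8:] = 0.0            # simple two-tone test image
mask = np.zeros_like(img, bool)
mask[6:10, 6:10] = True     # square "hole" straddling the edge
filled = diffusion_inpaint(img, mask)
```

Diffusion blurs across edges (the two-tone boundary becomes a smooth ramp inside the hole), which is exactly why texture from the 3D model is the more attractive option when the model fits well.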