Does anybody remember MetaCreations Canoma, released in 1999?
It also worked with only one photo.
Extending this to use known third-party geometries of identifiable objects, instead of reconstructing them by hand, seems like a very logical step in retrospect.
As cited in the paper and by Canoma, this 1996 paper by Paul Debevec is really where it all started: Modeling and Rendering Architecture from Photographs: A hybrid geometry- and image-based approach
Darn, it looks like it won't be long before photo editing software can
(1) Find stock models for all objects in a scene
(2) Align them perfectly
(3) Let you manipulate them arbitrarily
(4) Render an output picture with all the changes applied that is virtually indistinguishable from a real photograph.
Once this happens (and it doesn't look like it'll take long) photography will no longer be an accurate reference for knowledge about the real world.
Still not there for me. The devil is in the detail - the sauce and the strawberries in the food picture, the books in the desk picture. Also, I still haven't seen a rendering of a human being that looks genuinely photorealistic.
When I was a kid, I loved that technology was pushing towards this point. Now it scares me. Maybe I'm getting old...
I stand by my point that what I've mentioned looks fake (especially the strawberries and the sauce), but I think that would be a great exercise. Your brain believes what it expects: I had the expectation that these were renderings, so it was easy to pick out flaws.
They are very good. Knowing they're renderings, though, you can see it in the face and hair.
DanBC made an interesting comment: it would be interesting to see renderings like these in a double-blind test against photographs and see how well they stack up.
Those are pictures of an adult, not a child. When you use the word "girl" to describe an adult woman you're implicitly belittling her. Don't be that guy.
> Those are pictures of an adult, not a child. When you use the word "girl" to describe an adult woman you're implicitly belittling her. Don't be that guy.
Sorry, I'm not a native English speaker (so I'm not confident enough to downvote or anything), but judging this use of “girl” as the female version of “boy”, while ignoring the overall mode of expression, doesn't seem adequate to me. I would have no objections if thomaseng's comment were more formal:
> I would consider these pictures of a girl quite lifelike
But it's not.
If I were the author, and the pictures were of a man, I'd totally say “guy”. Once you flip the gender, “guy” seems to become “girl”, not “woman”. (Again, given the overall informal style used.)
And as for the word “guy”, it doesn't sound in any way belittling to a grown-up man (and you just used it yourself).
I would say that the male equivalent of "girl" is "boy", not "guy".
The English language is often unhelpful in that exact equivalents of the word you want that exist for one gender don't exist for the other, or else carry other connotations. Master vs Mistress for instance.
The real-time V-Ray raytracing is slow because it is very high quality -- it is doing real global illumination with no precalculations, favoring quality over speed.
It would be cool if the position and rotation would be synchronized between the two, so I could choose a nice view in the Real-Time tab and switch over to Photo Realistic to get the raytraced version.
It works that way in the editor (although you need a Clara.io account and you need to be logged in) if you set up multiple viewports with the same camera.
You just described the holy grail of 3D vision, AI, and robotics. Being able to make sense of the world from a 2D image is one of the cornerstones of building an artificial brain.
Reminds me of the Running Man (1987) scene where, in supposed real-time, a video production editor synthetically composes Arnold Schwarzenegger's and Jesse Ventura's characters together in a deathmatch. One would have to go from rigid-component origami birds on static frames in this CMU paper to semi-solid human figures on moving frames in the movie. 3D models of famous actors' bodies are already made for special effects, painstakingly rendered and composited together in batch mode.
(Personal recollection: there was a solid model of Shaq's head at the 3D modeling company Viewpoint Datalabs back in the day. His head is huge.)
This is a neat approach. Basically it is a combination of:
(1) Fitting 3D stock models to objects in existing photographs using a simple but interactive ray-casting approach.
(2) Estimating soft lighting on objects fairly convincingly.
(3) Re-rendering the stock models using the artificial lighting and textures of the original photographs.
There are real limitations to this, but I think the automated lighting estimate alone is cool and has wide applications in the visual-effects space.
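The lighting-estimation step above can be sketched in miniature. This is not the paper's implementation: it assumes a toy order-1 spherical-harmonics shading model, L(n) = c0 + c1*nx + c2*ny + c3*nz, and recovers the four coefficients from observed shading by least squares. All names here are illustrative.

```python
import random

def estimate_lighting(normals, intensities):
    """Fit L(n) = c0 + c1*nx + c2*ny + c3*nz to observed shading by
    solving the 4x4 normal equations (A^T A) c = A^T b."""
    rows = [[1.0, nx, ny, nz] for (nx, ny, nz) in normals]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    atb = [sum(r[i] * b for r, b in zip(rows, intensities)) for i in range(4)]
    # Gaussian elimination with partial pivoting.
    for col in range(4):
        piv = max(range(col, 4), key=lambda r: abs(ata[r][col]))
        ata[col], ata[piv] = ata[piv], ata[col]
        atb[col], atb[piv] = atb[piv], atb[col]
        for r in range(col + 1, 4):
            f = ata[r][col] / ata[col][col]
            for c in range(col, 4):
                ata[r][c] -= f * ata[col][c]
            atb[r] -= f * atb[col]
    # Back substitution on the upper-triangular system.
    coeffs = [0.0] * 4
    for r in range(3, -1, -1):
        s = atb[r] - sum(ata[r][c] * coeffs[c] for c in range(r + 1, 4))
        coeffs[r] = s / ata[r][r]
    return coeffs

# Synthetic check: shade random unit normals with known lighting, recover it.
true_c = [0.6, 0.3, -0.2, 0.5]
random.seed(1)
normals = []
for _ in range(200):
    x, y, z = (random.uniform(-1, 1) for _ in range(3))
    m = (x * x + y * y + z * z) ** 0.5 or 1.0
    normals.append((x / m, y / m, z / m))
intensities = [true_c[0] + true_c[1] * nx + true_c[2] * ny + true_c[3] * nz
               for (nx, ny, nz) in normals]
recovered = estimate_lighting(normals, intensities)
```

On noiseless synthetic data the coefficients come back essentially exactly; real photos would add noise, shadows, and interreflections, which is where the paper's method has to work much harder.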
They never seem to mention that in the paper, at least not prominently (admittedly, I only skimmed it today). But Photoshop already has a built-in tool for this, so I guess they can just use the standard methods, which seem to work fairly well.
Judging from the YouTube videos, the novel part is that they can fill in the parts of the object that are occluded in the photo (either using textures from the 3D model, or by using InPaint), because they refer to earlier work that already lets you cut out and manipulate objects using 3D models.
This is very impressive, but were the fingers behind the paper crane drawn in by hand? I don't see how any algorithm could create that kind of content.
I'd really like to see a video of someone starting with an image and using these algorithms and tools to create one of these effects from start to finish.
In the full paper they say "We use a separately captured background photograph for the chair, while for all other photos, we fill the background using Context-Aware Fill in Photoshop."
So I think the fingers indeed were filled in algorithmically. This is plausible since, as best I can tell, current Content-Aware Fill algorithms are based on magic.
As per Wikipedia: imagination, also called the faculty of imagining, is the ability to form new images and sensations that are not perceived through senses such as sight or hearing. Imagination is magic. Everyone knows that. So any generative model is too, in general ;)
If you like this kind of effect, you should also check out VideoCopilot because inserting 3D objects on top of reference images or video is a recurring use of After Effects (it even ships with a lite version of Cinema4D now).
This and Photoshop's Content-Aware Fill (to help fill the holes left by removing the object from the reference image) are very handy for achieving such effects.
Take this approach, geared towards pre-made 2D still imagery, and apply it to stills rendered from a 3D model, and some serious MAGIC can take place!
In this scenario you already have all the 3D elements in hand, so there's no need to search for them, and you have the complete environment as well. Lots of things that used to call for re-rendering could be done with this approach post-render.
We need a way to do digital signatures on images such that they cannot be faked. It should verify the image, location, time, and serial number. I know, this seems impossible since someone (the camera) needs to know the private key and that could be compromised.
I've thought about this before, and it is actually pretty easy. You just take a hash of the image and push it into the Bitcoin blockchain as a transaction. Done. The only downsides are that it costs a wee bit of money and that you need to be connected to the internet at the point when you shoot the image (or at the point when you want the image verified). See:
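The hashing half of that scheme is a one-liner with the standard library. The blockchain transaction that would carry the digest (typically via an OP_RETURN output) is omitted here; the bytes below are placeholders, not a real photo.

```python
import hashlib

def image_fingerprint(image_bytes: bytes) -> str:
    """SHA-256 digest of the raw image file. This 32-byte value is what
    you would embed in a blockchain transaction to prove the image
    existed no later than that block's timestamp."""
    return hashlib.sha256(image_bytes).hexdigest()

# Changing even one byte yields a completely different digest.
original = b"\x89PNG...placeholder bytes standing in for a real photo..."
tampered = original + b"\x00"
print(image_fingerprint(original) == image_fingerprint(tampered))  # False
```

Note this only proves the image existed at anchoring time; as pointed out below, it says nothing about what happened to the image before the hash was taken.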
A composite solution:
The camera's chip produces two signatures for each image: one fairly resistant to cropping, color correction, rotation, etc., while still resistant to forging; and a second, pixel-perfect one. A photographer who wishes to be able to claim authenticity uploads the photos to the camera maker's website, which verifies the authenticity of both signatures and publishes both in a secure database indexed by the "editable" signature (and his name, if he wants proof of authorship).
Now, if the author wants to claim authenticity or ownership of a picture, he just has to present the original picture so that people can attest that it is not significantly modified and/or that he is indeed the author.
Of course, reading the private keys on the chip has to be very hard.
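A toy version of the in-camera signing step, with every name and key here an assumption for illustration. Python's standard library has no asymmetric signatures, so HMAC stands in for what a real scheme would do with something like ECDSA (so that verification wouldn't require sharing the secret):

```python
import hashlib
import hmac
import json

# Stand-in for the per-camera private key burned into the chip.
# A real design would use an asymmetric key pair; HMAC is used here
# only because it ships with the standard library.
CAMERA_KEY = b"hypothetical-secret-inside-the-camera-chip"

def sign_capture(image_bytes: bytes, serial: str, timestamp: str, gps: str) -> str:
    """Bind the pixel data to camera serial, time, and location in one MAC."""
    digest = hashlib.sha256(image_bytes).hexdigest()
    record = json.dumps({"sha256": digest, "serial": serial,
                         "time": timestamp, "gps": gps}, sort_keys=True)
    return hmac.new(CAMERA_KEY, record.encode(), hashlib.sha256).hexdigest()

def verify_capture(image_bytes, serial, timestamp, gps, signature) -> bool:
    """Recompute the MAC and compare in constant time."""
    return hmac.compare_digest(
        sign_capture(image_bytes, serial, timestamp, gps), signature)

sig = sign_capture(b"raw pixels", "SN-1234",
                   "2014-08-05T12:00:00Z", "40.44,-79.94")
print(verify_capture(b"raw pixels", "SN-1234",
                     "2014-08-05T12:00:00Z", "40.44,-79.94", sig))    # True
print(verify_capture(b"edited pixels", "SN-1234",
                     "2014-08-05T12:00:00Z", "40.44,-79.94", sig))    # False
```

Binding the serial number and timestamp into the signed record is what makes the "photo of a printed fake" attack below at least date-detectable.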
Oh right, I had only thought about proving that an image hasn't been tampered with after a certain date, but of course it could have been tampered with before that.
> We need a way to do digital signatures on images such that they cannot be faked.
This is almost impossible – the camera's processor can be tampered with and the environment can be altered (e.g. GPS spoofing).
The best we could do would be a notary service where a trusted third party produces a signature for a set of bits at a particular time. That would prevent someone from altering an unwilling third party's photos, or from back-dating images after an important event.
All of these are unreliable to an extent that suggests there will probably be a fair market for forensic photography software in the future…
You'd need to put the key into the camera's processor chip, where it would be hard (but not impossible) to compromise. A minor problem is that any cropping, gamma correction, etc would invalidate the signature. A bigger problem I see is that you could print the fake scene on a big piece of paper, take a picture of that, and the digital signature would be totally valid.
>> A bigger problem I see is that you could print the fake scene on a big piece of paper, take a picture of that, and the digital signature would be totally valid.
That's why I want the date and camera SN to be authenticated as well. A photo of an actual event could be shown to have the correct date, while a photo of an altered print would have a later date.
A mask in photo editing is usually a grayscale image that is white where you want to remove things, black where you want to keep things, and often in-between where you want soft edges.
To create these in Photoshop you can use the magic wand tool, which selects things with similar colors. But you can create these types of masks in a variety of ways.
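A minimal sketch of such a mask, built by hand rather than with any Photoshop tool: a white "remove" disc with a linear falloff to a black "keep" background (the function name and parameters are made up for the example):

```python
def soft_circle_mask(width, height, cx, cy, inner_r, outer_r):
    """Grayscale mask as a 2D list: 255 ("remove") inside inner_r,
    0 ("keep") beyond outer_r, and a linear ramp in between,
    which gives the soft edge described above."""
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            d = ((x - cx) ** 2 + (y - cy) ** 2) ** 0.5
            if d <= inner_r:
                v = 255
            elif d >= outer_r:
                v = 0
            else:  # soft edge: ramp from 255 down to 0
                v = round(255 * (outer_r - d) / (outer_r - inner_r))
            row.append(v)
        mask.append(row)
    return mask

m = soft_circle_mask(9, 9, 4, 4, 1.0, 4.0)
print(m[4][4])  # 255: the center is fully "remove"
print(m[0][0])  # 0: the corner is fully "keep"
```

Real tools build the same kind of array from a magic-wand selection plus feathering; only the way the values are chosen differs.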
That means the user marks the boundary of the object to be manipulated, and the rest of the image (including the object's shadow) is treated as the background. This separation is called a mask in image-processing lingo.
They fill hidden parts according to symmetries they find in the object's geometry, or they make use of the model's textures or user-defined input if there is no symmetry.
> For areas of the object that do not satisfy the criteria of geometric symmetry and appearance similarity, such as the underside of the taxi cab in Figure 1, the assignment defaults to the stock model appearance. The assignment also defaults to the stock model appearance when after several iterations, the remaining parts of the object are partitioned into several small areas where the object lacks structural symmetries relative to the visible areas. In this case, we allow the user to fill the appearance in these areas on the texture map of the 3D model using PatchMatch.
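The fallback logic in that passage can be caricatured on a flat 2D texture grid. This is an illustration of the idea only, not the paper's actual pipeline: the vertical mirror axis, the grid representation, and all names are assumptions, and the PatchMatch user-fill step is reduced to a plain stock-texture lookup.

```python
def fill_by_symmetry(texture, stock):
    """Fill unknown texels (None) with the value mirrored across the
    vertical symmetry axis; where the mirror texel is also unknown,
    fall back to the stock-model texture."""
    h, w = len(texture), len(texture[0])
    out = [row[:] for row in texture]  # don't mutate the input
    for y in range(h):
        for x in range(w):
            if out[y][x] is None:
                mirror = texture[y][w - 1 - x]
                out[y][x] = mirror if mirror is not None else stock[y][x]
    return out

# A 2x4 texture where None marks texels occluded in the photo.
observed = [[10, 20, None, None],
            [30, None, None, 40]]
stock = [[1, 2, 3, 4],
         [5, 6, 7, 8]]
filled = fill_by_symmetry(observed, stock)
# filled == [[10, 20, 20, 10], [30, 6, 7, 40]]:
# the top row is completed by its mirror; the middle of the bottom row
# has no visible mirror, so it falls back to the stock values.
```

The real method additionally checks appearance similarity before trusting a symmetry, which is what keeps, say, a taxi's lettered door from being mirrored onto the blank one.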
http://www.canoma.com/
http://digitalurban.blogspot.de/2006/12/great-software-from-...
http://www.pauldebevec.com/Research/debevec-csd-96-893.pdf
The video is still very impressive: https://www.youtube.com/watch?v=RPhGEiM_6lM