Human perception goes beyond recognition and reconstruction. From a single image, we can explain what we see, reconstruct the scene in 3D, instantly identify repeating and symmetric objects, and even imagine how the image would look if the objects within it had a different position or texture. In this talk, I will present our recent work on reconstructing, generating, and manipulating objects and scenes from visual input. The core idea is to exploit generic, causal structures behind the world, often realized in computer graphics as surfaces, objects, and procedures, and to integrate them with deep learning. I'll focus on a few topics to demonstrate this idea: reconstructing shapes from a single image for objects outside the training categories, generating shapes and their corresponding texture, and integrating reconstruction and generation for 3D-aware scene manipulation.

About the Speaker

Jiajun Wu is an Assistant Professor of Computer Science at Stanford University, working on computer vision, machine learning, and computational cognitive science. Before joining Stanford, he was a Visiting Faculty Researcher at Google Research. He received his PhD in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology. Wu's research has been recognized through the ACM Doctoral Dissertation Award Honorable Mention, the MIT George M. Sprowls PhD Thesis Award in Artificial Intelligence and Decision-Making, the IROS Best Paper Award on Cognitive Robotics, and fellowships from Facebook, Nvidia, Samsung, and Adobe.