Radu Rusu is the CEO & Co-Founder of Fyusion, a company whose goal is to build new, visually stunning 3D technologies that enable everyone to solve complex visual problems with artificial intelligence. Fyusion developed and patented a new file format, called a .fyuse, which allows people to capture stunning 3D images from their smartphones. It caused a social media sensation and drew more than 100 million users through consumer mobile applications.
You’ve been working on 3D since 2012, and you are currently the President and CEO of Open Perception, Inc. Could you share what the mission statement of this non-profit is?
I began my career in 3D data processing in the early 2000s while doing my postgraduate studies, with this idea in my head that I would make robots see and understand the world better from a visual perspective. That led me through about a decade of robotics-related 3D computer vision research, and in the early 2010s I realized that what I was working on could be applied to a much broader set of problems. Open Perception was created as a spinoff from Willow Garage, and took one of our BSD-licensed, open source initiatives, the Point Cloud Library (PCL) project, and continued to foster its growth. Open Perception, Inc. was incorporated in California in April 2012 as an independent organization created with the purpose of supporting the development, distribution, and adoption of open source software for 2D/3D processing of sensory data, with applications in research, education, and product development.
In 2014, you became the Co-Founder and CEO of Fyusion, Inc. Could you share the genesis story of Fyusion, Inc.?
While engaged in robotics research, the cofounders of Fyusion and I realized that the bottlenecks were no longer algorithms but data formats. Machine learning had reached a peak in accuracy around that time in many domains because the type of data we were using, especially in visual formats, was two-dimensional (such as photographs and videos), while the world is three-dimensional. We felt the potential existed to transform the way people understand the world by leveraging 3D data in machine learning platforms.
In 2014, we decided to create a new type of 3D data, generated through computer vision and machine learning software, by fusing together multiple data sources and using extremely scalable commodity hardware available in our pockets—i.e., our smartphones.
We founded Fyusion with the goal of building new, visually stunning 3D technologies that would enable everyone to solve complex visual problems with artificial intelligence.
Together we developed and patented a new file format, called a .fyuse, which allows people to capture stunning 3D images from their smartphones. It immediately caused a social media sensation and drew more than 100 million users through consumer mobile applications.
What initially attracted you to the idea of reinventing the meaning of 3D for consumer applications?
We simply realized nobody had tackled this at scale. It was an unsolved problem. Just like in our PhD programs, the things that excite us intellectually are really complicated problems that someone said can’t be solved.
In this case, to some extent, they were right. The types of algorithms required to solve this were only partially thought through, and the hardware required to run them did not exist, especially on edge devices such as smartphones. We actually had to wait until the iPhone 4S came out so that we could run real-time 3D computer vision code on a smartphone, because prior to that, iPhones only had one CPU core. Once we started seeing what smartphone hardware could do, we became very interested in taking our computer vision and robotics research expertise and seeing what we could cram into these tiny cameras and CPUs/GPUs. It took a while to go back to the drawing board and rethink how to imagine and implement light field capture and processing all through software. Once we saw it working, Fyusion was off and running.
We used to have 2D photos in analog form, and then they just got digitized with everything else. The only instantiation that we had in the 3D world at scale was a “triangle mesh with a texture” (e.g., OBJ-like file formats) that came from computer games and computer graphics and was meant to represent artificially created objects in a game. Such meshes depend heavily on perfect geometry, which is impossible to obtain: how do you capture and represent water as a triangle mesh with a camera? What about transparent objects? Foliage? Things that are far away? And so on…
It was clear someone had to address the need for consumer-friendly 3D formats. It had to be based on a completely different paradigm, and solved in a “3D image rendering” way (i.e., light fields), and incorporate information that is available at the time of capture (such as camera orientation through a gyroscope sensor) that typically gets discarded when you capture a 2D image. And then of course, we are trying to re-infer that discarded information through machine learning.
This was our opportunity, and it’s what startups should dream of: find a really hard problem they are passionate about, wait for the right time and opening, and go crazy trying to solve it.
The core technology allows anyone to create immersive, interactive 3D images called .fyuses by moving any camera around a person, object, or scene. Can you discuss the process for someone who wishes to create a .fyuse using a mobile app?
We are still in the infancy of this technology, but the gist of it is: You take a smartphone that has an application written by Fyusion or a partner application that is leveraging our Fyusion ALIS SDK underneath, and you open up the camera. You get instructions on what to do, and if you follow them, you obtain a .fyuse on device that is a computer vision and machine learning processed “file object” which you can render on device, on the web, or on any AR/VR/MR headset.
What are some of the computer vision and machine learning technologies that are used to make this a reality?
There really isn’t a silver bullet here, but a vast cocktail of 3D computer vision and machine learning tools that we created for solving this problem. There are ideas from photogrammetry (because effectively we are creating a virtual camera array by moving a single camera in space), robotics (a huge sensor fusion problem, since we don’t have a single camera anymore, but rather a plethora of sensors that you can pull data from to help solve this problem), computer graphics (you can look into our SIGGRAPH 2019 work to understand how we represent some of the underlying structures), and many more. All of this had to be done on device and runnable in real time, which means we leverage compute shaders and write code in assembly. As mentioned, this is just the beginning, and the more sensors and computational power that become available to us, the more we’ll use our ALIS throttle to improve several aspects of the technology. This is a long-term vision, and we have another decade-plus of work in front of us to be fully satisfied by the way digitized complex real-world scenes look.
It’s easy to visualize how .fyuses will be disruptive for VR applications. Can you discuss the type of current VR applications .fyuses can be used in?
We think that ANY VR application where digitizing a real-world object and then displaying it is important should benefit from leveraging our ALIS engine and .fyuses. There’s really no shortage of verticals and applications in ecommerce, healthcare, automotive, education, and beyond, and we’re very excited about this future.
What do you foresee as the future of VR applications for Fyuses?
We don’t see any limitations to the current technology, though our current focus is more on small-to-medium scenes and objects, and not large cityscapes.
I can easily visualize Fyuses being used in future augmented reality (AR) and mixed reality (MR) applications. What’s your vision for the future of Fyuses in both an AR and MR setting?
We treat all the AR/VR/MR applications exactly the same: Once the 3D object has been digitized using our technology, it can be extracted from the scene and placed anywhere.
Has your team discussed the idea of having Fyuses crafted with a virtual assistant or AI?
We have not explored the opportunity to create interactive virtual avatars for people. This is an interesting possibility for sure, but we’re trying to stay focused on solving the current set of problems that we’re working on.
Is there anything else that you would like to share about Fyuses or Fyusion, Inc.?
This might sound like a pitch but… we’re a bunch of crazy roboticists and 3D computer vision scientists, mixed in with CERN physicists, amazing hackers and engineers, and that’s just describing members of the core technical team. We like diversity of all kinds, because that makes us smarter and stronger as a team. If anything we are working on is of interest to anyone reading this, by all means please don’t be shy and get in touch with us. We’re doing our best to answer everyone, and you might find yourself in a situation where you come by for coffee and then stay for a decade.
Thank you for the great interview. Readers who wish to learn more should visit Fyusion.