Intel Labs introduces new AI Diffusion Model for 3D
In collaboration with Blockade Labs, Intel Labs has introduced the Latent Diffusion Model for 3D (LDM3D), a novel model that uses generative AI to create realistic 3D visual content.
LDM3D is the industry's first model to generate a depth map through the diffusion process, producing vivid, immersive 3D images with 360-degree views.
LDM3D is designed to revolutionise content creation, metaverse applications and digital experiences, transforming various industries, from entertainment and gaming to architecture and design.
Vasudev Lal, an AI/ML research scientist at Intel Labs, says: "Generative AI technology aims to further augment and enhance human creativity and save time."
"However, most of today's generative AI models are limited to generating 2D images, and only very few can generate 3D images from text prompts."
"Unlike existing latent stable diffusion models, LDM3D allows users to generate an image and a depth map from a given text prompt using almost the same number of parameters."
"It provides more accurate relative depth for each pixel in an image compared to standard post-processing methods for depth estimation and saves developers significant time to develop scenes," says Lal.
Intel is committed to democratising AI, aiming to enable broader access to its benefits through an open ecosystem.
Unlike existing diffusion models, which generally only generate 2D RGB images from text prompts, LDM3D allows users to generate both an image and a depth map from a given text prompt.
Using almost the same number of parameters as latent stable diffusion, LDM3D provides more accurate relative depth for each pixel in an image compared to standard post-processing methods for depth estimation.
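To make "relative depth for each pixel" concrete, here is a minimal sketch: a depth map can be treated as an array with the same height and width as the RGB image, where each value encodes relative distance, and normalising it yields a viewable greyscale image. The shapes and normalisation below are illustrative assumptions, not LDM3D's exact output format.

```python
import numpy as np

def depth_to_grayscale(depth: np.ndarray) -> np.ndarray:
    """Normalise a relative depth map to an 8-bit greyscale image.

    `depth` is any H x W array of relative depth values; the output
    maps the nearest pixel to 0 and the farthest to 255.
    """
    d_min, d_max = depth.min(), depth.max()
    scaled = (depth - d_min) / max(d_max - d_min, 1e-8)
    return (scaled * 255).astype(np.uint8)

# A toy 2x2 "depth map": nearest pixel top-left, farthest bottom-right.
toy = np.array([[0.5, 1.0], [2.0, 4.0]])
vis = depth_to_grayscale(toy)
```

Because only relative depth is needed for effects such as parallax and relighting, this kind of per-image normalisation is usually sufficient downstream.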
The images and depth maps generated by LDM3D enable users to turn the text description of a serene tropical beach, a modern skyscraper or a sci-fi universe into a 360-degree detailed panorama.
This ability to capture depth information can instantly enhance overall realism and immersion, enabling innovative applications for industries ranging from entertainment and gaming to interior design and real estate listings, as well as virtual museums and immersive virtual reality (VR) experiences.
LDM3D was trained on a dataset constructed from a subset of 10,000 samples of the LAION-400M database, which contains over 400 million image-caption pairs. The team annotated the training corpus using the DPT-Large depth estimation model, a Dense Prediction Transformer previously developed at Intel Labs.
The LDM3D model was trained on an Intel AI supercomputer powered by Intel Xeon processors and Intel Habana Gaudi AI accelerators. The resulting model and pipeline combine the generated RGB image and depth map into 360-degree views for immersive experiences.
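Intel has not published the internals of this 360-degree pipeline, but the core idea of combining an equirectangular RGB panorama with a depth map can be sketched as follows: each pixel maps to a viewing direction on a sphere, and scaling that direction by the pixel's depth yields a 3D point. The function name and projection convention below are assumptions for illustration.

```python
import numpy as np

def panorama_to_points(depth: np.ndarray) -> np.ndarray:
    """Back-project an equirectangular depth map into 3D points.

    Each pixel (row, col) of the H x W depth map corresponds to a
    direction on the unit sphere: columns span longitude [-pi, pi),
    rows span latitude [pi/2, -pi/2]. Scaling each direction by the
    pixel's depth gives a point in space; the result is H x W x 3.
    """
    h, w = depth.shape
    lon = (np.arange(w) + 0.5) / w * 2 * np.pi - np.pi   # longitude per column
    lat = np.pi / 2 - (np.arange(h) + 0.5) / h * np.pi   # latitude per row
    lon, lat = np.meshgrid(lon, lat)
    # Unit viewing direction for every pixel, scaled by its depth.
    dirs = np.stack([np.cos(lat) * np.sin(lon),
                     np.sin(lat),
                     np.cos(lat) * np.cos(lon)], axis=-1)
    return dirs * depth[..., None]

# A constant-depth panorama back-projects onto a sphere of that radius.
points = panorama_to_points(np.full((4, 8), 2.0))
```

Pairing each 3D point with the colour of the corresponding RGB pixel yields a coloured point cloud or mesh that a viewer can look around in, which is what gives the generated panoramas their sense of depth.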
To demonstrate the potential of LDM3D, Intel and Blockade researchers developed DepthFusion, an application that leverages standard 2D RGB photos and depth maps to create immersive and interactive 360-degree view experiences.
DepthFusion uses TouchDesigner, a node-based visual programming language for real-time interactive multimedia content, to turn text prompts into interactive, immersive digital experiences.
The introduction of LDM3D and DepthFusion paves the way for further advancements in multi-view generative AI and computer vision.
Intel will continue exploring the use of generative AI to augment human capabilities and build a robust ecosystem of open-source AI research and development that democratises access to this technology.
LDM3D is being open-sourced through Hugging Face, allowing AI researchers and practitioners to further improve the system and fine-tune it for custom applications.