
Unite.AI

Kunal Kejriwal

"An engineer by profession, a writer by heart". Kunal is a technical writer with a deep love & understanding of AI and ML, dedicated to simplifying complex concepts in these fields through his engaging and informative documentation.

Artificial Intelligence March 12, 2024

InstantID: Zero-shot Identity-Preserving Generation in Seconds

AI-powered image generation technology has witnessed remarkable growth in the past few years, ever since large text-to-image diffusion models like DALL-E, GLIDE, Stable Diffusion,...
Artificial Intelligence February 26, 2024

Mobile-Agents: Autonomous Multi-modal Mobile Device Agent With Visual Perception

The advent of Multimodal Large Language Models (MLLM) has ushered in a new era of mobile device agents, capable of understanding and interacting with the world...
Artificial Intelligence February 23, 2024

Guiding Instruction-Based Image Editing via Multimodal Large Language Models

Visual design tools and vision language models have widespread applications in the multimedia industry. Despite significant advancements in recent years, a solid understanding of these tools...
Artificial Intelligence February 22, 2024

OLMo: Enhancing the Science of Language Models

Over the past few years, language models have made their presence felt almost everywhere, not only in NLP research but also in...
Artificial Intelligence February 13, 2024

HD-Painter: High Resolution Text-Guided Image Inpainting with Diffusion Models

Diffusion models have undoubtedly revolutionized the AI and ML industry, with their real-time applications becoming an integral part of our everyday lives. After text-to-image models showcased...
Artificial Intelligence February 8, 2024

TinySAM: Pushing the Boundaries for Segment Anything Model

Object segmentation is a foundational and critically important field in modern computer vision. It plays a vital role in applications requiring extensive visual components, such as...
Artificial Intelligence February 5, 2024

OpenVoice: Versatile Instant Voice Cloning

In Text-to-Speech synthesis (TTS), Instant Voice Cloning (IVC) enables the TTS model to clone the voice of any reference speaker using a short audio sample, without...
Artificial Intelligence January 25, 2024

Visual Instruction Tuning for Pixel-Level Understanding with Osprey

With the recent enhancement of visual instruction tuning methods, Multimodal Large Language Models (MLLMs) have demonstrated remarkable general-purpose vision-language capabilities. These capabilities make them key building...
Artificial Intelligence January 23, 2024

Paint3D: Lighting-Less Diffusion Model for Image Generation

The rapid development of generative AI models, especially deep generative models, has significantly advanced capabilities in natural language generation, 3D generation, image generation, and speech...
Artificial Intelligence January 19, 2024

How Does Single-View 3D Reconstruction Work?

Traditionally, models for single-view object reconstruction built on convolutional neural networks have shown remarkable performance in reconstruction tasks. In recent years, single-view 3D reconstruction has emerged...
Artificial Intelligence January 17, 2024

PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU

Due to their exceptional content creation capabilities, Generative Large Language Models are now at the forefront of the AI revolution, with ongoing efforts to enhance their...
Artificial Intelligence January 16, 2024

Ferret: Refer and Ground at Any Granularity

Enabling spatial understanding in vision-language learning models remains a core research challenge. This understanding underpins two crucial capabilities: grounding and referring. Referring enables the model to...
Artificial Intelligence January 12, 2024

Splatter Image: Ultra-Fast Single-View 3D Reconstruction

Single-view 3D object reconstruction with convolutional networks has demonstrated remarkable capabilities. Single-view 3D reconstruction models generate a 3D model of any object from a single image...
Artificial Intelligence January 4, 2024

StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation

Due to its vast potential and commercialization opportunities, particularly in gaming, broadcasting, and video streaming, the Metaverse is currently one of the fastest-growing technologies. Modern Metaverse...
Artificial Intelligence January 2, 2024

Self-Attention Guidance: Improving Sample Quality of Diffusion Models

Denoising Diffusion Models are generative AI frameworks that synthesize images from noise through an iterative denoising process. They are celebrated for their exceptional image generation capabilities...
