Significant advancements in large language models (LLMs) have inspired the development of multimodal large language models (MLLMs). Early MLLM efforts, such as LLaVA, MiniGPT-4, and InstructBLIP,...
The ability to accurately interpret complex visual information is a crucial focus of multimodal large language models (MLLMs). Recent work shows that enhanced visual perception significantly...
The remarkable success of large-scale pretraining followed by task-specific fine-tuning for language modeling has established this approach as a standard practice. Similarly, computer vision methods are...
Current long-context large language models (LLMs) can process inputs up to 100,000 tokens, yet they struggle to generate outputs exceeding even a modest length of 2,000...
Large language models (LLMs) are increasingly utilized for complex tasks requiring multiple generation calls, advanced prompting techniques, control flow, and structured inputs/outputs. However, efficient systems for...
Training frontier large multimodal models (LMMs) requires large-scale datasets with interleaved sequences of images and text in free form. Although open-source LMMs have evolved rapidly, there...
In 2018, the idea of reinforcement learning in the context of a neural network world model was first introduced, and soon this fundamental...
The advent of deep generative models has significantly accelerated the development of AI systems with remarkable capabilities in natural language generation, 3D generation, image generation, and...
LLM watermarking, which integrates imperceptible yet detectable signals within model outputs to identify text generated by LLMs, is vital for preventing the misuse of large language...
Owing to its robust performance and broad applicability compared to other methods, LoRA, or Low-Rank Adaptation, is one of the most popular PEFT, or Parameter...
Although AutoML rose to popularity a few years ago, the early work on AutoML dates back to the early 1990s, when scientists published the first papers...
Recent progress in Large Language Models has brought a significant increase in vision-language reasoning, understanding, and interaction capabilities. Modern frameworks achieve this by...
Recent advancements in the architecture and performance of Multimodal Large Language Models, or MLLMs, have highlighted the significance of scalable data and models to enhance...
In modern machine learning and artificial intelligence frameworks, transformers are among the most widely used components across various domains, powering models including the GPT series and BERT in...
Recent frameworks for text-to-video, or T2V, generation leverage diffusion models to stabilize their training process, and the Video Diffusion Model, one...