All posts tagged "Multimodal Large Language Model"

Artificial Intelligence | 2 months ago
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images
Recent progress in Large Language Models has brought a significant increase in vision-language reasoning, understanding, and interaction capabilities. Modern frameworks achieve this by...
Artificial Intelligence | 2 months ago
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
Recent advancements in the architecture and performance of Multimodal Large Language Models (MLLMs) have highlighted the significance of scalable data and models to enhance...
Artificial Intelligence | 3 months ago
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
The advancements in large language models have significantly accelerated the development of natural language processing, or NLP. The introduction of the transformer framework proved to be...
Artificial Intelligence | 5 months ago
Mobile-Agents: Autonomous Multi-modal Mobile Device Agent With Visual Perception
The advent of Multimodal Large Language Models (MLLMs) has ushered in a new era of mobile device agents, capable of understanding and interacting with the world...
Artificial Intelligence | 5 months ago
Guiding Instruction-Based Image Editing via Multimodal Large Language Models
Visual design tools and vision language models have widespread applications in the multimedia industry. Despite significant advancements in recent years, a solid understanding of these tools...
Artificial General Intelligence | 5 months ago
Exploring Gemini 1.5: How Google’s Latest Multimodal AI Model Elevates the AI Landscape Beyond Its Predecessor
In the rapidly evolving landscape of artificial intelligence, Google continues to lead with its pioneering developments in multimodal AI technologies. Shortly after the debut of Gemini...
Artificial Intelligence | 6 months ago
Ferret: Refer and Ground at Any Granularity
Enabling spatial understanding in vision-language learning models remains a core research challenge. This understanding underpins two crucial capabilities: grounding and referring. Referring enables the model to...