- MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
  Paper • 2401.15947 • Published • 53
- How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
  Paper • 2404.16821 • Published • 58
- Physically Grounded Vision-Language Models for Robotic Manipulation
  Paper • 2309.02561 • Published • 9
- MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark
  Paper • 2409.02813 • Published • 32
Mayank Kumar
mayankgrd
AI & ML interests
None yet
Recent Activity
- updated a dataset 16 days ago: mayankgrd/lerobot_so101_test
- published a dataset 16 days ago: mayankgrd/lerobot_so101_test
- updated a model 7 months ago: mayankgrd/medgemma-4b-it-sft-lora-crc100k
Organizations
None yet