
Manan Shah

cs-mshah

https://cs-mshah.github.io/

AI & ML interests

Computer Vision

Recent Activity

upvoted a paper about 2 hours ago

Evaluating Parameter Efficient Methods for RLVR

liked a dataset about 3 hours ago

genrobot2025/10Kh-RealOmin-OpenData

liked a dataset 4 days ago

Daniellesry/TransPhy3D


cs-mshah's activity

upvoted a paper about 2 hours ago

Evaluating Parameter Efficient Methods for RLVR

Paper • 2512.23165 • Published 7 days ago • 23

upvoted 2 papers 5 days ago

ProEdit: Inversion-based Editing From Prompts Done Right

Paper • 2512.22118 • Published 9 days ago • 17

LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation

Paper • 2512.23576 • Published 6 days ago • 63

upvoted a paper 8 days ago

Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models

Paper • 2512.20557 • Published 12 days ago • 49

upvoted an article 11 days ago

Article

SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data

Jun 3, 2025 • 305

upvoted 2 articles about 1 month ago

Article

We Got Claude to Fine-Tune an Open Source LLM

Dec 4, 2025 • 558

Article

Continuous batching from first principles

Nov 25, 2025 • 293

upvoted 2 collections about 2 months ago

MetaCLIP2 Multilingual

Collection

8 items • Updated Nov 12, 2025 • 16

📄 FinePDFs

Collection

81 items • Updated Nov 11, 2025 • 26

upvoted a paper 3 months ago

Robot Learning: A Tutorial

Paper • 2510.12403 • Published Oct 14, 2025 • 119

upvoted an article 3 months ago

Article

Metric and Relative Monocular Depth Estimation: An Overview. Fine-Tuning Depth Anything V2 👐 📚

Jul 10, 2024

upvoted 2 papers 4 months ago

Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents

Paper • 2507.04009 • Published Jul 5, 2025 • 51

StableAvatar: Infinite-Length Audio-Driven Avatar Video Generation

Paper • 2508.08248 • Published Aug 11, 2025 • 27

upvoted an article 6 months ago

Article

Efficient MultiModal Data Pipeline

Jul 8, 2025

upvoted an article 7 months ago

Article

GRPO for GUI Grounding Done Right

Jun 11, 2025

upvoted a paper 8 months ago

LightLab: Controlling Light Sources in Images with Diffusion Models

Paper • 2505.09608 • Published May 14, 2025 • 36

upvoted a collection 8 months ago

ViRFT Datasets

Collection

ViRFT Datasets • 8 items • Updated Feb 24, 2025 • 9

upvoted a paper 8 months ago

VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning

Paper • 2504.06958 • Published Apr 9, 2025 • 13

upvoted 2 papers 9 months ago

LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale

Paper • 2504.16030 • Published Apr 22, 2025 • 36

Vidi: Large Multimodal Models for Video Understanding and Editing

Paper • 2504.15681 • Published Apr 22, 2025 • 14
