- AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning
  Paper • 2402.15506 • Published • 18
- AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent
  Paper • 2404.03648 • Published • 30
- Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts
  Paper • 2405.19893 • Published • 33
- Parrot: Efficient Serving of LLM-based Applications with Semantic Variable
  Paper • 2405.19888 • Published • 7
Collections
Collections including paper arxiv:2410.18603
- Video Creation by Demonstration
  Paper • 2412.09551 • Published • 9
- DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
  Paper • 2412.07589 • Published • 48
- Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation
  Paper • 2412.06531 • Published • 72
- APOLLO: SGD-like Memory, AdamW-level Performance
  Paper • 2412.05270 • Published • 38

- AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant
  Paper • 2410.18603 • Published • 32
- Agent S: An Open Agentic Framework that Uses Computers Like a Human
  Paper • 2410.08164 • Published • 26
- TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
  Paper • 2412.14161 • Published • 51

- Benchmarking Agentic Workflow Generation
  Paper • 2410.07869 • Published • 29
- GenAgent: Build Collaborative AI Systems with Automated Workflow Generation -- Case Studies on ComfyUI
  Paper • 2409.01392 • Published • 9
- HDFlow: Enhancing LLM Complex Problem-Solving with Hybrid Thinking and Dynamic Workflows
  Paper • 2409.17433 • Published • 9
- FlowMind: Automatic Workflow Generation with LLMs
  Paper • 2404.13050 • Published • 34

- LinFusion: 1 GPU, 1 Minute, 16K Image
  Paper • 2409.02097 • Published • 34
- Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
  Paper • 2409.11406 • Published • 27
- Diffusion Models Are Real-Time Game Engines
  Paper • 2408.14837 • Published • 126
- Segment Anything with Multiple Modalities
  Paper • 2408.09085 • Published • 22

- Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
  Paper • 2411.03562 • Published • 68
- The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
  Paper • 2408.06292 • Published • 126
- MALT: Improving Reasoning with Multi-Agent LLM Training
  Paper • 2412.01928 • Published • 45
- AgentInstruct: Toward Generative Teaching with Agentic Flows
  Paper • 2407.03502 • Published • 51

- Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
  Paper • 2412.04454 • Published • 72
- Tree Search for Language Model Agents
  Paper • 2407.01476 • Published • 1
- SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
  Paper • 2401.10935 • Published • 5
- OmniParser for Pure Vision Based GUI Agent
  Paper • 2408.00203 • Published • 25

- AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant
  Paper • 2410.18603 • Published • 32
- A Survey of Small Language Models
  Paper • 2410.20011 • Published • 46
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines
  Paper • 2410.21220 • Published • 11
- o1-Coder: an o1 Replication for Coding
  Paper • 2412.00154 • Published • 44

- Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models
  Paper • 2410.02740 • Published • 54
- From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging
  Paper • 2410.01215 • Published • 39
- Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
  Paper • 2409.17146 • Published • 121
- EuroLLM: Multilingual Language Models for Europe
  Paper • 2409.16235 • Published • 29

- Corex: Pushing the Boundaries of Complex Reasoning through Multi-Model Collaboration
  Paper • 2310.00280 • Published • 3
- Symbol-LLM: Towards Foundational Symbol-centric Interface For Large Language Models
  Paper • 2311.09278 • Published • 7
- OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
  Paper • 2402.07456 • Published • 46
- SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
  Paper • 2401.10935 • Published • 5