
Dhaval Patel

DhavalPatel

dhaval-patel-2b287033

AI & ML interests

None yet

Recent Activity

liked a dataset about 6 hours ago

ArtificialAnalysis/ITBench-AA

Viewer • Updated 29 days ago • 40 • 24.6k • 28

commented on a paper 1 day ago

PlanBench-XL: Evaluating Long-Horizon Planning of LLM Tool-Use Agents in Large-Scale Tool Ecosystems

Paper • 2606.22388 • Published 4 days ago • 85

liked a dataset 1 day ago

ZLHe0/SenTSR-Bench

Viewer • Updated Feb 23 • 330 • 24 • 3

upvoted a paper 1 day ago

SenTSR-Bench: Thinking with Injected Knowledge for Time-Series Reasoning

Paper • 2602.19455 • Published Feb 23 • 1

updated a collection 2 days ago

Enterprise Agents and Benchmarks

Collection

Enterprise agent ecosystem featuring AssetOpsBench (industrial) and ITBench (SRE, FinOps, CISO), plus CUGA to accelerate AI automation • 21 items • Updated about 23 hours ago • 17

authored 3 papers 2 days ago

Beyond Final Answers: Auditing Trajectory-Level Hallucinations in Multi-Agent Industrial Workflows

Paper • 2605.24219 • Published 30 days ago • 9

Evoflux: Inference-Time Evolution of Executable Tool Workflows for Compact Agents

Paper • 2606.12674 • Published 15 days ago • 5

Beyond Static Leaderboards: Predictive Validity for the Evaluation of LLM Agents

Paper • 2606.19704 • Published 7 days ago • 39

upvoted a paper 6 days ago

Beyond Static Leaderboards: Predictive Validity for the Evaluation of LLM Agents

Paper • 2606.19704 • Published 7 days ago • 39

submitted a paper to Daily Papers 6 days ago

Beyond Static Leaderboards: Predictive Validity for the Evaluation of LLM Agents

Paper • 2606.19704 • Published 7 days ago • 39

upvoted a paper 9 days ago

Evoflux: Inference-Time Evolution of Executable Tool Workflows for Compact Agents

Paper • 2606.12674 • Published 15 days ago • 5

New activity in ibm-research/AssetOpsBench 18 days ago

Request for agent's traces

#11 opened 3 months ago by kyzor

liked a dataset 18 days ago

AnandMayank/QueST-PartNetMobility-SAPIEN

Viewer • Updated 15 days ago • 50.6k • 17.4k • 9

commented on Harness, Scaffold, and the AI Agent Terms Worth Getting Right 22 days ago

Appreciate the nice writeup. Can we add (a) Leaderboard and (b) Benchmark?

https://github.com/IBM/AssetOpsBench

upvoted an article 23 days ago

Beyond LLMs: Why Scalable Enterprise AI Adoption Depends on Agent Logic

Article • ibm-research • 23 days ago • 88

New activity in ibm-research/AssetOpsBench 27 days ago

Update data/scenarios/all_utterance.jsonl

#12 opened 27 days ago by shuxinl

upvoted an article 28 days ago

ITBench-AA: Frontier Models Score Below 50% on the First Benchmark for Agentic Enterprise IT Tasks — by Artificial Analysis and IBM

Article • ibm-research • 28 days ago • 17

upvoted a paper 29 days ago

Beyond Final Answers: Auditing Trajectory-Level Hallucinations in Multi-Agent Industrial Workflows

Paper • 2605.24219 • Published 30 days ago • 9

submitted a paper to Daily Papers 29 days ago

Beyond Final Answers: Auditing Trajectory-Level Hallucinations in Multi-Agent Industrial Workflows

Paper • 2605.24219 • Published 30 days ago • 9

commented on a paper 29 days ago

Results and Retrospective Analysis of the CODS 2025 AssetOpsBench Challenge

Paper • 2605.08518 • Published May 8 • 11
