
Yiping Wang

ypwang61

https://ypwang61.github.io/

AI & ML interests

machine learning

Organizations

None yet

Recent Activity

liked a dataset 16 days ago

siegelz/core-bench

Preview • Updated Oct 1, 2024 • 324 • 8

upvoted 2 papers about 1 month ago

Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published Dec 1, 2025 • 93

DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research

Paper • 2511.19399 • Published Nov 24, 2025 • 60

upvoted a paper about 2 months ago

RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments

Paper • 2511.07317 • Published Nov 10, 2025 • 15

liked 2 models about 2 months ago

hamishivi/OpenThinker3-1.5B-RLVE

Text Generation • 2B • Updated Nov 11, 2025 • 58 • 2

hamishivi/Nemotron-Research-Reasoning-Qwen-1.5B-v2-RLVE

Text Generation • 2B • Updated Nov 11, 2025 • 8 • 2

updated a collection about 2 months ago

One-Shot RLVR

Collection

Collections of models and papers for works: "Reinforcement Learning for Reasoning in Large Language Models with One Training Example" • 24 items • Updated 21 days ago • 1

liked a model 2 months ago

deepseek-ai/DeepSeek-R1-0528-Qwen3-8B

Text Generation • 8B • Updated May 29, 2025 • 471k • 1.01k

upvoted an article 2 months ago

Article

OpenReasoning-Nemotron: A Family of State-of-the-Art Distilled Reasoning Models

Jul 18, 2025


upvoted 2 papers 3 months ago

GEM: A Gym for Agentic LLMs

Paper • 2510.01051 • Published Oct 1, 2025 • 89

EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees

Paper • 2503.08893 • Published Mar 11, 2025 • 6

upvoted a collection 4 months ago

RecA

Collection

Unlocking the Massive Zero-shot Potential in Unified Multimodal Models through Self-supervised Learning! • 8 items • Updated Sep 22, 2025 • 14

upvoted 2 papers 4 months ago

SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning

Paper • 2509.02479 • Published Sep 2, 2025 • 83

VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use

Paper • 2509.01055 • Published Sep 1, 2025 • 76
