LLM4Math
BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs
Paper • 2510.04721 • Published
FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models
Paper • 2505.02735 • Published • 33
PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts
Paper • 2504.18428 • Published
MathConstruct: Challenging LLM Reasoning with Constructive Proofs
Paper • 2502.10197 • Published
TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts
Paper • 2407.03203 • Published • 12
Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad
Paper • 2503.21934 • Published
Solving Inequality Proofs with Large Language Models
Paper • 2506.07927 • Published • 20
APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning
Paper • 2505.05758 • Published
MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark
Paper • 2405.12209 • Published
ASyMOB: Algebraic Symbolic Mathematical Operations Benchmark
Paper • 2505.23851 • Published
Theorem Prover as a Judge for Synthetic Data Generation
Paper • 2502.13137 • Published • 1
DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning
Paper • 2505.23754 • Published • 15
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
Paper • 2405.14333 • Published • 42
GGBench: A Geometric Generative Reasoning Benchmark for Unified Multimodal Models
Paper • 2511.11134 • Published • 31