---
library_name: transformers
license: mit
base_model: google/gemma-2-2b
tags:
- llama-factory
- full
- generated_from_trainer
model-index:
- name: GraphMind-Gemma2-2B
  results: []
---

# Model Card for GraphMind Series

This model card describes the **GraphMind** series of models, which are Large Language Models (LLMs) enhanced for generalized reasoning through continued pre-training on graph-based problems.

## Model Description

GraphMind is a series of Large Language Models developed to improve the generalized reasoning capabilities of existing base models.
The core innovation is continued pre-training (CPT) on **GraphPile**, a large-scale, 10.9-billion-token dataset built specifically from Graph Problem Reasoning (GPR) data.

By training on diverse and complex graph problems, which require sophisticated logical, topological, and relational reasoning, GraphMind models learn more robust and transferable reasoning patterns.
This approach bridges the gap between domain-specific training (e.g., mathematics) and the need for universally capable and adaptable LLMs.

The GraphMind series is built upon three popular open-source models:

* Llama 3
* Llama 3.1
* Gemma 2

## Key Features

- **Enhanced General Reasoning**: Significant gains not only on graph-related tasks but also across mathematical, logical, commonsense, and code reasoning benchmarks.
- **Superior Performance on Graph Problems**: Thanks to the GraphPile corpus, the models excel at tasks involving graph theory, such as pathfinding, network analysis, and topological sorting.
- **Strong Transfer Learning**: Reasoning skills acquired from graph problems transfer effectively to other domains.
- **Excellent Post-Training Potential**: A stronger foundation for fine-tuning on downstream tasks. For instance, the Gemma-based GraphMind fine-tuned on GSM8K achieves **23.6% higher accuracy** than its fine-tuned base model.

## Performance

GraphMind models show consistent improvements over their base models across reasoning benchmarks.

**Generalization Improvements**:

- **Mathematical Reasoning**: average improvement of up to **4.9%** across 11 datasets.
- **Logical Reasoning**: **33.4%** improvement.
- **Code Reasoning**: **46.3%** improvement.
- **Commonsense Reasoning**: **7.8%** improvement.
- **Multi-Hop QA**: **10.3%** improvement.

**Foundational Improvements**:

- **Graph Problem Reasoning**: average improvement of **53.1%** compared to baseline models.

## Training Data: The GraphPile Corpus

GraphMind's capabilities are derived from its training on **GraphPile**, the first large-scale corpus designed for continued pre-training using Graph Problem Reasoning data.

**Statistics**:

- **Total Tokens**: 10.9 Billion
- **Total Samples**: 2.68 Million
- **Graph Tasks**: 23 distinct tasks covering multiple reasoning paradigms

**Data Components**:

1. **Chain-of-Thought (CoT) Data**: Step-by-step reasoning processes for graph problems, generated using program-guided methods.
2. **Program-of-Thought (PoT) Data**: Executable code solutions for graph problems, often derived from standard libraries (an illustrative sketch follows this list).
3. **Trace-of-Execution (ToE) Data**: Records of the execution traces of graph algorithms, enabling learning from dynamic algorithmic processes.
4. **Real-world Graph Data**: Tasks drawn from sources such as DBpedia and DBLP, enriching the dataset with practical contexts.
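
GraphPile samples are not reproduced here, but to make the PoT format concrete, such a sample might pair a graph question with an executable solution along the following lines (the task, graph, and code are invented for illustration and are not taken from the corpus):

```python
# Illustrative PoT-style sample: an executable solution to a shortest-path
# question on an unweighted, undirected graph. The problem statement and
# graph below are invented for illustration, not actual GraphPile records.
from collections import deque

def shortest_path_length(edges, source, target):
    """Return the number of edges on a shortest source-target path, or -1."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    queue = deque([(source, 0)])
    visited = {source}
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbor in adjacency.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, dist + 1))
    return -1  # target unreachable from source

# "In an undirected graph with edges (0,1), (1,2), (2,3), (0,3),
#  what is the length of the shortest path from node 0 to node 3?"
print(shortest_path_length([(0, 1), (1, 2), (2, 3), (0, 3)], 0, 3))  # -> 1
```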

## Training Procedure

The GraphMind models were developed by performing continued pre-training on the GraphPile dataset.

* **Base Models**: Llama-3-8B, Llama-3.1-8B, Gemma-2-2B
* **Learning Rate**: 3e-5
* **Epochs**: 3
* **Max Sequence Length**: 8192
* **Global Batch Size**: 1024 (one possible decomposition is sketched after this list)
* **Hardware**: 32 × NVIDIA H100 GPUs
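
The per-device batch size and gradient-accumulation settings are not reported; the sketch below only illustrates one way the stated global batch size could decompose across the reported hardware (both per-device values are assumptions):

```python
# Hedged sketch: one way the reported global batch size of 1024 could be
# assembled on 32 GPUs. Only the product is reported in this card; the
# per-device batch size and accumulation steps below are assumptions.
num_gpus = 32                     # reported hardware: 32 x NVIDIA H100
per_device_batch_size = 4         # assumption, not reported
gradient_accumulation_steps = 8   # assumption, not reported

global_batch_size = num_gpus * per_device_batch_size * gradient_accumulation_steps
assert global_batch_size == 1024  # matches the reported global batch size
```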

## Intended Use and Limitations

### Intended Use

These models are intended for research and development on tasks that demand strong, generalized reasoning. Potential applications include:

* Solving complex logical and mathematical problems.
* Algorithmic reasoning and code generation for graph-related tasks.
* Serving as powerful base models for fine-tuning on reasoning-intensive downstream tasks.

### Limitations

* GraphPile covers 23 graph problem tasks; greater task diversity could further improve results.
* As reasoning-focused models, GraphMind may perform worse on simpler, non-reasoning tasks such as summarization or translation.
* Different GraphPile configurations have not been fully explored and could yield additional gains.

## Available Models

* **HKUST-DSAIL/GraphMind-Gemma2-2B**
* **HKUST-DSAIL/GraphMind-LLAMA-3.1-8B**
* **HKUST-DSAIL/GraphMind-LLAMA-3-8B**
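
The checkpoints load with the standard `transformers` API. A minimal generation sketch for the Gemma-2 variant follows; the prompt and generation settings are illustrative, and since these are continued-pretrained base models, plain text completion (without a chat template) is assumed:

```python
# Minimal sketch: loading a GraphMind checkpoint with Hugging Face
# transformers. Prompt and generation settings are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HKUST-DSAIL/GraphMind-Gemma2-2B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 inference on a recent GPU
    device_map="auto",
)

# GraphMind checkpoints are continued-pretrained base models, so plain
# text completion (no chat template) is the natural interface.
prompt = (
    "Question: In a directed graph with edges (A->B), (B->C), and (A->C), "
    "is there a path from A to C?\nAnswer:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```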

## Citation

```bibtex
@misc{zhang2025improving,
  title={Improving LLMs' Generalized Reasoning Abilities by Graph Problems},
  author={Qifan Zhang and Nuo Chen and Zehua Li and Miao Peng and Jing Tang and Jia Li},
  year={2025},
  eprint={2507.17168},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2507.17168v1}
}
```