---
library_name: transformers
license: mit
base_model: google/gemma-2-2b
tags:
- llama-factory
- full
- generated_from_trainer
model-index:
- name: GraphMind-Gemma2-2B
  results: []
---

# Model Card for GraphMind Series

This model card describes the **GraphMind** series of models, which are Large Language Models (LLMs) enhanced for generalized reasoning through continued pre-training on graph-based problems.

## Model Description

GraphMind is a series of Large Language Models developed to improve the generalized reasoning capabilities of existing base models.
The core innovation is continued pre-training (CPT) on **GraphPile**, a large-scale, 10.9-billion-token dataset built specifically from Graph Problem Reasoning (GPR) data.

By training on diverse and complex graph problems, which require sophisticated logical, topological, and relational reasoning, GraphMind models learn more robust and transferable reasoning patterns.
This approach bridges the gap between domain-specific training (e.g., mathematics) and the need for universally capable and adaptable LLMs.

The GraphMind series is built upon three popular open-source models:

* Llama 3
* Llama 3.1
* Gemma 2

## Key Features

- **Enhanced General Reasoning**: Significant gains not only on graph-related tasks but also across mathematical, logical, commonsense, and code reasoning benchmarks.
- **Superior Performance on Graph Problems**: Thanks to the GraphPile corpus, the models excel at tasks involving graph theory, such as pathfinding, network analysis, and topological sorting.
- **Strong Transfer Learning**: Reasoning skills acquired from graph problems transfer effectively to other domains.
- **Excellent Post-Training Potential**: A stronger foundation for fine-tuning on downstream tasks. For instance, the Gemma-based GraphMind fine-tuned on GSM8K achieves **23.6% higher accuracy** than its fine-tuned base model.

## Performance

GraphMind models show consistent improvements over their base models across reasoning benchmarks.

**Generalization Improvements**:

- **Mathematical Reasoning**: average improvement of up to **4.9%** across 11 datasets.
- **Logical Reasoning**: **33.4%** improvement.
- **Code Reasoning**: **46.3%** improvement.
- **Commonsense Reasoning**: **7.8%** improvement.
- **Multi-Hop QA**: **10.3%** improvement.

**Foundational Improvements**:

- **Graph Problem Reasoning**: average improvement of **53.1%** compared to baseline models.

## Training Data: The GraphPile Corpus

GraphMind's capabilities are derived from its training on **GraphPile**, the first large-scale corpus designed for continued pre-training using Graph Problem Reasoning data.

**Statistics**:

- **Total Tokens**: 10.9 Billion
- **Total Samples**: 2.68 Million
- **Graph Tasks**: 23 distinct tasks covering multiple reasoning paradigms

**Data Components**:

1. **Chain-of-Thought (CoT) Data**: Step-by-step reasoning processes for graph problems, generated using program-guided methods.
2. **Program-of-Thought (PoT) Data**: Executable code solutions for graph problems, often derived from standard libraries (an illustrative sketch follows this list).
3. **Trace-of-Execution (ToE) Data**: Records of the execution traces of graph algorithms, enabling learning from dynamic algorithmic processes.
4. **Real-world Graph Data**: Tasks drawn from sources such as DBpedia and DBLP, enriching the dataset with practical contexts.
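
GraphPile samples are not reproduced here, but to make the PoT format concrete, such a sample might pair a graph question with an executable solution along the following lines (the task, graph, and code are invented for illustration and are not taken from the corpus):

```python
# Illustrative PoT-style sample: an executable solution to a shortest-path
# question on an unweighted, undirected graph. The problem statement and
# graph below are invented for illustration, not actual GraphPile records.
from collections import deque

def shortest_path_length(edges, source, target):
    """Return the number of edges on a shortest source-target path, or -1."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    queue = deque([(source, 0)])
    visited = {source}
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbor in adjacency.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, dist + 1))
    return -1  # target unreachable from source

# "In an undirected graph with edges (0,1), (1,2), (2,3), (0,3),
#  what is the length of the shortest path from node 0 to node 3?"
print(shortest_path_length([(0, 1), (1, 2), (2, 3), (0, 3)], 0, 3))  # -> 1
```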

## Training Procedure

The GraphMind models were developed by performing continued pre-training on the GraphPile dataset.

* **Base Models**: Llama-3-8B, Llama-3.1-8B, Gemma-2-2B
* **Learning Rate**: 3e-5
* **Epochs**: 3
* **Max Sequence Length**: 8192
* **Global Batch Size**: 1024 (one possible decomposition is sketched after this list)
* **Hardware**: 32 × NVIDIA H100 GPUs
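
The per-device batch size and gradient-accumulation settings are not reported; the sketch below only illustrates one way the stated global batch size could decompose across the reported hardware (both per-device values are assumptions):

```python
# Hedged sketch: one way the reported global batch size of 1024 could be
# assembled on 32 GPUs. Only the product is reported in this card; the
# per-device batch size and accumulation steps below are assumptions.
num_gpus = 32                     # reported hardware: 32 x NVIDIA H100
per_device_batch_size = 4         # assumption, not reported
gradient_accumulation_steps = 8   # assumption, not reported

global_batch_size = num_gpus * per_device_batch_size * gradient_accumulation_steps
assert global_batch_size == 1024  # matches the reported global batch size
```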

## Intended Use and Limitations

### Intended Use

These models are intended for research and development on tasks that demand strong, generalized reasoning. Potential applications include:

* Solving complex logical and mathematical problems.
* Algorithmic reasoning and code generation for graph-related tasks.
* Serving as powerful base models for fine-tuning on reasoning-intensive downstream tasks.

### Limitations

* GraphPile covers 23 graph problem tasks; greater task diversity could further improve results.
* As reasoning-focused models, GraphMind may perform worse on simpler, non-reasoning tasks such as summarization or translation.
* Different GraphPile configurations have not been fully explored and could yield additional gains.

## Available Models

* **HKUST-DSAIL/GraphMind-Gemma2-2B**
* **HKUST-DSAIL/GraphMind-LLAMA-3.1-8B**
* **HKUST-DSAIL/GraphMind-LLAMA-3-8B**
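
The checkpoints load with the standard `transformers` API. A minimal generation sketch for the Gemma-2 variant follows; the prompt and generation settings are illustrative, and since these are continued-pretrained base models, plain text completion (without a chat template) is assumed:

```python
# Minimal sketch: loading a GraphMind checkpoint with Hugging Face
# transformers. Prompt and generation settings are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HKUST-DSAIL/GraphMind-Gemma2-2B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 inference on a recent GPU
    device_map="auto",
)

# GraphMind checkpoints are continued-pretrained base models, so plain
# text completion (no chat template) is the natural interface.
prompt = (
    "Question: In a directed graph with edges (A->B), (B->C), and (A->C), "
    "is there a path from A to C?\nAnswer:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```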

## Citation

```bibtex
@misc{zhang2025improving,
  title={Improving LLMs' Generalized Reasoning Abilities by Graph Problems},
  author={Qifan Zhang and Nuo Chen and Zehua Li and Miao Peng and Jing Tang and Jia Li},
  year={2025},
  eprint={2507.17168},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2507.17168v1}
}
```