SaffalPoosh committed on
Commit 269d549 · verified · 1 Parent(s): fdd3c4b

Update README.md

Files changed (1):
  1. README.md +78 -22

README.md CHANGED
@@ -19,44 +19,58 @@ datasets:
 <!-- Provide a quick summary of what the model is/does. -->
 
 
-this is qlora adapter trained on the CPP coding tasks and its trained for reasoning based generation.
 
 
 
-```python
 
 example_problem = """
 A robot is situated at the top-left corner of an m x n grid. The robot can only move either down or right at any point in time. It wants to reach the bottom-right corner of the grid. Some cells in the grid are blocked by obstacles. How many unique paths can the robot take to reach the destination?
 Constraints:
 Time limit per test: 2.0 seconds
 Memory limit per test: 256.0 megabytes
 1 ≤ m, n ≤ 100
 Grid cells are either 0 (empty) or 1 (obstacle).
 Input Format:
 The first line contains two integers m and n — the dimensions of the grid.
 The next m lines each contain n integers (0 or 1) representing the grid.
 Output Format:
 Print a single integer — the number of unique paths.
 Example:
-```input
 3 3
 0 0 0
 0 1 0
 0 0 0
-```
 """
 from unsloth import FastLanguageModel
 from transformers import TextStreamer
 
 model_path = "SaffalPoosh/reasoning_cpp_llm"
-
 max_seq_length = 16000
 dtype = None
 load_in_4bit = True
 
-
-
-
 model, tokenizer = FastLanguageModel.from_pretrained(
     model_name=model_path,
     max_seq_length=max_seq_length,
@@ -65,40 +65,82 @@ model, tokenizer = FastLanguageModel.from_pretrained(
     local_files_only=False
 )
 
-# this will download the base model and then patch by applying the lora adapters
-
-
-from transformers import TextIteratorStreamer
 FastLanguageModel.for_inference(model)
-from threading import Thread
 # Prepare Input Data
 input_text = example_problem
 inputs = tokenizer(input_text, return_tensors="pt")
-inputs = {k:v.to("cuda") for k,v in inputs.items()}
 # Initialize the text streamer
 text_streamer = TextIteratorStreamer(tokenizer, skip_special_tokens=False)
 
-# Perform Inference
-# _ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=8000)
 
-stream_catcher = Thread(target=model.generate, kwargs={**inputs, "do_sample": True, "streamer": text_streamer,
-# "eos_token_id": tokenizer.eos_token_id,
-
-"max_new_tokens": 10000})
 stream_catcher.start()
 
 with open("output.txt", "w") as f:
     for token in text_streamer:
         print(token, end="", flush=True)
         f.write(token)
-stream_catcher.join()
 
 ```
 
-the `output.txt` file shows the output of generation.
 <!-- Provide a quick summary of what the model is/does. -->
 
 
+# Model Card for SaffalPoosh/reasoning_cpp_llm
+
+<!-- Provide a quick summary of what the model is/does. -->
+
+This is a QLoRA adapter trained on C++ coding tasks and designed for reasoning-based code generation. The model specializes in solving algorithmic problems with step-by-step reasoning and generating optimized C++ solutions.
+
+## Example Usage
+
+### Problem Example
+
+```python
 example_problem = """
 A robot is situated at the top-left corner of an m x n grid. The robot can only move either down or right at any point in time. It wants to reach the bottom-right corner of the grid. Some cells in the grid are blocked by obstacles. How many unique paths can the robot take to reach the destination?
+
 Constraints:
 Time limit per test: 2.0 seconds
 Memory limit per test: 256.0 megabytes
 1 ≤ m, n ≤ 100
 Grid cells are either 0 (empty) or 1 (obstacle).
+
 Input Format:
 The first line contains two integers m and n — the dimensions of the grid.
 The next m lines each contain n integers (0 or 1) representing the grid.
+
 Output Format:
 Print a single integer — the number of unique paths.
+
 Example:
+Input:
 3 3
 0 0 0
 0 1 0
 0 0 0
 """
+```
+
+### Model Loading and Inference
+
+```python
 from unsloth import FastLanguageModel
 from transformers import TextStreamer
+from transformers import TextIteratorStreamer
+from threading import Thread
 
+# Model configuration
 model_path = "SaffalPoosh/reasoning_cpp_llm"
 max_seq_length = 16000
 dtype = None
 load_in_4bit = True
 
+# Load model and tokenizer
 model, tokenizer = FastLanguageModel.from_pretrained(
     model_name=model_path,
     max_seq_length=max_seq_length,
     local_files_only=False
 )
 
+# This will download the base model and then patch it by applying the LoRA adapters
 FastLanguageModel.for_inference(model)
+
 # Prepare Input Data
 input_text = example_problem
 inputs = tokenizer(input_text, return_tensors="pt")
+inputs = {k: v.to("cuda") for k, v in inputs.items()}
+
 # Initialize the text streamer
 text_streamer = TextIteratorStreamer(tokenizer, skip_special_tokens=False)
 
+# Perform inference with streaming
+stream_catcher = Thread(
+    target=model.generate,
+    kwargs={
+        **inputs,
+        "do_sample": True,
+        "streamer": text_streamer,
+        "max_new_tokens": 10000
+    }
+)
 stream_catcher.start()
 
+# Stream output to console and file
 with open("output.txt", "w") as f:
     for token in text_streamer:
         print(token, end="", flush=True)
         f.write(token)
+stream_catcher.join()
 ```
 
+## Model Details
+
+- **Model Type**: QLoRA fine-tuned language model
+- **Base Model**: [Specify base model if known]
+- **Training Focus**: C++ algorithmic problem solving with reasoning
+- **Max Sequence Length**: 16,000 tokens
+- **Quantization**: 4-bit loading supported
+- **Hardware Requirements**: CUDA-compatible GPU recommended
 
+## Training Details
+
+- **Training Method**: QLoRA (Quantized Low-Rank Adaptation)
+- **Dataset**: C++ coding tasks with reasoning annotations
+- **Task Type**: Code generation with step-by-step reasoning
+- **Optimization**: Focused on algorithmic problem solving
+
+## Usage Notes
+
+- The model generates reasoning-based solutions for C++ programming problems
+- Supports streaming inference for real-time output
+- The `output.txt` file contains the complete generated solution
+- Designed to handle competitive-programming-style problems with constraints
+
+## Output Format
+
+The model typically generates:
+1. Problem analysis and reasoning
+2. Algorithm explanation
+3. Complete C++ implementation
+4. Time and space complexity analysis
+
+## Requirements
+
+```bash
+pip install unsloth transformers torch
+```
+
+## Hardware Requirements
+
+- **GPU**: CUDA-compatible GPU (recommended)
+- **Memory**: Sufficient VRAM for the 4-bit quantized model
+- **Storage**: Space for base model download and adapter weights
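For reference, the example problem in the model card (unique paths on a grid with obstacles) has a standard dynamic-programming solution. The sketch below is a plain-Python reference implementation useful for checking the model's generated C++ answer against; it is not model output. On the 3×3 sample grid with the center cell blocked, the answer is 2.

```python
def unique_paths(grid):
    """Count right/down paths from top-left to bottom-right, avoiding obstacle cells (1s)."""
    m, n = len(grid), len(grid[0])
    dp = [[0] * n for _ in range(m)]  # dp[i][j] = number of paths reaching cell (i, j)
    for i in range(m):
        for j in range(n):
            if grid[i][j] == 1:
                dp[i][j] = 0  # blocked cell: no path passes through
            elif i == 0 and j == 0:
                dp[i][j] = 1  # starting cell
            else:
                # Paths arrive only from the cell above or the cell to the left
                dp[i][j] = (dp[i - 1][j] if i > 0 else 0) + (dp[i][j - 1] if j > 0 else 0)
    return dp[m - 1][n - 1]

sample_grid = [[0, 0, 0],
               [0, 1, 0],
               [0, 0, 0]]
print(unique_paths(sample_grid))  # 2
```

The table fill is O(m·n) time and space, comfortably within the stated 2-second / 256 MB limits for m, n ≤ 100.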
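Since the generation written to `output.txt` mixes reasoning prose with code (per the Output Format section of the card), a small helper can pull out just the C++ implementation. This is a sketch that assumes the model wraps its solution in a ```cpp fence; that fence label is an assumption, not something the model card guarantees.

```python
import re

def extract_cpp(text):
    """Return the last ```cpp/```c++ fenced block in text, or None if absent.

    Assumes the model fences its C++ solution (an assumption about this
    adapter's output style); adjust the pattern if it uses another label.
    """
    blocks = re.findall(r"```(?:cpp|c\+\+)\s*\n(.*?)```", text, re.DOTALL)
    return blocks[-1].strip() if blocks else None

sample = (
    "Reasoning: fill a DP table over the grid...\n"
    "```cpp\n#include <iostream>\nint main() { return 0; }\n```\n"
)
print(extract_cpp(sample))  # prints the C++ source without the fences
```

Taking the last matching block favors the final implementation when the reasoning trace includes earlier partial snippets.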