SaffalPoosh committed (verified)
Commit fdd3c4b · Parent(s): f40cfa2

Update README.md

added code for inference

Files changed (1): README.md (+85, −0)
README.md CHANGED
@@ -9,6 +9,9 @@ tags:
 - transformers
 - trl
 - unsloth
+license: apache-2.0
+datasets:
+- open-r1/codeforces-cots
 ---

 # Model Card for Model ID
@@ -16,6 +19,88 @@
 <!-- Provide a quick summary of what the model is/does. -->

+This is a QLoRA adapter trained on C++ coding tasks for reasoning-based generation.
+
+```python
+from threading import Thread
+
+from unsloth import FastLanguageModel
+from transformers import TextIteratorStreamer
+
+example_problem = """
+A robot is situated at the top-left corner of an m x n grid. The robot can only move either down or right at any point in time. It wants to reach the bottom-right corner of the grid. Some cells in the grid are blocked by obstacles. How many unique paths can the robot take to reach the destination?
+Constraints:
+Time limit per test: 2.0 seconds
+Memory limit per test: 256.0 megabytes
+1 ≤ m, n ≤ 100
+Grid cells are either 0 (empty) or 1 (obstacle).
+Input Format:
+The first line contains two integers m and n — the dimensions of the grid.
+The next m lines each contain n integers (0 or 1) representing the grid.
+Output Format:
+Print a single integer — the number of unique paths.
+Example:
+```input
+3 3
+0 0 0
+0 1 0
+0 0 0
+```
+"""
+
+model_path = "SaffalPoosh/reasoning_cpp_llm"
+
+max_seq_length = 16000
+dtype = None          # auto-detect
+load_in_4bit = True
+
+# This downloads the base model, then patches it by applying the LoRA adapters.
+model, tokenizer = FastLanguageModel.from_pretrained(
+    model_name=model_path,
+    max_seq_length=max_seq_length,
+    dtype=dtype,
+    load_in_4bit=load_in_4bit,
+    local_files_only=False,
+)
+FastLanguageModel.for_inference(model)
+
+# Prepare input data
+inputs = tokenizer(example_problem, return_tensors="pt")
+inputs = {k: v.to("cuda") for k, v in inputs.items()}
+
+# Stream tokens from a background generation thread
+text_streamer = TextIteratorStreamer(tokenizer, skip_special_tokens=False)
+stream_catcher = Thread(
+    target=model.generate,
+    kwargs={**inputs, "do_sample": True, "streamer": text_streamer, "max_new_tokens": 10000},
+)
+stream_catcher.start()
+
+with open("output.txt", "w") as f:
+    for token in text_streamer:
+        print(token, end="", flush=True)
+        f.write(token)
+stream_catcher.join()
+```
+
+The `output.txt` file contains the generated output.
+

 ## Model Details
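
For reference (not part of the commit), the sample prompt above is the classic unique-paths-with-obstacles dynamic program; a minimal Python sketch of the reference answer the model's generated C++ should reproduce:

```python
# Illustrative reference solution for the sample problem in the prompt:
# count monotone (down/right) paths through a 0/1 obstacle grid.
def unique_paths(grid):
    m, n = len(grid), len(grid[0])
    dp = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            if grid[i][j] == 1:
                dp[i][j] = 0              # blocked cell: no paths through it
            elif i == 0 and j == 0:
                dp[i][j] = 1              # start cell
            else:
                up = dp[i - 1][j] if i > 0 else 0
                left = dp[i][j - 1] if j > 0 else 0
                dp[i][j] = up + left      # paths in = paths from above + left
    return dp[m - 1][n - 1]

sample = [[0, 0, 0],
          [0, 1, 0],
          [0, 0, 0]]
print(unique_paths(sample))  # → 2
```

For the 3×3 example with the center blocked, only the two border paths survive, so the expected output is 2.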