SaffalPoosh committed (verified)
Commit fdd3c4b · Parent(s): f40cfa2

Update README.md

added code for inference

Files changed (1): README.md (+85, −0)
README.md CHANGED
@@ -9,6 +9,9 @@ tags:
 - transformers
 - trl
 - unsloth
+license: apache-2.0
+datasets:
+- open-r1/codeforces-cots
 ---

 # Model Card for Model ID
@@ -16,6 +19,88 @@
 <!-- Provide a quick summary of what the model is/does. -->

+This is a QLoRA adapter trained on C++ coding tasks for reasoning-based generation.
+
+```python
+from threading import Thread
+
+from unsloth import FastLanguageModel
+from transformers import TextIteratorStreamer
+
+example_problem = """
+A robot is situated at the top-left corner of an m x n grid. The robot can only move either down or right at any point in time. It wants to reach the bottom-right corner of the grid. Some cells in the grid are blocked by obstacles. How many unique paths can the robot take to reach the destination?
+Constraints:
+Time limit per test: 2.0 seconds
+Memory limit per test: 256.0 megabytes
+1 ≤ m, n ≤ 100
+Grid cells are either 0 (empty) or 1 (obstacle).
+Input Format:
+The first line contains two integers m and n — the dimensions of the grid.
+The next m lines each contain n integers (0 or 1) representing the grid.
+Output Format:
+Print a single integer — the number of unique paths.
+Example:
+```input
+3 3
+0 0 0
+0 1 0
+0 0 0
+```
+"""
+
+model_path = "SaffalPoosh/reasoning_cpp_llm"
+
+max_seq_length = 16000
+dtype = None          # auto-detect
+load_in_4bit = True
+
+# This downloads the base model, then patches it by applying the LoRA adapters.
+model, tokenizer = FastLanguageModel.from_pretrained(
+    model_name=model_path,
+    max_seq_length=max_seq_length,
+    dtype=dtype,
+    load_in_4bit=load_in_4bit,
+    local_files_only=False,
+)
+FastLanguageModel.for_inference(model)
+
+# Prepare input data
+inputs = tokenizer(example_problem, return_tensors="pt")
+inputs = {k: v.to("cuda") for k, v in inputs.items()}
+
+# Stream tokens from a background generation thread
+text_streamer = TextIteratorStreamer(tokenizer, skip_special_tokens=False)
+stream_catcher = Thread(
+    target=model.generate,
+    kwargs={**inputs, "do_sample": True, "streamer": text_streamer, "max_new_tokens": 10000},
+)
+stream_catcher.start()
+
+with open("output.txt", "w") as f:
+    for token in text_streamer:
+        print(token, end="", flush=True)
+        f.write(token)
+stream_catcher.join()
+```
+
+The `output.txt` file contains the generated output.
+

 ## Model Details
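
For reference (not part of the commit), the sample prompt above is the classic unique-paths-with-obstacles dynamic program; a minimal Python sketch of the reference answer the model's generated C++ should reproduce:

```python
# Illustrative reference solution for the sample problem in the prompt:
# count monotone (down/right) paths through a 0/1 obstacle grid.
def unique_paths(grid):
    m, n = len(grid), len(grid[0])
    dp = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            if grid[i][j] == 1:
                dp[i][j] = 0              # blocked cell: no paths through it
            elif i == 0 and j == 0:
                dp[i][j] = 1              # start cell
            else:
                up = dp[i - 1][j] if i > 0 else 0
                left = dp[i][j - 1] if j > 0 else 0
                dp[i][j] = up + left      # paths in = paths from above + left
    return dp[m - 1][n - 1]

sample = [[0, 0, 0],
          [0, 1, 0],
          [0, 0, 0]]
print(unique_paths(sample))  # → 2
```

For the 3×3 example with the center blocked, only the two border paths survive, so the expected output is 2.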