Instructions to use Felladrin/Minueza-32M-Base with libraries, inference providers, notebooks, and local apps.

Libraries

How to use Felladrin/Minueza-32M-Base with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Felladrin/Minueza-32M-Base")
messages = [
    {"role": "user", "content": "Who are you?"},
]
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Felladrin/Minueza-32M-Base")
model = AutoModelForCausalLM.from_pretrained("Felladrin/Minueza-32M-Base")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use Felladrin/Minueza-32M-Base with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Felladrin/Minueza-32M-Base"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Felladrin/Minueza-32M-Base",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
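The same request can be sent from Python without extra dependencies. The sketch below uses only the standard library. It assumes the server started above is listening on `http://localhost:8000` and returns the usual OpenAI-style body, with the reply at `choices[0].message.content`. The helper names `build_chat_request`, `extract_reply`, and `chat` are hypothetical.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build the same POST request as the curl example (model + one user message)."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(body: dict) -> str:
    """Pull the assistant text out of an OpenAI-style chat completion response."""
    return body["choices"][0]["message"]["content"]


def chat(base_url: str, model: str, prompt: str, timeout: float = 60.0) -> str:
    """Send the request to a running server and return the generated text."""
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_reply(json.load(response))
```

For example, `chat("http://localhost:8000", "Felladrin/Minueza-32M-Base", "What is the capital of France?")`. Passing `http://localhost:30000` as `base_url` should work the same way for the SGLang server.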

Use Docker

docker model run hf.co/Felladrin/Minueza-32M-Base

SGLang

How to use Felladrin/Minueza-32M-Base with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Felladrin/Minueza-32M-Base" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Felladrin/Minueza-32M-Base",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
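After `sglang.launch_server` starts, the server may take a while to load the model, so a request sent immediately can be refused. The sketch below polls until the server answers. It assumes the server exposes the OpenAI-compatible `GET /v1/models` endpoint, which both SGLang and vLLM provide. `wait_until_ready` is a hypothetical helper, not part of either project.

```python
import http.client
import time
import urllib.request


def wait_until_ready(base_url: str, timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Poll GET {base_url}/v1/models until it returns HTTP 200 or `timeout` seconds pass.

    Returns True as soon as the server answers, False if the deadline is reached first.
    """
    url = f"{base_url.rstrip('/')}/v1/models"
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, timeout, or an error status (HTTPError is an
            # OSError subclass): the server is not ready yet, so try again.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, call `wait_until_ready("http://localhost:30000")` before sending the curl request above. It returns `False` if the server has not answered within `timeout` seconds.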

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Felladrin/Minueza-32M-Base" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Felladrin/Minueza-32M-Base",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use Felladrin/Minueza-32M-Base with Docker Model Runner:
```
docker model run hf.co/Felladrin/Minueza-32M-Base
```

Transfer learning?

by Zemulax - opened Jul 10, 2024

Discussion

Zemulax

Jul 10, 2024

Hi there, I just wanted to know whether you pretrained this model without using GPT or any other model as a boost, i.e., literally from scratch, without loading any pretrained checkpoint. I need help.
Thanks

Felladrin

Owner Jul 24, 2024

Hi, @Zemulax !

Yes, it was trained from scratch, without using any other model.
Specifically, I used this command line, listed under "Creating a model on the fly" in the Transformers examples:

You can also read more about the making of this model here:
The making of Minueza-32M: Transformer model trained from scratch

Zemulax

Jul 26, 2024

I read your incredible story. It's similar to what I want to achieve.
However, I have 5 billion tokens at my fingertips that I want to utilise, and I am struggling with the LR. How do I set the learning rate? Which LR is suitable for my situation? I have done research but still cannot reach a conclusion. Please help.

Felladrin

Owner Aug 10, 2024

Ah, the learning rate...
I believe each dataset has its own unique LR sweet spot.
Before actually starting to train the model, I suggest doing a warmup training run (using only 10K samples from your dataset) with 4 different LRs, and then checking which one produces the best responses. That will give you at least a better starting point.
The first four LRs I try are: 5e-5, 5e-6, 8e-7, and 2e-4.
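The trial-run procedure above can be sketched as a small harness. This is a hypothetical helper, not the author's script. `train_and_eval` stands in for whatever short training run you use. The harness ranks LRs by a numeric score (for example held-out loss, lower is better), whereas the author judged response quality by hand. The toy objective only exercises the harness, and its ranking says nothing about which LR suits a real language model.

```python
from typing import Callable, List, Sequence, Tuple

# The four starting learning rates quoted in the reply above.
CANDIDATE_LRS: Tuple[float, ...] = (5e-5, 5e-6, 8e-7, 2e-4)


def lr_sweep(
    train_and_eval: Callable[[float, Sequence], float],
    samples: Sequence,
    lrs: Sequence[float] = CANDIDATE_LRS,
    subset_size: int = 10_000,
) -> List[Tuple[float, float]]:
    """Run one short trial per learning rate on the first `subset_size` samples.

    `train_and_eval(lr, subset)` is a hypothetical callback: it should train a
    fresh model on `subset` with learning rate `lr` and return a score where
    lower is better. Results are returned as (lr, score) pairs, best first.
    """
    subset = samples[:subset_size]
    results = [(lr, train_and_eval(lr, subset)) for lr in lrs]
    return sorted(results, key=lambda pair: pair[1])


def toy_train_and_eval(lr: float, subset: Sequence[Tuple[float, float]]) -> float:
    """Toy stand-in: one SGD pass fitting y = w * x, returning the final mean squared error."""
    w = 0.0  # fresh "model" for every trial, so runs don't influence each other
    for x, y in subset:
        grad = 2.0 * (w * x - y) * x  # derivative of (w*x - y)^2 with respect to w
        w -= lr * grad
    return sum((w * x - y) ** 2 for x, y in subset) / len(subset)


if __name__ == "__main__":
    # Synthetic pairs from y = 3x; the sweep only uses the first 10K of them.
    data = [(i / 10_000, 3.0 * i / 10_000) for i in range(20_000)]
    for lr, score in lr_sweep(toy_train_and_eval, data):
        print(f"lr={lr:g}  loss={score:.6f}")
```

Each trial starts from a fresh model, so the LRs are compared on equal footing. The winner is only a starting point for the full run.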

Zemulax

Aug 15, 2024

Thank you, Victor! And oh, how much did it cost you to pretrain? Which GPUs and cloud provider did you use?

Felladrin

Owner Aug 17, 2024

I trained Minueza-32M entirely locally, on a MacBook M1. It took some weeks, and I expected my electricity bill to go up, but in the end I didn't notice any difference, so I'd say there were no costs.

Zemulax

Aug 20, 2024

Wow, awesome. Thank you, bro, this has been helpful. I am taking it a step further by pretraining something similar to GPT-1 or GPT-2 small. It's quite a journey, I must say.

Felladrin changed discussion status to closed Oct 22, 2024
