Instructions for using Salesforce/codegen-16B-multi with libraries and local apps.

Libraries

How to use Salesforce/codegen-16B-multi with Transformers:

```
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Salesforce/codegen-16B-multi")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-16B-multi")
model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen-16B-multi")
```
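Holding the checkpoint in memory is the first hurdle with a model this size. The sketch below gives a rough estimate for the weights alone. It assumes a nominal 16 billion parameters (the exact count differs slightly) and ignores activations, the KV cache, and loading overhead. Passing a half-precision `torch_dtype` such as `torch.float16` to `from_pretrained` roughly halves the float32 figure. The helper name `weight_memory_gb` is hypothetical.

```python
# Rough weight-memory estimate for a causal LM checkpoint.
# Assumption: a nominal 16e9 parameters; the real CodeGen-16B count differs slightly.
# Only the weights are counted (no activations, KV cache, or loading overhead).

BYTES_PER_PARAM = {
    "float32": 4,
    "float16": 2,
    "bfloat16": 2,
}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return the memory needed to hold the weights, in decimal GB (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    n = 16e9
    for dtype in ("float32", "float16", "bfloat16"):
        print(f"{dtype:>8}: ~{weight_memory_gb(n, dtype):.0f} GB")
```

Under this assumption, float32 comes to about 64 GB and either 16-bit type to about 32 GB, before any runtime overhead.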

Local Apps

vLLM

How to use Salesforce/codegen-16B-multi with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Salesforce/codegen-16B-multi"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Salesforce/codegen-16B-multi",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```
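The same request can also be sent from Python using only the standard library. This is a minimal sketch that assumes the vLLM server started by the commands above is listening on `http://localhost:8000`. The helper names `build_completion_request` and `complete` are hypothetical, and the JSON body mirrors the curl call (`model`, `prompt`, `max_tokens`, `temperature`).

```python
import json
import urllib.request


def build_completion_request(
    base_url: str,
    model: str,
    prompt: str,
    max_tokens: int = 512,
    temperature: float = 0.5,
) -> urllib.request.Request:
    """Build (but do not send) a POST to an OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(req: urllib.request.Request, timeout: float = 600.0) -> dict:
    """Send the request to the running server and return the decoded JSON response."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `complete(build_completion_request("http://localhost:8000", "Salesforce/codegen-16B-multi", "def fibonacci(n):"))` returns the parsed response. Pointing `base_url` at port 30000 would target an SGLang server instead, since both expose the same route.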

Use Docker

```
docker model run hf.co/Salesforce/codegen-16B-multi
```

SGLang

How to use Salesforce/codegen-16B-multi with SGLang:

Install from pip and serve model

```
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Salesforce/codegen-16B-multi" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Salesforce/codegen-16B-multi",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```
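The vLLM and SGLang servers both reply in the OpenAI completions format, where the generated text is stored under `choices[i]["text"]` and `finish_reason` records why generation stopped. Below is a minimal sketch of extracting the text from a decoded response. It assumes a single-prompt request, and the helper names `first_completion_text` and `hit_token_limit` are hypothetical.

```python
from typing import Any, Mapping


def first_completion_text(response: Mapping[str, Any]) -> str:
    """Return the generated text of the first choice in an OpenAI-style completions response."""
    choices = response.get("choices") or []
    if not choices:
        # Error payloads carry no "choices"; surface the whole body to aid debugging.
        raise ValueError(f"no completion choices in response: {response!r}")
    return choices[0]["text"]


def hit_token_limit(response: Mapping[str, Any]) -> bool:
    """True if the first choice stopped because it reached max_tokens (finish_reason "length")."""
    choices = response.get("choices") or []
    return bool(choices) and choices[0].get("finish_reason") == "length"
```

If `hit_token_limit` returns `True`, the completion was probably cut off mid-function. For code generation, raising `max_tokens` is then a reasonable next step.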

Use Docker images

```
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Salesforce/codegen-16B-multi" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Salesforce/codegen-16B-multi",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```
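Loading 16B weights inside the container can take a while, and requests sent before the server is up typically fail with a connection error. Below is a minimal polling sketch. It assumes the server exposes the OpenAI-compatible `GET /v1/models` route and treats an HTTP 200 as "ready". `wait_until_ready`, `http_ok`, and the `probe` parameter are hypothetical names; `probe` is injectable so the loop can be exercised without a live server.

```python
import time
import urllib.request
from typing import Callable, Optional


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if GET url answers with HTTP 200, False on any connection or HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False


def wait_until_ready(
    base_url: str,
    timeout_s: float = 1800.0,
    interval_s: float = 10.0,
    probe: Optional[Callable[[str], bool]] = None,
) -> bool:
    """Poll base_url/v1/models until it responds or timeout_s elapses; return True when ready."""
    probe = probe or http_ok
    url = base_url.rstrip("/") + "/v1/models"
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, `wait_until_ready("http://localhost:30000")` blocks until the container answers or half an hour passes. The curl call above can then be issued safely.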

Docker Model Runner
How to use Salesforce/codegen-16B-multi with Docker Model Runner:
```
docker model run hf.co/Salesforce/codegen-16B-multi
```

Fine-tuning memory?

by glicerico - opened Mar 26, 2023

Discussion

glicerico

Mar 26, 2023

I am trying to fine-tune this model using DeepSpeed, as suggested in the model's repo: https://github.com/salesforce/jaxformer#a100-fine-tune
I have tried on up to 4x A100 with a total of 360 GB of RAM, but every time, training crashes before it starts, once memory is fully used (monitored with htop).
How much memory do I need to fine-tune this?
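The sketch below is a back-of-the-envelope answer. For mixed-precision training with Adam, the ZeRO paper (Rajbhandari et al.) counts about 16 bytes of model state per parameter: fp16 weights and gradients at 2 bytes each, plus fp32 master weights and two Adam moments at 4 bytes each. With a nominal 16 billion parameters (an assumption), that comes to roughly 256 GB before activations and buffers. The second helper models a separate, assumed failure mode that the thread does not confirm: each of N processes materializing its own full fp32 copy of the weights in host RAM before DeepSpeed partitions them. Both function names are hypothetical, and neither comes from jaxformer.

```python
# Back-of-the-envelope memory estimates for full fine-tuning (model states only).
# Assumptions (not from the thread): nominal 16e9 parameters, mixed-precision Adam,
# and the 16-bytes-per-parameter accounting from the ZeRO paper.

FP16_WEIGHTS = 2  # bytes/param
FP16_GRADS = 2    # bytes/param
FP32_MASTER = 4   # bytes/param, fp32 copy of the weights kept by the optimizer
ADAM_M = 4        # bytes/param, first moment
ADAM_V = 4        # bytes/param, second moment

BYTES_PER_PARAM_TRAINING = FP16_WEIGHTS + FP16_GRADS + FP32_MASTER + ADAM_M + ADAM_V  # 16


def training_state_gb(num_params: float, num_shards: int = 1) -> float:
    """Model-state memory per shard in GB (1e9 bytes), excluding activations.

    num_shards models ZeRO stage 3, which partitions weights, gradients,
    and optimizer states evenly across devices.
    """
    return num_params * BYTES_PER_PARAM_TRAINING / num_shards / 1e9


def naive_load_ram_gb(num_params: float, num_processes: int, bytes_per_param: int = 4) -> float:
    """Host RAM in GB if every process loads its own full copy of the weights (fp32 by default)."""
    return num_params * bytes_per_param * num_processes / 1e9


if __name__ == "__main__":
    n = 16e9
    print(f"model states, 1 device:          ~{training_state_gb(n):.0f} GB")
    print(f"model states, ZeRO-3 over 4:     ~{training_state_gb(n, num_shards=4):.0f} GB per device")
    print(f"fp32 weights loaded by 4 procs:  ~{naive_load_ram_gb(n, 4):.0f} GB host RAM")
```

Under these assumptions, both the unsharded model states and four independent fp32 loads come to about 256 GB. That would be consistent with host RAM filling up before training starts, though actual usage depends on how the training script loads and partitions the model.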

enijkamp

Salesforce AI Research org Mar 26, 2023

Here is a DeepSpeed configuration that should fit on a single A100 with CPU offloading; however, it may be slow:
https://github.com/salesforce/jaxformer/blob/main/jaxformer/hf/train.py
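For comparison, a DeepSpeed configuration that enables ZeRO stage 3 with CPU offloading of both optimizer states and parameters can be written as a plain dict. This is a hypothetical sketch built from standard DeepSpeed config keys (`zero_optimization.stage`, `offload_optimizer`, `offload_param`, `bf16`). It is not the configuration in the linked `train.py`, and the batch sizes, optimizer, and learning rate are placeholders.

```python
import json


def make_zero3_offload_config(micro_batch_size: int = 1, grad_accum_steps: int = 16) -> dict:
    """Return a DeepSpeed config dict with ZeRO-3 and CPU offloading of params and optimizer states."""
    return {
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "gradient_accumulation_steps": grad_accum_steps,
        # A100s support bfloat16; use {"fp16": {"enabled": True}} on GPUs without bf16.
        "bf16": {"enabled": True},
        "zero_optimization": {
            "stage": 3,
            # Keep optimizer states (fp32 master weights, Adam moments) in pinned host memory.
            "offload_optimizer": {"device": "cpu", "pin_memory": True},
            # Keep partitioned parameters in host memory; they are fetched to the GPU when needed.
            "offload_param": {"device": "cpu", "pin_memory": True},
            "overlap_comm": True,
        },
        # Placeholder optimizer settings.
        "optimizer": {"type": "AdamW", "params": {"lr": 1e-5}},
    }


if __name__ == "__main__":
    print(json.dumps(make_zero3_offload_config(), indent=2))
```

The dict can be saved as JSON or passed as `config=` to `deepspeed.initialize`. Note that offloading moves memory pressure onto host RAM, so it trades GPU memory for CPU memory rather than reducing the total.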

glicerico

Mar 26, 2023

Thanks for replying, @enijkamp. This is exactly what I am trying to use (with my own training data, a longer run, and checkpoint saving), but as I said above, loading the model uses more than 360 GB of RAM.
I am not sure whether I am activating CPU offloading, though. I suppose the default parameters in that file are enough?
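One way to answer "am I activating CPU offloading?" is to inspect the config that is actually passed to DeepSpeed. Below is a minimal sketch; `offload_devices` and `offload_devices_from_file` are hypothetical helpers. They read the current `offload_optimizer`/`offload_param` keys and the boolean `cpu_offload` key that older DeepSpeed releases used for optimizer offloading. A config with no `zero_optimization` section is treated as stage 0 with no offloading.

```python
import json
from typing import Any, Mapping


def offload_devices(config: Mapping[str, Any]) -> dict:
    """Report the ZeRO stage and where a DeepSpeed config offloads optimizer states and parameters.

    Returns {"stage": int, "optimizer": str, "param": str}, where the devices are
    "cpu", "nvme", or "none".
    """
    zero = config.get("zero_optimization") or {}
    result = {"stage": zero.get("stage", 0), "optimizer": "none", "param": "none"}
    if zero.get("cpu_offload"):  # legacy boolean key from older DeepSpeed releases
        result["optimizer"] = "cpu"
    for kind in ("optimizer", "param"):
        section = zero.get(f"offload_{kind}") or {}
        device = section.get("device", "none")
        if device and device != "none":
            result[kind] = device
    return result


def offload_devices_from_file(path: str) -> dict:
    """Load a DeepSpeed JSON config file and report its offload settings."""
    with open(path) as f:
        return offload_devices(json.load(f))
```

If the training script builds its config in code rather than reading a file, the same helper can be applied to that dict just before it is handed to DeepSpeed.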

glicerico

Mar 26, 2023

@enijkamp I've succeeded in fine-tuning using TPU, but unfortunately I can't find the 16B model checkpoints for this. I read in last year's issues that you haven't had time to upload the sharding patterns. Any update on this?
