Great work
I really like both the 35B and 27B Darwin models, and I must say they perform better than both parents, as your results suggest. The more surprising thing is that distills didn't work well for me, but these models do.
Have you considered contacting some renowned quantizers like @bartowski to provide high-quality GGUFs and maybe some specialized quants? I would love to see mixed-precision quants from @ubergarm and @steampunque, because for the Qwen3.5 family in particular, Q4-Q6 quants that keep the input/output layers at high precision are outperforming traditional quants.
I would also like to point to the QwenPaw Flash 9B model. It's a Qwen3.5 9B fine-tune that outperforms the base model by a lot and may be a nice model to work with in your mother/father scenarios. With this model I stumbled upon this adapter that's supposed to do very well, but I couldn't try it due to the lack of GGUFs: https://huggingface.co/jason1966/CoPaw-Flash-9B-DataAnalyst-LoRA
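As a rough illustration of the mixed-precision idea mentioned above, the toy Python sketch below applies plain symmetric round-to-nearest quantization to the same random weights at 4 and 8 bits and compares the reconstruction error. It is not how GGUF quant formats work internally. The tensor names, the bit widths, and the per-tensor plan are illustrative assumptions, not a recipe from any of the quantizers named in this thread.

```python
import random


def quantize_dequantize(values, bits):
    """Symmetric round-to-nearest quantization to `bits` bits, then back to float."""
    levels = 2 ** (bits - 1) - 1  # 7 for 4-bit, 127 for 8-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / levels if max_abs > 0 else 1.0
    return [round(v / scale) * scale for v in values]


def mean_abs_error(values, bits):
    """Average absolute difference between original and restored values."""
    restored = quantize_dequantize(values, bits)
    return sum(abs(a - b) for a, b in zip(values, restored)) / len(values)


# Synthetic stand-in for a weight tensor (assumption: roughly Gaussian weights).
rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(4096)]

# Hypothetical mixed-precision plan: input/output tensors stay at 8 bits,
# a middle block drops to 4 bits. Names are illustrative only.
plan = {"token_embd": 8, "blk.0.ffn": 4, "output": 8}

errors = {name: mean_abs_error(weights, bits) for name, bits in plan.items()}
for name, bits in plan.items():
    print(f"{name:12s} {bits}-bit  mean abs error = {errors[name]:.5f}")
```

The 8-bit tensors come back with a much smaller error than the 4-bit one, which is the intuition behind spending extra bits on the embedding and output layers while compressing the rest more aggressively.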
Thank you very much for the thoughtful feedback. I really appreciate it.
I am especially glad to hear that the Darwin 35B and 27B models worked well for you, and it is very encouraging to hear that they felt stronger than the parent models in actual use.
Also, bartowski has already kindly created and released a full GGUF quant set for Darwin-35B-A3B-Opus, which I am very grateful for:
https://huggingface.co/bartowski/FINAL-Bench_Darwin-35B-A3B-Opus-GGUF
Your point about high-quality specialized quants and mixed-precision approaches is also very valuable. I agree that this direction is important, especially for models in the Qwen3.5 family.
And thank you as well for mentioning QwenPaw Flash 9B. It sounds very interesting, and I will definitely test it as a possible parent candidate in future Darwin experiments.
I truly appreciate your support, suggestions, and careful testing.