
Update tool parser scripts for vLLM v.0.15.0

#6
by juanjucm HF Staff - opened

Description

When using the tool-calling deployment snippet with vLLM v0.15.0:

git clone https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2

vllm serve nvidia/NVIDIA-Nemotron-Nano-12B-v2 \
  --trust-remote-code \
  --mamba_ssm_cache_dtype float32 \
  --enable-auto-tool-choice \
  --tool-parser-plugin "NVIDIA-Nemotron-Nano-12B-v2/nemotron_toolcall_parser_no_streaming.py" \
  --tool-call-parser "nemotron_json"

there are several import errors (e.g. ModuleNotFoundError: No module named 'vllm.entrypoints.openai.protocol') that make the deployment fail.
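The failures come from symbols having moved between vLLM releases. A minimal, hypothetical sketch of a defensive import pattern that tolerates such moves (the helper and candidate names below are illustrative; only vllm.entrypoints.openai.protocol comes from the error message above):

```python
from importlib import import_module


def import_from_first(candidates, name):
    """Return attribute `name` from the first importable module in `candidates`.

    Useful when a symbol moves between library versions: list the old
    module path first, then the new one(s), and take whichever resolves.
    """
    errors = []
    for mod_path in candidates:
        try:
            module = import_module(mod_path)
            return getattr(module, name)
        except (ModuleNotFoundError, AttributeError) as exc:
            errors.append(f"{mod_path}: {exc}")
    raise ImportError(
        f"Could not import {name!r} from any of {candidates}:\n" + "\n".join(errors)
    )


# Demonstration with stdlib modules: the bogus first candidate is
# skipped and `OrderedDict` is resolved from `collections`.
OrderedDict = import_from_first(["no.such.module", "collections"], "OrderedDict")
```

A pattern like this keeps one parser script working across both the old and the new vLLM layouts, at the cost of slightly noisier imports; the PR instead updates the paths directly, which is simpler when only one vLLM version needs to be supported.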

This PR removes unused imports and updates import paths to fix these errors, so the scripts run with vLLM v0.15.0.

Pinning the vLLM version with pip install "vllm>=0.10.1,<0.13.0" works fine with the current scripts. However, installing the latest version (i.e. v0.15.0), as the model card recommends, makes the deployment fail. Given this, an alternative to merging this PR would be to update the provided installation snippet to the pinned one above, ensuring the latest vLLM version is not installed.
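If the pinning route is taken, the scripts could fail fast with a clear message instead of an opaque ModuleNotFoundError. A stdlib-only sketch of such a check (the helper names are hypothetical; for real projects, packaging.specifiers.SpecifierSet is the more robust choice):

```python
def parse_version(v: str) -> tuple:
    """Parse a plain 'X.Y.Z' version string into a comparable integer tuple."""
    return tuple(int(part) for part in v.split("."))


def in_range(version: str, low: str, high: str) -> bool:
    """True if low <= version < high, i.e. the pip constraint '>=low,<high'."""
    return parse_version(low) <= parse_version(version) < parse_version(high)


# The current scripts match the constraint vllm>=0.10.1,<0.13.0:
assert in_range("0.12.0", "0.10.1", "0.13.0")      # pinned range: imports resolve
assert not in_range("0.15.0", "0.10.1", "0.13.0")  # latest release: imports fail
```

At startup, the parser plugin could compare vllm.__version__ against the range and raise a descriptive error pointing the user at the pinned install command.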

juanjucm changed pull request title from fix-tool-call-scripts to Fix tool parser scripts
juanjucm changed pull request status to open

Once validated and merged, the same changes should be applied to nvidia/NVIDIA-Nemotron-Nano-9B-v2.

juanjucm changed pull request title from Fix tool parser scripts to Fix tool parser scripts for vLLM v.0.15.0
juanjucm changed pull request title from Fix tool parser scripts for vLLM v.0.15.0 to Update tool parser scripts for vLLM v.0.15.0
Ready to merge
This branch is ready to get merged automatically.
