Run Claude Code, OpenCode & Frontier Coding Models on Your Own AI Infrastructure with DEH
TL;DR
- Build AI on premises with Dell Enterprise Hub (DEH) from Dell Technologies + Hugging Face.
- Claude Code (Anthropic) and OpenCode (open source) both point straight at a DEH endpoint: just set a base URL and a model name. No translation layer or proxy is needed.
- The payoff: a fully air-gapped, data-sovereign coding agent running a frontier open source model on your own Dell PowerEdge platforms — no tokens leave your data center.
How to use this guide: Every command lives in its own code block so you can copy it straight to your terminal. Replace INFERENCE_HOST with the hostname or IP address of the server where your model is deployed.
Why
Pairing a best-in-class agent CLI with open-weight frontier models on your own Dell hardware gives you:
- Data sovereignty — inference happens on local GPUs, behind your firewall.
- Model control — pin a version, fine-tune it, audit it.
- Capacity-based cost — pay for GPUs, not per token.
- Predictable latency — no third-party rate limits.
DEH serves models with vLLM / SGLang, exposing an OpenAI-compatible REST API (/v1/chat/completions, /v1/models). It also publishes "Goodput" scenarios with per-GPU SLOs so you can size a deployment:
| Scenario | Optimizes for | Good when… |
|---|---|---|
| Balanced | Context vs. concurrency | General day-to-day agentic coding |
| High concurrency | Many parallel requests | A shared team endpoint |
| Long context | Very large windows | Whole-repo / large-file reasoning |
Agentic coding pushes large contexts, so Long context or Balanced on H200/B300 is a strong default for both agents.
Frontier open-source coding models on Dell Enterprise Hub
Here are the frontier open models best suited to coding / agentic software engineering.
Purpose-built / agentic coding models
| Model | Params (total / active) | License | Why it's relevant |
|---|---|---|---|
| Qwen3-Coder-Next | 80B (sparse) | Apache 2.0 | Coding-focused; strong agentic reasoning + tool use; long-context; built for IDE/CLI. |
| GLM 5.1 | 754B MoE | MIT / NVIDIA | Z.ai flagship MoE for agentic engineering, long-horizon coding, repository generation, terminal tasks. |
| Kimi K2.6 | 1T MoE / 32B active | Modified MIT | Native-multimodal MoE for long-horizon coding, agentic workflows, autonomous orchestration. |
| MiniMax M2.7 | ~229–230B MoE | Other / NVIDIA | For complex software engineering, agentic tool use, long-context reasoning. |
| DeepSeek V4 Pro | 1.6T MoE / 49B active, 1M ctx | MIT | Frontier-scale long-context agentic reasoning. |
| DeepSeek V4 Flash | 284B MoE / 13B active, 1M ctx | MIT | Efficient long-context variant for agentic tasks. |
Strong general models with excellent coding ability
| Model | Params | License | Notes |
|---|---|---|---|
| GPT-OSS-120B | 117B | Apache 2.0 | OpenAI open model: production reasoning + agentic tasks; broad platform support. |
| GPT-OSS-20B | 21B | Apache 2.0 | Low-latency/local variant; runs on Dell Pro Max GB10. |
| Trinity Large Thinking | 398B MoE / ~13B active | Apache 2.0 | Reasoning-optimized, native long CoT, strong tool-calling. |
| Mistral Large 3 | 675B MoE / 41B active | Apache 2.0 | SOTA general-purpose multimodal MoE. |
| Qwen3.5 family | 9B / 27B / 397B-A17B | Apache 2.0 | Strong reasoning + multilingual; 27B fits GB10. |
| NVIDIA Nemotron 3 | 30B / 120B / 550B (Nano / Super / Ultra) | NVIDIA Open Model | Agentic + reasoning; Nano 30B runs on GB10. |
Quantization: Models ship in BF16 / FP8 / NVFP4. NVFP4 (4-bit, for Blackwell B300/GB10) shrinks memory and boosts throughput at minimal quality loss — ideal for fitting big coding MoEs on fewer GPUs.
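As a rough back-of-envelope check on why lower precision helps, weight memory scales with parameter count times bits per parameter. The sketch below covers weights only: KV cache, activations, and the small per-block scale factors that 4-bit formats carry are ignored. The 0.9 usable-memory fraction and the 80 GB GPU capacity in the example are illustrative assumptions, not DEH sizing guidance.

```python
import math


def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes), weights only."""
    return params_billion * bits_per_param / 8


def min_gpus_for_weights(weights_gb: float, gpu_mem_gb: float,
                         usable_fraction: float = 0.9) -> int:
    """Smallest GPU count whose combined usable memory holds the weights.

    usable_fraction is an assumed headroom factor, not a measured value.
    """
    return math.ceil(weights_gb / (gpu_mem_gb * usable_fraction))


if __name__ == "__main__":
    # An 80B-parameter model at the three precisions named above.
    for label, bits in [("BF16", 16), ("FP8", 8), ("NVFP4", 4)]:
        gb = weight_memory_gb(80, bits)
        gpus = min_gpus_for_weights(gb, gpu_mem_gb=80)
        print(f"{label}: ~{gb:.0f} GB of weights, at least {gpus} x 80 GB GPU(s)")
```

For an 80B model this works out to roughly 160 GB at BF16, 80 GB at FP8, and 40 GB at 4 bits, before any memory is set aside for context. Real deployments need more headroom, so treat the Goodput scenarios as the authoritative sizing source.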
How to pick
- Single workstation (Dell Pro Max GB10): GPT-OSS-20B, Qwen3.5-27B, or quantized Qwen3-Coder-Next.
- Team server (XE9680 / H100 / H200): Qwen3-Coder-Next (80B), the coding-tuned, Apache-2.0 sweet spot.
- Max capability (XE9780 / B300): GLM 5.1, Kimi K2.6, or DeepSeek V4 Pro for long-horizon, whole-repo work.
Step-by-step
Both agents follow the same flow: deploy a model, install the CLI, point it at the endpoint, and run. OpenCode adds a credentials step (Step 4), and a project guidance file (Step 5) is optional. The main difference is the config file each agent uses.
Step 1 — Deploy a model on Dell Enterprise Hub (shared)
- Log in to https://dell.hf.co.
- In the Model Catalog, filter by your Dell Platform and pick a coding model — e.g. Qwen3-Coder-Next.
- Click Deploy, choose the platform and a Goodput scenario (Balanced or Long context).
- Run the generated container command on your Dell PowerEdge server. It serves an OpenAI-compatible API on port 8000.
Check that the endpoint is live:
curl http://INFERENCE_HOST:8000/v1/models
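If you would rather script this check, the following is a minimal Python sketch of the same call. It assumes the endpoint returns the usual OpenAI-style list shape (a `data` array of objects with an `id` field); the function names are illustrative, not part of DEH.

```python
import json
import urllib.request


def parse_model_ids(payload: dict) -> list[str]:
    """Extract model IDs from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_model_ids(base_url: str, timeout: float = 10.0) -> list[str]:
    """GET {base_url}/v1/models and return the served model IDs."""
    url = base_url.rstrip("/") + "/v1/models"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_model_ids(json.load(resp))


# Usage (needs a live endpoint; INFERENCE_HOST is this guide's placeholder):
#   print(fetch_model_ids("http://INFERENCE_HOST:8000"))
```

The IDs this returns are the exact strings to use as model names in the agent configs that follow.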
Both Claude Code and OpenCode will talk to this same endpoint. No proxy, no gateway, no translation layer.
Step 2 — Install the agent
Claude Code (native installer, no Node.js needed):
curl -fsSL https://claude.ai/install.sh | bash
claude --version
OpenCode (install script):
curl -fsSL https://opencode.ai/install | bash
opencode --version
Step 3 — Point the agent at your DEH model
Use the block for your agent. Each one writes a single config file and overwrites any existing copy, so back up your current file first. Replace INFERENCE_HOST and the model name with your own values.
Claude Code — ~/.claude/settings.json:
mkdir -p ~/.claude && cat > ~/.claude/settings.json <<'EOF'
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://INFERENCE_HOST:8000",
    "ANTHROPIC_API_KEY": "dummy",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "NVIDIA-Nemotron-3-Super-120B-A12B-BF16",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "qwen3-coder-next"
  },
  "theme": "dark"
}
EOF
- ANTHROPIC_BASE_URL → your DEH endpoint instead of api.anthropic.com.
- ANTHROPIC_API_KEY → dummy (a local endpoint ignores it; use a real key for a secured endpoint).
- ANTHROPIC_DEFAULT_*_MODEL → maps Claude's three tiers onto your models. Each name must match a model ID that /v1/models reports, so if you deployed a single model, set all three to that ID. Point them at different DEH models if you want a big model for Opus and a fast one for Haiku.
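Because a tier mapped to a model the endpoint does not serve will fail at request time, it can help to compare the mapping against the served IDs up front. This is a minimal sketch, assuming the settings layout shown in this step; `unserved_tiers` and `load_settings` are hypothetical helper names, and the served IDs would come from /v1/models.

```python
import json
from pathlib import Path

TIER_KEYS = (
    "ANTHROPIC_DEFAULT_OPUS_MODEL",
    "ANTHROPIC_DEFAULT_SONNET_MODEL",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL",
)


def unserved_tiers(settings: dict, served_ids: set[str]) -> dict[str, str]:
    """Return {env key: model name} for tiers whose model is not being served."""
    env = settings.get("env", {})
    return {
        key: env[key]
        for key in TIER_KEYS
        if key in env and env[key] not in served_ids
    }


def load_settings(path: Path) -> dict:
    """Read a Claude Code settings file such as ~/.claude/settings.json."""
    return json.loads(path.read_text(encoding="utf-8"))


# Usage:
#   settings = load_settings(Path.home() / ".claude" / "settings.json")
#   print(unserved_tiers(settings, {"qwen3-coder-next"}))
```

An empty result means every configured tier points at a model the endpoint actually serves.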
OpenCode — ~/.config/opencode/opencode.json:
mkdir -p ~/.config/opencode && cat > ~/.config/opencode/opencode.json <<'EOF'
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "vllm": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Dell Enterprise Hub (local)",
      "options": { "baseURL": "http://INFERENCE_HOST:8000/v1" },
      "models": { "qwen3-coder-next": { "name": "Qwen3 Coder Next" } }
    }
  },
  "model": "vllm/qwen3-coder-next"
}
EOF
- options.baseURL → the DEH endpoint, including /v1.
- model → the active model as provider/model.
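If you manage several hosts or models, generating this file can be less error-prone than editing it by hand. The sketch below rebuilds the same structure in Python. The function name and its parameters are illustrative, and the optional `small_model` entry is an assumption: it uses the key named in the comparison table later in this guide and assumes it takes the same provider/model form.

```python
import json
from typing import Optional


def build_opencode_config(host: str, model_id: str, display_name: str,
                          port: int = 8000,
                          small_model_id: Optional[str] = None) -> dict:
    """Build the opencode.json structure shown above for one served model."""
    models = {model_id: {"name": display_name}}
    if small_model_id is not None:
        # The auxiliary model must also be served by the same endpoint.
        models.setdefault(small_model_id, {"name": small_model_id})
    config = {
        "$schema": "https://opencode.ai/config.json",
        "provider": {
            "vllm": {
                "npm": "@ai-sdk/openai-compatible",
                "name": "Dell Enterprise Hub (local)",
                "options": {"baseURL": f"http://{host}:{port}/v1"},
                "models": models,
            }
        },
        "model": f"vllm/{model_id}",
    }
    if small_model_id is not None:
        config["small_model"] = f"vllm/{small_model_id}"  # assumed format
    return config


if __name__ == "__main__":
    cfg = build_opencode_config("INFERENCE_HOST", "qwen3-coder-next", "Qwen3 Coder Next")
    print(json.dumps(cfg, indent=2))
```

Redirect the printed JSON to ~/.config/opencode/opencode.json once the host and model ID are filled in.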
Step 4 — Add credentials (OpenCode only)
Claude Code already has its key in settings.json. OpenCode stores credentials separately:
mkdir -p ~/.local/share/opencode && cat > ~/.local/share/opencode/auth.json <<'EOF'
{ "vllm": { "type": "api", "key": "dummy" } }
EOF
Use a real key for a secured endpoint, and never commit auth.json.
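Once auth.json holds a real key, it is worth making the file readable only by your own user. The following is a minimal sketch for POSIX systems that writes the same JSON shape as above with owner-only permissions. The permission tightening is a suggestion added here, not an OpenCode requirement, and `write_auth_file` is a hypothetical helper.

```python
import json
import os
from pathlib import Path


def write_auth_file(path: Path, provider_id: str, api_key: str) -> None:
    """Write an auth.json in the shape shown above, readable only by the owner."""
    path.parent.mkdir(parents=True, exist_ok=True)
    body = json.dumps({provider_id: {"type": "api", "key": api_key}})
    # Create (or truncate) the file with owner-only permissions from the start.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(body + "\n")
    # Tighten the mode in case the file already existed with looser permissions.
    os.chmod(path, 0o600)


# Usage:
#   write_auth_file(Path.home() / ".local/share/opencode/auth.json", "vllm", "dummy")
```

The provider ID passed in ("vllm" in this guide) has to match the provider key used in opencode.json.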
Step 5 — (Optional) Add project conventions
Each agent reads a guidance file from your project root each session — CLAUDE.md for Claude Code, AGENTS.md for OpenCode. Create both at once:
cat > CLAUDE.md <<'EOF'
# Project conventions
## Python environment
This project uses **uv**. Do NOT use `pip` directly.
- Run a script: `uv run python <script>`
- Run tests: `uv run pytest`
- Add a dep: `uv add <package>`
- Sync env: `uv sync`
EOF
cp CLAUDE.md AGENTS.md
Tip: in OpenCode you can run /init to auto-generate AGENTS.md from your repo.
Step 6 — Run it
cd ~/my-project
claude # Claude Code
# or
opencode # OpenCode
Every request now flows straight to your DEH model on your own hardware.
Config & file-location cheat sheet
| Concern | Claude Code | OpenCode |
|---|---|---|
| User config | ~/.claude/settings.json | ~/.config/opencode/opencode.json |
| Project (shared) config | <project>/.claude/settings.json | <project>/opencode.json |
| Credentials | env in settings (ANTHROPIC_API_KEY) | ~/.local/share/opencode/auth.json (❌ never commit) |
| Project guidance | CLAUDE.md | AGENTS.md |
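To see at a glance which of these files exist on a machine, a small script can walk the locations from the cheat sheet. This is a minimal sketch; the labels and function names are illustrative, and the paths are taken directly from the table above.

```python
from pathlib import Path


def agent_config_paths(home: Path, project: Path) -> dict[str, Path]:
    """Config, credential, and guidance locations from the cheat sheet."""
    return {
        "claude user config": home / ".claude" / "settings.json",
        "claude project config": project / ".claude" / "settings.json",
        "claude guidance": project / "CLAUDE.md",
        "opencode user config": home / ".config" / "opencode" / "opencode.json",
        "opencode project config": project / "opencode.json",
        "opencode credentials": home / ".local" / "share" / "opencode" / "auth.json",
        "opencode guidance": project / "AGENTS.md",
    }


def report(home: Path, project: Path) -> list[str]:
    """One line per location, marked present or missing."""
    return [
        f"{'present' if path.exists() else 'missing'}  {label}: {path}"
        for label, path in agent_config_paths(home, project).items()
    ]


if __name__ == "__main__":
    print("\n".join(report(Path.home(), Path.cwd())))
```

Run it from the project root so the project-level entries resolve against the right directory.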
Validation & troubleshooting
Quick endpoint sanity check (bypasses both agents):
curl http://INFERENCE_HOST:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"qwen3-coder-next","messages":[{"role":"user","content":"Reverse a linked list in Rust."}]}'
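The same sanity check can be scripted, which is handy in a smoke test after each redeploy. This is a minimal Python sketch, assuming the endpoint returns the standard OpenAI-style chat completion shape (`choices[0].message.content`); the function names and the 120-second timeout are illustrative choices.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST to /v1/chat/completions with a single user message."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body: dict) -> str:
    """Return the assistant text from an OpenAI-style chat completion response."""
    return response_body["choices"][0]["message"]["content"]


def ask(base_url: str, model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the model's reply (needs a live endpoint)."""
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return extract_reply(json.load(resp))


# Usage:
#   print(ask("http://INFERENCE_HOST:8000", "qwen3-coder-next",
#             "Reverse a linked list in Rust."))
```

If this call succeeds but an agent still fails, the problem is in the agent's config rather than the endpoint.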
| Symptom | Agent | Likely cause | Fix |
|---|---|---|---|
| Connection errors / claude doctor fails | Claude Code | Wrong base URL or host unreachable | Confirm ANTHROPIC_BASE_URL and curl the endpoint. |
| command not found: opencode | OpenCode | Install dir not on PATH | Add $HOME/.opencode/bin to PATH. |
| ProviderInitError | OpenCode | Invalid config | Re-check opencode.json; re-auth via /connect. |
| Copy/paste broken in TUI | OpenCode | No clipboard utility | Install xclip / xsel / wl-clipboard. |
| Model "not found" | both | Name mismatch | Model name must match what /v1/models reports. |
| Truncated long outputs | both | Context/output SLO too small | Redeploy with the Long context Goodput scenario. |
| Tool calls failing | both | Weak tool-use model | Use a strong tool-use model (Qwen3-Coder-Next, GLM 5.1, Trinity Large Thinking). |
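To tell a tool-calling problem apart from an agent problem, you can probe the endpoint with one trivial tool and see whether the model asks to call it. The sketch below builds such a request body and reads the result, assuming the OpenAI-style `tools` / `tool_calls` format. The `list_files` tool is hypothetical and is never executed. POST the body to /v1/chat/completions the same way as the sanity check above.

```python
def build_tool_probe(model: str) -> dict:
    """Chat-completions body that offers one trivial function tool."""
    return {
        "model": model,
        "messages": [
            {"role": "user", "content": "List the files in the current directory."}
        ],
        "tools": [{
            "type": "function",
            "function": {
                "name": "list_files",  # hypothetical tool, never executed
                "description": "List files in a directory.",
                "parameters": {
                    "type": "object",
                    "properties": {"path": {"type": "string"}},
                    "required": ["path"],
                },
            },
        }],
    }


def requested_tools(response_body: dict) -> list[str]:
    """Names of the tools the model asked to call in its first choice, if any."""
    message = response_body["choices"][0]["message"]
    return [call["function"]["name"] for call in message.get("tool_calls") or []]
```

An empty list suggests the model answered in plain text instead of calling the tool. That can point to a weak tool-use model or, depending on how the serving container was launched, to tool-call parsing not being enabled on the server.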
Production hardening checklist (both)
- Pin the model version in DEH for reproducibility.
- Right-size with Goodput SLOs to match context + concurrency.
- Tier models: large for primary (GLM 5.1 / Kimi K2.6), fast for auxiliary (GPT-OSS-20B).
- Secure shared endpoints with TLS + real keys; never commit secrets.
- Prefer NVFP4/FP8 on Blackwell (B300/GB10) to fit bigger coding MoEs per GPU.
- Keep CLAUDE.md / AGENTS.md rich: explicit build/test/lint commands improve open-model reliability.
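For the "real keys" item, scripts that call a secured endpoint should read the key from the environment rather than hard-coding it. This is a minimal sketch, assuming the endpoint checks a standard bearer token; the DEH_API_KEY variable name is hypothetical.

```python
import os


def auth_headers(api_key: str) -> dict[str, str]:
    """HTTP headers for an OpenAI-compatible endpoint that checks a bearer key."""
    if not api_key:
        raise ValueError("API key is empty; set it in the environment first")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }


def key_from_env(var_name: str = "DEH_API_KEY") -> str:
    """Read the key from the environment (the variable name is hypothetical)."""
    return os.environ.get(var_name, "")


# Usage:
#   headers = auth_headers(key_from_env())
```

Failing fast on an empty key avoids sending unauthenticated requests that would otherwise surface as confusing 401 errors inside the agent.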
Which should you choose?
| Aspect | Claude Code | OpenCode |
|---|---|---|
| License | Proprietary (Anthropic) | Open source |
| Connects to DEH | Direct (ANTHROPIC_BASE_URL) | Direct (baseURL) |
| Config | env vars in settings.json | provider block in opencode.json |
| Credentials | env (ANTHROPIC_API_KEY) | auth.json |
| Project guidance file | CLAUDE.md | AGENTS.md |
| Model tiers | Opus/Sonnet/Haiku env mapping | model + small_model |
- Want the most polished, batteries-included agent UX? Claude Code.
- Want a 100% open stack? OpenCode.
- They use entirely separate config directories, so you can run both side by side in the same repo and have each agent cross-check the other's work.
Conclusion
Whether you choose Anthropic's Claude Code, the open-source OpenCode, or both, the destination is the same: a frontier open-weight coding model running on your own Dell PowerEdge platforms, with your source code never leaving the data center.
You can start with Qwen3-Coder-Next or gemma-4-31B-it, then scale up to GLM 5.1 / Kimi K2.6 / MiniMax M2.7 / DeepSeek V4 / Nemotron 3 Ultra on H200/B300 for the most demanding work. Either harness works with any recommended frontier coding model, all on-prem.
References
- Dell Enterprise Hub - https://dell.hf.co (Model Catalog, App Catalog, Docs, Optimized Deployments, Security, Goodput Scenarios)
- Anthropic Claude Code - Build, debug, and ship with natural language - https://claude.com/product/claude-code
- OpenCode - The open source AI coding agent - https://opencode.ai/
