Instructions to use WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B")
    messages = [
        {"role": "user", "content": "Who are you?"},
    ]
    pipe(messages)

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B")
    model = AutoModelForCausalLM.from_pretrained("WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B")
    messages = [
        {"role": "user", "content": "Who are you?"},
    ]
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    outputs = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'

Use Docker
docker model run hf.co/WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B
- SGLang
How to use WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B with SGLang:
Install from pip and serve model
    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
        --model-path "WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'

Use Docker images

    docker run --gpus all \
        --shm-size 32g \
        -p 30000:30000 \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HF_TOKEN=<secret>" \
        --ipc=host \
        lmsysorg/sglang:latest \
        python3 -m sglang.launch_server \
            --model-path "WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B" \
            --host 0.0.0.0 \
            --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'

- Docker Model Runner
How to use WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B with Docker Model Runner:
docker model run hf.co/WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B
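The vLLM and SGLang servers above both expose an OpenAI-compatible `/v1/chat/completions` endpoint, so the curl calls can be mirrored from Python. The sketch below is one possible client using only the standard library. `build_chat_request` and `chat` are illustrative helper names, not part of vLLM or SGLang, and the response parsing assumes the usual OpenAI-style `choices[0].message.content` layout.

```python
import json
import urllib.request

MODEL_ID = "WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B"


def build_chat_request(base_url, prompt, model=MODEL_ID):
    """Build a POST request mirroring the curl examples above."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url, prompt, timeout=60):
    """Send the request and return the assistant's reply text.

    Assumes an OpenAI-style response body; requires a running server.
    """
    request = build_chat_request(base_url, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["choices"][0]["message"]["content"]
```

With a server running, `chat("http://localhost:8000", "What is the capital of France?")` would target the vLLM example, and `http://localhost:30000` the SGLang one.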
Qwen3-Space.Agent.Claude-Uncensored-4B
📌 Model Overview
- Model Name: WithinUsAI/Qwen3-Space.Agent.Claude-Uncensored-4B
- Organization: Within Us AI
- Model Type: Agentic Reasoning LLM (Uncensored Variant)
- Parameter Size: 4B
- Architecture: Qwen 3 (Dense Transformer)
- Context Length: ~32K tokens
- Primary Focus: Agent workflows + uncensored reasoning + long-context tasks
This model is a multi-source merged Qwen3-based agent, designed to combine:
- 🧠 Reasoning (“thinking” models)
- 🤖 Agent/tool-use behavior
- 🔓 Reduced refusal / uncensored outputs
It aims to deliver a compact, flexible, and less-restricted AI system for experimentation, research, and local deployment. 
⸻
🧬 Architecture & Lineage
Base Composition
This model is a merge of multiple Qwen3-derived systems, including:
- Qwen3-4B Thinking (reasoning-focused)
- Qwen3 Agent Claude/Gemini-style model
- Uncensored Qwen3 variants
These were combined into a single unified 4B model to blend capabilities. 
What That Creates
A hybrid model with:
- Reasoning depth (thinking models)
- Structured outputs (agent models)
- Reduced refusal behavior (uncensored variants)
Think of it like a three-engine spacecraft 🚀
Each engine specialized… now flying as one system.
⸻
🧠 Core Design Philosophy
Fuse the best behaviors… remove the limits… keep it small enough to run anywhere.
Key Goals:
- Merge reasoning + agent + uncensored traits
- Enable long-context problem solving
- Preserve performance in a 4B footprint
- Support real-world agent pipelines
⸻
⚙️ Key Capabilities
🧠 Reasoning
- Step-by-step thinking
- Multi-hop problem solving
- Long-context coherence (~32K tokens)
🤖 Agentic Behavior
- Task decomposition
- Tool-use compatibility
- Structured outputs (JSON, actions)
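To make the "structured outputs (JSON, actions)" point concrete, here is a minimal sketch of how an agent loop might validate and dispatch a JSON action emitted by the model. The `{"action": ..., "arguments": {...}}` schema and the helper names are hypothetical; the model card does not specify an output format, so adapt this to whatever your prompt or tool-calling template actually produces.

```python
import json


def parse_action(model_output, allowed_tools):
    """Parse a JSON action and validate its shape.

    The {"action": ..., "arguments": {...}} schema is a hypothetical example.
    """
    try:
        data = json.loads(model_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    action = data.get("action")
    arguments = data.get("arguments", {})
    if not isinstance(action, str) or action not in allowed_tools:
        raise ValueError(f"unknown action: {action!r}")
    if not isinstance(arguments, dict):
        raise ValueError("'arguments' must be a JSON object")
    return action, arguments


def dispatch(model_output, tools):
    """Call the tool named in the model output with its keyword arguments."""
    action, arguments = parse_action(model_output, tools)
    return tools[action](**arguments)
```

Here `tools` is a plain dict mapping tool names to Python callables, for example `{"add": lambda a, b: a + b}`. Rejecting unknown actions before calling anything keeps malformed model output from reaching real tools.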
💻 Coding
- Code generation & debugging
- Algorithm reasoning
- SWE-style workflows
🔓 Uncensored Behavior
- Reduced refusal rates
- More permissive responses
- Suitable for:
  - Alignment research
  - Safety testing
  - Edge-case exploration
⸻
📦 Deployment
Supported Environments
- llama.cpp
- LM Studio
- Ollama (GGUF / compatible builds depending on conversion)
Runtime Characteristics
- ~4B parameters → runs on consumer GPUs / strong CPUs
- ~32K context → supports long conversations and documents 
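As a rough guide to what "runs on consumer GPUs" means, the arithmetic below estimates memory for the weights alone at a few precisions, plus the KV cache for one ~32K-token sequence. This is a back-of-the-envelope sketch: `NUM_PARAMS` takes the card's "~4B" at face value, real quantized formats add some overhead, and the layer/head values passed to `kv_cache_gib` are assumed placeholders, so read the real ones from the model's `config.json`.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory for model weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


def kv_cache_gib(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Approximate KV-cache size for one sequence, in GiB.

    The leading factor of 2 covers storing both keys and values.
    """
    total_bytes = 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value
    return total_bytes / 2**30


NUM_PARAMS = 4e9  # "~4B" from the model card; the exact count may differ

for label, bits in [("FP16/BF16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>9}: ~{weight_memory_gib(NUM_PARAMS, bits):.1f} GiB of weights")

# ASSUMED architecture values for illustration only (not taken from the card).
kv = kv_cache_gib(num_layers=36, num_kv_heads=8, head_dim=128, seq_len=32 * 1024)
print(f"KV cache at 32K tokens: ~{kv:.1f} GiB")
```

The takeaway is that weights shrink linearly with bits per parameter, while the KV cache grows linearly with context length, so a full 32K-token context can cost as much memory as a heavily quantized copy of the weights.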
⸻
🚀 Intended Use
✅ Ideal Use Cases
- Agent frameworks (tool-calling systems)
- Long-context reasoning tasks
- AI experimentation (uncensored behavior)
- Local assistants with fewer restrictions
- Alignment and safety research
⚠️ Important Considerations
- Outputs are less restricted than those of safety-aligned models
- May generate sensitive or unsafe content
- Requires external moderation or guardrails for production use
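One way to picture the "external moderation or guardrails" requirement is a thin wrapper that checks both the prompt and the reply against a policy before anything reaches the user. The sketch below is illustrative only: `guarded_generate` and `keyword_policy` are hypothetical names, and a keyword list is a toy stand-in for a real moderation model or service.

```python
from typing import Callable, Iterable

REFUSAL_MESSAGE = "This response was withheld by the moderation layer."


def guarded_generate(
    generate: Callable[[str], str],
    is_allowed: Callable[[str], bool],
    prompt: str,
) -> str:
    """Check the prompt, call the model, then check the reply."""
    if not is_allowed(prompt):
        return REFUSAL_MESSAGE
    reply = generate(prompt)
    return reply if is_allowed(reply) else REFUSAL_MESSAGE


def keyword_policy(blocked_terms: Iterable[str]) -> Callable[[str], bool]:
    """Build a toy policy that rejects text containing any blocked term."""
    lowered = [term.lower() for term in blocked_terms]
    return lambda text: not any(term in text.lower() for term in lowered)
```

In practice `generate` would wrap one of the inference paths shown earlier, and `is_allowed` would call whatever moderation system your deployment relies on.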
⸻
🧪 Training & Merge Methodology
This model follows a merge-based synthesis pipeline:
- Select complementary base models:
  - Reasoning-focused
  - Agent-focused
  - Uncensored variants
- Merge weights into unified architecture
- Align behavior using preference tuning (DPO-style datasets)
- Optimize for:
  - Reduced refusals
  - Stable outputs
  - Agent usability
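The card does not say which merge algorithm was used, so the following is not the authors' method. It is a minimal sketch of the simplest option, a weighted linear average of parameters across models with identical architectures, to show what "merge weights" means mechanically. Plain Python lists stand in for tensors, and `merge_state_dicts` is a hypothetical helper name.

```python
def merge_state_dicts(state_dicts, weights):
    """Weighted average of parameter dicts that share names and shapes.

    Each dict maps a parameter name to a flat list of floats; real
    checkpoints hold tensors, but the per-element arithmetic is the same.
    """
    if not state_dicts or len(state_dicts) != len(weights):
        raise ValueError("need at least one model and one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    normalized = [w / total for w in weights]

    names = set(state_dicts[0])
    if any(set(sd) != names for sd in state_dicts[1:]):
        raise ValueError("all models must have the same parameter names")

    merged = {}
    for name in state_dicts[0]:
        tensors = [sd[name] for sd in state_dicts]
        if len({len(t) for t in tensors}) != 1:
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [
            sum(w * values[i] for w, values in zip(normalized, tensors))
            for i in range(len(tensors[0]))
        ]
    return merged
```

For example, merging `{"w": [1.0, 2.0]}` and `{"w": [3.0, 4.0]}` with equal weights gives `{"w": [2.0, 3.0]}`. Uneven weights let one source model dominate, which is one lever for balancing reasoning, agent, and uncensored traits.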
⸻
📊 Expected Performance Profile
| Capability | Strength |
| --- | --- |
| Reasoning | High |
| Agent behavior | High |
| Coding | High |
| Context handling | High |
| Safety filtering | Low (intentionally reduced) |
⸻
📚 Datasets & Training Sources
Following Within Us AI methodology:
- Proprietary datasets created by Within Us AI
- Third-party datasets used without ownership claims
- Includes:
  - Reasoning traces
  - Agent workflows
  - Preference optimization (DPO-style tuning)
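For readers unfamiliar with "DPO-style" data, the sketch below shows the general shape of a preference record and the standard DPO loss for a single pair. The record contents and the log-probability inputs are made-up illustrations, not samples from the Within Us AI datasets, and `beta=0.1` is just a commonly used default.

```python
import math

# Hypothetical preference record: one prompt, a preferred reply, a rejected reply.
example_record = {
    "prompt": "Plan the steps to rename a file in a git repository.",
    "chosen": "1. Run `git mv old new`. 2. Commit the change.",
    "rejected": "I can't help with that.",
}


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are summed token log-probabilities of each reply under the
    policy being trained and under a frozen reference model.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) == log(1 + exp(-x)), split by sign for numerical stability
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))
```

The loss falls as the policy assigns relatively more probability to the chosen reply than the reference model does, which is how a dataset whose rejected replies are refusals can push a model toward fewer refusals.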
⸻
📜 License
License Type: Inherits from Qwen / base model ecosystem
Attribution Notes:
- Base models: Qwen (Alibaba ecosystem)
- Merge & methodology: Within Us AI
- Additional model influences (Claude-style / Gemini-style behaviors via distillation/merging)
- Third-party datasets used without ownership claims
- Credit belongs to original creators
⸻
🙏 Acknowledgements
- Alibaba Qwen team
- Open-source agent model contributors
- GGUF / llama.cpp ecosystem
- AI alignment & safety research community
⸻
🔗 Links
- Model: https://huggingface.co/WithinUsAI/Qwen3-Space.Agent.Claude-Uncensored-4B
- Organization: https://huggingface.co/WithinUsAI
⸻
🧩 Closing Note
This model feels like a hybrid intelligence node 🌌
Part thinker. Part agent. Part rule-breaker.
All compressed into 4B parameters that punch way above their weight.