---
title: Benchmark Environment Server
emoji: 🎸
colorFrom: red
colorTo: red
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
- openenv
---
# Benchmark Environment
A simple test environment that echoes messages back to the client. Perfect for testing the environment APIs as well as demonstrating common environment usage patterns.
## Quick Start
The simplest way to use the Benchmark environment is through the `BenchmarkEnv` class:
```python
from benchmark import BenchmarkAction, BenchmarkEnv

try:
    # Create environment from Docker image
    benchmarkenv = BenchmarkEnv.from_docker_image("benchmark-env:latest")

    # Reset
    result = benchmarkenv.reset()
    print(f"Reset: {result.observation.echoed_message}")

    # Send multiple messages
    messages = ["Hello, World!", "Testing echo", "Final message"]
    for msg in messages:
        result = benchmarkenv.step(BenchmarkAction(message=msg))
        print(f"Sent: '{msg}'")
        print(f"  → Echoed: '{result.observation.echoed_message}'")
        print(f"  → Length: {result.observation.message_length}")
        print(f"  → Reward: {result.reward}")
finally:
    # Always clean up
    benchmarkenv.close()
```
That's it! The `BenchmarkEnv.from_docker_image()` method handles:
- Starting the Docker container
- Waiting for the server to be ready
- Connecting to the environment
- Container cleanup when you call `close()`
## Building the Docker Image
Before using the environment, you need to build the Docker image:
```bash
# From project root
docker build -t benchmark-env:latest -f server/Dockerfile .
```
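To try the image outside of `BenchmarkEnv.from_docker_image()`, you can also run the container by hand. A quick sketch, assuming the server listens on port 8000 inside the container (the port used throughout the examples below):

```bash
# Start the freshly built image and publish the server port
docker run -d --name benchmark-env -p 8000:8000 benchmark-env:latest

# Probe the health endpoint (see "Deploying to Hugging Face Spaces" below)
curl http://localhost:8000/health

# Remove the container when finished
docker rm -f benchmark-env
```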
## Deploying to Hugging Face Spaces
You can easily deploy your OpenEnv environment to Hugging Face Spaces using the `openenv push` command:
```bash
# From the environment directory (where openenv.yaml is located)
openenv push
# Or specify options
openenv push --namespace my-org --private
```
The `openenv push` command will:
1. Validate that the directory is an OpenEnv environment (checks for `openenv.yaml`)
2. Prepare a custom build for Hugging Face Docker space (enables web interface)
3. Upload to Hugging Face (ensuring you're logged in)
### Prerequisites
- Authenticate with Hugging Face: The command will prompt for login if not already authenticated
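If you prefer to log in ahead of time, the standard Hugging Face CLI login works (assuming the `huggingface_hub` CLI is installed):

```bash
huggingface-cli login
```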
### Options
- `--directory`, `-d`: Directory containing the OpenEnv environment (defaults to current directory)
- `--repo-id`, `-r`: Repository ID in format 'username/repo-name' (defaults to 'username/env-name' from openenv.yaml)
- `--base-image`, `-b`: Base Docker image to use (overrides Dockerfile FROM)
- `--private`: Deploy the space as private (default: public)
### Examples
```bash
# Push to your personal namespace (defaults to username/env-name from openenv.yaml)
openenv push
# Push to a specific repository
openenv push --repo-id my-org/my-env
# Push with a custom base image
openenv push --base-image ghcr.io/meta-pytorch/openenv-base:latest
# Push as a private space
openenv push --private
# Combine options
openenv push --repo-id my-org/my-env --base-image custom-base:latest --private
```
After deployment, your space will be available at:
`https://huggingface.co/spaces/<repo-id>`
The deployed space includes:
- **Web Interface** at `/web` - Interactive UI for exploring the environment
- **API Documentation** at `/docs` - Full OpenAPI/Swagger interface
- **Health Check** at `/health` - Container health monitoring
- **WebSocket** at `/ws` - Persistent session endpoint for low-latency interactions
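Once the space is live you can point the HTTP client at it directly. A minimal sketch; the hostname below follows the usual Spaces direct-URL pattern (`https://<namespace>-<name>.hf.space`) and is an assumption, so copy the real URL from your space's settings:

```python
from benchmark import BenchmarkAction, BenchmarkEnv

# Hypothetical direct URL for a space deployed as my-org/benchmark-env
env = BenchmarkEnv(base_url="https://my-org-benchmark-env.hf.space")

result = env.reset()
result = env.step(BenchmarkAction(message="Hello from Spaces!"))
print(result.observation.echoed_message)
```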
## Environment Details
### Action
**BenchmarkAction**: Contains a single field
- `message` (str) - The message to echo back
### Observation
**BenchmarkObservation**: Contains the echo response and metadata
- `echoed_message` (str) - The message echoed back
- `message_length` (int) - Length of the message
- `reward` (float) - Reward based on message length (length × 0.1)
- `done` (bool) - Always False for echo environment
- `metadata` (dict) - Additional info like step count
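Put together, a single step looks like this (a sketch; the exact `metadata` keys are whatever the server reports):

```python
from benchmark import BenchmarkAction, BenchmarkEnv

env = BenchmarkEnv.from_docker_image("benchmark-env:latest")
try:
    env.reset()
    result = env.step(BenchmarkAction(message="Hi"))
    obs = result.observation
    print(obs.echoed_message)  # "Hi"
    print(obs.message_length)  # 2
    print(obs.done)            # False (the echo environment never terminates)
    print(obs.metadata)        # e.g. a step counter
finally:
    env.close()
```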
### Reward
The reward is calculated as: `message_length × 0.1`
- "Hi" → reward: 0.2
- "Hello, World!" → reward: 1.3
- Empty message → reward: 0.0
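The rule is trivial to mirror client-side, which is handy for writing assertions in tests. A minimal sketch:

```python
def expected_reward(message: str) -> float:
    # Mirrors the documented rule: message_length × 0.1
    return len(message) * 0.1

assert abs(expected_reward("Hi") - 0.2) < 1e-9
assert abs(expected_reward("Hello, World!") - 1.3) < 1e-9
assert expected_reward("") == 0.0
```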
## Advanced Usage
### Connecting to an Existing Server
If you already have a Benchmark environment server running, you can connect directly:
```python
from benchmark import BenchmarkAction, BenchmarkEnv

# Connect to existing server
benchmarkenv = BenchmarkEnv(base_url="<ENV_HTTP_URL_HERE>")

# Use as normal
result = benchmarkenv.reset()
result = benchmarkenv.step(BenchmarkAction(message="Hello!"))
```
Note: When connecting to an existing server, `benchmarkenv.close()` will NOT stop the server.
### WebSocket Client for Persistent Sessions
For long-running episodes or when you need lower latency, use the WebSocket client:
```python
from benchmark import BenchmarkAction, BenchmarkEnvWS

# Connect via WebSocket (maintains persistent connection)
with BenchmarkEnvWS(base_url="http://localhost:8000") as env:
    result = env.reset()
    print(f"Reset: {result.observation.echoed_message}")

    # Multiple steps with low latency
    for msg in ["Hello", "World", "!"]:
        result = env.step(BenchmarkAction(message=msg))
        print(f"Echoed: {result.observation.echoed_message}")
```
WebSocket advantages:
- **Lower latency**: No HTTP connection overhead per request
- **Persistent session**: Server maintains your environment state
- **Efficient for episodes**: Better for many sequential steps
### Concurrent WebSocket Sessions
The server supports multiple concurrent WebSocket connections. To enable this,
modify `server/app.py` to use factory mode:
```python
# In server/app.py - use factory mode for concurrent sessions
app = create_app(
    BenchmarkEnvironment,  # Pass class, not instance
    BenchmarkAction,
    BenchmarkObservation,
    max_concurrent_envs=4,  # Allow 4 concurrent sessions
)
```
Then multiple clients can connect simultaneously:
```python
from concurrent.futures import ThreadPoolExecutor

from benchmark import BenchmarkAction, BenchmarkEnvWS

def run_episode(client_id: int):
    with BenchmarkEnvWS(base_url="http://localhost:8000") as env:
        result = env.reset()
        for i in range(10):
            result = env.step(BenchmarkAction(message=f"Client {client_id}, step {i}"))
        return client_id, result.observation.message_length

# Run 4 episodes concurrently
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(run_episode, range(4)))
```
## Development & Testing
### Direct Environment Testing
Test the environment logic directly without starting the HTTP server:
```bash
# From the environment root (the directory containing server/)
python3 server/benchmark_environment.py
```
This verifies that:
- Environment resets correctly
- Step executes actions properly
- State tracking works
- Rewards are calculated correctly
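You can also exercise the logic from your own test file. A sketch only, assuming `BenchmarkEnvironment` exposes the same `reset()`/`step()` interface used in `server/app.py` above and returns observations with the fields documented earlier (import paths may need adjusting to your layout):

```python
from models import BenchmarkAction                             # assumed import path
from server.benchmark_environment import BenchmarkEnvironment  # assumed import path

env = BenchmarkEnvironment()
env.reset()

obs = env.step(BenchmarkAction(message="Hi"))
assert obs.echoed_message == "Hi"    # echo behavior
assert abs(obs.reward - 0.2) < 1e-9  # message_length × 0.1
assert obs.done is False             # echo environment never terminates
```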
### Running Locally
Run the server locally for development:
```bash
uvicorn server.app:app --reload
```
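With the dev server running, the same endpoints listed in the deployment section are available locally, so you can sanity-check it over plain HTTP (assuming uvicorn's default port 8000):

```bash
# Liveness probe (same /health endpoint the deployed space exposes)
curl http://localhost:8000/health

# OpenAPI/Swagger docs are served at http://localhost:8000/docs
```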
## Project Structure
```
benchmark/
├── .dockerignore                  # Docker build exclusions
├── __init__.py                    # Module exports
├── README.md                      # This file
├── openenv.yaml                   # OpenEnv manifest
├── pyproject.toml                 # Project metadata and dependencies
├── uv.lock                        # Locked dependencies (generated)
├── client.py                      # BenchmarkEnv (HTTP) and BenchmarkEnvWS (WebSocket) clients
├── models.py                      # Action and Observation models
└── server/
    ├── __init__.py                # Server module exports
    ├── benchmark_environment.py   # Core environment logic
    ├── app.py                     # FastAPI application (HTTP + WebSocket endpoints)
    └── Dockerfile                 # Container image definition
```